Genome project
NCBI prokaryotic genome annotation pipeline
Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our kn…
RepeatModeler2 for automated genomic discovery of transposable element families
The accelerating pace of genome sequencing throughout the tree of life is driving the need for improved unsupervised annotation of genome components such as transposable elements (TEs). Because the types and sequences of TEs are highly var…
GENCODE reference annotation for the human and mouse genomes
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE…
The complete sequence of a human genome
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomer…
Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper
Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible (e.g., new assignments only available throug…
PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements
This FAIRsharing record describes: The PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System is a unique resource that classifies genes by their functions, using published scientific experimental evidence and …
GENCODE 2021
The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make …
The Ensembl gene annotation system
The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE ge…
Ensembl 2020
Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation such as genes, variation, regulation and comparative genomics across the vertebrate subphylum and key model organisms. The Ensembl annotati…
DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication
Summary: We developed a prokaryotic genome annotation pipeline, DFAST, that also supports genome submission to public sequence databases. DFAST was originally started as an on-line annotation server, and to date, over 7000 jobs have been pr…
IRscope: an online program to visualize the junction sites of chloroplast genomes
Motivation: Genome plotting is performed using a wide range of visualization tools, each with emphasis on a different informative dimension of the genome. These tools can provide a deeper insight into the genomic structure of the organism. …
Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification
Command-line annotation software tools have continuously gained popularity compared to centralized online services due to the worldwide increase of sequenced bacterial genomes. However, results of existing command-line software pipelines h…
RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation
The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation. Changes in the Prokaryotic Geno…
RefSeq: an update on prokaryotic genome annotation and curation
The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) provides annotation for over 95 000 prokaryotic genomes that meet standards for sequence quality, completeness, and freedom from contaminat…
COG database update: focus on microbial diversity, model organisms, and widespread pathogens
The Clusters of Orthologous Genes (COG) database, also referred to as the Clusters of Orthologous Groups of proteins, was created in 1997 and went through several rounds of updates, most recently, in 2014. The current update, available at …
Improved annotation of protein-coding genes boundaries in metazoan mitochondrial genomes
With the rapid increase of sequenced metazoan mitochondrial genomes, a detailed manual annotation is becoming more and more infeasible. While it is easy to identify the approximate location of protein-coding genes within mitogenomes, the p…
The UCSC Genome Browser database: 2021 update
For more than two decades, the UCSC Genome Browser database (https://genome.ucsc.edu) has provided high-quality genomics data visualization and genome annotations to the research community. As the field of genomics grows and more data beco…
The Sorghum bicolor reference genome: improved assembly, gene annotations, a transcriptome atlas, and signatures of genome organization
Summary: Sorghum bicolor is a drought-tolerant C4 grass used for the production of grain, forage, sugar, and lignocellulosic biomass and a genetic model for C4 grasses due to its relatively small genome (approximately 800 Mbp), diploid gene…
The UCSC Genome Browser database: 2018 update
The UCSC Genome Browser (https://genome.ucsc.edu) provides a web interface for exploring annotated genome assemblies. The assemblies and annotation tracks are updated on an ongoing basis—12 assemblies and more than 28 tracks were added in …
Optical maps refine the bread wheat Triticum aestivum cv. Chinese Spring genome assembly
Summary: Until recently, achieving a reference‐quality genome sequence for bread wheat was long thought beyond the limits of genome sequencing and assembly technology, primarily due to the large genome size and > 80% repetitive sequence con…
MapMan4: A Refined Protein Classification and Annotation Framework Applicable to Multi-Omics Data Analysis
Genome sequences from over 200 plant species have already been published, with this number expected to increase rapidly due to advances in sequencing technologies. Once a new genome has been assembled and the genes identified, the function…
Ensembl Genomes 2018: an integrated omics infrastructure for non-vertebrate species
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the Ensembl project (http://www.ensembl.org). To…
IMG/M: integrated genome and metagenome comparative data analysis system
The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single …
PHASTEST: faster than PHASTER, better than PHAST
PHASTEST (PHAge Search Tool with Enhanced Sequence Translation) is the successor to the PHAST and PHASTER prophage finding web servers. PHASTEST is designed to support the rapid identification, annotation and visualization of prophage sequ…
An improved assembly and annotation of the allohexaploid wheat genome identifies complete families of agronomic genes and provides genomic evidence for chromosomal translocations
Advances in genome sequencing and assembly technologies are generating many high-quality genome sequences, but assemblies of large, repeat-rich polyploid genomes, such as that of bread wheat, remain fragmented and incomplete. We have gener…
Prokaryotic Virus Orthologous Groups (pVOGs): a resource for comparative genomics and protein family annotation
Viruses are the most abundant and diverse biological entities on earth, and while most of this diversity remains completely unexplored, advances in genome sequencing have provided unprecedented glimpses into the virosphere. The Prokaryotic…
An improved pig reference genome sequence to enable pig genetics and genomics research
Background: The domestic pig (Sus scrofa) is important both as a food source and as a biomedical model given its similarity in size, anatomy, physiology, metabolism, pathology, and pharmacology to humans. The draft reference genome (Sscrofa…
Ensembl Genomes 2022: an expanding genome resource for non-vertebrates
Ensembl Genomes (https://www.ensemblgenomes.org) provides access to non-vertebrate genomes and analysis complementing vertebrate resources developed by the Ensembl project (https://www.ensembl.org). The two resources collectively present g…
BRAKER3: Fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS, and TSEBRA
Gene prediction has remained an active area of bioinformatics research for a long time. Still, gene prediction in large eukaryotic genomes presents a challenge that must be addressed by new algorithms. The amount and significance of the ev…
ISEScan: automated identification of insertion sequence elements in prokaryotic genomes
Motivation: Insertion sequence (IS) elements are the smallest but most abundant autonomous transposable elements in prokaryotic genomes, and they play a key role in prokaryotic genome organization and evolution. With the fast growing genom…