STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets Open
Proteins and their functional interactions form the backbone of the cellular machinery. Their connectivity network needs to be considered for the full understanding of biological phenomena, but the available information on protein-protein …
Enrichr: a comprehensive gene set enrichment analysis web server 2016 update Open
Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of …
The mutational constraint spectrum quantified from variation in 141,456 humans Open
Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in na…
KEGG: new perspectives on genomes, pathways, diseases and drugs Open
KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an encyclopedia of genes and genomes. Assigning functional meanings to genes and genomes both at the molecular and higher levels is the primary objective of the KEGG database proj…
Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads Open
The Illumina DNA sequencing platform generates accurate but short reads, which can be used to produce accurate but fragmented genome assemblies. Pacific Biosciences and Oxford Nanopore Technologies DNA sequencing platforms generate long re…
Introducing EzBioCloud: a taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies Open
The recent advent of DNA sequencing technologies facilitates the use of genome sequencing data that provide means for more informative and precise classification and identification of members of the Bacteria and Archaea. Because the curren…
The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest Open
Much of the complexity within cells arises from functional and regulatory interactions among proteins. The core of these interactions is increasingly known, but novel interactions continue to be discovered, and the information remains scat…
NCBI prokaryotic genome annotation pipeline Open
Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our kn…
Toward understanding the origin and evolution of cellular organisms Open
In this era of high‐throughput biology, bioinformatics has become a major discipline for making sense out of large‐scale datasets. Bioinformatics is usually considered as a practical field developing databases and software tools for suppor…
KEGG for taxonomy-based analysis of pathways and genomes Open
KEGG (https://www.kegg.jp) is a manually curated database resource integrating various biological objects categorized into systems, genomic, chemical and health information. Each object (database entry) is identified by the KEGG identifier…
Maftools: efficient and comprehensive analysis of somatic variants in cancer Open
Numerous large-scale genomic studies of matched tumor-normal samples have established the somatic landscapes of most cancer types. However, the downstream analysis of data from somatic mutations entails a number of computational and statis…
GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database Open
Summary: The Genome Taxonomy Database Toolkit (GTDB-Tk) provides objective taxonomic assignments for bacterial and archaeal genomes based on the GTDB. GTDB-Tk is computationally efficient and able to classify thousands of draft genomes in p…
miRBase: from microRNA sequences to function Open
This FAIRsharing record describes: The miRBase database is a searchable database of published miRNA sequences and annotation. Each entry in miRBase represents a predicted hairpin portion of a miRNA transcript (termed mir in the database), …
metaSPAdes: a new versatile metagenomic assembler Open
While metagenomics has emerged as a technology of choice for analyzing bacterial populations, the assembly of metagenomic data remains challenging, thus stifling biological discoveries. Moreover, recent studies revealed that complex bacter…
eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale Open
Even though automated functional annotation of genes represents a fundamental step in most genomic and metagenomic workflows, it remains challenging at large scales. Here, we describe a major upgrade to eggNOG-mapper, a tool for functional…
CADD: predicting the deleteriousness of variants throughout the human genome Open
Combined Annotation-Dependent Depletion (CADD) is a widely used measure of variant deleteriousness that can effectively prioritize causal variants in genetic analyses, particularly highly penetrant contributors to severe Mendelian disorder…
PHASTER: a better, faster version of the PHAST phage search tool Open
PHASTER (PHAge Search Tool - Enhanced Release) is a significant upgrade to the popular PHAST web server for the rapid identification and annotation of prophage sequences within bacterial genomes and plasmids. Although the steps in the phag…
RepeatModeler2 for automated genomic discovery of transposable element families Open
The accelerating pace of genome sequencing throughout the tree of life is driving the need for improved unsupervised annotation of genome components such as transposable elements (TEs). Because the types and sequences of TEs are highly var…
The repertoire of mutational signatures in human cancer Open
Somatic mutations in cancer genomes are caused by multiple mutational processes, each of which generates a characteristic mutational signature. Here, as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the Intern…
MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies Open
We previously reported on MetaBAT, an automated metagenome binning software tool to reconstruct single genomes from microbial communities for subsequent analyses of uncultivated microbial species. MetaBAT has become one of the most popular…
Gene Set Knowledge Discovery with Enrichr Open
Profiling samples from patients, tissues, and cells with genomics, transcriptomics, epigenomics, proteomics, and metabolomics ultimately produces lists of genes and proteins that need to be further analyzed and integrated in the context of…
GENCODE reference annotation for the human and mouse genomes Open
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE…
Nextstrain: real-time tracking of pathogen evolution Open
Summary: Understanding the spread and evolution of pathogens is important for effective public health measures and surveillance. Nextstrain consists of a database of viral genomes, a bioinformatics pipeline for phylodynamics analysis, and a…
Shifting the limits in wheat research and breeding using a fully annotated reference genome Open
Insights from the annotated wheat genome: Wheat is one of the major sources of food for much of the world. However, because bread wheat's genome is a large hybrid mix of three separate subgenomes, it has been difficult to produce a high-qua…
Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan Open
A mysterious outbreak of atypical pneumonia in late 2019 was traced to a seafood wholesale market in Wuhan of China. Within a few weeks, a novel coronavirus tentatively named as 2019 novel coronavirus (2019-nCoV) was announced by the World…
Mash: fast genome and metagenome distance estimation using MinHash Open
Mash extends the MinHash dimensionality-reduction technique to include a pairwise mutation distance and P value significance test, enabling the efficient clustering and search of massive sequence collections. Mash reduces large sequences a…
PANTHER version 14: more genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools Open
PANTHER (Protein Analysis Through Evolutionary Relationships, http://pantherdb.org) is a resource for the evolutionary and functional classification of genes from organisms across the tree of life. We report the improvements we have made t…
Fast and accurate de novo genome assembly from long uncorrected reads Open
The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error-correction and consensus-generation steps to obtain high-quality assemblies. We show that the error-correction…
Emerging coronaviruses: Genome structure, replication, and pathogenesis Open
The recent emergence of a novel coronavirus (2019‐nCoV), which is causing an outbreak of unusual viral pneumonia in patients in Wuhan, a central city in China, is another warning of the risk of CoVs posed to public health. In this minirevi…
Open-access bacterial population genomics: BIGSdb software, the PubMLST.org website and their applications Open
The PubMLST.org website hosts a collection of open-access, curated databases that integrate population sequence data with provenance and phenotype information for over 100 different microbial species and genera. Although the PubMLST websit…