MIT License
MultiQC: summarize analysis results for multiple tools and samples in a single report
Motivation: Fast and accurate quality control is essential for studies involving next-generation sequencing data. Whilst numerous tools exist to quantify QC metrics, there is no common approach to flexibly integrate these across tools and …
UniProt: a worldwide hub of protein knowledge
The UniProt Knowledgebase is a collection of sequences and annotations for over 120 million proteins across all branches of life. Detailed annotations extracted from the literature by expert curators have been collected for over half a mil…
PartitionFinder 2: New Methods for Selecting Partitioned Models of Evolution for Molecular and Morphological Phylogenetic Analyses
PartitionFinder 2 is a program for automatically selecting best-fit partitioning schemes and models of evolution for phylogenetic analyses. PartitionFinder 2 is substantially faster and more efficient than version 1, and incorporates many …
GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database
Summary The Genome Taxonomy Database Toolkit (GTDB-Tk) provides objective taxonomic assignments for bacterial and archaeal genomes based on the GTDB. GTDB-Tk is computationally efficient and able to classify thousands of draft genomes in p…
UpSetR: an R package for the visualization of intersecting sets and their properties
Motivation Venn and Euler diagrams are a popular yet inadequate solution for quantitative visualization of set intersections. A scalable alternative to Venn and Euler diagrams for visualizing intersecting sets and their properties is neede…
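UpSet-style plots replace Venn regions with bars that show how many elements fall into each exact combination of sets. The Python sketch below computes those exclusive-intersection counts, which partition the union of all sets. It is not UpSetR's R code, and the helper name `exclusive_intersections` is hypothetical.

```python
def exclusive_intersections(sets):
    """Count elements in each exact combination of sets (hypothetical helper).

    `sets` maps a set name to an iterable of members. Each element is counted
    once, under the sorted tuple of every set name that contains it, so the
    counts sum to the size of the union of all sets.
    """
    # Invert the mapping: element -> names of the sets that contain it.
    membership = {}
    for name, members in sets.items():
        for item in members:
            membership.setdefault(item, set()).add(name)

    # Tally how many elements share each exact membership pattern.
    counts = {}
    for names in membership.values():
        key = tuple(sorted(names))
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Take A = {1, 2, 3}, B = {2, 3, 4} and C = {3, 5}. Each of the five combinations ('A',), ('A', 'B'), ('A', 'B', 'C'), ('B',) and ('C',) holds exactly one element, and those counts sum to 5, the size of the union.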
NanoPack: visualizing and processing long-read sequencing data
Summary Here we describe NanoPack, a set of tools developed for visualization and processing of long-read sequencing data from Oxford Nanopore Technologies and Pacific Biosciences. Availability and implementation The NanoPack tools are wri…
YaHS: yet another Hi-C scaffolding tool
Summary We present YaHS, a user-friendly command-line tool for the construction of chromosome-scale scaffolds from Hi-C data. It can be run with a single-line command, requires minimal input from users (an assembly file and an alignment fi…
Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning
Imbalanced-learn is an open-source Python toolbox aiming at providing a wide range of methods to cope with the problem of imbalanced dataset frequently encountered in machine learning and pattern recognition. The implemented state-of-th…
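Random oversampling is one of the simplest resampling strategies for imbalanced data. It duplicates randomly chosen minority-class samples until every class is as large as the majority class. The pure-Python sketch below illustrates the idea. The function `random_oversample` is hypothetical and is not imbalanced-learn's API.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Balance classes by duplicating random samples (hypothetical helper).

    Returns new lists (X_out, y_out) in which every class has as many samples
    as the largest class. The original samples come first, unchanged.
    """
    rng = random.Random(seed)  # seeded for reproducible resampling
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Indices of the original samples belonging to this class.
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Draw (with replacement) enough extra samples to reach the target.
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out
```

Duplicating samples adds no new information and can encourage overfitting to the repeated points. That trade-off is why libraries in this area also offer synthetic-sample and undersampling methods.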
clinker & clustermap.js: automatic generation of gene cluster comparison figures
Summary Genes involved in biological pathways are often colocalised in gene clusters, the comparison of which can give valuable insights into their function and evolutionary history. However, comparison and visualization of gene cluster s…
Mosdepth: quick coverage calculation for genomes and exomes
Summary Mosdepth is a new command-line tool for rapidly calculating genome-wide sequencing coverage. It measures depth from BAM or CRAM files at either each nucleotide position in a genome or for sets of genomic regions. Genomic regions ma…
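Per-base depth along a sequence can be computed in one pass with a difference array. The method adds 1 where an alignment starts, subtracts 1 where it ends, and then takes a running sum. The Python sketch below illustrates that technique but is not Mosdepth's implementation. It assumes alignments are already reduced to 0-based, half-open `(start, end)` spans, so it does no BAM/CRAM parsing and does not handle deletions or split reads. The helper names are hypothetical.

```python
def per_base_depth(intervals, length):
    """Depth at every position of a sequence of `length` bases (hypothetical helper).

    `intervals` are 0-based, half-open (start, end) alignment spans.
    Runs in O(len(intervals) + length) time.
    """
    diff = [0] * (length + 1)
    for start, end in intervals:
        # Clamp spans to the sequence and skip empty ones.
        s = min(max(start, 0), length)
        e = min(max(end, 0), length)
        if s < e:
            diff[s] += 1  # one more read covers positions from s onward
            diff[e] -= 1  # ...and it stops covering at e
    depth = []
    running = 0
    for pos in range(length):
        running += diff[pos]
        depth.append(running)
    return depth


def region_mean_depth(depth, start, end):
    """Mean depth over a 0-based, half-open region (hypothetical helper)."""
    window = depth[start:end]
    return sum(window) / len(window) if window else 0.0
```

For the spans (0, 4), (2, 6) and (3, 5) on an 8-base sequence, the per-base depth is [1, 1, 2, 3, 2, 1, 0, 0]. The mean over region [2, 5) is 7/3.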
DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication
Summary We developed a prokaryotic genome annotation pipeline, DFAST, that also supports genome submission to public sequence databases. DFAST was originally started as an on-line annotation server, and to date, over 7000 jobs have been pr…
The VIA Annotation Software for Images, Audio and Video
In this paper, we introduce a simple and standalone manual annotation tool for images, audio and video: the VGG Image Annotator (VIA). This is a lightweight, standalone and offline software package that does not require any installation o…
karyoploteR: an R/Bioconductor package to plot customizable genomes displaying arbitrary data
Motivation Data visualization is a crucial tool for data exploration, analysis and interpretation. For the visualization of genomic data, there is no tool for creating customizable non-circular plots of whole genomes from any species. Result…
ampvis2: an R package to analyse and visualise 16S rRNA amplicon data
Summary Microbial community analysis using 16S rRNA gene amplicon sequencing is the backbone of many microbial ecology studies. Several approaches and pipelines exist for processing the raw data generated through DNA sequencing and convert…
heatmaply: an R package for creating interactive cluster heatmaps for online publishing
Summary heatmaply is an R package for easily creating interactive cluster heatmaps that can be shared online as a stand-alone HTML file. Interactivity includes a tooltip display of values when hovering over cells, as well as the ability to…
NanoPack2: population-scale evaluation of long-read sequencing data
Summary Increases in the cohort size in long-read sequencing projects necessitate more efficient software for quality assessment and processing of sequencing data from Oxford Nanopore Technologies and Pacific Biosciences. Here, we describe…
KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies
Motivation De novo assembly of whole genome shotgun (WGS) next-generation sequencing (NGS) data benefits from high-quality input with high coverage. However, in practice, determining the quality and quantity of useful reads quickly and in …
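A k-mer spectrum is the histogram of how many distinct k-mers occur once, twice, and so on. It is a common way to judge read quality and coverage. The minimal Python sketch below assumes plain-string reads and counts exact k-mers. It does not merge a k-mer with its reverse complement, as sequencing-data tools often do. The function names are hypothetical and are not KAT's interface.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every length-k substring across all reads (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_spectrum(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return dict(sorted(Counter(counts.values()).items()))
```

For the reads "ACGTAC" and "CGTA" with k = 3, the k-mers ACG and TAC occur once, while CGT and GTA occur twice. The spectrum is therefore {1: 2, 2: 2}. In real data, a large peak at multiplicity 1 usually reflects sequencing errors. The main peak at higher multiplicity tracks the coverage.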
Horovod: fast and easy distributed deep learning in TensorFlow
Training modern deep learning models requires large amounts of computation, often provided by GPUs. Scaling computation from one GPU to many can enable much faster training and research progress but entails two complications. First, the tr…
NGL viewer: web-based molecular graphics for large complexes
Motivation The interactive visualization of very large macromolecular complexes on the web is becoming a challenging problem as experimental techniques advance at an unprecedented rate and deliver structures of increasing size. Results We …
The Python ARM Radar Toolkit (Py-ART), a Library for Working with Weather Radar Data in the Python Programming Language
The Python ARM Radar Toolkit is a package for reading, visualizing, correcting and analysing data from weather radars. Development began to meet the needs of the Atmospheric Radiation Measurement Climate Research Facility and has since exp…
AnnotSV: an integrated tool for structural variations annotation
Summary Structural Variations (SV) are a major source of variability in the human genome that shaped its actual structure during evolution. Moreover, many human diseases are caused by SV, highlighting the need to accurately detect those ge…
igv.js: an embeddable JavaScript implementation of the Integrative Genomics Viewer (IGV)
Summary igv.js is an embeddable JavaScript implementation of the Integrative Genomics Viewer (IGV). It can be easily dropped into any web page with a single line of code and has no external dependencies. The viewer runs completely in the w…
Tslearn, A Machine Learning Toolkit for Time Series Data
Pharokka: a fast scalable bacteriophage annotation tool
Summary In recent years, there has been an increasing interest in bacteriophages, which has led to growing numbers of bacteriophage genomic sequences becoming available. Consequently, there is a need for a rapid and consistent genomic anno…
ggVennDiagram: An Intuitive, Easy-to-Use, and Highly Customizable R Package to Generate Venn Diagram
Venn diagrams are widely used diagrams to show the set relationships in biomedical studies. In this study, we developed ggVennDiagram, an R package that could automatically generate high-quality Venn diagrams with two to seven sets. The gg…
TFBSTools: an R/bioconductor package for transcription factor binding site analysis
Summary: The ability to efficiently investigate transcription factor binding sites (TFBSs) genome-wide is central to computational studies of gene regulation. TFBSTools is an R/Bioconductor package for the analysis and manipulation of TFBS…
bioBakery: a meta’omic analysis environment
Summary bioBakery is a meta’omic analysis environment and collection of individual software tools with the capacity to process raw shotgun sequencing data into actionable microbial community feature profiles, summary reports, and publicati…
ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions
Summary We describe a novel computational method for genotyping repeats using sequence graphs. This method addresses the long-standing need to accurately genotype medically important loci containing repeats adjacent to other variants or im…
Foolbox v0.8.0: A Python toolbox to benchmark the robustness of machine learning models
Even today's most advanced machine learning models are easily fooled by almost imperceptible perturbations of their inputs. Foolbox is a new Python package to generate such adversarial perturbations and to quantify and compare the robustnes…
Use ggbreak to Effectively Utilize Plotting Space to Deal With Large Datasets and Outliers
With the rapid increase of large-scale datasets, biomedical data visualization is facing challenges. The data may be large, have different orders of magnitude, contain extreme values, and the data distribution is not clear. Here we present…