Data mining
fastp: an ultra-fast all-in-one FASTQ preprocessor
Motivation: Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as quality control, adapter trimming and quality fi…
MEGA11: Molecular Evolutionary Genetics Analysis Version 11
The Molecular Evolutionary Genetics Analysis (MEGA) software has matured to contain a large collection of methods and tools of computational molecular evolution. Here, we describe new additions that make MEGA a more comprehensive tool for …
STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets
Proteins and their functional interactions form the backbone of the cellular machinery. Their connectivity network needs to be considered for the full understanding of biological phenomena, but the available information on protein-protein …
Metascape provides a biologist-oriented resource for the analysis of systems-level datasets
A critical component in the interpretation of systems-level studies is the inference of enriched biological pathways and protein complexes contained within OMICs datasets. Successful analysis requires the integration of a broad set of curr…
SWISS-MODEL: homology modelling of protein structures and complexes
Homology modelling has matured into an important technique in structural biology, significantly contributing to narrowing the gap between known protein sequences and experimentally determined structures. Fully automated workflows and serve…
A survey on Image Data Augmentation for Deep Learning
Deep convolutional neural networks have performed remarkably well on many Computer Vision tasks. However, these networks are heavily reliant on big data to avoid overfitting. Overfitting refers to the phenomenon when a network learns a fun…
Enrichr: a comprehensive gene set enrichment analysis web server 2016 update
Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of …
VSEARCH: a versatile open source tool for metagenomics
Background: VSEARCH is an open source and free of charge multithreaded 64-bit tool for processing and preparing metagenomics, genomics and population genomics nucleotide sequence data. It is designed as an alternative to the widely used USE…
Complex heatmaps reveal patterns and correlations in multidimensional genomic data
Summary: Parallel heatmaps with carefully designed annotation graphics are powerful for efficient visualization of patterns and relationships among high dimensional genomic data. Here we present the ComplexHeatmap package that provides ric…
MultiQC: summarize analysis results for multiple tools and samples in a single report
Motivation: Fast and accurate quality control is essential for studies involving next-generation sequencing data. Whilst numerous tools exist to quantify QC metrics, there is no common approach to flexibly integrate these across tools and …
UCSF ChimeraX: Structure visualization for researchers, educators, and developers
UCSF ChimeraX is the next‐generation interactive visualization program from the Resource for Biocomputing, Visualization, and Informatics (RBVI), following UCSF Chimera. ChimeraX brings (a) significant performance and graphics enhancements…
Principal component analysis: a review and recent developments
Large datasets are increasingly common and are often difficult to interpret. Principal component analysis (PCA) is a technique for reducing the dimensionality of such datasets, increasing interpretability but at the same time minimizing in…
ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R
Summary: After more than fifteen years of existence, the R package ape has continuously grown its contents, and has been used by a growing community of users. The release of version 5.0 has marked a leap towards a modern software for evolut…
deepTools2: a next generation web server for deep-sequencing data analysis
We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizatio…
MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization
This article describes several features in the MAFFT online service for multiple sequence alignment (MSA). As a result of recent advances in sequencing technologies, huge numbers of biological sequences are available and the need for MSAs …
The MR-Base platform supports systematic causal inference across the human phenome
Results from genome-wide association studies (GWAS) can be used to infer causal relationships between phenotypes, using a strategy known as 2-sample Mendelian randomization (2SMR) and bypassing the need for individual-level data. However, …
AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models
The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) is an openly accessible, extensive database of high-accuracy protein-structure predictions. Powered by AlphaFold v2.0 of DeepMind, it has enabled an unpre…
DnaSP 6: DNA Sequence Polymorphism Analysis of Large Data Sets
We present version 6 of the DNA Sequence Polymorphism (DnaSP) software, a new version of the popular tool for performing exhaustive population genetic analyses on multiple sequence alignments. This major upgrade incorporates novel function…
Using PLS path modeling in new technology research: updated guidelines
Purpose – Partial least squares (PLS) path modeling is a variance-based structural equation modeling (SEM) technique that is widely applied in business and social sciences. Its ability to model composites and factors makes it a formidable …
A survey of transfer learning
Machine learning and data mining techniques have been used in numerous real-world applications. An assumption of traditional machine learning methodologies is the training data and testing data are taken from the same domain, such that the…
Deep Learning with Differential Privacy
Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires large, representative datasets, which may be crowdsourced and contain sensitive i…
Communication-Efficient Learning of Deep Networks from Decentralized Data
Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image …
Maftools: efficient and comprehensive analysis of somatic variants in cancer
Numerous large-scale genomic studies of matched tumor-normal samples have established the somatic landscapes of most cancer types. However, the downstream analysis of data from somatic mutations entails a number of computational and statis…
How to perform a meta-analysis with R: a practical tutorial
Objective: Meta-analysis is of fundamental importance to obtain an unbiased assessment of the available evidence. In general, the use of meta-analysis has been increasing over the last three decades with mental health as a major research to…
ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data
Summary: We present an R package, ggtree, which provides programmable visualization and annotation of phylogenetic trees. ggtree can read more tree file formats than other software, including newick, nexus, NHX, phylip and jplace forma…
MolProbity: More and better reference data for improved all‐atom structure validation
This paper describes the current update on macromolecular model validation services that are provided at the MolProbity website, emphasizing changes and additions since the previous review in 2010. There have been many infrastructure impro…
Mercury 4.0: from visualization to analysis, design and prediction
The program Mercury, developed at the Cambridge Crystallographic Data Centre, was originally designed primarily as a crystal structure visualization tool. Over the years the fields and scientific communities of chemical crystallography an…
Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation
Commonly used evaluation measures including Recall, Precision, F-Measure and Rand Accuracy are biased and should not be used without clear understanding of the biases, and corresponding identification of chance or base case levels of the s…
On the Dangers of Stochastic Parrots
The past 3 years of work in NLP have been characterized by the development and deployment of ever larger language models, especially for English. BERT, its variants, GPT-2/3, and others, most recently Switch-C, have pushed the boundaries o…
Comparative Protein Structure Modeling Using MODELLER
Comparative protein structure modeling predicts the three‐dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists o…