Rob Patro
mim: A lightweight auxiliary index to enable fast, parallel, gzipped FASTQ parsing
The FASTQ file format is the lingua franca of primary data distribution and processing across most of bioinformatics. Over time, the compression, storage, transmission, and decompression of gzip compressed fastq.gz files has become a subst…
Tree-based differential testing using inferential uncertainty for RNA-seq
Identifying differentially expressed transcripts poses a crucial yet challenging problem in transcriptomics. Substantial uncertainty is associated with the abundance estimates of certain transcripts which, if ignored, can lead to the exagg…
Alevin-fry-atac enables rapid and memory frugal mapping of single-cell ATAC-seq data using virtual colors for accurate genomic pseudoalignment
Summary: Ultrafast mapping of short reads via lightweight mapping techniques such as pseudoalignment has significantly accelerated transcriptomic and metagenomic analyses with minimal accuracy loss compared to alignment-based methods. Howev…
Oarfish: enhanced probabilistic modeling leads to improved accuracy in long read transcriptome quantification
Motivation: Long-read sequencing technology is becoming an increasingly indispensable tool in genomic and transcriptomic analysis. In transcriptomics in particular, long reads offer the possibility of sequencing full-length isoforms, which …
QCatch: A framework for quality control assessment and analysis of single-cell sequencing data
Motivation: Single-cell sequencing data analysis requires robust quality control (QC) to mitigate technical artifacts and ensure reliable downstream results. While tools like alevin-fry and simpleaf (an augmented execution context for the …
Kaminari: a resource-frugal index for approximate colored k-mer queries
Motivation: The problem of identifying the set of textual documents from a given database containing a query string has been studied in various fields of computing, e.g., in Information Retrieval, Databases, and Computational Biology. We co…
Contributions of the Petabyte Scale Sequence Search Codeathon toward efforts to scale sequence-based searches on SRA
The volume of biological data being generated by the scientific community is growing exponentially, reflecting technological advances and research activities. The National Institutes of Health's (NIH) Sequence Read Archive (SRA), which is …
U-index: A Universal Indexing Framework for Matching Long Patterns
Text indexing is a fundamental and well-studied problem. Classic solutions either replace the original text with a compressed representation, e.g., the FM-index and its variants, or keep it uncompressed but attach some redundancy - an inde…
Fast and Scalable Parallel External-Memory Construction of Colored Compacted de Bruijn Graphs with Cuttlefish 3
The rapid growth of genomic data over the past decade has made scalable and efficient sequence analysis algorithms, particularly for constructing de Bruijn graphs and their colored and compacted variants, critical components of many bioinfo…
Alevin-fry-atac enables rapid and memory frugal mapping of single-cell ATAC-seq data using virtual colors for accurate genomic pseudoalignment
Ultrafast mapping of short reads via lightweight mapping techniques such as pseudoalignment has significantly accelerated transcriptomic and metagenomic analyses, often with minimal accuracy loss compared to alignment-based methods. Howeve…
Collapsible tree: interactive web app to present collapsible hierarchies
Motivation: A crucial component of intuitive data visualization is presenting a hierarchical tree structure with interactive functions. For example, single-cell transcriptomics studies may generate gene expression values with developmental …
Where the Patterns Are: Repetition-Aware Compression for Colored de Bruijn Graphs
We describe lossless compressed data structures for the colored de Bruijn graph (or c-dBG). Given a collection of reference sequences, a c-dBG can be essentially regarded as a map from k-mers to their color sets. The color set of a k-mer i…
Integrating Robotics for Enhanced Business Operations
This research study delves into the strategic incorporation of robotics into global business operations, stressing the interaction between technological progress and geopolitical and economic factors. Through an analysis of historical even…
A renewed call for open artificial intelligence in biomedicine
The excitement around and usage of artificial intelligence (AI) tools in scientific research is increasing across fields, but lax publication standards are resulting in papers "like grand mansions of straw, rather than sturdy houses of bri…
A replicable and modular benchmark for long-read transcript quantification methods
We provide a replicable benchmark for long-read transcript quantification, and evaluate the performance of some recently-introduced tools on several synthetic long-read RNA-seq datasets. This benchmark is designed to allow the results to b…
A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs
Most methods for statistical analysis of RNA-seq data take a matrix of abundance estimates for some type of genomic features as their input, and consequently the quality of any obtained results is directly dependent on the quality of these…
Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification
Detection of differential transcript usage (DTU) from RNA-seq data is an important bioinformatic analysis that complements differential gene expression analysis. Here we present a simple workflow using a set of existing R/Bioconductor pack…
Where the patterns are: repetition-aware compression for colored de Bruijn graphs
We describe lossless compressed data structures for the colored de Bruijn graph (or, c-dBG). Given a collection of reference sequences, a c-dBG can be essentially regarded as a map from k-mers to their color sets. The color set of a k-m…
Identification of intracellular bacteria from multiple single-cell RNA-seq platforms using CSI-Microbes
The study of the tumor microbiome has been garnering increased attention. We developed a computational pipeline (CSI-Microbes) for identifying microbial reads from single-cell RNA sequencing (scRNA-seq) data and for analyzing differential …
DifferentialRegulation: a Bayesian hierarchical approach to identify differentially regulated genes
Summary: Although transcriptomics data is typically used to analyze mature spliced mRNA, recent attention has focused on jointly investigating spliced and unspliced (or precursor-) mRNA, which can be used to study gene regulation and change…
Fast, parallel, and cache-friendly suffix array construction
Purpose: String indexes such as the suffix array (sa) and the closely related longest common prefix (lcp) array are fundamental objects in bioinformatics and have a wide variety of applications. Despite their importance in practice, few…
Forseti: a mechanistic and predictive model of the splicing status of scRNA-seq reads
Motivation: Short-read single-cell RNA-sequencing (scRNA-seq) has been used to study cellular heterogeneity, cellular fate, and transcriptional dynamics. Modeling splicing dynamics in scRNA-seq data is challenging, with inherent difficulty …
Designing efficient randstrobes for sequence similarity analyses
Motivation: Substrings of length k, commonly referred to as k-mers, play a vital role in sequence analysis. However, k-mers are limited to exact matches between sequences leading to alternative constructs. We recently introduced a class of …
Oarfish: Enhanced probabilistic modeling leads to improved accuracy in long read transcriptome quantification
Motivation: Long read sequencing technology is becoming an increasingly indispensable tool in genomic and transcriptomic analysis. In transcriptomics in particular, long reads offer the possibility of sequencing full-length isoforms, which…
Forseti: A mechanistic and predictive model of the splicing status of scRNA-seq reads
Motivation: Short-read single-cell RNA-sequencing (scRNA-seq) has been used to study cellular heterogeneity, cellular fate, and transcriptional dynamics. Modeling splicing dynamics in scRNA-seq data is challenging, with inherent difficulty …
scCensus: Off-target scRNA-seq reads reveal meaningful biology
Single-cell RNA-sequencing (scRNA-seq) provides unprecedented insights into cellular heterogeneity. Although scRNA-seq reads from most prevalent and popular tagged-end protocols are expected to arise from the 3′ end of polyadenylated RNAs,…
Supplementary files for "scCensus: Off-target scRNA-seq reads reveal meaningful biology"
The Supplementary files for the manuscript "scCensus: Off-target scRNA-seq reads reveal meaningful biology"