Ava P. Amini
Adaptive resampling for improved machine learning in imbalanced single-cell datasets
While machine learning models trained on single-cell transcriptomics data have shown great promise in providing biological insights, existing tools struggle to effectively model underrepresented and out-of-distribution cellular features or…
The Dayhoff Atlas: scaling sequence diversity for improved protein generation
Modern biology is powered by the organization of biological information, a framework pioneered in 1965 by Margaret Dayhoff’s Atlas of Protein Sequence and Structure. Databases descended from this common ancestor power computational methods…
Trainable subnetworks reveal insights into structure knowledge organization in protein language models
Protein language models (PLMs) pretrained via a masked language modeling objective have proven effective across a range of structure-related tasks, including high-resolution structure prediction. However, it remains unclear to what extent …
Hierarchical cross-entropy loss improves atlas-scale single-cell annotation models
Accurately annotating cell types is essential for extracting biological insight from single-cell RNA-seq data. Although cell types are naturally organized into hierarchical ontologies, most computational models do not explicitly incorporat…
Zero-shot evaluation reveals limitations of single-cell foundation models
Foundation models such as scGPT and Geneformer have not been rigorously evaluated in a setting where they are used without any further training (i.e., zero-shot). Understanding the performance of models in zero-shot settings is critical to…
Causal integration of chemical structures improves representations of microscopy images for morphological profiling
Recent advances in self-supervised deep learning have improved our ability to quantify cellular morphological changes in high-throughput microscopy screens, a process known as morphological profiling. However, most current methods only lea…
ProtNote: a multimodal method for protein–function annotation
Motivation: Understanding the protein sequence–function relationship is essential for advancing protein biology and engineering. However, <1% of known protein sequences have human-verified functions. While deep-learning methods have demo…
Deep learning guided design of protease substrates
Proteases, a class of enzymes that play critical roles in health and disease, exert their function through the cleavage of peptide bonds. Identifying substrates that are efficiently and selectively cleaved by target proteases is essential …
Consequences of training data composition for deep learning models in single-cell biology
Foundation models for single-cell transcriptomics have the potential to augment (or replace) purpose-built tools for a variety of common analyses, especially when data are sparse. Recent work with large language models has shown that train…
Toward deep learning sequence–structure co-generation for protein design
Deep generative models that learn from the distribution of natural protein sequences and structures may enable the design of new proteins with valuable functions. While the majority of today's models focus on generating either sequences or…
Benchmarking uncertainty quantification for protein engineering
Machine learning sequence-function models for proteins could enable significant advances in protein engineering, especially when paired with state-of-the-art methods to select new sequences for property optimization and/or model improvemen…
Evaluating the role of pre-training dataset size and diversity on single-cell foundation model performance
The success of transformer-based foundation models on natural language and images has motivated their use in single-cell biology. Single-cell foundation models have been trained on increasingly larger transcriptomic datasets, scaling from …
ProtNote: a multimodal method for protein-function annotation
Understanding the protein sequence-function relationship is essential for advancing protein biology and engineering. However, fewer than 1% of known protein sequences have human-verified functions. While deep learning methods have demonstr…
Mutation and cell state compatibility is required and targetable in Ph+ acute lymphoblastic leukemia minimal residual disease
Summary: Efforts to cure BCR::ABL1 B cell acute lymphoblastic leukemia (Ph+ ALL) solely through inhibition of ABL1 kinase activity have thus far been insufficient despite the availability of tyrosine kinase inhibitors (TKIs) with broad acti…
A knockoff calibration method to avoid over-clustering in single-cell RNA-sequencing
Standard single-cell RNA-sequencing (scRNA-seq) pipelines nearly always include unsupervised clustering as a key step in identifying biologically distinct cell types. A follow-up step in these pipelines is to test for differential expressi…
Scalable nonparametric clustering with unified marker gene selection for single-cell RNA-seq data
Clustering is commonly used in single-cell RNA-sequencing (scRNA-seq) pipelines to characterize cellular heterogeneity. However, current methods face two main limitations. First, they require user-specified heuristics which add time and co…
Feature Reuse and Scaling: Understanding Transfer Learning with Protein Language Models
Large pretrained protein language models (PLMs) have improved protein property and structure prediction from sequences via transfer learning, in which weights and representations from PLMs are repurposed for downstream tasks. Although PLMs…
Protein structure generation via folding diffusion
The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments targeting yet incurable diseases. Despite recent advances in protein structure prediction,…
Priming agents transiently reduce the clearance of cell-free DNA to improve liquid biopsies
Liquid biopsies enable early detection and monitoring of diseases such as cancer, but their sensitivity remains limited by the scarcity of analytes such as cell-free DNA (cfDNA) in blood. Improvements to sensitivity have primarily relied o…
Assessing the limits of zero-shot foundation models in single-cell biology
The advent and success of foundation models such as GPT has sparked growing interest in their application to single-cell biology. Models like Geneformer and scGPT have emerged with the promise of serving as versatile tools for this special…
FoldingDiff generated structures (n=780, main results) and associated metadata
Backbone structures generated by FoldingDiff spanning lengths [50, 128). Each length has 10 randomly sampled structures for a total of 780 backbone structures. These were used to derive all results in our manuscript's main results section.…
Protein generation with evolutionary diffusion: sequence is all you need
Deep generative models are increasingly powerful tools for the in silico design of novel proteins. Recently, a family of generative models called diffusion models has demonstrated the ability to generate biologically plausible proteins tha…
Continuous Time Evidential Distributions for Irregular Time Series
Prevalent in many real-world settings such as healthcare, irregular time series are challenging to formulate predictions from. It is difficult to infer the value of a feature at any given time when observations are sporadic, as it could ta…
Protein generation with evolutionary diffusion
Model checkpoints and generated sequences saved as CSV files, as referenced on GitHub
Generation of Protein Sequences and Evolutionary Alignments via Discrete Diffusion Models
Model checkpoints and generated sequences in the form of FASTA files, as referenced on GitHub
Deep self-supervised learning for biosynthetic gene cluster detection and product classification
Natural products are chemical compounds that form the basis of many therapeutics used in the pharmaceutical industry. In microbes, natural products are synthesized by groups of colocalized genes called biosynthetic gene clusters (BGCs). Wi…