Theofanis Karaletsos
VariantFormer: A hierarchical transformer integrating DNA sequences with genetic variations and regulatory landscapes for personalized gene expression prediction
Accurately predicting gene expression from DNA sequence remains a central challenge in human genetics. Current sequence-based models overlook natural genetic variation across individuals, while population-based models are restri…
BayesRVAT enhances rare-variant association testing through Bayesian aggregation of functional annotations
Gene-level rare variant association tests (RVATs) are essential for uncovering disease mechanisms and identifying therapeutic targets. Advances in sequence-based machine learning have generated diverse variant pathogenicity scores, creatin…
rbio1 - training scientific reasoning LLMs with biological world models as soft verifiers
Reasoning Models are typically trained against verification mechanisms in formally specified systems such as code or symbolic math. However, in open domains like biology, we do not generally have access to exact rules facilitating formal v…
GREmLN: A Cellular Graph Structure Aware Transcriptomics Foundation Model
The ever-increasing availability of large-scale single-cell profiles presents an opportunity to develop foundation models to capture cell properties and behavior. However, standard language models such as transformers benefit fr…
A Cross-Species Generative Cell Atlas Across 1.5 Billion Years of Evolution: The TranscriptFormer Single-cell Model
Single-cell transcriptomics has revolutionized our understanding of cellular diversity, yet our understanding of the transcriptional programs across the tree of life remains limited. Here we present TranscriptFormer, a family of generative…
Variational Control for Guidance in Diffusion Models
Diffusion models exhibit excellent sample quality, but existing guidance methods often require additional model training or are limited to specific tasks. We revisit guidance in diffusion models from the perspective of variational inferenc…
Pitfalls in performing genome-wide association studies on ratio traits
Genome-wide association studies (GWASs) are often performed on ratios composed of a numerator trait divided by a denominator trait. Examples include body mass index (BMI) and the waist-to-hip ratio, among many others. Explicitly or implici…
SubCell: Proteome-aware vision foundation models for microscopy capture single-cell biology
Cell morphology and subcellular protein organization provide important insights into cellular function and behavior. These features of cells can be studied using large-scale protein fluorescence microscopy, and machine learning has become …
How to build the virtual cell with artificial intelligence: Priorities and opportunities
The cell is arguably the most fundamental unit of life and is central to understanding biology. Accurate modeling of cells is important for this understanding as well as for determining the root causes of disease. Recent advances in artifi…
AI: A transformative opportunity in cell biology
The success of artificial intelligence (AI) algorithms in predicting protein structure and more recently, protein interactions, demonstrates the power and potential of machine learning and AI for advancing and accelerating biomedical resea…
scGenePT: Is language all you need for modeling single-cell perturbations?
Modeling single-cell perturbations is a crucial task in the field of single-cell biology. Predicting the effect of gene up- or down-regulation or drug treatment on the gene expression profile of a cell can open avenues in understanding biol…
Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI
In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude o…
Deep Learning Analysis on Images of iPSC-derived Motor Neurons Carrying fALS-genetics Reveals Disease-Relevant Phenotypes
Amyotrophic lateral sclerosis (ALS) is a devastating condition with very limited treatment options. It is a heterogeneous disease with complex genetics and unclear etiology, making the discovery of disease-modifying interventions v…
EmbedGEM: a framework to evaluate the utility of embeddings for genetic discovery
Machine learning-derived embeddings are a compressed representation of high content data modalities. Embeddings can capture detailed information about disease states and have been qualitatively shown to be useful in genetic discove…
EmbedGEM: A framework to evaluate the utility of embeddings for genetic discovery
Machine learning (ML)-derived embeddings are a compressed representation of high content data modalities. Embeddings can capture detailed information about disease states and have been qualitatively shown to be useful in genetic discovery.…
Modelling Cellular Perturbations with the Sparse Additive Mechanism Shift Variational Autoencoder
Generative models of observations under interventions have been a vibrant topic of interest across machine learning and the sciences in recent years. For example, in drug discovery, there is a need to model the effects of diverse intervent…
Pitfalls in performing genome-wide association studies on ratio traits
Genome-wide association studies (GWAS) are often performed on ratios composed of a numerator trait divided by a denominator trait. Examples include body mass index (BMI) and the waist-to-hip ratio, among many others. Explicitly or implicit…
Compositional Deep Probabilistic Models of DNA Encoded Libraries
DNA-Encoded Library (DEL) has proven to be a powerful tool that utilizes combinatorially constructed small molecules to facilitate highly-efficient screening assays. These selection experiments, involving multiple stages of washing, elutio…
Channel Vision Transformers: An Image Is Worth 1 x 16 x 16 Words
Vision Transformer (ViT) has emerged as a powerful architecture in the realm of modern computer vision. However, its application in certain imaging fields, such as microscopy and satellite imaging, presents unique challenges. In these doma…
An allelic-series rare-variant association test for candidate-gene discovery
Contextual Vision Transformers for Robust Representation Learning
We introduce Contextual Vision Transformers (ContextViT), a method designed to generate robust image representations for datasets experiencing shifts in latent factors across various groups. Derived from the concept of in-context learning,…
An allelic series rare variant association test for candidate gene discovery
Allelic series are of candidate therapeutic interest due to the existence of a dose-response relationship between the functionality of a gene and the degree or severity of a phenotype. We define an allelic series as a gene in which increas…
Black-box Coreset Variational Inference
Recent advances in coreset methods have shown that a selection of representative datapoints can replace massive volumes of data for Bayesian inference, preserving the relevant statistical information and significantly accelerating subseque…
TyXe: Pyro-based Bayesian neural nets for Pytorch
We introduce TyXe, a Bayesian neural network library built on top of PyTorch and Pyro. Our leading design principle is to cleanly separate architecture, prior, inference and likelihood specification, allowing for a flexible workflow where …
Localized Uncertainty Attacks
The susceptibility of deep learning models to adversarial perturbations has stirred renewed attention in adversarial examples, resulting in a number of attacks. However, most of these attacks fail to encompass a large spectrum of adversaria…
Stochastic Aggregation in Graph Neural Networks
Graph neural networks (GNNs) manifest pathologies including over-smoothing and limited discriminating power as a result of suboptimally expressive aggregating mechanisms. We herein present a unifying framework for stochastic aggregation (S…
Variational Auto-Regressive Gaussian Processes for Continual Learning
Through sequential construction of posteriors on observing data online, Bayes' theorem provides a natural framework for continual learning. We develop Variational Auto-Regressive Gaussian Processes (VAR-GPs), a principled posterior updatin…
Generalized Hidden Parameter MDPs: Transferable Model-Based RL in a Handful of Trials
There is broad interest in creating RL agents that can solve many (related) tasks and adapt to new tasks and environments after initial training. Model-based RL leverages learned surrogate models that describe dynamics and rewards of indiv…
Hierarchical Gaussian Process Priors for Bayesian Neural Network Weights
Probabilistic neural networks are typically modeled with independent weight priors, which do not capture weight correlations in the prior and do not provide a parsimonious interface to express properties in function space. A desirable clas…
Generalized Hidden Parameter MDPs Transferable Model-based RL in a Handful of Trials
There is broad interest in creating RL agents that can solve many (related) tasks and adapt to new tasks and environments after initial training. Model-based RL leverages learned surrogate models that describe dynamics and rewards of indiv…