Pascal Notin
Sampling Protein Language Models for Functional Protein Design
Protein language models have emerged as powerful tools for learning rich protein representations, improving performance in tasks like structure prediction, mutation effect estimation, and homology detection. Their ability to model complex …
Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction
Retrieving homologous protein sequences is essential for a broad range of protein modeling tasks such as fitness prediction, protein design, structure modeling, and protein-protein interactions. Traditional workflows have relied on a two-s…
Computationally designed proteins mimic antibody immune evasion in viral evolution
Recurrent waves of viral infection necessitate vaccines and therapeutics that remain effective against emerging viruses. Our ability to evaluate interventions is currently limited to assessments against past or circulating variants, which …
Large-scale discovery, analysis, and design of protein energy landscapes
All folded proteins continuously fluctuate between their low-energy native structures and higher energy conformations that can be partially or fully unfolded. These rare states influence protein function, interactions, aggregation, and imm…
Epigenomic insights into extreme longevity in the world’s oldest terrestrial animal, Jonathan
Giant tortoises exhibit exceptional longevity, often exceeding the human lifespan. To understand the genomic and epigenomic basis of their longevity, we analyzed the DNA sequence and methylome of Jonathan, an Aldabra giant tortoise (Aldab…
Multi-megabase scale genome interpretation with genetic language models
Understanding how molecular changes caused by genetic variation drive disease risk is crucial for deciphering disease mechanisms. However, interpreting genome sequences is challenging because of the vast size of the human genome, and becau…
A Genomic Language Model for Zero-Shot Prediction of Promoter Variant Effects
Disease-associated genetic variants occur extensively in noncoding regions like promoters, but current methods focus primarily on single nucleotide variants (SNVs) that typically have small regulatory effect sizes. Expanding beyond single …
Evolutionary-Scale Enzymology Enables Biochemical Constant Prediction Across a Multi-Peaked Catalytic Landscape
Quantitatively mapping enzyme sequence-catalysis landscapes remains a critical challenge in understanding enzyme function, evolution, and design. Here, we expand an emerging microfluidic platform to measure catalytic constants— k cat and K…
Multi-Scale Representation Learning for Protein Fitness Prediction
Designing novel functional proteins crucially depends on accurately modeling their fitness landscape. Given the limited availability of functional annotations from wet-lab experiments, previous methods have primarily relied on self-supervi…
ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction
Predicting the effects of mutations in proteins is critical to many applications, from understanding genetic disease to designing novel proteins that can address our most pressing challenges in climate, agriculture and healthcare. Despite …
DiscoBAX: Discovery of Optimal Intervention Sets in Genomic Experiment Design
The discovery of therapeutics to treat genetically-driven pathologies relies on identifying genes involved in the underlying disease mechanisms. Existing approaches search over the billions of potential interventions to maximize the expect…
ProteinNPT: Improving Protein Property Prediction and Design with Non-Parametric Transformers
Protein design holds immense potential for optimizing naturally occurring proteins, with broad applications in drug discovery, material design, and sustainability. However, computational methods for protein engineering are confronted with…
DiscoBAX_GeneDisco_datasets
This file contains the GeneDisco datasets and clusters used in the DiscoBAX paper experiments.
Protein design for evaluating vaccines against future viral variation
Recurrent waves of SARS-CoV-2 infection, driven by the periodic emergence of new viral variants, highlight the need for vaccines and therapeutics that remain effective against future strains. Yet, our ability to proactively evaluate such t…
The CausalBench challenge: A machine learning contest for gene network inference from single-cell perturbation data
In drug discovery, mapping interactions between genes within cellular systems is a crucial early step. Such maps are not only foundational for understanding the molecular mechanisms underlying disease biology but also pivotal for formulati…
TranceptEVE: Combining Family-specific and Family-agnostic Models of Protein Sequences for Improved Fitness Prediction
Modeling the fitness landscape of protein sequences has historically relied on training models on family-specific sets of homologous sequences called Multiple Sequence Alignments. Many proteins are however difficult to align or have shallo…
Learning from pre-pandemic data to forecast viral escape
Effective pandemic preparedness relies on anticipating viral mutations that are able to evade host immune responses in order to facilitate vaccine and therapeutic design. However, current strategies for viral evolution prediction a…
Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval
The ability to accurately model the fitness landscape of protein sequences is critical to a wide range of applications, from quantifying the effects of human variants on disease likelihood, to predicting immune-escape mutations in viruses …
RITA: a Study on Scaling Up Generative Protein Sequence Models
In this work we introduce RITA: a suite of autoregressive generative models for protein sequences, with up to 1.2 billion parameters, trained on over 280 million protein sequences belonging to the UniRef-100 database. Such generative model…
Mixtures of large-scale dynamic functional brain network modes
Accurate temporal modelling of functional brain networks is essential in the quest for understanding how such networks facilitate cognition. Researchers are beginning to adopt time-varying analyses for electrophysiological data that captur…
GeneDisco: A Benchmark for Experimental Design in Drug Discovery
In vitro cellular experimentation with genetic interventions, using for example CRISPR technologies, is an essential step in early-stage drug discovery and target validation that serves to assess initial hypotheses about causal association…
OATML-Markslab/EVE: OATML-Markslab release
OATML-Markslab release - Sep 2nd, 2021
Improving black-box optimization in VAE latent space using decoder uncertainty
Optimization in the latent space of variational autoencoders is a promising approach to generate high-dimensional discrete objects that maximize an expensive black-box property (e.g., drug-likeness in molecular generation, function approxi…
Large-scale clinical interpretation of genetic variants using evolutionary data and deep learning
Quantifying the pathogenicity of protein variants in human disease-related genes would have a profound impact on clinical decisions, yet the overwhelming majority (over 98%) of these variants still have unknown consequences [1–3]. In princi…
Improving compute efficacy frontiers with SliceOut
Pushing forward the compute efficacy frontier in deep learning is critical for tasks that require frequent model re-training or workloads that entail training a large number of models. We introduce SliceOut -- a dropout-inspired scheme des…
SliceOut: Training Transformers and CNNs faster while using less memory.
We demonstrate 10-40% speedups and memory reduction with Wide ResNets, EfficientNets, and Transformer models, with minimal to no loss in accuracy, using SliceOut---a new dropout scheme designed to take advantage of GPU memory layout. By dr…