Nathan C. Frey
Deep Evolutionary Fitness Inference for Variant Nomination from Directed Evolution
Iterative screening techniques, such as directed evolution, enable high-throughput affinity maturation to optimize binders to molecular interfaces. However, the decision problem of selecting variants from rich, evolved populations to enter…
llome_ehrlich_benchmark_data_package
Although large language models (LLMs) have shown promise in biomolecule optimization problems, they incur heavy computational costs and struggle to satisfy precise constraints. On the other hand, specialized solvers like LaMBO-2 offer effi…
Lab-in-the-loop therapeutic antibody design with deep learning
Therapeutic antibody design is a complex multi-property optimization problem with substantial promise for improvement with the application of machine-learning methods. Towards realizing that promise, we introduce “Lab-in-the-loop,” a new a…
DyAb: sequence-based antibody design and property prediction in a low-data regime
Protein therapeutic design and property prediction are frequently hampered by data scarcity. Here we propose a new model, DyAb, that addresses these issues by leveraging a pair-wise representation to predict differences in protein properti…
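The pair-wise idea in the snippet above admits a short sketch (a minimal illustration, not the DyAb implementation; the `PairwiseDeltaPredictor` name, the toy featurization, and all layer sizes are assumptions): embed two variants and regress the difference in their measured property.

```python
# Minimal sketch of pairwise property-difference prediction (not the DyAb code).
# The featurization (flattened one-hot sequences) and layer sizes are placeholders.
import torch
import torch.nn as nn

class PairwiseDeltaPredictor(nn.Module):
    """Predict the difference in a property between two protein variants."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.LazyLinear(embed_dim), nn.ReLU())
        self.head = nn.Linear(2 * embed_dim, 1)

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        z_a, z_b = self.encoder(x_a), self.encoder(x_b)
        # The head sees both embeddings, so it models delta(a, b) directly.
        return self.head(torch.cat([z_a, z_b], dim=-1)).squeeze(-1)

# Toy usage: 4 variant pairs, each a flattened 50-residue x 20-letter encoding.
model = PairwiseDeltaPredictor()
x_a, x_b = torch.randn(4, 1000), torch.randn(4, 1000)
delta_pred = model(x_a, x_b)  # one predicted property difference per pair
```

Training on differences means every labeled pair, rather than every labeled sequence, becomes a training example, which is one way such a formulation can stretch a small dataset.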
All-Atom Protein Generation with Latent Diffusion
While generative models hold immense promise for protein design, existing models are typically backbone-only, despite the indispensable role that sidechain atoms play in mediating function. As prerequisite knowledge, all-atom 3D structure …
Concept Bottleneck Language Models For protein design
We introduce Concept Bottleneck Protein Language Models (CB-pLM), a generative masked language model with a layer where each neuron corresponds to an interpretable concept. Our architecture offers three key benefits: i) Control: We can int…
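A minimal sketch of the bottleneck described above (not the CB-pLM release; the concept count, sigmoid activation, and vocabulary size are assumptions): token hidden states are squeezed through a layer whose neurons are each tied to one concept before any decoding happens, so intervening on those activations steers the output.

```python
# Illustrative concept-bottleneck layer for a masked protein language model.
# The number of concepts, the sigmoid activation, and the vocab size are assumed.
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    def __init__(self, hidden_dim: int, n_concepts: int, vocab_size: int):
        super().__init__()
        self.to_concepts = nn.Linear(hidden_dim, n_concepts)  # one neuron per concept
        self.to_logits = nn.Linear(n_concepts, vocab_size)    # decode from concepts only

    def forward(self, hidden: torch.Tensor):
        concepts = torch.sigmoid(self.to_concepts(hidden))    # interpretable activations
        return concepts, self.to_logits(concepts)

hidden = torch.randn(2, 16, 256)   # (batch, length, hidden) from a pLM trunk
layer = ConceptBottleneck(hidden_dim=256, n_concepts=32, vocab_size=33)
concepts, logits = layer(hidden)
# A training loop would pair a masked-LM loss on `logits` with a supervised loss
# tying `concepts` to labeled properties; editing `concepts` at inference time
# is the control knob the abstract refers to.
```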
Tokenized and Continuous Embedding Compressions of Protein Sequence and Structure
Existing protein machine learning representations typically model either the sequence or structure distribution, with the other modality implicit. The latent space of sequence-to-structure prediction models such as ESMFold represents the j…
Antibody DomainBed: Out-of-Distribution Generalization in Therapeutic Protein Design
Machine learning (ML) has demonstrated significant promise in accelerating drug design. Active ML-guided optimization of therapeutic molecules typically relies on a surrogate model predicting the target property of interest. The model pred…
Closed-Form Test Functions for Biophysical Sequence Optimization Algorithms
There is a growing body of work seeking to replicate the success of machine learning (ML) on domains like computer vision (CV) and natural language processing (NLP) to applications involving biophysical data. One of the key ingredients of …
Cramming Protein Language Model Training in 24 GPU Hours
Protein language models (pLMs) are ubiquitous across biological machine learning research, but state-of-the-art models like ESM2 take hundreds of thousands of GPU hours to pre-train on the vast protein universe. Resource requirements for s…
Synthesis of Mo4VAlC4 MAX Phase and Two-Dimensional Mo4VC4 MXene with Five Atomic Layers of Transition Metals
MXenes are a family of two-dimensional (2D) transition metal carbides, nitrides, and carbonitrides with a general formula of Mn+1XnTx, in which two, three, or four atomic layers of a …
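Assuming the standard MXene convention (M an early transition metal, X carbon and/or nitrogen, Tx surface terminations), the title compound is the n = 4 member of this formula; the worked case below is an illustration, not text from the abstract.

```latex
% General MXene formula; setting n = 4 gives the title compound, with five
% atomic layers of transition metals (Mo4V) interleaved with four carbon layers.
\[
  \mathrm{M}_{n+1}\mathrm{X}_{n}\mathrm{T}_{x}
  \;\xrightarrow{\;n\,=\,4\;}\;
  \mathrm{(Mo_{4}V)\,C_{4}\,T_{x}}
\]
```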
Protein Discovery with Discrete Walk-Jump Sampling
We resolve difficulties in training and sampling from a discrete generative model by learning a smoothed energy function, sampling from the smoothed data manifold with Langevin Markov chain Monte Carlo (MCMC), and projecting back to the tr…
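The three-step recipe in the snippet above fits in a few lines (a sketch under assumptions: `score_fn` and `denoise_fn` stand in for trained networks, and the noise level and step size are illustrative, not values from the paper): Langevin "walk" steps on the smoothed manifold, then a single denoising "jump" and an argmax projection back to discrete sequences.

```python
# Walk-jump sketch: Langevin MCMC in the Gaussian-smoothed space, then one
# denoising jump. score_fn / denoise_fn are placeholders for trained models.
import torch

def walk_jump_sample(score_fn, denoise_fn, shape, sigma=0.5, n_steps=100, step=1e-2):
    y = sigma * torch.randn(shape)                       # start on the smoothed manifold
    for _ in range(n_steps):                             # "walk": Langevin MCMC over y
        y = y + step * score_fn(y) + (2 * step) ** 0.5 * torch.randn_like(y)
    x_hat = denoise_fn(y)                                # "jump": estimate E[x | y]
    return x_hat.argmax(dim=-1)                          # project to discrete tokens

# Toy stand-ins so the sketch runs end to end; a real model replaces both.
score_fn = lambda y: -y        # score of a unit Gaussian, purely for illustration
denoise_fn = lambda y: y       # identity "denoiser" placeholder
tokens = walk_jump_sample(score_fn, denoise_fn, shape=(2, 10, 20))
```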
Protein Design with Guided Discrete Diffusion
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling. The generative model samples plausible sequences while the discriminative model guides a search for sequences with …
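One simple way to combine the two models in the snippet above is to propose with the generative model and rerank/resample with the discriminative one; the sketch below shows that variant only (it is not the guidance mechanism used in the paper, and `generator` and `scorer` are assumed stand-ins for trained models).

```python
# Propose-and-resample sketch: the generator proposes candidate sequences and the
# discriminator reweights them. Both callables below are toy placeholders.
import torch

def guided_sample(generator, scorer, n_proposals=64, n_keep=8, temperature=1.0):
    candidates = generator(n_proposals)                  # (n_proposals, length) token ids
    scores = scorer(candidates)                          # (n_proposals,) predicted property
    weights = torch.softmax(scores / temperature, dim=0)
    idx = torch.multinomial(weights, n_keep, replacement=False)
    return candidates[idx]

generator = lambda n: torch.randint(0, 20, (n, 30))     # toy: random 30-mer sequences
scorer = lambda seqs: seqs.float().mean(dim=-1)         # toy: a fake property model
picked = guided_sample(generator, scorer)               # 8 reweighted candidates
```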
SupSiam: Non-contrastive Auxiliary Loss for Learning from Molecular Conformers
We investigate Siamese networks for learning related embeddings for augmented samples of molecular conformers. We find that a non-contrastive (positive-pair only) auxiliary task aids in supervised training of Euclidean neural networks (E3N…
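For reference, the positive-pair-only objective can be written down compactly (a sketch: the toy MLP below replaces the Euclidean neural network conformer encoder from the paper, and all sizes are placeholders).

```python
# SimSiam-style non-contrastive loss: two conformer views, a predictor head, and
# stop-gradient on the target branch. Encoder/predictor sizes are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 32))
predictor = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 32))

def siamese_loss(x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    z1, z2 = encoder(x1), encoder(x2)          # embeddings of the two conformer views
    p1, p2 = predictor(z1), predictor(z2)
    # Negative cosine similarity; detach() provides the stop-gradient target.
    return -(F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
             + F.cosine_similarity(p2, z1.detach(), dim=-1).mean()) / 2

x1, x2 = torch.randn(8, 64), torch.randn(8, 64)   # toy features for 8 conformer pairs
aux_loss = siamese_loss(x1, x2)   # added to the supervised property loss during training
```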
Graph Contrastive Learning for Materials
Recent work has shown the potential of graph neural networks to efficiently predict material properties, enabling high-throughput screening of materials. Training these models, however, often requires large quantities of labelled data, obt…
Efficient catalyst screening using graph neural networks to predict strain effects on adsorption energy
Small-molecule adsorption energies correlate with energy barriers of catalyzed intermediate reaction steps, determining the dominant microkinetic mechanism. Straining the catalyst can alter adsorption energies and break scaling relationshi…
A Pareto-optimal compositional energy-based model for sampling and optimization of protein sequences
Deep generative models have emerged as a popular machine learning-based approach for inverse design problems in the life sciences. However, these problems often require sampling new designs that satisfy multiple properties of interest in a…
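Since the snippet above turns on satisfying several properties at once, a generic non-dominated (Pareto) filter illustrates the selection step; it is a sketch only (not the compositional energy-based sampler itself) and assumes larger scores are better on every objective.

```python
# Generic Pareto-front filter over candidate designs scored on several objectives.
# Input: (n_candidates, n_objectives) array where larger values are better.
import numpy as np

def pareto_front(scores: np.ndarray) -> np.ndarray:
    """Return indices of candidates not dominated by any other candidate."""
    front = []
    for i in range(scores.shape[0]):
        # i is dominated if some candidate is >= on every objective and > on one.
        dominated = np.any(
            np.all(scores >= scores[i], axis=1) & np.any(scores > scores[i], axis=1)
        )
        if not dominated:
            front.append(i)
    return np.asarray(front)

scores = np.array([[0.9, 0.2], [0.5, 0.5], [0.4, 0.6], [0.3, 0.3]])
print(pareto_front(scores))  # [0 1 2]; the last design is dominated by the second
```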
EquiFold: Protein Structure Prediction with a Novel Coarse-Grained Structure Representation
Designing proteins to achieve specific functions often requires in silico modeling of their properties at high throughput scale and can significantly benefit from fast and accurate protein structure prediction. We introduce EquiFold, a new…
Roughness of molecular property landscapes and its impact on modellability
In molecular discovery and drug design, structure-property relationships and activity landscapes are often qualitatively or quantitatively analyzed to guide the navigation of chemical space. The roughness (or smoothness) of these molecular…
Great Power, Great Responsibility: Recommendations for Reducing Energy for Training Language Models
The energy requirements of current natural language processing models continue to grow at a rapid, unsustainable pace. Recent works highlighting this problem conclude there is an urgent need for methods that reduce the energy needs of NLP …
Neural Scaling of Deep Chemical Models
Massive scale, both in terms of data availability and computation, enables significant breakthroughs in key application areas of deep learning such as natural language processing (NLP) and computer vision. There is emerging evidence that s…
A Green(er) World for A.I.
As research and practice in artificial intelligence (A.I.) grow in leaps and bounds, the resources necessary to sustain and support their operations also grow at an increasing pace. While innovations and applications from A.I. have brought…
The MIT Supercloud Workload Classification Challenge
High-Performance Computing (HPC) centers and cloud providers support an increasingly diverse set of applications on heterogeneous hardware. As Artificial Intelligence (AI) and Machine Learning (ML) workloads have become an increasingly larg…
SELFIES and the future of molecular string representations
Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction…
Predicting Surface Strain Effects on Adsorption Energy with Graph Neural Networks
Modifying the adsorption energies of reaction intermediates on different material surfaces can significantly improve heterogeneous catalysis by reducing energy barriers for intermediate elementary reaction steps. Surface strain can increas…
FastFlows: Flow-Based Models for Molecular Graph Generation
We propose a framework using normalizing-flow based models, SELF-Referencing Embedded Strings, and multi-objective optimization that efficiently generates small molecules. With an initial training set of only 100 small molecules, FastFlows…
Benchmarking Resource Usage for Efficient Distributed Deep Learning
Deep learning (DL) workflows demand an ever-increasing budget of compute and energy in order to achieve outsized gains. Neural architecture searches, hyperparameter sweeps, and rapid prototyping consume immense resources that can prevent r…