Nathan C. Frey
Deep Evolutionary Fitness Inference for Variant Nomination from Directed Evolution
Iterative screening techniques, such as directed evolution, enable high-throughput affinity maturation to optimize binders to molecular interfaces. However, the decision problem of selecting variants from rich, evolved populations to enter…
llome_ehrlich_benchmark_data_package
Although large language models (LLMs) have shown promise in biomolecule optimization problems, they incur heavy computational costs and struggle to satisfy precise constraints. On the other hand, specialized solvers like LaMBO-2 offer effi…
Lab-in-the-loop therapeutic antibody design with deep learning
Therapeutic antibody design is a complex multi-property optimization problem with substantial promise for improvement with the application of machine-learning methods. Towards realizing that promise, we introduce “Lab-in-the-loop,” a new a…
DyAb: sequence-based antibody design and property prediction in a low-data regime
Protein therapeutic design and property prediction are frequently hampered by data scarcity. Here we propose a new model, DyAb, that addresses these issues by leveraging a pair-wise representation to predict differences in protein properti…
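The pair-wise idea in the snippet above admits a short sketch (a minimal illustration, not the DyAb implementation; the `PairwiseDeltaPredictor` name, the toy featurization, and all layer sizes are assumptions): embed two variants and regress the difference in their measured property.

```python
# Minimal sketch of pairwise property-difference prediction (not the DyAb code).
# The featurization (flattened one-hot sequences) and layer sizes are placeholders.
import torch
import torch.nn as nn

class PairwiseDeltaPredictor(nn.Module):
    """Predict the difference in a property between two protein variants."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.LazyLinear(embed_dim), nn.ReLU())
        self.head = nn.Linear(2 * embed_dim, 1)

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        z_a, z_b = self.encoder(x_a), self.encoder(x_b)
        # The head sees both embeddings, so it models delta(a, b) directly.
        return self.head(torch.cat([z_a, z_b], dim=-1)).squeeze(-1)

# Toy usage: 4 variant pairs, each a flattened 50-residue x 20-letter encoding.
model = PairwiseDeltaPredictor()
x_a, x_b = torch.randn(4, 1000), torch.randn(4, 1000)
delta_pred = model(x_a, x_b)  # one predicted property difference per pair
```

Training on differences means every labeled pair, rather than every labeled sequence, becomes a training example, which is one way such a formulation can stretch a small dataset.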
All-Atom Protein Generation with Latent Diffusion
While generative models hold immense promise for protein design, existing models are typically backbone-only, despite the indispensable role that sidechain atoms play in mediating function. As prerequisite knowledge, all-atom 3D structure …
Concept Bottleneck Language Models For protein design
We introduce Concept Bottleneck Protein Language Models (CB-pLM), a generative masked language model with a layer where each neuron corresponds to an interpretable concept. Our architecture offers three key benefits: i) Control: We can int…
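A minimal sketch of the bottleneck described above (not the CB-pLM release; the concept count, sigmoid activation, and vocabulary size are assumptions): token hidden states are squeezed through a layer whose neurons are each tied to one concept before any decoding happens, so intervening on those activations steers the output.

```python
# Illustrative concept-bottleneck layer for a masked protein language model.
# The number of concepts, the sigmoid activation, and the vocab size are assumed.
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    def __init__(self, hidden_dim: int, n_concepts: int, vocab_size: int):
        super().__init__()
        self.to_concepts = nn.Linear(hidden_dim, n_concepts)  # one neuron per concept
        self.to_logits = nn.Linear(n_concepts, vocab_size)    # decode from concepts only

    def forward(self, hidden: torch.Tensor):
        concepts = torch.sigmoid(self.to_concepts(hidden))    # interpretable activations
        return concepts, self.to_logits(concepts)

hidden = torch.randn(2, 16, 256)   # (batch, length, hidden) from a pLM trunk
layer = ConceptBottleneck(hidden_dim=256, n_concepts=32, vocab_size=33)
concepts, logits = layer(hidden)
# A training loop would pair a masked-LM loss on `logits` with a supervised loss
# tying `concepts` to labeled properties; editing `concepts` at inference time
# is the control knob the abstract refers to.
```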
Tokenized and Continuous Embedding Compressions of Protein Sequence and Structure
Existing protein machine learning representations typically model either the sequence or structure distribution, with the other modality implicit. The latent space of sequence-to-structure prediction models such as ESMFold represents the j…
Antibody DomainBed: Out-of-Distribution Generalization in Therapeutic Protein Design
Machine learning (ML) has demonstrated significant promise in accelerating drug design. Active ML-guided optimization of therapeutic molecules typically relies on a surrogate model predicting the target property of interest. The model pred…
Closed-Form Test Functions for Biophysical Sequence Optimization Algorithms
There is a growing body of work seeking to replicate the success of machine learning (ML) on domains like computer vision (CV) and natural language processing (NLP) to applications involving biophysical data. One of the key ingredients of …
Cramming Protein Language Model Training in 24 GPU Hours
Protein language models (pLMs) are ubiquitous across biological machine learning research, but state-of-the-art models like ESM2 take hundreds of thousands of GPU hours to pre-train on the vast protein universe. Resource requirements for s…
Synthesis of Mo4VAlC4 MAX Phase and Two-Dimensional Mo4VC4 MXene with Five Atomic Layers of Transition Metals
MXenes are a family of two-dimensional (2D) transition metal carbides, nitrides, and carbonitrides with a general formula of Mn+1XnTx, in which two, three, or four atomic layers of a …
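Assuming the standard MXene convention (M an early transition metal, X carbon and/or nitrogen, Tx surface terminations), the title compound is the n = 4 member of this formula; the worked case below is an illustration, not text from the abstract.

```latex
% General MXene formula; setting n = 4 gives the title compound, with five
% atomic layers of transition metals (Mo4V) interleaved with four carbon layers.
\[
  \mathrm{M}_{n+1}\mathrm{X}_{n}\mathrm{T}_{x}
  \;\xrightarrow{\;n\,=\,4\;}\;
  \mathrm{(Mo_{4}V)\,C_{4}\,T_{x}}
\]
```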
Protein Discovery with Discrete Walk-Jump Sampling
We resolve difficulties in training and sampling from a discrete generative model by learning a smoothed energy function, sampling from the smoothed data manifold with Langevin Markov chain Monte Carlo (MCMC), and projecting back to the tr…
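The three-step recipe in the snippet above fits in a few lines (a sketch under assumptions: `score_fn` and `denoise_fn` stand in for trained networks, and the noise level and step size are illustrative, not values from the paper): Langevin "walk" steps on the smoothed manifold, then a single denoising "jump" and an argmax projection back to discrete sequences.

```python
# Walk-jump sketch: Langevin MCMC in the Gaussian-smoothed space, then one
# denoising jump. score_fn / denoise_fn are placeholders for trained models.
import torch

def walk_jump_sample(score_fn, denoise_fn, shape, sigma=0.5, n_steps=100, step=1e-2):
    y = sigma * torch.randn(shape)                       # start on the smoothed manifold
    for _ in range(n_steps):                             # "walk": Langevin MCMC over y
        y = y + step * score_fn(y) + (2 * step) ** 0.5 * torch.randn_like(y)
    x_hat = denoise_fn(y)                                # "jump": estimate E[x | y]
    return x_hat.argmax(dim=-1)                          # project to discrete tokens

# Toy stand-ins so the sketch runs end to end; a real model replaces both.
score_fn = lambda y: -y        # score of a unit Gaussian, purely for illustration
denoise_fn = lambda y: y       # identity "denoiser" placeholder
tokens = walk_jump_sample(score_fn, denoise_fn, shape=(2, 10, 20))
```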
Protein Design with Guided Discrete Diffusion
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling. The generative model samples plausible sequences while the discriminative model guides a search for sequences with …
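One simple way to combine the two models in the snippet above is to propose with the generative model and rerank/resample with the discriminative one; the sketch below shows that variant only (it is not the guidance mechanism used in the paper, and `generator` and `scorer` are assumed stand-ins for trained models).

```python
# Propose-and-resample sketch: the generator proposes candidate sequences and the
# discriminator reweights them. Both callables below are toy placeholders.
import torch

def guided_sample(generator, scorer, n_proposals=64, n_keep=8, temperature=1.0):
    candidates = generator(n_proposals)                  # (n_proposals, length) token ids
    scores = scorer(candidates)                          # (n_proposals,) predicted property
    weights = torch.softmax(scores / temperature, dim=0)
    idx = torch.multinomial(weights, n_keep, replacement=False)
    return candidates[idx]

generator = lambda n: torch.randint(0, 20, (n, 30))     # toy: random 30-mer sequences
scorer = lambda seqs: seqs.float().mean(dim=-1)         # toy: a fake property model
picked = guided_sample(generator, scorer)               # 8 reweighted candidates
```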
SupSiam: Non-contrastive Auxiliary Loss for Learning from Molecular Conformers
We investigate Siamese networks for learning related embeddings for augmented samples of molecular conformers. We find that a non-contrastive (positive-pair only) auxiliary task aids in supervised training of Euclidean neural networks (E3N…
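For reference, the positive-pair-only objective can be written down compactly (a sketch: the toy MLP below replaces the Euclidean neural network conformer encoder from the paper, and all sizes are placeholders).

```python
# SimSiam-style non-contrastive loss: two conformer views, a predictor head, and
# stop-gradient on the target branch. Encoder/predictor sizes are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 32))
predictor = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 32))

def siamese_loss(x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    z1, z2 = encoder(x1), encoder(x2)          # embeddings of the two conformer views
    p1, p2 = predictor(z1), predictor(z2)
    # Negative cosine similarity; detach() provides the stop-gradient target.
    return -(F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
             + F.cosine_similarity(p2, z1.detach(), dim=-1).mean()) / 2

x1, x2 = torch.randn(8, 64), torch.randn(8, 64)   # toy features for 8 conformer pairs
aux_loss = siamese_loss(x1, x2)   # added to the supervised property loss during training
```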
Graph Contrastive Learning for Materials
Recent work has shown the potential of graph neural networks to efficiently predict material properties, enabling high-throughput screening of materials. Training these models, however, often requires large quantities of labelled data, obt…
Efficient catalyst screening using graph neural networks to predict strain effects on adsorption energy
Small-molecule adsorption energies correlate with energy barriers of catalyzed intermediate reaction steps, determining the dominant microkinetic mechanism. Straining the catalyst can alter adsorption energies and break scaling relationshi…
A Pareto-optimal compositional energy-based model for sampling and optimization of protein sequences
Deep generative models have emerged as a popular machine learning-based approach for inverse design problems in the life sciences. However, these problems often require sampling new designs that satisfy multiple properties of interest in a…
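Since the snippet above turns on satisfying several properties at once, a generic non-dominated (Pareto) filter illustrates the selection step; it is a sketch only (not the compositional energy-based sampler itself) and assumes larger scores are better on every objective.

```python
# Generic Pareto-front filter over candidate designs scored on several objectives.
# Input: (n_candidates, n_objectives) array where larger values are better.
import numpy as np

def pareto_front(scores: np.ndarray) -> np.ndarray:
    """Return indices of candidates not dominated by any other candidate."""
    front = []
    for i in range(scores.shape[0]):
        # i is dominated if some candidate is >= on every objective and > on one.
        dominated = np.any(
            np.all(scores >= scores[i], axis=1) & np.any(scores > scores[i], axis=1)
        )
        if not dominated:
            front.append(i)
    return np.asarray(front)

scores = np.array([[0.9, 0.2], [0.5, 0.5], [0.4, 0.6], [0.3, 0.3]])
print(pareto_front(scores))  # [0 1 2]; the last design is dominated by the second
```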
EquiFold: Protein Structure Prediction with a Novel Coarse-Grained Structure Representation
Designing proteins to achieve specific functions often requires in silico modeling of their properties at high throughput scale and can significantly benefit from fast and accurate protein structure prediction. We introduce EquiFold, a new…
Roughness of molecular property landscapes and its impact on modellability
In molecular discovery and drug design, structure-property relationships and activity landscapes are often qualitatively or quantitatively analyzed to guide the navigation of chemical space. The roughness (or smoothness) of these molecular…
Great Power, Great Responsibility: Recommendations for Reducing Energy for Training Language Models
The energy requirements of current natural language processing models continue to grow at a rapid, unsustainable pace. Recent works highlighting this problem conclude there is an urgent need for methods that reduce the energy needs of NLP …
Neural Scaling of Deep Chemical Models
Massive scale, both in terms of data availability and computation, enables significant breakthroughs in key application areas of deep learning such as natural language processing (NLP) and computer vision. There is emerging evidence that s…
A Green(er) World for A.I.
As research and practice in artificial intelligence (A.I.) grow in leaps and bounds, the resources necessary to sustain and support their operations also grow at an increasing pace. While innovations and applications from A.I. have brought…
The MIT Supercloud Workload Classification Challenge
High-Performance Computing (HPC) centers and cloud providers support an increasingly diverse set of applications on heterogeneous hardware. As Artificial Intelligence (AI) and Machine Learning (ML) workloads have become an increasingly larg…
SELFIES and the future of molecular string representations
Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction…
Predicting Surface Strain Effects on Adsorption Energy with Graph Neural Networks
Modifying the adsorption energies of reaction intermediates on different material surfaces can significantly improve heterogeneous catalysis by reducing energy barriers for intermediate elementary reaction steps. Surface strain can increas…
FastFlows: Flow-Based Models for Molecular Graph Generation
We propose a framework using normalizing-flow based models, SELF-Referencing Embedded Strings, and multi-objective optimization that efficiently generates small molecules. With an initial training set of only 100 small molecules, FastFlows…
Benchmarking Resource Usage for Efficient Distributed Deep Learning
Deep learning (DL) workflows demand an ever-increasing budget of compute and energy in order to achieve outsized gains. Neural architecture searches, hyperparameter sweeps, and rapid prototyping consume immense resources that can prevent r…