Robert Verkuil
Simulating 500 million years of evolution with a language model
More than three billion years of evolution have produced an image of biology encoded into the space of natural proteins. Here we show that language models trained on tokens generated by evolution can act as evolutionary simulators to gener…
Evolutionary-scale prediction of atomic-level protein structure with a language model
Recent advances in machine learning have leveraged evolutionary information in multiple sequence alignments to predict protein structure. We demonstrate direct inference of full atomic-level protein structure from primary sequence using a …
Language models generalize beyond natural proteins
Learning the design patterns of proteins from sequences across evolution may have promise toward generative protein design. However, it is unknown whether language models, trained on sequences of natural proteins, will be capable of more th…
ESM Atlas v0 random sample of high confidence predicted protein structures
A random sample of the 225M high confidence predictions in the ESM Atlas v0 dataset introduced in "Evolutionary-scale prediction of atomic level protein structure with a language model." All predictions can be accessed in the ESM Meta…
ESM Atlas v0 representative random sample of predicted protein structures
A representative random sample of the ESM Atlas v0 dataset introduced in "Evolutionary-scale prediction of atomic level protein structure with a language model." All predictions can be accessed in the ESM Metagenomic Atlas (https://esmatl…
Evolutionary-scale prediction of atomic level protein structure with a language model
Artificial intelligence has the potential to open insight into the structure of proteins at the scale of evolution. It has only recently been possible to extend protein structure prediction to two hundred million cataloged proteins. Charac…
Learning inverse folding from millions of predicted structures
We consider the problem of predicting a protein sequence from its backbone atom coordinates. Machine learning approaches to this problem to date have been limited by the number of available experimentally determined protein structures. We …
Language models enable zero-shot prediction of the effects of mutations on protein function
Modeling the effect of sequence variation on function is a fundamental problem for understanding and designing proteins. Since evolution encodes information about function into patterns in protein sequences, unsupervised models of variant …
Neural Potts Model
We propose the Neural Potts Model objective as an amortized optimization problem. The objective enables training a single model with shared parameters to explicitly model energy landscapes across multiple protein families. Given …
MSA Transformer
Unsupervised protein language models trained across millions of diverse sequences learn structure and function of proteins. Protein language models studied to date have been trained to perform inference from individual sequences. The longs…
Applicability of deep learning approaches to non-convex optimization for trajectory-based policy search
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019