Zeming Lin
Simulating 500 million years of evolution with a language model Open
More than three billion years of evolution have produced an image of biology encoded into the space of natural proteins. Here we show that language models trained on tokens generated by evolution can act as evolutionary simulators to gener…
Systematic identification and validation of the reference genes from 447 transcriptome datasets of moso bamboo (Phyllostachys edulis) Open
Bamboo was one of the first plants to be cultivated in China and is widely used in industry and daily life. The study of gene function has become an important part of bamboo breeding, whereas quantitative real-time PCR (qRT-PCR) is a power…
Evolutionary relationship of moso bamboo forms and a multihormone regulatory cascade involving culm shape variation Open
Summary: Moso bamboo (Phyllostachys edulis), known as Mao Zhu (MZ) in Chinese, exhibits various forms with distinct morphological characteristics. However, the evolutionary relationship among MZ forms and the mechanisms of culm shape variat…
A bamboo ‘PeSAPK4‐PeMYB99‐PeTIP4‐3’ regulatory model involved in water transport Open
Summary: Water plays crucial roles in expeditious growth and osmotic stress of bamboo. Nevertheless, the molecular mechanism of water transport remains unclear. In this study, an aquaporin gene, PeTIP4‐3, was identified through a joint ana…
Evolutionary-scale prediction of atomic-level protein structure with a language model Open
Recent advances in machine learning have leveraged evolutionary information in multiple sequence alignments to predict protein structure. We demonstrate direct inference of full atomic-level protein structure from primary sequence using a …
A high-level programming language for generative protein design Open
Combining a basic set of building blocks into more complex forms is a universal design principle. Most protein designs have proceeded from a manual bottom-up approach using parts created by nature, but top-down design of proteins is fundam…
ESM Atlas v0 random sample of high confidence predicted protein structures Open
A random sample out of the 225M high confidence predictions in the ESM Atlas v0 dataset introduced in "Evolutionary-scale prediction of atomic level protein structure with a language model." All predictions can be accessed in the ESM Meta…
ESM Atlas v0 representative random sample of predicted protein structures Open
A representative random sample of the ESM Atlas v0 dataset introduced in "Evolutionary-scale prediction of atomic level protein structure with a language model." All predictions can be accessed in the ESM Metagenomic Atlas (https://esmatl…
Evolutionary-scale prediction of atomic level protein structure with a language model Open
Artificial intelligence has the potential to open insight into the structure of proteins at the scale of evolution. It has only recently been possible to extend protein structure prediction to two hundred million cataloged proteins. Charac…
Learning inverse folding from millions of predicted structures Open
We consider the problem of predicting a protein sequence from its backbone atom coordinates. Machine learning approaches to this problem to date have been limited by the number of available experimentally determined protein structures. We …
STARDATA: A StarCraft AI Research Dataset Open
We release a dataset of 65646 StarCraft replays that contains 1535 million frames and 496 million player actions. We provide full game state data along with the original replays that can be viewed in StarCraft. The game state data was reco…
Neural Potts Model Open
Abstract: We propose the Neural Potts Model objective as an amortized optimization problem. The objective enables training a single model with shared parameters to explicitly model energy landscapes across multiple protein families. Given …
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences Open
Significance: Learning biological properties from sequence data is a logical step toward generative and predictive artificial intelligence for biology. Here, we propose scaling a deep contextual language model with unsupervised learning to …
Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching Open
Bienvenidos to the proceedings of the fifth edition of the workshop on computational approaches for linguistic code-switching (CALCS-2021)! Code-switching is this very interesting phenomenon where multilingual speakers communicate by moving…
PyTorch: An Imperative Style, High-Performance Deep Learning Library Open
Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style …
Growing Action Spaces Open
In complex tasks, such as those with large combinatorial action spaces, random exploration may be too inefficient to achieve meaningful learning progress. In this work, we use a curriculum of progressively growing action spaces to accelera…
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences Open
In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In the life sciences, the anticip…
Forward Modeling for Partial Observation Strategy Games - A StarCraft Defogger Open
We formulate the problem of defogging as state estimation and future state prediction from previous, partial observations in the context of real-time strategy games. We propose to employ encoder-decoder neural networks for this task, and i…
Value Propagation Networks Open
We present Value Propagation (VProp), a set of parameter-efficient differentiable planning modules built on Value Iteration which can successfully be trained using reinforcement learning to solve unseen tasks, has the capability to general…
Value Propagation Networks. Open
We present Value Propagation (VProp), a parameter-efficient differentiable planning module built on Value Iteration which can successfully be trained in a reinforcement learning fashion to solve unseen tasks, has the capability to generali…
An Analysis of Model-Based Heuristic Search Techniques for StarCraft Combat Scenarios Open
Real-Time Strategy games have become a popular test-bed for modern AI systems due to their real-time computational constraints, complex multi-unit control problems, and imperfect information. One of the most important aspects of any RTS AI …
Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play Open
We describe a simple scheme that allows an agent to learn about its environment in an unsupervised manner. Our scheme pits two versions of the same agent, Alice and Bob, against one another. Alice proposes a task for Bob to complete; and t…
DeepCloak: Masking Deep Neural Network Models for Robustness Against Adversarial Samples Open
Recent studies have shown that deep neural networks (DNN) are vulnerable to adversarial samples: maliciously-perturbed samples crafted to yield incorrect model outputs. Such attacks can severely undermine DNN systems, particularly in secur…
TorchCraft: a Library for Machine Learning Research on Real-Time Strategy Games Open
We present TorchCraft, a library that enables deep learning research on Real-Time Strategy (RTS) games such as StarCraft: Brood War, by making it easier to control these games from a machine learning framework, here Torch. This white paper…
Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks Open
We consider scenarios from the real-time strategy game StarCraft as new benchmarks for reinforcement learning algorithms. We propose micromanagement tasks, which present the problem of the short-term, low-level control of army members duri…
MUST-CNN: A Multilayer Shift-and-Stitch Deep Convolutional Architecture for Sequence-based Protein Structure Prediction Open
Predicting protein properties such as solvent accessibility and secondary structure from its primary amino acid sequence is an important task in bioinformatics. Recently, a few deep learning models have surpassed the traditional window bas…
Deep Motif: Visualizing Genomic Sequence Classifications Open
This paper applies a deep convolutional/highway MLP framework to classify genomic sequences on the transcription factor binding site task. To make the model understandable, we propose an optimization driven strategy to extract "motifs", or…