Nature Communications • Vol 13 • No 1
Deciphering microbial gene function using natural language processing
September 2022 • Danielle Miller, Adi Stern, David Burstein
Abstract: Revealing the function of uncharacterized genes is a fundamental challenge in an era of ever-increasing volumes of sequencing data. Here, we present a concept for tackling this challenge using deep learning methodologies adopted from natural language processing (NLP). We repurpose NLP algorithms to model “gene semantics” based on a biological corpus of more than 360 million microbial genes within their genomic context. We use the language models to predict functional categories for 56,617 genes and find t…