ProtNote: a multimodal method for protein–function annotation Article Swipe
YOU?
·
· 2025
· Open Access
·
· DOI: https://doi.org/10.1093/bioinformatics/btaf170
· OA: W4409525624
Motivation Understanding the protein sequence–function relationship is essential for advancing protein biology and engineering. However, <1% of known protein sequences have human-verified functions. While deep-learning methods have demonstrated promise for protein–function prediction, current models are limited to predicting only those functions on which they were trained. Results Here, we introduce ProtNote, a multimodal deep-learning model that leverages free-form text to enable both supervised and zero-shot protein–function prediction. ProtNote not only maintains near state-of-the-art performance for annotations in its training set but also generalizes to unseen and novel functions in zero-shot test settings. ProtNote demonstrates superior performance in the prediction of novel Gene Ontology annotations and Enzyme Commission numbers compared to baseline models by capturing nuanced sequence–function relationships that unlock a range of biological use cases inaccessible to prior models. We envision that ProtNote will enhance protein–function discovery by enabling scientists to use free text inputs without restriction to predefined labels—a necessary capability for navigating the dynamic landscape of protein biology. Availability and Implementation The code is available on GitHub: https://github.com/microsoft/protnote; model weights, datasets, and evaluation metrics are provided via Zenodo: https://zenodo.org/records/13897920.