A Large-Scale Foundation Model for RNA Enables Diverse Function and Structure Prediction
2025 · Open Access · DOI: https://doi.org/10.21203/rs.3.rs-6445344/v1
Accurately predicting RNA structures and functions from nucleotide sequences, or conversely, designing sequences to meet structural and functional requirements, remains a fundamental challenge in RNA biology, largely due to limited annotated data and the poor efficiency of ab initio modeling approaches. Here, we introduce AIDO.RNA, a large-scale RNA foundation model that leverages self-supervised pre-training to learn general and effective RNA representations, which can be transferred to a wide range of RNA prediction and design tasks. AIDO.RNA is a 1.6-billion-parameter transformer-based language model, pre-trained on 42 million non-coding RNA (ncRNA) sequences at single-nucleotide resolution. It can be adapted to achieve state-of-the-art performance on 26 of 28 diverse tasks, including RNA structure and function prediction, mRNA expression modeling, multi-modal RNA isoform expression prediction, and RNA inverse folding, demonstrating its effectiveness and versatility. Beyond excelling at ncRNA-related tasks that fall directly within the pre-training data space, AIDO.RNA can be efficiently adapted to new domains: with continued domain-specific pre-training, it generalizes to the untranslated and coding regions of mRNA, suggesting a promising pathway for further improving biological foundation models in general. We release AIDO.RNA as open source and provide its functionality through AIDO.ModelGenerator, a Python package enabling easy reproduction, application, and extension of our results.
- Type: preprint
- Language: en
- Landing Page: https://doi.org/10.21203/rs.3.rs-6445344/v1, https://www.researchsquare.com/article/rs-6445344/latest.pdf
- OA Status: gold
- Cited By: 3
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4410165516