arXiv (Cornell University)
Long-range gene expression prediction with token alignment of large language model
October 2024 • E Honig, Huixin Zhan, Ying Wu, Zhe Zhang
Gene expression is a cellular process that plays a fundamental role in human phenotypic variation and disease. Despite advances in deep learning models for gene expression prediction, recent benchmarks have revealed their inability to learn distal regulatory grammar. Here, we address this challenge by leveraging a pretrained large language model to enhance gene expression prediction. We introduce Genetic sequence Token Alignment (GTA), which aligns genetic sequence features with natural language tokens, allowi…