Leveraging protein language models to identify complex trait associations with previously inaccessible classes of functional rare variants
· 2025
· Open Access
· DOI: https://doi.org/10.1016/j.xgen.2025.101068
· OA: W4416372998
Protein language models (PLMs) improve variant effect predictions, but their role in gene discovery for complex traits remains unclear. We introduce an allelic series-based regression test that uses PLM-derived variant effect predictions as proxies for effect sizes, identifying ∼46% more associations than standard burden tests. Extending this test to isoform-level analysis, we find 26 gene-trait pairs with stronger associations in non-canonical than in canonical transcripts, highlighting isoform-specific effects. Finally, we identify evolutionary plausible variants (EPVs), defined as missense variants to which PLMs assign a higher likelihood than the wild-type allele; these represent 0.45% of missense variants. EPVs show higher allele frequencies than synonymous variants, consistent with differential selection pressures, and are linked to nine traits, including protective associations with low-density lipoprotein (LDL) and bone mineral density. Together, our results demonstrate how PLMs can enhance rare-variant interpretation and gene-trait association discovery in exome data.
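To make the two central ideas in the abstract concrete, the following is a minimal Python sketch on toy data. It is not the authors' test: it assumes a simplified weighted-burden regression in which each rare variant's weight is derived from a PLM log-likelihood ratio (mutant versus wild type), and it flags a variant as an EPV when that ratio is positive. The function names (`llr`, `is_epv`, `weighted_burden`, `ols_slope`), the mapping from score to weight, and all numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean


def llr(logp_mut, logp_wt):
    """Log-likelihood ratio of the mutant vs. the wild-type residue under a PLM."""
    return logp_mut - logp_wt


def is_epv(logp_mut, logp_wt):
    """Flag a missense variant as an EPV: the PLM prefers the mutant allele."""
    return llr(logp_mut, logp_wt) > 0


def weighted_burden(genotypes, weights):
    """Per-individual burden: allele counts summed with per-variant weights."""
    return [sum(g * w for g, w in zip(row, weights)) for row in genotypes]


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Toy PLM log-probabilities for three missense variants in one gene (made up).
logp_wt = [-0.5, -1.0, -2.0]
logp_mut = [-4.5, -3.0, -1.5]

llrs = [llr(m, w) for m, w in zip(logp_mut, logp_wt)]
epv_flags = [is_epv(m, w) for m, w in zip(logp_mut, logp_wt)]

# Assumed weighting: a more negative LLR means a larger predicted effect;
# variants the PLM prefers over wild type (EPVs) get zero weight here.
weights = [max(0.0, -s) for s in llrs]

# Rows are individuals, columns are variants, entries are allele counts.
genotypes = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
]
trait = [0.0, 2.0, 1.0, 0.0, 3.0]

burden = weighted_burden(genotypes, weights)
slope = ols_slope(burden, trait)
```

A standard burden test corresponds to setting every weight to 1, so that all qualifying variants in a gene count equally. The weighted version lets variants with larger predicted effects contribute more to the gene-level score, which is the intuition behind using PLM predictions as effect-size proxies. A real analysis would also adjust for covariates and compute a test statistic rather than only a slope.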