Large language models identify causal genes in complex trait GWAS
2025 · Open Access · DOI: https://doi.org/10.7490/f1000research.1120394.1
Pinpointing causal genes at genome-wide association study (GWAS) loci remains a major bottleneck. Existing literature-mining approaches are often limited in accuracy and scalability. We show that large language models (LLMs) can accurately prioritize likely causal genes at GWAS loci. We systematically evaluated several widely available general-purpose LLMs against benchmark datasets of high-confidence causal genes, including a unique set from 23 unpublished GWAS. Our results demonstrate that LLMs outperform or match current state-of-the-art methods and, crucially, exhibit robust performance on novel loci not previously linked to traits, underscoring their generalizability. Moreover, when integrated with existing methods, LLMs substantially enhance overall performance. This work establishes LLMs as an accurate, scalable, and broadly generalizable approach to accelerate causal gene identification in complex traits.
- Type: article
- Landing Page: https://doi.org/10.7490/f1000research.1120394.1
- OA Status: gold
- OpenAlex ID: https://openalex.org/W4416409711
Raw OpenAlex JSON
- OpenAlex ID (canonical identifier for this work in OpenAlex): https://openalex.org/W4416409711
- DOI: https://doi.org/10.7490/f1000research.1120394.1
- Title: Large language models identify causal genes in complex trait GWAS
- Type (OpenAlex work type): article
- Publication year: 2025
- Publication date: 2025-11-16
- Authors (in order): Wei Wang, Sotiris Karagounis, Xin Wang, Anna C. Reisetter, Adam Auton
- Landing page (publisher): https://doi.org/10.7490/f1000research.1120394.1
- Open access (free full text available): Yes
- OA status (per OpenAlex): gold
- OA URL (direct OA link): https://doi.org/10.7490/f1000research.1120394.1
- Cited by (total citation count in OpenAlex): 0
Full payload
| id | https://openalex.org/W4416409711 |
|---|---|
| doi | https://doi.org/10.7490/f1000research.1120394.1 |
| ids.doi | https://doi.org/10.7490/f1000research.1120394.1 |
| ids.openalex | https://openalex.org/W4416409711 |
| fwci | |
| type | article |
| title | Large language models identify causal genes in complex trait GWAS |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | |
| locations[0].id | doi:10.7490/f1000research.1120394.1 |
| locations[0].is_oa | True |
| locations[0].source | |
| locations[0].license | cc-by |
| locations[0].pdf_url | |
| locations[0].version | acceptedVersion |
| locations[0].raw_type | posted-content |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | True |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | https://doi.org/10.7490/f1000research.1120394.1 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5100392302 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-4377-5060 |
| authorships[0].author.display_name | Wei Wang |
| authorships[0].author_position | middle |
| authorships[0].raw_author_name | Wei Wang |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5098978394 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Sotiris Karagounis |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Sotiris Karagounis |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5061307983 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-7242-357X |
| authorships[2].author.display_name | Xin Wang |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Xin Wang |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5006410381 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-8332-4585 |
| authorships[3].author.display_name | Anna C. Reisetter |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Anna Reisetter |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5047245152 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-1630-1225 |
| authorships[4].author.display_name | Adam Auton |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Adam Auton |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.7490/f1000research.1120394.1 |
| open_access.oa_status | gold |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-11-20T00:00:00 |
| display_name | Large language models identify causal genes in complex trait GWAS |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-28T13:21:21.559545 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | doi:10.7490/f1000research.1120394.1 |
| best_oa_location.is_oa | True |
| best_oa_location.source | |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | |
| best_oa_location.version | acceptedVersion |
| best_oa_location.raw_type | posted-content |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | https://doi.org/10.7490/f1000research.1120394.1 |
| primary_location.id | doi:10.7490/f1000research.1120394.1 |
| primary_location.is_oa | True |
| primary_location.source | |
| primary_location.license | cc-by |
| primary_location.pdf_url | |
| primary_location.version | acceptedVersion |
| primary_location.raw_type | posted-content |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | True |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | https://doi.org/10.7490/f1000research.1120394.1 |
| publication_date | 2025-11-16 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 10, 55 |
| abstract_inverted_index.23 | 59 |
| abstract_inverted_index.We | 23, 39 |
| abstract_inverted_index.an | 105 |
| abstract_inverted_index.as | 104 |
| abstract_inverted_index.at | 3, 36 |
| abstract_inverted_index.in | 19, 117 |
| abstract_inverted_index.of | 50 |
| abstract_inverted_index.on | 78 |
| abstract_inverted_index.or | 68 |
| abstract_inverted_index.to | 84, 112 |
| abstract_inverted_index.Our | 62 |
| abstract_inverted_index.and | 21, 108 |
| abstract_inverted_index.are | 16 |
| abstract_inverted_index.can | 30 |
| abstract_inverted_index.not | 81 |
| abstract_inverted_index.set | 57 |
| abstract_inverted_index.GWAS | 37 |
| abstract_inverted_index.LLMs | 46, 66, 95, 103 |
| abstract_inverted_index.This | 100 |
| abstract_inverted_index.and, | 73 |
| abstract_inverted_index.from | 58 |
| abstract_inverted_index.gene | 115 |
| abstract_inverted_index.loci | 8, 80 |
| abstract_inverted_index.show | 24 |
| abstract_inverted_index.that | 25, 65 |
| abstract_inverted_index.when | 90 |
| abstract_inverted_index.with | 92 |
| abstract_inverted_index.work | 101 |
| abstract_inverted_index.GWAS. | 61 |
| abstract_inverted_index.genes | 2, 35 |
| abstract_inverted_index.large | 26 |
| abstract_inverted_index.loci. | 38 |
| abstract_inverted_index.major | 11 |
| abstract_inverted_index.match | 69 |
| abstract_inverted_index.novel | 79 |
| abstract_inverted_index.often | 17 |
| abstract_inverted_index.study | 6 |
| abstract_inverted_index.their | 87 |
| abstract_inverted_index.(GWAS) | 7 |
| abstract_inverted_index.(LLMs) | 29 |
| abstract_inverted_index.causal | 1, 34, 52, 114 |
| abstract_inverted_index.genes, | 53 |
| abstract_inverted_index.likely | 33 |
| abstract_inverted_index.linked | 83 |
| abstract_inverted_index.models | 28 |
| abstract_inverted_index.robust | 76 |
| abstract_inverted_index.unique | 56 |
| abstract_inverted_index.widely | 43 |
| abstract_inverted_index.against | 47 |
| abstract_inverted_index.broadly | 109 |
| abstract_inverted_index.complex | 118 |
| abstract_inverted_index.current | 70 |
| abstract_inverted_index.enhance | 97 |
| abstract_inverted_index.exhibit | 75 |
| abstract_inverted_index.limited | 18 |
| abstract_inverted_index.methods | 72 |
| abstract_inverted_index.overall | 98 |
| abstract_inverted_index.remains | 9 |
| abstract_inverted_index.results | 63 |
| abstract_inverted_index.several | 42 |
| abstract_inverted_index.traits, | 85 |
| abstract_inverted_index.Existing | 13 |
| abstract_inverted_index.accuracy | 20 |
| abstract_inverted_index.approach | 111 |
| abstract_inverted_index.datasets | 49 |
| abstract_inverted_index.existing | 93 |
| abstract_inverted_index.language | 27 |
| abstract_inverted_index.methods, | 94 |
| abstract_inverted_index.Moreover, | 89 |
| abstract_inverted_index.accurate, | 106 |
| abstract_inverted_index.available | 44 |
| abstract_inverted_index.benchmark | 48 |
| abstract_inverted_index.evaluated | 41 |
| abstract_inverted_index.including | 54 |
| abstract_inverted_index.scalable, | 107 |
| abstract_inverted_index.accelerate | 113 |
| abstract_inverted_index.accurately | 31 |
| abstract_inverted_index.approaches | 15 |
| abstract_inverted_index.crucially, | 74 |
| abstract_inverted_index.integrated | 91 |
| abstract_inverted_index.outperform | 67 |
| abstract_inverted_index.previously | 82 |
| abstract_inverted_index.prioritize | 32 |
| abstract_inverted_index.association | 5 |
| abstract_inverted_index.bottleneck. | 12 |
| abstract_inverted_index.demonstrate | 64 |
| abstract_inverted_index.establishes | 102 |
| abstract_inverted_index.genome-wide | 4 |
| abstract_inverted_index.performance | 77 |
| abstract_inverted_index.unpublished | 60 |
| abstract_inverted_index.performance. | 99 |
| abstract_inverted_index.scalability. | 22 |
| abstract_inverted_index.underscoring | 86 |
| abstract_inverted_index.generalizable | 110 |
| abstract_inverted_index.substantially | 96 |
| abstract_inverted_index.identification | 116 |
| abstract_inverted_index.systematically | 40 |
| abstract_inverted_index.general-purpose | 45 |
| abstract_inverted_index.high-confidence | 51 |
| abstract_inverted_index.traits.</ns3:p> | 119 |
| abstract_inverted_index.state-of-the-art | 71 |
| abstract_inverted_index.generalizability. | 88 |
| abstract_inverted_index.literature-mining | 14 |
| abstract_inverted_index.<ns3:p>Pinpointing | 0 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile | |
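
The `abstract_inverted_index.*` rows above store the abstract as a map from each token to the word positions where it occurs. The following is a minimal sketch of how such an index can be turned back into running text. It assumes the index has already been loaded as a plain Python dictionary. The function name `reconstruct_abstract` and the `strip_tags` option are illustrative and not part of any OpenAlex client. The sample data is a truncated subset of the table (positions 0 to 12 only), so tokens such as `causal` and `a` list fewer positions here than in the full payload.

```python
import re


def reconstruct_abstract(inverted_index, strip_tags=True):
    """Rebuild text from a token -> [positions] mapping (hypothetical helper)."""
    # Invert the mapping: position -> token.
    positions = {}
    for token, indexes in inverted_index.items():
        for i in indexes:
            positions[i] = token

    # Emit tokens in position order, separated by single spaces.
    text = " ".join(positions[i] for i in sorted(positions))

    # Optionally drop XML-like remnants such as <ns3:p> and </ns3:p>.
    if strip_tags:
        text = re.sub(r"</?[\w:]+>", "", text)
    return text


# Truncated subset of the table above: only positions 0-12 are included.
sample = {
    "<ns3:p>Pinpointing": [0],
    "causal": [1],
    "genes": [2],
    "at": [3],
    "genome-wide": [4],
    "association": [5],
    "study": [6],
    "(GWAS)": [7],
    "loci": [8],
    "remains": [9],
    "a": [10],
    "major": [11],
    "bottleneck.": [12],
}

first_sentence = reconstruct_abstract(sample)
print(first_sentence)
```

Applied to the full index, the same function should reproduce the abstract shown near the top of this page, with the `<ns3:p>` wrapper removed when `strip_tags` is left at its default.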