Large Language Models for Zero-shot Inference of Causal Structures in Biology
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2503.04347
Genes, proteins and other biological entities influence one another via causal molecular networks. Causal relationships in such networks are mediated by complex and diverse mechanisms, through latent variables, and are often specific to cellular context. It remains challenging to characterise such networks in practice. Here, we present a novel framework to evaluate large language models (LLMs) for zero-shot inference of causal relationships in biology. In particular, we systematically evaluate causal claims obtained from an LLM using real-world interventional data. This is done over one hundred variables and thousands of causal hypotheses. Furthermore, we consider several prompting and retrieval-augmentation strategies, including large, and potentially conflicting, collections of scientific articles. Our results show that with tailored augmentation and prompting, even relatively small LLMs can capture meaningful aspects of causal structure in biological systems. This supports the notion that LLMs could act as orchestration tools in biological discovery, by helping to distil current knowledge in ways amenable to downstream analysis. Our approach to assessing LLMs with respect to experimental data is relevant for a broad range of problems at the intersection of causal learning, LLMs and scientific discovery.
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2503.04347, https://arxiv.org/pdf/2503.04347
- OA Status: green
- OpenAlex ID: https://openalex.org/W4416113187
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4416113187
- DOI: https://doi.org/10.48550/arxiv.2503.04347
- Title: Large Language Models for Zero-shot Inference of Causal Structures in Biology
- Type: preprint
- Language: en
- Publication year: 2025
- Publication date: 2025-03-06
- Authors: Richard Moulange, Nan Rosemary Ke, Sach Mukherjee
- Landing page: https://arxiv.org/abs/2503.04347
- PDF URL: https://arxiv.org/pdf/2503.04347
- Open access: Yes
- OA status: green
- OA URL: https://arxiv.org/pdf/2503.04347
- Cited by: 0
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4416113187 |
| doi | https://doi.org/10.48550/arxiv.2503.04347 |
| ids.doi | https://doi.org/10.48550/arxiv.2503.04347 |
| ids.openalex | https://openalex.org/W4416113187 |
| fwci | |
| type | preprint |
| title | Large Language Models for Zero-shot Inference of Causal Structures in Biology |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2503.04347 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2503.04347 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2503.04347 |
| locations[1].id | doi:10.48550/arxiv.2503.04347 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2503.04347 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5093038741 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-1827-0941 |
| authorships[0].author.display_name | Richard Moulange |
| authorships[0].author_position | last |
| authorships[0].raw_author_name | Moulange, Richard |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5102922377 |
| authorships[1].author.orcid | https://orcid.org/0009-0003-7647-8449 |
| authorships[1].author.display_name | Nan Rosemary Ke |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Ke, Nan Rosemary |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5112866063 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Sach Mukherjee |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Mukherjee, Sach |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2503.04347 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Large Language Models for Zero-shot Inference of Causal Structures in Biology |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-28T05:25:41.770364 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2503.04347 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2503.04347 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2503.04347 |
| primary_location.id | pmh:oai:arXiv.org:2503.04347 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2503.04347 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2503.04347 |
| publication_date | 2025-03-06 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 47, 170 |
| abstract_inverted_index.In | 64 |
| abstract_inverted_index.It | 35 |
| abstract_inverted_index.an | 73 |
| abstract_inverted_index.as | 139 |
| abstract_inverted_index.at | 175 |
| abstract_inverted_index.by | 20, 145 |
| abstract_inverted_index.in | 15, 42, 62, 128, 142, 151 |
| abstract_inverted_index.is | 80, 167 |
| abstract_inverted_index.of | 59, 88, 105, 125, 173, 178 |
| abstract_inverted_index.to | 32, 38, 50, 147, 154, 159, 164 |
| abstract_inverted_index.we | 45, 66, 92 |
| abstract_inverted_index.LLM | 74 |
| abstract_inverted_index.Our | 108, 157 |
| abstract_inverted_index.act | 138 |
| abstract_inverted_index.and | 2, 22, 28, 86, 96, 101, 115, 182 |
| abstract_inverted_index.are | 18, 29 |
| abstract_inverted_index.can | 121 |
| abstract_inverted_index.for | 56, 169 |
| abstract_inverted_index.one | 7, 83 |
| abstract_inverted_index.the | 133, 176 |
| abstract_inverted_index.via | 9 |
| abstract_inverted_index.LLMs | 120, 136, 161, 181 |
| abstract_inverted_index.This | 79, 131 |
| abstract_inverted_index.data | 166 |
| abstract_inverted_index.done | 81 |
| abstract_inverted_index.even | 117 |
| abstract_inverted_index.from | 72 |
| abstract_inverted_index.over | 82 |
| abstract_inverted_index.show | 110 |
| abstract_inverted_index.such | 16, 40 |
| abstract_inverted_index.that | 111, 135 |
| abstract_inverted_index.ways | 152 |
| abstract_inverted_index.with | 112, 162 |
| abstract_inverted_index.Here, | 44 |
| abstract_inverted_index.broad | 171 |
| abstract_inverted_index.could | 137 |
| abstract_inverted_index.data. | 78 |
| abstract_inverted_index.large | 52 |
| abstract_inverted_index.novel | 48 |
| abstract_inverted_index.often | 30 |
| abstract_inverted_index.other | 3 |
| abstract_inverted_index.range | 172 |
| abstract_inverted_index.small | 119 |
| abstract_inverted_index.tools | 141 |
| abstract_inverted_index.using | 75 |
| abstract_inverted_index.(LLMs) | 55 |
| abstract_inverted_index.Causal | 13 |
| abstract_inverted_index.Genes, | 0 |
| abstract_inverted_index.causal | 10, 60, 69, 89, 126, 179 |
| abstract_inverted_index.claims | 70 |
| abstract_inverted_index.distil | 148 |
| abstract_inverted_index.large, | 100 |
| abstract_inverted_index.latent | 26 |
| abstract_inverted_index.models | 54 |
| abstract_inverted_index.notion | 134 |
| abstract_inverted_index.another | 8 |
| abstract_inverted_index.aspects | 124 |
| abstract_inverted_index.capture | 122 |
| abstract_inverted_index.complex | 21 |
| abstract_inverted_index.current | 149 |
| abstract_inverted_index.diverse | 23 |
| abstract_inverted_index.helping | 146 |
| abstract_inverted_index.hundred | 84 |
| abstract_inverted_index.present | 46 |
| abstract_inverted_index.remains | 36 |
| abstract_inverted_index.respect | 163 |
| abstract_inverted_index.results | 109 |
| abstract_inverted_index.several | 94 |
| abstract_inverted_index.through | 25 |
| abstract_inverted_index.amenable | 153 |
| abstract_inverted_index.approach | 158 |
| abstract_inverted_index.biology. | 63 |
| abstract_inverted_index.cellular | 33 |
| abstract_inverted_index.consider | 93 |
| abstract_inverted_index.context. | 34 |
| abstract_inverted_index.entities | 5 |
| abstract_inverted_index.evaluate | 51, 68 |
| abstract_inverted_index.language | 53 |
| abstract_inverted_index.mediated | 19 |
| abstract_inverted_index.networks | 17, 41 |
| abstract_inverted_index.obtained | 71 |
| abstract_inverted_index.problems | 174 |
| abstract_inverted_index.proteins | 1 |
| abstract_inverted_index.relevant | 168 |
| abstract_inverted_index.specific | 31 |
| abstract_inverted_index.supports | 132 |
| abstract_inverted_index.systems. | 130 |
| abstract_inverted_index.tailored | 113 |
| abstract_inverted_index.analysis. | 156 |
| abstract_inverted_index.articles. | 107 |
| abstract_inverted_index.assessing | 160 |
| abstract_inverted_index.framework | 49 |
| abstract_inverted_index.including | 99 |
| abstract_inverted_index.inference | 58 |
| abstract_inverted_index.influence | 6 |
| abstract_inverted_index.knowledge | 150 |
| abstract_inverted_index.learning, | 180 |
| abstract_inverted_index.molecular | 11 |
| abstract_inverted_index.networks. | 12 |
| abstract_inverted_index.practice. | 43 |
| abstract_inverted_index.prompting | 95 |
| abstract_inverted_index.structure | 127 |
| abstract_inverted_index.thousands | 87 |
| abstract_inverted_index.variables | 85 |
| abstract_inverted_index.zero-shot | 57 |
| abstract_inverted_index.biological | 4, 129, 143 |
| abstract_inverted_index.discovery, | 144 |
| abstract_inverted_index.discovery. | 184 |
| abstract_inverted_index.downstream | 155 |
| abstract_inverted_index.meaningful | 123 |
| abstract_inverted_index.prompting, | 116 |
| abstract_inverted_index.real-world | 76 |
| abstract_inverted_index.relatively | 118 |
| abstract_inverted_index.scientific | 106, 183 |
| abstract_inverted_index.variables, | 27 |
| abstract_inverted_index.challenging | 37 |
| abstract_inverted_index.collections | 104 |
| abstract_inverted_index.hypotheses. | 90 |
| abstract_inverted_index.mechanisms, | 24 |
| abstract_inverted_index.particular, | 65 |
| abstract_inverted_index.potentially | 102 |
| abstract_inverted_index.strategies, | 98 |
| abstract_inverted_index.Furthermore, | 91 |
| abstract_inverted_index.augmentation | 114 |
| abstract_inverted_index.characterise | 39 |
| abstract_inverted_index.conflicting, | 103 |
| abstract_inverted_index.experimental | 165 |
| abstract_inverted_index.intersection | 177 |
| abstract_inverted_index.orchestration | 140 |
| abstract_inverted_index.relationships | 14, 61 |
| abstract_inverted_index.interventional | 77 |
| abstract_inverted_index.systematically | 67 |
| abstract_inverted_index.retrieval-augmentation | 97 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile |
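
The `abstract_inverted_index.*` rows above store the abstract as a map from each token to the word positions where it occurs. As a minimal sketch, assuming the payload has been loaded into a Python dict of token to position list, the text can be rebuilt by placing each token at its positions and joining in order. The function name `reconstruct_abstract` and the `sample` dict are illustrative only; `sample` is an excerpt of the rows above restricted to positions 0 through 8.

```python
def reconstruct_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild running text from a token -> positions mapping."""
    by_position: dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Join tokens in ascending position order; gaps are simply skipped.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt of the table rows above, limited to the first nine positions
# (hypothetical variable, for illustration only).
sample = {
    "Genes,": [0],
    "proteins": [1],
    "and": [2],
    "other": [3],
    "biological": [4],
    "entities": [5],
    "influence": [6],
    "one": [7],
    "another": [8],
}

print(reconstruct_abstract(sample))
```

Because the index keeps punctuation attached to tokens (for example `Genes,`), joining on single spaces is enough to recover the sentence; no detokenisation step is assumed here.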