Understanding LLM Scientific Reasoning through Promptings and Model's Explanation on the Answers
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2505.01482
Large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding, reasoning, and problem-solving across various domains. However, their ability to perform complex, multi-step reasoning tasks, which are essential for applications in science, medicine, and law, remains an area of active investigation. This paper examines the reasoning capabilities of contemporary LLMs, analyzing their strengths, limitations, and potential for improvement. The study applies prompt engineering techniques to the Graduate-Level Google-Proof Q&A (GPQA) dataset to assess the scientific reasoning of GPT-4o. Five popular prompt engineering techniques and two tailored prompting strategies were tested: baseline direct answer (zero-shot), chain-of-thought (CoT), zero-shot CoT, self-ask, self-consistency, decomposition, and multipath prompting. Our findings indicate that while LLMs exhibit emergent reasoning abilities, they often rely on pattern recognition rather than true logical inference, leading to inconsistencies in complex problem-solving. Self-consistency achieved the highest accuracy (52.99%), followed by direct answer (52.23%). Zero-shot CoT (50%) outperformed multipath (48.44%), decomposition (47.77%), self-ask (46.88%), and CoT (43.75%). However, self-consistency performed second worst in explaining its answers. Simple techniques such as direct answer, CoT, and zero-shot CoT produced the best scientific reasoning. We propose a research agenda aimed at bridging these gaps by integrating structured reasoning frameworks, hybrid AI approaches, and human-in-the-loop methodologies. By critically evaluating the reasoning mechanisms of LLMs, this paper contributes to the ongoing discourse on the future of artificial general intelligence and the development of more robust, trustworthy AI systems.
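The abstract names several prompting strategies without showing their mechanics. As a rough illustration only, the sketch below shows how a GPQA-style four-option question might be turned into a direct-answer or zero-shot CoT prompt, how self-consistency aggregates several sampled answers by majority vote, and how accuracy percentages like the ones reported are computed. The prompt wording, the `sample` callable standing in for a GPT-4o call, and the names `build_prompt`, `self_consistency`, and `accuracy` are hypothetical; the paper's actual prompts, number of samples, and answer parsing are not stated here. "Let's think step by step" is the trigger phrase commonly used for zero-shot CoT, not necessarily the one used in this study. Self-ask, decomposition, and multipath prompting are not sketched.

```python
import itertools
from collections import Counter
from typing import Callable, Sequence

# Hypothetical prompt suffixes; the study's actual prompt wording is not given here.
DIRECT_SUFFIX = "Answer with a single letter (A, B, C, or D)."
ZERO_SHOT_COT_SUFFIX = (
    "Let's think step by step, then give the final answer as a single letter."
)


def build_prompt(question: str, choices: Sequence[str], suffix: str) -> str:
    """Format a four-option multiple-choice question (GPQA style) as one prompt."""
    if len(choices) != 4:
        raise ValueError("expected exactly four answer choices")
    lines = [question]
    lines += [f"({letter}) {choice}" for letter, choice in zip("ABCD", choices)]
    lines.append(suffix)
    return "\n".join(lines)


def self_consistency(sample: Callable[[str], str], prompt: str, n: int = 5) -> str:
    """Query the sampler n times on the same prompt and return the majority answer.

    `sample` is a hypothetical stand-in for a stochastic LLM call (temperature > 0)
    whose output has already been parsed down to an answer letter.
    Ties go to the answer that was seen first.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    votes = Counter(sample(prompt) for _ in range(n))
    return votes.most_common(1)[0][0]


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of predicted letters that match the gold letters."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty prediction and gold lists")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    prompt = build_prompt("Which option is correct?", ["w", "x", "y", "z"], ZERO_SHOT_COT_SUFFIX)
    canned = itertools.cycle(["B", "C", "B", "A", "B"])  # stand-in for real model samples
    print(self_consistency(lambda _p: next(canned), prompt, n=5))  # majority vote: B
    print(accuracy(["B", "A", "D", "C"], ["B", "A", "C", "C"]))  # 3 of 4 correct: 75.0
```

Note that the vote in `self_consistency` looks only at the final answer letters, not at the reasoning text that produced them.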
- Type: preprint
- Language: en
- Landing page: http://arxiv.org/abs/2505.01482 (PDF: https://arxiv.org/pdf/2505.01482)
- OA status: green
- OpenAlex ID: https://openalex.org/W4415026124
OpenAlex record
- Title: Understanding LLM Scientific Reasoning through Promptings and Model's Explanation on the Answers
- Publication date: 2025-05-02
- Authors (in order): Alice Rueda, Maha M. Hassan, Argyrios Perivolaris, Bazen Gashaw Teferra, Reza Samavi, Sirisha Rambhatla, Yucheng Wu, Yanbo Zhang, Bo Cao, Divya Sharma, Sridhar Krishnan, Venkat Bhat
- Open access: Yes
- Cited by: 0
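OpenAlex does not distribute abstracts as plain text: its API returns an `abstract_inverted_index` object that maps each word to the list of positions where it occurs. The sketch below rebuilds readable text from such an index and, optionally, fetches a record by its OpenAlex work ID (`W4415026124` for this paper) using only the Python standard library. The function names `reconstruct_abstract` and `fetch_abstract` are hypothetical; the endpoint pattern and field name follow the public OpenAlex API, while error handling and rate limiting are deliberately left out.

```python
import json
import urllib.request

# OpenAlex REST endpoint for a single work record.
OPENALEX_WORK_URL = "https://api.openalex.org/works/{work_id}"


def reconstruct_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild plain text from an inverted index mapping each word to its positions."""
    by_position: dict[int, str] = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    # Emit words in position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


def fetch_abstract(work_id: str, timeout: float = 10.0) -> str:
    """Download one OpenAlex work record and return its reconstructed abstract.

    Requires network access; returns an empty string if the record has no abstract.
    """
    url = OPENALEX_WORK_URL.format(work_id=work_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        record = json.load(response)
    return reconstruct_abstract(record.get("abstract_inverted_index") or {})


if __name__ == "__main__":
    toy_index = {"a": [0, 3, 6], "rose": [1, 4, 7], "is": [2, 5]}
    print(reconstruct_abstract(toy_index))  # a rose is a rose is a rose
```

With network access, `fetch_abstract("W4415026124")` should return this work's abstract as OpenAlex stores it; original spacing is not preserved, since words are simply rejoined with single spaces.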
Full payload
| id | https://openalex.org/W4415026124 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2505.01482 |
| ids.doi | https://doi.org/10.48550/arxiv.2505.01482 |
| ids.openalex | https://openalex.org/W4415026124 |
| fwci | |
| type | preprint |
| title | Understanding LLM Scientific Reasoning through Promptings and Model's Explanation on the Answers |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10215 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9359999895095825 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Semantic Web and Ontologies |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2505.01482 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2505.01482 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2505.01482 |
| locations[1].id | doi:10.48550/arxiv.2505.01482 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2505.01482 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5052922837 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-9977-9653 |
| authorships[0].author.display_name | Alice Rueda |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Rueda, Alice |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5071251565 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-9756-451X |
| authorships[1].author.display_name | Maha M. Hassan |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Hassan, Mohammed S. |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5106588586 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Argyrios Perivolaris |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Perivolaris, Argyrios |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5067115403 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-5325-9639 |
| authorships[3].author.display_name | Bazen Gashaw Teferra |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Teferra, Bazen G. |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5026763818 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-6768-0168 |
| authorships[4].author.display_name | Reza Samavi |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Samavi, Reza |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5018625427 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-9389-727X |
| authorships[5].author.display_name | Sirisha Rambhatla |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Rambhatla, Sirisha |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5000234334 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-8231-8172 |
| authorships[6].author.display_name | Yucheng Wu |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Wu, Yuqi |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5100732248 |
| authorships[7].author.orcid | https://orcid.org/0000-0002-7627-488X |
| authorships[7].author.display_name | Yanbo Zhang |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Zhang, Yanbo |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5053463786 |
| authorships[8].author.orcid | https://orcid.org/0000-0003-4623-7348 |
| authorships[8].author.display_name | Bo Cao |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Cao, Bo |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5100724599 |
| authorships[9].author.orcid | https://orcid.org/0000-0003-4777-8090 |
| authorships[9].author.display_name | Divya Sharma |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Sharma, Divya |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5086845888 |
| authorships[10].author.orcid | https://orcid.org/0000-0002-4659-564X |
| authorships[10].author.display_name | Sridhar Krishnan |
| authorships[10].author_position | middle |
| authorships[10].raw_author_name | Krishnan, Sridhar |
| authorships[10].is_corresponding | False |
| authorships[11].author.id | https://openalex.org/A5054349344 |
| authorships[11].author.orcid | https://orcid.org/0000-0002-8768-1173 |
| authorships[11].author.display_name | Venkat Bhat |
| authorships[11].author_position | last |
| authorships[11].raw_author_name | Bhat, Venkat |
| authorships[11].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2505.01482 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Understanding LLM Scientific Reasoning through Promptings and Model's Explanation on the Answers |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| cited_by_count | 0 |
| locations_count | 2 |
| publication_date | 2025-05-02 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 12 |
| citation_normalized_percentile | |