Understanding LLM Scientific Reasoning through Promptings and Model's Explanation on the Answers

Alice Rueda, Maha M. Hassan, Argyrios Perivolaris, Bazen Gashaw Teferra, Reza Samavi, Sirisha Rambhatla, Yucheng Wu, Yanbo Zhang, Bo Cao, Divya Sharma, Sridhar Krishnan, Venkat Bhat

2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2505.01482

Large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding, reasoning, and problem-solving across various domains. However, their ability to perform complex, multi-step reasoning tasks, which are essential for applications in science, medicine, and law, remains an area of active investigation. This paper examines the reasoning capabilities of contemporary LLMs, analyzing their strengths, limitations, and potential for improvement. The study applies prompt engineering techniques to the Graduate-Level Google-Proof Q&A (GPQA) dataset to assess the scientific reasoning of GPT-4o. Five popular prompt engineering techniques and two tailored prompting strategies were tested: baseline direct answer (zero-shot), chain-of-thought (CoT), zero-shot CoT, self-ask, self-consistency, decomposition, and multipath prompting. Our findings indicate that while LLMs exhibit emergent reasoning abilities, they often rely on pattern recognition rather than true logical inference, leading to inconsistencies in complex problem-solving. Self-consistency achieved the highest accuracy of all the techniques (52.99%), followed by direct answer (52.23%). Zero-shot CoT (50%) outperformed multipath (48.44%), decomposition (47.77%), self-ask (46.88%), and CoT (43.75%). However, self-consistency ranked second-worst at explaining its answers. Simple techniques such as direct answer, CoT, and zero-shot CoT produced the best scientific reasoning. We propose a research agenda aimed at bridging these gaps by integrating structured reasoning frameworks, hybrid AI approaches, and human-in-the-loop methodologies. By critically evaluating the reasoning mechanisms of LLMs, this paper contributes to the ongoing discourse on the future of artificial general intelligence and the development of more robust, trustworthy AI systems.
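To make the compared techniques more concrete, the sketch below shows, under stated assumptions, how a direct-answer prompt and a zero-shot CoT prompt might be built for a multiple-choice item, how self-consistency can be implemented in its common form (majority vote over several sampled answers), and how accuracy percentages like those reported above are computed. The templates, the function names (`build_prompt`, `self_consistency`, `accuracy`), and the stub sampler are hypothetical illustrations, not the authors' prompts or code; a real run would replace the stub with GPT-4o calls and answer extraction.

```python
from collections import Counter
from string import ascii_uppercase
from typing import Callable, Sequence

# Hypothetical prompt templates for illustration only; these are NOT the
# prompts used in the paper.
TEMPLATES = {
    # Baseline direct answer (zero-shot): ask for the option letter only.
    "direct": "{question}\n{choices}\nAnswer with only the letter of the correct option.",
    # Zero-shot CoT: append a generic step-by-step cue before the final answer.
    "zero_shot_cot": (
        "{question}\n{choices}\n"
        "Let's think step by step, then state the final answer as a single letter."
    ),
}


def format_choices(choices: Sequence[str]) -> str:
    """Label the options (A), (B), ... in the order given."""
    return "\n".join(f"({letter}) {text}" for letter, text in zip(ascii_uppercase, choices))


def build_prompt(technique: str, question: str, choices: Sequence[str]) -> str:
    """Fill the template for one technique with a question and its options."""
    return TEMPLATES[technique].format(question=question, choices=format_choices(choices))


def self_consistency(sample_answer: Callable[[str], str], prompt: str, n_samples: int = 5) -> str:
    """Query the sampler n_samples times and return the most frequent final answer.

    `sample_answer` is a hypothetical stand-in for a model call at nonzero
    temperature that returns an already-extracted answer letter.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    votes = Counter(sample_answer(prompt) for _ in range(n_samples))
    # Counter.most_common orders equal counts by first occurrence.
    return votes.most_common(1)[0][0]


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of predictions that match the gold answer letters."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("need equally sized, non-empty prediction and gold lists")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    # Stub sampler that replays fixed answers instead of calling a real model.
    canned = iter(["B", "C", "B", "A", "B"])
    print(build_prompt("zero_shot_cot", "Which option is correct?", ["w", "x", "y", "z"]))
    print(self_consistency(lambda _prompt: next(canned), "unused prompt"))  # B
    print(accuracy(["A", "B", "C", "D"], ["A", "B", "D", "D"]))  # 75.0
```

The other techniques named in the abstract (CoT with worked exemplars, self-ask, decomposition, and multipath prompting) would each need their own templates; they are not reproduced here because the abstract does not specify them.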

Metadata

Type: preprint
Language: en
Publication date: 2025-05-02
DOI: https://doi.org/10.48550/arxiv.2505.01482
Landing page: https://arxiv.org/abs/2505.01482
PDF: https://arxiv.org/pdf/2505.01482
Open access: Yes (OA status: green)
License: CC BY
OpenAlex ID: https://openalex.org/W4415026124
Cited by: 0
