Evaluating LLM Reasoning in the Operations Research Domain with ORQA
2025 · Open Access
· DOI: https://doi.org/10.1609/aaai.v39i23.34673
In this paper, we introduce and apply Operations Research Question Answering (ORQA), a new benchmark, to assess the generalization capabilities of Large Language Models (LLMs) in the specialized technical domain of Operations Research (OR). This benchmark is designed to evaluate whether LLMs can emulate the knowledge and reasoning skills of OR experts when given diverse and complex optimization problems. The dataset, crafted by OR experts, presents real-world optimization problems that require multistep reasoning to build their mathematical models. Our evaluations of various open-source LLMs, such as LLaMA 3.1, DeepSeek, and Mixtral reveal their modest performance, indicating a gap in their aptitude to generalize to specialized technical domains. This work contributes to the ongoing discourse on LLMs’ generalization capabilities, providing insights for future research in this area. The dataset and evaluation code are publicly available.
- Type: article
- Language: en
- Landing Page: https://doi.org/10.1609/aaai.v39i23.34673
- PDF: https://ojs.aaai.org/index.php/AAAI/article/download/34673/36828
- OA Status: diamond
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4409347925
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4409347925 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.1609/aaai.v39i23.34673 (Digital Object Identifier)
- Title: Evaluating LLM Reasoning in the Operations Research Domain with ORQA (work title)
- Type: article (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-04-11 (full publication date if available)
- Authors: Mahdi Mostajabdaveh, Timothy T. Yu, S. Dash, Rindra Ramamonjison, Jabo Serge Byusa, Giuseppe Carenini, Zirui Zhou, Yong Zhang (list of authors in order)
- Landing page: https://doi.org/10.1609/aaai.v39i23.34673 (publisher landing page)
- PDF URL: https://ojs.aaai.org/index.php/AAAI/article/download/34673/36828 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: diamond (open access status per OpenAlex)
- OA URL: https://ojs.aaai.org/index.php/AAAI/article/download/34673/36828 (direct OA link when available)
- Concepts: Domain (mathematical analysis), Computer science, Management science, Mathematics, Engineering, Mathematical analysis (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4409347925 |
| doi | https://doi.org/10.1609/aaai.v39i23.34673 |
| ids.doi | https://doi.org/10.1609/aaai.v39i23.34673 |
| ids.openalex | https://openalex.org/W4409347925 |
| fwci | 7.23333322 |
| type | article |
| title | Evaluating LLM Reasoning in the Operations Research Domain with ORQA |
| biblio.issue | 23 |
| biblio.volume | 39 |
| biblio.last_page | 24910 |
| biblio.first_page | 24902 |
| topics[0].id | https://openalex.org/T10703 |
| topics[0].field.id | https://openalex.org/fields/14 |
| topics[0].field.display_name | Business, Management and Accounting |
| topics[0].score | 0.9763000011444092 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1404 |
| topics[0].subfield.display_name | Management Information Systems |
| topics[0].display_name | Business Process Modeling and Analysis |
| topics[1].id | https://openalex.org/T10215 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.929099977016449 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Semantic Web and Ontologies |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C36503486 |
| concepts[0].level | 2 |
| concepts[0].score | 0.5778422355651855 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q11235244 |
| concepts[0].display_name | Domain (mathematical analysis) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.5361490249633789 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C539667460 |
| concepts[2].level | 1 |
| concepts[2].score | 0.34467262029647827 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q2414942 |
| concepts[2].display_name | Management science |
| concepts[3].id | https://openalex.org/C33923547 |
| concepts[3].level | 0 |
| concepts[3].score | 0.14703616499900818 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[3].display_name | Mathematics |
| concepts[4].id | https://openalex.org/C127413603 |
| concepts[4].level | 0 |
| concepts[4].score | 0.13777223229408264 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[4].display_name | Engineering |
| concepts[5].id | https://openalex.org/C134306372 |
| concepts[5].level | 1 |
| concepts[5].score | 0.0 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q7754 |
| concepts[5].display_name | Mathematical analysis |
| keywords[0].id | https://openalex.org/keywords/domain |
| keywords[0].score | 0.5778422355651855 |
| keywords[0].display_name | Domain (mathematical analysis) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.5361490249633789 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/management-science |
| keywords[2].score | 0.34467262029647827 |
| keywords[2].display_name | Management science |
| keywords[3].id | https://openalex.org/keywords/mathematics |
| keywords[3].score | 0.14703616499900818 |
| keywords[3].display_name | Mathematics |
| keywords[4].id | https://openalex.org/keywords/engineering |
| keywords[4].score | 0.13777223229408264 |
| keywords[4].display_name | Engineering |
| language | en |
| locations[0].id | doi:10.1609/aaai.v39i23.34673 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4210191458 |
| locations[0].source.issn | 2159-5399, 2374-3468 |
| locations[0].source.type | conference |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | 2159-5399 |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| locations[0].source.host_organization | https://openalex.org/P4310320058 |
| locations[0].source.host_organization_name | Association for the Advancement of Artificial Intelligence |
| locations[0].source.host_organization_lineage | https://openalex.org/P4310320058 |
| locations[0].source.host_organization_lineage_names | Association for the Advancement of Artificial Intelligence |
| locations[0].license | |
| locations[0].pdf_url | https://ojs.aaai.org/index.php/AAAI/article/download/34673/36828 |
| locations[0].version | publishedVersion |
| locations[0].raw_type | journal-article |
| locations[0].license_id | |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| locations[0].landing_page_url | https://doi.org/10.1609/aaai.v39i23.34673 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5042614566 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-2816-909X |
| authorships[0].author.display_name | Mahdi Mostajabdaveh |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Mahdi Mostajabdaveh |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5081986793 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-8758-0578 |
| authorships[1].author.display_name | Timothy T. Yu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Timothy Tin Long Yu |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5026719382 |
| authorships[2].author.orcid | https://orcid.org/0009-0007-8887-9265 |
| authorships[2].author.display_name | S. Dash |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Samarendra Chandan Bindu Dash |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5072135170 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Rindra Ramamonjison |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Rindra Ramamonjison |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5115647467 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Jabo Serge Byusa |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Jabo Serge Byusa |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5049259877 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-4310-0119 |
| authorships[5].author.display_name | Giuseppe Carenini |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Giuseppe Carenini |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5029816914 |
| authorships[6].author.orcid | https://orcid.org/0000-0003-1690-0161 |
| authorships[6].author.display_name | Zirui Zhou |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Zirui Zhou |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5070956153 |
| authorships[7].author.orcid | https://orcid.org/0000-0001-6650-6790 |
| authorships[7].author.display_name | Yong Zhang |
| authorships[7].author_position | last |
| authorships[7].raw_author_name | Yong Zhang |
| authorships[7].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://ojs.aaai.org/index.php/AAAI/article/download/34673/36828 |
| open_access.oa_status | diamond |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Evaluating LLM Reasoning in the Operations Research Domain with ORQA |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T10703 |
| primary_topic.field.id | https://openalex.org/fields/14 |
| primary_topic.field.display_name | Business, Management and Accounting |
| primary_topic.score | 0.9763000011444092 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1404 |
| primary_topic.subfield.display_name | Management Information Systems |
| primary_topic.display_name | Business Process Modeling and Analysis |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W4391913857, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 1 |
| best_oa_location.id | doi:10.1609/aaai.v39i23.34673 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4210191458 |
| best_oa_location.source.issn | 2159-5399, 2374-3468 |
| best_oa_location.source.type | conference |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | 2159-5399 |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| best_oa_location.source.host_organization | https://openalex.org/P4310320058 |
| best_oa_location.source.host_organization_name | Association for the Advancement of Artificial Intelligence |
| best_oa_location.source.host_organization_lineage | https://openalex.org/P4310320058 |
| best_oa_location.source.host_organization_lineage_names | Association for the Advancement of Artificial Intelligence |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://ojs.aaai.org/index.php/AAAI/article/download/34673/36828 |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | journal-article |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| best_oa_location.landing_page_url | https://doi.org/10.1609/aaai.v39i23.34673 |
| primary_location.id | doi:10.1609/aaai.v39i23.34673 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4210191458 |
| primary_location.source.issn | 2159-5399, 2374-3468 |
| primary_location.source.type | conference |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | 2159-5399 |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| primary_location.source.host_organization | https://openalex.org/P4310320058 |
| primary_location.source.host_organization_name | Association for the Advancement of Artificial Intelligence |
| primary_location.source.host_organization_lineage | https://openalex.org/P4310320058 |
| primary_location.source.host_organization_lineage_names | Association for the Advancement of Artificial Intelligence |
| primary_location.license | |
| primary_location.pdf_url | https://ojs.aaai.org/index.php/AAAI/article/download/34673/36828 |
| primary_location.version | publishedVersion |
| primary_location.raw_type | journal-article |
| primary_location.license_id | |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| primary_location.landing_page_url | https://doi.org/10.1609/aaai.v39i23.34673 |
| publication_date | 2025-04-11 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 12, 96 |
| abstract_inverted_index.In | 0 |
| abstract_inverted_index.OR | 50, 63 |
| abstract_inverted_index.as | 85 |
| abstract_inverted_index.by | 62 |
| abstract_inverted_index.in | 25, 98, 123 |
| abstract_inverted_index.is | 36 |
| abstract_inverted_index.of | 20, 30, 49, 80 |
| abstract_inverted_index.on | 114 |
| abstract_inverted_index.to | 15, 38, 73, 101, 103, 110 |
| abstract_inverted_index.we | 3 |
| abstract_inverted_index.Our | 78 |
| abstract_inverted_index.The | 59, 126 |
| abstract_inverted_index.and | 5, 46, 55, 89, 128 |
| abstract_inverted_index.are | 131 |
| abstract_inverted_index.can | 42 |
| abstract_inverted_index.for | 120 |
| abstract_inverted_index.gap | 97 |
| abstract_inverted_index.new | 13 |
| abstract_inverted_index.the | 17, 26, 44, 111 |
| abstract_inverted_index.3.1, | 87 |
| abstract_inverted_index.LLMs | 41 |
| abstract_inverted_index.This | 34, 107 |
| abstract_inverted_index.code | 130 |
| abstract_inverted_index.such | 84 |
| abstract_inverted_index.that | 69 |
| abstract_inverted_index.this | 1, 124 |
| abstract_inverted_index.when | 52 |
| abstract_inverted_index.work | 108 |
| abstract_inverted_index.(OR). | 33 |
| abstract_inverted_index.LLMs, | 83 |
| abstract_inverted_index.LLaMA | 86 |
| abstract_inverted_index.Large | 21 |
| abstract_inverted_index.apply | 6 |
| abstract_inverted_index.area. | 125 |
| abstract_inverted_index.build | 74 |
| abstract_inverted_index.given | 53 |
| abstract_inverted_index.their | 75, 92, 99 |
| abstract_inverted_index.(LLMs) | 24 |
| abstract_inverted_index.Models | 23 |
| abstract_inverted_index.assess | 16 |
| abstract_inverted_index.domain | 29 |
| abstract_inverted_index.future | 121 |
| abstract_inverted_index.modest | 93 |
| abstract_inverted_index.paper, | 2 |
| abstract_inverted_index.reveal | 91 |
| abstract_inverted_index.skills | 48 |
| abstract_inverted_index.(ORQA), | 11 |
| abstract_inverted_index.LLMs’ | 115 |
| abstract_inverted_index.Mixtral | 90 |
| abstract_inverted_index.complex | 56 |
| abstract_inverted_index.crafted | 61 |
| abstract_inverted_index.dataset | 127 |
| abstract_inverted_index.diverse | 54 |
| abstract_inverted_index.emulate | 43 |
| abstract_inverted_index.experts | 51 |
| abstract_inverted_index.models. | 77 |
| abstract_inverted_index.ongoing | 112 |
| abstract_inverted_index.require | 70 |
| abstract_inverted_index.various | 81 |
| abstract_inverted_index.whether | 40 |
| abstract_inverted_index.Language | 22 |
| abstract_inverted_index.Question | 9 |
| abstract_inverted_index.Research | 8, 32 |
| abstract_inverted_index.aptitude | 100 |
| abstract_inverted_index.dataset, | 60 |
| abstract_inverted_index.designed | 37 |
| abstract_inverted_index.domains. | 106 |
| abstract_inverted_index.evaluate | 39 |
| abstract_inverted_index.experts, | 64 |
| abstract_inverted_index.insights | 119 |
| abstract_inverted_index.presents | 65 |
| abstract_inverted_index.problems | 68 |
| abstract_inverted_index.publicly | 132 |
| abstract_inverted_index.research | 122 |
| abstract_inverted_index.Answering | 10 |
| abstract_inverted_index.DeepSeek, | 88 |
| abstract_inverted_index.benchmark | 35 |
| abstract_inverted_index.discourse | 113 |
| abstract_inverted_index.introduce | 4 |
| abstract_inverted_index.knowledge | 45 |
| abstract_inverted_index.multistep | 71 |
| abstract_inverted_index.problems. | 58 |
| abstract_inverted_index.providing | 118 |
| abstract_inverted_index.reasoning | 47, 72 |
| abstract_inverted_index.technical | 28, 105 |
| abstract_inverted_index.Operations | 7, 31 |
| abstract_inverted_index.available. | 133 |
| abstract_inverted_index.benchmark, | 14 |
| abstract_inverted_index.evaluation | 129 |
| abstract_inverted_index.generalize | 102 |
| abstract_inverted_index.indicating | 95 |
| abstract_inverted_index.real-world | 66 |
| abstract_inverted_index.contributes | 109 |
| abstract_inverted_index.evaluations | 79 |
| abstract_inverted_index.open-source | 82 |
| abstract_inverted_index.specialized | 27, 104 |
| abstract_inverted_index.capabilities | 19 |
| abstract_inverted_index.mathematical | 76 |
| abstract_inverted_index.optimization | 57, 67 |
| abstract_inverted_index.performance, | 94 |
| abstract_inverted_index.capabilities, | 117 |
| abstract_inverted_index.generalization | 18, 116 |
| cited_by_percentile_year.max | 95 |
| cited_by_percentile_year.min | 91 |
| countries_distinct_count | 0 |
| institutions_distinct_count | 8 |
| citation_normalized_percentile.value | 0.92561983 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | True |
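
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each token to the word positions where it occurs. The following is a minimal sketch, assuming that layout, of how the plain text could be rebuilt from such a map. The function name `rebuild_abstract` and the `sample` dictionary are illustrative and not part of the record; `sample` covers only positions 0 to 14, copied from the rows above, so it yields just the opening words of the abstract.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a token -> list-of-positions mapping."""
    # Invert the mapping: position -> token.
    by_position: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for position in positions:
            by_position[position] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(by_position[i] for i in sorted(by_position))


# Illustrative subset: only positions 0..14 from the payload rows.
# Tokens that also occur later in the abstract (e.g. "this" at 124)
# are listed here with their first position only.
sample = {
    "In": [0],
    "this": [1],
    "paper,": [2],
    "we": [3],
    "introduce": [4],
    "and": [5],
    "apply": [6],
    "Operations": [7],
    "Research": [8],
    "Question": [9],
    "Answering": [10],
    "(ORQA),": [11],
    "a": [12],
    "new": [13],
    "benchmark,": [14],
}

opening = rebuild_abstract(sample)
print(opening)
```

Passing the full set of `abstract_inverted_index` rows instead of `sample` should reproduce the whole abstract, since every position from 0 to 133 appears exactly once in the payload.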