Diverse Inference and Verification for Advanced Reasoning
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2502.09955
Reasoning LLMs such as OpenAI o1, o3 and DeepSeek R1 have made significant progress in mathematics and coding, yet find challenging advanced tasks such as International Mathematical Olympiad (IMO) combinatorics problems, Abstraction and Reasoning Corpus (ARC) puzzles, and Humanity's Last Exam (HLE) questions. We use a diverse inference approach that combines multiple models and methods at test time. We find that verifying mathematics and code problems, and rejection sampling on other problems is simple and effective. We automatically verify correctness of solutions to IMO problems by Lean, and ARC puzzles by code, and find that best-of-N effectively answers HLE questions. Our approach increases answer accuracy on IMO combinatorics problems from 33.3% to 77.8%, accuracy on HLE questions from 8% to 37%, and solves 80% of ARC puzzles that 948 humans could not and 26.5% of ARC puzzles that o3 high compute does not. Test-time simulations, reinforcement learning, and meta-learning with inference feedback improve generalization by adapting agent graph representations and varying prompts, code, and datasets. Our approach is reliable, robust, and scalable, and in the spirit of reproducible research, we will make it publicly available upon publication.
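The abstract names two selection strategies: accepting only candidates that pass an automatic verifier (rejection sampling) and choosing the best of N candidates. The following is a minimal sketch of both ideas, not the authors' implementation. The names `diverse_candidates`, `first_verified`, and `best_of_n`, as well as the toy solvers, verifier, and scoring function, are hypothetical and stand in for real models, a Lean or code-based checker, and a real scorer.

```python
from typing import Callable, Iterable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def diverse_candidates(problem: str, solvers: Sequence[Callable[[str], T]]) -> List[T]:
    """Collect one candidate answer from each solver (a model or a method)."""
    return [solve(problem) for solve in solvers]


def first_verified(candidates: Iterable[T], verify: Callable[[T], bool]) -> Optional[T]:
    """Rejection sampling: return the first candidate the verifier accepts, else None."""
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None


def best_of_n(candidates: Sequence[T], score: Callable[[T], float]) -> T:
    """Best-of-N: return the candidate with the highest score."""
    if not candidates:
        raise ValueError("best_of_n needs at least one candidate")
    return max(candidates, key=score)


# Toy stand-ins (assumptions): three "solvers" that propose different answers.
solvers = [lambda p: 41, lambda p: 42, lambda p: 43]
candidates = diverse_candidates("What is 6 * 7?", solvers)

# With an exact verifier, keep only an answer that checks out.
answer = first_verified(candidates, verify=lambda x: x == 6 * 7)

# Without a verifier, fall back to a score and take the best of N.
closest = best_of_n(candidates, score=lambda x: -abs(x - 42))
```

In this sketch the verifier plays the role that Lean or executed code plays for IMO and ARC in the abstract, while the scoring path corresponds to the best-of-N setting described for HLE.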
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2502.09955, https://arxiv.org/pdf/2502.09955
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4407632465
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4407632465 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2502.09955 (Digital Object Identifier)
- Title: Diverse Inference and Verification for Advanced Reasoning (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-02-14 (full publication date if available)
- Authors: Iddo Drori, Gaston Longhitano, Mao Mao, Sang Won Hyun, Yuke Zhang, Sung‐Jun Park, Zachary Meeks, Xinyu Zhang, Ben Segev, Howard Yong, Nakul Verma, Avi Shporer, Alon Amit, Madeleine Udell (list of authors in order)
- Landing page: https://arxiv.org/abs/2502.09955 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2502.09955 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2502.09955 (direct OA link when available)
- Concepts: Inference, Computer science, Artificial intelligence (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4407632465 |
| doi | https://doi.org/10.48550/arxiv.2502.09955 |
| ids.doi | https://doi.org/10.48550/arxiv.2502.09955 |
| ids.openalex | https://openalex.org/W4407632465 |
| fwci | |
| type | preprint |
| title | Diverse Inference and Verification for Advanced Reasoning |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10215 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.694100022315979 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Semantic Web and Ontologies |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2776214188 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7591409683227539 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q408386 |
| concepts[0].display_name | Inference |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.49105381965637207 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C154945302 |
| concepts[2].level | 1 |
| concepts[2].score | 0.411739706993103 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[2].display_name | Artificial intelligence |
| keywords[0].id | https://openalex.org/keywords/inference |
| keywords[0].score | 0.7591409683227539 |
| keywords[0].display_name | Inference |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.49105381965637207 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[2].score | 0.411739706993103 |
| keywords[2].display_name | Artificial intelligence |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2502.09955 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2502.09955 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2502.09955 |
| locations[1].id | doi:10.48550/arxiv.2502.09955 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2502.09955 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5019248564 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-9797-3885 |
| authorships[0].author.display_name | Iddo Drori |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Drori, Iddo |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5107671969 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Gaston Longhitano |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Longhitano, Gaston |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5101510091 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-7103-8298 |
| authorships[2].author.display_name | Mao Mao |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Mao, Mao |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5114102664 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Sang Won Hyun |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Hyun, Seunghwan |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5019021341 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-5253-5478 |
| authorships[4].author.display_name | Yuke Zhang |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Zhang, Yuke |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5100749212 |
| authorships[5].author.orcid | https://orcid.org/0009-0002-7498-6602 |
| authorships[5].author.display_name | Sung‐Jun Park |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Park, Sungjun |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5107671970 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Zachary Meeks |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Meeks, Zachary |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5104267472 |
| authorships[7].author.orcid | https://orcid.org/0009-0008-7764-4265 |
| authorships[7].author.display_name | Xinyu Zhang |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Zhang, Xin-Yu |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5107671968 |
| authorships[8].author.orcid | |
| authorships[8].author.display_name | Ben Segev |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Segev, Ben |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5040952485 |
| authorships[9].author.orcid | |
| authorships[9].author.display_name | Howard Yong |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Yong, Howard |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5019307205 |
| authorships[10].author.orcid | |
| authorships[10].author.display_name | Nakul Verma |
| authorships[10].author_position | middle |
| authorships[10].raw_author_name | Verma, Nakul |
| authorships[10].is_corresponding | False |
| authorships[11].author.id | https://openalex.org/A5064326052 |
| authorships[11].author.orcid | https://orcid.org/0000-0002-1836-3120 |
| authorships[11].author.display_name | Avi Shporer |
| authorships[11].author_position | middle |
| authorships[11].raw_author_name | Shporer, Avi |
| authorships[11].is_corresponding | False |
| authorships[12].author.id | https://openalex.org/A5043070484 |
| authorships[12].author.orcid | |
| authorships[12].author.display_name | Alon Amit |
| authorships[12].author_position | middle |
| authorships[12].raw_author_name | Amit, Alon |
| authorships[12].is_corresponding | False |
| authorships[13].author.id | https://openalex.org/A5084564811 |
| authorships[13].author.orcid | https://orcid.org/0000-0002-3985-915X |
| authorships[13].author.display_name | Madeleine Udell |
| authorships[13].author_position | last |
| authorships[13].raw_author_name | Udell, Madeleine |
| authorships[13].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2502.09955 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Diverse Inference and Verification for Advanced Reasoning |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10215 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.694100022315979 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Semantic Web and Ontologies |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W4391913857, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2502.09955 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2502.09955 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2502.09955 |
| primary_location.id | pmh:oai:arXiv.org:2502.09955 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2502.09955 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2502.09955 |
| publication_date | 2025-02-14 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 45 |
| abstract_inverted_index.8% | 118 |
| abstract_inverted_index.R1 | 9 |
| abstract_inverted_index.We | 43, 58, 76 |
| abstract_inverted_index.as | 3, 24 |
| abstract_inverted_index.at | 55 |
| abstract_inverted_index.by | 85, 90, 154 |
| abstract_inverted_index.in | 14, 173 |
| abstract_inverted_index.is | 72, 167 |
| abstract_inverted_index.it | 182 |
| abstract_inverted_index.o3 | 6, 138 |
| abstract_inverted_index.of | 80, 124, 134, 176 |
| abstract_inverted_index.on | 69, 105, 114 |
| abstract_inverted_index.to | 82, 111, 119 |
| abstract_inverted_index.we | 179 |
| abstract_inverted_index.80% | 123 |
| abstract_inverted_index.948 | 128 |
| abstract_inverted_index.ARC | 88, 125, 135 |
| abstract_inverted_index.HLE | 98, 115 |
| abstract_inverted_index.IMO | 83, 106 |
| abstract_inverted_index.Our | 100, 165 |
| abstract_inverted_index.and | 7, 16, 32, 37, 53, 63, 66, 74, 87, 92, 121, 132, 147, 159, 163, 170, 172 |
| abstract_inverted_index.not | 131 |
| abstract_inverted_index.o1, | 5 |
| abstract_inverted_index.the | 174 |
| abstract_inverted_index.use | 44 |
| abstract_inverted_index.yet | 18 |
| abstract_inverted_index.37%, | 120 |
| abstract_inverted_index.Exam | 40 |
| abstract_inverted_index.LLMs | 1 |
| abstract_inverted_index.Last | 39 |
| abstract_inverted_index.code | 64 |
| abstract_inverted_index.does | 141 |
| abstract_inverted_index.find | 19, 59, 93 |
| abstract_inverted_index.from | 109, 117 |
| abstract_inverted_index.have | 10 |
| abstract_inverted_index.high | 139 |
| abstract_inverted_index.made | 11 |
| abstract_inverted_index.make | 181 |
| abstract_inverted_index.not. | 142 |
| abstract_inverted_index.such | 2, 23 |
| abstract_inverted_index.test | 56 |
| abstract_inverted_index.that | 49, 60, 94, 127, 137 |
| abstract_inverted_index.upon | 185 |
| abstract_inverted_index.will | 180 |
| abstract_inverted_index.with | 149 |
| abstract_inverted_index.(ARC) | 35 |
| abstract_inverted_index.(HLE) | 41 |
| abstract_inverted_index.(IMO) | 28 |
| abstract_inverted_index.26.5% | 133 |
| abstract_inverted_index.33.3% | 110 |
| abstract_inverted_index.Lean, | 86 |
| abstract_inverted_index.agent | 156 |
| abstract_inverted_index.code, | 91, 162 |
| abstract_inverted_index.could | 130 |
| abstract_inverted_index.graph | 157 |
| abstract_inverted_index.other | 70 |
| abstract_inverted_index.tasks | 22 |
| abstract_inverted_index.time. | 57 |
| abstract_inverted_index.77.8%, | 112 |
| abstract_inverted_index.Corpus | 34 |
| abstract_inverted_index.OpenAI | 4 |
| abstract_inverted_index.answer | 103 |
| abstract_inverted_index.humans | 129 |
| abstract_inverted_index.models | 52 |
| abstract_inverted_index.simple | 73 |
| abstract_inverted_index.solves | 122 |
| abstract_inverted_index.spirit | 175 |
| abstract_inverted_index.verify | 78 |
| abstract_inverted_index.answers | 97 |
| abstract_inverted_index.coding, | 17 |
| abstract_inverted_index.compute | 140 |
| abstract_inverted_index.diverse | 46 |
| abstract_inverted_index.improve | 152 |
| abstract_inverted_index.methods | 54 |
| abstract_inverted_index.puzzles | 89, 126, 136 |
| abstract_inverted_index.robust, | 169 |
| abstract_inverted_index.varying | 160 |
| abstract_inverted_index.DeepSeek | 8 |
| abstract_inverted_index.Olympiad | 27 |
| abstract_inverted_index.accuracy | 104, 113 |
| abstract_inverted_index.adapting | 155 |
| abstract_inverted_index.advanced | 21 |
| abstract_inverted_index.approach | 48, 101, 166 |
| abstract_inverted_index.combines | 50 |
| abstract_inverted_index.feedback | 151 |
| abstract_inverted_index.multiple | 51 |
| abstract_inverted_index.problems | 71, 84, 108 |
| abstract_inverted_index.progress | 13 |
| abstract_inverted_index.prompts, | 161 |
| abstract_inverted_index.publicly | 183 |
| abstract_inverted_index.puzzles, | 36 |
| abstract_inverted_index.sampling | 68 |
| abstract_inverted_index.Reasoning | 0, 33 |
| abstract_inverted_index.Test-time | 143 |
| abstract_inverted_index.available | 184 |
| abstract_inverted_index.best-of-N | 95 |
| abstract_inverted_index.datasets. | 164 |
| abstract_inverted_index.increases | 102 |
| abstract_inverted_index.inference | 47, 150 |
| abstract_inverted_index.learning, | 146 |
| abstract_inverted_index.problems, | 30, 65 |
| abstract_inverted_index.questions | 116 |
| abstract_inverted_index.rejection | 67 |
| abstract_inverted_index.reliable, | 168 |
| abstract_inverted_index.research, | 178 |
| abstract_inverted_index.scalable, | 171 |
| abstract_inverted_index.solutions | 81 |
| abstract_inverted_index.verifying | 61 |
| abstract_inverted_index.Humanity's | 38 |
| abstract_inverted_index.effective. | 75 |
| abstract_inverted_index.questions. | 42, 99 |
| abstract_inverted_index.Abstraction | 31 |
| abstract_inverted_index.challenging | 20 |
| abstract_inverted_index.correctness | 79 |
| abstract_inverted_index.effectively | 96 |
| abstract_inverted_index.mathematics | 15, 62 |
| abstract_inverted_index.significant | 12 |
| abstract_inverted_index.Mathematical | 26 |
| abstract_inverted_index.publication. | 186 |
| abstract_inverted_index.reproducible | 177 |
| abstract_inverted_index.simulations, | 144 |
| abstract_inverted_index.International | 25 |
| abstract_inverted_index.automatically | 77 |
| abstract_inverted_index.combinatorics | 29, 107 |
| abstract_inverted_index.meta-learning | 148 |
| abstract_inverted_index.reinforcement | 145 |
| abstract_inverted_index.generalization | 153 |
| abstract_inverted_index.representations | 158 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 14 |
| citation_normalized_percentile |
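
The `abstract_inverted_index.*` rows in the payload store the abstract as a mapping from each token to the word positions where it occurs. The sketch below shows one way to turn such a mapping back into text. The function name `rebuild_abstract` is hypothetical, and the sample dictionary is an assumption for illustration: it keeps only positions 0 through 9 from the rows above rather than the full index.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a token -> positions mapping."""
    by_position: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for position in positions:
            by_position[position] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(by_position[i] for i in sorted(by_position))


# Partial sample: only the first ten positions of the index are included here.
sample_index = {
    "Reasoning": [0],
    "LLMs": [1],
    "such": [2],
    "as": [3],
    "OpenAI": [4],
    "o1,": [5],
    "o3": [6],
    "and": [7],
    "DeepSeek": [8],
    "R1": [9],
}

opening = rebuild_abstract(sample_index)
print(opening)  # Reasoning LLMs such as OpenAI o1, o3 and DeepSeek R1
```

Punctuation stays attached to its token (for example `o1,`), so joining with spaces is enough to recover the original wording.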