Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking Article Swipe
YOU?
·
· 2025
· Open Access
·
· DOI: https://doi.org/10.48550/arxiv.2510.03149
Test-time algorithms that combine the generative power of language models with process verifiers that assess the quality of partial generations offer a promising lever for eliciting new reasoning capabilities, but the algorithmic design space and computational scaling properties of such approaches are still opaque, and their benefits are far from apparent when one accounts for the cost of learning a high-quality verifier. Our starting point is the observation that seemingly benign errors in a learned verifier can lead to catastrophic failures for standard decoding techniques due to error amplification during the course of generation. We then ask: can this be improved with more sophisticated decoding strategies? We introduce a new process-guided test-time sampling algorithm, VGB, which uses theoretically grounded backtracking to achieve provably better robustness to verifier errors. VGB interprets autoregressive generation as a random walk on a tree of partial generations, with transition probabilities guided by the process verifier and base model; crucially, backtracking occurs probabilistically. This process generalizes the seminal Sinclair-Jerrum random walk (Sinclair & Jerrum, 1989) from the literature on approximate counting and sampling in theoretical computer science, and a conceptual contribution of our work is to highlight parallels with this literature. Empirically, we demonstrate on both synthetic and real language modeling tasks that VGB outperforms baselines on a variety of metrics.
Related Topics
- Type
- preprint
- Language
- en
- Landing Page
- http://arxiv.org/abs/2510.03149
- https://arxiv.org/pdf/2510.03149
- OA Status
- green
- OpenAlex ID
- https://openalex.org/W4416371366
Raw OpenAlex JSON
- OpenAlex ID
-
https://openalex.org/W4416371366Canonical identifier for this work in OpenAlex
- DOI
-
https://doi.org/10.48550/arxiv.2510.03149Digital Object Identifier
- Title
-
Taming Imperfect Process Verifiers: A Sampling Perspective on BacktrackingWork title
- Type
-
preprintOpenAlex work type
- Language
-
enPrimary language
- Publication year
-
2025Year of publication
- Publication date
-
2025-10-03Full publication date if available
- Authors
-
Dhruv Rohatgi, Abhishek Shetty, Donya Saless, Ankur Moitra, Andrej RisteskiList of authors in order
- Landing page
-
https://arxiv.org/abs/2510.03149Publisher landing page
- PDF URL
-
https://arxiv.org/pdf/2510.03149Direct link to full text PDF
- Open access
-
YesWhether a free full text is available
- OA status
-
greenOpen access status per OpenAlex
- OA URL
-
https://arxiv.org/pdf/2510.03149Direct OA link when available
- Cited by
-
0Total citation count in OpenAlex
Full payload
| id | https://openalex.org/W4416371366 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2510.03149 |
| ids.doi | https://doi.org/10.48550/arxiv.2510.03149 |
| ids.openalex | https://openalex.org/W4416371366 |
| fwci | |
| type | preprint |
| title | Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2510.03149 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2510.03149 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2510.03149 |
| locations[1].id | doi:10.48550/arxiv.2510.03149 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2510.03149 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5036102061 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-0569-6698 |
| authorships[0].author.display_name | Dhruv Rohatgi |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Rohatgi, Dhruv |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5089291709 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-2693-0866 |
| authorships[1].author.display_name | Abhishek Shetty |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Shetty, Abhishek |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5120364128 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Donya Saless |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Saless, Donya |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5045798295 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-7047-0495 |
| authorships[3].author.display_name | Ankur Moitra |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Moitra, Ankur |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5014972240 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Andrej Risteski |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Risteski, Andrej |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2510.03149 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-28T12:25:22.337395 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2510.03149 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2510.03149 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2510.03149 |
| primary_location.id | pmh:oai:arXiv.org:2510.03149 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2510.03149 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2510.03149 |
| publication_date | 2025-10-03 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 21, 59, 73, 108, 133, 137, 182, 211 |
| abstract_inverted_index.We | 94, 106 |
| abstract_inverted_index.as | 132 |
| abstract_inverted_index.be | 99 |
| abstract_inverted_index.by | 146 |
| abstract_inverted_index.in | 72, 177 |
| abstract_inverted_index.is | 65, 188 |
| abstract_inverted_index.of | 7, 17, 38, 57, 92, 139, 185, 213 |
| abstract_inverted_index.on | 136, 172, 198, 210 |
| abstract_inverted_index.to | 78, 86, 120, 125, 189 |
| abstract_inverted_index.we | 196 |
| abstract_inverted_index.Our | 62 |
| abstract_inverted_index.VGB | 128, 207 |
| abstract_inverted_index.and | 34, 44, 150, 175, 181, 201 |
| abstract_inverted_index.are | 41, 47 |
| abstract_inverted_index.but | 29 |
| abstract_inverted_index.can | 76, 97 |
| abstract_inverted_index.due | 85 |
| abstract_inverted_index.far | 48 |
| abstract_inverted_index.for | 24, 54, 81 |
| abstract_inverted_index.new | 26, 109 |
| abstract_inverted_index.one | 52 |
| abstract_inverted_index.our | 186 |
| abstract_inverted_index.the | 4, 15, 30, 55, 66, 90, 147, 160, 170 |
| abstract_inverted_index.This | 157 |
| abstract_inverted_index.VGB, | 114 |
| abstract_inverted_index.ask: | 96 |
| abstract_inverted_index.base | 151 |
| abstract_inverted_index.both | 199 |
| abstract_inverted_index.cost | 56 |
| abstract_inverted_index.from | 49, 169 |
| abstract_inverted_index.lead | 77 |
| abstract_inverted_index.more | 102 |
| abstract_inverted_index.real | 202 |
| abstract_inverted_index.such | 39 |
| abstract_inverted_index.that | 2, 13, 68, 206 |
| abstract_inverted_index.then | 95 |
| abstract_inverted_index.this | 98, 193 |
| abstract_inverted_index.tree | 138 |
| abstract_inverted_index.uses | 116 |
| abstract_inverted_index.walk | 135, 164 |
| abstract_inverted_index.when | 51 |
| abstract_inverted_index.with | 10, 101, 142, 192 |
| abstract_inverted_index.work | 187 |
| abstract_inverted_index.& | 166 |
| abstract_inverted_index.1989) | 168 |
| abstract_inverted_index.error | 87 |
| abstract_inverted_index.lever | 23 |
| abstract_inverted_index.offer | 20 |
| abstract_inverted_index.point | 64 |
| abstract_inverted_index.power | 6 |
| abstract_inverted_index.space | 33 |
| abstract_inverted_index.still | 42 |
| abstract_inverted_index.tasks | 205 |
| abstract_inverted_index.their | 45 |
| abstract_inverted_index.which | 115 |
| abstract_inverted_index.assess | 14 |
| abstract_inverted_index.benign | 70 |
| abstract_inverted_index.better | 123 |
| abstract_inverted_index.course | 91 |
| abstract_inverted_index.design | 32 |
| abstract_inverted_index.during | 89 |
| abstract_inverted_index.errors | 71 |
| abstract_inverted_index.guided | 145 |
| abstract_inverted_index.model; | 152 |
| abstract_inverted_index.models | 9 |
| abstract_inverted_index.occurs | 155 |
| abstract_inverted_index.random | 134, 163 |
| abstract_inverted_index.Jerrum, | 167 |
| abstract_inverted_index.achieve | 121 |
| abstract_inverted_index.combine | 3 |
| abstract_inverted_index.errors. | 127 |
| abstract_inverted_index.learned | 74 |
| abstract_inverted_index.opaque, | 43 |
| abstract_inverted_index.partial | 18, 140 |
| abstract_inverted_index.process | 11, 148, 158 |
| abstract_inverted_index.quality | 16 |
| abstract_inverted_index.scaling | 36 |
| abstract_inverted_index.seminal | 161 |
| abstract_inverted_index.variety | 212 |
| abstract_inverted_index.accounts | 53 |
| abstract_inverted_index.apparent | 50 |
| abstract_inverted_index.benefits | 46 |
| abstract_inverted_index.computer | 179 |
| abstract_inverted_index.counting | 174 |
| abstract_inverted_index.decoding | 83, 104 |
| abstract_inverted_index.failures | 80 |
| abstract_inverted_index.grounded | 118 |
| abstract_inverted_index.improved | 100 |
| abstract_inverted_index.language | 8, 203 |
| abstract_inverted_index.learning | 58 |
| abstract_inverted_index.metrics. | 214 |
| abstract_inverted_index.modeling | 204 |
| abstract_inverted_index.provably | 122 |
| abstract_inverted_index.sampling | 112, 176 |
| abstract_inverted_index.science, | 180 |
| abstract_inverted_index.standard | 82 |
| abstract_inverted_index.starting | 63 |
| abstract_inverted_index.verifier | 75, 126, 149 |
| abstract_inverted_index.(Sinclair | 165 |
| abstract_inverted_index.Test-time | 0 |
| abstract_inverted_index.baselines | 209 |
| abstract_inverted_index.eliciting | 25 |
| abstract_inverted_index.highlight | 190 |
| abstract_inverted_index.introduce | 107 |
| abstract_inverted_index.parallels | 191 |
| abstract_inverted_index.promising | 22 |
| abstract_inverted_index.reasoning | 27 |
| abstract_inverted_index.seemingly | 69 |
| abstract_inverted_index.synthetic | 200 |
| abstract_inverted_index.test-time | 111 |
| abstract_inverted_index.verifier. | 61 |
| abstract_inverted_index.verifiers | 12 |
| abstract_inverted_index.algorithm, | 113 |
| abstract_inverted_index.algorithms | 1 |
| abstract_inverted_index.approaches | 40 |
| abstract_inverted_index.conceptual | 183 |
| abstract_inverted_index.crucially, | 153 |
| abstract_inverted_index.generation | 131 |
| abstract_inverted_index.generative | 5 |
| abstract_inverted_index.interprets | 129 |
| abstract_inverted_index.literature | 171 |
| abstract_inverted_index.properties | 37 |
| abstract_inverted_index.robustness | 124 |
| abstract_inverted_index.techniques | 84 |
| abstract_inverted_index.transition | 143 |
| abstract_inverted_index.algorithmic | 31 |
| abstract_inverted_index.approximate | 173 |
| abstract_inverted_index.demonstrate | 197 |
| abstract_inverted_index.generalizes | 159 |
| abstract_inverted_index.generation. | 93 |
| abstract_inverted_index.generations | 19 |
| abstract_inverted_index.literature. | 194 |
| abstract_inverted_index.observation | 67 |
| abstract_inverted_index.outperforms | 208 |
| abstract_inverted_index.strategies? | 105 |
| abstract_inverted_index.theoretical | 178 |
| abstract_inverted_index.Empirically, | 195 |
| abstract_inverted_index.backtracking | 119, 154 |
| abstract_inverted_index.catastrophic | 79 |
| abstract_inverted_index.contribution | 184 |
| abstract_inverted_index.generations, | 141 |
| abstract_inverted_index.high-quality | 60 |
| abstract_inverted_index.amplification | 88 |
| abstract_inverted_index.capabilities, | 28 |
| abstract_inverted_index.computational | 35 |
| abstract_inverted_index.probabilities | 144 |
| abstract_inverted_index.sophisticated | 103 |
| abstract_inverted_index.theoretically | 117 |
| abstract_inverted_index.autoregressive | 130 |
| abstract_inverted_index.process-guided | 110 |
| abstract_inverted_index.Sinclair-Jerrum | 162 |
| abstract_inverted_index.probabilistically. | 156 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile |