Block Verification Accelerates Speculative Decoding Article Swipe
YOU?
·
· 2024
· Open Access
·
· DOI: https://doi.org/10.48550/arxiv.2403.10444
Speculative decoding is an effective method for lossless acceleration of large language models during inference. It uses a fast model to draft a block of tokens which are then verified in parallel by the target model, and provides a guarantee that the output is distributed identically to a sample from the target model. In prior works, draft verification is performed independently token-by-token. Surprisingly, we show that this approach is not optimal. We propose Block Verification, a simple draft verification algorithm that verifies the entire block jointly and provides additional wall-clock speedup. We prove that the proposed mechanism is optimal in the expected number of tokens produced each iteration and specifically is never worse than the standard token-level verification. Empirically, block verification provides modest but consistent wall-clock speedups over the standard token verification algorithm of 5%-8% in a range of tasks and datasets. Given that block verification does not increase code complexity, maintains the strong lossless guarantee of the standard speculative decoding verification algorithm, cannot deteriorate performance, and, in fact, consistently improves it, it can be used as a good default in speculative decoding implementations.
Related Topics
- Type
- preprint
- Language
- en
- Landing Page
- http://arxiv.org/abs/2403.10444
- https://arxiv.org/pdf/2403.10444
- OA Status
- green
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W4392933336
Raw OpenAlex JSON
- OpenAlex ID
-
https://openalex.org/W4392933336Canonical identifier for this work in OpenAlex
- DOI
-
https://doi.org/10.48550/arxiv.2403.10444Digital Object Identifier
- Title
-
Block Verification Accelerates Speculative DecodingWork title
- Type
-
preprintOpenAlex work type
- Language
-
enPrimary language
- Publication year
-
2024Year of publication
- Publication date
-
2024-03-15Full publication date if available
- Authors
-
Ziteng Sun, Jae Hun Ro, Ahmad Beirami, Ananda Theertha SureshList of authors in order
- Landing page
-
https://arxiv.org/abs/2403.10444Publisher landing page
- PDF URL
-
https://arxiv.org/pdf/2403.10444Direct link to full text PDF
- Open access
-
YesWhether a free full text is available
- OA status
-
greenOpen access status per OpenAlex
- OA URL
-
https://arxiv.org/pdf/2403.10444Direct OA link when available
- Concepts
-
Decoding methods, Computer science, Block (permutation group theory), Parallel computing, Arithmetic, Algorithm, Mathematics, CombinatoricsTop concepts (fields/topics) attached by OpenAlex
- Cited by
-
0Total citation count in OpenAlex
- Related works (count)
-
10Other works algorithmically related by OpenAlex
Full payload
| id | https://openalex.org/W4392933336 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2403.10444 |
| ids.doi | https://doi.org/10.48550/arxiv.2403.10444 |
| ids.openalex | https://openalex.org/W4392933336 |
| fwci | |
| type | preprint |
| title | Block Verification Accelerates Speculative Decoding |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10142 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.8593000173568726 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1703 |
| topics[0].subfield.display_name | Computational Theory and Mathematics |
| topics[0].display_name | Formal Methods in Verification |
| topics[1].id | https://openalex.org/T11697 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.7466999888420105 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1703 |
| topics[1].subfield.display_name | Computational Theory and Mathematics |
| topics[1].display_name | Numerical Methods and Algorithms |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C57273362 |
| concepts[0].level | 2 |
| concepts[0].score | 0.6136338710784912 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q576722 |
| concepts[0].display_name | Decoding methods |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.578831136226654 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C2777210771 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5584269165992737 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q4927124 |
| concepts[2].display_name | Block (permutation group theory) |
| concepts[3].id | https://openalex.org/C173608175 |
| concepts[3].level | 1 |
| concepts[3].score | 0.32798290252685547 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q232661 |
| concepts[3].display_name | Parallel computing |
| concepts[4].id | https://openalex.org/C94375191 |
| concepts[4].level | 1 |
| concepts[4].score | 0.3251579999923706 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11205 |
| concepts[4].display_name | Arithmetic |
| concepts[5].id | https://openalex.org/C11413529 |
| concepts[5].level | 1 |
| concepts[5].score | 0.2887619733810425 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q8366 |
| concepts[5].display_name | Algorithm |
| concepts[6].id | https://openalex.org/C33923547 |
| concepts[6].level | 0 |
| concepts[6].score | 0.18833491206169128 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[6].display_name | Mathematics |
| concepts[7].id | https://openalex.org/C114614502 |
| concepts[7].level | 1 |
| concepts[7].score | 0.05554953217506409 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q76592 |
| concepts[7].display_name | Combinatorics |
| keywords[0].id | https://openalex.org/keywords/decoding-methods |
| keywords[0].score | 0.6136338710784912 |
| keywords[0].display_name | Decoding methods |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.578831136226654 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/block |
| keywords[2].score | 0.5584269165992737 |
| keywords[2].display_name | Block (permutation group theory) |
| keywords[3].id | https://openalex.org/keywords/parallel-computing |
| keywords[3].score | 0.32798290252685547 |
| keywords[3].display_name | Parallel computing |
| keywords[4].id | https://openalex.org/keywords/arithmetic |
| keywords[4].score | 0.3251579999923706 |
| keywords[4].display_name | Arithmetic |
| keywords[5].id | https://openalex.org/keywords/algorithm |
| keywords[5].score | 0.2887619733810425 |
| keywords[5].display_name | Algorithm |
| keywords[6].id | https://openalex.org/keywords/mathematics |
| keywords[6].score | 0.18833491206169128 |
| keywords[6].display_name | Mathematics |
| keywords[7].id | https://openalex.org/keywords/combinatorics |
| keywords[7].score | 0.05554953217506409 |
| keywords[7].display_name | Combinatorics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2403.10444 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2403.10444 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2403.10444 |
| locations[1].id | doi:10.48550/arxiv.2403.10444 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2403.10444 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5001128596 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Ziteng Sun |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Sun, Ziteng |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5075278288 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Jae Hun Ro |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Ro, Jae Hun |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5008645615 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-1998-5271 |
| authorships[2].author.display_name | Ahmad Beirami |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Beirami, Ahmad |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5103890688 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Ananda Theertha Suresh |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Suresh, Ananda Theertha |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2403.10444 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-03-19T00:00:00 |
| display_name | Block Verification Accelerates Speculative Decoding |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10142 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.8593000173568726 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1703 |
| primary_topic.subfield.display_name | Computational Theory and Mathematics |
| primary_topic.display_name | Formal Methods in Verification |
| related_works | https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W2382290278, https://openalex.org/W3203142394, https://openalex.org/W2478288626, https://openalex.org/W4391913857, https://openalex.org/W2029210135 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2403.10444 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2403.10444 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2403.10444 |
| primary_location.id | pmh:oai:arXiv.org:2403.10444 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2403.10444 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2403.10444 |
| publication_date | 2024-03-15 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 17, 22, 38, 47, 75, 136, 177 |
| abstract_inverted_index.In | 53 |
| abstract_inverted_index.It | 15 |
| abstract_inverted_index.We | 71, 91 |
| abstract_inverted_index.an | 3 |
| abstract_inverted_index.as | 176 |
| abstract_inverted_index.be | 174 |
| abstract_inverted_index.by | 32 |
| abstract_inverted_index.in | 30, 99, 135, 167, 180 |
| abstract_inverted_index.is | 2, 43, 58, 68, 97, 110 |
| abstract_inverted_index.it | 172 |
| abstract_inverted_index.of | 9, 24, 103, 133, 138, 156 |
| abstract_inverted_index.to | 20, 46 |
| abstract_inverted_index.we | 63 |
| abstract_inverted_index.and | 36, 86, 108, 140 |
| abstract_inverted_index.are | 27 |
| abstract_inverted_index.but | 123 |
| abstract_inverted_index.can | 173 |
| abstract_inverted_index.for | 6 |
| abstract_inverted_index.it, | 171 |
| abstract_inverted_index.not | 69, 147 |
| abstract_inverted_index.the | 33, 41, 50, 82, 94, 100, 114, 128, 152, 157 |
| abstract_inverted_index.and, | 166 |
| abstract_inverted_index.code | 149 |
| abstract_inverted_index.does | 146 |
| abstract_inverted_index.each | 106 |
| abstract_inverted_index.fast | 18 |
| abstract_inverted_index.from | 49 |
| abstract_inverted_index.good | 178 |
| abstract_inverted_index.over | 127 |
| abstract_inverted_index.show | 64 |
| abstract_inverted_index.than | 113 |
| abstract_inverted_index.that | 40, 65, 80, 93, 143 |
| abstract_inverted_index.then | 28 |
| abstract_inverted_index.this | 66 |
| abstract_inverted_index.used | 175 |
| abstract_inverted_index.uses | 16 |
| abstract_inverted_index.5%-8% | 134 |
| abstract_inverted_index.Block | 73 |
| abstract_inverted_index.Given | 142 |
| abstract_inverted_index.block | 23, 84, 119, 144 |
| abstract_inverted_index.draft | 21, 56, 77 |
| abstract_inverted_index.fact, | 168 |
| abstract_inverted_index.large | 10 |
| abstract_inverted_index.model | 19 |
| abstract_inverted_index.never | 111 |
| abstract_inverted_index.prior | 54 |
| abstract_inverted_index.prove | 92 |
| abstract_inverted_index.range | 137 |
| abstract_inverted_index.tasks | 139 |
| abstract_inverted_index.token | 130 |
| abstract_inverted_index.which | 26 |
| abstract_inverted_index.worse | 112 |
| abstract_inverted_index.cannot | 163 |
| abstract_inverted_index.during | 13 |
| abstract_inverted_index.entire | 83 |
| abstract_inverted_index.method | 5 |
| abstract_inverted_index.model, | 35 |
| abstract_inverted_index.model. | 52 |
| abstract_inverted_index.models | 12 |
| abstract_inverted_index.modest | 122 |
| abstract_inverted_index.number | 102 |
| abstract_inverted_index.output | 42 |
| abstract_inverted_index.sample | 48 |
| abstract_inverted_index.simple | 76 |
| abstract_inverted_index.strong | 153 |
| abstract_inverted_index.target | 34, 51 |
| abstract_inverted_index.tokens | 25, 104 |
| abstract_inverted_index.works, | 55 |
| abstract_inverted_index.default | 179 |
| abstract_inverted_index.jointly | 85 |
| abstract_inverted_index.optimal | 98 |
| abstract_inverted_index.propose | 72 |
| abstract_inverted_index.approach | 67 |
| abstract_inverted_index.decoding | 1, 160, 182 |
| abstract_inverted_index.expected | 101 |
| abstract_inverted_index.improves | 170 |
| abstract_inverted_index.increase | 148 |
| abstract_inverted_index.language | 11 |
| abstract_inverted_index.lossless | 7, 154 |
| abstract_inverted_index.optimal. | 70 |
| abstract_inverted_index.parallel | 31 |
| abstract_inverted_index.produced | 105 |
| abstract_inverted_index.proposed | 95 |
| abstract_inverted_index.provides | 37, 87, 121 |
| abstract_inverted_index.speedup. | 90 |
| abstract_inverted_index.speedups | 126 |
| abstract_inverted_index.standard | 115, 129, 158 |
| abstract_inverted_index.verified | 29 |
| abstract_inverted_index.verifies | 81 |
| abstract_inverted_index.algorithm | 79, 132 |
| abstract_inverted_index.datasets. | 141 |
| abstract_inverted_index.effective | 4 |
| abstract_inverted_index.guarantee | 39, 155 |
| abstract_inverted_index.iteration | 107 |
| abstract_inverted_index.maintains | 151 |
| abstract_inverted_index.mechanism | 96 |
| abstract_inverted_index.performed | 59 |
| abstract_inverted_index.additional | 88 |
| abstract_inverted_index.algorithm, | 162 |
| abstract_inverted_index.consistent | 124 |
| abstract_inverted_index.inference. | 14 |
| abstract_inverted_index.wall-clock | 89, 125 |
| abstract_inverted_index.Speculative | 0 |
| abstract_inverted_index.complexity, | 150 |
| abstract_inverted_index.deteriorate | 164 |
| abstract_inverted_index.distributed | 44 |
| abstract_inverted_index.identically | 45 |
| abstract_inverted_index.speculative | 159, 181 |
| abstract_inverted_index.token-level | 116 |
| abstract_inverted_index.Empirically, | 118 |
| abstract_inverted_index.acceleration | 8 |
| abstract_inverted_index.consistently | 169 |
| abstract_inverted_index.performance, | 165 |
| abstract_inverted_index.specifically | 109 |
| abstract_inverted_index.verification | 57, 78, 120, 131, 145, 161 |
| abstract_inverted_index.Surprisingly, | 62 |
| abstract_inverted_index.Verification, | 74 |
| abstract_inverted_index.independently | 60 |
| abstract_inverted_index.verification. | 117 |
| abstract_inverted_index.token-by-token. | 61 |
| abstract_inverted_index.implementations. | 183 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile |