Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
2022 · Open Access
DOI: https://doi.org/10.48550/arxiv.2210.15458
Decoding methods for large language models often trade off diversity of outputs against parallelism of computation. Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize. Alternatively, methods such as temperature sampling and its modifications (top-k sampling, nucleus sampling, typical decoding, and others) are embarrassingly parallel, but offer no guarantees against duplicate samples. We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model. The framework is compatible with common sampling variations, has provable beam diversity under certain conditions, is embarrassingly parallel, and provides unbiased and consistent expectations from the original model. We demonstrate the effectiveness of our approach on WMT machine translation, more than halving the standard deviation when estimating expected BLEU score reward, and closing the BLEU score gap between independent sampling and beam search by up to 63%.
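The abstract does not spell out the procedure, so the following is only a minimal sketch of what sampling from an arithmetic code book could look like. It assumes that each sample corresponds to a code point in [0, 1), that the code points are evenly spaced with one shared random offset, and that a code point is decoded by repeatedly selecting the token whose cumulative-probability interval contains it. The names `VOCAB`, `toy_next_token_probs`, `make_code_points`, `decode_code_point`, and `arithmetic_sample` are hypothetical, and the toy distribution stands in for a real language model.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Hypothetical three-token vocabulary; not taken from the paper.
VOCAB = ["a", "b", "<eos>"]


def toy_next_token_probs(prefix):
    """Stand-in for a language model (an assumption for illustration).

    The probability of <eos> grows with the prefix length, so every
    sequence ends after at most four tokens.
    """
    p_eos = min(0.2 + 0.2 * len(prefix), 1.0)
    rest = 1.0 - p_eos
    return [0.6 * rest, 0.4 * rest, p_eos]


def make_code_points(n, offset):
    """Return n evenly spaced points in [0, 1), shifted by a shared offset."""
    return [(offset + i / n) % 1.0 for i in range(n)]


def decode_code_point(u, max_len=8):
    """Decode one code point u in [0, 1) into a token sequence."""
    prefix = []
    for _ in range(max_len):
        probs = toy_next_token_probs(prefix)
        cdf = list(accumulate(probs))
        # Index of the token whose interval [lo, lo + p) contains u.
        idx = min(bisect_right(cdf, u), len(probs) - 1)
        token = VOCAB[idx]
        if token == "<eos>":
            break
        prefix.append(token)
        # Rescale u to [0, 1) inside the chosen interval for the next step.
        lo = cdf[idx - 1] if idx > 0 else 0.0
        u = (u - lo) / probs[idx]
        u = min(max(u, 0.0), 1.0 - 1e-12)
    return prefix


def arithmetic_sample(n, rng):
    """Draw n sequences; each code point can be decoded independently."""
    offset = rng.random()
    return [decode_code_point(u) for u in make_code_points(n, offset)]


if __name__ == "__main__":
    for seq in arithmetic_sample(8, random.Random(0)):
        print(" ".join(seq) or "(empty)")
```

Because each call to `decode_code_point` depends only on its own code point, the loop in `arithmetic_sample` could be distributed across workers without communication, which is the sense of "embarrassingly parallel" used in the abstract. Spreading the code points across the unit interval is what pushes the samples toward different sequences in this sketch.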
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2210.15458, https://arxiv.org/pdf/2210.15458
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4307537154
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4307537154 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2210.15458 (Digital Object Identifier)
- Title: Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2022 (year of publication)
- Publication date: 2022-10-18 (full publication date if available)
- Authors: Luke Vilnis, Yury Zemlyanskiy, Patrick R. Murray, A. M. A. dos Passos, Sumit Sanghai (list of authors in order)
- Landing page: https://arxiv.org/abs/2210.15458 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2210.15458 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2210.15458 (direct OA link when available)
- Concepts: Sampling (signal processing), Decoding methods, Computer science, Machine translation, Embarrassingly parallel, Computation, Code (set theory), Parallel computing, Algorithm, Artificial intelligence, Telecommunications, Detector, Set (abstract data type), Programming language (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2024: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4307537154 |
| doi | https://doi.org/10.48550/arxiv.2210.15458 |
| ids.doi | https://doi.org/10.48550/arxiv.2210.15458 |
| ids.openalex | https://openalex.org/W4307537154 |
| fwci | |
| type | preprint |
| title | Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10028 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9980000257492065 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Topic Modeling |
| topics[1].id | https://openalex.org/T10181 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9973000288009644 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Natural Language Processing Techniques |
| topics[2].id | https://openalex.org/T12535 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9869999885559082 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Machine Learning and Data Classification |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C140779682 |
| concepts[0].level | 3 |
| concepts[0].score | 0.7810869216918945 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q210868 |
| concepts[0].display_name | Sampling (signal processing) |
| concepts[1].id | https://openalex.org/C57273362 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6825367212295532 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q576722 |
| concepts[1].display_name | Decoding methods |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.672345757484436 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C203005215 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5911448001861572 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q79798 |
| concepts[3].display_name | Machine translation |
| concepts[4].id | https://openalex.org/C126909462 |
| concepts[4].level | 3 |
| concepts[4].score | 0.5024368762969971 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q5369501 |
| concepts[4].display_name | Embarrassingly parallel |
| concepts[5].id | https://openalex.org/C45374587 |
| concepts[5].level | 2 |
| concepts[5].score | 0.49297189712524414 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q12525525 |
| concepts[5].display_name | Computation |
| concepts[6].id | https://openalex.org/C2776760102 |
| concepts[6].level | 3 |
| concepts[6].score | 0.4538726210594177 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q5139990 |
| concepts[6].display_name | Code (set theory) |
| concepts[7].id | https://openalex.org/C173608175 |
| concepts[7].level | 1 |
| concepts[7].score | 0.4286465346813202 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q232661 |
| concepts[7].display_name | Parallel computing |
| concepts[8].id | https://openalex.org/C11413529 |
| concepts[8].level | 1 |
| concepts[8].score | 0.4259936809539795 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q8366 |
| concepts[8].display_name | Algorithm |
| concepts[9].id | https://openalex.org/C154945302 |
| concepts[9].level | 1 |
| concepts[9].score | 0.2605627179145813 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[9].display_name | Artificial intelligence |
| concepts[10].id | https://openalex.org/C76155785 |
| concepts[10].level | 1 |
| concepts[10].score | 0.07697007060050964 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q418 |
| concepts[10].display_name | Telecommunications |
| concepts[11].id | https://openalex.org/C94915269 |
| concepts[11].level | 2 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q1834857 |
| concepts[11].display_name | Detector |
| concepts[12].id | https://openalex.org/C177264268 |
| concepts[12].level | 2 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[12].display_name | Set (abstract data type) |
| concepts[13].id | https://openalex.org/C199360897 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[13].display_name | Programming language |
| keywords[0].id | https://openalex.org/keywords/sampling |
| keywords[0].score | 0.7810869216918945 |
| keywords[0].display_name | Sampling (signal processing) |
| keywords[1].id | https://openalex.org/keywords/decoding-methods |
| keywords[1].score | 0.6825367212295532 |
| keywords[1].display_name | Decoding methods |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.672345757484436 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/machine-translation |
| keywords[3].score | 0.5911448001861572 |
| keywords[3].display_name | Machine translation |
| keywords[4].id | https://openalex.org/keywords/embarrassingly-parallel |
| keywords[4].score | 0.5024368762969971 |
| keywords[4].display_name | Embarrassingly parallel |
| keywords[5].id | https://openalex.org/keywords/computation |
| keywords[5].score | 0.49297189712524414 |
| keywords[5].display_name | Computation |
| keywords[6].id | https://openalex.org/keywords/code |
| keywords[6].score | 0.4538726210594177 |
| keywords[6].display_name | Code (set theory) |
| keywords[7].id | https://openalex.org/keywords/parallel-computing |
| keywords[7].score | 0.4286465346813202 |
| keywords[7].display_name | Parallel computing |
| keywords[8].id | https://openalex.org/keywords/algorithm |
| keywords[8].score | 0.4259936809539795 |
| keywords[8].display_name | Algorithm |
| keywords[9].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[9].score | 0.2605627179145813 |
| keywords[9].display_name | Artificial intelligence |
| keywords[10].id | https://openalex.org/keywords/telecommunications |
| keywords[10].score | 0.07697007060050964 |
| keywords[10].display_name | Telecommunications |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2210.15458 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2210.15458 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2210.15458 |
| locations[1].id | doi:10.48550/arxiv.2210.15458 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2210.15458 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5016501970 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Luke Vilnis |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Vilnis, Luke |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5036319063 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Yury Zemlyanskiy |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Zemlyanskiy, Yury |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5091794841 |
| authorships[2].author.orcid | https://orcid.org/0009-0004-3584-0808 |
| authorships[2].author.display_name | Patrick R. Murray |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Murray, Patrick |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5018424374 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-9917-0688 |
| authorships[3].author.display_name | A. M. A. dos Passos |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Passos, Alexandre |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5012884492 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Sumit Sanghai |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Sanghai, Sumit |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2210.15458 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10028 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9980000257492065 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Topic Modeling |
| related_works | https://openalex.org/W1522943736, https://openalex.org/W4295122399, https://openalex.org/W2001581899, https://openalex.org/W1826438552, https://openalex.org/W2122454857, https://openalex.org/W3011059803, https://openalex.org/W2093790547, https://openalex.org/W4295125675, https://openalex.org/W1968289971, https://openalex.org/W4300588357 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2210.15458 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2210.15458 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2210.15458 |
| primary_location.id | pmh:oai:arXiv.org:2210.15458 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2210.15458 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2210.15458 |
| publication_date | 2022-10-18 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 27, 71, 84 |
| abstract_inverted_index.We | 69, 116 |
| abstract_inverted_index.an | 77 |
| abstract_inverted_index.as | 18, 45, 100, 102 |
| abstract_inverted_index.by | 83, 151 |
| abstract_inverted_index.no | 64 |
| abstract_inverted_index.of | 10, 14, 33, 120 |
| abstract_inverted_index.on | 123 |
| abstract_inverted_index.to | 40, 76, 153 |
| abstract_inverted_index.up | 152 |
| abstract_inverted_index.WMT | 124 |
| abstract_inverted_index.and | 12, 21, 48, 57, 106, 109, 139, 148 |
| abstract_inverted_index.are | 37, 59 |
| abstract_inverted_index.but | 36, 62 |
| abstract_inverted_index.can | 25 |
| abstract_inverted_index.for | 2, 30, 73 |
| abstract_inverted_index.gap | 144 |
| abstract_inverted_index.its | 49 |
| abstract_inverted_index.not | 38 |
| abstract_inverted_index.our | 121 |
| abstract_inverted_index.the | 34, 113, 118, 130, 141 |
| abstract_inverted_index.63%. | 154 |
| abstract_inverted_index.BLEU | 136, 142 |
| abstract_inverted_index.beam | 19, 95, 149 |
| abstract_inverted_index.book | 80 |
| abstract_inverted_index.code | 79 |
| abstract_inverted_index.each | 31 |
| abstract_inverted_index.easy | 39 |
| abstract_inverted_index.from | 112 |
| abstract_inverted_index.have | 63 |
| abstract_inverted_index.more | 127 |
| abstract_inverted_index.such | 17, 44 |
| abstract_inverted_index.than | 128 |
| abstract_inverted_index.well | 101 |
| abstract_inverted_index.when | 133 |
| abstract_inverted_index.with | 89, 93 |
| abstract_inverted_index.about | 66 |
| abstract_inverted_index.beam, | 35 |
| abstract_inverted_index.being | 103 |
| abstract_inverted_index.large | 3, 85 |
| abstract_inverted_index.often | 6 |
| abstract_inverted_index.score | 137, 143 |
| abstract_inverted_index.top-k | 23 |
| abstract_inverted_index.under | 97 |
| abstract_inverted_index.(top-k | 51 |
| abstract_inverted_index.Gumbel | 22 |
| abstract_inverted_index.common | 90 |
| abstract_inverted_index.model, | 87 |
| abstract_inverted_index.model. | 115 |
| abstract_inverted_index.models | 5 |
| abstract_inverted_index.output | 29 |
| abstract_inverted_index.search | 20, 150 |
| abstract_inverted_index.Methods | 16 |
| abstract_inverted_index.between | 8, 145 |
| abstract_inverted_index.certain | 98 |
| abstract_inverted_index.closing | 140 |
| abstract_inverted_index.defined | 82 |
| abstract_inverted_index.element | 32 |
| abstract_inverted_index.halving | 129 |
| abstract_inverted_index.machine | 125 |
| abstract_inverted_index.methods | 1, 43 |
| abstract_inverted_index.nucleus | 53 |
| abstract_inverted_index.outputs | 11 |
| abstract_inverted_index.present | 70 |
| abstract_inverted_index.reward, | 138 |
| abstract_inverted_index.typical | 55 |
| abstract_inverted_index.Decoding | 0 |
| abstract_inverted_index.approach | 122 |
| abstract_inverted_index.expected | 135 |
| abstract_inverted_index.language | 4, 86 |
| abstract_inverted_index.original | 114 |
| abstract_inverted_index.others), | 58 |
| abstract_inverted_index.parallel | 105 |
| abstract_inverted_index.provable | 94 |
| abstract_inverted_index.samples. | 68 |
| abstract_inverted_index.sampling | 24, 47, 74, 91, 147 |
| abstract_inverted_index.standard | 131 |
| abstract_inverted_index.unbiased | 108 |
| abstract_inverted_index.according | 75 |
| abstract_inverted_index.decoding, | 56 |
| abstract_inverted_index.deviation | 132 |
| abstract_inverted_index.different | 28 |
| abstract_inverted_index.diversity | 9, 96 |
| abstract_inverted_index.duplicate | 67 |
| abstract_inverted_index.framework | 72 |
| abstract_inverted_index.guarantee | 26 |
| abstract_inverted_index.parallel, | 61 |
| abstract_inverted_index.providing | 107 |
| abstract_inverted_index.sampling, | 52, 54 |
| abstract_inverted_index.trade-off | 7 |
| abstract_inverted_index.arithmetic | 78 |
| abstract_inverted_index.compatible | 88 |
| abstract_inverted_index.consistent | 110 |
| abstract_inverted_index.estimating | 134 |
| abstract_inverted_index.guarantees | 65 |
| abstract_inverted_index.implicitly | 81 |
| abstract_inverted_index.conditions, | 99 |
| abstract_inverted_index.demonstrate | 117 |
| abstract_inverted_index.independent | 146 |
| abstract_inverted_index.parallelism | 13 |
| abstract_inverted_index.temperature | 46 |
| abstract_inverted_index.variations, | 92 |
| abstract_inverted_index.computation. | 15 |
| abstract_inverted_index.expectations | 111 |
| abstract_inverted_index.parallelize. | 41 |
| abstract_inverted_index.translation, | 126 |
| abstract_inverted_index.effectiveness | 119 |
| abstract_inverted_index.modifications | 50 |
| abstract_inverted_index.Alternatively, | 42 |
| abstract_inverted_index.embarrassingly | 60, 104 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.41999998688697815 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile | |
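
The `abstract_inverted_index` rows in the payload store the abstract as a map from each word to the positions where it occurs. As a minimal sketch, the text can be rebuilt by inverting that map and joining the words in position order. The function name `rebuild_abstract` and the variable `SAMPLE_INDEX` are hypothetical, and the sample holds only the first seven positions from the table, with longer position lists truncated for brevity.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a word -> list-of-positions mapping."""
    words_by_position = {}
    for word, positions in inverted_index.items():
        for position in positions:
            words_by_position[position] = word
    # Join the words in ascending position order.
    return " ".join(words_by_position[p] for p in sorted(words_by_position))


# Truncated excerpt of the payload above (positions 0 to 6 only).
# In the full record, "methods", "for", "large", and "language" also
# occur at later positions, which are omitted here.
SAMPLE_INDEX = {
    "Decoding": [0],
    "methods": [1],
    "for": [2],
    "large": [3],
    "language": [4],
    "models": [5],
    "often": [6],
}

if __name__ == "__main__":
    print(rebuild_abstract(SAMPLE_INDEX))
```

Tokens in the index keep their original capitalization and attached punctuation (for example `beam,` and `63%.`), so no extra detokenization is needed beyond joining with spaces.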