Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2406.16758
Large language models (LLMs) have revolutionized natural language processing and broadened their applicability across diverse commercial applications. However, their deployment is constrained by high inference time in multilingual settings. To mitigate this challenge, this paper explores a training recipe for an assistant (draft) model in speculative decoding: the draft model proposes future tokens, which are then verified by the target LLM. We show that language-specific draft models, optimized through a targeted pretrain-and-finetune strategy, substantially speed up inference compared to previous methods. We validate these models across various languages in terms of inference time, out-of-domain speedup, and GPT-4o evaluation.
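The draft-then-verify loop described in the abstract can be illustrated with the textbook greedy form of speculative decoding. This is a minimal sketch under stated assumptions, not the authors' training recipe or decoding code: the "models" are hypothetical toy functions over integer token IDs, verification uses greedy agreement rather than the rejection-sampling acceptance rule used for sampled decoding, and the target is queried position by position instead of in one batched forward pass. The names `speculative_greedy`, `greedy_reference`, `toy_target`, and `toy_draft` are illustrative only.

```python
from typing import Callable, List

# Hypothetical "model" interface (assumption): maps a token-ID prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_greedy(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> List[int]:
    """Greedy draft-then-verify decoding (simplified sketch).

    The draft proposes up to k tokens; the target keeps the longest prefix it
    agrees with, then contributes one token of its own.
    """
    out = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(out) < goal:
        # 1) Draft: propose tokens autoregressively with the cheap model.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2) Verify: a real target scores all drafted positions in one forward
        #    pass; here it is queried position by position for clarity.
        accepted = 0
        for i, tok in enumerate(proposal):
            if target(out + proposal[:i]) != tok:
                break
            accepted += 1
        out.extend(proposal[:accepted])
        # 3) Correct or extend: the target supplies the next token itself
        #    (the fix at the first mismatch, or a bonus token if all matched).
        if len(out) < goal:
            out.append(target(out))
    return out


def greedy_reference(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Plain autoregressive greedy decoding with a single model."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(model(out))
    return out


# Toy stand-ins (assumptions): the draft agrees with the target except after token 3.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] * 3 + 1) % 10


def toy_draft(seq: List[int]) -> int:
    return 7 if seq[-1] == 3 else toy_target(seq)


fast = speculative_greedy(toy_target, toy_draft, [1], max_new_tokens=10)
print(fast == greedy_reference(toy_target, [1], 10))  # True: same output as target-only decoding
```

Because every emitted token is one the target would have chosen greedily, the output matches target-only decoding in this setup; the speedup depends on how many drafted tokens the target accepts per verification step, which is what a better-matched drafter aims to increase.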
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2406.16758
- PDF: https://arxiv.org/pdf/2406.16758
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4400024700
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4400024700 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2406.16758 (Digital Object Identifier)
- Title: Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-06-24 (full publication date if available)
- Authors: Euiin Yi, Taehyeon Kim, Hongseok Jeung, Du-Seong Chang, Se-Young Yun (list of authors in order)
- Landing page: https://arxiv.org/abs/2406.16758 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2406.16758 (direct link to full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2406.16758 (direct OA link when available)
- Concepts: Decoding methods, Inference, Computer science, Telecommunications, Artificial intelligence (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4400024700 |
| doi | https://doi.org/10.48550/arxiv.2406.16758 |
| ids.doi | https://doi.org/10.48550/arxiv.2406.16758 |
| ids.openalex | https://openalex.org/W4400024700 |
| fwci | |
| type | preprint |
| title | Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10181 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9973999857902527 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Natural Language Processing Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C57273362 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7667186260223389 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q576722 |
| concepts[0].display_name | Decoding methods |
| concepts[1].id | https://openalex.org/C2776214188 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6755660772323608 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q408386 |
| concepts[1].display_name | Inference |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.5619612336158752 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C76155785 |
| concepts[3].level | 1 |
| concepts[3].score | 0.25701189041137695 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q418 |
| concepts[3].display_name | Telecommunications |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.1976216435432434 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| keywords[0].id | https://openalex.org/keywords/decoding-methods |
| keywords[0].score | 0.7667186260223389 |
| keywords[0].display_name | Decoding methods |
| keywords[1].id | https://openalex.org/keywords/inference |
| keywords[1].score | 0.6755660772323608 |
| keywords[1].display_name | Inference |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.5619612336158752 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/telecommunications |
| keywords[3].score | 0.25701189041137695 |
| keywords[3].display_name | Telecommunications |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.1976216435432434 |
| keywords[4].display_name | Artificial intelligence |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2406.16758 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://arxiv.org/pdf/2406.16758 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2406.16758 |
| locations[1].id | doi:10.48550/arxiv.2406.16758 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2406.16758 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5099497251 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Euiin Yi |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Yi, Euiin |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5100774197 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-5496-2625 |
| authorships[1].author.display_name | Taehyeon Kim |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Kim, Taehyeon |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5099497252 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Hongseok Jeung |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Jeung, Hongseok |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5066359770 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Du-Seong Chang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Chang, Du-Seong |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5091674853 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Se-Young Yun |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Yun, Se-Young |
| authorships[4].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2406.16758 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10181 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9973999857902527 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Natural Language Processing Techniques |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052, https://openalex.org/W2382290278, https://openalex.org/W4395014643 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2406.16758 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2406.16758 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2406.16758 |
| primary_location.id | pmh:oai:arXiv.org:2406.16758 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://arxiv.org/pdf/2406.16758 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2406.16758 |
| publication_date | 2024-06-24 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 39, 72, 78 |
| abstract_inverted_index.To | 32 |
| abstract_inverted_index.We | 64, 88 |
| abstract_inverted_index.an | 43 |
| abstract_inverted_index.by | 25, 60 |
| abstract_inverted_index.in | 29, 46, 80, 95 |
| abstract_inverted_index.is | 23, 50 |
| abstract_inverted_index.of | 20, 42 |
| abstract_inverted_index.to | 52, 84 |
| abstract_inverted_index.and | 9, 100 |
| abstract_inverted_index.are | 58 |
| abstract_inverted_index.its | 55 |
| abstract_inverted_index.the | 18, 61, 85 |
| abstract_inverted_index.LLM. | 63 |
| abstract_inverted_index.have | 4 |
| abstract_inverted_index.high | 26 |
| abstract_inverted_index.show | 65 |
| abstract_inverted_index.that | 66 |
| abstract_inverted_index.this | 34, 36 |
| abstract_inverted_index.time | 28, 82 |
| abstract_inverted_index.Large | 0 |
| abstract_inverted_index.draft | 53, 68 |
| abstract_inverted_index.model | 45 |
| abstract_inverted_index.paper | 37 |
| abstract_inverted_index.their | 11 |
| abstract_inverted_index.these | 21, 90 |
| abstract_inverted_index.time, | 97 |
| abstract_inverted_index.which | 49 |
| abstract_inverted_index.(LLMs) | 3 |
| abstract_inverted_index.GPT-4o | 101 |
| abstract_inverted_index.across | 13, 92 |
| abstract_inverted_index.brings | 77 |
| abstract_inverted_index.future | 56 |
| abstract_inverted_index.models | 2, 22, 91 |
| abstract_inverted_index.recipe | 41 |
| abstract_inverted_index.target | 62 |
| abstract_inverted_index.tokens | 57 |
| abstract_inverted_index.diverse | 14 |
| abstract_inverted_index.models, | 69 |
| abstract_inverted_index.natural | 6 |
| abstract_inverted_index.speedup | 79 |
| abstract_inverted_index.through | 71 |
| abstract_inverted_index.various | 93 |
| abstract_inverted_index.However, | 17 |
| abstract_inverted_index.and-then | 54 |
| abstract_inverted_index.compared | 83 |
| abstract_inverted_index.explores | 38 |
| abstract_inverted_index.language | 1, 7 |
| abstract_inverted_index.methods. | 87 |
| abstract_inverted_index.mitigate | 33 |
| abstract_inverted_index.previous | 86 |
| abstract_inverted_index.speedup, | 99 |
| abstract_inverted_index.targeted | 73 |
| abstract_inverted_index.training | 40 |
| abstract_inverted_index.validate | 89 |
| abstract_inverted_index.verified | 59 |
| abstract_inverted_index.assistant | 44 |
| abstract_inverted_index.broadened | 10 |
| abstract_inverted_index.decoding, | 48 |
| abstract_inverted_index.inference | 27, 81, 96 |
| abstract_inverted_index.languages | 94 |
| abstract_inverted_index.leveraged | 51 |
| abstract_inverted_index.optimized | 70 |
| abstract_inverted_index.settings. | 31 |
| abstract_inverted_index.strategy, | 75 |
| abstract_inverted_index.challenge, | 35 |
| abstract_inverted_index.commercial | 15 |
| abstract_inverted_index.deployment | 19 |
| abstract_inverted_index.processing | 8 |
| abstract_inverted_index.constrained | 24 |
| abstract_inverted_index.evaluation. | 102 |
| abstract_inverted_index.speculative | 47 |
| abstract_inverted_index.multilingual | 30 |
| abstract_inverted_index.applicability | 12 |
| abstract_inverted_index.applications. | 16 |
| abstract_inverted_index.out-of-domain | 98 |
| abstract_inverted_index.substantially | 76 |
| abstract_inverted_index.revolutionized | 5 |
| abstract_inverted_index.language-specific | 67 |
| abstract_inverted_index.pretrain-and-finetune | 74 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile | |
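In the payload above, the `abstract_inverted_index.*` rows encode the abstract as a map from each word to its 0-based positions in the text; for example, `language` appears at positions 1 and 7. A minimal Python sketch for turning such a map back into plain text, assuming the rows have already been parsed into a dict of word to position list (the function name `rebuild_abstract` is hypothetical):

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild plain text from an inverted index of word -> positions.

    Each key is a word; each value lists the 0-based positions at which
    that word appears in the abstract.
    """
    positions: Dict[int, str] = {}
    for word, idxs in inverted_index.items():
        for i in idxs:
            positions[i] = word
    # Join words in position order; any missing positions are simply skipped.
    return " ".join(positions[i] for i in sorted(positions))


# Excerpt covering positions 0-8 of the table above (later positions omitted).
excerpt = {
    "Large": [0],
    "language": [1, 7],
    "models": [2],
    "(LLMs)": [3],
    "have": [4],
    "revolutionized": [5],
    "natural": [6],
    "processing": [8],
}
print(rebuild_abstract(excerpt))
```

Running the excerpt prints the opening words of the abstract, "Large language models (LLMs) have revolutionized natural language processing".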