Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2407.08223
Retrieval augmented generation (RAG) combines the generative abilities of large language models (LLMs) with external knowledge sources to provide more accurate and up-to-date responses. Recent RAG advancements focus on improving retrieval outcomes through iterative LLM refinement or self-critique capabilities acquired through additional instruction tuning of LLMs. In this work, we introduce Speculative RAG - a framework that leverages a larger generalist LM to efficiently verify multiple RAG drafts produced in parallel by a smaller, distilled specialist LM. Each draft is generated from a distinct subset of retrieved documents, offering diverse perspectives on the evidence while reducing input token counts per draft. This approach enhances comprehension of each subset and mitigates potential position bias over long context. Our method accelerates RAG by delegating drafting to the smaller specialist LM, with the larger generalist LM performing a single verification pass over the drafts. Extensive experiments demonstrate that Speculative RAG achieves state-of-the-art performance with reduced latency on TriviaQA, MuSiQue, PopQA, PubHealth, and ARC-Challenge benchmarks. It notably enhances accuracy by up to 12.97% while reducing latency by 50.83% compared to conventional RAG systems on PubHealth.
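The draft-then-verify flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the round-robin split, the function names (`split_into_subsets`, `speculative_rag`, `toy_drafter`, `toy_verifier`), and the keyword-overlap score are all assumptions made for the example. In a real system the two callables would wrap the smaller specialist LM and the larger generalist LM.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def split_into_subsets(docs: Sequence[str], num_subsets: int) -> List[List[str]]:
    """Round-robin split into distinct, non-empty subsets (an assumed strategy)."""
    subsets = [list(docs[i::num_subsets]) for i in range(num_subsets)]
    return [s for s in subsets if s]


def speculative_rag(
    question: str,
    docs: Sequence[str],
    draft_fn: Callable[[str, List[str]], str],
    verify_fn: Callable[[str, List[str]], List[float]],
    num_subsets: int = 3,
) -> Tuple[str, float]:
    """Draft one answer per document subset in parallel, then verify once."""
    subsets = split_into_subsets(docs, num_subsets)
    if not subsets:
        raise ValueError("at least one retrieved document is required")

    # Drafting: the (hypothetical) specialist sees only one subset per draft,
    # which keeps the input short for each call.
    with ThreadPoolExecutor(max_workers=len(subsets)) as pool:
        drafts = list(pool.map(lambda s: draft_fn(question, s), subsets))

    # Verification: a single call scores all drafts; the best one is returned.
    scores = verify_fn(question, drafts)
    best = max(range(len(drafts)), key=scores.__getitem__)
    return drafts[best], scores[best]


# Toy stand-ins for the two language models (hypothetical, for illustration).
def toy_drafter(question: str, subset: List[str]) -> str:
    return f"Draft from {len(subset)} docs: " + " | ".join(subset)


def toy_verifier(question: str, drafts: List[str]) -> List[float]:
    q_words = set(question.lower().split())
    return [len(q_words & set(d.lower().split())) / len(q_words) for d in drafts]


if __name__ == "__main__":
    retrieved = [
        "paris is the capital of france",
        "berlin is in germany",
        "tokyo is in japan",
    ]
    answer, score = speculative_rag(
        "capital of france", retrieved, toy_drafter, toy_verifier
    )
    print(answer, score)
```

Passing the models in as callables keeps the sketch independent of any particular inference library; only the control flow (parallel drafting over subsets, one verification pass) is taken from the abstract.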
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2407.08223, https://arxiv.org/pdf/2407.08223
- OA Status: green
- Cited By: 6
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4400611538
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4400611538 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2407.08223 (Digital Object Identifier)
- Title: Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-07-11 (full publication date if available)
- Authors: Zilong Wang, Zifeng Wang, Long Tan Le, Huaixiu Zheng, Swaroop Mishra, Vinçent Pérot, Yuwei Zhang, Anush Mattapalli, Ankur Taly, Jingbo Shang, Chen-Yu Lee, Tomas Pfister (list of authors in order)
- Landing page: https://arxiv.org/abs/2407.08223 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2407.08223 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2407.08223 (direct OA link when available)
- Concepts: Information retrieval, Computer science, Computer graphics (images), Engineering drawing, Engineering (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 6 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 4, 2024: 2 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4400611538 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2407.08223 |
| ids.doi | https://doi.org/10.48550/arxiv.2407.08223 |
| ids.openalex | https://openalex.org/W4400611538 |
| fwci | |
| type | preprint |
| title | Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10181 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9725000262260437 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Natural Language Processing Techniques |
| topics[1].id | https://openalex.org/T10215 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9315999746322632 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Semantic Web and Ontologies |
| topics[2].id | https://openalex.org/T10028 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9017000198364258 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Topic Modeling |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C23123220 |
| concepts[0].level | 1 |
| concepts[0].score | 0.44623687863349915 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q816826 |
| concepts[0].display_name | Information retrieval |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.44378310441970825 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C121684516 |
| concepts[2].level | 1 |
| concepts[2].score | 0.3365703821182251 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q7600677 |
| concepts[2].display_name | Computer graphics (images) |
| concepts[3].id | https://openalex.org/C199639397 |
| concepts[3].level | 1 |
| concepts[3].score | 0.3279005289077759 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1788588 |
| concepts[3].display_name | Engineering drawing |
| concepts[4].id | https://openalex.org/C127413603 |
| concepts[4].level | 0 |
| concepts[4].score | 0.19370296597480774 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[4].display_name | Engineering |
| keywords[0].id | https://openalex.org/keywords/information-retrieval |
| keywords[0].score | 0.44623687863349915 |
| keywords[0].display_name | Information retrieval |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.44378310441970825 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/computer-graphics |
| keywords[2].score | 0.3365703821182251 |
| keywords[2].display_name | Computer graphics (images) |
| keywords[3].id | https://openalex.org/keywords/engineering-drawing |
| keywords[3].score | 0.3279005289077759 |
| keywords[3].display_name | Engineering drawing |
| keywords[4].id | https://openalex.org/keywords/engineering |
| keywords[4].score | 0.19370296597480774 |
| keywords[4].display_name | Engineering |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2407.08223 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2407.08223 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2407.08223 |
| locations[1].id | doi:10.48550/arxiv.2407.08223 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2407.08223 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5100384097 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-0551-7899 |
| authorships[0].author.display_name | Zilong Wang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Wang, Zilong |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5100733697 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-0068-9042 |
| authorships[1].author.display_name | Zifeng Wang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wang, Zifeng |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5103325289 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-3284-1990 |
| authorships[2].author.display_name | Long Tan Le |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Le, Long |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5003807260 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Huaixiu Zheng |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Zheng, Huaixiu Steven |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5063722751 |
| authorships[4].author.orcid | https://orcid.org/0009-0001-6413-7001 |
| authorships[4].author.display_name | Swaroop Mishra |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Mishra, Swaroop |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5057389913 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Vinçent Pérot |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Perot, Vincent |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5100326579 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-4616-1067 |
| authorships[6].author.display_name | Yuwei Zhang |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Zhang, Yuwei |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5104435492 |
| authorships[7].author.orcid | |
| authorships[7].author.display_name | Anush Mattapalli |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Mattapalli, Anush |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5069391199 |
| authorships[8].author.orcid | |
| authorships[8].author.display_name | Ankur Taly |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Taly, Ankur |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5039500313 |
| authorships[9].author.orcid | https://orcid.org/0000-0002-7249-4404 |
| authorships[9].author.display_name | Jingbo Shang |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Shang, Jingbo |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5068372754 |
| authorships[10].author.orcid | |
| authorships[10].author.display_name | Chen-Yu Lee |
| authorships[10].author_position | middle |
| authorships[10].raw_author_name | Lee, Chen-Yu |
| authorships[10].is_corresponding | False |
| authorships[11].author.id | https://openalex.org/A5101265241 |
| authorships[11].author.orcid | |
| authorships[11].author.display_name | Tomas Pfister |
| authorships[11].author_position | last |
| authorships[11].raw_author_name | Pfister, Tomas |
| authorships[11].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2407.08223 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10181 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9725000262260437 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Natural Language Processing Techniques |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052, https://openalex.org/W2382290278, https://openalex.org/W4395014643 |
| cited_by_count | 6 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 4 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 2 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2407.08223 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2407.08223 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2407.08223 |
| primary_location.id | pmh:oai:arXiv.org:2407.08223 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2407.08223 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2407.08223 |
| publication_date | 2024-07-11 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.- | 53 |
| abstract_inverted_index.a | 54, 58, 72, 82, 134 |
| abstract_inverted_index.In | 46 |
| abstract_inverted_index.It | 161 |
| abstract_inverted_index.LM | 61, 132 |
| abstract_inverted_index.by | 71, 120, 165, 172 |
| abstract_inverted_index.in | 69 |
| abstract_inverted_index.is | 79 |
| abstract_inverted_index.of | 8, 44, 85, 105 |
| abstract_inverted_index.on | 28, 91, 153, 179 |
| abstract_inverted_index.or | 36 |
| abstract_inverted_index.to | 17, 62, 123, 167, 175 |
| abstract_inverted_index.up | 166 |
| abstract_inverted_index.we | 49 |
| abstract_inverted_index.LLM | 34 |
| abstract_inverted_index.LM, | 127 |
| abstract_inverted_index.LM. | 76 |
| abstract_inverted_index.Our | 116 |
| abstract_inverted_index.RAG | 25, 52, 66, 119, 146, 177 |
| abstract_inverted_index.and | 21, 108, 158 |
| abstract_inverted_index.per | 99 |
| abstract_inverted_index.the | 5, 92, 124, 129, 139 |
| abstract_inverted_index.Each | 77 |
| abstract_inverted_index.This | 101 |
| abstract_inverted_index.bias | 112 |
| abstract_inverted_index.each | 106 |
| abstract_inverted_index.from | 81 |
| abstract_inverted_index.long | 114 |
| abstract_inverted_index.more | 19 |
| abstract_inverted_index.over | 113, 138 |
| abstract_inverted_index.pass | 137 |
| abstract_inverted_index.that | 56, 144 |
| abstract_inverted_index.this | 47 |
| abstract_inverted_index.with | 13, 128, 150 |
| abstract_inverted_index.(RAG) | 3 |
| abstract_inverted_index.LLMs. | 45 |
| abstract_inverted_index.draft | 78 |
| abstract_inverted_index.focus | 27 |
| abstract_inverted_index.input | 96 |
| abstract_inverted_index.large | 9 |
| abstract_inverted_index.token | 97 |
| abstract_inverted_index.while | 94, 169 |
| abstract_inverted_index.work, | 48 |
| abstract_inverted_index.(LLMs) | 12 |
| abstract_inverted_index.12.97% | 168 |
| abstract_inverted_index.50.83% | 173 |
| abstract_inverted_index.PopQA, | 156 |
| abstract_inverted_index.Recent | 24 |
| abstract_inverted_index.counts | 98 |
| abstract_inverted_index.draft. | 100 |
| abstract_inverted_index.drafts | 67 |
| abstract_inverted_index.larger | 59, 130 |
| abstract_inverted_index.method | 117 |
| abstract_inverted_index.models | 11 |
| abstract_inverted_index.single | 135 |
| abstract_inverted_index.subset | 84, 107 |
| abstract_inverted_index.tuning | 43 |
| abstract_inverted_index.verify | 64 |
| abstract_inverted_index.diverse | 89 |
| abstract_inverted_index.drafts. | 140 |
| abstract_inverted_index.latency | 152, 171 |
| abstract_inverted_index.notably | 162 |
| abstract_inverted_index.provide | 18 |
| abstract_inverted_index.reduced | 151 |
| abstract_inverted_index.smaller | 125 |
| abstract_inverted_index.sources | 16 |
| abstract_inverted_index.systems | 178 |
| abstract_inverted_index.through | 32, 40 |
| abstract_inverted_index.MuSiQue, | 155 |
| abstract_inverted_index.accuracy | 164 |
| abstract_inverted_index.accurate | 20 |
| abstract_inverted_index.achieves | 147 |
| abstract_inverted_index.acquired | 39 |
| abstract_inverted_index.approach | 102 |
| abstract_inverted_index.combines | 4 |
| abstract_inverted_index.compared | 174 |
| abstract_inverted_index.context. | 115 |
| abstract_inverted_index.distinct | 83 |
| abstract_inverted_index.drafting | 122 |
| abstract_inverted_index.enhances | 103, 163 |
| abstract_inverted_index.evidence | 93 |
| abstract_inverted_index.external | 14 |
| abstract_inverted_index.language | 10 |
| abstract_inverted_index.multiple | 65 |
| abstract_inverted_index.offering | 88 |
| abstract_inverted_index.outcomes | 31 |
| abstract_inverted_index.parallel | 70 |
| abstract_inverted_index.position | 111 |
| abstract_inverted_index.produced | 68 |
| abstract_inverted_index.reducing | 95, 170 |
| abstract_inverted_index.smaller, | 73 |
| abstract_inverted_index.Extensive | 141 |
| abstract_inverted_index.Retrieval | 0 |
| abstract_inverted_index.TriviaQA, | 154 |
| abstract_inverted_index.abilities | 7 |
| abstract_inverted_index.augmented | 1 |
| abstract_inverted_index.distilled | 74 |
| abstract_inverted_index.framework | 55 |
| abstract_inverted_index.generated | 80 |
| abstract_inverted_index.improving | 29 |
| abstract_inverted_index.introduce | 50 |
| abstract_inverted_index.iterative | 33 |
| abstract_inverted_index.knowledge | 15 |
| abstract_inverted_index.leverages | 57 |
| abstract_inverted_index.mitigates | 109 |
| abstract_inverted_index.potential | 110 |
| abstract_inverted_index.retrieval | 30 |
| abstract_inverted_index.retrieved | 86 |
| abstract_inverted_index.PubHealth, | 157 |
| abstract_inverted_index.PubHealth. | 180 |
| abstract_inverted_index.additional | 41 |
| abstract_inverted_index.delegating | 121 |
| abstract_inverted_index.documents, | 87 |
| abstract_inverted_index.generalist | 60, 131 |
| abstract_inverted_index.generation | 2 |
| abstract_inverted_index.generative | 6 |
| abstract_inverted_index.performing | 133 |
| abstract_inverted_index.refinement | 35 |
| abstract_inverted_index.responses. | 23 |
| abstract_inverted_index.specialist | 75, 126 |
| abstract_inverted_index.up-to-date | 22 |
| abstract_inverted_index.Speculative | 51, 145 |
| abstract_inverted_index.accelerates | 118 |
| abstract_inverted_index.benchmarks. | 160 |
| abstract_inverted_index.demonstrate | 143 |
| abstract_inverted_index.efficiently | 63 |
| abstract_inverted_index.experiments | 142 |
| abstract_inverted_index.instruction | 42 |
| abstract_inverted_index.performance | 149 |
| abstract_inverted_index.advancements | 26 |
| abstract_inverted_index.capabilities | 38 |
| abstract_inverted_index.conventional | 176 |
| abstract_inverted_index.perspectives | 90 |
| abstract_inverted_index.verification | 136 |
| abstract_inverted_index.ARC-Challenge | 159 |
| abstract_inverted_index.comprehension | 104 |
| abstract_inverted_index.self-critique | 37 |
| abstract_inverted_index.state-of-the-art | 148 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 12 |
| citation_normalized_percentile | |
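
The `abstract_inverted_index.*` rows in the payload map each token to the word positions where it occurs in the abstract. A minimal sketch of turning such a mapping back into text is shown below; the function name `rebuild_abstract` is hypothetical, and the sample dictionary is deliberately truncated to the first six positions of the table above rather than the full index.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Place each token at every position it is listed for, then join in order."""
    by_position: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Truncated sample: positions 0 to 5 only, copied from the payload rows.
sample_index = {
    "Retrieval": [0],
    "augmented": [1],
    "generation": [2],
    "(RAG)": [3],
    "combines": [4],
    "the": [5],
}

if __name__ == "__main__":
    print(rebuild_abstract(sample_index))
```

Applied to the complete index, the same function should reproduce the abstract text given at the top of the page, since every position from 0 to 180 appears in exactly one row.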