Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2405.13226
Large language models (LLMs) are commonly trained on datasets consisting of fixed-length token sequences. These datasets are created by randomly concatenating documents of various lengths and then chunking them into sequences of a predetermined target length (concat-and-chunk). Recent attention implementations mask cross-document attention, reducing the effective length of a chunk of tokens. Additionally, training on long sequences becomes computationally prohibitive due to the quadratic cost of attention. In this study, we introduce dataset decomposition, a novel variable sequence length training technique, to tackle these challenges. We decompose a dataset into a union of buckets, each containing sequences of the same size extracted from a unique document. During training, we use variable sequence length and batch-size, sampling simultaneously from all buckets with a curriculum. In contrast to the concat-and-chunk baseline, which incurs a fixed attention cost at every step of training, our proposed method incurs a computational cost proportional to the actual document lengths at each step, resulting in significant savings in training time. We train an 8k context-length 1B model at the same cost as a 2k context-length model trained with the baseline approach. Experiments on a web-scale corpus demonstrate that our approach significantly enhances performance on standard language evaluations and long-context benchmarks, reaching target accuracy with up to 6x faster training compared to the baseline. Our method not only enables efficient pretraining on long sequences but also scales effectively with dataset size. Lastly, we shed light on a critical yet less studied aspect of training large language models: the distribution and curriculum of sequence lengths, which results in a non-negligible difference in performance.
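The bucketing idea in the abstract can be illustrated with a minimal sketch. The abstract only says that each bucket holds same-size sequences, each taken from a single document; the power-of-two split, the `max_len` cap, and the function names `decompose_document`, `build_buckets`, and `attention_cost` are assumptions made here for illustration, not details confirmed by this record. The cost function counts pairwise attention scores only, as a rough proxy for the quadratic term.

```python
from collections import defaultdict


def decompose_document(tokens, max_len=8192):
    """Split one tokenized document into chunks of power-of-two length.

    Assumption: power-of-two chunk sizes and a power-of-two max_len.
    No chunk ever mixes tokens from two documents.
    """
    chunks = []
    start = 0
    remaining = len(tokens)
    while remaining > 0:
        # Largest power of two that fits in the remaining tokens, capped at max_len.
        size = min(1 << (remaining.bit_length() - 1), max_len)
        chunks.append(tokens[start:start + size])
        start += size
        remaining -= size
    return chunks


def build_buckets(documents, max_len=8192):
    """Group chunks from all documents into buckets keyed by chunk length."""
    buckets = defaultdict(list)
    for doc in documents:
        for chunk in decompose_document(doc, max_len):
            buckets[len(chunk)].append(chunk)
    return dict(buckets)


def attention_cost(lengths):
    """Illustrative proxy: number of pairwise attention scores, sum of n^2."""
    return sum(n * n for n in lengths)


if __name__ == "__main__":
    docs = [list(range(13)), list(range(5))]  # two toy "documents" of token ids
    buckets = build_buckets(docs)
    print({size: len(seqs) for size, seqs in sorted(buckets.items())})

    # One 18-token packed chunk versus the decomposed per-document chunks.
    print(attention_cost([18]), attention_cost([8, 4, 1, 4, 1]))
```

Under this proxy, the decomposed chunks cost less than a single packed chunk of the same total length, which is the intuition behind the abstract's claim that cost tracks actual document lengths. Sampling across buckets with a curriculum is not shown.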
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2405.13226, https://arxiv.org/pdf/2405.13226
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4398795109
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4398795109 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2405.13226 (Digital Object Identifier)
- Title: Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-05-21 (full publication date if available)
- Authors: Hadi Pouransari, Chunliang Li, Jen-Hao Rick Chang, Pavan Kumar Anasosalu Vasu, Cem Koc, Vaishaal Shankar, Oncel Tuzel (list of authors in order)
- Landing page: https://arxiv.org/abs/2405.13226 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2405.13226 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2405.13226 (direct OA link when available)
- Concepts: Sequence (biology), Decomposition, Curriculum, Variable (mathematics), Training (meteorology), Computer science, Mathematics education, Artificial intelligence, Psychology, Mathematics, Pedagogy, Geography, Biology, Genetics, Mathematical analysis, Meteorology, Ecology (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4398795109 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2405.13226 |
| ids.doi | https://doi.org/10.48550/arxiv.2405.13226 |
| ids.openalex | https://openalex.org/W4398795109 |
| fwci | |
| type | preprint |
| title | Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T12535 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.8780999779701233 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Machine Learning and Data Classification |
| topics[1].id | https://openalex.org/T10181 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.8252000212669373 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Natural Language Processing Techniques |
| topics[2].id | https://openalex.org/T10028 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.8044999837875366 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Topic Modeling |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2778112365 |
| concepts[0].level | 2 |
| concepts[0].score | 0.6676957011222839 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q3511065 |
| concepts[0].display_name | Sequence (biology) |
| concepts[1].id | https://openalex.org/C124681953 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6557350158691406 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q339062 |
| concepts[1].display_name | Decomposition |
| concepts[2].id | https://openalex.org/C47177190 |
| concepts[2].level | 2 |
| concepts[2].score | 0.599926769733429 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q207137 |
| concepts[2].display_name | Curriculum |
| concepts[3].id | https://openalex.org/C182365436 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5398591756820679 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q50701 |
| concepts[3].display_name | Variable (mathematics) |
| concepts[4].id | https://openalex.org/C2777211547 |
| concepts[4].level | 2 |
| concepts[4].score | 0.531351625919342 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q17141490 |
| concepts[4].display_name | Training (meteorology) |
| concepts[5].id | https://openalex.org/C41008148 |
| concepts[5].level | 0 |
| concepts[5].score | 0.4534866213798523 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[5].display_name | Computer science |
| concepts[6].id | https://openalex.org/C145420912 |
| concepts[6].level | 1 |
| concepts[6].score | 0.3748461604118347 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q853077 |
| concepts[6].display_name | Mathematics education |
| concepts[7].id | https://openalex.org/C154945302 |
| concepts[7].level | 1 |
| concepts[7].score | 0.361674040555954 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[7].display_name | Artificial intelligence |
| concepts[8].id | https://openalex.org/C15744967 |
| concepts[8].level | 0 |
| concepts[8].score | 0.2736992835998535 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[8].display_name | Psychology |
| concepts[9].id | https://openalex.org/C33923547 |
| concepts[9].level | 0 |
| concepts[9].score | 0.22329947352409363 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[9].display_name | Mathematics |
| concepts[10].id | https://openalex.org/C19417346 |
| concepts[10].level | 1 |
| concepts[10].score | 0.14317730069160461 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q7922 |
| concepts[10].display_name | Pedagogy |
| concepts[11].id | https://openalex.org/C205649164 |
| concepts[11].level | 0 |
| concepts[11].score | 0.1308712363243103 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q1071 |
| concepts[11].display_name | Geography |
| concepts[12].id | https://openalex.org/C86803240 |
| concepts[12].level | 0 |
| concepts[12].score | 0.12949803471565247 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[12].display_name | Biology |
| concepts[13].id | https://openalex.org/C54355233 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0827188789844513 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q7162 |
| concepts[13].display_name | Genetics |
| concepts[14].id | https://openalex.org/C134306372 |
| concepts[14].level | 1 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q7754 |
| concepts[14].display_name | Mathematical analysis |
| concepts[15].id | https://openalex.org/C153294291 |
| concepts[15].level | 1 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q25261 |
| concepts[15].display_name | Meteorology |
| concepts[16].id | https://openalex.org/C18903297 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q7150 |
| concepts[16].display_name | Ecology |
| keywords[0].id | https://openalex.org/keywords/sequence |
| keywords[0].score | 0.6676957011222839 |
| keywords[0].display_name | Sequence (biology) |
| keywords[1].id | https://openalex.org/keywords/decomposition |
| keywords[1].score | 0.6557350158691406 |
| keywords[1].display_name | Decomposition |
| keywords[2].id | https://openalex.org/keywords/curriculum |
| keywords[2].score | 0.599926769733429 |
| keywords[2].display_name | Curriculum |
| keywords[3].id | https://openalex.org/keywords/variable |
| keywords[3].score | 0.5398591756820679 |
| keywords[3].display_name | Variable (mathematics) |
| keywords[4].id | https://openalex.org/keywords/training |
| keywords[4].score | 0.531351625919342 |
| keywords[4].display_name | Training (meteorology) |
| keywords[5].id | https://openalex.org/keywords/computer-science |
| keywords[5].score | 0.4534866213798523 |
| keywords[5].display_name | Computer science |
| keywords[6].id | https://openalex.org/keywords/mathematics-education |
| keywords[6].score | 0.3748461604118347 |
| keywords[6].display_name | Mathematics education |
| keywords[7].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[7].score | 0.361674040555954 |
| keywords[7].display_name | Artificial intelligence |
| keywords[8].id | https://openalex.org/keywords/psychology |
| keywords[8].score | 0.2736992835998535 |
| keywords[8].display_name | Psychology |
| keywords[9].id | https://openalex.org/keywords/mathematics |
| keywords[9].score | 0.22329947352409363 |
| keywords[9].display_name | Mathematics |
| keywords[10].id | https://openalex.org/keywords/pedagogy |
| keywords[10].score | 0.14317730069160461 |
| keywords[10].display_name | Pedagogy |
| keywords[11].id | https://openalex.org/keywords/geography |
| keywords[11].score | 0.1308712363243103 |
| keywords[11].display_name | Geography |
| keywords[12].id | https://openalex.org/keywords/biology |
| keywords[12].score | 0.12949803471565247 |
| keywords[12].display_name | Biology |
| keywords[13].id | https://openalex.org/keywords/genetics |
| keywords[13].score | 0.0827188789844513 |
| keywords[13].display_name | Genetics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2405.13226 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2405.13226 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2405.13226 |
| locations[1].id | doi:10.48550/arxiv.2405.13226 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2405.13226 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5059295598 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Hadi Pouransari |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Pouransari, Hadi |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5102972645 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-5938-5510 |
| authorships[1].author.display_name | Chunliang Li |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Li, Chun-Liang |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5066816752 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Jen-Hao Rick Chang |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Chang, Jen-Hao Rick |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5013120724 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Pavan Kumar Anasosalu Vasu |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Vasu, Pavan Kumar Anasosalu |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5104313177 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Cem Koc |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Koc, Cem |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5112327867 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Vaishaal Shankar |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Shankar, Vaishaal |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5028613002 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Oncel Tuzel |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Tuzel, Oncel |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2405.13226 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T12535 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.8780999779701233 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Machine Learning and Data Classification |
| related_works | https://openalex.org/W230091440, https://openalex.org/W2233261550, https://openalex.org/W2810751659, https://openalex.org/W258997015, https://openalex.org/W2997094352, https://openalex.org/W3216976533, https://openalex.org/W100620283, https://openalex.org/W2495260952, https://openalex.org/W4366179611, https://openalex.org/W2996078371 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2405.13226 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2405.13226 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2405.13226 |
| primary_location.id | pmh:oai:arXiv.org:2405.13226 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2405.13226 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2405.13226 |
| publication_date | 2024-05-21 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 32, 48, 74, 87, 90, 103, 121, 131, 144, 175, 186, 238, 259 |
| abstract_inverted_index.1B | 168 |
| abstract_inverted_index.2k | 176 |
| abstract_inverted_index.6x | 209 |
| abstract_inverted_index.8k | 166 |
| abstract_inverted_index.In | 67, 123 |
| abstract_inverted_index.We | 85, 163 |
| abstract_inverted_index.an | 165 |
| abstract_inverted_index.as | 174 |
| abstract_inverted_index.at | 135, 153, 170 |
| abstract_inverted_index.by | 18 |
| abstract_inverted_index.in | 157, 160, 258, 262 |
| abstract_inverted_index.of | 10, 22, 31, 47, 50, 65, 92, 97, 138, 244, 253 |
| abstract_inverted_index.on | 7, 54, 185, 196, 223, 237 |
| abstract_inverted_index.to | 61, 81, 125, 148, 208, 213 |
| abstract_inverted_index.up | 207 |
| abstract_inverted_index.we | 70, 108, 234 |
| abstract_inverted_index.Our | 216 |
| abstract_inverted_index.all | 118 |
| abstract_inverted_index.and | 25, 113, 200, 251 |
| abstract_inverted_index.are | 4, 16 |
| abstract_inverted_index.but | 226 |
| abstract_inverted_index.due | 60 |
| abstract_inverted_index.not | 218 |
| abstract_inverted_index.our | 140, 191 |
| abstract_inverted_index.the | 44, 62, 98, 126, 149, 171, 181, 214, 249 |
| abstract_inverted_index.use | 109 |
| abstract_inverted_index.yet | 240 |
| abstract_inverted_index.also | 227 |
| abstract_inverted_index.cost | 64, 134, 146, 173 |
| abstract_inverted_index.each | 94, 154 |
| abstract_inverted_index.from | 102, 117 |
| abstract_inverted_index.into | 29, 89 |
| abstract_inverted_index.less | 241 |
| abstract_inverted_index.long | 55, 224 |
| abstract_inverted_index.mask | 40 |
| abstract_inverted_index.only | 219 |
| abstract_inverted_index.same | 99, 172 |
| abstract_inverted_index.shed | 235 |
| abstract_inverted_index.size | 100 |
| abstract_inverted_index.step | 137 |
| abstract_inverted_index.that | 190 |
| abstract_inverted_index.them | 28 |
| abstract_inverted_index.then | 26 |
| abstract_inverted_index.this | 68 |
| abstract_inverted_index.with | 120, 180, 206, 230 |
| abstract_inverted_index.Large | 0 |
| abstract_inverted_index.These | 14 |
| abstract_inverted_index.chunk | 49 |
| abstract_inverted_index.every | 136 |
| abstract_inverted_index.fixed | 132 |
| abstract_inverted_index.large | 246 |
| abstract_inverted_index.light | 236 |
| abstract_inverted_index.model | 169, 178 |
| abstract_inverted_index.novel | 75 |
| abstract_inverted_index.size. | 232 |
| abstract_inverted_index.step, | 155 |
| abstract_inverted_index.these | 83 |
| abstract_inverted_index.time. | 162 |
| abstract_inverted_index.token | 12 |
| abstract_inverted_index.train | 164 |
| abstract_inverted_index.union | 91 |
| abstract_inverted_index.which | 129, 256 |
| abstract_inverted_index.(LLMs) | 3 |
| abstract_inverted_index.During | 106 |
| abstract_inverted_index.Recent | 37 |
| abstract_inverted_index.actual | 150 |
| abstract_inverted_index.aspect | 243 |
| abstract_inverted_index.corpus | 188 |
| abstract_inverted_index.faster | 210 |
| abstract_inverted_index.incurs | 130, 143 |
| abstract_inverted_index.length | 35, 46, 78, 112 |
| abstract_inverted_index.method | 142, 217 |
| abstract_inverted_index.models | 2 |
| abstract_inverted_index.scales | 228 |
| abstract_inverted_index.study, | 69 |
| abstract_inverted_index.tackle | 82 |
| abstract_inverted_index.target | 34, 204 |
| abstract_inverted_index.unique | 104 |
| abstract_inverted_index.Lastly, | 233 |
| abstract_inverted_index.becomes | 57 |
| abstract_inverted_index.buckets | 119 |
| abstract_inverted_index.created | 17 |
| abstract_inverted_index.dataset | 72, 88, 231 |
| abstract_inverted_index.enables | 220 |
| abstract_inverted_index.lengths | 24, 152 |
| abstract_inverted_index.models: | 248 |
| abstract_inverted_index.results | 257 |
| abstract_inverted_index.savings | 159 |
| abstract_inverted_index.studied | 242 |
| abstract_inverted_index.tokens. | 51 |
| abstract_inverted_index.trained | 6, 179 |
| abstract_inverted_index.various | 23 |
| abstract_inverted_index.accuracy | 205 |
| abstract_inverted_index.approach | 192 |
| abstract_inverted_index.baseline | 182 |
| abstract_inverted_index.buckets, | 93 |
| abstract_inverted_index.chunking | 27 |
| abstract_inverted_index.commonly | 5 |
| abstract_inverted_index.compared | 212 |
| abstract_inverted_index.contrast | 124 |
| abstract_inverted_index.critical | 239 |
| abstract_inverted_index.datasets | 8, 15 |
| abstract_inverted_index.document | 151 |
| abstract_inverted_index.enhances | 194 |
| abstract_inverted_index.language | 1, 198, 247 |
| abstract_inverted_index.lengths, | 255 |
| abstract_inverted_index.proposed | 141 |
| abstract_inverted_index.randomly | 19 |
| abstract_inverted_index.reaching | 203 |
| abstract_inverted_index.reducing | 43 |
| abstract_inverted_index.sampling | 115 |
| abstract_inverted_index.sequence | 77, 111, 254 |
| abstract_inverted_index.standard | 197 |
| abstract_inverted_index.training | 53, 79, 161, 211, 245 |
| abstract_inverted_index.variable | 76, 110 |
| abstract_inverted_index.approach. | 183 |
| abstract_inverted_index.attention | 38, 133 |
| abstract_inverted_index.baseline, | 128 |
| abstract_inverted_index.baseline. | 215 |
| abstract_inverted_index.decompose | 86 |
| abstract_inverted_index.document. | 105 |
| abstract_inverted_index.documents | 21 |
| abstract_inverted_index.effective | 45 |
| abstract_inverted_index.efficient | 221 |
| abstract_inverted_index.extracted | 101 |
| abstract_inverted_index.introduce | 71 |
| abstract_inverted_index.quadratic | 63 |
| abstract_inverted_index.resulting | 156 |
| abstract_inverted_index.sequences | 30, 56, 96, 225 |
| abstract_inverted_index.training, | 107, 139 |
| abstract_inverted_index.web-scale | 187 |
| abstract_inverted_index.attention, | 42 |
| abstract_inverted_index.attention. | 66 |
| abstract_inverted_index.consisting | 9 |
| abstract_inverted_index.containing | 95 |
| abstract_inverted_index.curriculum | 252 |
| abstract_inverted_index.difference | 261 |
| abstract_inverted_index.sequences. | 13 |
| abstract_inverted_index.technique, | 80 |
| abstract_inverted_index.Experiments | 184 |
| abstract_inverted_index.batch-size, | 114 |
| abstract_inverted_index.benchmarks, | 202 |
| abstract_inverted_index.challenges. | 84 |
| abstract_inverted_index.curriculum. | 122 |
| abstract_inverted_index.demonstrate | 189 |
| abstract_inverted_index.effectively | 229 |
| abstract_inverted_index.evaluations | 199 |
| abstract_inverted_index.performance | 195 |
| abstract_inverted_index.pretraining | 222 |
| abstract_inverted_index.prohibitive | 59 |
| abstract_inverted_index.significant | 158 |
| abstract_inverted_index.distribution | 250 |
| abstract_inverted_index.fixed-length | 11 |
| abstract_inverted_index.long-context | 201 |
| abstract_inverted_index.performance. | 263 |
| abstract_inverted_index.proportional | 147 |
| abstract_inverted_index.Additionally, | 52 |
| abstract_inverted_index.computational | 145 |
| abstract_inverted_index.concatenating | 20 |
| abstract_inverted_index.predetermined | 33 |
| abstract_inverted_index.significantly | 193 |
| abstract_inverted_index.context-length | 167, 177 |
| abstract_inverted_index.cross-document | 41 |
| abstract_inverted_index.decomposition, | 73 |
| abstract_inverted_index.non-negligible | 260 |
| abstract_inverted_index.simultaneously | 116 |
| abstract_inverted_index.computationally | 58 |
| abstract_inverted_index.implementations | 39 |
| abstract_inverted_index.concat-and-chunk | 127 |
| abstract_inverted_index.(concat-and-chunk). | 36 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| citation_normalized_percentile | |
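The `abstract_inverted_index.*` rows above map each word to the positions where it occurs in the abstract. A minimal sketch of turning such rows back into running text follows; the helper names `parse_payload_rows` and `rebuild_abstract` are hypothetical, and the parser assumes the two-column pipe layout of this table rather than any official OpenAlex client.

```python
def parse_payload_rows(rows, prefix="abstract_inverted_index."):
    """Collect {word: [positions]} from '| key | value |' table rows."""
    index = {}
    for row in rows:
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        if len(cells) != 2 or not cells[0].startswith(prefix):
            continue  # skip rows that are not inverted-index entries
        word = cells[0][len(prefix):]
        index[word] = [int(p) for p in cells[1].split(",")]
    return index


def rebuild_abstract(inverted_index):
    """Place every word at each of its positions, then join in position order."""
    by_position = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    return " ".join(by_position[pos] for pos in sorted(by_position))


if __name__ == "__main__":
    # A few rows copied from the table above; the full table yields the full abstract.
    sample_rows = [
        "| abstract_inverted_index.Large | 0 |",
        "| abstract_inverted_index.language | 1, 198, 247 |",
        "| abstract_inverted_index.models | 2 |",
        "| abstract_inverted_index.(LLMs) | 3 |",
        "| abstract_inverted_index.are | 4, 16 |",
        "| abstract_inverted_index.commonly | 5 |",
        "| abstract_inverted_index.trained | 6, 179 |",
        "| language | en |",  # ignored: not an inverted-index row
    ]
    index = parse_payload_rows(sample_rows)
    words = rebuild_abstract(index).split()
    # With only a subset of rows, later positions are sparse, so show the first seven words.
    print(" ".join(words[:7]))
```

With only a handful of rows, positions beyond the opening words are sparse, which is why the example prints just the first seven words.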