Effective Long-Context Scaling of Foundation Models
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2309.16039
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens. Our model series is built through continual pretraining from Llama 2 with longer training sequences and on a dataset where long texts are upsampled. We perform extensive evaluation on language modeling, synthetic context probing tasks, and a wide range of research benchmarks. On research benchmarks, our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2. Notably, with a cost-effective instruction tuning procedure that does not require human-annotated long instruction data, the 70B variant can already surpass gpt-3.5-turbo-16k's overall performance on a suite of long-context tasks. Alongside these results, we provide an in-depth analysis of the individual components of our method. We delve into Llama's position encodings and discuss their limitations in modeling long dependencies. We also examine the impact of various design choices in the pretraining process, including the data mix and the training curriculum of sequence lengths. Our ablation experiments suggest that having abundant long texts in the pretraining dataset is not the key to achieving strong performance, and we empirically verify that long-context continual pretraining is more efficient than, and similarly effective to, pretraining from scratch with long sequences.
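The full payload further down stores this abstract as an inverted index: each `abstract_inverted_index.<token>` row maps a token to the word positions where it occurs. As a minimal sketch of how such a mapping can be turned back into running text, the function below (the name `rebuild_abstract` and the `sample` dictionary are illustrative, not part of OpenAlex) sorts every (position, token) pair and joins the tokens. The sample holds only the payload entries for positions 0 to 16, so it covers the first sentence alone.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild plain text from a token -> positions mapping."""
    # Flatten the mapping into (position, token) pairs.
    positioned = [
        (pos, token)
        for token, positions in inverted_index.items()
        for pos in positions
    ]
    # Sorting by position restores the original word order.
    return " ".join(token for _, token in sorted(positioned))


# Hypothetical sample: entries copied from the payload rows, trimmed to
# positions 0-16 (the first sentence of the abstract).
sample = {
    "We": [0],
    "present": [1],
    "a": [2],
    "series": [3],
    "of": [4, 12],
    "long-context": [5],
    "LLMs": [6],
    "that": [7],
    "support": [8],
    "effective": [9],
    "context": [10],
    "windows": [11],
    "up": [13],
    "to": [14],
    "32,768": [15],
    "tokens.": [16],
}

print(rebuild_abstract(sample))
```

Punctuation stays attached to its token in this representation (for example `tokens.`), so joining on single spaces is enough to recover the sentence; line breaks and other original whitespace are not recoverable.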
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2309.16039, https://arxiv.org/pdf/2309.16039
- OA Status: green
- Cited By: 8
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4387210426
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4387210426 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2309.16039 (Digital Object Identifier)
- Title: Effective Long-Context Scaling of Foundation Models (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-09-27 (full publication date if available)
- Authors: Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oğuz, Madian Khabsa, Fang Han, Yashar Mehdad, Sharan Narang, Kshitiz Malik, Angela Fan, Shruti Bhosale, Sergey Edunov, Mike Lewis, Sinong Wang, Hao Ma (list of authors in order)
- Landing page: https://arxiv.org/abs/2309.16039 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2309.16039 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2309.16039 (direct OA link when available)
- Concepts: Computer science, Context (archaeology), Suite, Process (computing), Key (lock), Scratch, Language model, Artificial intelligence, Machine learning, Human–computer interaction, Natural language processing, Programming language, History, Computer security, Archaeology, Biology, Paleontology (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 8 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 4, 2024: 3, 2023: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4387210426 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2309.16039 |
| ids.doi | https://doi.org/10.48550/arxiv.2309.16039 |
| ids.openalex | https://openalex.org/W4387210426 |
| fwci | |
| type | preprint |
| title | Effective Long-Context Scaling of Foundation Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10028 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9997000098228455 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Topic Modeling |
| topics[1].id | https://openalex.org/T10181 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9994999766349792 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Natural Language Processing Techniques |
| topics[2].id | https://openalex.org/T10201 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9944999814033508 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Speech Recognition and Synthesis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7979388236999512 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C2779343474 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7778964042663574 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q3109175 |
| concepts[1].display_name | Context (archaeology) |
| concepts[2].id | https://openalex.org/C79581498 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5169126987457275 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1367530 |
| concepts[2].display_name | Suite |
| concepts[3].id | https://openalex.org/C98045186 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5153676867485046 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q205663 |
| concepts[3].display_name | Process (computing) |
| concepts[4].id | https://openalex.org/C26517878 |
| concepts[4].level | 2 |
| concepts[4].score | 0.4826424717903137 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q228039 |
| concepts[4].display_name | Key (lock) |
| concepts[5].id | https://openalex.org/C2781235140 |
| concepts[5].level | 2 |
| concepts[5].score | 0.4303893446922302 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q275131 |
| concepts[5].display_name | Scratch |
| concepts[6].id | https://openalex.org/C137293760 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4237111210823059 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q3621696 |
| concepts[6].display_name | Language model |
| concepts[7].id | https://openalex.org/C154945302 |
| concepts[7].level | 1 |
| concepts[7].score | 0.381021648645401 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[7].display_name | Artificial intelligence |
| concepts[8].id | https://openalex.org/C119857082 |
| concepts[8].level | 1 |
| concepts[8].score | 0.36574089527130127 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[8].display_name | Machine learning |
| concepts[9].id | https://openalex.org/C107457646 |
| concepts[9].level | 1 |
| concepts[9].score | 0.35566890239715576 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q207434 |
| concepts[9].display_name | Human–computer interaction |
| concepts[10].id | https://openalex.org/C204321447 |
| concepts[10].level | 1 |
| concepts[10].score | 0.35303211212158203 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[10].display_name | Natural language processing |
| concepts[11].id | https://openalex.org/C199360897 |
| concepts[11].level | 1 |
| concepts[11].score | 0.1322130560874939 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[11].display_name | Programming language |
| concepts[12].id | https://openalex.org/C95457728 |
| concepts[12].level | 0 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q309 |
| concepts[12].display_name | History |
| concepts[13].id | https://openalex.org/C38652104 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q3510521 |
| concepts[13].display_name | Computer security |
| concepts[14].id | https://openalex.org/C166957645 |
| concepts[14].level | 1 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q23498 |
| concepts[14].display_name | Archaeology |
| concepts[15].id | https://openalex.org/C86803240 |
| concepts[15].level | 0 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[15].display_name | Biology |
| concepts[16].id | https://openalex.org/C151730666 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q7205 |
| concepts[16].display_name | Paleontology |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7979388236999512 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/context |
| keywords[1].score | 0.7778964042663574 |
| keywords[1].display_name | Context (archaeology) |
| keywords[2].id | https://openalex.org/keywords/suite |
| keywords[2].score | 0.5169126987457275 |
| keywords[2].display_name | Suite |
| keywords[3].id | https://openalex.org/keywords/process |
| keywords[3].score | 0.5153676867485046 |
| keywords[3].display_name | Process (computing) |
| keywords[4].id | https://openalex.org/keywords/key |
| keywords[4].score | 0.4826424717903137 |
| keywords[4].display_name | Key (lock) |
| keywords[5].id | https://openalex.org/keywords/scratch |
| keywords[5].score | 0.4303893446922302 |
| keywords[5].display_name | Scratch |
| keywords[6].id | https://openalex.org/keywords/language-model |
| keywords[6].score | 0.4237111210823059 |
| keywords[6].display_name | Language model |
| keywords[7].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[7].score | 0.381021648645401 |
| keywords[7].display_name | Artificial intelligence |
| keywords[8].id | https://openalex.org/keywords/machine-learning |
| keywords[8].score | 0.36574089527130127 |
| keywords[8].display_name | Machine learning |
| keywords[9].id | https://openalex.org/keywords/human–computer-interaction |
| keywords[9].score | 0.35566890239715576 |
| keywords[9].display_name | Human–computer interaction |
| keywords[10].id | https://openalex.org/keywords/natural-language-processing |
| keywords[10].score | 0.35303211212158203 |
| keywords[10].display_name | Natural language processing |
| keywords[11].id | https://openalex.org/keywords/programming-language |
| keywords[11].score | 0.1322130560874939 |
| keywords[11].display_name | Programming language |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2309.16039 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://arxiv.org/pdf/2309.16039 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2309.16039 |
| locations[1].id | doi:10.48550/arxiv.2309.16039 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2309.16039 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5110635444 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Wenhan Xiong |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Xiong, Wenhan |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5100424755 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-2332-5062 |
| authorships[1].author.display_name | Jingyu Liu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Liu, Jingyu |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5041560478 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-7251-3532 |
| authorships[2].author.display_name | Igor Molybog |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Molybog, Igor |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5047725424 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-5222-5277 |
| authorships[3].author.display_name | Hejia Zhang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Zhang, Hejia |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5028858668 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-7315-548X |
| authorships[4].author.display_name | Prajjwal Bhargava |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Bhargava, Prajjwal |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5103219508 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-1320-1894 |
| authorships[5].author.display_name | Rui Hou |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Hou, Rui |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5109434168 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Louis Martin |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Martin, Louis |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5035660188 |
| authorships[7].author.orcid | |
| authorships[7].author.display_name | Rashi Rungta |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Rungta, Rashi |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5056358363 |
| authorships[8].author.orcid | https://orcid.org/0000-0002-8569-5694 |
| authorships[8].author.display_name | Karthik Abinav Sankararaman |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Sankararaman, Karthik Abinav |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5071728146 |
| authorships[9].author.orcid | |
| authorships[9].author.display_name | Barlas Oğuz |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Oguz, Barlas |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5054253075 |
| authorships[10].author.orcid | |
| authorships[10].author.display_name | Madian Khabsa |
| authorships[10].author_position | middle |
| authorships[10].raw_author_name | Khabsa, Madian |
| authorships[10].is_corresponding | False |
| authorships[11].author.id | https://openalex.org/A5049734483 |
| authorships[11].author.orcid | https://orcid.org/0000-0002-3052-9197 |
| authorships[11].author.display_name | Fang Han |
| authorships[11].author_position | middle |
| authorships[11].raw_author_name | Fang, Han |
| authorships[11].is_corresponding | False |
| authorships[12].author.id | https://openalex.org/A5067997502 |
| authorships[12].author.orcid | |
| authorships[12].author.display_name | Yashar Mehdad |
| authorships[12].author_position | middle |
| authorships[12].raw_author_name | Mehdad, Yashar |
| authorships[12].is_corresponding | False |
| authorships[13].author.id | https://openalex.org/A5079540764 |
| authorships[13].author.orcid | |
| authorships[13].author.display_name | Sharan Narang |
| authorships[13].author_position | middle |
| authorships[13].raw_author_name | Narang, Sharan |
| authorships[13].is_corresponding | False |
| authorships[14].author.id | https://openalex.org/A5112754279 |
| authorships[14].author.orcid | |
| authorships[14].author.display_name | Kshitiz Malik |
| authorships[14].author_position | middle |
| authorships[14].raw_author_name | Malik, Kshitiz |
| authorships[14].is_corresponding | False |
| authorships[15].author.id | https://openalex.org/A5083185771 |
| authorships[15].author.orcid | https://orcid.org/0000-0002-2500-3019 |
| authorships[15].author.display_name | Angela Fan |
| authorships[15].author_position | middle |
| authorships[15].raw_author_name | Fan, Angela |
| authorships[15].is_corresponding | False |
| authorships[16].author.id | https://openalex.org/A5065321401 |
| authorships[16].author.orcid | |
| authorships[16].author.display_name | Shruti Bhosale |
| authorships[16].author_position | middle |
| authorships[16].raw_author_name | Bhosale, Shruti |
| authorships[16].is_corresponding | False |
| authorships[17].author.id | https://openalex.org/A5016113002 |
| authorships[17].author.orcid | |
| authorships[17].author.display_name | Sergey Edunov |
| authorships[17].author_position | middle |
| authorships[17].raw_author_name | Edunov, Sergey |
| authorships[17].is_corresponding | False |
| authorships[18].author.id | https://openalex.org/A5004412943 |
| authorships[18].author.orcid | https://orcid.org/0000-0003-0679-6612 |
| authorships[18].author.display_name | Mike Lewis |
| authorships[18].author_position | middle |
| authorships[18].raw_author_name | Lewis, Mike |
| authorships[18].is_corresponding | False |
| authorships[19].author.id | https://openalex.org/A5049450075 |
| authorships[19].author.orcid | https://orcid.org/0009-0008-5329-9620 |
| authorships[19].author.display_name | Sinong Wang |
| authorships[19].author_position | middle |
| authorships[19].raw_author_name | Wang, Sinong |
| authorships[19].is_corresponding | False |
| authorships[20].author.id | https://openalex.org/A5100452334 |
| authorships[20].author.orcid | https://orcid.org/0000-0001-8585-6665 |
| authorships[20].author.display_name | Hao Ma |
| authorships[20].author_position | last |
| authorships[20].raw_author_name | Ma, Hao |
| authorships[20].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2309.16039 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Effective Long-Context Scaling of Foundation Models |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10028 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9997000098228455 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Topic Modeling |
| related_works | https://openalex.org/W2475116013, https://openalex.org/W2770018148, https://openalex.org/W2358308169, https://openalex.org/W2385135707, https://openalex.org/W2140315382, https://openalex.org/W2059109728, https://openalex.org/W322691623, https://openalex.org/W2494989134, https://openalex.org/W2509444723, https://openalex.org/W2004958254 |
| cited_by_count | 8 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 4 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 3 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2309.16039 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2309.16039 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2309.16039 |
| primary_location.id | pmh:oai:arXiv.org:2309.16039 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://arxiv.org/pdf/2309.16039 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2309.16039 |
| publication_date | 2023-09-27 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.2 | 27 |
| abstract_inverted_index.a | 2, 34, 53, 82, 105 |
| abstract_inverted_index.-- | 163 |
| abstract_inverted_index.2. | 79 |
| abstract_inverted_index.On | 59 |
| abstract_inverted_index.We | 0, 41, 125, 139 |
| abstract_inverted_index.an | 115 |
| abstract_inverted_index.in | 135, 148, 173 |
| abstract_inverted_index.is | 177, 194 |
| abstract_inverted_index.of | 4, 12, 56, 107, 122, 144, 160 |
| abstract_inverted_index.on | 33, 45, 67, 74, 104, 118 |
| abstract_inverted_index.to | 14, 181, 201 |
| abstract_inverted_index.up | 13 |
| abstract_inverted_index.we | 113, 186 |
| abstract_inverted_index.70B | 96 |
| abstract_inverted_index.Our | 17 |
| abstract_inverted_index.and | 32, 52, 71, 131, 156, 185, 197 |
| abstract_inverted_index.are | 20, 39 |
| abstract_inverted_index.can | 98 |
| abstract_inverted_index.its | 133 |
| abstract_inverted_index.key | 180 |
| abstract_inverted_index.mix | 155 |
| abstract_inverted_index.not | 89, 178 |
| abstract_inverted_index.our | 62, 123, 164 |
| abstract_inverted_index.the | 95, 119, 142, 149, 153, 157, 174, 179 |
| abstract_inverted_index.LLMs | 6 |
| abstract_inverted_index.also | 140 |
| abstract_inverted_index.data | 154 |
| abstract_inverted_index.does | 88 |
| abstract_inverted_index.from | 25, 203 |
| abstract_inverted_index.into | 127 |
| abstract_inverted_index.long | 37, 92, 137, 171, 190, 206 |
| abstract_inverted_index.more | 195 |
| abstract_inverted_index.most | 68 |
| abstract_inverted_index.over | 77 |
| abstract_inverted_index.that | 7, 87, 168, 189 |
| abstract_inverted_index.wide | 54 |
| abstract_inverted_index.with | 28, 81, 205 |
| abstract_inverted_index.Llama | 26, 78 |
| abstract_inverted_index.built | 21 |
| abstract_inverted_index.data, | 94 |
| abstract_inverted_index.delve | 126 |
| abstract_inverted_index.model | 18 |
| abstract_inverted_index.range | 55 |
| abstract_inverted_index.suite | 106 |
| abstract_inverted_index.tasks | 70, 76 |
| abstract_inverted_index.texts | 38, 172 |
| abstract_inverted_index.these | 111 |
| abstract_inverted_index.where | 36 |
| abstract_inverted_index.32,768 | 15 |
| abstract_inverted_index.design | 146 |
| abstract_inverted_index.having | 169 |
| abstract_inverted_index.impact | 143 |
| abstract_inverted_index.longer | 29 |
| abstract_inverted_index.models | 63 |
| abstract_inverted_index.series | 3, 19 |
| abstract_inverted_index.strong | 183 |
| abstract_inverted_index.tasks, | 51 |
| abstract_inverted_index.tasks. | 109 |
| abstract_inverted_index.tuning | 85 |
| abstract_inverted_index.verify | 188 |
| abstract_inverted_index.Llama's | 128 |
| abstract_inverted_index.achieve | 64 |
| abstract_inverted_index.already | 99 |
| abstract_inverted_index.choices | 147 |
| abstract_inverted_index.context | 10, 49, 191 |
| abstract_inverted_index.dataset | 35, 176 |
| abstract_inverted_index.discuss | 132 |
| abstract_inverted_index.examine | 141 |
| abstract_inverted_index.lengths | 162 |
| abstract_inverted_index.method. | 124 |
| abstract_inverted_index.overall | 102 |
| abstract_inverted_index.perform | 42 |
| abstract_inverted_index.present | 1 |
| abstract_inverted_index.probing | 50 |
| abstract_inverted_index.provide | 114 |
| abstract_inverted_index.regular | 69 |
| abstract_inverted_index.require | 90 |
| abstract_inverted_index.scratch | 204 |
| abstract_inverted_index.suggest | 167 |
| abstract_inverted_index.support | 8 |
| abstract_inverted_index.surpass | 100 |
| abstract_inverted_index.through | 22 |
| abstract_inverted_index.tokens. | 16 |
| abstract_inverted_index.variant | 97 |
| abstract_inverted_index.various | 145 |
| abstract_inverted_index.windows | 11 |
| abstract_inverted_index.Notably, | 80 |
| abstract_inverted_index.ablation | 165 |
| abstract_inverted_index.abundant | 170 |
| abstract_inverted_index.analysis | 117 |
| abstract_inverted_index.compared | 200 |
| abstract_inverted_index.in-depth | 116 |
| abstract_inverted_index.language | 46 |
| abstract_inverted_index.modeling | 136 |
| abstract_inverted_index.position | 129 |
| abstract_inverted_index.pretrain | 175 |
| abstract_inverted_index.process, | 151 |
| abstract_inverted_index.research | 57, 60 |
| abstract_inverted_index.results, | 112 |
| abstract_inverted_index.sequence | 161 |
| abstract_inverted_index.training | 30, 158 |
| abstract_inverted_index.Alongside | 110 |
| abstract_inverted_index.achieving | 182 |
| abstract_inverted_index.continual | 23, 192 |
| abstract_inverted_index.effective | 9, 199 |
| abstract_inverted_index.efficient | 196 |
| abstract_inverted_index.encodings | 130 |
| abstract_inverted_index.extensive | 43 |
| abstract_inverted_index.including | 152 |
| abstract_inverted_index.modeling, | 47 |
| abstract_inverted_index.procedure | 86 |
| abstract_inverted_index.sequences | 31 |
| abstract_inverted_index.similarly | 198 |
| abstract_inverted_index.synthetic | 48 |
| abstract_inverted_index.components | 121 |
| abstract_inverted_index.consistent | 65 |
| abstract_inverted_index.curriculum | 159 |
| abstract_inverted_index.evaluation | 44 |
| abstract_inverted_index.individual | 120 |
| abstract_inverted_index.limitation | 134 |
| abstract_inverted_index.sequences. | 207 |
| abstract_inverted_index.upsampled. | 40 |
| abstract_inverted_index.benchmarks, | 61 |
| abstract_inverted_index.benchmarks. | 58 |
| abstract_inverted_index.empirically | 187 |
| abstract_inverted_index.experiments | 166 |
| abstract_inverted_index.instruction | 84, 93 |
| abstract_inverted_index.performance | 103 |
| abstract_inverted_index.pretraining | 24, 150, 193, 202 |
| abstract_inverted_index.significant | 72 |
| abstract_inverted_index.improvements | 66, 73 |
| abstract_inverted_index.long-context | 5, 75, 108 |
| abstract_inverted_index.performance, | 184 |
| abstract_inverted_index.dependencies. | 138 |
| abstract_inverted_index.cost-effective | 83 |
| abstract_inverted_index.human-annotated | 91 |
| abstract_inverted_index.gpt-3.5-turbo-16k's | 101 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 21 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.7300000190734863 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile |