The Stack: 3 TB of permissively licensed source code
· 2022
· Open Access
· DOI: https://doi.org/10.48550/arxiv.2211.15533
Large Language Models (LLMs) play an ever-increasing role in the field of Artificial Intelligence (AI)--not only for natural language processing but also for code understanding and generation. To stimulate open and responsible research on LLMs for code, we introduce The Stack, a 3.1 TB dataset consisting of permissively licensed source code in 30 programming languages. We describe how we collect the full dataset, construct a permissively licensed subset, present a data governance plan, discuss limitations, and show promising results on text2code benchmarks by training 350M-parameter decoders on different Python subsets. We find that (1) near-deduplicating the data significantly boosts performance across all experiments, and (2) it is possible to match previously reported HumanEval and MBPP performance using only permissively licensed data. We make the dataset available at https://hf.co/BigCode, provide a tool called "Am I in The Stack" (https://hf.co/spaces/bigcode/in-the-stack) for developers to search The Stack for copies of their code, and provide a process for code to be removed from the dataset by following the instructions at https://www.bigcode-project.org/docs/about/the-stack/.
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2211.15533, https://arxiv.org/pdf/2211.15533
- OA Status: green
- Cited By: 36
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4310428868
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4310428868 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2211.15533 (Digital Object Identifier)
- Title: The Stack: 3 TB of permissively licensed source code (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2022 (year of publication)
- Publication date: 2022-11-20 (full publication date if available)
- Authors: Denis Kocetkov, Raymond Li, Loubna Ben Allal, Jia Li, Chenghao Mou, Carlos Muñoz Ferrandis, Yacine Jernite, Margaret Mitchell, Sean Hughes, Thomas Wolf, Dzmitry Bahdanau, Leandro von Werra, Harm de Vries (list of authors in order)
- Landing page: https://arxiv.org/abs/2211.15533 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2211.15533 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2211.15533 (direct OA link when available)
- Concepts: Python (programming language), Computer science, Stack (abstract data type), Source code, Call stack, Code (set theory), Construct (python library), Open source, Programming language, Artificial intelligence, Software, Set (abstract data type) (top concepts, fields, or topics attached by OpenAlex)
- Cited by: 36 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 6, 2024: 16, 2023: 14 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4310428868 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2211.15533 |
| ids.doi | https://doi.org/10.48550/arxiv.2211.15533 |
| ids.openalex | https://openalex.org/W4310428868 |
| fwci | |
| type | preprint |
| title | The Stack: 3 TB of permissively licensed source code |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10028 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9807999730110168 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Topic Modeling |
| topics[1].id | https://openalex.org/T10260 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9740999937057495 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1710 |
| topics[1].subfield.display_name | Information Systems |
| topics[1].display_name | Software Engineering Research |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C519991488 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7459749579429626 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q28865 |
| concepts[0].display_name | Python (programming language) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.7222114205360413 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C9395851 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6496925354003906 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q177929 |
| concepts[2].display_name | Stack (abstract data type) |
| concepts[3].id | https://openalex.org/C43126263 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6362005472183228 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q128751 |
| concepts[3].display_name | Source code |
| concepts[4].id | https://openalex.org/C119024030 |
| concepts[4].level | 3 |
| concepts[4].score | 0.5868026614189148 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q759899 |
| concepts[4].display_name | Call stack |
| concepts[5].id | https://openalex.org/C2776760102 |
| concepts[5].level | 3 |
| concepts[5].score | 0.5000715255737305 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q5139990 |
| concepts[5].display_name | Code (set theory) |
| concepts[6].id | https://openalex.org/C2780801425 |
| concepts[6].level | 2 |
| concepts[6].score | 0.47454532980918884 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q5164392 |
| concepts[6].display_name | Construct (python library) |
| concepts[7].id | https://openalex.org/C3018397939 |
| concepts[7].level | 3 |
| concepts[7].score | 0.47281375527381897 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q3644502 |
| concepts[7].display_name | Open source |
| concepts[8].id | https://openalex.org/C199360897 |
| concepts[8].level | 1 |
| concepts[8].score | 0.43752729892730713 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[8].display_name | Programming language |
| concepts[9].id | https://openalex.org/C154945302 |
| concepts[9].level | 1 |
| concepts[9].score | 0.3555911183357239 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[9].display_name | Artificial intelligence |
| concepts[10].id | https://openalex.org/C2777904410 |
| concepts[10].level | 2 |
| concepts[10].score | 0.09871017932891846 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q7397 |
| concepts[10].display_name | Software |
| concepts[11].id | https://openalex.org/C177264268 |
| concepts[11].level | 2 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[11].display_name | Set (abstract data type) |
| keywords[0].id | https://openalex.org/keywords/python |
| keywords[0].score | 0.7459749579429626 |
| keywords[0].display_name | Python (programming language) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.7222114205360413 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/stack |
| keywords[2].score | 0.6496925354003906 |
| keywords[2].display_name | Stack (abstract data type) |
| keywords[3].id | https://openalex.org/keywords/source-code |
| keywords[3].score | 0.6362005472183228 |
| keywords[3].display_name | Source code |
| keywords[4].id | https://openalex.org/keywords/call-stack |
| keywords[4].score | 0.5868026614189148 |
| keywords[4].display_name | Call stack |
| keywords[5].id | https://openalex.org/keywords/code |
| keywords[5].score | 0.5000715255737305 |
| keywords[5].display_name | Code (set theory) |
| keywords[6].id | https://openalex.org/keywords/construct |
| keywords[6].score | 0.47454532980918884 |
| keywords[6].display_name | Construct (python library) |
| keywords[7].id | https://openalex.org/keywords/open-source |
| keywords[7].score | 0.47281375527381897 |
| keywords[7].display_name | Open source |
| keywords[8].id | https://openalex.org/keywords/programming-language |
| keywords[8].score | 0.43752729892730713 |
| keywords[8].display_name | Programming language |
| keywords[9].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[9].score | 0.3555911183357239 |
| keywords[9].display_name | Artificial intelligence |
| keywords[10].id | https://openalex.org/keywords/software |
| keywords[10].score | 0.09871017932891846 |
| keywords[10].display_name | Software |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2211.15533 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2211.15533 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2211.15533 |
| locations[1].id | doi:10.48550/arxiv.2211.15533 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2211.15533 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5068788224 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Denis Kocetkov |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Kocetkov, Denis |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5009823475 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-7957-7798 |
| authorships[1].author.display_name | Raymond Li |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Li, Raymond |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5072906919 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Loubna Ben Allal |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Allal, Loubna Ben |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5100405697 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-6362-4385 |
| authorships[3].author.display_name | Jia Li |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Li, Jia |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5002251783 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Chenghao Mou |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Mou, Chenghao |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5065502511 |
| authorships[5].author.orcid | https://orcid.org/0000-0001-7178-762X |
| authorships[5].author.display_name | Carlos Muñoz Ferrandis |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Ferrandis, Carlos Muñoz |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5000126238 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-8053-6862 |
| authorships[6].author.display_name | Yacine Jernite |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Jernite, Yacine |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5046235098 |
| authorships[7].author.orcid | https://orcid.org/0000-0001-7043-6545 |
| authorships[7].author.display_name | Margaret Mitchell |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Mitchell, Margaret |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5085873515 |
| authorships[8].author.orcid | https://orcid.org/0000-0002-2264-8479 |
| authorships[8].author.display_name | Sean Hughes |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Hughes, Sean |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5078865608 |
| authorships[9].author.orcid | https://orcid.org/0000-0002-7134-7314 |
| authorships[9].author.display_name | Thomas Wolf |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Wolf, Thomas |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5010465328 |
| authorships[10].author.orcid | |
| authorships[10].author.display_name | Dzmitry Bahdanau |
| authorships[10].author_position | middle |
| authorships[10].raw_author_name | Bahdanau, Dzmitry |
| authorships[10].is_corresponding | False |
| authorships[11].author.id | https://openalex.org/A5008355834 |
| authorships[11].author.orcid | |
| authorships[11].author.display_name | Leandro von Werra |
| authorships[11].author_position | middle |
| authorships[11].raw_author_name | von Werra, Leandro |
| authorships[11].is_corresponding | False |
| authorships[12].author.id | https://openalex.org/A5103854790 |
| authorships[12].author.orcid | |
| authorships[12].author.display_name | Harm de Vries |
| authorships[12].author_position | last |
| authorships[12].raw_author_name | de Vries, Harm |
| authorships[12].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2211.15533 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | The Stack: 3 TB of permissively licensed source code |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10028 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9807999730110168 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Topic Modeling |
| related_works | https://openalex.org/W3080200277, https://openalex.org/W2087972928, https://openalex.org/W3015514077, https://openalex.org/W2557718140, https://openalex.org/W2779721357, https://openalex.org/W1527172253, https://openalex.org/W67092138, https://openalex.org/W1968278738, https://openalex.org/W4225687299, https://openalex.org/W3125263037 |
| cited_by_count | 36 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 6 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 16 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 14 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2211.15533 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2211.15533 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2211.15533 |
| primary_location.id | pmh:oai:arXiv.org:2211.15533 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2211.15533 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2211.15533 |
| publication_date | 2022-11-20 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index.I | 133 |
| abstract_inverted_index.a | 41, 64, 69, 129, 151 |
| abstract_inverted_index.30 | 52 |
| abstract_inverted_index.TB | 43 |
| abstract_inverted_index.To | 27 |
| abstract_inverted_index.We | 55, 90, 121 |
| abstract_inverted_index.an | 5 |
| abstract_inverted_index.at | 126, 165 |
| abstract_inverted_index.be | 156 |
| abstract_inverted_index.by | 82, 161 |
| abstract_inverted_index.in | 8, 51, 134 |
| abstract_inverted_index.is | 106 |
| abstract_inverted_index.it | 105 |
| abstract_inverted_index.of | 11, 46, 146 |
| abstract_inverted_index.on | 33, 79, 86 |
| abstract_inverted_index.to | 108, 140, 155 |
| abstract_inverted_index.we | 37, 58 |
| abstract_inverted_index."Am | 132 |
| abstract_inverted_index.(1) | 93 |
| abstract_inverted_index.(2) | 104 |
| abstract_inverted_index.3.1 | 42 |
| abstract_inverted_index.The | 39, 135, 142 |
| abstract_inverted_index.all | 101 |
| abstract_inverted_index.and | 25, 30, 75, 103, 113, 149 |
| abstract_inverted_index.but | 20 |
| abstract_inverted_index.for | 16, 22, 35, 138, 144, 153 |
| abstract_inverted_index.how | 57 |
| abstract_inverted_index.the | 9, 60, 95, 123, 159, 163 |
| abstract_inverted_index.LLMs | 34 |
| abstract_inverted_index.MBPP | 114 |
| abstract_inverted_index.also | 21 |
| abstract_inverted_index.code | 23, 50, 154 |
| abstract_inverted_index.data | 70, 96 |
| abstract_inverted_index.find | 91 |
| abstract_inverted_index.from | 158 |
| abstract_inverted_index.full | 61 |
| abstract_inverted_index.make | 122 |
| abstract_inverted_index.only | 15, 117 |
| abstract_inverted_index.open | 29 |
| abstract_inverted_index.play | 4 |
| abstract_inverted_index.role | 7 |
| abstract_inverted_index.show | 76 |
| abstract_inverted_index.that | 92 |
| abstract_inverted_index.tool | 130 |
| abstract_inverted_index.Large | 0 |
| abstract_inverted_index.Stack | 143 |
| abstract_inverted_index.code, | 36, 148 |
| abstract_inverted_index.data. | 120 |
| abstract_inverted_index.field | 10 |
| abstract_inverted_index.match | 109 |
| abstract_inverted_index.plan, | 72 |
| abstract_inverted_index.their | 147 |
| abstract_inverted_index.using | 116 |
| abstract_inverted_index.(LLMs) | 3 |
| abstract_inverted_index.Models | 2 |
| abstract_inverted_index.Python | 88 |
| abstract_inverted_index.Stack" | 136 |
| abstract_inverted_index.Stack, | 40 |
| abstract_inverted_index.across | 100 |
| abstract_inverted_index.boosts | 98 |
| abstract_inverted_index.called | 131 |
| abstract_inverted_index.copies | 145 |
| abstract_inverted_index.search | 141 |
| abstract_inverted_index.source | 49 |
| abstract_inverted_index.collect | 59 |
| abstract_inverted_index.dataset | 44, 124, 160 |
| abstract_inverted_index.discuss | 73 |
| abstract_inverted_index.natural | 17 |
| abstract_inverted_index.present | 68 |
| abstract_inverted_index.process | 152 |
| abstract_inverted_index.provide | 128, 150 |
| abstract_inverted_index.removed | 157 |
| abstract_inverted_index.results | 78 |
| abstract_inverted_index.subset, | 67 |
| abstract_inverted_index.Language | 1 |
| abstract_inverted_index.dataset, | 62 |
| abstract_inverted_index.decoders | 85 |
| abstract_inverted_index.describe | 56 |
| abstract_inverted_index.language | 18 |
| abstract_inverted_index.licensed | 48, 66, 119 |
| abstract_inverted_index.possible | 107 |
| abstract_inverted_index.reported | 111 |
| abstract_inverted_index.research | 32 |
| abstract_inverted_index.subsets. | 89 |
| abstract_inverted_index.training | 83 |
| abstract_inverted_index.(AI)--not | 14 |
| abstract_inverted_index.HumanEval | 112 |
| abstract_inverted_index.available | 125 |
| abstract_inverted_index.construct | 63 |
| abstract_inverted_index.different | 87 |
| abstract_inverted_index.following | 162 |
| abstract_inverted_index.introduce | 38 |
| abstract_inverted_index.promising | 77 |
| abstract_inverted_index.stimulate | 28 |
| abstract_inverted_index.text2code | 80 |
| abstract_inverted_index.Artificial | 12 |
| abstract_inverted_index.benchmarks | 81 |
| abstract_inverted_index.consisting | 45 |
| abstract_inverted_index.developers | 139 |
| abstract_inverted_index.governance | 71 |
| abstract_inverted_index.languages. | 54 |
| abstract_inverted_index.previously | 110 |
| abstract_inverted_index.processing | 19 |
| abstract_inverted_index.generation. | 26 |
| abstract_inverted_index.performance | 99, 115 |
| abstract_inverted_index.programming | 53 |
| abstract_inverted_index.responsible | 31 |
| abstract_inverted_index.Intelligence | 13 |
| abstract_inverted_index.experiments, | 102 |
| abstract_inverted_index.instructions | 164 |
| abstract_inverted_index.limitations, | 74 |
| abstract_inverted_index.permissively | 47, 65, 118 |
| abstract_inverted_index.significantly | 97 |
| abstract_inverted_index.understanding | 24 |
| abstract_inverted_index.350M-parameter | 84 |
| abstract_inverted_index.ever-increasing | 6 |
| abstract_inverted_index.near-deduplicating | 94 |
| abstract_inverted_index.https://hf.co/BigCode, | 127 |
| abstract_inverted_index.(https://hf.co/spaces/bigcode/in-the-stack) | 137 |
| abstract_inverted_index.https://www.bigcode-project.org/docs/about/the-stack/. | 166 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 13 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.5099999904632568 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile | |
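
The `abstract_inverted_index.*` rows in the payload above store the abstract as a mapping from each token to the word positions where it occurs. The following is a minimal Python sketch of how such rows could be turned back into running text. It assumes each row has the two-cell `| key | positions |` shape shown in the table; the helper names `parse_rows` and `rebuild_abstract` are hypothetical and not part of any OpenAlex tooling, and the sample rows are a short excerpt of the table limited to tokens that occur once.

```python
from typing import Dict, Iterable, List

# Key prefix used by the flattened payload table (taken from the rows above).
PREFIX = "abstract_inverted_index."


def parse_rows(rows: Iterable[str]) -> Dict[str, List[int]]:
    """Collect token -> positions from '| key | p1, p2 |' table rows."""
    index: Dict[str, List[int]] = {}
    for row in rows:
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        # Skip rows that are not two-cell inverted-index entries.
        if len(cells) != 2 or not cells[0].startswith(PREFIX):
            continue
        token = cells[0][len(PREFIX):]
        index[token] = [int(pos) for pos in cells[1].split(",")]
    return index


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Place every token at each of its positions and join in order."""
    by_position: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt of the table covering word positions 0 through 7.
sample_rows = [
    "| abstract_inverted_index.an | 5 |",
    "| abstract_inverted_index.play | 4 |",
    "| abstract_inverted_index.role | 7 |",
    "| abstract_inverted_index.Large | 0 |",
    "| abstract_inverted_index.(LLMs) | 3 |",
    "| abstract_inverted_index.Models | 2 |",
    "| abstract_inverted_index.Language | 1 |",
    "| abstract_inverted_index.ever-increasing | 6 |",
]

print(rebuild_abstract(parse_rows(sample_rows)))
# prints: Large Language Models (LLMs) play an ever-increasing role
```

Feeding every `abstract_inverted_index.*` row of the table to `parse_rows` should reproduce the full abstract, since each word position appears under exactly one token.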