Computational Bottlenecks of Training Small-scale Large Language Models
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2410.19456
While large language models (LLMs) dominate the AI landscape, small-scale large language models (SLMs) are gaining attention due to cost and efficiency demands from consumers. However, there is limited research on the training behavior and computational requirements of SLMs. In this study, we explore the computational bottlenecks of training SLMs (up to 2B parameters) by examining the effects of various hyperparameters and configurations, including GPU type, batch size, model size, communication protocol, attention type, and the number of GPUs. We assess these factors on popular cloud services using metrics such as loss per dollar and tokens per second. Our findings aim to support the broader adoption and optimization of language model training for low-resource AI research institutes.
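The abstract compares configurations with cost-oriented metrics such as tokens per second and loss per dollar. The paper's exact formulas are not given on this page, so the following is only a sketch of the underlying arithmetic under stated assumptions: a run is summarized by its token count, wall-clock time, GPU count, and an assumed flat per-GPU hourly cloud price. `TrainingRun`, `tokens_per_second`, `run_cost_usd`, and `tokens_per_dollar` are hypothetical names introduced for illustration, and the numbers in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class TrainingRun:
    """Hypothetical summary of one training run (all fields are illustrative)."""
    tokens_processed: int      # total training tokens consumed
    wall_clock_seconds: float  # elapsed training time
    num_gpus: int              # GPUs rented for the run
    usd_per_gpu_hour: float    # assumed flat on-demand price per GPU


def tokens_per_second(run: TrainingRun) -> float:
    """Aggregate throughput across all GPUs."""
    return run.tokens_processed / run.wall_clock_seconds


def run_cost_usd(run: TrainingRun) -> float:
    """Cost = number of GPUs x hours x hourly price per GPU (assumed pricing model)."""
    hours = run.wall_clock_seconds / 3600.0
    return run.num_gpus * hours * run.usd_per_gpu_hour


def tokens_per_dollar(run: TrainingRun) -> float:
    """How many training tokens each dollar buys under this configuration."""
    return run.tokens_processed / run_cost_usd(run)


if __name__ == "__main__":
    # Made-up numbers, chosen only to illustrate the arithmetic.
    run = TrainingRun(tokens_processed=3_600_000_000, wall_clock_seconds=3600.0,
                      num_gpus=8, usd_per_gpu_hour=2.0)
    print(f"{tokens_per_second(run):,.0f} tokens/s")  # 1,000,000 tokens/s
    print(f"${run_cost_usd(run):.2f}")                 # $16.00
    print(f"{tokens_per_dollar(run):,.0f} tokens/$")   # 225,000,000 tokens/$
```

Because cost grows with both GPU count and hourly price, a configuration with higher raw throughput is not automatically cheaper per token. A loss-per-dollar comparison would additionally need the loss measured at each checkpoint, which this sketch does not model.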
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2410.19456, https://arxiv.org/pdf/2410.19456
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4404312129
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4404312129
- DOI: https://doi.org/10.48550/arxiv.2410.19456
- Title: Computational Bottlenecks of Training Small-scale Large Language Models
- Type: preprint
- Language: en
- Publication year: 2024
- Publication date: 2024-10-25
- Authors: Saleh Ashkboos, Iman Mirzadeh, Keivan Alizadeh, Mohammad Hossein Sekhavat, Moin Nabi, Mehrdad Farajtabar, Fartash Faghri
- Landing page: https://arxiv.org/abs/2410.19456
- PDF URL: https://arxiv.org/pdf/2410.19456
- Open access: Yes
- OA status: green
- OA URL: https://arxiv.org/pdf/2410.19456
- Concepts: Training (meteorology), Scale (ratio), Computer science, Language model, Artificial intelligence, Geography, Cartography, Meteorology
- Cited by: 1
- Citations by year (recent): 2025: 1
- Related works (count): 10
Full payload
| id | https://openalex.org/W4404312129 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2410.19456 |
| ids.doi | https://doi.org/10.48550/arxiv.2410.19456 |
| ids.openalex | https://openalex.org/W4404312129 |
| fwci | |
| type | preprint |
| title | Computational Bottlenecks of Training Small-scale Large Language Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10181 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.935699999332428 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Natural Language Processing Techniques |
| topics[1].id | https://openalex.org/T10028 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9233999848365784 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Topic Modeling |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2777211547 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7229854464530945 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q17141490 |
| concepts[0].display_name | Training (meteorology) |
| concepts[1].id | https://openalex.org/C2778755073 |
| concepts[1].level | 2 |
| concepts[1].score | 0.64008629322052 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q10858537 |
| concepts[1].display_name | Scale (ratio) |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.6160519123077393 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C137293760 |
| concepts[3].level | 2 |
| concepts[3].score | 0.4561835825443268 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q3621696 |
| concepts[3].display_name | Language model |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.3108747601509094 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C205649164 |
| concepts[5].level | 0 |
| concepts[5].score | 0.15019840002059937 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q1071 |
| concepts[5].display_name | Geography |
| concepts[6].id | https://openalex.org/C58640448 |
| concepts[6].level | 1 |
| concepts[6].score | 0.0975010097026825 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q42515 |
| concepts[6].display_name | Cartography |
| concepts[7].id | https://openalex.org/C153294291 |
| concepts[7].level | 1 |
| concepts[7].score | 0.0 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q25261 |
| concepts[7].display_name | Meteorology |
| keywords[0].id | https://openalex.org/keywords/training |
| keywords[0].score | 0.7229854464530945 |
| keywords[0].display_name | Training (meteorology) |
| keywords[1].id | https://openalex.org/keywords/scale |
| keywords[1].score | 0.64008629322052 |
| keywords[1].display_name | Scale (ratio) |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.6160519123077393 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/language-model |
| keywords[3].score | 0.4561835825443268 |
| keywords[3].display_name | Language model |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.3108747601509094 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/geography |
| keywords[5].score | 0.15019840002059937 |
| keywords[5].display_name | Geography |
| keywords[6].id | https://openalex.org/keywords/cartography |
| keywords[6].score | 0.0975010097026825 |
| keywords[6].display_name | Cartography |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2410.19456 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2410.19456 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2410.19456 |
| locations[1].id | doi:10.48550/arxiv.2410.19456 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2410.19456 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5086409515 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-6115-6779 |
| authorships[0].author.display_name | Saleh Ashkboos |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Ashkboos, Saleh |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5079412282 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Iman Mirzadeh |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Mirzadeh, Iman |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5030482460 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Keivan Alizadeh |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Alizadeh, Keivan |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5095886473 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Mohammad Hossein Sekhavat |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Sekhavat, Mohammad Hossein |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5001459748 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-7559-9888 |
| authorships[4].author.display_name | Moin Nabi |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Nabi, Moin |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5050499655 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-5510-518X |
| authorships[5].author.display_name | Mehrdad Farajtabar |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Farajtabar, Mehrdad |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5036601505 |
| authorships[6].author.orcid | https://orcid.org/0000-0001-5975-5158 |
| authorships[6].author.display_name | Fartash Faghri |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Faghri, Fartash |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2410.19456 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Computational Bottlenecks of Training Small-scale Large Language Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10181 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.935699999332428 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Natural Language Processing Techniques |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W230091440, https://openalex.org/W2390279801, https://openalex.org/W2233261550, https://openalex.org/W4391913857, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2810751659 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2410.19456 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2410.19456 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2410.19456 |
| primary_location.id | pmh:oai:arXiv.org:2410.19456 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2410.19456 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2410.19456 |
| publication_date | 2024-10-25 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.2B | 52 |
| abstract_inverted_index.AI | 7, 114 |
| abstract_inverted_index.In | 39 |
| abstract_inverted_index.We | 79 |
| abstract_inverted_index.as | 90 |
| abstract_inverted_index.by | 54 |
| abstract_inverted_index.is | 27 |
| abstract_inverted_index.of | 37, 47, 58, 77, 108 |
| abstract_inverted_index.on | 30, 83 |
| abstract_inverted_index.to | 18, 51, 101 |
| abstract_inverted_index.we | 42 |
| abstract_inverted_index.(up | 50 |
| abstract_inverted_index.GPU | 64 |
| abstract_inverted_index.Our | 98 |
| abstract_inverted_index.aim | 100 |
| abstract_inverted_index.and | 20, 34, 61, 74, 94, 106 |
| abstract_inverted_index.are | 14 |
| abstract_inverted_index.due | 17 |
| abstract_inverted_index.for | 112 |
| abstract_inverted_index.per | 92, 96 |
| abstract_inverted_index.the | 6, 31, 44, 56, 75, 103 |
| abstract_inverted_index.SLMs | 49 |
| abstract_inverted_index.cost | 19 |
| abstract_inverted_index.from | 23 |
| abstract_inverted_index.loss | 91 |
| abstract_inverted_index.such | 89 |
| abstract_inverted_index.this | 40 |
| abstract_inverted_index.GPUs. | 78 |
| abstract_inverted_index.SLMs. | 38 |
| abstract_inverted_index.While | 0 |
| abstract_inverted_index.batch | 66 |
| abstract_inverted_index.cloud | 85 |
| abstract_inverted_index.large | 1, 10 |
| abstract_inverted_index.model | 68, 110 |
| abstract_inverted_index.size, | 67, 69 |
| abstract_inverted_index.there | 26 |
| abstract_inverted_index.these | 81 |
| abstract_inverted_index.type, | 65, 73 |
| abstract_inverted_index.using | 87 |
| abstract_inverted_index.(LLMs) | 4 |
| abstract_inverted_index.(SLMs) | 13 |
| abstract_inverted_index.Models | 12 |
| abstract_inverted_index.assess | 80 |
| abstract_inverted_index.dollar | 93 |
| abstract_inverted_index.models | 3 |
| abstract_inverted_index.number | 76 |
| abstract_inverted_index.study, | 41 |
| abstract_inverted_index.tokens | 95 |
| abstract_inverted_index.broader | 104 |
| abstract_inverted_index.demands | 22 |
| abstract_inverted_index.effects | 57 |
| abstract_inverted_index.explore | 43 |
| abstract_inverted_index.factors | 82 |
| abstract_inverted_index.gaining | 15 |
| abstract_inverted_index.limited | 28 |
| abstract_inverted_index.metrics | 88 |
| abstract_inverted_index.popular | 84 |
| abstract_inverted_index.second. | 97 |
| abstract_inverted_index.support | 102 |
| abstract_inverted_index.various | 59 |
| abstract_inverted_index.However, | 25 |
| abstract_inverted_index.Language | 11 |
| abstract_inverted_index.adoption | 105 |
| abstract_inverted_index.behavior | 33 |
| abstract_inverted_index.dominate | 5 |
| abstract_inverted_index.findings | 99 |
| abstract_inverted_index.language | 2, 109 |
| abstract_inverted_index.research | 29, 115 |
| abstract_inverted_index.services | 86 |
| abstract_inverted_index.training | 32, 48, 111 |
| abstract_inverted_index.attention | 16, 72 |
| abstract_inverted_index.examining | 55 |
| abstract_inverted_index.including | 63 |
| abstract_inverted_index.protocol, | 71 |
| abstract_inverted_index.consumers. | 24 |
| abstract_inverted_index.efficiency | 21 |
| abstract_inverted_index.landscape, | 8 |
| abstract_inverted_index.Small-scale | 9 |
| abstract_inverted_index.bottlenecks | 46 |
| abstract_inverted_index.institutes. | 116 |
| abstract_inverted_index.parameters) | 53 |
| abstract_inverted_index.low-resource | 113 |
| abstract_inverted_index.optimization | 107 |
| abstract_inverted_index.requirements | 36 |
| abstract_inverted_index.communication | 70 |
| abstract_inverted_index.computational | 35, 45 |
| abstract_inverted_index.configurations, | 62 |
| abstract_inverted_index.hyperparameters | 60 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| citation_normalized_percentile | |
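The `abstract_inverted_index.*` rows above store the abstract in OpenAlex's inverted form: each word (with any attached punctuation) maps to the zero-based positions where it occurs, for example `large` at positions 1 and 10. The plain abstract can be recovered by placing every word at its positions and joining the result in order. A minimal sketch, where `reconstruct_abstract` is a hypothetical helper name and the sample is an excerpt of the rows above limited to positions 0 through 8:

```python
from __future__ import annotations


def reconstruct_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild plain text from an OpenAlex-style abstract_inverted_index.

    Each key is a word and each value lists the zero-based positions
    where that word occurs in the abstract.
    """
    if not inverted_index:
        return ""
    # Size the output by the largest position seen.
    length = max(pos for positions in inverted_index.values() for pos in positions) + 1
    words: list[str | None] = [None] * length
    for word, positions in inverted_index.items():
        for pos in positions:
            words[pos] = word
    # Skip any positions that were never filled (e.g. a truncated index).
    return " ".join(w for w in words if w is not None)


if __name__ == "__main__":
    # Excerpt of the rows above, keeping only positions 0-8.
    sample = {
        "While": [0], "large": [1], "language": [2], "models": [3],
        "(LLMs)": [4], "dominate": [5], "the": [6], "AI": [7],
        "landscape,": [8],
    }
    print(reconstruct_abstract(sample))
    # While large language models (LLMs) dominate the AI landscape,
```

Applied to the full set of rows, the same procedure should reproduce the abstract quoted at the top of this record.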