Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2502.01804
Machine learning models are routinely trained on a mixture of different data domains. Different domain weights yield very different downstream performance. We propose the Soup-of-Experts, a novel architecture that can instantiate a model at test time for any domain weights with minimal computational cost and without re-training the model. Our architecture consists of a bank of expert parameters, which are linearly combined to instantiate one model. We learn the linear combination coefficients as a function of the input domain weights. To train this architecture, we sample random domain weights, instantiate the corresponding model, and backpropagate through one batch of data sampled with these domain weights. We demonstrate that our approach quickly obtains small specialized models on several language modeling tasks. Soup-of-Experts are particularly appealing when one needs to ship many different specialist models quickly under a model size constraint.
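The instantiation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sizes, the linear `routing` map from domain weights to coefficients, the Dirichlet-style sampling of domain weights, and all function names are assumptions made for illustration, and the training loop (backpropagating through a batch sampled with the same domain weights) is omitted.

```python
import random

N_DOMAINS = 4   # number of data domains (hypothetical)
N_EXPERTS = 3   # number of experts in the bank (hypothetical)
N_PARAMS = 5    # size of one flattened parameter vector (hypothetical)


def sample_domain_weights(n_domains, rng):
    """Sample random domain weights that are positive and sum to 1.

    Normalizing Gamma(1, 1) draws gives a uniform Dirichlet sample;
    this choice of distribution is an assumption.
    """
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(n_domains)]
    total = sum(draws)
    return [d / total for d in draws]


def mixing_coefficients(domain_weights, routing):
    """Map domain weights to one coefficient per expert.

    `routing[e]` holds the weights of expert e over the domains. A plain
    linear map is assumed here; the learned function could be richer.
    """
    return [sum(w * r for w, r in zip(domain_weights, row)) for row in routing]


def instantiate(domain_weights, routing, experts):
    """Linearly combine the expert bank into a single parameter vector."""
    coeffs = mixing_coefficients(domain_weights, routing)
    n_params = len(experts[0])
    return [
        sum(c * expert[j] for c, expert in zip(coeffs, experts))
        for j in range(n_params)
    ]


rng = random.Random(0)
experts = [[rng.gauss(0.0, 1.0) for _ in range(N_PARAMS)] for _ in range(N_EXPERTS)]
routing = [[rng.gauss(0.0, 1.0) for _ in range(N_DOMAINS)] for _ in range(N_EXPERTS)]

h = sample_domain_weights(N_DOMAINS, rng)   # random domain weights
theta = instantiate(h, routing, experts)    # parameters of one specialist model
```

In this sketch the instantiated model has `N_PARAMS` parameters regardless of the number of experts, which is the property that makes the approach attractive under a model size constraint.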
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2502.01804, https://arxiv.org/pdf/2502.01804
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4407184882
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4407184882 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2502.01804 (Digital Object Identifier)
- Title: Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-02-03 (full publication date if available)
- Authors: Pierre Ablin, Angelos Katharopoulos, Skyler Seto, David Grangier (list of authors in order)
- Landing page: https://arxiv.org/abs/2502.01804 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2502.01804 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2502.01804 (direct OA link when available)
- Concepts: Psychology, Computer science, Artificial intelligence (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4407184882 |
| doi | https://doi.org/10.48550/arxiv.2502.01804 |
| ids.doi | https://doi.org/10.48550/arxiv.2502.01804 |
| ids.openalex | https://openalex.org/W4407184882 |
| fwci | |
| type | preprint |
| title | Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11986 |
| topics[0].field.id | https://openalex.org/fields/18 |
| topics[0].field.display_name | Decision Sciences |
| topics[0].score | 0.8669999837875366 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1802 |
| topics[0].subfield.display_name | Information Systems and Management |
| topics[0].display_name | Scientific Computing and Data Management |
| topics[1].id | https://openalex.org/T13274 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.8632000088691711 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1710 |
| topics[1].subfield.display_name | Information Systems |
| topics[1].display_name | Expert finding and Q&A systems |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C15744967 |
| concepts[0].level | 0 |
| concepts[0].score | 0.44216105341911316 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[0].display_name | Psychology |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.3524307310581207 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C154945302 |
| concepts[2].level | 1 |
| concepts[2].score | 0.3376082181930542 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[2].display_name | Artificial intelligence |
| keywords[0].id | https://openalex.org/keywords/psychology |
| keywords[0].score | 0.44216105341911316 |
| keywords[0].display_name | Psychology |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.3524307310581207 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[2].score | 0.3376082181930542 |
| keywords[2].display_name | Artificial intelligence |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2502.01804 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2502.01804 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2502.01804 |
| locations[1].id | doi:10.48550/arxiv.2502.01804 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2502.01804 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5042340163 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-4277-5202 |
| authorships[0].author.display_name | Pierre Ablin |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Ablin, Pierre |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5031829458 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Angelos Katharopoulos |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Katharopoulos, Angelos |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5059839283 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Skyler Seto |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Seto, Skyler |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5065912572 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-8847-9532 |
| authorships[3].author.display_name | David Grangier |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Grangier, David |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2502.01804 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11986 |
| primary_topic.field.id | https://openalex.org/fields/18 |
| primary_topic.field.display_name | Decision Sciences |
| primary_topic.score | 0.8669999837875366 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1802 |
| primary_topic.subfield.display_name | Information Systems and Management |
| primary_topic.display_name | Scientific Computing and Data Management |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W4391913857, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2502.01804 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2502.01804 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2502.01804 |
| primary_location.id | pmh:oai:arXiv.org:2502.01804 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2502.01804 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2502.01804 |
| publication_date | 2025-02-03 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 7, 25, 31, 53, 73, 135 |
| abstract_inverted_index.To | 80 |
| abstract_inverted_index.We | 21, 66, 105 |
| abstract_inverted_index.as | 72 |
| abstract_inverted_index.at | 33 |
| abstract_inverted_index.of | 9, 52, 55, 75, 98 |
| abstract_inverted_index.on | 6, 114 |
| abstract_inverted_index.to | 62, 127 |
| abstract_inverted_index.we | 84 |
| abstract_inverted_index.Our | 49 |
| abstract_inverted_index.and | 44, 93 |
| abstract_inverted_index.any | 37 |
| abstract_inverted_index.are | 3, 59, 121 |
| abstract_inverted_index.can | 29 |
| abstract_inverted_index.for | 36 |
| abstract_inverted_index.how | 107 |
| abstract_inverted_index.one | 64, 96, 125 |
| abstract_inverted_index.our | 108 |
| abstract_inverted_index.the | 23, 47, 68, 76, 90 |
| abstract_inverted_index.bank | 54 |
| abstract_inverted_index.cost | 43 |
| abstract_inverted_index.data | 11, 99 |
| abstract_inverted_index.many | 129 |
| abstract_inverted_index.ship | 128 |
| abstract_inverted_index.size | 137 |
| abstract_inverted_index.test | 34 |
| abstract_inverted_index.that | 28 |
| abstract_inverted_index.this | 82 |
| abstract_inverted_index.time | 35 |
| abstract_inverted_index.very | 17 |
| abstract_inverted_index.when | 124 |
| abstract_inverted_index.with | 40, 101 |
| abstract_inverted_index.batch | 97 |
| abstract_inverted_index.input | 77 |
| abstract_inverted_index.learn | 67 |
| abstract_inverted_index.model | 32, 136 |
| abstract_inverted_index.needs | 126 |
| abstract_inverted_index.novel | 26 |
| abstract_inverted_index.small | 111 |
| abstract_inverted_index.tasks | 118 |
| abstract_inverted_index.these | 102 |
| abstract_inverted_index.train | 81 |
| abstract_inverted_index.under | 134 |
| abstract_inverted_index.which | 58 |
| abstract_inverted_index.yield | 16 |
| abstract_inverted_index.domain | 14, 38, 78, 87, 103 |
| abstract_inverted_index.expert | 56 |
| abstract_inverted_index.linear | 69 |
| abstract_inverted_index.model, | 92 |
| abstract_inverted_index.model. | 48, 65 |
| abstract_inverted_index.models | 2, 113, 132 |
| abstract_inverted_index.random | 86 |
| abstract_inverted_index.sample | 85 |
| abstract_inverted_index.Machine | 0 |
| abstract_inverted_index.minimal | 41 |
| abstract_inverted_index.mixture | 8 |
| abstract_inverted_index.obtains | 110 |
| abstract_inverted_index.propose | 22 |
| abstract_inverted_index.quickly | 133 |
| abstract_inverted_index.sampled | 100 |
| abstract_inverted_index.several | 115 |
| abstract_inverted_index.through | 95 |
| abstract_inverted_index.trained | 5 |
| abstract_inverted_index.weights | 15, 39 |
| abstract_inverted_index.without | 45 |
| abstract_inverted_index.approach | 109 |
| abstract_inverted_index.backprop | 94 |
| abstract_inverted_index.combined | 61 |
| abstract_inverted_index.consists | 51 |
| abstract_inverted_index.domains. | 12 |
| abstract_inverted_index.function | 74 |
| abstract_inverted_index.language | 116 |
| abstract_inverted_index.learning | 1 |
| abstract_inverted_index.linearly | 60 |
| abstract_inverted_index.modeling | 117 |
| abstract_inverted_index.quickly. | 119 |
| abstract_inverted_index.weights, | 88 |
| abstract_inverted_index.weights. | 79, 104 |
| abstract_inverted_index.Different | 13 |
| abstract_inverted_index.appealing | 123 |
| abstract_inverted_index.different | 10, 18, 130 |
| abstract_inverted_index.routinely | 4 |
| abstract_inverted_index.downstream | 19 |
| abstract_inverted_index.specialist | 131 |
| abstract_inverted_index.combination | 70 |
| abstract_inverted_index.constraint. | 138 |
| abstract_inverted_index.demonstrate | 106 |
| abstract_inverted_index.instantiate | 30, 63, 89 |
| abstract_inverted_index.parameters, | 57 |
| abstract_inverted_index.re-training | 46 |
| abstract_inverted_index.specialized | 112 |
| abstract_inverted_index.architecture | 27, 50 |
| abstract_inverted_index.coefficients | 71 |
| abstract_inverted_index.particularly | 122 |
| abstract_inverted_index.architecture, | 83 |
| abstract_inverted_index.computational | 42 |
| abstract_inverted_index.corresponding | 91 |
| abstract_inverted_index.performances. | 20 |
| abstract_inverted_index.Soup-of-Experts | 120 |
| abstract_inverted_index.Soup-of-Experts, | 24 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile | |
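
The `abstract_inverted_index.*` rows above store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of how such an index can be turned back into running text follows; the function name `rebuild_abstract` is hypothetical, and the sample dictionary is only an excerpt covering positions 0 to 12 of the table (tokens that also occur later in the abstract are listed with their first position only).

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a token -> list-of-positions mapping."""
    token_at = {}
    for token, positions in inverted_index.items():
        for position in positions:
            token_at[position] = token
    # Emit the tokens in increasing position order.
    return " ".join(token_at[i] for i in sorted(token_at))


# Excerpt of the index above, restricted to positions 0..12.
excerpt = {
    "a": [7],
    "of": [9],
    "on": [6],
    "are": [3],
    "data": [11],
    "models": [2],
    "Machine": [0],
    "mixture": [8],
    "trained": [5],
    "learning": [1],
    "different": [10],
    "routinely": [4],
    "domains.": [12],
}

first_sentence = rebuild_abstract(excerpt)
print(first_sentence)
# prints "Machine learning models are routinely trained on a mixture of different data domains."
```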