Mechanistic Design and Scaling of Hybrid Architectures
· 2024
· Open Access
· DOI: https://doi.org/10.48550/arxiv.2403.17844
The development of deep learning architectures is a resource-demanding process, due to a vast design space, long prototyping times, and high compute costs associated with at-scale model training and evaluation. We set out to simplify this process by grounding it in an end-to-end mechanistic architecture design (MAD) pipeline, encompassing small-scale capability unit tests predictive of scaling laws. Through a suite of synthetic token manipulation tasks such as compression and recall, designed to probe capabilities, we identify and test new hybrid architectures constructed from a variety of computational primitives. We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis, training over 500 language models from 70M to 7B parameters. Surprisingly, we find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures via isolated proxy tasks. The new architectures found via MAD, based on simple ideas such as hybridization and sparsity, outperform state-of-the-art Transformer, convolutional, and recurrent architectures (Transformer++, Hyena, Mamba) in scaling, both at compute-optimal budgets and in overtrained regimes. Overall, these results provide evidence that performance on curated synthetic tasks can be predictive of scaling laws, and that an optimal architecture should leverage specialized layers via a hybrid topology.
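To make the idea of a synthetic recall task concrete, here is a minimal sketch of a generic key-value recall example generator. It is an illustration only and not the paper's task definition: the function name `make_recall_example`, the token layout (keys drawn from the lower half of the vocabulary, values from the upper half), and all parameters are assumptions introduced here.

```python
import random


def make_recall_example(num_pairs: int, vocab_size: int, seed: int):
    """Build one toy recall example (hypothetical layout, not the paper's).

    The context is a flat sequence k1 v1 k2 v2 ... followed by one query
    key that already appeared; the target is the value paired with it.
    """
    rng = random.Random(seed)
    half = vocab_size // 2
    # Keys are unique tokens from the lower half of the vocabulary.
    keys = rng.sample(range(half), num_pairs)
    # Values come from the upper half, so keys and values never collide.
    values = [rng.randrange(half, vocab_size) for _ in keys]
    context = [tok for pair in zip(keys, values) for tok in pair]
    query_idx = rng.randrange(num_pairs)
    return context + [keys[query_idx]], values[query_idx]


tokens, target = make_recall_example(num_pairs=4, vocab_size=32, seed=0)
print(tokens, target)
```

A model solves the example if, given `tokens`, it predicts `target` as the next token. Accuracy on many such examples is the kind of small-scale capability measurement the abstract describes, under the stated assumptions.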
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2403.17844, https://arxiv.org/pdf/2403.17844
- OA Status: green
- Cited By: 4
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4393247957
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4393247957 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2403.17844 (Digital Object Identifier)
- Title: Mechanistic Design and Scaling of Hybrid Architectures (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-03-26 (full publication date if available)
- Authors: Michael Poli, Armin W. Thomas, Éric Nguyen, Pragaash Ponnusamy, Björn Deiseroth, Kristian Kersting, Taiji Suzuki, Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli (list of authors in order)
- Landing page: https://arxiv.org/abs/2403.17844 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2403.17844 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2403.17844 (direct OA link when available)
- Concepts: Scaling, Computer science, Mathematics, Geometry (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 4 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 3, 2024: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4393247957 |
| doi | https://doi.org/10.48550/arxiv.2403.17844 |
| ids.doi | https://doi.org/10.48550/arxiv.2403.17844 |
| ids.openalex | https://openalex.org/W4393247957 |
| fwci | |
| type | preprint |
| title | Mechanistic Design and Scaling of Hybrid Architectures |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T13518 |
| topics[0].field.id | https://openalex.org/fields/22 |
| topics[0].field.display_name | Engineering |
| topics[0].score | 0.8012999892234802 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2216 |
| topics[0].subfield.display_name | Architecture |
| topics[0].display_name | Architecture and Computational Design |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C99844830 |
| concepts[0].level | 2 |
| concepts[0].score | 0.650663435459137 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q102441924 |
| concepts[0].display_name | Scaling |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.4856933057308197 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C33923547 |
| concepts[2].level | 0 |
| concepts[2].score | 0.21999943256378174 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[2].display_name | Mathematics |
| concepts[3].id | https://openalex.org/C2524010 |
| concepts[3].level | 1 |
| concepts[3].score | 0.05807796120643616 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q8087 |
| concepts[3].display_name | Geometry |
| keywords[0].id | https://openalex.org/keywords/scaling |
| keywords[0].score | 0.650663435459137 |
| keywords[0].display_name | Scaling |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.4856933057308197 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/mathematics |
| keywords[2].score | 0.21999943256378174 |
| keywords[2].display_name | Mathematics |
| keywords[3].id | https://openalex.org/keywords/geometry |
| keywords[3].score | 0.05807796120643616 |
| keywords[3].display_name | Geometry |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2403.17844 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2403.17844 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2403.17844 |
| locations[1].id | doi:10.48550/arxiv.2403.17844 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2403.17844 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5078213488 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-5384-9372 |
| authorships[0].author.display_name | Michael Poli |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Poli, Michael |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5070255304 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-9947-5705 |
| authorships[1].author.display_name | Armin W. Thomas |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Thomas, Armin W |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5074287696 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-0516-2434 |
| authorships[2].author.display_name | Éric Nguyen |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Nguyen, Eric |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5014477939 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-3790-5757 |
| authorships[3].author.display_name | Pragaash Ponnusamy |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Ponnusamy, Pragaash |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5077091864 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Björn Deiseroth |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Deiseroth, Björn |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5037636074 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-2873-9152 |
| authorships[5].author.display_name | Kristian Kersting |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Kersting, Kristian |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5078812767 |
| authorships[6].author.orcid | https://orcid.org/0000-0003-3459-1016 |
| authorships[6].author.display_name | Taiji Suzuki |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Suzuki, Taiji |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5021962955 |
| authorships[7].author.orcid | https://orcid.org/0000-0003-3224-8142 |
| authorships[7].author.display_name | Brian Hie |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Hie, Brian |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5091179481 |
| authorships[8].author.orcid | https://orcid.org/0000-0003-0039-2887 |
| authorships[8].author.display_name | Stefano Ermon |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Ermon, Stefano |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5103852640 |
| authorships[9].author.orcid | |
| authorships[9].author.display_name | Christopher Ré |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Ré, Christopher |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5100383734 |
| authorships[10].author.orcid | https://orcid.org/0000-0003-0815-4365 |
| authorships[10].author.display_name | Ce Zhang |
| authorships[10].author_position | middle |
| authorships[10].raw_author_name | Zhang, Ce |
| authorships[10].is_corresponding | False |
| authorships[11].author.id | https://openalex.org/A5053046176 |
| authorships[11].author.orcid | https://orcid.org/0000-0003-3788-6290 |
| authorships[11].author.display_name | Stefano Massaroli |
| authorships[11].author_position | last |
| authorships[11].raw_author_name | Massaroli, Stefano |
| authorships[11].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2403.17844 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Mechanistic Design and Scaling of Hybrid Architectures |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T13518 |
| primary_topic.field.id | https://openalex.org/fields/22 |
| primary_topic.field.display_name | Engineering |
| primary_topic.score | 0.8012999892234802 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2216 |
| primary_topic.subfield.display_name | Architecture |
| primary_topic.display_name | Architecture and Computational Design |
| related_works | https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W2382290278, https://openalex.org/W2478288626, https://openalex.org/W4391913857, https://openalex.org/W2350741829, https://openalex.org/W2530322880 |
| cited_by_count | 4 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 3 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2403.17844 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2403.17844 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2403.17844 |
| primary_location.id | pmh:oai:arXiv.org:2403.17844 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2403.17844 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2403.17844 |
| publication_date | 2024-03-26 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 7, 12, 58, 83, 99, 197 |
| abstract_inverted_index.7B | 113 |
| abstract_inverted_index.We | 30, 88 |
| abstract_inverted_index.an | 41, 95, 189 |
| abstract_inverted_index.as | 66, 146 |
| abstract_inverted_index.at | 163 |
| abstract_inverted_index.be | 182 |
| abstract_inverted_index.by | 37 |
| abstract_inverted_index.in | 40, 160, 167 |
| abstract_inverted_index.is | 6 |
| abstract_inverted_index.it | 39 |
| abstract_inverted_index.of | 2, 54, 60, 85, 128, 184 |
| abstract_inverted_index.on | 142, 177 |
| abstract_inverted_index.to | 11, 33, 71, 112, 120 |
| abstract_inverted_index.we | 74, 116 |
| abstract_inverted_index.500 | 107 |
| abstract_inverted_index.70M | 111 |
| abstract_inverted_index.MAD | 118 |
| abstract_inverted_index.The | 0, 135 |
| abstract_inverted_index.and | 19, 28, 68, 76, 98, 148, 154, 166, 187 |
| abstract_inverted_index.can | 181 |
| abstract_inverted_index.due | 10 |
| abstract_inverted_index.law | 103 |
| abstract_inverted_index.new | 78, 100, 129, 136 |
| abstract_inverted_index.out | 32 |
| abstract_inverted_index.set | 31 |
| abstract_inverted_index.the | 91 |
| abstract_inverted_index.via | 94, 131, 139, 196 |
| abstract_inverted_index.MAD, | 140 |
| abstract_inverted_index.both | 162 |
| abstract_inverted_index.deep | 3 |
| abstract_inverted_index.find | 117 |
| abstract_inverted_index.from | 82 |
| abstract_inverted_index.high | 20 |
| abstract_inverted_index.long | 16 |
| abstract_inverted_index.over | 106 |
| abstract_inverted_index.such | 65, 145 |
| abstract_inverted_index.test | 77 |
| abstract_inverted_index.that | 175, 188 |
| abstract_inverted_index.this | 35 |
| abstract_inverted_index.unit | 51 |
| abstract_inverted_index.vast | 13 |
| abstract_inverted_index.with | 24, 122 |
| abstract_inverted_index.(MAD) | 46 |
| abstract_inverted_index.based | 141 |
| abstract_inverted_index.costs | 22 |
| abstract_inverted_index.found | 138 |
| abstract_inverted_index.ideas | 144 |
| abstract_inverted_index.laws, | 186 |
| abstract_inverted_index.laws. | 56 |
| abstract_inverted_index.model | 26 |
| abstract_inverted_index.probe | 72 |
| abstract_inverted_index.proxy | 133 |
| abstract_inverted_index.suite | 59 |
| abstract_inverted_index.tasks | 64, 180 |
| abstract_inverted_index.tests | 52 |
| abstract_inverted_index.these | 171 |
| abstract_inverted_index.token | 62 |
| abstract_inverted_index.Hyena, | 158 |
| abstract_inverted_index.Mamba) | 159 |
| abstract_inverted_index.design | 14, 45 |
| abstract_inverted_index.hybrid | 79, 198 |
| abstract_inverted_index.layers | 195 |
| abstract_inverted_index.models | 109 |
| abstract_inverted_index.should | 192 |
| abstract_inverted_index.simple | 143 |
| abstract_inverted_index.space, | 15 |
| abstract_inverted_index.tasks. | 134 |
| abstract_inverted_index.times, | 18 |
| abstract_inverted_index.Through | 57 |
| abstract_inverted_index.between | 110 |
| abstract_inverted_index.budgets | 165 |
| abstract_inverted_index.compute | 21 |
| abstract_inverted_index.curated | 178 |
| abstract_inverted_index.optimal | 190 |
| abstract_inverted_index.process | 36 |
| abstract_inverted_index.provide | 173 |
| abstract_inverted_index.recall, | 69 |
| abstract_inverted_index.results | 172 |
| abstract_inverted_index.scaling | 55, 102, 185 |
| abstract_inverted_index.variety | 84 |
| abstract_inverted_index.Overall, | 170 |
| abstract_inverted_index.accurate | 126 |
| abstract_inverted_index.at-scale | 25 |
| abstract_inverted_index.designed | 70 |
| abstract_inverted_index.enabling | 125 |
| abstract_inverted_index.evidence | 174 |
| abstract_inverted_index.identify | 75 |
| abstract_inverted_index.isolated | 132 |
| abstract_inverted_index.language | 108 |
| abstract_inverted_index.learning | 4 |
| abstract_inverted_index.leverage | 193 |
| abstract_inverted_index.process, | 9 |
| abstract_inverted_index.regimes. | 169 |
| abstract_inverted_index.scaling, | 161 |
| abstract_inverted_index.simplify | 34 |
| abstract_inverted_index.training | 27, 105 |
| abstract_inverted_index.validate | 90 |
| abstract_inverted_index.analysis, | 104 |
| abstract_inverted_index.correlate | 121 |
| abstract_inverted_index.extensive | 96 |
| abstract_inverted_index.grounding | 38 |
| abstract_inverted_index.pipeline, | 47 |
| abstract_inverted_index.recurrent | 155 |
| abstract_inverted_index.resulting | 92 |
| abstract_inverted_index.sparsity, | 149 |
| abstract_inverted_index.synthetic | 61, 179 |
| abstract_inverted_index.topology. | 199 |
| abstract_inverted_index.associated | 23 |
| abstract_inverted_index.capability | 50 |
| abstract_inverted_index.end-to-end | 42 |
| abstract_inverted_index.evaluation | 127 |
| abstract_inverted_index.outperform | 150 |
| abstract_inverted_index.predictive | 53, 183 |
| abstract_inverted_index.synthetics | 119 |
| abstract_inverted_index.compression | 67 |
| abstract_inverted_index.constructed | 81 |
| abstract_inverted_index.development | 1 |
| abstract_inverted_index.evaluation. | 29 |
| abstract_inverted_index.mechanistic | 43 |
| abstract_inverted_index.overtrained | 168 |
| abstract_inverted_index.parameters. | 114 |
| abstract_inverted_index.performance | 176 |
| abstract_inverted_index.perplexity, | 124 |
| abstract_inverted_index.primitives. | 87 |
| abstract_inverted_index.prototyping | 17 |
| abstract_inverted_index.small-scale | 49 |
| abstract_inverted_index.specialized | 194 |
| abstract_inverted_index.Transformer, | 152 |
| abstract_inverted_index.architecture | 44, 191 |
| abstract_inverted_index.encompassing | 48 |
| abstract_inverted_index.manipulation | 63 |
| abstract_inverted_index.Surprisingly, | 115 |
| abstract_inverted_index.architectures | 5, 80, 93, 130, 137, 156 |
| abstract_inverted_index.capabilities, | 73 |
| abstract_inverted_index.computational | 86 |
| abstract_inverted_index.hybridization | 147 |
| abstract_inverted_index.state-optimal | 101 |
| abstract_inverted_index.convolutional, | 153 |
| abstract_inverted_index.experimentally | 89 |
| abstract_inverted_index.(Transformer++, | 157 |
| abstract_inverted_index.compute-optimal | 97, 123, 164 |
| abstract_inverted_index.state-of-the-art | 151 |
| abstract_inverted_index.resource-demanding | 8 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 12 |
| citation_normalized_percentile |
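
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each word to the positions where it occurs. The sketch below shows one way to turn such rows back into running text. The helper names `parse_rows` and `rebuild_abstract` are hypothetical, and the sample rows are an excerpt of the table truncated to positions 0 through 9; the parser assumes no word contains a `|` character.

```python
PREFIX = "abstract_inverted_index."


def parse_rows(rows):
    """Collect {word: [positions]} from flattened two-cell table rows."""
    index = {}
    for row in rows:
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        # Skip rows that are not two-cell inverted-index entries.
        if len(cells) != 2 or not cells[0].startswith(PREFIX):
            continue
        word = cells[0][len(PREFIX):]
        index[word] = [int(p) for p in cells[1].split(",")]
    return index


def rebuild_abstract(index):
    """Place each word at its positions and join them in order."""
    by_position = {}
    for word, positions in index.items():
        for pos in positions:
            by_position[pos] = word
    return " ".join(by_position[p] for p in sorted(by_position))


# Excerpt of the payload rows, limited to positions 0..9 for brevity.
sample_rows = [
    "| abstract_inverted_index.a | 7 |",
    "| abstract_inverted_index.is | 6 |",
    "| abstract_inverted_index.of | 2 |",
    "| abstract_inverted_index.The | 0 |",
    "| abstract_inverted_index.deep | 3 |",
    "| abstract_inverted_index.learning | 4 |",
    "| abstract_inverted_index.process, | 9 |",
    "| abstract_inverted_index.development | 1 |",
    "| abstract_inverted_index.architectures | 5 |",
    "| abstract_inverted_index.resource-demanding | 8 |",
]

print(rebuild_abstract(parse_rows(sample_rows)))
```

Running the same two functions over every `abstract_inverted_index.*` row should reproduce the full abstract, since each position from 0 to 199 appears exactly once across the rows.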