Systems and Algorithms for Convolutional Multi-Hybrid Language Models at Scale
2025 · Open Access
We introduce convolutional multi-hybrid architectures, with a design grounded on two simple observations. First, operators in hybrid models can be tailored to token manipulation tasks such as in-context recall, multi-token recall, and compression, with input-dependent convolutions and attention offering complementary performance. Second, co-designing convolution operators and hardware-aware algorithms enables efficiency gains in regimes where previous alternative architectures struggle to surpass Transformers. At the 40 billion parameter scale, we train end-to-end 1.2 to 2.9 times faster than optimized Transformers, and 1.1 to 1.4 times faster than previous-generation hybrids. On H100 GPUs at model width 4096, individual operators in the proposed multi-hybrid StripedHyena 2 architecture achieve a two-fold throughput improvement over linear attention and state-space models. Multi-hybrids excel at sequence modeling over byte-tokenized data, as demonstrated by the Evo 2 line of models. We discuss the foundations that enable these results, including architecture design, overlap-add blocked kernels for tensor cores, and dedicated all-to-all and point-to-point context parallelism strategies.
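The abstract names overlap-add blocked kernels without describing them in this record. As a rough illustration of the general idea only, the sketch below implements the textbook overlap-add method for a causal convolution in plain Python: the input is split into fixed-size blocks, each block is convolved with the filter independently, and the tail of each block's output is added into the outputs of the following positions. The function names, the block size, and the sample filter are assumptions made for illustration; this is not the paper's kernel and says nothing about how it maps onto tensor cores.

```python
def direct_causal_conv(x, h):
    """Reference: y[t] = sum_k h[k] * x[t - k], truncated to len(x)."""
    y = [0.0] * len(x)
    for t in range(len(x)):
        for k in range(min(len(h), t + 1)):
            y[t] += h[k] * x[t - k]
    return y


def overlap_add_causal_conv(x, h, block=4):
    """Convolve block by block, adding each block's tail into later outputs.

    Assumes a non-empty filter h (hypothetical helper, illustration only).
    """
    n, m = len(x), len(h)
    y = [0.0] * (n + m - 1)
    for start in range(0, n, block):
        chunk = x[start:start + block]
        # Full convolution of one block yields len(chunk) + m - 1 outputs;
        # the last m - 1 of them overlap with the next block's outputs.
        for i, xv in enumerate(chunk):
            for k, hv in enumerate(h):
                y[start + i + k] += xv * hv
    return y[:n]  # keep only the part aligned with the input


x = [float(v) for v in range(1, 11)]
h = [1.0, -1.0, 0.5]
ref = direct_causal_conv(x, h)
out = overlap_add_causal_conv(x, h, block=4)
assert all(abs(a - b) < 1e-9 for a, b in zip(ref, out))
```

Because each block is processed independently of the others, the per-block work can in principle be batched; the choice of block size here is arbitrary.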
- Type: article
- Language: en
- Landing Page: http://arxiv.org/abs/2503.01868
- PDF: https://arxiv.org/pdf/2503.01868
- OA Status: green
- Cited By: 1
- OpenAlex ID: https://openalex.org/W4415339007
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415339007
- Title: Systems and Algorithms for Convolutional Multi-Hybrid Language Models at Scale
- Type: article
- Language: en
- Publication year: 2025
- Publication date: 2025-02-25
- Authors: Jerome Ku, Eric Nguyen, David W. Romero, Garyk Brixi, Brandon Yang, Anton Vorontsov, Ali Taghibakhshi, Amy X. Lu, Dave P. Burke, Greg Brockman, Stefano Massaroli, Christopher Ré, Patrick D. Hsu, Brian L. Hie, Stefano Ermon, Michael Poli
- Landing page: https://arxiv.org/abs/2503.01868
- PDF URL: https://arxiv.org/pdf/2503.01868
- Open access: Yes
- OA status: green
- OA URL: https://arxiv.org/pdf/2503.01868
- Cited by: 1
- Citations by year (recent): 2025: 1
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4415339007 |
| doi | |
| ids.openalex | https://openalex.org/W4415339007 |
| fwci | 4.81974515 |
| type | article |
| title | Systems and Algorithms for Convolutional Multi-Hybrid Language Models at Scale |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10181 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.8751000165939331 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Natural Language Processing Techniques |
| topics[1].id | https://openalex.org/T10028 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.8726000189781189 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Topic Modeling |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2503.01868 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2503.01868 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2503.01868 |
| indexed_in | arxiv |
| authorships[0].author.id | https://openalex.org/A5001974637 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-7090-537X |
| authorships[0].author.display_name | Ja‐Lok Ku |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Ku, Jerome |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5033535593 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Eric Nguyen |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Nguyen, Eric |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5084474296 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-5446-1070 |
| authorships[2].author.display_name | David W. Romero |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Romero, David W. |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5091078534 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-1253-0522 |
| authorships[3].author.display_name | Garyk Brixi |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Brixi, Garyk |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5023581431 |
| authorships[4].author.orcid | https://orcid.org/0009-0008-5942-316X |
| authorships[4].author.display_name | B. S. Yang |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Yang, Brandon |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5083748791 |
| authorships[5].author.orcid | https://orcid.org/0000-0001-7946-0576 |
| authorships[5].author.display_name | A. A. Vorontsov |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Vorontsov, Anton |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5089955412 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Ali Taghibakhshi |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Taghibakhshi, Ali |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5023348845 |
| authorships[7].author.orcid | |
| authorships[7].author.display_name | Amy X. Lu |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Lu, Amy X. |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5112420205 |
| authorships[8].author.orcid | |
| authorships[8].author.display_name | Donald S. Burke |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Burke, Dave P. |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5116367404 |
| authorships[9].author.orcid | |
| authorships[9].author.display_name | Greg Brockman |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Brockman, Greg |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5053046176 |
| authorships[10].author.orcid | https://orcid.org/0000-0003-3788-6290 |
| authorships[10].author.display_name | Stefano Massaroli |
| authorships[10].author_position | middle |
| authorships[10].raw_author_name | Massaroli, Stefano |
| authorships[10].is_corresponding | False |
| authorships[11].author.id | https://openalex.org/A5103852640 |
| authorships[11].author.orcid | |
| authorships[11].author.display_name | Christopher Ré |
| authorships[11].author_position | middle |
| authorships[11].raw_author_name | Ré, Christopher |
| authorships[11].is_corresponding | False |
| authorships[12].author.id | https://openalex.org/A5060049408 |
| authorships[12].author.orcid | https://orcid.org/0000-0002-9380-2648 |
| authorships[12].author.display_name | Patrick D. Hsu |
| authorships[12].author_position | middle |
| authorships[12].raw_author_name | Hsu, Patrick D. |
| authorships[12].is_corresponding | False |
| authorships[13].author.id | https://openalex.org/A5021962955 |
| authorships[13].author.orcid | https://orcid.org/0000-0003-3224-8142 |
| authorships[13].author.display_name | Brian Hie |
| authorships[13].author_position | middle |
| authorships[13].raw_author_name | Hie, Brian L. |
| authorships[13].is_corresponding | False |
| authorships[14].author.id | https://openalex.org/A5091179481 |
| authorships[14].author.orcid | https://orcid.org/0000-0003-0039-2887 |
| authorships[14].author.display_name | Stefano Ermon |
| authorships[14].author_position | middle |
| authorships[14].raw_author_name | Ermon, Stefano |
| authorships[14].is_corresponding | False |
| authorships[15].author.id | https://openalex.org/A5078213488 |
| authorships[15].author.orcid | https://orcid.org/0000-0001-5384-9372 |
| authorships[15].author.display_name | Michael Poli |
| authorships[15].author_position | last |
| authorships[15].raw_author_name | Poli, Michael |
| authorships[15].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2503.01868 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-19T00:00:00 |
| display_name | Systems and Algorithms for Convolutional Multi-Hybrid Language Models at Scale |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T04:12:42.849631 |
| primary_topic.id | https://openalex.org/T10181 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.8751000165939331 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Natural Language Processing Techniques |
| cited_by_count | 1 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 1 |
| best_oa_location.id | pmh:oai:arXiv.org:2503.01868 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2503.01868 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2503.01868 |
| primary_location.id | pmh:oai:arXiv.org:2503.01868 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2503.01868 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2503.01868 |
| publication_date | 2025-02-25 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.2 | 102, 127 |
| abstract_inverted_index.a | 6 |
| abstract_inverted_index.40 | 63 |
| abstract_inverted_index.At | 61 |
| abstract_inverted_index.On | 88 |
| abstract_inverted_index.We | 0, 131 |
| abstract_inverted_index.as | 26, 122 |
| abstract_inverted_index.at | 116 |
| abstract_inverted_index.be | 19 |
| abstract_inverted_index.by | 124 |
| abstract_inverted_index.in | 15, 51, 97 |
| abstract_inverted_index.of | 129 |
| abstract_inverted_index.on | 9 |
| abstract_inverted_index.to | 21, 58, 71, 80 |
| abstract_inverted_index.we | 67 |
| abstract_inverted_index.1.1 | 79 |
| abstract_inverted_index.1.2 | 70 |
| abstract_inverted_index.1.4 | 81 |
| abstract_inverted_index.2.9 | 72 |
| abstract_inverted_index.Evo | 126 |
| abstract_inverted_index.and | 31, 36, 45, 78, 91, 111, 148, 151 |
| abstract_inverted_index.can | 18 |
| abstract_inverted_index.for | 145 |
| abstract_inverted_index.the | 62, 98, 125, 133 |
| abstract_inverted_index.two | 10 |
| abstract_inverted_index.GPUs | 90 |
| abstract_inverted_index.H100 | 89 |
| abstract_inverted_index.line | 128 |
| abstract_inverted_index.over | 108, 119 |
| abstract_inverted_index.such | 25 |
| abstract_inverted_index.than | 75, 84 |
| abstract_inverted_index.that | 135 |
| abstract_inverted_index.with | 5, 33 |
| abstract_inverted_index.4096, | 94 |
| abstract_inverted_index.data, | 121 |
| abstract_inverted_index.excel | 115 |
| abstract_inverted_index.gains | 50 |
| abstract_inverted_index.model | 92 |
| abstract_inverted_index.tasks | 24 |
| abstract_inverted_index.these | 137 |
| abstract_inverted_index.times | 73, 82 |
| abstract_inverted_index.token | 22 |
| abstract_inverted_index.train | 68 |
| abstract_inverted_index.where | 53 |
| abstract_inverted_index.width | 93 |
| abstract_inverted_index.First, | 13 |
| abstract_inverted_index.cores, | 147 |
| abstract_inverted_index.design | 7 |
| abstract_inverted_index.enable | 136 |
| abstract_inverted_index.faster | 74, 83 |
| abstract_inverted_index.hybrid | 16 |
| abstract_inverted_index.linear | 109 |
| abstract_inverted_index.models | 17 |
| abstract_inverted_index.scale, | 66 |
| abstract_inverted_index.simple | 11 |
| abstract_inverted_index.tensor | 146 |
| abstract_inverted_index.Second, | 41 |
| abstract_inverted_index.achieve | 104 |
| abstract_inverted_index.billion | 64 |
| abstract_inverted_index.blocked | 143 |
| abstract_inverted_index.context | 153 |
| abstract_inverted_index.design, | 141 |
| abstract_inverted_index.discuss | 132 |
| abstract_inverted_index.enables | 48 |
| abstract_inverted_index.kernels | 144 |
| abstract_inverted_index.models. | 113, 130 |
| abstract_inverted_index.recall, | 28, 30 |
| abstract_inverted_index.regimes | 52 |
| abstract_inverted_index.surpass | 59 |
| abstract_inverted_index.grounded | 8 |
| abstract_inverted_index.hybrids. | 87 |
| abstract_inverted_index.modeling | 118 |
| abstract_inverted_index.offering | 38 |
| abstract_inverted_index.previous | 54, 85 |
| abstract_inverted_index.proposed | 99 |
| abstract_inverted_index.results, | 138 |
| abstract_inverted_index.sequence | 117 |
| abstract_inverted_index.struggle | 57 |
| abstract_inverted_index.tailored | 20 |
| abstract_inverted_index.two-fold | 105 |
| abstract_inverted_index.attention | 37, 110 |
| abstract_inverted_index.dedicated | 149 |
| abstract_inverted_index.including | 139 |
| abstract_inverted_index.introduce | 1 |
| abstract_inverted_index.operators | 14, 44, 96 |
| abstract_inverted_index.optimized | 76 |
| abstract_inverted_index.parameter | 65 |
| abstract_inverted_index.algorithms | 47 |
| abstract_inverted_index.all-to-all | 150 |
| abstract_inverted_index.efficiency | 49 |
| abstract_inverted_index.end-to-end | 69 |
| abstract_inverted_index.generation | 86 |
| abstract_inverted_index.in-context | 27 |
| abstract_inverted_index.individual | 95 |
| abstract_inverted_index.throughput | 106 |
| abstract_inverted_index.alternative | 55 |
| abstract_inverted_index.convolution | 43 |
| abstract_inverted_index.foundations | 134 |
| abstract_inverted_index.improvement | 107 |
| abstract_inverted_index.multi-token | 29 |
| abstract_inverted_index.overlap-add | 142 |
| abstract_inverted_index.parallelism | 154 |
| abstract_inverted_index.state-space | 112 |
| abstract_inverted_index.strategies. | 155 |
| abstract_inverted_index.StripedHyena | 101 |
| abstract_inverted_index.architecture | 103, 140 |
| abstract_inverted_index.co-designing | 42 |
| abstract_inverted_index.compression, | 32 |
| abstract_inverted_index.convolutions | 35 |
| abstract_inverted_index.demonstrated | 123 |
| abstract_inverted_index.manipulation | 23 |
| abstract_inverted_index.multi-hybrid | 3, 100 |
| abstract_inverted_index.performance. | 40 |
| abstract_inverted_index.Multi-hybrids | 114 |
| abstract_inverted_index.Transformers, | 77 |
| abstract_inverted_index.Transformers. | 60 |
| abstract_inverted_index.architectures | 56 |
| abstract_inverted_index.complementary | 39 |
| abstract_inverted_index.convolutional | 2 |
| abstract_inverted_index.observations. | 12 |
| abstract_inverted_index.architectures, | 4 |
| abstract_inverted_index.byte-tokenized | 120 |
| abstract_inverted_index.hardware-aware | 46 |
| abstract_inverted_index.point-to-point | 152 |
| abstract_inverted_index.input-dependent | 34 |
| cited_by_percentile_year.max | 95 |
| cited_by_percentile_year.min | 91 |
| countries_distinct_count | 0 |
| institutions_distinct_count | 16 |
| citation_normalized_percentile.value | 0.95936471 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | True |
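
The `abstract_inverted_index.*` rows in the table above store the abstract as a map from each word to the positions where it occurs. The following minimal sketch shows one way to turn such rows back into running text. The helper names (`parse_rows`, `rebuild_abstract`) and the `limit` parameter are hypothetical, introduced only for this illustration; the sample rows are copied from the table and cover only the opening sentence.

```python
PREFIX = "abstract_inverted_index."


def parse_rows(lines):
    """Read '| abstract_inverted_index.<word> | p1, p2 |' rows into a dict."""
    index = {}
    for line in lines:
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0].startswith(PREFIX):
            word = cells[0][len(PREFIX):]
            index[word] = [int(p) for p in cells[1].split(",")]
    return index


def rebuild_abstract(index, limit=None):
    """Place each word at its positions, then join words in position order."""
    by_position = {}
    for word, positions in index.items():
        for pos in positions:
            if limit is None or pos < limit:
                by_position[pos] = word
    return " ".join(by_position[p] for p in sorted(by_position))


rows = [
    "| abstract_inverted_index.We | 0, 131 |",
    "| abstract_inverted_index.introduce | 1 |",
    "| abstract_inverted_index.convolutional | 2 |",
    "| abstract_inverted_index.multi-hybrid | 3, 100 |",
    "| abstract_inverted_index.architectures, | 4 |",
    "| abstract_inverted_index.with | 5, 33 |",
    "| abstract_inverted_index.a | 6 |",
    "| abstract_inverted_index.design | 7 |",
    "| abstract_inverted_index.grounded | 8 |",
    "| abstract_inverted_index.on | 9 |",
    "| abstract_inverted_index.two | 10 |",
    "| abstract_inverted_index.simple | 11 |",
    "| abstract_inverted_index.observations. | 12 |",
]

index = parse_rows(rows)
# Only positions 0..12 are fully covered by the sample rows above.
first_sentence = rebuild_abstract(index, limit=13)
print(first_sentence)
```

Passing all `abstract_inverted_index.*` rows and leaving `limit` unset should yield the whole abstract, since every position then has a word assigned.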