Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2310.12109
Machine learning models are increasingly being scaled in both sequence length and model dimension to reach longer contexts and better performance. However, existing architectures such as Transformers scale quadratically along both these axes. We ask: are there performant architectures that can scale sub-quadratically along sequence length and model dimension? We introduce Monarch Mixer (M2), a new architecture that uses the same sub-quadratic primitive along both sequence length and model dimension: Monarch matrices, a simple class of expressive structured matrices that captures many linear transforms, achieves high hardware efficiency on GPUs, and scales sub-quadratically. As a proof of concept, we explore the performance of M2 in three domains: non-causal BERT-style language modeling, ViT-style image classification, and causal GPT-style language modeling. For non-causal BERT-style modeling, M2 matches BERT-base and BERT-large in downstream GLUE quality with up to 27% fewer parameters, and achieves up to 9.1$\times$ higher throughput at sequence length 4K. On ImageNet, M2 outperforms ViT-b by 1% in accuracy, with only half the parameters. Causal GPT-style models introduce a technical challenge: enforcing causality via masking introduces a quadratic bottleneck. To alleviate this bottleneck, we develop a novel theoretical view of Monarch matrices based on multivariate polynomial evaluation and interpolation, which lets us parameterize M2 to be causal while remaining sub-quadratic. Using this parameterization, M2 matches GPT-style Transformers at 360M parameters in pretraining perplexity on The PILE--showing for the first time that it may be possible to match Transformer quality without attention or MLPs.
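The abstract names Monarch matrices as the sub-quadratic primitive but does not define them. As a rough illustration, the sketch below assumes one common formulation: a product of two block-diagonal factors interleaved with a fixed "grid transpose" permutation. The function names, the choice of b blocks of size b x b, and the factor ordering are assumptions made for this example and are not taken from this record. Under those assumptions, multiplying a length n = b*b vector costs about 2·n^1.5 multiply-adds instead of n^2, and the work is batched small matrix multiplies (GEMMs).

```python
import numpy as np

def monarch_matvec(x, left_blocks, right_blocks):
    """Apply an assumed order-2 Monarch-style matrix M = P L P R to x.

    x:            vector of length n = b * b
    left_blocks:  array of shape (b, b, b), the b diagonal blocks of L
    right_blocks: array of shape (b, b, b), the b diagonal blocks of R
    P is the permutation that transposes the b x b grid view of a vector.
    """
    b = right_blocks.shape[0]
    grid = x.reshape(b, b)                               # row i = i-th chunk of x
    grid = np.einsum("ijk,ik->ij", right_blocks, grid)   # R: block i acts on chunk i
    grid = grid.T                                        # P: transpose the grid
    grid = np.einsum("ijk,ik->ij", left_blocks, grid)    # L: block-diagonal multiply
    return grid.T.reshape(-1)                            # P again, then flatten

def block_diag(blocks):
    """Dense block-diagonal matrix built from b blocks of size b x b."""
    b = blocks.shape[0]
    dense = np.zeros((b * b, b * b))
    for i in range(b):
        dense[i * b:(i + 1) * b, i * b:(i + 1) * b] = blocks[i]
    return dense

def monarch_dense(left_blocks, right_blocks):
    """Dense n x n reference for the same matrix (O(n^2) storage), for checking only."""
    b = right_blocks.shape[0]
    n = b * b
    perm = np.arange(n).reshape(b, b).T.reshape(-1)
    P = np.eye(n)[perm]                                  # (P @ x)[k] == x[perm[k]]
    return P @ block_diag(left_blocks) @ P @ block_diag(right_blocks)

# Usage: the structured product agrees with the dense reference.
rng = np.random.default_rng(0)
b = 4
L = rng.standard_normal((b, b, b))
R = rng.standard_normal((b, b, b))
x = rng.standard_normal(b * b)
print(np.allclose(monarch_matvec(x, L, R), monarch_dense(L, R) @ x))
```

Each `einsum` call performs b independent b x b matrix-vector products, which is where the n^1.5 cost comes from. The dense reference exists only to check the structured version and would not be built in practice.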
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2310.12109, https://arxiv.org/pdf/2310.12109
- OA Status: green
- Cited By: 14
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4387804367
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4387804367 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2310.12109 (Digital Object Identifier)
- Title: Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-10-18 (full publication date if available)
- Authors: Daniel Y. Fu, Simran Arora, Jessica Grogan, Isys Johnson, Sabri Eyuboglu, Armin W. Thomas, Benjamin Spector, Michael Poli, Atri Rudra, Christopher Ré (list of authors in order)
- Landing page: https://arxiv.org/abs/2310.12109 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2310.12109 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2310.12109 (direct OA link when available)
- Concepts: Quadratic growth, Computer science, Bottleneck, Algorithm, Quadratic equation, Quadratic programming, Theoretical computer science, Mathematics, Mathematical optimization, Embedded system, Geometry (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 14 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 9, 2024: 4, 2023: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W4387804367 |
| doi | https://doi.org/10.48550/arxiv.2310.12109 |
| ids.doi | https://doi.org/10.48550/arxiv.2310.12109 |
| ids.openalex | https://openalex.org/W4387804367 |
| fwci | |
| type | preprint |
| title | Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10320 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9908999800682068 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Neural Networks and Applications |
| topics[1].id | https://openalex.org/T12535 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9886999726295471 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Machine Learning and Data Classification |
| topics[2].id | https://openalex.org/T10028 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9872999787330627 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Topic Modeling |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C195956108 |
| concepts[0].level | 2 |
| concepts[0].score | 0.6881837844848633 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q7268362 |
| concepts[0].display_name | Quadratic growth |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6237753033638 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C2780513914 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5853502154350281 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q18210350 |
| concepts[2].display_name | Bottleneck |
| concepts[3].id | https://openalex.org/C11413529 |
| concepts[3].level | 1 |
| concepts[3].score | 0.5434451699256897 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q8366 |
| concepts[3].display_name | Algorithm |
| concepts[4].id | https://openalex.org/C129844170 |
| concepts[4].level | 2 |
| concepts[4].score | 0.533416211605072 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q41299 |
| concepts[4].display_name | Quadratic equation |
| concepts[5].id | https://openalex.org/C81845259 |
| concepts[5].level | 2 |
| concepts[5].score | 0.4180000424385071 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q290117 |
| concepts[5].display_name | Quadratic programming |
| concepts[6].id | https://openalex.org/C80444323 |
| concepts[6].level | 1 |
| concepts[6].score | 0.3895336091518402 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q2878974 |
| concepts[6].display_name | Theoretical computer science |
| concepts[7].id | https://openalex.org/C33923547 |
| concepts[7].level | 0 |
| concepts[7].score | 0.24497103691101074 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[7].display_name | Mathematics |
| concepts[8].id | https://openalex.org/C126255220 |
| concepts[8].level | 1 |
| concepts[8].score | 0.18296310305595398 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q141495 |
| concepts[8].display_name | Mathematical optimization |
| concepts[9].id | https://openalex.org/C149635348 |
| concepts[9].level | 1 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q193040 |
| concepts[9].display_name | Embedded system |
| concepts[10].id | https://openalex.org/C2524010 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q8087 |
| concepts[10].display_name | Geometry |
| keywords[0].id | https://openalex.org/keywords/quadratic-growth |
| keywords[0].score | 0.6881837844848633 |
| keywords[0].display_name | Quadratic growth |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6237753033638 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/bottleneck |
| keywords[2].score | 0.5853502154350281 |
| keywords[2].display_name | Bottleneck |
| keywords[3].id | https://openalex.org/keywords/algorithm |
| keywords[3].score | 0.5434451699256897 |
| keywords[3].display_name | Algorithm |
| keywords[4].id | https://openalex.org/keywords/quadratic-equation |
| keywords[4].score | 0.533416211605072 |
| keywords[4].display_name | Quadratic equation |
| keywords[5].id | https://openalex.org/keywords/quadratic-programming |
| keywords[5].score | 0.4180000424385071 |
| keywords[5].display_name | Quadratic programming |
| keywords[6].id | https://openalex.org/keywords/theoretical-computer-science |
| keywords[6].score | 0.3895336091518402 |
| keywords[6].display_name | Theoretical computer science |
| keywords[7].id | https://openalex.org/keywords/mathematics |
| keywords[7].score | 0.24497103691101074 |
| keywords[7].display_name | Mathematics |
| keywords[8].id | https://openalex.org/keywords/mathematical-optimization |
| keywords[8].score | 0.18296310305595398 |
| keywords[8].display_name | Mathematical optimization |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2310.12109 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2310.12109 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2310.12109 |
| locations[1].id | doi:10.48550/arxiv.2310.12109 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2310.12109 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5032865467 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-2500-2577 |
| authorships[0].author.display_name | Daniel Y. Fu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Fu, Daniel Y. |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5055978130 |
| authorships[1].author.orcid | https://orcid.org/0009-0002-5271-2117 |
| authorships[1].author.display_name | Simran Arora |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Arora, Simran |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5039570590 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Jessica Grogan |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Grogan, Jessica |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5024441871 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Isys Johnson |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Johnson, Isys |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5011478764 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-8412-0266 |
| authorships[4].author.display_name | Sabri Eyuboglu |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Eyuboglu, Sabri |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5070255304 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-9947-5705 |
| authorships[5].author.display_name | Armin W. Thomas |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Thomas, Armin W. |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5004675499 |
| authorships[6].author.orcid | https://orcid.org/0000-0003-0468-5986 |
| authorships[6].author.display_name | Benjamin Spector |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Spector, Benjamin |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5078213488 |
| authorships[7].author.orcid | https://orcid.org/0000-0001-5384-9372 |
| authorships[7].author.display_name | Michael Poli |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Poli, Michael |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5001041485 |
| authorships[8].author.orcid | https://orcid.org/0000-0003-4136-4719 |
| authorships[8].author.display_name | Atri Rudra |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Rudra, Atri |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5103852640 |
| authorships[9].author.orcid | |
| authorships[9].author.display_name | Christopher Ré |
| authorships[9].author_position | last |
| authorships[9].raw_author_name | Ré, Christopher |
| authorships[9].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2310.12109 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10320 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9908999800682068 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Neural Networks and Applications |
| related_works | https://openalex.org/W2112847829, https://openalex.org/W2023096387, https://openalex.org/W2077146756, https://openalex.org/W1878408459, https://openalex.org/W4297407962, https://openalex.org/W2753327353, https://openalex.org/W4297797735, https://openalex.org/W4238939226, https://openalex.org/W4293088966, https://openalex.org/W2917463375 |
| cited_by_count | 14 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 9 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 4 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2310.12109 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2310.12109 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2310.12109 |
| primary_location.id | pmh:oai:arXiv.org:2310.12109 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2310.12109 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2310.12109 |
| publication_date | 2023-10-18 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 54, 72, 94, 167, 175, 184 |
| abstract_inverted_index.1% | 155 |
| abstract_inverted_index.As | 93 |
| abstract_inverted_index.M2 | 103, 123, 151, 202, 212 |
| abstract_inverted_index.On | 149 |
| abstract_inverted_index.To | 178 |
| abstract_inverted_index.We | 33, 49 |
| abstract_inverted_index.as | 25 |
| abstract_inverted_index.at | 145, 216 |
| abstract_inverted_index.be | 204, 232 |
| abstract_inverted_index.by | 154 |
| abstract_inverted_index.in | 7, 104, 128, 156, 219 |
| abstract_inverted_index.it | 230 |
| abstract_inverted_index.of | 75, 96, 102, 188 |
| abstract_inverted_index.on | 88, 192, 222 |
| abstract_inverted_index.or | 240 |
| abstract_inverted_index.to | 14, 134, 141, 203, 234 |
| abstract_inverted_index.up | 133, 140 |
| abstract_inverted_index.us | 200 |
| abstract_inverted_index.we | 98, 182 |
| abstract_inverted_index.27% | 135 |
| abstract_inverted_index.4K. | 148 |
| abstract_inverted_index.For | 119 |
| abstract_inverted_index.The | 223 |
| abstract_inverted_index.and | 11, 18, 46, 67, 90, 114, 126, 138, 196 |
| abstract_inverted_index.are | 3, 35 |
| abstract_inverted_index.can | 40 |
| abstract_inverted_index.for | 225 |
| abstract_inverted_index.may | 231 |
| abstract_inverted_index.new | 55 |
| abstract_inverted_index.the | 59, 100, 161, 226 |
| abstract_inverted_index.via | 172 |
| abstract_inverted_index.360M | 217 |
| abstract_inverted_index.GLUE | 130 |
| abstract_inverted_index.ask: | 34 |
| abstract_inverted_index.both | 8, 30, 64 |
| abstract_inverted_index.half | 160 |
| abstract_inverted_index.high | 85 |
| abstract_inverted_index.lets | 199 |
| abstract_inverted_index.many | 81 |
| abstract_inverted_index.only | 159 |
| abstract_inverted_index.same | 60 |
| abstract_inverted_index.such | 24 |
| abstract_inverted_index.that | 39, 57, 79, 229 |
| abstract_inverted_index.this | 180, 210 |
| abstract_inverted_index.time | 228 |
| abstract_inverted_index.uses | 58 |
| abstract_inverted_index.view | 187 |
| abstract_inverted_index.with | 132, 158 |
| abstract_inverted_index.(M2), | 53 |
| abstract_inverted_index.GPUs, | 89 |
| abstract_inverted_index.MLPs. | 241 |
| abstract_inverted_index.Mixer | 52 |
| abstract_inverted_index.Using | 209 |
| abstract_inverted_index.ViT-b | 153 |
| abstract_inverted_index.along | 29, 43, 63 |
| abstract_inverted_index.axes. | 32 |
| abstract_inverted_index.based | 191 |
| abstract_inverted_index.being | 5 |
| abstract_inverted_index.class | 74 |
| abstract_inverted_index.fewer | 136 |
| abstract_inverted_index.first | 227 |
| abstract_inverted_index.image | 112 |
| abstract_inverted_index.match | 235 |
| abstract_inverted_index.model | 12, 47, 68 |
| abstract_inverted_index.novel | 185 |
| abstract_inverted_index.proof | 95 |
| abstract_inverted_index.reach | 15 |
| abstract_inverted_index.scale | 27, 41 |
| abstract_inverted_index.there | 36 |
| abstract_inverted_index.these | 31 |
| abstract_inverted_index.three | 105 |
| abstract_inverted_index.which | 198 |
| abstract_inverted_index.while | 206 |
| abstract_inverted_index.Causal | 163 |
| abstract_inverted_index.better | 19 |
| abstract_inverted_index.causal | 115, 205 |
| abstract_inverted_index.higher | 143 |
| abstract_inverted_index.length | 10, 45, 66, 147 |
| abstract_inverted_index.linear | 82 |
| abstract_inverted_index.longer | 16 |
| abstract_inverted_index.models | 2, 165 |
| abstract_inverted_index.scaled | 6 |
| abstract_inverted_index.scales | 91 |
| abstract_inverted_index.simple | 73 |
| abstract_inverted_index.Machine | 0 |
| abstract_inverted_index.Monarch | 51, 70, 189 |
| abstract_inverted_index.develop | 183 |
| abstract_inverted_index.explore | 99 |
| abstract_inverted_index.masking | 173 |
| abstract_inverted_index.matches | 124, 213 |
| abstract_inverted_index.quality | 131, 237 |
| abstract_inverted_index.without | 238 |
| abstract_inverted_index.However, | 21 |
| abstract_inverted_index.achieves | 84, 139 |
| abstract_inverted_index.captures | 80 |
| abstract_inverted_index.concept, | 97 |
| abstract_inverted_index.contexts | 17 |
| abstract_inverted_index.domains: | 106 |
| abstract_inverted_index.existing | 22 |
| abstract_inverted_index.hardware | 86 |
| abstract_inverted_index.language | 109, 117 |
| abstract_inverted_index.learning | 1 |
| abstract_inverted_index.matrices | 78, 190 |
| abstract_inverted_index.possible | 233 |
| abstract_inverted_index.sequence | 9, 44, 65, 146 |
| abstract_inverted_index.BERT-base | 125 |
| abstract_inverted_index.GPT-style | 116, 164, 214 |
| abstract_inverted_index.ImageNet, | 150 |
| abstract_inverted_index.ViT-style | 111 |
| abstract_inverted_index.accuracy, | 157 |
| abstract_inverted_index.alleviate | 179 |
| abstract_inverted_index.attention | 239 |
| abstract_inverted_index.causality | 171 |
| abstract_inverted_index.dimension | 13 |
| abstract_inverted_index.enforcing | 170 |
| abstract_inverted_index.introduce | 50, 166 |
| abstract_inverted_index.matrices, | 71 |
| abstract_inverted_index.modeling, | 110, 122 |
| abstract_inverted_index.modeling. | 118 |
| abstract_inverted_index.primitive | 62 |
| abstract_inverted_index.quadratic | 176 |
| abstract_inverted_index.remaining | 207 |
| abstract_inverted_index.technical | 168 |
| abstract_inverted_index.BERT-large | 127 |
| abstract_inverted_index.BERT-style | 108, 121 |
| abstract_inverted_index.challenge: | 169 |
| abstract_inverted_index.dimension: | 69 |
| abstract_inverted_index.dimension? | 48 |
| abstract_inverted_index.downstream | 129 |
| abstract_inverted_index.efficiency | 87 |
| abstract_inverted_index.evaluation | 195 |
| abstract_inverted_index.expressive | 76 |
| abstract_inverted_index.introduces | 174 |
| abstract_inverted_index.non-causal | 107, 120 |
| abstract_inverted_index.parameters | 218 |
| abstract_inverted_index.performant | 37 |
| abstract_inverted_index.perplexity | 221 |
| abstract_inverted_index.polynomial | 194 |
| abstract_inverted_index.structured | 77 |
| abstract_inverted_index.throughput | 144 |
| abstract_inverted_index.9.1$\times$ | 142 |
| abstract_inverted_index.Transformer | 236 |
| abstract_inverted_index.bottleneck, | 181 |
| abstract_inverted_index.bottleneck. | 177 |
| abstract_inverted_index.outperforms | 152 |
| abstract_inverted_index.parameters, | 137 |
| abstract_inverted_index.parameters. | 162 |
| abstract_inverted_index.performance | 101 |
| abstract_inverted_index.pretraining | 220 |
| abstract_inverted_index.theoretical | 186 |
| abstract_inverted_index.transforms, | 83 |
| abstract_inverted_index.Transformers | 26, 215 |
| abstract_inverted_index.architecture | 56 |
| abstract_inverted_index.increasingly | 4 |
| abstract_inverted_index.multivariate | 193 |
| abstract_inverted_index.parameterize | 201 |
| abstract_inverted_index.performance. | 20 |
| abstract_inverted_index.PILE--showing | 224 |
| abstract_inverted_index.architectures | 23, 38 |
| abstract_inverted_index.quadratically | 28 |
| abstract_inverted_index.sub-quadratic | 61 |
| abstract_inverted_index.interpolation, | 197 |
| abstract_inverted_index.sub-quadratic. | 208 |
| abstract_inverted_index.classification, | 113 |
| abstract_inverted_index.parameterization, | 211 |
| abstract_inverted_index.sub-quadratically | 42 |
| abstract_inverted_index.sub-quadratically. | 92 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 10 |
| citation_normalized_percentile | |
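
The `abstract_inverted_index.*` rows in the table above store the abstract as a mapping from each word to the zero-based positions where it occurs. A minimal sketch of turning such a mapping back into text follows. The function name `rebuild_abstract` and its `limit` parameter are hypothetical helpers for this example; the sample dictionary copies a handful of rows from the table.

```python
def rebuild_abstract(inverted_index, limit=None):
    """Rebuild text from a word -> list-of-positions mapping.

    limit: if given, keep only positions strictly below this value
           (useful when only part of the index is available).
    """
    word_at = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            if limit is None or pos < limit:
                word_at[pos] = word
    # Join the words in position order; missing positions are simply skipped.
    return " ".join(word_at[pos] for pos in sorted(word_at))

# A few entries copied from the payload table above.
sample = {
    "Machine": [0],
    "learning": [1],
    "models": [2, 165],
    "are": [3, 35],
    "increasingly": [4],
    "being": [5],
    "scaled": [6],
}

print(rebuild_abstract(sample, limit=7))
# prints "Machine learning models are increasingly being scaled"
```

Passing the full set of `abstract_inverted_index` rows with no `limit` should reproduce the abstract shown near the top of this record, since every position from 0 to 241 appears exactly once in the table.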