Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers
2025 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2506.00744
We develop hybrid memory architectures for general-purpose sequence processing neural networks, that combine key-value memory using softmax attention (KV-memory) with fast weight memory through dynamic synaptic modulation (FW-memory) -- the core principles of quadratic and linear transformers, respectively. These two memory systems have complementary but individually limited properties: KV-memory offers precise retrieval but is constrained by quadratic complexity in sequence length, while FW-memory supports arbitrarily long sequences and enables more expressive computation but sacrifices precise recall. We propose and compare three methods to blend these two systems into a single memory system, differing in how and when input information is delivered to each system, to leverage the strengths of both. We conduct experiments on general language modeling and retrieval tasks by training 340M- and 1.3B-parameter models from scratch, as well as on synthetic algorithmic tasks designed to precisely illustrate the benefits of certain hybrid methods over others. We also evaluate our hybrid memory systems on reinforcement learning in partially observable environments. Overall, we demonstrate how a well-designed hybrid can overcome the limitations of its individual components, offering new insights into the design principle of neural memory systems.
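The two memory types named in the abstract can be contrasted with a minimal sketch. It uses textbook forms only: a softmax-attention read over all stored key-value pairs, and a fixed-size fast weight matrix written with a purely additive outer-product rule. The function names and the additive rule are assumptions chosen for illustration; this record does not state the update rule or the blending methods the authors actually use.

```python
import math


def kv_read(keys, values, query):
    """Softmax-attention read: weight every stored value by softmax(query . key)."""
    scores = [sum(k_j * q_j for k_j, q_j in zip(k, query)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def new_fast_weights(d_out, d_in):
    """Fixed-size memory: its shape does not depend on sequence length."""
    return [[0.0] * d_in for _ in range(d_out)]


def fw_write(W, key, value):
    """Illustrative additive rule (an assumption): W <- W + value key^T, in place."""
    for i, v_i in enumerate(value):
        for j, k_j in enumerate(key):
            W[i][j] += v_i * k_j


def fw_read(W, query):
    """Read from fast weights with a matrix-vector product W query."""
    return [sum(w_ij * q_j for w_ij, q_j in zip(row, query)) for row in W]


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]

# KV-memory keeps every pair, so storage and read cost grow with the sequence.
kv_out = kv_read(keys, values, [10.0, 0.0])

# FW-memory folds every pair into one matrix of constant size.
W = new_fast_weights(2, 2)
for k, v in zip(keys, values):
    fw_write(W, k, v)
fw_out = fw_read(W, [1.0, 0.0])
```

With the orthonormal keys used here, the fast-weight read returns the first value exactly. With non-orthogonal keys the additive rule mixes stored values, which is one simple way to see the recall limitation the abstract attributes to FW-memory.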
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2506.00744, https://arxiv.org/pdf/2506.00744
- OA Status: green
- OpenAlex ID: https://openalex.org/W4414892132
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4414892132 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2506.00744 (Digital Object Identifier)
- Title: Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-05-31 (full publication date if available)
- Authors: Kazuki Irie, Morris Yau, Samuel J. Gershman (list of authors in order)
- Landing page: https://arxiv.org/abs/2506.00744 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2506.00744 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2506.00744 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4414892132 |
| doi | https://doi.org/10.48550/arxiv.2506.00744 |
| ids.doi | https://doi.org/10.48550/arxiv.2506.00744 |
| ids.openalex | https://openalex.org/W4414892132 |
| fwci | |
| type | preprint |
| title | Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10502 |
| topics[0].field.id | https://openalex.org/fields/22 |
| topics[0].field.display_name | Engineering |
| topics[0].score | 0.994700014591217 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2208 |
| topics[0].subfield.display_name | Electrical and Electronic Engineering |
| topics[0].display_name | Advanced Memory and Neural Computing |
| topics[1].id | https://openalex.org/T10054 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9894999861717224 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1708 |
| topics[1].subfield.display_name | Hardware and Architecture |
| topics[1].display_name | Parallel Computing and Optimization Techniques |
| topics[2].id | https://openalex.org/T11181 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9800999760627747 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1705 |
| topics[2].subfield.display_name | Computer Networks and Communications |
| topics[2].display_name | Advanced Data Storage Technologies |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2506.00744 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2506.00744 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2506.00744 |
| locations[1].id | doi:10.48550/arxiv.2506.00744 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2506.00744 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5002810304 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Kazuki Irie |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Irie, Kazuki |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5002957102 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Morris Yau |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Yau, Morris |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5031715686 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-6546-3298 |
| authorships[2].author.display_name | Samuel J. Gershman |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Gershman, Samuel J. |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2506.00744 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10502 |
| primary_topic.field.id | https://openalex.org/fields/22 |
| primary_topic.field.display_name | Engineering |
| primary_topic.score | 0.994700014591217 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2208 |
| primary_topic.subfield.display_name | Electrical and Electronic Engineering |
| primary_topic.display_name | Advanced Memory and Neural Computing |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2506.00744 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2506.00744 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2506.00744 |
| primary_location.id | pmh:oai:arXiv.org:2506.00744 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2506.00744 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2506.00744 |
| publication_date | 2025-05-31 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 88, 165 |
| abstract_inverted_index.-- | 28 |
| abstract_inverted_index.We | 0, 76, 110, 147 |
| abstract_inverted_index.as | 128, 130 |
| abstract_inverted_index.by | 55, 120 |
| abstract_inverted_index.in | 58, 93, 157 |
| abstract_inverted_index.is | 53, 99 |
| abstract_inverted_index.of | 32, 108, 141, 172, 183 |
| abstract_inverted_index.on | 113, 131, 154 |
| abstract_inverted_index.to | 82, 101, 104, 136 |
| abstract_inverted_index.we | 162 |
| abstract_inverted_index.and | 34, 67, 78, 95, 117, 123 |
| abstract_inverted_index.but | 44, 52, 72 |
| abstract_inverted_index.can | 168 |
| abstract_inverted_index.for | 5 |
| abstract_inverted_index.how | 94, 164 |
| abstract_inverted_index.its | 173 |
| abstract_inverted_index.new | 177 |
| abstract_inverted_index.our | 150 |
| abstract_inverted_index.the | 29, 106, 139, 170, 180 |
| abstract_inverted_index.two | 39, 85 |
| abstract_inverted_index.also | 148 |
| abstract_inverted_index.core | 30 |
| abstract_inverted_index.each | 102 |
| abstract_inverted_index.fast | 20 |
| abstract_inverted_index.from | 126 |
| abstract_inverted_index.have | 42 |
| abstract_inverted_index.into | 87, 179 |
| abstract_inverted_index.long | 65 |
| abstract_inverted_index.more | 69 |
| abstract_inverted_index.over | 145 |
| abstract_inverted_index.that | 11 |
| abstract_inverted_index.well | 129 |
| abstract_inverted_index.when | 96 |
| abstract_inverted_index.with | 19 |
| abstract_inverted_index.340M- | 122 |
| abstract_inverted_index.These | 38 |
| abstract_inverted_index.blend | 83 |
| abstract_inverted_index.both. | 109 |
| abstract_inverted_index.input | 97 |
| abstract_inverted_index.tasks | 119, 134 |
| abstract_inverted_index.these | 84 |
| abstract_inverted_index.three | 80 |
| abstract_inverted_index.using | 15 |
| abstract_inverted_index.while | 61 |
| abstract_inverted_index.design | 181 |
| abstract_inverted_index.hybrid | 2, 143, 151, 167 |
| abstract_inverted_index.linear | 35 |
| abstract_inverted_index.memory | 3, 14, 22, 40, 90, 152, 185 |
| abstract_inverted_index.models | 125 |
| abstract_inverted_index.neural | 9, 184 |
| abstract_inverted_index.offers | 49 |
| abstract_inverted_index.single | 89 |
| abstract_inverted_index.weight | 21 |
| abstract_inverted_index.certain | 142 |
| abstract_inverted_index.combine | 12 |
| abstract_inverted_index.compare | 79 |
| abstract_inverted_index.conduct | 111 |
| abstract_inverted_index.develop | 1 |
| abstract_inverted_index.dynamic | 24 |
| abstract_inverted_index.enables | 68 |
| abstract_inverted_index.general | 114 |
| abstract_inverted_index.length, | 60 |
| abstract_inverted_index.limited | 46 |
| abstract_inverted_index.methods | 81, 144 |
| abstract_inverted_index.others. | 146 |
| abstract_inverted_index.precise | 50, 74 |
| abstract_inverted_index.propose | 77 |
| abstract_inverted_index.recall. | 75 |
| abstract_inverted_index.softmax | 16 |
| abstract_inverted_index.system, | 91, 103 |
| abstract_inverted_index.systems | 41, 86, 153 |
| abstract_inverted_index.through | 23 |
| abstract_inverted_index.Overall, | 161 |
| abstract_inverted_index.benefits | 140 |
| abstract_inverted_index.designed | 135 |
| abstract_inverted_index.evaluate | 149 |
| abstract_inverted_index.insights | 178 |
| abstract_inverted_index.language | 115 |
| abstract_inverted_index.learning | 156 |
| abstract_inverted_index.leverage | 105 |
| abstract_inverted_index.modeling | 116 |
| abstract_inverted_index.offering | 176 |
| abstract_inverted_index.overcome | 169 |
| abstract_inverted_index.scratch, | 127 |
| abstract_inverted_index.sequence | 7, 59 |
| abstract_inverted_index.supports | 63 |
| abstract_inverted_index.synaptic | 25 |
| abstract_inverted_index.systems. | 186 |
| abstract_inverted_index.training | 121 |
| abstract_inverted_index.FW-memory | 62 |
| abstract_inverted_index.KV-memory | 48 |
| abstract_inverted_index.attention | 17 |
| abstract_inverted_index.delivered | 100 |
| abstract_inverted_index.differing | 92 |
| abstract_inverted_index.key-value | 13 |
| abstract_inverted_index.networks, | 10 |
| abstract_inverted_index.partially | 158 |
| abstract_inverted_index.precisely | 137 |
| abstract_inverted_index.principle | 182 |
| abstract_inverted_index.quadratic | 33, 56 |
| abstract_inverted_index.retrieval | 51, 118 |
| abstract_inverted_index.sequences | 66 |
| abstract_inverted_index.strengths | 107 |
| abstract_inverted_index.synthetic | 132 |
| abstract_inverted_index.complexity | 57 |
| abstract_inverted_index.expressive | 70 |
| abstract_inverted_index.illustrate | 138 |
| abstract_inverted_index.individual | 174 |
| abstract_inverted_index.modulation | 26 |
| abstract_inverted_index.observable | 159 |
| abstract_inverted_index.principles | 31 |
| abstract_inverted_index.processing | 8 |
| abstract_inverted_index.sacrifices | 73 |
| abstract_inverted_index.(FW-memory) | 27 |
| abstract_inverted_index.(KV-memory) | 18 |
| abstract_inverted_index.algorithmic | 133 |
| abstract_inverted_index.arbitrarily | 64 |
| abstract_inverted_index.components, | 175 |
| abstract_inverted_index.computation | 71 |
| abstract_inverted_index.constrained | 54 |
| abstract_inverted_index.demonstrate | 163 |
| abstract_inverted_index.experiments | 112 |
| abstract_inverted_index.information | 98 |
| abstract_inverted_index.limitations | 171 |
| abstract_inverted_index.properties: | 47 |
| abstract_inverted_index.individually | 45 |
| abstract_inverted_index.architectures | 4 |
| abstract_inverted_index.complementary | 43 |
| abstract_inverted_index.environments. | 160 |
| abstract_inverted_index.reinforcement | 155 |
| abstract_inverted_index.respectively. | 37 |
| abstract_inverted_index.transformers, | 36 |
| abstract_inverted_index.well-designed | 166 |
| abstract_inverted_index.1.3B-parameter | 124 |
| abstract_inverted_index.general-purpose | 6 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile |
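
The `abstract_inverted_index.*` rows above store the abstract as a mapping from each token to the word positions where it occurs. As a minimal sketch, the text can be rebuilt by inverting that mapping and sorting by position. The helper names `parse_rows` and `rebuild_abstract` are hypothetical, and the parsing assumes the flattened `| key | value |` row layout used in this table.

```python
PREFIX = "abstract_inverted_index."


def parse_rows(rows):
    """Turn flattened '| abstract_inverted_index.TOKEN | p1, p2 |' rows into {token: [positions]}."""
    index = {}
    for row in rows:
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0].startswith(PREFIX):
            token = cells[0][len(PREFIX):]
            index[token] = [int(p) for p in cells[1].split(",")]
    return index


def rebuild_abstract(index):
    """Place each token at every position it occupies, then join in position order."""
    by_position = {}
    for token, positions in index.items():
        for pos in positions:
            by_position[pos] = token
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Six rows copied from the table above; together they cover positions 0 to 5.
rows = [
    "| abstract_inverted_index.We | 0, 76, 110, 147 |",
    "| abstract_inverted_index.for | 5 |",
    "| abstract_inverted_index.hybrid | 2, 143, 151, 167 |",
    "| abstract_inverted_index.memory | 3, 14, 22, 40, 90, 152, 185 |",
    "| abstract_inverted_index.develop | 1 |",
    "| abstract_inverted_index.architectures | 4 |",
]
index = parse_rows(rows)

# Only a subset of rows is loaded, so later positions have gaps; show the first six words.
first_six = " ".join(rebuild_abstract(index).split()[:6])
print(first_six)  # We develop hybrid memory architectures for
```

Feeding all `abstract_inverted_index.*` rows to `parse_rows` should yield the full abstract, since every position from 0 to 186 is then covered.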