GatedFWA: Linear Flash Windowed Attention with Gated Associative Memory
Modern autoregressive models rely on attention, yet the Softmax full attention in Transformers scales quadratically with sequence length. Sliding Window Attention (SWA) achieves linear-time encoding/decoding by constraining the attention pattern, but under an Associative Memory interpretation, its difference-style update renders the training objective effectively unbounded. In contrast, Softmax attention normalizes updates, leading to memory shrinkage and gradient vanishing. We propose GatedFWA: a Memory-Gated (Flash) Windowed Attention mechanism that preserves SWA's efficiency while stabilizing memory updates and making gradient flow controllable. In essence, GatedFWA accumulates a per-token/head gate into a decay bias added to the attention logits, acting as a learnable contraction in the memory recurrence. We implement a fused one-pass gate preprocessing step and a FlashAttention-compatible kernel that injects the gate under a sliding mask, ensuring I/O efficiency and numerical stability. On language modelling benchmarks, GatedFWA delivers competitive throughput with negligible overhead and better use of global context; it integrates cleanly with token compression/selection methods such as NSA and generalizes to various autoregressive domains.
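To make the "gate accumulated into a decay bias" idea concrete, here is a minimal single-head sketch in plain Python. It is not the authors' fused kernel, and the abstract does not specify the gate parameterization. As an assumption for illustration, each token carries a scalar gate in (0, 1], the log-gates are prefix-summed, and the difference of prefix sums is added to the scaled dot-product logits inside a causal sliding window. The names `gated_sliding_window_attention`, `gate`, and `window` are hypothetical.

```python
import math

def gated_sliding_window_attention(q, k, v, gate, window):
    """Causal sliding-window attention with an additive decay bias.

    q, k, v: lists of T vectors (lists of floats); q and k share dimension d.
    gate:    list of T values in (0, 1], one per token (assumed form).
    window:  number of most recent tokens each query may attend to.
    """
    T, d = len(q), len(q[0])
    # Prefix sums of log-gates: c[t] = sum over s <= t of log(gate[s]).
    c, running = [], 0.0
    for g in gate:
        running += math.log(g)
        c.append(running)

    out = []
    for t in range(T):
        start = max(0, t - window + 1)
        positions = range(start, t + 1)
        # Logit = scaled dot product + decay bias; c[t] - c[s] <= 0,
        # so older keys inside the window are down-weighted.
        logits = [
            sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d) + (c[t] - c[s])
            for s in positions
        ]
        m = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(x - m) for x in logits]
        z = sum(weights)
        out.append([
            sum(w * v[s][j] for w, s in zip(weights, positions)) / z
            for j in range(len(v[0]))
        ])
    return out
```

With every gate equal to 1 the bias vanishes and the function reduces to plain sliding-window attention. For a fixed `window` the cost grows linearly with the sequence length `T`, which is the efficiency property the abstract attributes to SWA.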
- Type: article
- Landing Page: http://arxiv.org/abs/2512.07782
- PDF: https://arxiv.org/pdf/2512.07782
- OA Status: green
- OpenAlex ID: https://openalex.org/W7113916381
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W7113916381 (Canonical identifier for this work in OpenAlex)
- Title: GatedFWA: Linear Flash Windowed Attention with Gated Associative Memory (Work title)
- Type: article (OpenAlex work type)
- Publication year: 2025 (Year of publication)
- Publication date: 2025-12-08 (Full publication date if available)
- Authors: Liu Jiaxu; Bai Yuhe; Bouganis, Christos-Savvas (List of authors in order)
- Landing page: https://arxiv.org/abs/2512.07782 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/2512.07782 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2512.07782 (Direct OA link when available)
- Concepts: Softmax function, Computer science, Autoregressive model, Artificial intelligence, Preprocessor, Gradient descent, High memory, Algorithm, Speech recognition, Transformer, Sliding window protocol, Overhead (engineering), Robustness (evolution), Quantization (signal processing), Sketch, Security token, Memory bandwidth, Sequence (biology), Language model, Sequence learning, Memory model, Regularization (linguistics), Recall, Mechanism (biology), Throughput, Treebank, Content-addressable memory (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 0 (Total citation count in OpenAlex)
Full payload
| id | https://openalex.org/W7113916381 |
|---|---|
| doi | |
| ids.openalex | https://openalex.org/W7113916381 |
| fwci | 0.0 |
| type | article |
| title | GatedFWA: Linear Flash Windowed Attention with Gated Associative Memory |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10775 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.21674776077270508 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Generative Adversarial Networks and Image Synthesis |
| topics[1].id | https://openalex.org/T10054 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.11858642846345901 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1708 |
| topics[1].subfield.display_name | Hardware and Architecture |
| topics[1].display_name | Parallel Computing and Optimization Techniques |
| topics[2].id | https://openalex.org/T10028 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.09168491512537003 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Topic Modeling |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C188441871 |
| concepts[0].level | 3 |
| concepts[0].score | 0.7478923797607422 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q7554146 |
| concepts[0].display_name | Softmax function |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.7139334678649902 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C159877910 |
| concepts[2].level | 2 |
| concepts[2].score | 0.4866373538970947 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q2202883 |
| concepts[2].display_name | Autoregressive model |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.4575445055961609 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C34736171 |
| concepts[4].level | 2 |
| concepts[4].score | 0.39892736077308655 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q918333 |
| concepts[4].display_name | Preprocessor |
| concepts[5].id | https://openalex.org/C153258448 |
| concepts[5].level | 3 |
| concepts[5].score | 0.39358973503112793 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q1199743 |
| concepts[5].display_name | Gradient descent |
| concepts[6].id | https://openalex.org/C2781357197 |
| concepts[6].level | 2 |
| concepts[6].score | 0.38932734727859497 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q5757597 |
| concepts[6].display_name | High memory |
| concepts[7].id | https://openalex.org/C11413529 |
| concepts[7].level | 1 |
| concepts[7].score | 0.38686588406562805 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q8366 |
| concepts[7].display_name | Algorithm |
| concepts[8].id | https://openalex.org/C28490314 |
| concepts[8].level | 1 |
| concepts[8].score | 0.38495680689811707 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[8].display_name | Speech recognition |
| concepts[9].id | https://openalex.org/C66322947 |
| concepts[9].level | 3 |
| concepts[9].score | 0.37439677119255066 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q11658 |
| concepts[9].display_name | Transformer |
| concepts[10].id | https://openalex.org/C102392041 |
| concepts[10].level | 3 |
| concepts[10].score | 0.3638613820075989 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q592860 |
| concepts[10].display_name | Sliding window protocol |
| concepts[11].id | https://openalex.org/C2779960059 |
| concepts[11].level | 2 |
| concepts[11].score | 0.3511905372142792 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q7113681 |
| concepts[11].display_name | Overhead (engineering) |
| concepts[12].id | https://openalex.org/C63479239 |
| concepts[12].level | 3 |
| concepts[12].score | 0.34775876998901367 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q7353546 |
| concepts[12].display_name | Robustness (evolution) |
| concepts[13].id | https://openalex.org/C28855332 |
| concepts[13].level | 2 |
| concepts[13].score | 0.32041868567466736 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q198099 |
| concepts[13].display_name | Quantization (signal processing) |
| concepts[14].id | https://openalex.org/C2779231336 |
| concepts[14].level | 2 |
| concepts[14].score | 0.30598366260528564 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q7534724 |
| concepts[14].display_name | Sketch |
| concepts[15].id | https://openalex.org/C48145219 |
| concepts[15].level | 2 |
| concepts[15].score | 0.3006910979747772 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q1335365 |
| concepts[15].display_name | Security token |
| concepts[16].id | https://openalex.org/C188045654 |
| concepts[16].level | 2 |
| concepts[16].score | 0.2897407114505768 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q17148339 |
| concepts[16].display_name | Memory bandwidth |
| concepts[17].id | https://openalex.org/C2778112365 |
| concepts[17].level | 2 |
| concepts[17].score | 0.28307482600212097 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q3511065 |
| concepts[17].display_name | Sequence (biology) |
| concepts[18].id | https://openalex.org/C137293760 |
| concepts[18].level | 2 |
| concepts[18].score | 0.27856630086898804 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q3621696 |
| concepts[18].display_name | Language model |
| concepts[19].id | https://openalex.org/C40506919 |
| concepts[19].level | 2 |
| concepts[19].score | 0.27825498580932617 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q7452469 |
| concepts[19].display_name | Sequence learning |
| concepts[20].id | https://openalex.org/C12186640 |
| concepts[20].level | 3 |
| concepts[20].score | 0.27654626965522766 |
| concepts[20].wikidata | https://www.wikidata.org/wiki/Q6815743 |
| concepts[20].display_name | Memory model |
| concepts[21].id | https://openalex.org/C2776135515 |
| concepts[21].level | 2 |
| concepts[21].score | 0.26855167746543884 |
| concepts[21].wikidata | https://www.wikidata.org/wiki/Q17143721 |
| concepts[21].display_name | Regularization (linguistics) |
| concepts[22].id | https://openalex.org/C100660578 |
| concepts[22].level | 2 |
| concepts[22].score | 0.2652367949485779 |
| concepts[22].wikidata | https://www.wikidata.org/wiki/Q18733 |
| concepts[22].display_name | Recall |
| concepts[23].id | https://openalex.org/C89611455 |
| concepts[23].level | 2 |
| concepts[23].score | 0.26518017053604126 |
| concepts[23].wikidata | https://www.wikidata.org/wiki/Q6804646 |
| concepts[23].display_name | Mechanism (biology) |
| concepts[24].id | https://openalex.org/C157764524 |
| concepts[24].level | 3 |
| concepts[24].score | 0.25582554936408997 |
| concepts[24].wikidata | https://www.wikidata.org/wiki/Q1383412 |
| concepts[24].display_name | Throughput |
| concepts[25].id | https://openalex.org/C206134035 |
| concepts[25].level | 3 |
| concepts[25].score | 0.25526246428489685 |
| concepts[25].wikidata | https://www.wikidata.org/wiki/Q811525 |
| concepts[25].display_name | Treebank |
| concepts[26].id | https://openalex.org/C53442348 |
| concepts[26].level | 3 |
| concepts[26].score | 0.25145360827445984 |
| concepts[26].wikidata | https://www.wikidata.org/wiki/Q745101 |
| concepts[26].display_name | Content-addressable memory |
| keywords[0].id | https://openalex.org/keywords/softmax-function |
| keywords[0].score | 0.7478923797607422 |
| keywords[0].display_name | Softmax function |
| keywords[1].id | https://openalex.org/keywords/autoregressive-model |
| keywords[1].score | 0.4866373538970947 |
| keywords[1].display_name | Autoregressive model |
| keywords[2].id | https://openalex.org/keywords/preprocessor |
| keywords[2].score | 0.39892736077308655 |
| keywords[2].display_name | Preprocessor |
| keywords[3].id | https://openalex.org/keywords/gradient-descent |
| keywords[3].score | 0.39358973503112793 |
| keywords[3].display_name | Gradient descent |
| keywords[4].id | https://openalex.org/keywords/high-memory |
| keywords[4].score | 0.38932734727859497 |
| keywords[4].display_name | High memory |
| keywords[5].id | https://openalex.org/keywords/transformer |
| keywords[5].score | 0.37439677119255066 |
| keywords[5].display_name | Transformer |
| keywords[6].id | https://openalex.org/keywords/sliding-window-protocol |
| keywords[6].score | 0.3638613820075989 |
| keywords[6].display_name | Sliding window protocol |
| keywords[7].id | https://openalex.org/keywords/overhead |
| keywords[7].score | 0.3511905372142792 |
| keywords[7].display_name | Overhead (engineering) |
| keywords[8].id | https://openalex.org/keywords/robustness |
| keywords[8].score | 0.34775876998901367 |
| keywords[8].display_name | Robustness (evolution) |
| language | |
| locations[0].id | pmh:oai:arXiv.org:2512.07782 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2512.07782 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2512.07782 |
| indexed_in | arxiv |
| authorships[0].author.id | https://openalex.org/A2360832783 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-0559-8035 |
| authorships[0].author.display_name | Liu Jiaxu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Liu, Jiaxu |
| authorships[0].is_corresponding | True |
| authorships[1].author.id | https://openalex.org/A2391329361 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Bai Yuhe |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Bai, Yuhe |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A4227839638 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Bouganis, Christos-Savvas |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Bouganis, Christos-Savvas |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2512.07782 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-12-11T00:00:00 |
| display_name | GatedFWA: Linear Flash Windowed Attention with Gated Associative Memory |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-12-11T00:24:52.286860 |
| primary_topic.id | https://openalex.org/T10775 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.21674776077270508 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Generative Adversarial Networks and Image Synthesis |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | pmh:oai:arXiv.org:2512.07782 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2512.07782 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2512.07782 |
| primary_location.id | pmh:oai:arXiv.org:2512.07782 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2512.07782 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2512.07782 |
| publication_date | 2025-12-08 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 61, 84, 88, 98, 107, 113, 121 |
| abstract_inverted_index.In | 45, 80 |
| abstract_inverted_index.On | 130 |
| abstract_inverted_index.We | 58, 105 |
| abstract_inverted_index.an | 32 |
| abstract_inverted_index.as | 97, 156 |
| abstract_inverted_index.by | 25 |
| abstract_inverted_index.in | 11, 101 |
| abstract_inverted_index.it | 148 |
| abstract_inverted_index.of | 144 |
| abstract_inverted_index.on | 4 |
| abstract_inverted_index.to | 52, 92, 160 |
| abstract_inverted_index.I/O | 125 |
| abstract_inverted_index.NSA | 157 |
| abstract_inverted_index.and | 55, 75, 112, 127, 141, 147, 158 |
| abstract_inverted_index.but | 30 |
| abstract_inverted_index.its | 36 |
| abstract_inverted_index.the | 7, 27, 40, 93, 102, 118 |
| abstract_inverted_index.use | 143 |
| abstract_inverted_index.yet | 6 |
| abstract_inverted_index.SWAs | 69 |
| abstract_inverted_index.bias | 90 |
| abstract_inverted_index.flow | 78 |
| abstract_inverted_index.full | 9 |
| abstract_inverted_index.gate | 86, 110, 119 |
| abstract_inverted_index.into | 87 |
| abstract_inverted_index.rely | 3 |
| abstract_inverted_index.such | 155 |
| abstract_inverted_index.that | 67, 116 |
| abstract_inverted_index.with | 15, 138, 151 |
| abstract_inverted_index.(SWA) | 21 |
| abstract_inverted_index.added | 91 |
| abstract_inverted_index.decay | 89 |
| abstract_inverted_index.fused | 108 |
| abstract_inverted_index.mask, | 123 |
| abstract_inverted_index.token | 152 |
| abstract_inverted_index.under | 31, 120 |
| abstract_inverted_index.while | 71 |
| abstract_inverted_index.Modern | 0 |
| abstract_inverted_index.Window | 19 |
| abstract_inverted_index.acting | 96 |
| abstract_inverted_index.better | 142 |
| abstract_inverted_index.global | 145 |
| abstract_inverted_index.kernel | 115 |
| abstract_inverted_index.making | 76 |
| abstract_inverted_index.memory | 73, 103 |
| abstract_inverted_index.models | 2 |
| abstract_inverted_index.scales | 13 |
| abstract_inverted_index.update | 38 |
| abstract_inverted_index.Memory} | 34 |
| abstract_inverted_index.Sliding | 18 |
| abstract_inverted_index.Softmax | 8, 47 |
| abstract_inverted_index.cleanly | 150 |
| abstract_inverted_index.injects | 117 |
| abstract_inverted_index.leading | 51 |
| abstract_inverted_index.length. | 17 |
| abstract_inverted_index.logits, | 95 |
| abstract_inverted_index.methods | 154 |
| abstract_inverted_index.propose | 59 |
| abstract_inverted_index.renders | 39 |
| abstract_inverted_index.sliding | 122 |
| abstract_inverted_index.updates | 74 |
| abstract_inverted_index.various | 161 |
| abstract_inverted_index.GatedFWA | 82, 134 |
| abstract_inverted_index.achieves | 22 |
| abstract_inverted_index.context, | 146 |
| abstract_inverted_index.delivers | 135 |
| abstract_inverted_index.domains. | 163 |
| abstract_inverted_index.ensuring | 124 |
| abstract_inverted_index.essence, | 81 |
| abstract_inverted_index.gradient | 56, 77 |
| abstract_inverted_index.language | 131 |
| abstract_inverted_index.one-pass | 109 |
| abstract_inverted_index.overhead | 140 |
| abstract_inverted_index.pattern, | 29 |
| abstract_inverted_index.sequence | 16 |
| abstract_inverted_index.training | 41 |
| abstract_inverted_index.updates, | 50 |
| abstract_inverted_index.Attention | 20 |
| abstract_inverted_index.GatedFWA: | 60 |
| abstract_inverted_index.attention | 10, 28, 48, 94 |
| abstract_inverted_index.contrast, | 46 |
| abstract_inverted_index.implement | 106 |
| abstract_inverted_index.learnable | 99 |
| abstract_inverted_index.mechanism | 66 |
| abstract_inverted_index.modelling | 132 |
| abstract_inverted_index.numerical | 128 |
| abstract_inverted_index.objective | 42 |
| abstract_inverted_index.preserves | 68 |
| abstract_inverted_index.shrinkage | 54 |
| abstract_inverted_index.accumulate | 83 |
| abstract_inverted_index.attention, | 5 |
| abstract_inverted_index.efficiency | 70, 126 |
| abstract_inverted_index.integrates | 149 |
| abstract_inverted_index.negligible | 139 |
| abstract_inverted_index.normalizes | 49 |
| abstract_inverted_index.stability. | 129 |
| abstract_inverted_index.throughput | 137 |
| abstract_inverted_index.benchmarks, | 133 |
| abstract_inverted_index.competitive | 136 |
| abstract_inverted_index.contraction | 100 |
| abstract_inverted_index.effectively | 43 |
| abstract_inverted_index.generalizes | 159 |
| abstract_inverted_index.linear-time | 23 |
| abstract_inverted_index.recurrence. | 104 |
| abstract_inverted_index.stabilizing | 72 |
| abstract_inverted_index.vanishing}. | 57 |
| abstract_inverted_index.Transformers | 12 |
| abstract_inverted_index.\emph{memory | 53 |
| abstract_inverted_index.constraining | 26 |
| abstract_inverted_index.controllable. | 79 |
| abstract_inverted_index.preprocessing | 111 |
| abstract_inverted_index.quadratically | 14 |
| abstract_inverted_index.autoregressive | 1, 162 |
| abstract_inverted_index.per-token/head | 85 |
| abstract_inverted_index.interpretation, | 35 |
| abstract_inverted_index.difference-style | 37 |
| abstract_inverted_index.\emph{unbounded}. | 44 |
| abstract_inverted_index.encoding/decoding | 24 |
| abstract_inverted_index.(\underline{F}lash) | 63 |
| abstract_inverted_index.\textit{Associative | 33 |
| abstract_inverted_index.\underline{W}indowed | 64 |
| abstract_inverted_index.\underline{A}ttention | 65 |
| abstract_inverted_index.compression/selection | 153 |
| abstract_inverted_index.Memory-\underline{Gated} | 62 |
| abstract_inverted_index.FlashAttention-compatible | 114 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile.value | 0.81273503 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |