KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2504.09936
Efficient inference of large language models (LLMs) is hindered by an ever-growing key-value (KV) cache, making KV cache compression a critical research direction. Traditional methods selectively evict less important KV cache entries, which leads to information loss and hallucinations. Recently, merging-based strategies have been explored to retain more information by merging KV pairs that would be discarded; however, these existing approaches inevitably introduce inconsistencies in attention distributions before and after merging, causing degraded generation quality. To overcome this challenge, we propose KeepKV, a novel adaptive KV cache merging method designed to preserve performance under strict memory constraints, achieving single-step lossless compression and providing error bounds for multi-step compression. KeepKV introduces the Electoral Votes mechanism that records merging history and adaptively adjusts attention scores. Moreover, it further leverages a novel Zero Inference-Perturbation Merging method, compensating for attention loss resulting from cache merging. Extensive experiments on various benchmarks and LLM architectures demonstrate that KeepKV substantially reduces memory usage while successfully retaining essential context information, achieving over 2x inference throughput improvement and maintaining superior generation quality even with only 10% KV cache budgets.
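The single-step lossless claim can be illustrated with a small numerical sketch. The abstract does not give KeepKV's formulas, so everything below is an assumption made for illustration rather than the authors' method: each cache entry carries a hypothetical integer vote count that scales its softmax weight, the merged value is the weight-averaged value, and the merged key is obtained by shifting one key along the query direction. The helper names `attention` and `merge_pair` are invented here.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention(q, keys, values, votes):
    """Softmax attention in which entry i is counted votes[i] times (assumed weighting)."""
    weights = [p * math.exp(dot(q, k)) for k, p in zip(keys, votes)]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def merge_pair(q, k_a, v_a, p_a, k_b, v_b, p_b):
    """Merge two entries so the attention output for query q is unchanged.

    Illustrative construction only; q must be non-zero.
    """
    w_a = p_a * math.exp(dot(q, k_a))
    w_b = p_b * math.exp(dot(q, k_b))
    p_m = p_a + p_b  # the merged entry inherits both vote counts
    # Weight-averaged value keeps the numerator of the attention sum intact.
    v_m = [(w_a * x + w_b * y) / (w_a + w_b) for x, y in zip(v_a, v_b)]
    # Shift k_a along q until p_m * exp(q . k_m) equals w_a + w_b,
    # which keeps the softmax denominator intact.
    target_logit = math.log((w_a + w_b) / p_m)
    shift = (target_logit - dot(q, k_a)) / dot(q, q)
    k_m = [x + shift * y for x, y in zip(k_a, q)]
    return k_m, v_m, p_m


q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.2, 0.3], [-0.4, 0.8]]
values = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
votes = [1, 1, 1]

before = attention(q, keys, values, votes)
k_m, v_m, p_m = merge_pair(q, keys[1], values[1], votes[1],
                           keys[2], values[2], votes[2])
after = attention(q, [keys[0], k_m], [values[0], v_m], [votes[0], p_m])
print(before, after)
```

In this toy setup the two outputs agree up to floating-point error for the query used during the merge. For a different query they generally differ, which is consistent with the abstract offering only error bounds, not exactness, for multi-step compression.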
- Type
- preprint
- Language
- en
- Landing Page
- http://arxiv.org/abs/2504.09936
- https://arxiv.org/pdf/2504.09936
- OA Status
- green
- OpenAlex ID
- https://openalex.org/W4415158961
Raw OpenAlex JSON
- OpenAlex ID (canonical identifier for this work in OpenAlex)
- https://openalex.org/W4415158961
- DOI (Digital Object Identifier)
- https://doi.org/10.48550/arxiv.2504.09936
- Title (work title)
- KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference
- Type (OpenAlex work type)
- preprint
- Language (primary language)
- en
- Publication year
- 2025
- Publication date
- 2025-04-14
- Authors (in order, as given in the raw author names of the record)
- Yuxuan Tian, Zihan Wang, Yebo Peng, Aomufei Yuan, Zhiming Wang, Bairen Yi, Xin Liu, Yong Cui, Tong Yang
- Landing page (publisher landing page)
- https://arxiv.org/abs/2504.09936
- PDF URL (direct link to full-text PDF)
- https://arxiv.org/pdf/2504.09936
- Open access (whether a free full text is available)
- Yes
- OA status (open access status per OpenAlex)
- green
- OA URL (direct OA link when available)
- https://arxiv.org/pdf/2504.09936
- Cited by (total citation count in OpenAlex)
- 0
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4415158961 |
| doi | https://doi.org/10.48550/arxiv.2504.09936 |
| ids.doi | https://doi.org/10.48550/arxiv.2504.09936 |
| ids.openalex | https://openalex.org/W4415158961 |
| fwci | |
| type | preprint |
| title | KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11181 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9969000220298767 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1705 |
| topics[0].subfield.display_name | Computer Networks and Communications |
| topics[0].display_name | Advanced Data Storage Technologies |
| topics[1].id | https://openalex.org/T11269 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9833999872207642 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Algorithms and Data Compression |
| topics[2].id | https://openalex.org/T10829 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9801999926567078 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1705 |
| topics[2].subfield.display_name | Computer Networks and Communications |
| topics[2].display_name | Interconnection Networks and Systems |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2504.09936 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2504.09936 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2504.09936 |
| locations[1].id | doi:10.48550/arxiv.2504.09936 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2504.09936 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5084754062 |
| authorships[0].author.orcid | https://orcid.org/0009-0003-5474-9156 |
| authorships[0].author.display_name | Ye Tian |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Tian, Yuxuan |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5065878957 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-9952-9913 |
| authorships[1].author.display_name | Zihan Wang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wang, Zihan |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5101678904 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-6455-2873 |
| authorships[2].author.display_name | Yajun Peng |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Peng, Yebo |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5005432115 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-8558-5604 |
| authorships[3].author.display_name | Ao Yuan |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Yuan, Aomufei |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5008727951 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-7821-8574 |
| authorships[4].author.display_name | Zhu Wang |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Wang, Zhiming |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5013585052 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Bairen Yi |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Yi, Bairen |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5100352248 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-5865-9970 |
| authorships[6].author.display_name | Xin Liu |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Liu, Xin |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5100731690 |
| authorships[7].author.orcid | https://orcid.org/0000-0003-0281-1440 |
| authorships[7].author.display_name | Yong Cui |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Cui, Yong |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5115597097 |
| authorships[8].author.orcid | |
| authorships[8].author.display_name | Tong Yang |
| authorships[8].author_position | last |
| authorships[8].raw_author_name | Yang, Tong |
| authorships[8].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2504.09936 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-14T00:00:00 |
| display_name | KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-12-03T00:04:00.142953 |
| primary_topic.id | https://openalex.org/T11181 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9969000220298767 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1705 |
| primary_topic.subfield.display_name | Computer Networks and Communications |
| primary_topic.display_name | Advanced Data Storage Technologies |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2504.09936 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2504.09936 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2504.09936 |
| primary_location.id | pmh:oai:arXiv.org:2504.09936 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2504.09936 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2504.09936 |
| publication_date | 2025-04-14 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 19, 82, 127 |
| abstract_inverted_index.2x | 164 |
| abstract_inverted_index.KV | 16, 29, 51, 85, 177 |
| abstract_inverted_index.To | 75 |
| abstract_inverted_index.an | 10 |
| abstract_inverted_index.be | 55 |
| abstract_inverted_index.by | 9, 49 |
| abstract_inverted_index.in | 64 |
| abstract_inverted_index.is | 7 |
| abstract_inverted_index.it | 124 |
| abstract_inverted_index.of | 2 |
| abstract_inverted_index.on | 143 |
| abstract_inverted_index.to | 34, 45, 90 |
| abstract_inverted_index.we | 79 |
| abstract_inverted_index.10% | 176 |
| abstract_inverted_index.LLM | 147 |
| abstract_inverted_index.and | 37, 68, 101, 118, 146, 168 |
| abstract_inverted_index.for | 105, 134 |
| abstract_inverted_index.the | 110 |
| abstract_inverted_index.(KV) | 13 |
| abstract_inverted_index.Zero | 129 |
| abstract_inverted_index.been | 43 |
| abstract_inverted_index.even | 173 |
| abstract_inverted_index.from | 138 |
| abstract_inverted_index.have | 42 |
| abstract_inverted_index.less | 27 |
| abstract_inverted_index.loss | 36, 136 |
| abstract_inverted_index.more | 47 |
| abstract_inverted_index.only | 175 |
| abstract_inverted_index.over | 163 |
| abstract_inverted_index.that | 53, 114, 150 |
| abstract_inverted_index.this | 77 |
| abstract_inverted_index.with | 174 |
| abstract_inverted_index.Votes | 112 |
| abstract_inverted_index.after | 69 |
| abstract_inverted_index.cache | 17, 30, 86, 139, 178 |
| abstract_inverted_index.error | 103 |
| abstract_inverted_index.evict | 26 |
| abstract_inverted_index.large | 3 |
| abstract_inverted_index.leads | 33 |
| abstract_inverted_index.novel | 83, 128 |
| abstract_inverted_index.pairs | 52 |
| abstract_inverted_index.these | 58 |
| abstract_inverted_index.under | 93 |
| abstract_inverted_index.usage | 155 |
| abstract_inverted_index.which | 32 |
| abstract_inverted_index.while | 156 |
| abstract_inverted_index.would | 54 |
| abstract_inverted_index.(LLMs) | 6 |
| abstract_inverted_index.KeepKV | 108, 151 |
| abstract_inverted_index.before | 67 |
| abstract_inverted_index.bounds | 104 |
| abstract_inverted_index.cache, | 14 |
| abstract_inverted_index.making | 15 |
| abstract_inverted_index.memory | 95, 154 |
| abstract_inverted_index.method | 88 |
| abstract_inverted_index.models | 5 |
| abstract_inverted_index.retain | 46 |
| abstract_inverted_index.strict | 94 |
| abstract_inverted_index.KeepKV, | 81 |
| abstract_inverted_index.Merging | 131 |
| abstract_inverted_index.adjusts | 120 |
| abstract_inverted_index.causing | 71 |
| abstract_inverted_index.context | 160 |
| abstract_inverted_index.further | 125 |
| abstract_inverted_index.history | 117 |
| abstract_inverted_index.merging | 50, 87, 116 |
| abstract_inverted_index.method, | 132 |
| abstract_inverted_index.methods | 24 |
| abstract_inverted_index.propose | 80 |
| abstract_inverted_index.quality | 172 |
| abstract_inverted_index.records | 115 |
| abstract_inverted_index.reduces | 153 |
| abstract_inverted_index.scores. | 122 |
| abstract_inverted_index.various | 144 |
| abstract_inverted_index.adaptive | 84 |
| abstract_inverted_index.budgets. | 179 |
| abstract_inverted_index.critical | 20 |
| abstract_inverted_index.degraded | 72 |
| abstract_inverted_index.designed | 89 |
| abstract_inverted_index.entries, | 31 |
| abstract_inverted_index.existing | 59 |
| abstract_inverted_index.explored | 44 |
| abstract_inverted_index.hindered | 8 |
| abstract_inverted_index.however, | 57 |
| abstract_inverted_index.language | 4 |
| abstract_inverted_index.lossless | 99 |
| abstract_inverted_index.merging, | 70 |
| abstract_inverted_index.merging. | 140 |
| abstract_inverted_index.overcome | 76 |
| abstract_inverted_index.preserve | 91 |
| abstract_inverted_index.quality. | 74 |
| abstract_inverted_index.research | 21 |
| abstract_inverted_index.superior | 170 |
| abstract_inverted_index.Efficient | 0 |
| abstract_inverted_index.Electoral | 111 |
| abstract_inverted_index.Extensive | 141 |
| abstract_inverted_index.Moreover, | 123 |
| abstract_inverted_index.Recently, | 39 |
| abstract_inverted_index.achieving | 97, 162 |
| abstract_inverted_index.attention | 65, 121, 135 |
| abstract_inverted_index.essential | 159 |
| abstract_inverted_index.important | 28 |
| abstract_inverted_index.inference | 1, 165 |
| abstract_inverted_index.introduce | 62 |
| abstract_inverted_index.key-value | 12 |
| abstract_inverted_index.leverages | 126 |
| abstract_inverted_index.mechanism | 113 |
| abstract_inverted_index.providing | 102 |
| abstract_inverted_index.resulting | 137 |
| abstract_inverted_index.retaining | 158 |
| abstract_inverted_index.adaptively | 119 |
| abstract_inverted_index.approaches | 60 |
| abstract_inverted_index.benchmarks | 145 |
| abstract_inverted_index.challenge, | 78 |
| abstract_inverted_index.direction. | 22 |
| abstract_inverted_index.discarded; | 56 |
| abstract_inverted_index.generation | 73, 171 |
| abstract_inverted_index.inevitably | 61 |
| abstract_inverted_index.introduces | 109 |
| abstract_inverted_index.multi-step | 106 |
| abstract_inverted_index.strategies | 41 |
| abstract_inverted_index.throughput | 166 |
| abstract_inverted_index.Traditional | 23 |
| abstract_inverted_index.compression | 18, 100 |
| abstract_inverted_index.demonstrate | 149 |
| abstract_inverted_index.experiments | 142 |
| abstract_inverted_index.improvement | 167 |
| abstract_inverted_index.information | 35, 48 |
| abstract_inverted_index.maintaining | 169 |
| abstract_inverted_index.performance | 92 |
| abstract_inverted_index.selectively | 25 |
| abstract_inverted_index.single-step | 98 |
| abstract_inverted_index.compensating | 133 |
| abstract_inverted_index.compression. | 107 |
| abstract_inverted_index.constraints, | 96 |
| abstract_inverted_index.ever-growing | 11 |
| abstract_inverted_index.information, | 161 |
| abstract_inverted_index.successfully | 157 |
| abstract_inverted_index.architectures | 148 |
| abstract_inverted_index.distributions | 66 |
| abstract_inverted_index.merging-based | 40 |
| abstract_inverted_index.substantially | 152 |
| abstract_inverted_index.hallucinations. | 38 |
| abstract_inverted_index.inconsistencies | 63 |
| abstract_inverted_index.Inference-Perturbation | 130 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 9 |
| citation_normalized_percentile |
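
The `abstract_inverted_index.<word>` rows in the payload store the abstract as a map from each token to the zero-based positions where it occurs. A minimal sketch of turning such a map back into running text follows. The function name `rebuild_abstract` is invented here, and the sample dictionary is a hand-copied excerpt limited to positions 0 through 9 (the later positions listed for `inference` and `by` are left out).

```python
def rebuild_abstract(inverted_index):
    """Turn a {token: [positions]} map back into a space-separated string."""
    positioned = [
        (pos, token)
        for token, positions in inverted_index.items()
        for pos in positions
    ]
    # Sorting by position restores the original token order.
    return " ".join(token for _, token in sorted(positioned))


# Excerpt of the payload rows, restricted to the first ten positions.
excerpt = {
    "Efficient": [0], "inference": [1], "of": [2], "large": [3],
    "language": [4], "models": [5], "(LLMs)": [6], "is": [7],
    "hindered": [8], "by": [9],
}

opening = rebuild_abstract(excerpt)
print(opening)
```

Applied to the full set of rows, the same function should reproduce the abstract shown at the top of the record, since punctuation is stored as part of each token.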