One Size Does Not Fit All: A Distribution-Aware Sparsification for More Precise Model Merging
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2508.06163
Model merging has emerged as a compelling data-free paradigm for multi-task learning, enabling the fusion of multiple fine-tuned models into a single, powerful entity. A key technique in merging methods is sparsification, which prunes redundant parameters from task vectors to mitigate interference. However, prevailing approaches employ a "one-size-fits-all" strategy, applying a uniform sparsity ratio that overlooks the inherent structural and statistical heterogeneity of model parameters. This often leads to a suboptimal trade-off, where critical parameters are inadvertently pruned while less useful ones are retained. To address this limitation, we introduce TADrop (Tensor-wise Adaptive Drop), an adaptive sparsification strategy that respects this heterogeneity. Instead of a global ratio, TADrop assigns a tailored sparsity level to each parameter tensor based on its distributional properties. The core intuition is that tensors with denser, more redundant distributions can be pruned aggressively, while sparser, more critical ones are preserved. As a simple and plug-and-play module, we validate TADrop by integrating it with foundational, classic, and SOTA merging methods. Extensive experiments across diverse tasks (vision, language, and multimodal) and models (ViT, BEiT) demonstrate that TADrop consistently and significantly boosts their performance. For instance, when enhancing a leading merging method, it achieves an average performance gain of 2.0% across 8 ViT-B/32 tasks. TADrop provides a more effective way to mitigate parameter interference by tailoring sparsification to the model's structure, offering a new baseline for high-performance model merging.
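The abstract does not say which distributional statistic TADrop uses or how it maps that statistic to a sparsity level, so the following is only a minimal sketch of the general idea of tensor-wise adaptive pruning, not the authors' method. It assumes a task vector stored as a dict of flat float lists, uses the Hoyer sparsity measure as a stand-in for "distributional properties", and maps density linearly to a drop ratio. The names `hoyer_sparsity`, `adaptive_drop_ratio`, `magnitude_prune`, `tensorwise_adaptive_drop`, and the bounds `min_ratio` and `max_ratio` are all hypothetical.

```python
import math


def hoyer_sparsity(values):
    """Hoyer measure in [0, 1]: 0 for a flat (dense) vector, 1 for a one-hot vector."""
    n = len(values)
    l1 = sum(abs(v) for v in values)
    l2 = math.sqrt(sum(v * v for v in values))
    if n < 2 or l2 == 0.0:
        return 0.0
    score = (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1.0)
    return min(1.0, max(0.0, score))  # clamp away floating-point drift


def adaptive_drop_ratio(values, min_ratio=0.5, max_ratio=0.95):
    """Assumed rule: the denser the tensor, the larger the fraction dropped."""
    density = 1.0 - hoyer_sparsity(values)
    return min_ratio + (max_ratio - min_ratio) * density


def magnitude_prune(values, drop_ratio):
    """Zero the smallest-magnitude entries, always keeping at least one entry."""
    n = len(values)
    keep = max(1, n - int(drop_ratio * n))
    order = sorted(range(n), key=lambda i: abs(values[i]), reverse=True)
    kept = set(order[:keep])
    return [v if i in kept else 0.0 for i, v in enumerate(values)]


def tensorwise_adaptive_drop(task_vector):
    """Prune each tensor of a task vector with its own drop ratio."""
    return {
        name: magnitude_prune(values, adaptive_drop_ratio(values))
        for name, values in task_vector.items()
    }


if __name__ == "__main__":
    # Toy task vector: one flat (dense) tensor and one concentrated (sparse) tensor.
    toy = {"dense": [1.0] * 8, "sparse": [5.0] + [0.0] * 7}
    for name, values in toy.items():
        print(name, round(adaptive_drop_ratio(values), 3))
    print(tensorwise_adaptive_drop(toy))
```

In this sketch a uniform method would apply one ratio to both toy tensors, whereas the adaptive rule gives the flat tensor a ratio near `max_ratio` and the concentrated tensor a ratio of `min_ratio`. The pruned task vectors would then be passed to whichever merging method is in use.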
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2508.06163, https://arxiv.org/pdf/2508.06163
- OA Status: green
- OpenAlex ID: https://openalex.org/W4415193068
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415193068 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2508.06163 (Digital Object Identifier)
- Title: One Size Does Not Fit All: A Distribution-Aware Sparsification for More Precise Model Merging (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-08-08 (full publication date if available)
- Authors: Yingfeng Luo, Dingyang Lin, Junxin Wang, Ziqiang Xu, Kaiyan Chang, Tong Zheng, Bei Li, Anxiang Ma, Tong Xiao, Zhengtao Yu, Jingbo Zhu (list of authors in order, names as given in the raw author records)
- Landing page: https://arxiv.org/abs/2508.06163 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2508.06163 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2508.06163 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4415193068 |
| doi | https://doi.org/10.48550/arxiv.2508.06163 |
| ids.doi | https://doi.org/10.48550/arxiv.2508.06163 |
| ids.openalex | https://openalex.org/W4415193068 |
| fwci | |
| type | preprint |
| title | One Size Does Not Fit All: A Distribution-Aware Sparsification for More Precise Model Merging |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11106 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9937999844551086 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1711 |
| topics[0].subfield.display_name | Signal Processing |
| topics[0].display_name | Data Management and Algorithms |
| topics[1].id | https://openalex.org/T10215 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9861999750137329 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Semantic Web and Ontologies |
| topics[2].id | https://openalex.org/T10317 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9854999780654907 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1705 |
| topics[2].subfield.display_name | Computer Networks and Communications |
| topics[2].display_name | Advanced Database Systems and Queries |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2508.06163 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2508.06163 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2508.06163 |
| locations[1].id | doi:10.48550/arxiv.2508.06163 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2508.06163 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5057019417 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-1950-9045 |
| authorships[0].author.display_name | Yingfeng Luo |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Luo, Yingfeng |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5025449982 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-2943-9343 |
| authorships[1].author.display_name | D. X. Lin |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Lin, Dingyang |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5031896048 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Junxin Wang |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Wang, Junxin |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5003304226 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-8258-5342 |
| authorships[3].author.display_name | Ziqiang Xu |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Xu, Ziqiang |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5047680764 |
| authorships[4].author.orcid | https://orcid.org/0000-0003-4412-8308 |
| authorships[4].author.display_name | Kangkang Chang |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Chang, Kaiyan |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5100955913 |
| authorships[5].author.orcid | https://orcid.org/0009-0005-1488-2846 |
| authorships[5].author.display_name | Tong Zheng |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Zheng, Tong |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5107028150 |
| authorships[6].author.orcid | https://orcid.org/0000-0001-7617-9041 |
| authorships[6].author.display_name | Bei Li |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Li, Bei |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5112865344 |
| authorships[7].author.orcid | |
| authorships[7].author.display_name | A. Ma |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Ma, Anxiang |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5100600701 |
| authorships[8].author.orcid | https://orcid.org/0000-0002-5842-6501 |
| authorships[8].author.display_name | Tong Xiao |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Xiao, Tong |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5100619287 |
| authorships[9].author.orcid | https://orcid.org/0000-0002-4012-461X |
| authorships[9].author.display_name | Zhengtao Yu |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Yu, Zhengtao |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5100370145 |
| authorships[10].author.orcid | |
| authorships[10].author.display_name | Jingbo Zhu |
| authorships[10].author_position | last |
| authorships[10].raw_author_name | Zhu, Jingbo |
| authorships[10].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2508.06163 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-15T00:00:00 |
| display_name | One Size Does Not Fit All: A Distribution-Aware Sparsification for More Precise Model Merging |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11106 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9937999844551086 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1711 |
| primary_topic.subfield.display_name | Signal Processing |
| primary_topic.display_name | Data Management and Algorithms |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2508.06163 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2508.06163 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2508.06163 |
| primary_location.id | pmh:oai:arXiv.org:2508.06163 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2508.06163 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2508.06163 |
| publication_date | 2025-08-08 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.8 | 202 |
| abstract_inverted_index.A | 24 |
| abstract_inverted_index.a | 5, 20, 46, 50, 69, 104, 109, 145, 189, 207, 223 |
| abstract_inverted_index.As | 144 |
| abstract_inverted_index.To | 84 |
| abstract_inverted_index.an | 94, 195 |
| abstract_inverted_index.as | 4 |
| abstract_inverted_index.be | 134 |
| abstract_inverted_index.by | 153, 215 |
| abstract_inverted_index.in | 27 |
| abstract_inverted_index.is | 30, 125 |
| abstract_inverted_index.it | 155, 193 |
| abstract_inverted_index.of | 15, 62, 103, 199 |
| abstract_inverted_index.on | 118 |
| abstract_inverted_index.to | 39, 68, 113, 211, 218 |
| abstract_inverted_index.we | 88, 150 |
| abstract_inverted_index.For | 185 |
| abstract_inverted_index.The | 122 |
| abstract_inverted_index.and | 59, 147, 159, 170, 172, 180 |
| abstract_inverted_index.are | 75, 82, 142 |
| abstract_inverted_index.can | 133 |
| abstract_inverted_index.for | 9, 226 |
| abstract_inverted_index.has | 2 |
| abstract_inverted_index.its | 119 |
| abstract_inverted_index.key | 25 |
| abstract_inverted_index.new | 224 |
| abstract_inverted_index.the | 13, 56, 219 |
| abstract_inverted_index.way | 210 |
| abstract_inverted_index.SOTA | 160 |
| abstract_inverted_index.This | 65 |
| abstract_inverted_index.core | 123 |
| abstract_inverted_index.each | 114 |
| abstract_inverted_index.from | 36 |
| abstract_inverted_index.gain | 198 |
| abstract_inverted_index.into | 19 |
| abstract_inverted_index.less | 79 |
| abstract_inverted_index.more | 130, 139, 208 |
| abstract_inverted_index.ones | 81, 141 |
| abstract_inverted_index.task | 37 |
| abstract_inverted_index.that | 54, 98, 126, 177 |
| abstract_inverted_index.this | 86, 100 |
| abstract_inverted_index.when | 187 |
| abstract_inverted_index.with | 128, 156 |
| abstract_inverted_index.(ViT, | 174 |
| abstract_inverted_index.2.0\% | 200 |
| abstract_inverted_index.BEiT) | 175 |
| abstract_inverted_index.Model | 0 |
| abstract_inverted_index.based | 117 |
| abstract_inverted_index.leads | 67 |
| abstract_inverted_index.level | 112 |
| abstract_inverted_index.model | 63, 228 |
| abstract_inverted_index.often | 66 |
| abstract_inverted_index.ratio | 53 |
| abstract_inverted_index.tasks | 167 |
| abstract_inverted_index.their | 183 |
| abstract_inverted_index.where | 72 |
| abstract_inverted_index.which | 32 |
| abstract_inverted_index.while | 78, 137 |
| abstract_inverted_index.TADrop | 107, 152, 178, 205 |
| abstract_inverted_index.across | 165, 201 |
| abstract_inverted_index.boosts | 182 |
| abstract_inverted_index.employ | 45 |
| abstract_inverted_index.fusion | 14 |
| abstract_inverted_index.global | 105 |
| abstract_inverted_index.models | 18, 173 |
| abstract_inverted_index.pruned | 77, 135 |
| abstract_inverted_index.prunes | 33 |
| abstract_inverted_index.ratio, | 106 |
| abstract_inverted_index.simple | 146 |
| abstract_inverted_index.tasks. | 204 |
| abstract_inverted_index.tensor | 116 |
| abstract_inverted_index.useful | 80 |
| abstract_inverted_index.Instead | 102 |
| abstract_inverted_index.address | 85 |
| abstract_inverted_index.assigns | 108 |
| abstract_inverted_index.average | 196 |
| abstract_inverted_index.denser, | 129 |
| abstract_inverted_index.diverse | 166 |
| abstract_inverted_index.emerged | 3 |
| abstract_inverted_index.entity. | 23 |
| abstract_inverted_index.leading | 190 |
| abstract_inverted_index.merging | 1, 28, 161, 191 |
| abstract_inverted_index.method, | 192 |
| abstract_inverted_index.methods | 29 |
| abstract_inverted_index.model's | 220 |
| abstract_inverted_index.module, | 149 |
| abstract_inverted_index.single, | 21 |
| abstract_inverted_index.tensors | 127 |
| abstract_inverted_index.uniform | 51 |
| abstract_inverted_index.vectors | 38 |
| abstract_inverted_index.(vision, | 168 |
| abstract_inverted_index.However, | 42 |
| abstract_inverted_index.ViT-B/32 | 203 |
| abstract_inverted_index.achieves | 194 |
| abstract_inverted_index.adaptive | 95 |
| abstract_inverted_index.applying | 49 |
| abstract_inverted_index.baseline | 225 |
| abstract_inverted_index.classic, | 158 |
| abstract_inverted_index.critical | 73, 140 |
| abstract_inverted_index.enabling | 12 |
| abstract_inverted_index.inherent | 57 |
| abstract_inverted_index.merging. | 229 |
| abstract_inverted_index.methods. | 162 |
| abstract_inverted_index.mitigate | 40, 212 |
| abstract_inverted_index.multiple | 16 |
| abstract_inverted_index.offering | 222 |
| abstract_inverted_index.paradigm | 8 |
| abstract_inverted_index.powerful | 22 |
| abstract_inverted_index.provides | 206 |
| abstract_inverted_index.respects | 99 |
| abstract_inverted_index.sparser, | 138 |
| abstract_inverted_index.sparsity | 52, 111 |
| abstract_inverted_index.strategy | 97 |
| abstract_inverted_index.tailored | 110 |
| abstract_inverted_index.validate | 151 |
| abstract_inverted_index.Extensive | 163 |
| abstract_inverted_index.data-free | 7 |
| abstract_inverted_index.effective | 209 |
| abstract_inverted_index.enhancing | 188 |
| abstract_inverted_index.instance, | 186 |
| abstract_inverted_index.introduce | 89 |
| abstract_inverted_index.intuition | 124 |
| abstract_inverted_index.language, | 169 |
| abstract_inverted_index.learning, | 11 |
| abstract_inverted_index.overlooks | 55 |
| abstract_inverted_index.parameter | 115, 213 |
| abstract_inverted_index.redundant | 34, 131 |
| abstract_inverted_index.retained. | 83 |
| abstract_inverted_index.strategy, | 48 |
| abstract_inverted_index.tailoring | 216 |
| abstract_inverted_index.technique | 26 |
| abstract_inverted_index.approaches | 44 |
| abstract_inverted_index.compelling | 6 |
| abstract_inverted_index.fine-tuned | 17 |
| abstract_inverted_index.multi-task | 10 |
| abstract_inverted_index.parameters | 35, 74 |
| abstract_inverted_index.preserved. | 143 |
| abstract_inverted_index.prevailing | 43 |
| abstract_inverted_index.structural | 58 |
| abstract_inverted_index.structure, | 221 |
| abstract_inverted_index.suboptimal | 70 |
| abstract_inverted_index.trade-off, | 71 |
| abstract_inverted_index.demonstrate | 176 |
| abstract_inverted_index.experiments | 164 |
| abstract_inverted_index.integrating | 154 |
| abstract_inverted_index.limitation, | 87 |
| abstract_inverted_index.multimodal) | 171 |
| abstract_inverted_index.parameters. | 64 |
| abstract_inverted_index.performance | 197 |
| abstract_inverted_index.properties. | 121 |
| abstract_inverted_index.statistical | 60 |
| abstract_inverted_index.consistently | 179 |
| abstract_inverted_index.interference | 214 |
| abstract_inverted_index.performance. | 184 |
| abstract_inverted_index.aggressively, | 136 |
| abstract_inverted_index.distributions | 132 |
| abstract_inverted_index.foundational, | 157 |
| abstract_inverted_index.heterogeneity | 61 |
| abstract_inverted_index.inadvertently | 76 |
| abstract_inverted_index.interference. | 41 |
| abstract_inverted_index.plug-and-play | 148 |
| abstract_inverted_index.significantly | 181 |
| abstract_inverted_index.distributional | 120 |
| abstract_inverted_index.heterogeneity. | 101 |
| abstract_inverted_index.sparsification | 96, 217 |
| abstract_inverted_index.\textbf{Drop}), | 93 |
| abstract_inverted_index.\textbf{TADrop} | 90 |
| abstract_inverted_index.sparsification, | 31 |
| abstract_inverted_index.high-performance | 227 |
| abstract_inverted_index.\textbf{A}daptive | 92 |
| abstract_inverted_index.(\textbf{T}ensor-wise | 91 |
| abstract_inverted_index.``one-size-fits-all'' | 47 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 11 |
| citation_normalized_percentile |
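
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each token to the word positions where it occurs. As a minimal sketch of how that structure can be turned back into running text, the hypothetical helper `rebuild_abstract` below places every token at its positions and joins them in order. The `excerpt` dict is a hand-copied subset covering only positions 0 to 6 of the payload, not the full index.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {token: [positions]} inverted index."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Join tokens in ascending position order; missing positions are skipped.
    return " ".join(by_position[pos] for pos in sorted(by_position))


if __name__ == "__main__":
    # Subset of the payload rows (positions 0-6 only).
    excerpt = {
        "Model": [0],
        "merging": [1],
        "has": [2],
        "emerged": [3],
        "as": [4],
        "a": [5],
        "compelling": [6],
    }
    print(rebuild_abstract(excerpt))  # Model merging has emerged as a compelling
```

Because tokens keep their original punctuation and LaTeX markup (for example `2.0\%` and `\textbf{TADrop}`), the rebuilt text matches the raw abstract rather than a cleaned version of it.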