Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2506.05977
Federated fine-tuning (FedFT) of large language models (LLMs) has emerged as a promising solution for adapting models to distributed data environments while ensuring data privacy. Existing FedFT methods predominantly rely on parameter-efficient fine-tuning (PEFT) techniques to reduce communication and computation overhead, but they often fail to adequately address catastrophic forgetting, a critical challenge arising from continual adaptation in distributed environments. Traditional centralized fine-tuning methods, which are not designed for the heterogeneous and privacy-constrained nature of federated environments, struggle to mitigate this issue effectively. The challenge is further exacerbated by significant variation in data distributions and device capabilities across clients, which intensifies forgetting and degrades model generalization. To tackle these issues, we propose FedBE, a novel FedFT framework that integrates an adaptive transformer block expansion mechanism with a dynamic trainable-block allocation strategy. Specifically, FedBE expands trainable blocks within the model architecture, structurally separating newly learned task-specific knowledge from the original pre-trained representations. Additionally, FedBE dynamically assigns these trainable blocks to clients based on their data distributions and computational capabilities, enabling the framework to better accommodate heterogeneous federated environments and enhancing the generalization ability of the model. Extensive experiments show that, compared with existing federated fine-tuning methods, FedBE achieves 12-74% higher accuracy retention on general tasks after fine-tuning and a 1.9-3.1x model convergence acceleration, without degrading the accuracy of downstream tasks.
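The abstract describes two mechanisms: inserting new trainable blocks while freezing the pre-trained ones, and assigning those blocks to clients according to capability. The paper's implementation is not shown here; the following is a minimal pure-Python sketch of the two ideas, where `Block`, `expand_blocks`, `allocate_blocks`, and the prefix-based allocation rule are illustrative assumptions, not FedBE's actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class Block:
    """One transformer block; frozen blocks preserve pre-trained knowledge."""
    name: str
    trainable: bool = False

def expand_blocks(blocks, insert_after):
    """Insert a new trainable block after each index in `insert_after`,
    freezing all original blocks. In a real model the new blocks would be
    identity-initialized so the expanded network starts out computing the
    same function as the original."""
    expanded = []
    for i, blk in enumerate(blocks):
        blk.trainable = False                 # keep pre-trained representations intact
        expanded.append(blk)
        if i in insert_after:
            expanded.append(Block(name=f"{blk.name}_new", trainable=True))
    return expanded

def allocate_blocks(expanded, client_capacities):
    """Give each client a prefix of the new trainable blocks sized to its
    compute budget (a stand-in for capability-aware allocation)."""
    new_blocks = [b.name for b in expanded if b.trainable]
    return {cid: new_blocks[:cap] for cid, cap in client_capacities.items()}

base = [Block(f"layer{i}") for i in range(4)]
expanded = expand_blocks(base, insert_after={1, 3})
plan = allocate_blocks(expanded, {"client_a": 1, "client_b": 2})
# plan → {'client_a': ['layer1_new'], 'client_b': ['layer1_new', 'layer3_new']}
```

Structurally separating new parameters this way is what lets the frozen blocks retain general-task behavior while only the inserted blocks absorb the downstream task.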
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4417097558 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2506.05977 (Digital Object Identifier)
- Title: Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025
- Publication date: 2025-06-06
- Authors: Hongli Xu, Shilong Wang, Liusheng Huang (as listed)
- Landing page: https://arxiv.org/abs/2506.05977
- PDF URL: https://arxiv.org/pdf/2506.05977 (direct link to full-text PDF)
- Open access: Yes
- OA status: green (per OpenAlex)
- OA URL: https://arxiv.org/pdf/2506.05977
- Cited by: 0 (total citation count in OpenAlex)
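The record above comes from the OpenAlex works endpoint. Below is a stdlib-only sketch of retrieving it; `work_url` and `fetch_work` are hypothetical helper names, though the `https://api.openalex.org/works/{id}` endpoint and the optional `mailto` parameter (which routes requests into OpenAlex's polite pool) are part of the public API.

```python
import json
from urllib.request import urlopen

OPENALEX_API = "https://api.openalex.org/works"

def work_url(openalex_id: str, mailto: str = "") -> str:
    """Build the API URL for a work.

    Accepts either a bare ID ("W4417097558") or the canonical
    https://openalex.org/W... form; `mailto` is optional.
    """
    short_id = openalex_id.rsplit("/", 1)[-1]
    url = f"{OPENALEX_API}/{short_id}"
    if mailto:
        url += f"?mailto={mailto}"
    return url

def fetch_work(openalex_id: str) -> dict:
    """Fetch the raw JSON payload for a work (network required)."""
    with urlopen(work_url(openalex_id)) as resp:
        return json.load(resp)

url = work_url("https://openalex.org/W4417097558")
# url → "https://api.openalex.org/works/W4417097558"
```

Calling `fetch_work("W4417097558")` would return the same fields flattened into the "Full payload" table below.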
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4417097558 |
| doi | https://doi.org/10.48550/arxiv.2506.05977 |
| ids.doi | https://doi.org/10.48550/arxiv.2506.05977 |
| ids.openalex | https://openalex.org/W4417097558 |
| fwci | |
| type | preprint |
| title | Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2506.05977 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2506.05977 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2506.05977 |
| locations[1].id | doi:10.48550/arxiv.2506.05977 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2506.05977 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5063184427 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-3831-4577 |
| authorships[0].author.display_name | Hongli Xu |
| authorships[0].author_position | last |
| authorships[0].raw_author_name | Xu, Hongli |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5100633954 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-6086-0481 |
| authorships[1].author.display_name | Shilong Wang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wang, Shilong |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5019604942 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-8417-3256 |
| authorships[2].author.display_name | Liusheng Huang |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Huang, Liusheng |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2506.05977 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-12-07T21:48:28.726393 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2506.05977 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2506.05977 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2506.05977 |
| primary_location.id | pmh:oai:arXiv.org:2506.05977 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2506.05977 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2506.05977 |
| publication_date | 2025-06-06 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index | (word-to-positions map; inverts to the abstract shown above) |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile | |
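OpenAlex ships abstracts as an inverted index (each word mapped to the positions where it occurs) rather than as plain text; the `abstract_inverted_index` field in the payload is that structure. A short sketch of inverting it back into readable text, with `reconstruct_abstract` as an illustrative helper name:

```python
def reconstruct_abstract(inverted_index: dict) -> str:
    """Rebuild plain abstract text from an OpenAlex abstract_inverted_index,
    which maps each word to the list of positions where it occurs."""
    positions = []
    for word, idxs in inverted_index.items():
        for i in idxs:
            positions.append((i, word))
    # Sorting by position restores the original word order.
    return " ".join(word for _, word in sorted(positions))

sample = {"fine-tuning": [1], "Federated": [0], "(FedFT)": [2], "of": [3]}
text = reconstruct_abstract(sample)
# text → "Federated fine-tuning (FedFT) of"
```

Applied to the full `abstract_inverted_index` of this record, the function reproduces the abstract quoted at the top of the page.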