Towards Efficient Low-Order Hybrid Optimizer for Language Model Fine-Tuning
2025 · Open Access · DOI: https://doi.org/10.1609/aaai.v39i22.34530
As language models grow in size, fine-tuning them becomes more challenging: fine-tuning with first-order optimizers (e.g., SGD and Adam) consumes a large amount of memory, while fine-tuning with a memory-efficient zeroth-order optimizer (MeZO) suffers a significant accuracy drop and a slower convergence rate. In this work, we propose a Low-order Hybrid Optimizer (LoHO), which merges zeroth-order (ZO) and first-order (FO) optimizers for fine-tuning. LoHO combines inter-layer hybrid optimization and intra-layer hybrid optimization, which together boost the accuracy of MeZO while keeping memory usage within a budget. The inter-layer hybrid optimization uses the FO optimizer in deep layers and the ZO optimizer in shallow ones, thereby avoiding unnecessary gradient propagation and improving memory efficiency. The intra-layer hybrid optimization updates a proportion of the parameters in a layer with the ZO optimizer and the rest with the FO optimizer, taking advantage of gradient sparsity for a high-efficiency implementation. Our experimental results across common datasets on different pre-trained backbones (i.e., RoBERTa-large, OPT-13B and OPT-30B) demonstrate that LoHO can significantly improve the predictive accuracy and convergence rate of MeZO while controlling the memory footprint during fine-tuning. Moreover, LoHO can achieve performance comparable to first-order fine-tuning using substantially less memory.
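The inter-layer idea in the abstract (a ZO update for the shallow layer, an FO update for the deep layer) can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: the two-layer scalar model, the function names (`forward`, `zo_projected_grad`, `hybrid_step`, `train`), the two-point SPSA-style estimator, the learning rates, and the per-step seed are all assumptions chosen for illustration. The intra-layer variant is not shown.

```python
import random


def forward(w_shallow, w_deep, x):
    """Toy model: an elementwise shallow layer followed by a linear deep layer."""
    h = [w * xi for w, xi in zip(w_shallow, x)]      # shallow-layer output
    y = sum(w * hi for w, hi in zip(w_deep, h))      # deep-layer output (scalar)
    return h, y


def loss_fn(w_shallow, w_deep, x, target):
    """Squared error of the toy model on a single example."""
    _, y = forward(w_shallow, w_deep, x)
    return (y - target) ** 2


def zo_projected_grad(w_shallow, w_deep, x, target, seed, eps=1e-3):
    """Two-point zeroth-order estimate for the shallow layer.

    Returns the finite-difference slope of the loss along a random
    direction z, plus z itself. Only forward passes are used. Here z is
    returned for clarity; it could instead be regenerated from `seed`
    so that it never has to be stored.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in w_shallow]
    plus = [w + eps * zi for w, zi in zip(w_shallow, z)]
    minus = [w - eps * zi for w, zi in zip(w_shallow, z)]
    slope = (loss_fn(plus, w_deep, x, target)
             - loss_fn(minus, w_deep, x, target)) / (2.0 * eps)
    return slope, z


def hybrid_step(w_shallow, w_deep, x, target, seed, lr_fo=0.05, lr_zo=0.01):
    """One hybrid update: FO on the deep layer, ZO on the shallow layer."""
    h, y = forward(w_shallow, w_deep, x)
    # First-order gradient for the deep layer only. It needs the deep
    # layer's input h, but nothing is propagated back into the shallow layer.
    grad_deep = [2.0 * (y - target) * hi for hi in h]
    # Zeroth-order estimate for the shallow layer (forward passes only).
    slope, z = zo_projected_grad(w_shallow, w_deep, x, target, seed)
    new_shallow = [w - lr_zo * slope * zi for w, zi in zip(w_shallow, z)]
    new_deep = [w - lr_fo * g for w, g in zip(w_deep, grad_deep)]
    return new_shallow, new_deep


def train(steps=200):
    """Run the hybrid update on one fixed example; return (initial, final) loss."""
    x, target = [1.0, -2.0, 0.5], 1.0
    w_shallow, w_deep = [0.5, 0.5, 0.5], [0.1, 0.1, 0.1]
    initial = loss_fn(w_shallow, w_deep, x, target)
    for step in range(steps):
        w_shallow, w_deep = hybrid_step(w_shallow, w_deep, x, target, seed=step)
    return initial, loss_fn(w_shallow, w_deep, x, target)


if __name__ == "__main__":
    initial_loss, final_loss = train()
    print("initial loss:", initial_loss)
    print("final loss:", final_loss)
```

In this sketch the only gradient computed analytically belongs to the deep layer, so no backward pass reaches the shallow layer; the shallow layer costs two extra forward passes per step instead. How LoHO actually schedules these passes and chooses the layer split is described in the paper, not here.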
- Type: article
- Language: en
- Landing Page: https://doi.org/10.1609/aaai.v39i22.34530 (PDF: https://ojs.aaai.org/index.php/AAAI/article/download/34530/36685)
- OA Status: diamond
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4409362829
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4409362829 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.1609/aaai.v39i22.34530 (Digital Object Identifier)
- Title: Towards Efficient Low-Order Hybrid Optimizer for Language Model Fine-Tuning (work title)
- Type: article (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025
- Publication date: 2025-04-11
- Authors: Minping Chen, Y.S. Huang, Zeyi Wen (in author order)
- Landing page: https://doi.org/10.1609/aaai.v39i22.34530 (publisher landing page)
- PDF URL: https://ojs.aaai.org/index.php/AAAI/article/download/34530/36685 (direct link to full-text PDF)
- Open access: Yes (a free full text is available)
- OA status: diamond (open access status per OpenAlex)
- OA URL: https://ojs.aaai.org/index.php/AAAI/article/download/34530/36685 (direct OA link)
- Concepts: Order (exchange), Computer science, Fine-tuning, Physics, Economics, Quantum mechanics, Finance (top concepts attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4409362829 |
| doi | https://doi.org/10.1609/aaai.v39i22.34530 |
| ids.doi | https://doi.org/10.1609/aaai.v39i22.34530 |
| ids.openalex | https://openalex.org/W4409362829 |
| fwci | 0.0 |
| type | article |
| title | Towards Efficient Low-Order Hybrid Optimizer for Language Model Fine-Tuning |
| biblio.issue | 22 |
| biblio.volume | 39 |
| biblio.last_page | 23613 |
| biblio.first_page | 23605 |
| topics[0].id | https://openalex.org/T10201 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9013000130653381 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Speech Recognition and Synthesis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C182306322 |
| concepts[0].level | 2 |
| concepts[0].score | 0.6009076237678528 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q1779371 |
| concepts[0].display_name | Order (exchange) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.5625813603401184 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C157524613 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5361974239349365 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q2828883 |
| concepts[2].display_name | Fine-tuning |
| concepts[3].id | https://openalex.org/C121332964 |
| concepts[3].level | 0 |
| concepts[3].score | 0.09508207440376282 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[3].display_name | Physics |
| concepts[4].id | https://openalex.org/C162324750 |
| concepts[4].level | 0 |
| concepts[4].score | 0.07538777589797974 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q8134 |
| concepts[4].display_name | Economics |
| concepts[5].id | https://openalex.org/C62520636 |
| concepts[5].level | 1 |
| concepts[5].score | 0.0 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q944 |
| concepts[5].display_name | Quantum mechanics |
| concepts[6].id | https://openalex.org/C10138342 |
| concepts[6].level | 1 |
| concepts[6].score | 0.0 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q43015 |
| concepts[6].display_name | Finance |
| keywords[0].id | https://openalex.org/keywords/order |
| keywords[0].score | 0.6009076237678528 |
| keywords[0].display_name | Order (exchange) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.5625813603401184 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/fine-tuning |
| keywords[2].score | 0.5361974239349365 |
| keywords[2].display_name | Fine-tuning |
| keywords[3].id | https://openalex.org/keywords/physics |
| keywords[3].score | 0.09508207440376282 |
| keywords[3].display_name | Physics |
| keywords[4].id | https://openalex.org/keywords/economics |
| keywords[4].score | 0.07538777589797974 |
| keywords[4].display_name | Economics |
| language | en |
| locations[0].id | doi:10.1609/aaai.v39i22.34530 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4210191458 |
| locations[0].source.issn | 2159-5399, 2374-3468 |
| locations[0].source.type | conference |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | 2159-5399 |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| locations[0].source.host_organization | https://openalex.org/P4310320058 |
| locations[0].source.host_organization_name | Association for the Advancement of Artificial Intelligence |
| locations[0].source.host_organization_lineage | https://openalex.org/P4310320058 |
| locations[0].source.host_organization_lineage_names | Association for the Advancement of Artificial Intelligence |
| locations[0].license | |
| locations[0].pdf_url | https://ojs.aaai.org/index.php/AAAI/article/download/34530/36685 |
| locations[0].version | publishedVersion |
| locations[0].raw_type | journal-article |
| locations[0].license_id | |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| locations[0].landing_page_url | https://doi.org/10.1609/aaai.v39i22.34530 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5062234564 |
| authorships[0].author.orcid | https://orcid.org/0009-0007-1125-8049 |
| authorships[0].author.display_name | Minping Chen |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Minping Chen |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5101484853 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-9739-0727 |
| authorships[1].author.display_name | Y.S. Huang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | You-Liang Huang |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5013127195 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-3370-6053 |
| authorships[2].author.display_name | Zeyi Wen |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Zeyi Wen |
| authorships[2].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://ojs.aaai.org/index.php/AAAI/article/download/34530/36685 |
| open_access.oa_status | diamond |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Towards Efficient Low-Order Hybrid Optimizer for Language Model Fine-Tuning |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T10201 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9013000130653381 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Speech Recognition and Synthesis |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W4391913857, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052 |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | doi:10.1609/aaai.v39i22.34530 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4210191458 |
| best_oa_location.source.issn | 2159-5399, 2374-3468 |
| best_oa_location.source.type | conference |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | 2159-5399 |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| best_oa_location.source.host_organization | https://openalex.org/P4310320058 |
| best_oa_location.source.host_organization_name | Association for the Advancement of Artificial Intelligence |
| best_oa_location.source.host_organization_lineage | https://openalex.org/P4310320058 |
| best_oa_location.source.host_organization_lineage_names | Association for the Advancement of Artificial Intelligence |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://ojs.aaai.org/index.php/AAAI/article/download/34530/36685 |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | journal-article |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| best_oa_location.landing_page_url | https://doi.org/10.1609/aaai.v39i22.34530 |
| primary_location.id | doi:10.1609/aaai.v39i22.34530 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4210191458 |
| primary_location.source.issn | 2159-5399, 2374-3468 |
| primary_location.source.type | conference |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | 2159-5399 |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| primary_location.source.host_organization | https://openalex.org/P4310320058 |
| primary_location.source.host_organization_name | Association for the Advancement of Artificial Intelligence |
| primary_location.source.host_organization_lineage | https://openalex.org/P4310320058 |
| primary_location.source.host_organization_lineage_names | Association for the Advancement of Artificial Intelligence |
| primary_location.license | |
| primary_location.pdf_url | https://ojs.aaai.org/index.php/AAAI/article/download/34530/36685 |
| primary_location.version | publishedVersion |
| primary_location.raw_type | journal-article |
| primary_location.license_id | |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence |
| primary_location.landing_page_url | https://doi.org/10.1609/aaai.v39i22.34530 |
| publication_date | 2025-04-11 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 29, 35, 48, 86, 120, 125 |
| abstract_inverted_index.As | 0 |
| abstract_inverted_index.FO | 94, 136 |
| abstract_inverted_index.In | 43 |
| abstract_inverted_index.ZO | 101, 129 |
| abstract_inverted_index.by | 127, 134 |
| abstract_inverted_index.in | 96, 103, 124 |
| abstract_inverted_index.is | 65 |
| abstract_inverted_index.of | 3, 79, 122, 140, 174 |
| abstract_inverted_index.on | 153 |
| abstract_inverted_index.to | 111 |
| abstract_inverted_index.we | 46 |
| abstract_inverted_index.Low | 49 |
| abstract_inverted_index.Our | 147 |
| abstract_inverted_index.SGD | 19 |
| abstract_inverted_index.The | 88, 115 |
| abstract_inverted_index.and | 20, 39, 58, 71, 99, 131, 160, 171 |
| abstract_inverted_index.can | 165, 185 |
| abstract_inverted_index.for | 62, 143 |
| abstract_inverted_index.has | 34 |
| abstract_inverted_index.the | 1, 9, 77, 93, 100, 128, 132, 135, 168, 178 |
| abstract_inverted_index.(FO) | 60 |
| abstract_inverted_index.(ZO) | 57 |
| abstract_inverted_index.LoHO | 64, 164, 184 |
| abstract_inverted_index.MeZO | 80 |
| abstract_inverted_index.deep | 97 |
| abstract_inverted_index.drop | 38 |
| abstract_inverted_index.high | 23, 144 |
| abstract_inverted_index.more | 12 |
| abstract_inverted_index.rate | 173 |
| abstract_inverted_index.rest | 133 |
| abstract_inverted_index.size | 2 |
| abstract_inverted_index.that | 163 |
| abstract_inverted_index.this | 44 |
| abstract_inverted_index.with | 15, 28, 67, 189 |
| abstract_inverted_index.Adam) | 21 |
| abstract_inverted_index.MeZO, | 175 |
| abstract_inverted_index.fewer | 194 |
| abstract_inverted_index.layer | 126 |
| abstract_inverted_index.ones, | 105 |
| abstract_inverted_index.order | 50 |
| abstract_inverted_index.rate. | 42 |
| abstract_inverted_index.usage | 84 |
| abstract_inverted_index.using | 192 |
| abstract_inverted_index.which | 54, 75 |
| abstract_inverted_index.while | 26, 81, 176 |
| abstract_inverted_index.work, | 45 |
| abstract_inverted_index.(LoHO) | 53 |
| abstract_inverted_index.(MeZO) | 33 |
| abstract_inverted_index.(e.g., | 18 |
| abstract_inverted_index.(i.e., | 157 |
| abstract_inverted_index.Hybrid | 51 |
| abstract_inverted_index.across | 150 |
| abstract_inverted_index.boosts | 76 |
| abstract_inverted_index.common | 151 |
| abstract_inverted_index.during | 181 |
| abstract_inverted_index.grows, | 7 |
| abstract_inverted_index.hybrid | 69, 73, 90, 117 |
| abstract_inverted_index.layers | 98 |
| abstract_inverted_index.memory | 24, 83, 113, 179, 195 |
| abstract_inverted_index.merges | 55 |
| abstract_inverted_index.models | 5, 10 |
| abstract_inverted_index.slower | 40 |
| abstract_inverted_index.taking | 138 |
| abstract_inverted_index.within | 85 |
| abstract_inverted_index.OPT-13B | 159 |
| abstract_inverted_index.achieve | 186 |
| abstract_inverted_index.becomes | 11 |
| abstract_inverted_index.budget. | 87 |
| abstract_inverted_index.improve | 112, 167 |
| abstract_inverted_index.keeping | 82 |
| abstract_inverted_index.notably | 6 |
| abstract_inverted_index.propose | 47 |
| abstract_inverted_index.results | 149 |
| abstract_inverted_index.shallow | 104 |
| abstract_inverted_index.updates | 119 |
| abstract_inverted_index.OPT-30B) | 161 |
| abstract_inverted_index.accuracy | 37, 78, 170 |
| abstract_inverted_index.avoiding | 107 |
| abstract_inverted_index.datasets | 152 |
| abstract_inverted_index.exploits | 92 |
| abstract_inverted_index.gradient | 109, 141 |
| abstract_inverted_index.language | 4 |
| abstract_inverted_index.requires | 22 |
| abstract_inverted_index.sparsity | 142 |
| abstract_inverted_index.Moreover, | 183 |
| abstract_inverted_index.Optimizer | 52 |
| abstract_inverted_index.advantage | 139 |
| abstract_inverted_index.backbones | 156 |
| abstract_inverted_index.different | 154 |
| abstract_inverted_index.empowered | 66 |
| abstract_inverted_index.footprint | 180 |
| abstract_inverted_index.optimizer | 32, 95, 102 |
| abstract_inverted_index.therefore | 106 |
| abstract_inverted_index.comparable | 187 |
| abstract_inverted_index.efficiency | 145 |
| abstract_inverted_index.optimizer, | 130, 137 |
| abstract_inverted_index.optimizers | 17, 61 |
| abstract_inverted_index.parameters | 123 |
| abstract_inverted_index.predictive | 169 |
| abstract_inverted_index.proportion | 121 |
| abstract_inverted_index.resources. | 196 |
| abstract_inverted_index.controlling | 177 |
| abstract_inverted_index.convergence | 41, 172 |
| abstract_inverted_index.demonstrate | 162 |
| abstract_inverted_index.efficiency. | 114 |
| abstract_inverted_index.fine-tuning | 8, 14, 27, 191 |
| abstract_inverted_index.first-order | 16, 59, 190 |
| abstract_inverted_index.inter-layer | 68, 89 |
| abstract_inverted_index.intra-layer | 72, 116 |
| abstract_inverted_index.performance | 188 |
| abstract_inverted_index.pre-trained | 155 |
| abstract_inverted_index.propagation | 110 |
| abstract_inverted_index.significant | 36 |
| abstract_inverted_index.unnecessary | 108 |
| abstract_inverted_index.challenging: | 13 |
| abstract_inverted_index.consumption, | 25 |
| abstract_inverted_index.experimental | 148 |
| abstract_inverted_index.fine-tuning. | 63, 182 |
| abstract_inverted_index.optimization | 70, 91, 118 |
| abstract_inverted_index.zeroth-order | 31, 56 |
| abstract_inverted_index.optimization, | 74 |
| abstract_inverted_index.significantly | 166 |
| abstract_inverted_index.substantially | 193 |
| abstract_inverted_index.RoBERTa-large, | 158 |
| abstract_inverted_index.implementation. | 146 |
| abstract_inverted_index.memory-efficient | 30 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile.value | 0.2040201 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
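
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of reversing that map into plain text follows, assuming the index has been loaded as a Python dict; `rebuild_abstract` is a hypothetical helper name, and the sample dict is an excerpt truncated to the first eight positions of the rows listed in the table.

```python
def rebuild_abstract(inverted_index):
    """Turn a token -> positions map back into a space-joined string."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in ascending position order.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt of the payload, limited to positions 0 through 7.
sample_index = {
    "As": [0],
    "the": [1],
    "size": [2],
    "of": [3],
    "language": [4],
    "models": [5],
    "notably": [6],
    "grows,": [7],
}

if __name__ == "__main__":
    print(rebuild_abstract(sample_index))
    # prints: As the size of language models notably grows,
```

Punctuation stays attached to its token in the index (for example `grows,`), so joining on single spaces is enough to recover the original wording.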