LoKO: Low-Rank Kalman Optimizer for Online Fine-Tuning of Large Models
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2410.11551
Training large models with millions or even billions of parameters from scratch incurs substantial computational costs. Parameter-Efficient Fine-Tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), address this challenge by adapting only a reduced number of parameters to specific tasks with gradient-based optimizers. In this paper, we cast PEFT as an optimal filtering/state estimation problem and present the Low-Rank Kalman Optimizer (LoKO), which estimates the optimal trainable parameters in an online manner. We leverage the low-rank decomposition in LoRA to significantly reduce matrix sizes in the Kalman iterations, and we further use a diagonal approximation of the covariance matrix to reduce computational complexity from quadratic to linear in the number of trainable parameters. Moreover, we find that the initialization of the covariance matrix within the Kalman algorithm and the accurate estimation of the observation noise covariance are key to this formulation, and we propose robust approaches that work well across a wide range of well-established computer vision and language models. Our results show that LoKO converges in fewer iterations and yields better-performing models than commonly used optimizers with LoRA on both image classification and language tasks. Our study opens up the possibility of leveraging the Kalman filter as an effective optimizer for the online fine-tuning of large models.
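To make the diagonal-covariance idea in the abstract concrete, here is a minimal sketch of one Kalman measurement update in which only the diagonal of the covariance matrix is stored. It is not the authors' implementation: the function name `diag_kalman_step`, the toy linear model, and the fixed noise covariance `R` are assumptions made for illustration, and the abstract's own schemes for initializing the covariance and estimating the observation noise are not reproduced. With `n` parameters and `m` outputs, every quantity below costs O(n·m²) or less, so the cost is linear in `n`.

```python
import numpy as np


def diag_kalman_step(theta, p, H, residual, R):
    """One Kalman measurement update with a diagonal covariance (illustrative).

    theta:    (n,)   current parameter estimate
    p:        (n,)   diagonal of the parameter covariance matrix
    H:        (m, n) Jacobian of the model output with respect to theta
    residual: (m,)   observed target minus predicted output
    R:        (m, m) observation noise covariance (assumed known here)
    """
    PHt = p[:, None] * H.T              # (n, m): P H^T without forming the n x n matrix P
    S = H @ PHt + R                     # (m, m): innovation covariance
    K = np.linalg.solve(S, PHt.T).T     # (n, m): Kalman gain P H^T S^{-1} (S is symmetric)
    theta_new = theta + K @ residual
    # Keep only the diagonal of (I - K H) P, which is p_i * (1 - sum_j K_ij H_ji).
    p_new = p - np.einsum("ij,ji->i", K, H) * p
    return theta_new, p_new


# Toy usage: recover the weights of a noisy linear model, one sample at a time.
rng = np.random.default_rng(0)
n = 8
theta_true = rng.normal(size=n)
theta = np.zeros(n)                     # initial estimate
p = np.ones(n)                          # initial covariance diagonal (assumed value)
R = np.array([[1e-2]])                  # noise variance matching the 0.1 std below

for _ in range(500):
    x = rng.normal(size=n)
    H = x[None, :]                      # Jacobian of a linear model is the input itself
    y = H @ theta_true + 0.1 * rng.normal(size=1)
    theta, p = diag_kalman_step(theta, p, H, y - H @ theta, R)

initial_error = np.linalg.norm(theta_true)          # error of the all-zero starting point
final_error = np.linalg.norm(theta - theta_true)
```

In this sketch the per-parameter variances in `p` act like adaptive step sizes that shrink as evidence accumulates, which is one way to read the claim that a Kalman filter can serve as an optimizer. For a nonlinear network, `H` would be the Jacobian of the outputs with respect to the trainable (for example, LoRA) parameters.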
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2410.11551, https://arxiv.org/pdf/2410.11551
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4403574788
Raw OpenAlex JSON
- OpenAlex ID (canonical identifier for this work in OpenAlex): https://openalex.org/W4403574788
- DOI (Digital Object Identifier): https://doi.org/10.48550/arxiv.2410.11551
- Title (work title): LoKO: Low-Rank Kalman Optimizer for Online Fine-Tuning of Large Models
- Type (OpenAlex work type): preprint
- Language (primary language): en
- Publication year (year of publication): 2024
- Publication date (full publication date if available): 2024-10-15
- Authors (list of authors in order): Hossein Abdi, Mingfei Sun, Andi Zhang, Samuel Kaski, Wei Pan
- Landing page (publisher landing page): https://arxiv.org/abs/2410.11551
- PDF URL (direct link to full text PDF): https://arxiv.org/pdf/2410.11551
- Open access (whether a free full text is available): Yes
- OA status (open access status per OpenAlex): green
- OA URL (direct OA link when available): https://arxiv.org/pdf/2410.11551
- Concepts (top concepts, i.e. fields/topics, attached by OpenAlex): Kalman filter, Rank (graph theory), Computer science, Mathematical optimization, Econometrics, Mathematics, Artificial intelligence, Combinatorics
- Cited by (total citation count in OpenAlex): 0
- Related works (count) (other works algorithmically related by OpenAlex): 10
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4403574788 |
| doi | https://doi.org/10.48550/arxiv.2410.11551 |
| ids.doi | https://doi.org/10.48550/arxiv.2410.11551 |
| ids.openalex | https://openalex.org/W4403574788 |
| fwci | |
| type | preprint |
| title | LoKO: Low-Rank Kalman Optimizer for Online Fine-Tuning of Large Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11195 |
| topics[0].field.id | https://openalex.org/fields/18 |
| topics[0].field.display_name | Decision Sciences |
| topics[0].score | 0.7355999946594238 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1803 |
| topics[0].subfield.display_name | Management Science and Operations Research |
| topics[0].display_name | Simulation Techniques and Applications |
| topics[1].id | https://openalex.org/T10481 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.7070000171661377 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1704 |
| topics[1].subfield.display_name | Computer Graphics and Computer-Aided Design |
| topics[1].display_name | Computer Graphics and Visualization Techniques |
| topics[2].id | https://openalex.org/T10715 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.683899998664856 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1705 |
| topics[2].subfield.display_name | Computer Networks and Communications |
| topics[2].display_name | Distributed and Parallel Computing Systems |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C157286648 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7761422991752625 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q846780 |
| concepts[0].display_name | Kalman filter |
| concepts[1].id | https://openalex.org/C164226766 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6395654678344727 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q7293202 |
| concepts[1].display_name | Rank (graph theory) |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.4475715756416321 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C126255220 |
| concepts[3].level | 1 |
| concepts[3].score | 0.3388872742652893 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q141495 |
| concepts[3].display_name | Mathematical optimization |
| concepts[4].id | https://openalex.org/C149782125 |
| concepts[4].level | 1 |
| concepts[4].score | 0.3351840376853943 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q160039 |
| concepts[4].display_name | Econometrics |
| concepts[5].id | https://openalex.org/C33923547 |
| concepts[5].level | 0 |
| concepts[5].score | 0.3110656142234802 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[5].display_name | Mathematics |
| concepts[6].id | https://openalex.org/C154945302 |
| concepts[6].level | 1 |
| concepts[6].score | 0.20969069004058838 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[6].display_name | Artificial intelligence |
| concepts[7].id | https://openalex.org/C114614502 |
| concepts[7].level | 1 |
| concepts[7].score | 0.073635995388031 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q76592 |
| concepts[7].display_name | Combinatorics |
| keywords[0].id | https://openalex.org/keywords/kalman-filter |
| keywords[0].score | 0.7761422991752625 |
| keywords[0].display_name | Kalman filter |
| keywords[1].id | https://openalex.org/keywords/rank |
| keywords[1].score | 0.6395654678344727 |
| keywords[1].display_name | Rank (graph theory) |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.4475715756416321 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/mathematical-optimization |
| keywords[3].score | 0.3388872742652893 |
| keywords[3].display_name | Mathematical optimization |
| keywords[4].id | https://openalex.org/keywords/econometrics |
| keywords[4].score | 0.3351840376853943 |
| keywords[4].display_name | Econometrics |
| keywords[5].id | https://openalex.org/keywords/mathematics |
| keywords[5].score | 0.3110656142234802 |
| keywords[5].display_name | Mathematics |
| keywords[6].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[6].score | 0.20969069004058838 |
| keywords[6].display_name | Artificial intelligence |
| keywords[7].id | https://openalex.org/keywords/combinatorics |
| keywords[7].score | 0.073635995388031 |
| keywords[7].display_name | Combinatorics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2410.11551 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2410.11551 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2410.11551 |
| locations[1].id | doi:10.48550/arxiv.2410.11551 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2410.11551 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5101853179 |
| authorships[0].author.orcid | https://orcid.org/0009-0000-9427-8828 |
| authorships[0].author.display_name | Hossein Abdi |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Abdi, Hossein |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5101591811 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-5925-5425 |
| authorships[1].author.display_name | Mingfei Sun |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Sun, Mingfei |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5077911588 |
| authorships[2].author.orcid | https://orcid.org/0009-0007-4855-5442 |
| authorships[2].author.display_name | Andi Zhang |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Zhang, Andi |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5018305257 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-1925-9154 |
| authorships[3].author.display_name | Samuel Kaski |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Kaski, Samuel |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5100721251 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-7353-3498 |
| authorships[4].author.display_name | Wei Pan |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Pan, Wei |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2410.11551 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | LoKO: Low-Rank Kalman Optimizer for Online Fine-Tuning of Large Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-11T23:18:09.558992 |
| primary_topic.id | https://openalex.org/T11195 |
| primary_topic.field.id | https://openalex.org/fields/18 |
| primary_topic.field.display_name | Decision Sciences |
| primary_topic.score | 0.7355999946594238 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1803 |
| primary_topic.subfield.display_name | Management Science and Operations Research |
| primary_topic.display_name | Simulation Techniques and Applications |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W1979597421, https://openalex.org/W2007980826, https://openalex.org/W2061531152, https://openalex.org/W3002753104, https://openalex.org/W2077600819, https://openalex.org/W2142036596, https://openalex.org/W2072657027, https://openalex.org/W2600246793, https://openalex.org/W4238204885 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2410.11551 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2410.11551 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2410.11551 |
| primary_location.id | pmh:oai:arXiv.org:2410.11551 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2410.11551 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2410.11551 |
| publication_date | 2024-10-15 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 31, 89, 149 |
| abstract_inverted_index.In | 42 |
| abstract_inverted_index.We | 70 |
| abstract_inverted_index.an | 49, 67, 199 |
| abstract_inverted_index.as | 48, 198 |
| abstract_inverted_index.by | 28 |
| abstract_inverted_index.in | 66, 75, 82, 105, 137, 180 |
| abstract_inverted_index.of | 8, 34, 92, 108, 117, 129, 152, 193, 206 |
| abstract_inverted_index.on | 88 |
| abstract_inverted_index.or | 5 |
| abstract_inverted_index.to | 36, 60, 77, 96, 103, 174 |
| abstract_inverted_index.up | 190 |
| abstract_inverted_index.we | 45, 112, 141 |
| abstract_inverted_index.Our | 159, 187 |
| abstract_inverted_index.and | 54, 85, 125, 140, 156, 168, 184 |
| abstract_inverted_index.are | 134 |
| abstract_inverted_index.for | 202 |
| abstract_inverted_index.the | 62, 72, 93, 106, 115, 118, 122, 126, 130, 135, 191, 195, 203 |
| abstract_inverted_index.LoKO | 163 |
| abstract_inverted_index.LoRA | 76, 179 |
| abstract_inverted_index.PEFT | 47 |
| abstract_inverted_index.both | 181 |
| abstract_inverted_index.cast | 46 |
| abstract_inverted_index.even | 6 |
| abstract_inverted_index.from | 10, 101 |
| abstract_inverted_index.keys | 136 |
| abstract_inverted_index.only | 30 |
| abstract_inverted_index.show | 161 |
| abstract_inverted_index.that | 114, 145, 162 |
| abstract_inverted_index.this | 26, 43, 138 |
| abstract_inverted_index.used | 176 |
| abstract_inverted_index.vast | 150 |
| abstract_inverted_index.well | 147 |
| abstract_inverted_index.with | 3, 39, 165, 178 |
| abstract_inverted_index.work | 146 |
| abstract_inverted_index.fewer | 166 |
| abstract_inverted_index.image | 182 |
| abstract_inverted_index.large | 1, 207 |
| abstract_inverted_index.noise | 132 |
| abstract_inverted_index.opens | 189 |
| abstract_inverted_index.range | 151 |
| abstract_inverted_index.sizes | 81 |
| abstract_inverted_index.study | 188 |
| abstract_inverted_index.tasks | 38 |
| abstract_inverted_index.(LoKO) | 59 |
| abstract_inverted_index.(PEFT) | 19 |
| abstract_inverted_index.Kalman | 57, 83, 123, 196 |
| abstract_inverted_index.across | 148 |
| abstract_inverted_index.better | 170 |
| abstract_inverted_index.costs. | 15 |
| abstract_inverted_index.filter | 197 |
| abstract_inverted_index.incurs | 12 |
| abstract_inverted_index.linear | 104 |
| abstract_inverted_index.matrix | 80, 95, 120 |
| abstract_inverted_index.models | 2, 172 |
| abstract_inverted_index.number | 33, 107 |
| abstract_inverted_index.online | 68, 204 |
| abstract_inverted_index.paper, | 44 |
| abstract_inverted_index.reduce | 79 |
| abstract_inverted_index.robust | 143 |
| abstract_inverted_index.tasks. | 186 |
| abstract_inverted_index.vision | 155 |
| abstract_inverted_index.within | 121 |
| abstract_inverted_index.yields | 169 |
| abstract_inverted_index.(LoRA), | 24 |
| abstract_inverted_index.address | 25 |
| abstract_inverted_index.further | 86 |
| abstract_inverted_index.manner. | 69 |
| abstract_inverted_index.models. | 158, 208 |
| abstract_inverted_index.optimal | 50, 63 |
| abstract_inverted_index.present | 55 |
| abstract_inverted_index.problem | 53 |
| abstract_inverted_index.propose | 142 |
| abstract_inverted_index.reduced | 32 |
| abstract_inverted_index.results | 160 |
| abstract_inverted_index.scratch | 11 |
| abstract_inverted_index.Low-Rank | 22, 56 |
| abstract_inverted_index.Training | 0 |
| abstract_inverted_index.accurate | 127 |
| abstract_inverted_index.adapting | 29 |
| abstract_inverted_index.billions | 7 |
| abstract_inverted_index.commonly | 175 |
| abstract_inverted_index.compared | 173 |
| abstract_inverted_index.computer | 154 |
| abstract_inverted_index.decrease | 98 |
| abstract_inverted_index.diagonal | 90 |
| abstract_inverted_index.estimate | 61 |
| abstract_inverted_index.language | 157, 185 |
| abstract_inverted_index.leverage | 71 |
| abstract_inverted_index.low-rank | 73 |
| abstract_inverted_index.methods, | 20 |
| abstract_inverted_index.millions | 4 |
| abstract_inverted_index.specific | 37 |
| abstract_inverted_index.Efficient | 17 |
| abstract_inverted_index.Moreover, | 111 |
| abstract_inverted_index.Optimizer | 58 |
| abstract_inverted_index.Parameter | 16 |
| abstract_inverted_index.algorithm | 124 |
| abstract_inverted_index.challenge | 27 |
| abstract_inverted_index.converges | 164 |
| abstract_inverted_index.effective | 200 |
| abstract_inverted_index.optimizer | 201 |
| abstract_inverted_index.quadratic | 102 |
| abstract_inverted_index.trainable | 64, 109 |
| abstract_inverted_index.Adaptation | 23 |
| abstract_inverted_index.approaches | 144 |
| abstract_inverted_index.capitalize | 87 |
| abstract_inverted_index.complexity | 100 |
| abstract_inverted_index.covariance | 94, 119, 133 |
| abstract_inverted_index.discovered | 113 |
| abstract_inverted_index.estimation | 52, 128 |
| abstract_inverted_index.iterations | 84, 167 |
| abstract_inverted_index.leveraging | 194 |
| abstract_inverted_index.optimizers | 177 |
| abstract_inverted_index.parameters | 9, 35, 65 |
| abstract_inverted_index.Fine-Tuning | 18 |
| abstract_inverted_index.effectively | 97 |
| abstract_inverted_index.fine-tuning | 205 |
| abstract_inverted_index.observation | 131 |
| abstract_inverted_index.optimizers. | 41 |
| abstract_inverted_index.parameters. | 110 |
| abstract_inverted_index.performance | 171 |
| abstract_inverted_index.possibility | 192 |
| abstract_inverted_index.substantial | 13 |
| abstract_inverted_index.formulation, | 139 |
| abstract_inverted_index.particularly | 21 |
| abstract_inverted_index.approximation | 91 |
| abstract_inverted_index.computational | 14, 99 |
| abstract_inverted_index.decomposition | 74 |
| abstract_inverted_index.significantly | 78 |
| abstract_inverted_index.gradient-based | 40 |
| abstract_inverted_index.initialization | 116 |
| abstract_inverted_index.classifications | 183 |
| abstract_inverted_index.filtering/state | 51 |
| abstract_inverted_index.well-established | 153 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile |