Step-size Adaptation Using Exponentiated Gradient Updates
2022 · Open Access · DOI: https://doi.org/10.48550/arxiv.2202.00145
Optimizers like Adam and AdaGrad have been very successful in training large-scale neural networks. Yet, the performance of these methods is heavily dependent on a carefully tuned learning rate schedule. We show that in many large-scale applications, augmenting a given optimizer with an adaptive tuning method of the step-size greatly improves the performance. More precisely, we maintain a global step-size scale for the update as well as a gain factor for each coordinate. We adjust the global scale based on the alignment of the average gradient and the current gradient vectors. A similar approach is used for updating the local gain factors. This type of step-size scale tuning has been done before with gradient descent updates. In this paper, we update the step-size scale and the gain variables with exponentiated gradient updates instead. Experimentally, we show that our approach can achieve compelling accuracy on standard models without using any specially tuned learning rate schedule. We also show the effectiveness of our approach for quickly adapting to distribution shifts in the data during training.
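The abstract gives the idea but no formulas, so the following is only a minimal sketch of it, not the authors' algorithm. It assumes plain gradient descent as the base optimizer, cosine similarity as the "alignment" signal for the global scale, per-coordinate sign agreement as the signal for the local gains, an exponential moving average as the "average gradient", and clipping of both quantities to a fixed range. The function name `eg_step_size_descent` and all parameter names and defaults are hypothetical. The one element taken directly from the abstract is that the scale and the gains are updated multiplicatively (exponentiated gradient style) rather than additively.

```python
import math


def eg_step_size_descent(grad_fn, w, base_lr=0.01, meta_lr=0.05,
                         beta=0.9, steps=200, clip=(0.1, 3.0)):
    """Gradient descent with a global step-size scale and per-coordinate
    gains, both adapted by multiplicative (exponentiated gradient) updates.

    Illustrative only: the alignment measures, the averaging scheme, and
    the clipping range are assumptions, not taken from the paper.
    """
    lo, hi = clip
    w = list(w)
    scale = 1.0                  # global step-size scale
    gains = [1.0] * len(w)       # one gain factor per coordinate
    avg = [0.0] * len(w)         # running average of past gradients

    for _ in range(steps):
        g = grad_fn(w)

        # Global signal: cosine alignment of the average and current gradient.
        dot = sum(a * b for a, b in zip(avg, g))
        norm = (math.sqrt(sum(a * a for a in avg))
                * math.sqrt(sum(b * b for b in g)))
        align = dot / norm if norm > 0.0 else 0.0
        # Multiplicative update: grow when aligned, shrink when opposed.
        scale = min(hi, max(lo, scale * math.exp(meta_lr * align)))

        # Local signal: sign agreement per coordinate (assumed form).
        for i, (a, b) in enumerate(zip(avg, g)):
            agree = (a * b > 0) - (a * b < 0)   # +1, 0, or -1
            gains[i] = min(hi, max(lo, gains[i] * math.exp(meta_lr * agree)))

        # Base optimizer step, rescaled by the global scale and local gains.
        w = [wi - base_lr * scale * gain * gi
             for wi, gain, gi in zip(w, gains, g)]

        # Update the running average after it has been compared with g.
        avg = [beta * a + (1.0 - beta) * b for a, b in zip(avg, g)]

    return w, scale, gains


if __name__ == "__main__":
    # Toy quadratic 0.5 * sum(c_i * w_i**2) with uneven curvature.
    curvature = [1.0, 10.0]
    grad = lambda w: [c * x for c, x in zip(curvature, w)]
    print(eg_step_size_descent(grad, [1.0, 1.0]))
```

Because the updates are multiplicative, the scale and gains stay positive without any projection step. The clipping range is there only to keep this toy example stable; how the actual method bounds or normalizes these quantities is not stated in the abstract.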
- Type
- preprint
- Language
- en
- Landing Page
- http://arxiv.org/abs/2202.00145
- https://arxiv.org/pdf/2202.00145
- OA Status
- green
- Cited By
- 4
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W4221163212
Raw OpenAlex JSON
- OpenAlex ID
- https://openalex.org/W4221163212 (canonical identifier for this work in OpenAlex)
- DOI
- https://doi.org/10.48550/arxiv.2202.00145 (Digital Object Identifier)
- Title
- Step-size Adaptation Using Exponentiated Gradient Updates (work title)
- Type
- preprint (OpenAlex work type)
- Language
- en (primary language)
- Publication year
- 2022 (year of publication)
- Publication date
- 2022-01-31 (full publication date if available)
- Authors
- Ehsan Amid, Rohan Anil, Christopher Fifty, Manfred K. Warmuth (list of authors in order)
- Landing page
- https://arxiv.org/abs/2202.00145 (publisher landing page)
- PDF URL
- https://arxiv.org/pdf/2202.00145 (direct link to full text PDF)
- Open access
- Yes (whether a free full text is available)
- OA status
- green (open access status per OpenAlex)
- OA URL
- https://arxiv.org/pdf/2202.00145 (direct OA link when available)
- Concepts
- Computer science, Gradient descent, Scale (ratio), Schedule, Adaptation (eye), Scale factor (cosmology), Algorithm, Artificial neural network, Mathematical optimization, Artificial intelligence, Mathematics, Cosmology, Metric expansion of space, Quantum mechanics, Operating system, Dark energy, Physics, Optics (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by
- 4 (total citation count in OpenAlex)
- Citations by year (recent)
- 2024: 1, 2023: 2, 2022: 1 (per-year citation counts, last 5 years)
- Related works (count)
- 10 (other works algorithmically related by OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W4221163212 |
| doi | https://doi.org/10.48550/arxiv.2202.00145 |
| ids.doi | https://doi.org/10.48550/arxiv.2202.00145 |
| ids.openalex | https://openalex.org/W4221163212 |
| fwci | |
| type | preprint |
| title | Step-size Adaptation Using Exponentiated Gradient Updates |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T12676 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9955999851226807 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Machine Learning and ELM |
| topics[1].id | https://openalex.org/T10320 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9939000010490417 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Neural Networks and Applications |
| topics[2].id | https://openalex.org/T10036 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9916999936103821 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Advanced Neural Network Applications |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.6843476295471191 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C153258448 |
| concepts[1].level | 3 |
| concepts[1].score | 0.6654772162437439 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q1199743 |
| concepts[1].display_name | Gradient descent |
| concepts[2].id | https://openalex.org/C2778755073 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6561452746391296 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q10858537 |
| concepts[2].display_name | Scale (ratio) |
| concepts[3].id | https://openalex.org/C68387754 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6535001397132874 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q7271585 |
| concepts[3].display_name | Schedule |
| concepts[4].id | https://openalex.org/C139807058 |
| concepts[4].level | 2 |
| concepts[4].score | 0.513901948928833 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q352374 |
| concepts[4].display_name | Adaptation (eye) |
| concepts[5].id | https://openalex.org/C144386022 |
| concepts[5].level | 5 |
| concepts[5].score | 0.43868914246559143 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q1332997 |
| concepts[5].display_name | Scale factor (cosmology) |
| concepts[6].id | https://openalex.org/C11413529 |
| concepts[6].level | 1 |
| concepts[6].score | 0.39010095596313477 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q8366 |
| concepts[6].display_name | Algorithm |
| concepts[7].id | https://openalex.org/C50644808 |
| concepts[7].level | 2 |
| concepts[7].score | 0.3638228178024292 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q192776 |
| concepts[7].display_name | Artificial neural network |
| concepts[8].id | https://openalex.org/C126255220 |
| concepts[8].level | 1 |
| concepts[8].score | 0.3510492146015167 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q141495 |
| concepts[8].display_name | Mathematical optimization |
| concepts[9].id | https://openalex.org/C154945302 |
| concepts[9].level | 1 |
| concepts[9].score | 0.3342996835708618 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[9].display_name | Artificial intelligence |
| concepts[10].id | https://openalex.org/C33923547 |
| concepts[10].level | 0 |
| concepts[10].score | 0.26454591751098633 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[10].display_name | Mathematics |
| concepts[11].id | https://openalex.org/C26405456 |
| concepts[11].level | 2 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q338 |
| concepts[11].display_name | Cosmology |
| concepts[12].id | https://openalex.org/C20154449 |
| concepts[12].level | 4 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q1129469 |
| concepts[12].display_name | Metric expansion of space |
| concepts[13].id | https://openalex.org/C62520636 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q944 |
| concepts[13].display_name | Quantum mechanics |
| concepts[14].id | https://openalex.org/C111919701 |
| concepts[14].level | 1 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[14].display_name | Operating system |
| concepts[15].id | https://openalex.org/C172790937 |
| concepts[15].level | 3 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q18343 |
| concepts[15].display_name | Dark energy |
| concepts[16].id | https://openalex.org/C121332964 |
| concepts[16].level | 0 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[16].display_name | Physics |
| concepts[17].id | https://openalex.org/C120665830 |
| concepts[17].level | 1 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q14620 |
| concepts[17].display_name | Optics |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.6843476295471191 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/gradient-descent |
| keywords[1].score | 0.6654772162437439 |
| keywords[1].display_name | Gradient descent |
| keywords[2].id | https://openalex.org/keywords/scale |
| keywords[2].score | 0.6561452746391296 |
| keywords[2].display_name | Scale (ratio) |
| keywords[3].id | https://openalex.org/keywords/schedule |
| keywords[3].score | 0.6535001397132874 |
| keywords[3].display_name | Schedule |
| keywords[4].id | https://openalex.org/keywords/adaptation |
| keywords[4].score | 0.513901948928833 |
| keywords[4].display_name | Adaptation (eye) |
| keywords[5].id | https://openalex.org/keywords/scale-factor |
| keywords[5].score | 0.43868914246559143 |
| keywords[5].display_name | Scale factor (cosmology) |
| keywords[6].id | https://openalex.org/keywords/algorithm |
| keywords[6].score | 0.39010095596313477 |
| keywords[6].display_name | Algorithm |
| keywords[7].id | https://openalex.org/keywords/artificial-neural-network |
| keywords[7].score | 0.3638228178024292 |
| keywords[7].display_name | Artificial neural network |
| keywords[8].id | https://openalex.org/keywords/mathematical-optimization |
| keywords[8].score | 0.3510492146015167 |
| keywords[8].display_name | Mathematical optimization |
| keywords[9].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[9].score | 0.3342996835708618 |
| keywords[9].display_name | Artificial intelligence |
| keywords[10].id | https://openalex.org/keywords/mathematics |
| keywords[10].score | 0.26454591751098633 |
| keywords[10].display_name | Mathematics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2202.00145 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2202.00145 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2202.00145 |
| locations[1].id | doi:10.48550/arxiv.2202.00145 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2202.00145 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5056776503 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-6097-0226 |
| authorships[0].author.display_name | Ehsan Amid |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Amid, Ehsan |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5104083306 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Rohan Anil |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Anil, Rohan |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5021315343 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Christopher Fifty |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Fifty, Christopher |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5108549518 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Manfred K. Warmuth |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Warmuth, Manfred K. |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2202.00145 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Step-size Adaptation Using Exponentiated Gradient Updates |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T12676 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9955999851226807 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Machine Learning and ELM |
| related_works | https://openalex.org/W2997567050, https://openalex.org/W1483272040, https://openalex.org/W4283377908, https://openalex.org/W1533421371, https://openalex.org/W2003050223, https://openalex.org/W2091777911, https://openalex.org/W2766405861, https://openalex.org/W2360975119, https://openalex.org/W2912421143, https://openalex.org/W1998698147 |
| cited_by_count | 4 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 1 |
| counts_by_year[1].year | 2023 |
| counts_by_year[1].cited_by_count | 2 |
| counts_by_year[2].year | 2022 |
| counts_by_year[2].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2202.00145 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2202.00145 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2202.00145 |
| primary_location.id | pmh:oai:arXiv.org:2202.00145 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2202.00145 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2202.00145 |
| publication_date | 2022-01-31 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index.A | 91 |
| abstract_inverted_index.a | 24, 38, 57, 67 |
| abstract_inverted_index.In | 116 |
| abstract_inverted_index.We | 30, 73, 154 |
| abstract_inverted_index.an | 42 |
| abstract_inverted_index.as | 64, 66 |
| abstract_inverted_index.in | 9, 33, 168 |
| abstract_inverted_index.is | 20, 94 |
| abstract_inverted_index.of | 17, 46, 82, 104, 159 |
| abstract_inverted_index.on | 23, 79, 143 |
| abstract_inverted_index.to | 165 |
| abstract_inverted_index.we | 55, 119, 134 |
| abstract_inverted_index.and | 3, 86, 124 |
| abstract_inverted_index.any | 148 |
| abstract_inverted_index.can | 139 |
| abstract_inverted_index.for | 61, 70, 96, 162 |
| abstract_inverted_index.has | 108 |
| abstract_inverted_index.our | 137, 160 |
| abstract_inverted_index.the | 15, 47, 51, 62, 75, 80, 83, 87, 98, 121, 125, 157, 169 |
| abstract_inverted_index.Adam | 2 |
| abstract_inverted_index.More | 53 |
| abstract_inverted_index.This | 102 |
| abstract_inverted_index.Yet, | 14 |
| abstract_inverted_index.also | 155 |
| abstract_inverted_index.been | 6, 109 |
| abstract_inverted_index.data | 170 |
| abstract_inverted_index.done | 110 |
| abstract_inverted_index.each | 71 |
| abstract_inverted_index.gain | 68, 100, 126 |
| abstract_inverted_index.have | 5 |
| abstract_inverted_index.like | 1 |
| abstract_inverted_index.many | 34 |
| abstract_inverted_index.rate | 28, 152 |
| abstract_inverted_index.show | 31, 135, 156 |
| abstract_inverted_index.that | 32, 136 |
| abstract_inverted_index.this | 117 |
| abstract_inverted_index.type | 103 |
| abstract_inverted_index.used | 95 |
| abstract_inverted_index.very | 7 |
| abstract_inverted_index.well | 65 |
| abstract_inverted_index.with | 41, 112, 128 |
| abstract_inverted_index.based | 78 |
| abstract_inverted_index.given | 39 |
| abstract_inverted_index.local | 99 |
| abstract_inverted_index.scale | 60, 77, 106, 123 |
| abstract_inverted_index.these | 18 |
| abstract_inverted_index.tuned | 26, 150 |
| abstract_inverted_index.using | 147 |
| abstract_inverted_index.adjust | 74 |
| abstract_inverted_index.before | 111 |
| abstract_inverted_index.during | 171 |
| abstract_inverted_index.factor | 69 |
| abstract_inverted_index.global | 58, 76 |
| abstract_inverted_index.method | 45 |
| abstract_inverted_index.models | 145 |
| abstract_inverted_index.neural | 12 |
| abstract_inverted_index.paper, | 118 |
| abstract_inverted_index.shifts | 167 |
| abstract_inverted_index.tuning | 44, 107 |
| abstract_inverted_index.update | 63, 120 |
| abstract_inverted_index.AdaGrad | 4 |
| abstract_inverted_index.achieve | 140 |
| abstract_inverted_index.average | 84 |
| abstract_inverted_index.current | 88 |
| abstract_inverted_index.descent | 114 |
| abstract_inverted_index.greatly | 49 |
| abstract_inverted_index.heavily | 21 |
| abstract_inverted_index.methods | 19 |
| abstract_inverted_index.quickly | 163 |
| abstract_inverted_index.similar | 92 |
| abstract_inverted_index.updates | 131 |
| abstract_inverted_index.without | 146 |
| abstract_inverted_index.accuracy | 142 |
| abstract_inverted_index.adapting | 164 |
| abstract_inverted_index.adaptive | 43 |
| abstract_inverted_index.approach | 93, 138, 161 |
| abstract_inverted_index.factors. | 101 |
| abstract_inverted_index.gradient | 85, 89, 113, 130 |
| abstract_inverted_index.improves | 50 |
| abstract_inverted_index.instead. | 132 |
| abstract_inverted_index.learning | 27, 151 |
| abstract_inverted_index.maintain | 56 |
| abstract_inverted_index.standard | 144 |
| abstract_inverted_index.training | 10 |
| abstract_inverted_index.updates. | 115 |
| abstract_inverted_index.updating | 97 |
| abstract_inverted_index.vectors. | 90 |
| abstract_inverted_index.alignment | 81 |
| abstract_inverted_index.carefully | 25 |
| abstract_inverted_index.dependent | 22 |
| abstract_inverted_index.networks. | 13 |
| abstract_inverted_index.optimizer | 40 |
| abstract_inverted_index.schedule. | 29, 153 |
| abstract_inverted_index.specially | 149 |
| abstract_inverted_index.step-size | 48, 59, 105, 122 |
| abstract_inverted_index.training. | 172 |
| abstract_inverted_index.variables | 127 |
| abstract_inverted_index.Optimizers | 0 |
| abstract_inverted_index.augmenting | 37 |
| abstract_inverted_index.compelling | 141 |
| abstract_inverted_index.precisely, | 54 |
| abstract_inverted_index.successful | 8 |
| abstract_inverted_index.coordinate. | 72 |
| abstract_inverted_index.large-scale | 11, 35 |
| abstract_inverted_index.performance | 16 |
| abstract_inverted_index.distribution | 166 |
| abstract_inverted_index.performance. | 52 |
| abstract_inverted_index.applications, | 36 |
| abstract_inverted_index.effectiveness | 158 |
| abstract_inverted_index.exponentiated | 129 |
| abstract_inverted_index.Experimentally, | 133 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile |