Global Convergence of SGD On Two Layer Neural Nets
2022 · Open Access · DOI: https://doi.org/10.48550/arxiv.2210.11452
In this note, we consider an appropriately regularized $\ell_2$-empirical risk of depth-$2$ nets with any number of gates and show bounds on how the empirical loss evolves for SGD iterates on it, for arbitrary data, provided the activation is adequately smooth and bounded, like sigmoid and tanh. This in turn leads to a proof of global convergence of SGD for a special class of initializations. We also prove an exponentially fast convergence rate for continuous-time SGD that also applies to smooth unbounded activations like SoftPlus. Our key idea is to show the existence of Frobenius-norm-regularized loss functions on constant-sized neural nets which are "Villani functions", and thus to build on recent progress in analyzing SGD on such objectives. Most critically, the amount of regularization required for our analysis is independent of the size of the net.
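To make the setting concrete, the following is a minimal sketch in plain Python of one SGD step on a Frobenius-norm-regularized squared loss for a depth-2 sigmoid net. The abstract does not spell out the architecture or parameterization, so the form $x \mapsto a^\top \sigma(Wx)$ with a fixed outer layer `a` and a trainable inner matrix `W`, together with the names `net`, `regularized_loss`, `sgd_step`, `lam`, and `lr`, are assumptions made for illustration and not the authors' construction.

```python
import math


def sigmoid(z):
    # Smooth, bounded activation of the kind the abstract mentions.
    return 1.0 / (1.0 + math.exp(-z))


def net(W, a, x):
    # Assumed depth-2 net: a^T sigmoid(W x), with one row of W per gate.
    return sum(
        a_j * sigmoid(sum(w * x_k for w, x_k in zip(row, x)))
        for a_j, row in zip(a, W)
    )


def regularized_loss(W, a, data, lam):
    # (1 / 2n) * sum of squared residuals + (lam / 2) * ||W||_F^2
    squared = sum((net(W, a, x) - y) ** 2 for x, y in data) / (2 * len(data))
    frobenius_sq = sum(w * w for row in W for w in row)
    return squared + 0.5 * lam * frobenius_sq


def sgd_step(W, a, sample, lam, lr):
    # One SGD step on a single (x, y) sample; only W is updated.
    x, y = sample
    acts = [sigmoid(sum(w * x_k for w, x_k in zip(row, x))) for row in W]
    residual = sum(a_j * s for a_j, s in zip(a, acts)) - y
    new_W = []
    for a_j, s, row in zip(a, acts, W):
        # d/dW_jk of the per-sample loss:
        # residual * a_j * sigmoid'(pre_j) * x_k + lam * W_jk
        coeff = residual * a_j * s * (1.0 - s)
        new_W.append([w - lr * (coeff * x_k + lam * w) for w, x_k in zip(row, x)])
    return new_W
```

In this sketch the regularization weight `lam` is passed in independently of the number of rows of `W`, which mirrors the abstract's remark that the required regularization does not depend on the size of the net; the specific threshold the analysis needs is not given here.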
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2210.11452, https://arxiv.org/pdf/2210.11452
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4307079350
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4307079350 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2210.11452 (Digital Object Identifier)
- Title: Global Convergence of SGD On Two Layer Neural Nets (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2022 (year of publication)
- Publication date: 2022-10-20 (full publication date if available)
- Authors: Pulkit Gopalani, Anirbit Mukherjee (list of authors in order)
- Landing page: https://arxiv.org/abs/2210.11452 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2210.11452 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2210.11452 (direct OA link when available)
- Concepts: Convergence (economics), Layer (electronics), Artificial neural network, Computer science, Artificial intelligence, Economics, Materials science, Nanotechnology, Economic growth (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4307079350 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2210.11452 |
| ids.doi | https://doi.org/10.48550/arxiv.2210.11452 |
| ids.openalex | https://openalex.org/W4307079350 |
| fwci | 0.0 |
| type | preprint |
| title | Global Convergence of SGD On Two Layer Neural Nets |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10320 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9215999841690063 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Neural Networks and Applications |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2777303404 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7177873849868774 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q759757 |
| concepts[0].display_name | Convergence (economics) |
| concepts[1].id | https://openalex.org/C2779227376 |
| concepts[1].level | 2 |
| concepts[1].score | 0.5719895958900452 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q6505497 |
| concepts[1].display_name | Layer (electronics) |
| concepts[2].id | https://openalex.org/C50644808 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5159071683883667 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q192776 |
| concepts[2].display_name | Artificial neural network |
| concepts[3].id | https://openalex.org/C41008148 |
| concepts[3].level | 0 |
| concepts[3].score | 0.4593205451965332 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[3].display_name | Computer science |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.28735870122909546 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C162324750 |
| concepts[5].level | 0 |
| concepts[5].score | 0.16848424077033997 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q8134 |
| concepts[5].display_name | Economics |
| concepts[6].id | https://openalex.org/C192562407 |
| concepts[6].level | 0 |
| concepts[6].score | 0.08493128418922424 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q228736 |
| concepts[6].display_name | Materials science |
| concepts[7].id | https://openalex.org/C171250308 |
| concepts[7].level | 1 |
| concepts[7].score | 0.053293824195861816 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q11468 |
| concepts[7].display_name | Nanotechnology |
| concepts[8].id | https://openalex.org/C50522688 |
| concepts[8].level | 1 |
| concepts[8].score | 0.0 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q189833 |
| concepts[8].display_name | Economic growth |
| keywords[0].id | https://openalex.org/keywords/convergence |
| keywords[0].score | 0.7177873849868774 |
| keywords[0].display_name | Convergence (economics) |
| keywords[1].id | https://openalex.org/keywords/layer |
| keywords[1].score | 0.5719895958900452 |
| keywords[1].display_name | Layer (electronics) |
| keywords[2].id | https://openalex.org/keywords/artificial-neural-network |
| keywords[2].score | 0.5159071683883667 |
| keywords[2].display_name | Artificial neural network |
| keywords[3].id | https://openalex.org/keywords/computer-science |
| keywords[3].score | 0.4593205451965332 |
| keywords[3].display_name | Computer science |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.28735870122909546 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/economics |
| keywords[5].score | 0.16848424077033997 |
| keywords[5].display_name | Economics |
| keywords[6].id | https://openalex.org/keywords/materials-science |
| keywords[6].score | 0.08493128418922424 |
| keywords[6].display_name | Materials science |
| keywords[7].id | https://openalex.org/keywords/nanotechnology |
| keywords[7].score | 0.053293824195861816 |
| keywords[7].display_name | Nanotechnology |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2210.11452 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2210.11452 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2210.11452 |
| locations[1].id | doi:10.48550/arxiv.2210.11452 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article-journal |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2210.11452 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5061501068 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-9662-224X |
| authorships[0].author.display_name | Pulkit Gopalani |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Gopalani, Pulkit |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5084835559 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-5189-8939 |
| authorships[1].author.display_name | Anirbit Mukherjee |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | Mukherjee, Anirbit |
| authorships[1].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2210.11452 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Global Convergence of SGD On Two Layer Neural Nets |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10320 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9215999841690063 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Neural Networks and Applications |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W2382290278, https://openalex.org/W2478288626, https://openalex.org/W4391913857, https://openalex.org/W2350741829 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2210.11452 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2210.11452 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2210.11452 |
| primary_location.id | pmh:oai:arXiv.org:2210.11452 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2210.11452 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2210.11452 |
| publication_date | 2022-10-20 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 54, 62 |
| abstract_inverted_index.-- | 32 |
| abstract_inverted_index.In | 0 |
| abstract_inverted_index.We | 67 |
| abstract_inverted_index.an | 70 |
| abstract_inverted_index.be | 112 |
| abstract_inverted_index.if | 37 |
| abstract_inverted_index.in | 50 |
| abstract_inverted_index.is | 40, 91, 135 |
| abstract_inverted_index.it | 31 |
| abstract_inverted_index.of | 9, 16, 56, 59, 65, 96, 129, 137, 140 |
| abstract_inverted_index.on | 21, 30, 102, 116, 122 |
| abstract_inverted_index.to | 53, 82, 92, 114 |
| abstract_inverted_index.we | 3 |
| abstract_inverted_index.$2$ | 11 |
| abstract_inverted_index.Our | 88 |
| abstract_inverted_index.SGD | 28, 60, 78, 121 |
| abstract_inverted_index.and | 18, 36, 43, 47, 110 |
| abstract_inverted_index.any | 14 |
| abstract_inverted_index.are | 107 |
| abstract_inverted_index.for | 27, 33, 61, 75, 132 |
| abstract_inverted_index.how | 22 |
| abstract_inverted_index.key | 89 |
| abstract_inverted_index.our | 133 |
| abstract_inverted_index.the | 23, 38, 94, 127, 138, 141 |
| abstract_inverted_index.Most | 125 |
| abstract_inverted_index.This | 49 |
| abstract_inverted_index.able | 113 |
| abstract_inverted_index.also | 68, 80 |
| abstract_inverted_index.data | 35 |
| abstract_inverted_index.fast | 72 |
| abstract_inverted_index.idea | 90 |
| abstract_inverted_index.like | 45, 86 |
| abstract_inverted_index.loss | 25, 100 |
| abstract_inverted_index.net. | 142 |
| abstract_inverted_index.nets | 12, 105 |
| abstract_inverted_index.norm | 98 |
| abstract_inverted_index.rate | 74 |
| abstract_inverted_index.risk | 8 |
| abstract_inverted_index.show | 19, 93 |
| abstract_inverted_index.size | 139 |
| abstract_inverted_index.such | 123 |
| abstract_inverted_index.that | 79 |
| abstract_inverted_index.this | 1 |
| abstract_inverted_index.thus | 111 |
| abstract_inverted_index.time | 77 |
| abstract_inverted_index.turn | 51 |
| abstract_inverted_index.with | 13, 119 |
| abstract_inverted_index.build | 115 |
| abstract_inverted_index.class | 64 |
| abstract_inverted_index.depth | 10 |
| abstract_inverted_index.gates | 17 |
| abstract_inverted_index.leads | 52 |
| abstract_inverted_index.note, | 2 |
| abstract_inverted_index.proof | 55 |
| abstract_inverted_index.prove | 69 |
| abstract_inverted_index.tanh. | 48 |
| abstract_inverted_index.which | 106 |
| abstract_inverted_index.amount | 128 |
| abstract_inverted_index.bounds | 20 |
| abstract_inverted_index.global | 57 |
| abstract_inverted_index.neural | 104 |
| abstract_inverted_index.number | 15 |
| abstract_inverted_index.recent | 117 |
| abstract_inverted_index.smooth | 42, 83 |
| abstract_inverted_index.applies | 81 |
| abstract_inverted_index.bounded | 44 |
| abstract_inverted_index.evolves | 26 |
| abstract_inverted_index.sigmoid | 46 |
| abstract_inverted_index.special | 63 |
| abstract_inverted_index."Villani | 108 |
| abstract_inverted_index.analysis | 134 |
| abstract_inverted_index.consider | 4 |
| abstract_inverted_index.iterates | 29 |
| abstract_inverted_index.progress | 118 |
| abstract_inverted_index.required | 131 |
| abstract_inverted_index.Frobenius | 97 |
| abstract_inverted_index.SoftPlus. | 87 |
| abstract_inverted_index.analyzing | 120 |
| abstract_inverted_index.arbitrary | 34 |
| abstract_inverted_index.empirical | 24 |
| abstract_inverted_index.existence | 95 |
| abstract_inverted_index.functions | 101 |
| abstract_inverted_index.unbounded | 84 |
| abstract_inverted_index.activation | 39 |
| abstract_inverted_index.adequately | 41 |
| abstract_inverted_index.continuous | 76 |
| abstract_inverted_index.critically | 126 |
| abstract_inverted_index.functions" | 109 |
| abstract_inverted_index.activations | 85 |
| abstract_inverted_index.convergence | 58, 73 |
| abstract_inverted_index.independent | 136 |
| abstract_inverted_index.objectives. | 124 |
| abstract_inverted_index.regularized | 6, 99 |
| abstract_inverted_index.appropriately | 5 |
| abstract_inverted_index.exponentially | 71 |
| abstract_inverted_index.constant-sized | 103 |
| abstract_inverted_index.regularization | 130 |
| abstract_inverted_index.initializations. | 66 |
| abstract_inverted_index.$\ell_2-$empirical | 7 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 2 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/10 |
| sustainable_development_goals[0].score | 0.4699999988079071 |
| sustainable_development_goals[0].display_name | Reduced inequalities |
| citation_normalized_percentile.value | 0.12633031 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
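
The `abstract_inverted_index.*` rows in the table above map each token of the abstract to the word positions where it occurs. As a minimal sketch, assuming rows of the form `| abstract_inverted_index.<token> | <comma-separated positions> |`, the abstract can be rebuilt by placing each token at its positions and joining in order. The helper names `parse_row` and `rebuild_abstract` are hypothetical and are not part of any OpenAlex client.

```python
PREFIX = "abstract_inverted_index."


def parse_row(row):
    # Split one table row into (token, [positions]).
    # Assumes the token itself contains no "|" character.
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    key, value = cells[0], cells[1]
    if not key.startswith(PREFIX):
        raise ValueError("not an abstract_inverted_index row: " + key)
    token = key[len(PREFIX):]
    positions = [int(p) for p in value.split(",")]
    return token, positions


def rebuild_abstract(inverted_index):
    # inverted_index: dict mapping token -> list of word positions.
    by_position = {}
    for token, positions in inverted_index.items():
        for position in positions:
            by_position[position] = token
    return " ".join(by_position[i] for i in sorted(by_position))


# Usage on a few rows copied from the table.
rows = [
    "| abstract_inverted_index.In | 0 |",
    "| abstract_inverted_index.this | 1 |",
    "| abstract_inverted_index.note, | 2 |",
    "| abstract_inverted_index.we | 3 |",
]
index = dict(parse_row(row) for row in rows)
opening = rebuild_abstract(index)
```

Sorting by position rather than by table order matters because the rows are ordered by token length, not by where the token appears in the text.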