Asymptotics of Stochastic Gradient Descent with Dropout Regularization in Linear Models
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2409.07434
This paper proposes an asymptotic theory for online inference of the stochastic gradient descent (SGD) iterates with dropout regularization in linear regression. Specifically, we establish the geometric-moment contraction (GMC) for constant step-size SGD dropout iterates to show the existence of a unique stationary distribution of the dropout recursive function. By the GMC property, we provide quenched central limit theorems (CLT) for the difference between dropout and $\ell^2$-regularized iterates, regardless of initialization. The CLT for the difference between the Ruppert-Polyak averaged SGD (ASGD) with dropout and $\ell^2$-regularized iterates is also presented. Based on these asymptotic normality results, we further introduce an online estimator for the long-run covariance matrix of ASGD dropout to facilitate inference in a recursive manner with efficiency in computational time and memory. The numerical experiments demonstrate that for sufficiently large samples, the proposed confidence intervals for ASGD with dropout nearly achieve the nominal coverage probability.
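The setting described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: it assumes the common formulation in which each coordinate of the regressor is kept with probability `p`, the update is `beta + step * D x (y - x^T D beta)`, and the dropout-averaged loss is minimized by `(p G + (1 - p) Diag(G))^{-1} G beta_star`, where `G` is the design covariance. The function names, the AR(1)-style covariance, the step size, and the sample size are all illustrative assumptions.

```python
import numpy as np


def dropout_sgd(X, y, p=0.8, step=0.01, seed=0):
    """Constant step-size SGD with dropout on squared loss (illustrative sketch).

    Returns the last iterate and the Ruppert-Polyak running average.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = np.zeros(d)
    beta_bar = np.zeros(d)
    for k in range(n):
        keep = rng.random(d) < p            # each coordinate is retained with probability p
        x_masked = X[k] * keep              # D_k x_k with D_k = diag(keep)
        residual = y[k] - x_masked @ beta   # y_k - x_k^T D_k beta_{k-1}
        beta = beta + step * residual * x_masked
        beta_bar += (beta - beta_bar) / (k + 1)  # online mean of the iterates
    return beta, beta_bar


def l2_target(G, beta_star, p):
    """Assumed minimizer of the dropout-averaged squared loss for design covariance G."""
    A = p * G + (1.0 - p) * np.diag(np.diag(G))
    return np.linalg.solve(A, G @ beta_star)


def run_demo(n=50_000, p=0.8, step=0.01, seed=0):
    """Simulate a correlated linear model and compare ASGD dropout with the target."""
    rng = np.random.default_rng(seed)
    beta_star = np.array([1.0, -2.0, 0.5])            # hypothetical true coefficients
    idx = np.arange(beta_star.size)
    G = 0.5 ** np.abs(idx[:, None] - idx[None, :])    # hypothetical AR(1)-style covariance
    X = rng.standard_normal((n, beta_star.size)) @ np.linalg.cholesky(G).T
    y = X @ beta_star + rng.standard_normal(n)
    _, beta_bar = dropout_sgd(X, y, p=p, step=step, seed=seed + 1)
    return beta_bar, l2_target(G, beta_star, p), beta_star


if __name__ == "__main__":
    beta_bar, target, beta_star = run_demo()
    print("averaged dropout iterate:", beta_bar)
    print("l2-regularized target:   ", target)
    print("true coefficients:       ", beta_star)
```

Under these assumptions, the averaged iterate should settle near the regularized target rather than near the true coefficients, which is why the abstract states its limit theorems for the difference between dropout and $\ell^2$-regularized iterates. The sketch does not implement the paper's online long-run covariance estimator or its confidence intervals.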
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2409.07434, https://arxiv.org/pdf/2409.07434
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4403622212
Raw OpenAlex JSON
- OpenAlex ID (canonical identifier for this work in OpenAlex): https://openalex.org/W4403622212
- DOI (Digital Object Identifier): https://doi.org/10.48550/arxiv.2409.07434
- Title (work title): Asymptotics of Stochastic Gradient Descent with Dropout Regularization in Linear Models
- Type (OpenAlex work type): preprint
- Language (primary language): en
- Publication year (year of publication): 2024
- Publication date (full publication date if available): 2024-09-11
- Authors (list of authors in order): Jiaqi Li (OpenAlex display name: J. Jenny Li), Johannes Schmidt-Hieber, Wei Biao Wu
- Landing page (publisher landing page): https://arxiv.org/abs/2409.07434
- PDF URL (direct link to full text PDF): https://arxiv.org/pdf/2409.07434
- Open access (whether a free full text is available): Yes
- OA status (open access status per OpenAlex): green
- OA URL (direct OA link when available): https://arxiv.org/pdf/2409.07434
- Concepts (top fields/topics attached by OpenAlex): Regularization (linguistics), Dropout (neural networks), Stochastic gradient descent, Applied mathematics, Mathematics, Gradient descent, Computer science, Artificial intelligence, Machine learning, Artificial neural network
- Cited by (total citation count in OpenAlex): 0
- Related works count (other works algorithmically related by OpenAlex): 10
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4403622212 |
| doi | https://doi.org/10.48550/arxiv.2409.07434 |
| ids.doi | https://doi.org/10.48550/arxiv.2409.07434 |
| ids.openalex | https://openalex.org/W4403622212 |
| fwci | |
| type | preprint |
| title | Asymptotics of Stochastic Gradient Descent with Dropout Regularization in Linear Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11830 |
| topics[0].field.id | https://openalex.org/fields/26 |
| topics[0].field.display_name | Mathematics |
| topics[0].score | 0.9258999824523926 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2604 |
| topics[0].subfield.display_name | Applied Mathematics |
| topics[0].display_name | Point processes and geometric inequalities |
| topics[1].id | https://openalex.org/T10067 |
| topics[1].field.id | https://openalex.org/fields/20 |
| topics[1].field.display_name | Economics, Econometrics and Finance |
| topics[1].score | 0.9014000296592712 |
| topics[1].domain.id | https://openalex.org/domains/2 |
| topics[1].domain.display_name | Social Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/2003 |
| topics[1].subfield.display_name | Finance |
| topics[1].display_name | Stochastic processes and financial applications |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2776135515 |
| concepts[0].level | 2 |
| concepts[0].score | 0.5864959955215454 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q17143721 |
| concepts[0].display_name | Regularization (linguistics) |
| concepts[1].id | https://openalex.org/C2776145597 |
| concepts[1].level | 2 |
| concepts[1].score | 0.5815275311470032 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q25339462 |
| concepts[1].display_name | Dropout (neural networks) |
| concepts[2].id | https://openalex.org/C206688291 |
| concepts[2].level | 3 |
| concepts[2].score | 0.5255019664764404 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q7617819 |
| concepts[2].display_name | Stochastic gradient descent |
| concepts[3].id | https://openalex.org/C28826006 |
| concepts[3].level | 1 |
| concepts[3].score | 0.5217828750610352 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q33521 |
| concepts[3].display_name | Applied mathematics |
| concepts[4].id | https://openalex.org/C33923547 |
| concepts[4].level | 0 |
| concepts[4].score | 0.4722152352333069 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[4].display_name | Mathematics |
| concepts[5].id | https://openalex.org/C153258448 |
| concepts[5].level | 3 |
| concepts[5].score | 0.451107382774353 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q1199743 |
| concepts[5].display_name | Gradient descent |
| concepts[6].id | https://openalex.org/C41008148 |
| concepts[6].level | 0 |
| concepts[6].score | 0.32511425018310547 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[6].display_name | Computer science |
| concepts[7].id | https://openalex.org/C154945302 |
| concepts[7].level | 1 |
| concepts[7].score | 0.17515221238136292 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[7].display_name | Artificial intelligence |
| concepts[8].id | https://openalex.org/C119857082 |
| concepts[8].level | 1 |
| concepts[8].score | 0.09162768721580505 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[8].display_name | Machine learning |
| concepts[9].id | https://openalex.org/C50644808 |
| concepts[9].level | 2 |
| concepts[9].score | 0.061553627252578735 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q192776 |
| concepts[9].display_name | Artificial neural network |
| keywords[0].id | https://openalex.org/keywords/regularization |
| keywords[0].score | 0.5864959955215454 |
| keywords[0].display_name | Regularization (linguistics) |
| keywords[1].id | https://openalex.org/keywords/dropout |
| keywords[1].score | 0.5815275311470032 |
| keywords[1].display_name | Dropout (neural networks) |
| keywords[2].id | https://openalex.org/keywords/stochastic-gradient-descent |
| keywords[2].score | 0.5255019664764404 |
| keywords[2].display_name | Stochastic gradient descent |
| keywords[3].id | https://openalex.org/keywords/applied-mathematics |
| keywords[3].score | 0.5217828750610352 |
| keywords[3].display_name | Applied mathematics |
| keywords[4].id | https://openalex.org/keywords/mathematics |
| keywords[4].score | 0.4722152352333069 |
| keywords[4].display_name | Mathematics |
| keywords[5].id | https://openalex.org/keywords/gradient-descent |
| keywords[5].score | 0.451107382774353 |
| keywords[5].display_name | Gradient descent |
| keywords[6].id | https://openalex.org/keywords/computer-science |
| keywords[6].score | 0.32511425018310547 |
| keywords[6].display_name | Computer science |
| keywords[7].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[7].score | 0.17515221238136292 |
| keywords[7].display_name | Artificial intelligence |
| keywords[8].id | https://openalex.org/keywords/machine-learning |
| keywords[8].score | 0.09162768721580505 |
| keywords[8].display_name | Machine learning |
| keywords[9].id | https://openalex.org/keywords/artificial-neural-network |
| keywords[9].score | 0.061553627252578735 |
| keywords[9].display_name | Artificial neural network |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2409.07434 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2409.07434 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2409.07434 |
| locations[1].id | doi:10.48550/arxiv.2409.07434 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2409.07434 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5037457741 |
| authorships[0].author.orcid | https://orcid.org/0009-0003-0597-0970 |
| authorships[0].author.display_name | J. Jenny Li |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Li, Jiaqi |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5002981992 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-2699-4990 |
| authorships[1].author.display_name | Johannes Schmidt-Hieber |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Schmidt-Hieber, Johannes |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5036683934 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-4310-9965 |
| authorships[2].author.display_name | Wei Biao Wu |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Wu, Wei Biao |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2409.07434 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-10-22T00:00:00 |
| display_name | Asymptotics of Stochastic Gradient Descent with Dropout Regularization in Linear Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11830 |
| primary_topic.field.id | https://openalex.org/fields/26 |
| primary_topic.field.display_name | Mathematics |
| primary_topic.score | 0.9258999824523926 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2604 |
| primary_topic.subfield.display_name | Applied Mathematics |
| primary_topic.display_name | Point processes and geometric inequalities |
| related_works | https://openalex.org/W2918829236, https://openalex.org/W4206903459, https://openalex.org/W2754816816, https://openalex.org/W4366280654, https://openalex.org/W3160167280, https://openalex.org/W4231621013, https://openalex.org/W4362706668, https://openalex.org/W3008318776, https://openalex.org/W1977633006, https://openalex.org/W2041416246 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2409.07434 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2409.07434 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2409.07434 |
| primary_location.id | pmh:oai:arXiv.org:2409.07434 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2409.07434 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2409.07434 |
| publication_date | 2024-09-11 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 40, 114 |
| abstract_inverted_index.By | 49 |
| abstract_inverted_index.an | 3, 99 |
| abstract_inverted_index.in | 19, 113, 119 |
| abstract_inverted_index.is | 87 |
| abstract_inverted_index.of | 9, 39, 44, 69, 107 |
| abstract_inverted_index.on | 91 |
| abstract_inverted_index.to | 35, 110 |
| abstract_inverted_index.we | 23, 53, 96 |
| abstract_inverted_index.CLT | 72 |
| abstract_inverted_index.GMC | 51 |
| abstract_inverted_index.SGD | 32, 80 |
| abstract_inverted_index.The | 71, 124 |
| abstract_inverted_index.and | 65, 84, 122 |
| abstract_inverted_index.for | 6, 29, 60, 73, 102, 129, 137 |
| abstract_inverted_index.the | 10, 25, 37, 45, 50, 61, 74, 77, 103, 133, 143 |
| abstract_inverted_index.ASGD | 108, 138 |
| abstract_inverted_index.This | 0 |
| abstract_inverted_index.also | 88 |
| abstract_inverted_index.show | 36 |
| abstract_inverted_index.that | 128 |
| abstract_inverted_index.time | 121 |
| abstract_inverted_index.with | 16, 82, 117, 139 |
| abstract_inverted_index.(CLT) | 59 |
| abstract_inverted_index.(GMC) | 28 |
| abstract_inverted_index.(SGD) | 14 |
| abstract_inverted_index.Based | 90 |
| abstract_inverted_index.large | 131 |
| abstract_inverted_index.limit | 57 |
| abstract_inverted_index.paper | 1 |
| abstract_inverted_index.these | 92 |
| abstract_inverted_index.(ASGD) | 81 |
| abstract_inverted_index.linear | 20 |
| abstract_inverted_index.manner | 116 |
| abstract_inverted_index.matrix | 106 |
| abstract_inverted_index.nearly | 141 |
| abstract_inverted_index.online | 7, 100 |
| abstract_inverted_index.theory | 5 |
| abstract_inverted_index.unique | 41 |
| abstract_inverted_index.achieve | 142 |
| abstract_inverted_index.between | 63, 76 |
| abstract_inverted_index.central | 56 |
| abstract_inverted_index.descent | 13 |
| abstract_inverted_index.dropout | 17, 33, 46, 64, 83, 109, 140 |
| abstract_inverted_index.further | 97 |
| abstract_inverted_index.memory. | 123 |
| abstract_inverted_index.nominal | 144 |
| abstract_inverted_index.provide | 54 |
| abstract_inverted_index.averaged | 79 |
| abstract_inverted_index.constant | 30 |
| abstract_inverted_index.coverage | 145 |
| abstract_inverted_index.gradient | 12 |
| abstract_inverted_index.iterates | 15, 34, 86 |
| abstract_inverted_index.long-run | 104 |
| abstract_inverted_index.proposed | 134 |
| abstract_inverted_index.proposes | 2 |
| abstract_inverted_index.quenched | 55 |
| abstract_inverted_index.results, | 95 |
| abstract_inverted_index.samples, | 132 |
| abstract_inverted_index.theorems | 58 |
| abstract_inverted_index.establish | 24 |
| abstract_inverted_index.estimator | 101 |
| abstract_inverted_index.existence | 38 |
| abstract_inverted_index.function. | 48 |
| abstract_inverted_index.inference | 8, 112 |
| abstract_inverted_index.intervals | 136 |
| abstract_inverted_index.introduce | 98 |
| abstract_inverted_index.iterates, | 67 |
| abstract_inverted_index.normality | 94 |
| abstract_inverted_index.numerical | 125 |
| abstract_inverted_index.property, | 52 |
| abstract_inverted_index.recursive | 47, 115 |
| abstract_inverted_index.step-size | 31 |
| abstract_inverted_index.asymptotic | 4, 93 |
| abstract_inverted_index.confidence | 135 |
| abstract_inverted_index.covariance | 105 |
| abstract_inverted_index.difference | 62, 75 |
| abstract_inverted_index.efficiency | 118 |
| abstract_inverted_index.facilitate | 111 |
| abstract_inverted_index.presented. | 89 |
| abstract_inverted_index.regardless | 68 |
| abstract_inverted_index.stationary | 42 |
| abstract_inverted_index.stochastic | 11 |
| abstract_inverted_index.contraction | 27 |
| abstract_inverted_index.demonstrate | 127 |
| abstract_inverted_index.experiments | 126 |
| abstract_inverted_index.regression. | 21 |
| abstract_inverted_index.distribution | 43 |
| abstract_inverted_index.probability. | 146 |
| abstract_inverted_index.sufficiently | 130 |
| abstract_inverted_index.Specifically, | 22 |
| abstract_inverted_index.computational | 120 |
| abstract_inverted_index.Ruppert-Polyak | 78 |
| abstract_inverted_index.regularization | 18 |
| abstract_inverted_index.initialization. | 70 |
| abstract_inverted_index.geometric-moment | 26 |
| abstract_inverted_index.$\ell^2$-regularized | 66, 85 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile | |
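The `abstract_inverted_index.*` rows in the table above store the abstract as a mapping from each word to the positions at which it occurs. A minimal sketch of how the plain text can be rebuilt from such a mapping follows; the function name `rebuild_abstract` is hypothetical, and the sample dictionary is a truncated excerpt covering only positions 0 to 8 from the table.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a word -> list-of-positions mapping."""
    by_position = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    # Emit the words in position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Truncated excerpt of the table above (positions 0 through 8 only).
sample = {
    "This": [0],
    "paper": [1],
    "proposes": [2],
    "an": [3],
    "asymptotic": [4],
    "theory": [5],
    "for": [6],
    "online": [7],
    "inference": [8],
}

print(rebuild_abstract(sample))
# prints: This paper proposes an asymptotic theory for online inference
```

Applied to the full set of rows, the same procedure should reproduce the abstract shown at the top of this record.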