Regularizing Deep Neural Networks with Stochastic Estimators of Hessian Trace
Yucong Liu, Shixing Yu, Tong Lin
2022 · Open Access · DOI: https://doi.org/10.48550/arxiv.2208.05924
In this paper, we develop a novel regularization method for deep neural networks that penalizes the trace of the Hessian. The regularizer is motivated by a recent bound on the generalization error. We explain its benefits in finding flat minima and in avoiding Lyapunov stability in dynamical systems. We adopt the Hutchinson method, a classical unbiased estimator of the trace of a matrix, and further accelerate its computation with a dropout scheme. Experiments show that our method outperforms existing regularizers and data augmentation methods such as Jacobian regularization, Confidence Penalty, Label Smoothing, Cutout, and Mixup.
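The Hutchinson method mentioned above can be sketched in a few lines. The following is a minimal, dependency-free illustration, not the authors' implementation: it estimates tr(H) as the average of z^T H z over random Rademacher vectors z, which is unbiased because E[z z^T] = I. It needs only Hessian-vector products; as a simplifying assumption for a self-contained example, these are approximated by a central difference of a gradient function. The names `hutchinson_trace` and `hvp_from_grad`, and the quadratic toy loss, are hypothetical.

```python
import random
from typing import Callable, List

Vector = List[float]


def hutchinson_trace(matvec: Callable[[Vector], Vector], dim: int,
                     num_probes: int = 100, seed: int = 0) -> float:
    """Estimate tr(A) as the average of z^T A z over random Rademacher vectors z.

    Only matrix-vector products A z are used, so A (e.g. a Hessian) is never formed.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_probes):
        # Rademacher probe: entries are +1 or -1 with equal probability,
        # so E[z z^T] = I and therefore E[z^T A z] = tr(A).
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        az = matvec(z)
        total += sum(zi * azi for zi, azi in zip(z, az))
    return total / num_probes


def hvp_from_grad(grad: Callable[[Vector], Vector], x: Vector,
                  eps: float = 1e-4) -> Callable[[Vector], Vector]:
    """Return v -> H(x) v, approximated by a central difference of the gradient at x."""
    def matvec(v: Vector) -> Vector:
        plus = grad([xi + eps * vi for xi, vi in zip(x, v)])
        minus = grad([xi - eps * vi for xi, vi in zip(x, v)])
        return [(p - m) / (2.0 * eps) for p, m in zip(plus, minus)]
    return matvec


# Hypothetical toy loss f(x) = 0.5 * x^T A x: its gradient is A x and its Hessian is A.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 0.5],
     [0.0, 0.5, 2.0]]


def grad_f(x: Vector) -> Vector:
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


x0 = [0.3, -1.2, 0.7]
estimate = hutchinson_trace(hvp_from_grad(grad_f, x0), dim=3, num_probes=2000)
print(f"estimated trace: {estimate:.3f} (exact trace of A: 9.0)")
```

For the regularizer itself, such an estimate would presumably be scaled by a coefficient (our notation: lambda) and added to the training loss, roughly L(theta) + lambda * tr(Hessian of L). Backpropagating through that penalty requires automatic differentiation rather than the finite differences used here, and the dropout-based acceleration described in the abstract is not reproduced in this sketch.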
Related Topics
Concepts
Hessian matrix
Estimator
TRACE (psycholinguistics)
Maxima and minima
Smoothing
Regularization (linguistics)
Computer science
Artificial neural network
Generalization
Jacobian matrix and determinant
Mathematical optimization
Applied mathematics
Algorithm
Mathematics
Artificial intelligence
Mathematical analysis
Linguistics
Computer vision
Statistics
Philosophy
Metadata
- Type: preprint
- Language: en
- Landing page: http://arxiv.org/abs/2208.05924 (PDF: https://arxiv.org/pdf/2208.05924)
- OA status: green
- Related works: 10
- OpenAlex ID: https://openalex.org/W4291238339
All OpenAlex metadata
- OpenAlex ID: https://openalex.org/W4291238339 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2208.05924 (Digital Object Identifier)
- Title: Regularizing Deep Neural Networks with Stochastic Estimators of Hessian Trace (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2022 (year of publication)
- Publication date: 2022-08-11 (full publication date if available)
- Authors: Yucong Liu, Shixing Yu, Tong Lin (list of authors in order)
- Landing page: https://arxiv.org/abs/2208.05924 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2208.05924 (direct link to full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2208.05924 (direct OA link when available)
- Concepts: Hessian matrix, Estimator, TRACE (psycholinguistics), Maxima and minima, Smoothing, Regularization (linguistics), Computer science, Artificial neural network, Generalization, Jacobian matrix and determinant, Mathematical optimization, Applied mathematics, Algorithm, Mathematics, Artificial intelligence, Mathematical analysis, Linguistics, Computer vision, Statistics, Philosophy (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4291238339 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2208.05924 |
| ids.doi | https://doi.org/10.48550/arxiv.2208.05924 |
| ids.openalex | https://openalex.org/W4291238339 |
| fwci | |
| type | preprint |
| title | Regularizing Deep Neural Networks with Stochastic Estimators of Hessian Trace |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11206 |
| topics[0].field.id | https://openalex.org/fields/31 |
| topics[0].field.display_name | Physics and Astronomy |
| topics[0].score | 0.9988999962806702 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/3109 |
| topics[0].subfield.display_name | Statistical and Nonlinear Physics |
| topics[0].display_name | Model Reduction and Neural Networks |
| topics[1].id | https://openalex.org/T11612 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9973999857902527 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Stochastic Gradient Optimization Techniques |
| topics[2].id | https://openalex.org/T10500 |
| topics[2].field.id | https://openalex.org/fields/22 |
| topics[2].field.display_name | Engineering |
| topics[2].score | 0.996399998664856 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/2206 |
| topics[2].subfield.display_name | Computational Mechanics |
| topics[2].display_name | Sparse and Compressive Sensing Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C203616005 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8976180553436279 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q620495 |
| concepts[0].display_name | Hessian matrix |
| concepts[1].id | https://openalex.org/C185429906 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6502673029899597 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q1130160 |
| concepts[1].display_name | Estimator |
| concepts[2].id | https://openalex.org/C75291252 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6490644216537476 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1315756 |
| concepts[2].display_name | TRACE (psycholinguistics) |
| concepts[3].id | https://openalex.org/C186633575 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6135305762290955 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q845060 |
| concepts[3].display_name | Maxima and minima |
| concepts[4].id | https://openalex.org/C3770464 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5465686917304993 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q775963 |
| concepts[4].display_name | Smoothing |
| concepts[5].id | https://openalex.org/C2776135515 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5354077816009521 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q17143721 |
| concepts[5].display_name | Regularization (linguistics) |
| concepts[6].id | https://openalex.org/C41008148 |
| concepts[6].level | 0 |
| concepts[6].score | 0.529630720615387 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[6].display_name | Computer science |
| concepts[7].id | https://openalex.org/C50644808 |
| concepts[7].level | 2 |
| concepts[7].score | 0.5219835042953491 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q192776 |
| concepts[7].display_name | Artificial neural network |
| concepts[8].id | https://openalex.org/C177148314 |
| concepts[8].level | 2 |
| concepts[8].score | 0.5112767219543457 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q170084 |
| concepts[8].display_name | Generalization |
| concepts[9].id | https://openalex.org/C200331156 |
| concepts[9].level | 2 |
| concepts[9].score | 0.46678102016448975 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q506041 |
| concepts[9].display_name | Jacobian matrix and determinant |
| concepts[10].id | https://openalex.org/C126255220 |
| concepts[10].level | 1 |
| concepts[10].score | 0.42786240577697754 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q141495 |
| concepts[10].display_name | Mathematical optimization |
| concepts[11].id | https://openalex.org/C28826006 |
| concepts[11].level | 1 |
| concepts[11].score | 0.40243685245513916 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q33521 |
| concepts[11].display_name | Applied mathematics |
| concepts[12].id | https://openalex.org/C11413529 |
| concepts[12].level | 1 |
| concepts[12].score | 0.3488650321960449 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q8366 |
| concepts[12].display_name | Algorithm |
| concepts[13].id | https://openalex.org/C33923547 |
| concepts[13].level | 0 |
| concepts[13].score | 0.32431337237358093 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[13].display_name | Mathematics |
| concepts[14].id | https://openalex.org/C154945302 |
| concepts[14].level | 1 |
| concepts[14].score | 0.3062822222709656 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[14].display_name | Artificial intelligence |
| concepts[15].id | https://openalex.org/C134306372 |
| concepts[15].level | 1 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q7754 |
| concepts[15].display_name | Mathematical analysis |
| concepts[16].id | https://openalex.org/C41895202 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[16].display_name | Linguistics |
| concepts[17].id | https://openalex.org/C31972630 |
| concepts[17].level | 1 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[17].display_name | Computer vision |
| concepts[18].id | https://openalex.org/C105795698 |
| concepts[18].level | 1 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[18].display_name | Statistics |
| concepts[19].id | https://openalex.org/C138885662 |
| concepts[19].level | 0 |
| concepts[19].score | 0.0 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[19].display_name | Philosophy |
| keywords[0].id | https://openalex.org/keywords/hessian-matrix |
| keywords[0].score | 0.8976180553436279 |
| keywords[0].display_name | Hessian matrix |
| keywords[1].id | https://openalex.org/keywords/estimator |
| keywords[1].score | 0.6502673029899597 |
| keywords[1].display_name | Estimator |
| keywords[2].id | https://openalex.org/keywords/trace |
| keywords[2].score | 0.6490644216537476 |
| keywords[2].display_name | TRACE (psycholinguistics) |
| keywords[3].id | https://openalex.org/keywords/maxima-and-minima |
| keywords[3].score | 0.6135305762290955 |
| keywords[3].display_name | Maxima and minima |
| keywords[4].id | https://openalex.org/keywords/smoothing |
| keywords[4].score | 0.5465686917304993 |
| keywords[4].display_name | Smoothing |
| keywords[5].id | https://openalex.org/keywords/regularization |
| keywords[5].score | 0.5354077816009521 |
| keywords[5].display_name | Regularization (linguistics) |
| keywords[6].id | https://openalex.org/keywords/computer-science |
| keywords[6].score | 0.529630720615387 |
| keywords[6].display_name | Computer science |
| keywords[7].id | https://openalex.org/keywords/artificial-neural-network |
| keywords[7].score | 0.5219835042953491 |
| keywords[7].display_name | Artificial neural network |
| keywords[8].id | https://openalex.org/keywords/generalization |
| keywords[8].score | 0.5112767219543457 |
| keywords[8].display_name | Generalization |
| keywords[9].id | https://openalex.org/keywords/jacobian-matrix-and-determinant |
| keywords[9].score | 0.46678102016448975 |
| keywords[9].display_name | Jacobian matrix and determinant |
| keywords[10].id | https://openalex.org/keywords/mathematical-optimization |
| keywords[10].score | 0.42786240577697754 |
| keywords[10].display_name | Mathematical optimization |
| keywords[11].id | https://openalex.org/keywords/applied-mathematics |
| keywords[11].score | 0.40243685245513916 |
| keywords[11].display_name | Applied mathematics |
| keywords[12].id | https://openalex.org/keywords/algorithm |
| keywords[12].score | 0.3488650321960449 |
| keywords[12].display_name | Algorithm |
| keywords[13].id | https://openalex.org/keywords/mathematics |
| keywords[13].score | 0.32431337237358093 |
| keywords[13].display_name | Mathematics |
| keywords[14].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[14].score | 0.3062822222709656 |
| keywords[14].display_name | Artificial intelligence |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2208.05924 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2208.05924 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2208.05924 |
| locations[1].id | doi:10.48550/arxiv.2208.05924 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2208.05924 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5025152292 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-3753-9670 |
| authorships[0].author.display_name | Yucong Liu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Liu, Yucong |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5008881455 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-4335-7520 |
| authorships[1].author.display_name | Shixing Yu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Yu, Shixing |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5100769664 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-7186-7045 |
| authorships[2].author.display_name | Tong Lin |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Lin, Tong |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2208.05924 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2022-08-13T00:00:00 |
| display_name | Regularizing Deep Neural Networks with Stochastic Estimators of Hessian Trace |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11206 |
| primary_topic.field.id | https://openalex.org/fields/31 |
| primary_topic.field.display_name | Physics and Astronomy |
| primary_topic.score | 0.9988999962806702 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/3109 |
| primary_topic.subfield.display_name | Statistical and Nonlinear Physics |
| primary_topic.display_name | Model Reduction and Neural Networks |
| related_works | https://openalex.org/W2375684291, https://openalex.org/W2611031068, https://openalex.org/W2350095335, https://openalex.org/W2544528198, https://openalex.org/W2153649672, https://openalex.org/W2800988248, https://openalex.org/W4213275102, https://openalex.org/W2051752773, https://openalex.org/W4385064145, https://openalex.org/W2151138761 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2208.05924 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2208.05924 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2208.05924 |
| primary_location.id | pmh:oai:arXiv.org:2208.05924 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2208.05924 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2208.05924 |
| publication_date | 2022-08-11 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 5, 24, 53, 61, 69 |
| abstract_inverted_index.In | 0 |
| abstract_inverted_index.We | 32, 47 |
| abstract_inverted_index.as | 52, 85 |
| abstract_inverted_index.by | 13, 23 |
| abstract_inverted_index.in | 36, 44 |
| abstract_inverted_index.is | 21 |
| abstract_inverted_index.of | 17, 28, 60 |
| abstract_inverted_index.we | 3 |
| abstract_inverted_index.and | 40, 63, 80, 92 |
| abstract_inverted_index.for | 9, 57 |
| abstract_inverted_index.its | 34, 66 |
| abstract_inverted_index.our | 75 |
| abstract_inverted_index.the | 15, 29, 49, 58 |
| abstract_inverted_index.This | 19 |
| abstract_inverted_index.data | 81 |
| abstract_inverted_index.deep | 10 |
| abstract_inverted_index.flat | 38 |
| abstract_inverted_index.such | 84 |
| abstract_inverted_index.that | 74 |
| abstract_inverted_index.this | 1 |
| abstract_inverted_index.Label | 89 |
| abstract_inverted_index.adopt | 48 |
| abstract_inverted_index.bound | 27 |
| abstract_inverted_index.novel | 6 |
| abstract_inverted_index.trace | 16, 59 |
| abstract_inverted_index.using | 68 |
| abstract_inverted_index.Mixup. | 93 |
| abstract_inverted_index.error. | 31 |
| abstract_inverted_index.matrix | 62 |
| abstract_inverted_index.method | 8, 51, 76 |
| abstract_inverted_index.minima | 39 |
| abstract_inverted_index.neural | 11 |
| abstract_inverted_index.paper, | 2 |
| abstract_inverted_index.recent | 25 |
| abstract_inverted_index.Cutout, | 91 |
| abstract_inverted_index.develop | 4 |
| abstract_inverted_index.dropout | 70 |
| abstract_inverted_index.explain | 33 |
| abstract_inverted_index.finding | 37 |
| abstract_inverted_index.further | 64 |
| abstract_inverted_index.scheme. | 71 |
| abstract_inverted_index.Hessian. | 18 |
| abstract_inverted_index.Lyapunov | 42 |
| abstract_inverted_index.Penalty, | 88 |
| abstract_inverted_index.avoiding | 41 |
| abstract_inverted_index.benefits | 35 |
| abstract_inverted_index.existing | 78 |
| abstract_inverted_index.methods, | 83 |
| abstract_inverted_index.networks | 12 |
| abstract_inverted_index.systems. | 46 |
| abstract_inverted_index.unbiased | 55 |
| abstract_inverted_index.Jacobian, | 86 |
| abstract_inverted_index.classical | 54 |
| abstract_inverted_index.dynamical | 45 |
| abstract_inverted_index.estimator | 56 |
| abstract_inverted_index.guarantee | 26 |
| abstract_inverted_index.motivated | 22 |
| abstract_inverted_index.stability | 43 |
| abstract_inverted_index.Confidence | 87 |
| abstract_inverted_index.Hutchinson | 50 |
| abstract_inverted_index.Smoothing, | 90 |
| abstract_inverted_index.accelerate | 65 |
| abstract_inverted_index.penalizing | 14 |
| abstract_inverted_index.Experiments | 72 |
| abstract_inverted_index.calculation | 67 |
| abstract_inverted_index.demonstrate | 73 |
| abstract_inverted_index.outperforms | 77 |
| abstract_inverted_index.regularizer | 20 |
| abstract_inverted_index.augmentation | 82 |
| abstract_inverted_index.regularizers | 79 |
| abstract_inverted_index.generalization | 30 |
| abstract_inverted_index.regularization | 7 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/16 |
| sustainable_development_goals[0].score | 0.550000011920929 |
| sustainable_development_goals[0].display_name | Peace, Justice and strong institutions |
| citation_normalized_percentile | |
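
The `abstract_inverted_index.*` rows in the payload above encode the abstract as a map from each token to the word positions where it occurs (for instance, `In` maps to 0 and `a` maps to 5, 24, 53, 61, 69). The following is a small sketch, assuming the payload has been parsed into a Python dictionary, of how the plain text can be rebuilt from that structure. The helper names are hypothetical; `fetch_work` assumes the public OpenAlex endpoint `https://api.openalex.org/works/{id}` and appears only in a commented-out usage line, since it needs network access.

```python
import json
import urllib.request
from typing import Dict, List


def abstract_from_inverted_index(index: Dict[str, List[int]]) -> str:
    """Rebuild plain text from an inverted index mapping each token to its word positions."""
    length = max((pos for positions in index.values() for pos in positions), default=-1) + 1
    words = [""] * length
    for token, positions in index.items():
        for pos in positions:
            words[pos] = token
    # Skip any positions that no token claimed, rather than emitting double spaces.
    return " ".join(word for word in words if word)


def fetch_work(openalex_id: str) -> dict:
    """Download one work record from the OpenAlex REST API (needs network access)."""
    work_key = openalex_id.rstrip("/").rsplit("/", 1)[-1]  # ".../W4291238339" -> "W4291238339"
    with urllib.request.urlopen(f"https://api.openalex.org/works/{work_key}") as resp:
        return json.load(resp)


# The first seven positions from the payload above.
snippet = {"In": [0], "this": [1], "paper,": [2], "we": [3], "develop": [4], "a": [5], "novel": [6]}
print(abstract_from_inverted_index(snippet))  # In this paper, we develop a novel

# With network access, and if the live record includes an inverted index:
# work = fetch_work("https://openalex.org/W4291238339")
# print(abstract_from_inverted_index(work["abstract_inverted_index"]))
```

Storing positions per token lets a single entry such as `a` cover all of its occurrences, so rebuilding the text only requires sizing a list by the largest position and filling each slot.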