Flatness is Necessary, Neural Collapse is Not: Rethinking Generalization via Grokking
2025 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2509.17738
Neural collapse, i.e., the emergence of highly symmetric, class-wise clustered representations, is frequently observed in deep networks and is often assumed to reflect or enable generalization. In parallel, flatness of the loss landscape has been theoretically and empirically linked to generalization. Yet the causal role of either phenomenon remains unclear: Are they prerequisites for generalization, or merely by-products of training dynamics? We disentangle these questions using grokking, a training regime in which memorization precedes generalization, allowing us to temporally separate generalization from training dynamics. We find that while both neural collapse and relative flatness emerge near the onset of generalization, only flatness consistently predicts it. Models encouraged to collapse and models prevented from collapsing generalize equally well, whereas models regularized away from flat solutions exhibit delayed generalization, resembling grokking, even in architectures and datasets where grokking does not typically occur. Furthermore, we show theoretically that neural collapse leads to relative flatness under classical assumptions, explaining their empirical co-occurrence. Our results support the view that relative flatness is a potentially necessary and more fundamental property for generalization, and demonstrate how grokking can serve as a powerful probe for isolating its geometric underpinnings.
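To make "class-wise clustered representations" concrete, the following is a minimal sketch of one common proxy for collapse: the ratio of within-class scatter to between-class scatter of feature vectors. It is an illustrative simplification, not the measure used in the paper, and the function name `collapse_ratio` and the toy inputs are assumptions introduced here. A value near zero means each class's features have contracted onto their class mean.

```python
from collections import defaultdict


def collapse_ratio(features, labels):
    """Trace of within-class scatter divided by trace of between-class scatter.

    Hypothetical helper for illustration only; assumes at least two classes
    with distinct class means (otherwise the denominator is zero).
    """
    dim = len(features[0])

    def mean(rows):
        return [sum(r[i] for r in rows) / len(rows) for i in range(dim)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Group feature vectors by their class label.
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)

    global_mean = mean(features)
    within = 0.0
    between = 0.0
    for rows in by_class.values():
        mu = mean(rows)
        # Spread of samples around their own class mean.
        within += sum(sq_dist(x, mu) for x in rows)
        # Spread of the class mean around the global mean, weighted by class size.
        between += len(rows) * sq_dist(mu, global_mean)
    return within / between


# Toy data (assumed): fully collapsed features versus spread-out features.
collapsed = collapse_ratio([[0, 0], [0, 0], [1, 1], [1, 1]], [0, 0, 1, 1])
spread = collapse_ratio([[0, 0], [0, 2], [4, 0], [4, 2]], [0, 0, 1, 1])
print(collapsed, spread)
```

Tracking a quantity like this over training epochs, alongside train and test accuracy, is one way to see whether clustering appears before, with, or after the onset of generalization.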
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2509.17738, https://arxiv.org/pdf/2509.17738
- OA Status: green
- OpenAlex ID: https://openalex.org/W4415255117
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415255117 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2509.17738 (Digital Object Identifier)
- Title: Flatness is Necessary, Neural Collapse is Not: Rethinking Generalization via Grokking (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2025 (Year of publication)
- Publication date: 2025-09-22 (Full publication date if available)
- Authors: Ting Han, Linara Adilova, Henning Petzka, Jens Kleesiek, Michael Kamp (List of authors in order)
- Landing page: https://arxiv.org/abs/2509.17738 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/2509.17738 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2509.17738 (Direct OA link when available)
- Cited by: 0 (Total citation count in OpenAlex)
Full payload
| id | https://openalex.org/W4415255117 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2509.17738 |
| ids.doi | https://doi.org/10.48550/arxiv.2509.17738 |
| ids.openalex | https://openalex.org/W4415255117 |
| fwci | |
| type | preprint |
| title | Flatness is Necessary, Neural Collapse is Not: Rethinking Generalization via Grokking |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10320 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.2556000053882599 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Neural Networks and Applications |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2509.17738 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2509.17738 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2509.17738 |
| locations[1].id | doi:10.48550/arxiv.2509.17738 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2509.17738 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5103927609 |
| authorships[0].author.orcid | https://orcid.org/0009-0008-3069-5217 |
| authorships[0].author.display_name | Ting Han |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Han, Ting |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5040989782 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-0689-6138 |
| authorships[1].author.display_name | Linara Adilova |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Adilova, Linara |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5086123147 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-6162-8526 |
| authorships[2].author.display_name | Henning Petzka |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Petzka, Henning |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5017161970 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-8686-0682 |
| authorships[3].author.display_name | Jens Kleesiek |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Kleesiek, Jens |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5080777994 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-6231-0694 |
| authorships[4].author.display_name | Michael Kamp |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Kamp, Michael |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2509.17738 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-16T00:00:00 |
| display_name | Flatness is Necessary, Neural Collapse is Not: Rethinking Generalization via Grokking |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10320 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.2556000053882599 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Neural Networks and Applications |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2509.17738 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2509.17738 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2509.17738 |
| primary_location.id | pmh:oai:arXiv.org:2509.17738 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2509.17738 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2509.17738 |
| publication_date | 2025-09-22 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 67, 167, 183 |
| abstract_inverted_index.In | 26 |
| abstract_inverted_index.We | 61 |
| abstract_inverted_index.as | 182 |
| abstract_inverted_index.in | 14, 70, 130 |
| abstract_inverted_index.is | 11, 18, 166 |
| abstract_inverted_index.it | 135 |
| abstract_inverted_index.of | 5, 29, 45, 58, 99 |
| abstract_inverted_index.or | 23, 55, 110 |
| abstract_inverted_index.to | 21, 39, 77, 108, 148 |
| abstract_inverted_index.us | 76 |
| abstract_inverted_index.we | 85, 141 |
| abstract_inverted_index.Are | 50 |
| abstract_inverted_index.Our | 158 |
| abstract_inverted_index.and | 17, 36, 84, 92, 132, 170, 176 |
| abstract_inverted_index.can | 180 |
| abstract_inverted_index.for | 53, 174, 186 |
| abstract_inverted_index.has | 33 |
| abstract_inverted_index.how | 178 |
| abstract_inverted_index.it. | 105 |
| abstract_inverted_index.its | 188 |
| abstract_inverted_index.not | 137 |
| abstract_inverted_index.the | 3, 30, 42, 97, 161 |
| abstract_inverted_index.Yet, | 41 |
| abstract_inverted_index.away | 120 |
| abstract_inverted_index.been | 34 |
| abstract_inverted_index.both | 89 |
| abstract_inverted_index.deep | 15 |
| abstract_inverted_index.does | 136 |
| abstract_inverted_index.even | 129 |
| abstract_inverted_index.find | 86 |
| abstract_inverted_index.flat | 122 |
| abstract_inverted_index.from | 81, 112, 121 |
| abstract_inverted_index.loss | 31 |
| abstract_inverted_index.more | 171 |
| abstract_inverted_index.near | 96 |
| abstract_inverted_index.only | 101 |
| abstract_inverted_index.role | 44 |
| abstract_inverted_index.show | 142 |
| abstract_inverted_index.that | 87, 144, 163 |
| abstract_inverted_index.they | 51 |
| abstract_inverted_index.view | 162 |
| abstract_inverted_index.i.e., | 2 |
| abstract_inverted_index.leads | 147 |
| abstract_inverted_index.often | 19 |
| abstract_inverted_index.onset | 98 |
| abstract_inverted_index.probe | 185 |
| abstract_inverted_index.serve | 181 |
| abstract_inverted_index.their | 155 |
| abstract_inverted_index.these | 63 |
| abstract_inverted_index.under | 151 |
| abstract_inverted_index.using | 65 |
| abstract_inverted_index.well, | 116 |
| abstract_inverted_index.where | 134 |
| abstract_inverted_index.which | 71 |
| abstract_inverted_index.while | 88 |
| abstract_inverted_index.Models | 106 |
| abstract_inverted_index.Neural | 0 |
| abstract_inverted_index.causal | 43 |
| abstract_inverted_index.either | 46 |
| abstract_inverted_index.emerge | 95 |
| abstract_inverted_index.enable | 24 |
| abstract_inverted_index.highly | 6 |
| abstract_inverted_index.linked | 38 |
| abstract_inverted_index.merely | 56 |
| abstract_inverted_index.models | 118 |
| abstract_inverted_index.neural | 90, 145 |
| abstract_inverted_index.occur. | 139 |
| abstract_inverted_index.regime | 69 |
| abstract_inverted_index.assumed | 20 |
| abstract_inverted_index.delayed | 125 |
| abstract_inverted_index.equally | 115 |
| abstract_inverted_index.exhibit | 124 |
| abstract_inverted_index.reflect | 22 |
| abstract_inverted_index.remains | 48 |
| abstract_inverted_index.results | 159 |
| abstract_inverted_index.support | 160 |
| abstract_inverted_index.whereas | 117 |
| abstract_inverted_index.allowing | 75 |
| abstract_inverted_index.collapse | 91, 109, 146 |
| abstract_inverted_index.datasets | 133 |
| abstract_inverted_index.dynamics | 83 |
| abstract_inverted_index.flatness | 28, 94, 102, 150, 165 |
| abstract_inverted_index.grokking | 179 |
| abstract_inverted_index.networks | 16 |
| abstract_inverted_index.observed | 13 |
| abstract_inverted_index.powerful | 184 |
| abstract_inverted_index.precedes | 73 |
| abstract_inverted_index.predicts | 104 |
| abstract_inverted_index.property | 173 |
| abstract_inverted_index.relative | 93, 149, 164 |
| abstract_inverted_index.separate | 79 |
| abstract_inverted_index.training | 59, 68, 82 |
| abstract_inverted_index.unclear: | 49 |
| abstract_inverted_index.classical | 152 |
| abstract_inverted_index.clustered | 9 |
| abstract_inverted_index.collapse, | 1 |
| abstract_inverted_index.dynamics? | 60 |
| abstract_inverted_index.emergence | 4 |
| abstract_inverted_index.empirical | 156 |
| abstract_inverted_index.geometric | 189 |
| abstract_inverted_index.grokking, | 66, 128 |
| abstract_inverted_index.isolating | 187 |
| abstract_inverted_index.landscape | 32 |
| abstract_inverted_index.necessary | 169 |
| abstract_inverted_index.parallel, | 27 |
| abstract_inverted_index.prevented | 111 |
| abstract_inverted_index.questions | 64 |
| abstract_inverted_index.solutions | 123 |
| abstract_inverted_index.typically | 138 |
| abstract_inverted_index.class-wise | 8 |
| abstract_inverted_index.collapsing | 113 |
| abstract_inverted_index.encouraged | 107 |
| abstract_inverted_index.explaining | 154 |
| abstract_inverted_index.frequently | 12 |
| abstract_inverted_index.generalize | 114 |
| abstract_inverted_index.phenomenon | 47 |
| abstract_inverted_index.resembling | 127 |
| abstract_inverted_index.symmetric, | 7 |
| abstract_inverted_index.temporally | 78 |
| abstract_inverted_index.by-products | 57 |
| abstract_inverted_index.demonstrate | 177 |
| abstract_inverted_index.disentangle | 62 |
| abstract_inverted_index.empirically | 37 |
| abstract_inverted_index.fundamental | 172 |
| abstract_inverted_index.potentially | 168 |
| abstract_inverted_index.regularized | 119 |
| abstract_inverted_index.Furthermore, | 140 |
| abstract_inverted_index.assumptions, | 153 |
| abstract_inverted_index.consistently | 103 |
| abstract_inverted_index.memorization | 72 |
| abstract_inverted_index.architectures | 131 |
| abstract_inverted_index.prerequisites | 52 |
| abstract_inverted_index.theoretically | 35, 143 |
| abstract_inverted_index.co-occurrence. | 157 |
| abstract_inverted_index.generalization | 80 |
| abstract_inverted_index.underpinnings. | 190 |
| abstract_inverted_index.generalization, | 54, 74, 100, 126, 175 |
| abstract_inverted_index.generalization. | 25, 40 |
| abstract_inverted_index.representations, | 10 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile | |
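
The `abstract_inverted_index.*` rows above store the abstract as a map from each word to the positions where it occurs. The sketch below shows how such an index can be turned back into running text. It assumes the payload has been loaded as a Python dict mapping words to lists of integer positions; the helper name `rebuild_abstract` and the small `sample` dict are illustrative and contain only the first few entries of the index.

```python
def rebuild_abstract(inverted_index):
    """Reassemble text from a {word: [positions]} mapping (hypothetical helper)."""
    # Invert the mapping: position -> word.
    positions = {}
    for word, indices in inverted_index.items():
        for i in indices:
            positions[i] = word
    # Emit the words in position order, separated by single spaces.
    return " ".join(positions[i] for i in sorted(positions))


# First five positions of the index listed in the table above.
sample = {
    "Neural": [0],
    "collapse,": [1],
    "i.e.,": [2],
    "the": [3],
    "emergence": [4],
}
print(rebuild_abstract(sample))
```

Because punctuation is stored as part of each token (for example `collapse,` and `i.e.,`), joining on spaces is enough to recover the original wording.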