Grokking vs. Learning: Same Features, Different Encodings
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2502.01739
Grokking typically achieves similar loss to ordinary, "steady", learning. We ask whether these different learning paths, grokking versus ordinary training, lead to fundamental differences in the learned models. To do so, we compare the features, compressibility, and learning dynamics of models trained via each path in two tasks. We find that grokked and steadily trained models learn the same features, but there can be large differences in the efficiency with which these features are encoded. In particular, we find a novel "compressive regime" of steady training, absent in grokking, in which a linear trade-off emerges between model loss and compressibility. In this regime, we can achieve compression factors 25 times that of the base model and 5 times that achieved in grokking. We then track how model features and compressibility develop through training. We show that model development in grokking is task-dependent, and that peak compressibility is achieved immediately after the grokking plateau. Finally, we introduce novel information-geometric measures, which demonstrate that models undergoing grokking follow a straight path in information space.
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2502.01739, https://arxiv.org/pdf/2502.01739
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4407184786
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4407184786 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2502.01739 (Digital Object Identifier)
- Title: Grokking vs. Learning: Same Features, Different Encodings (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-02-03 (full publication date if available)
- Authors: Dmitry Manning-Coe, Jacopo Gliozzi, Alexander G. Stapleton, Edward Hirst, Giuseppe De Tomasi, Barry Bradlyn, David S. Berman (list of authors in order)
- Landing page: https://arxiv.org/abs/2502.01739 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2502.01739 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2502.01739 (direct OA link when available)
- Concepts: Computer science, Artificial intelligence (top concepts (fields/topics) attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4407184786 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2502.01739 |
| ids.doi | https://doi.org/10.48550/arxiv.2502.01739 |
| ids.openalex | https://openalex.org/W4407184786 |
| fwci | |
| type | preprint |
| title | Grokking vs. Learning: Same Features, Different Encodings |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.5496460199356079 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C154945302 |
| concepts[1].level | 1 |
| concepts[1].score | 0.4064268469810486 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[1].display_name | Artificial intelligence |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.5496460199356079 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[1].score | 0.4064268469810486 |
| keywords[1].display_name | Artificial intelligence |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2502.01739 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2502.01739 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2502.01739 |
| locations[1].id | doi:10.48550/arxiv.2502.01739 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2502.01739 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5067312862 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Dmitry Manning-Coe |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Manning-Coe, Dmitry |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5068083761 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Jacopo Gliozzi |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Gliozzi, Jacopo |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5091983021 |
| authorships[2].author.orcid | https://orcid.org/0009-0009-6784-7779 |
| authorships[2].author.display_name | Alexander G. Stapleton |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Stapleton, Alexander G. |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5021626559 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-1699-4399 |
| authorships[3].author.display_name | Edward Hirst |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Hirst, Edward |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5008914714 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Giuseppe De Tomasi |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | De Tomasi, Giuseppe |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5064275485 |
| authorships[5].author.orcid | https://orcid.org/0000-0001-6327-1076 |
| authorships[5].author.display_name | Barry Bradlyn |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Bradlyn, Barry |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5009524117 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-5382-1668 |
| authorships[6].author.display_name | David S. Berman |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Berman, David S. |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2502.01739 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Grokking vs. Learning: Same Features, Different Encodings |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic | |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W4391913857, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2502.01739 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2502.01739 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2502.01739 |
| primary_location.id | pmh:oai:arXiv.org:2502.01739 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2502.01739 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2502.01739 |
| publication_date | 2025-02-03 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.- | 16, 21 |
| abstract_inverted_index.a | 81, 92, 171 |
| abstract_inverted_index.5x | 120 |
| abstract_inverted_index.In | 77, 106 |
| abstract_inverted_index.To | 30 |
| abstract_inverted_index.We | 9, 50, 127, 138 |
| abstract_inverted_index.be | 65 |
| abstract_inverted_index.do | 31 |
| abstract_inverted_index.in | 26, 47, 68, 88, 104, 125, 143, 174 |
| abstract_inverted_index.is | 102, 145, 151 |
| abstract_inverted_index.of | 41, 85 |
| abstract_inverted_index.so | 32 |
| abstract_inverted_index.to | 5, 23 |
| abstract_inverted_index.we | 33, 79, 109 |
| abstract_inverted_index.25x | 114 |
| abstract_inverted_index.and | 38, 54, 98, 100, 119, 133, 147 |
| abstract_inverted_index.are | 75, 162 |
| abstract_inverted_index.ask | 10 |
| abstract_inverted_index.but | 62 |
| abstract_inverted_index.can | 64, 110 |
| abstract_inverted_index.how | 130 |
| abstract_inverted_index.the | 27, 35, 59, 69, 116, 122, 155 |
| abstract_inverted_index.two | 48 |
| abstract_inverted_index.via | 44 |
| abstract_inverted_index.base | 117 |
| abstract_inverted_index.each | 45 |
| abstract_inverted_index.find | 51, 80 |
| abstract_inverted_index.lead | 22 |
| abstract_inverted_index.loss | 4, 97 |
| abstract_inverted_index.path | 46, 173 |
| abstract_inverted_index.peak | 149 |
| abstract_inverted_index.same | 60 |
| abstract_inverted_index.show | 139 |
| abstract_inverted_index.that | 52, 140, 148, 166 |
| abstract_inverted_index.then | 128 |
| abstract_inverted_index.this | 107 |
| abstract_inverted_index.with | 71 |
| abstract_inverted_index.after | 154 |
| abstract_inverted_index.large | 66 |
| abstract_inverted_index.learn | 58 |
| abstract_inverted_index.model | 96, 131, 141 |
| abstract_inverted_index.novel | 82, 159 |
| abstract_inverted_index.paths | 15 |
| abstract_inverted_index.there | 63, 90 |
| abstract_inverted_index.these | 12, 73 |
| abstract_inverted_index.times | 115, 121 |
| abstract_inverted_index.track | 129 |
| abstract_inverted_index.which | 72, 89, 101, 164 |
| abstract_inverted_index.absent | 103 |
| abstract_inverted_index.follow | 170 |
| abstract_inverted_index.linear | 93 |
| abstract_inverted_index.model, | 118 |
| abstract_inverted_index.models | 42, 57, 167 |
| abstract_inverted_index.space. | 176 |
| abstract_inverted_index.steady | 86 |
| abstract_inverted_index.tasks. | 49 |
| abstract_inverted_index.versus | 18 |
| abstract_inverted_index.achieve | 111 |
| abstract_inverted_index.between | 95 |
| abstract_inverted_index.compare | 34 |
| abstract_inverted_index.develop | 135 |
| abstract_inverted_index.emerges | 91 |
| abstract_inverted_index.factors | 113 |
| abstract_inverted_index.grokked | 53 |
| abstract_inverted_index.learned | 28 |
| abstract_inverted_index.models. | 29 |
| abstract_inverted_index.regime" | 84 |
| abstract_inverted_index.regime, | 108 |
| abstract_inverted_index.similar | 3 |
| abstract_inverted_index.through | 136 |
| abstract_inverted_index.trained | 43, 56 |
| abstract_inverted_index.whether | 11 |
| abstract_inverted_index.Finally, | 158 |
| abstract_inverted_index.Grokking | 0 |
| abstract_inverted_index.achieved | 124, 152 |
| abstract_inverted_index.achieves | 2 |
| abstract_inverted_index.dynamics | 40 |
| abstract_inverted_index.encoded. | 76 |
| abstract_inverted_index.features | 74, 132 |
| abstract_inverted_index.grokking | 17, 144, 156, 169 |
| abstract_inverted_index.learning | 14, 39 |
| abstract_inverted_index.measures | 161 |
| abstract_inverted_index.ordinary | 19 |
| abstract_inverted_index.plateau. | 157 |
| abstract_inverted_index.steadily | 55 |
| abstract_inverted_index.straight | 172 |
| abstract_inverted_index.training | 20, 87 |
| abstract_inverted_index."steady", | 7 |
| abstract_inverted_index.different | 13 |
| abstract_inverted_index.features, | 36, 61 |
| abstract_inverted_index.grokking. | 105, 126 |
| abstract_inverted_index.learning. | 8 |
| abstract_inverted_index.ordinary, | 6 |
| abstract_inverted_index.trade-off | 94 |
| abstract_inverted_index.training. | 137 |
| abstract_inverted_index.typically | 1 |
| abstract_inverted_index.efficiency | 70 |
| abstract_inverted_index.introduced | 163 |
| abstract_inverted_index.undergoing | 168 |
| abstract_inverted_index.compression | 112, 123 |
| abstract_inverted_index.demonstrate | 165 |
| abstract_inverted_index.development | 142 |
| abstract_inverted_index.differences | 25, 67 |
| abstract_inverted_index.fundamental | 24 |
| abstract_inverted_index.immediately | 153 |
| abstract_inverted_index.information | 175 |
| abstract_inverted_index.particular, | 78 |
| abstract_inverted_index."compressive | 83 |
| abstract_inverted_index.compressibility | 134, 150 |
| abstract_inverted_index.task-dependent, | 146 |
| abstract_inverted_index.compressibility, | 37, 99 |
| abstract_inverted_index.information-geometric | 160 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| citation_normalized_percentile | |
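
The `abstract_inverted_index.*` rows in the payload store the abstract as a mapping from each token to the word positions at which it occurs, rather than as running text. The following is a minimal Python sketch of how such a mapping can be turned back into text. The function name `reconstruct_abstract` and the `excerpt` dictionary are illustrative assumptions, not part of the OpenAlex payload; the excerpt copies only the entries for positions 0 to 8 from the table above.

```python
def reconstruct_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild running text from a token -> positions mapping."""
    # Flatten the mapping into (position, token) pairs.
    positioned = [
        (pos, token)
        for token, positions in inverted_index.items()
        for pos in positions
    ]
    # Sort by position and join the tokens with single spaces.
    return " ".join(token for _, token in sorted(positioned))


# Hypothetical excerpt: first nine positions only. In the full payload,
# "loss" also occurs at position 97 and "to" at position 23; those later
# occurrences are omitted here so the excerpt has no gaps.
excerpt = {
    "Grokking": [0],
    "typically": [1],
    "achieves": [2],
    "similar": [3],
    "loss": [4],
    "to": [5],
    "ordinary,": [6],
    '"steady",': [7],
    "learning.": [8],
}

print(reconstruct_abstract(excerpt))
```

Applied to the complete set of rows, the same function would be expected to reproduce the abstract as indexed, with punctuation attached to tokens exactly as it appears in the keys.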