Diffusion Beats Autoregressive in Data-Constrained Settings
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2507.15857
Autoregressive (AR) models have long dominated the landscape of large language models, driving progress across a wide range of tasks. Recently, diffusion-based language models have emerged as a promising alternative, though their advantages over AR models remain underexplored. In this paper, we systematically study masked diffusion models in data-constrained settings where training involves repeated passes over limited data and find that they significantly outperform AR models when compute is abundant but data is scarce. Diffusion models make better use of repeated data, achieving lower validation loss and superior downstream performance. We find new scaling laws for diffusion models and derive a closed-form expression for the critical compute threshold at which diffusion begins to outperform AR. Finally, we explain why diffusion models excel in this regime: their randomized masking objective implicitly trains over a rich distribution of token orderings, acting as an implicit data augmentation that AR's fixed left-to-right factorization lacks. Our results suggest that when data, not compute, is the bottleneck, diffusion models offer a compelling alternative to the standard AR paradigm. Our code is available at: https://diffusion-scaling.github.io.
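The contrast the abstract draws between AR's fixed left-to-right factorization and the randomized masking objective can be made concrete with a toy sketch. The code below is illustrative only and is not the authors' implementation: the function names, the `[MASK]` string, and the choice of a uniformly sampled masking ratio with independent per-token masking are assumptions based on a common formulation of masked diffusion language models.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen for illustration


def ar_training_example(tokens):
    """Left-to-right factorization: predict each token from its prefix.

    The (context, target) pairs are identical on every pass over the data.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_diffusion_example(tokens, rng):
    """Sample a masking ratio, then mask each token independently.

    Assumption: ratio ~ Uniform(0, 1); real systems may use other schedules
    and typically reweight the loss, which is omitted here.
    Returns the corrupted sequence and the positions to be predicted.
    """
    ratio = rng.uniform(0.0, 1.0)
    masked_positions = [i for i in range(len(tokens)) if rng.random() < ratio]
    corrupted = [MASK if i in masked_positions else tok
                 for i, tok in enumerate(tokens)]
    return corrupted, masked_positions


tokens = ["data", "is", "the", "bottleneck"]
rng = random.Random(0)  # seeded only to make the sketch reproducible

# Three "epochs" over the same sequence.
ar_epochs = [ar_training_example(tokens) for _ in range(3)]
diffusion_epochs = [masked_diffusion_example(tokens, rng) for _ in range(3)]

for epoch, (corrupted, masked) in enumerate(diffusion_epochs):
    print(epoch, corrupted, masked)
```

Every entry of `ar_epochs` is the same list of prefix-to-next-token pairs, whereas each call to `masked_diffusion_example` can produce a different set of visible and hidden positions for the same sequence. That variation across repeated passes is what the abstract describes as implicit data augmentation.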
- Type: preprint
- Language: en
- Landing page: http://arxiv.org/abs/2507.15857
- PDF: https://arxiv.org/pdf/2507.15857
- OA status: green
- OpenAlex ID: https://openalex.org/W4415202908
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415202908
- DOI: https://doi.org/10.48550/arxiv.2507.15857
- Title: Diffusion Beats Autoregressive in Data-Constrained Settings
- Type: preprint
- Language: en
- Publication year: 2025
- Publication date: 2025-07-21
- Authors: Mihir Prabhudesai, Mengning Wu, Amir Zadeh, Katerina Fragkiadaki, Deepak Pathak
- Landing page: https://arxiv.org/abs/2507.15857
- PDF URL: https://arxiv.org/pdf/2507.15857
- Open access: Yes
- OA status: green
- OA URL: https://arxiv.org/pdf/2507.15857
- Cited by: 0
Full payload
| id | https://openalex.org/W4415202908 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2507.15857 |
| ids.doi | https://doi.org/10.48550/arxiv.2507.15857 |
| ids.openalex | https://openalex.org/W4415202908 |
| fwci | |
| type | preprint |
| title | Diffusion Beats Autoregressive in Data-Constrained Settings |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10320 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.0786999985575676 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Neural Networks and Applications |
| topics[1].id | https://openalex.org/T11206 |
| topics[1].field.id | https://openalex.org/fields/31 |
| topics[1].field.display_name | Physics and Astronomy |
| topics[1].score | 0.07129999995231628 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/3109 |
| topics[1].subfield.display_name | Statistical and Nonlinear Physics |
| topics[1].display_name | Model Reduction and Neural Networks |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2507.15857 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2507.15857 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2507.15857 |
| locations[1].id | doi:10.48550/arxiv.2507.15857 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2507.15857 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5070975297 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Mihir Prabhudesai |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Prabhudesai, Mihir |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5035469955 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Margaret Wu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wu, Mengning |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5016675768 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-3171-5629 |
| authorships[2].author.display_name | Amir Hassan Zadeh |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Zadeh, Amir |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5008661738 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Katerina Fragkiadaki |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Fragkiadaki, Katerina |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5101851026 |
| authorships[4].author.orcid | https://orcid.org/0000-0003-2496-0690 |
| authorships[4].author.display_name | Deepak Pathak |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Pathak, Deepak |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2507.15857 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-16T00:00:00 |
| display_name | Diffusion Beats Autoregressive in Data-Constrained Settings |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10320 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.0786999985575676 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Neural Networks and Applications |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2507.15857 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2507.15857 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2507.15857 |
| primary_location.id | pmh:oai:arXiv.org:2507.15857 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2507.15857 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2507.15857 |
| publication_date | 2025-07-21 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 15, 27, 100, 132, 164 |
| abstract_inverted_index.AR | 34, 64, 170 |
| abstract_inverted_index.In | 38 |
| abstract_inverted_index.We | 90 |
| abstract_inverted_index.an | 140 |
| abstract_inverted_index.as | 26, 139 |
| abstract_inverted_index.at | 108 |
| abstract_inverted_index.in | 47, 122 |
| abstract_inverted_index.is | 68, 72, 158, 174 |
| abstract_inverted_index.of | 8, 18, 79, 135 |
| abstract_inverted_index.to | 112, 167 |
| abstract_inverted_index.we | 41, 116 |
| abstract_inverted_index.AR. | 114 |
| abstract_inverted_index.Our | 150, 172 |
| abstract_inverted_index.and | 58, 86, 98 |
| abstract_inverted_index.at: | 176 |
| abstract_inverted_index.but | 70 |
| abstract_inverted_index.for | 95, 103 |
| abstract_inverted_index.new | 92 |
| abstract_inverted_index.not | 156 |
| abstract_inverted_index.the | 6, 104, 159, 168 |
| abstract_inverted_index.use | 78 |
| abstract_inverted_index.why | 118 |
| abstract_inverted_index.(AR) | 1 |
| abstract_inverted_index.AR's | 145 |
| abstract_inverted_index.code | 173 |
| abstract_inverted_index.data | 57, 71, 142 |
| abstract_inverted_index.find | 59, 91 |
| abstract_inverted_index.have | 3, 24 |
| abstract_inverted_index.laws | 94 |
| abstract_inverted_index.long | 4 |
| abstract_inverted_index.loss | 85 |
| abstract_inverted_index.make | 76 |
| abstract_inverted_index.over | 33, 55, 131 |
| abstract_inverted_index.rich | 133 |
| abstract_inverted_index.that | 60, 144, 153 |
| abstract_inverted_index.they | 61 |
| abstract_inverted_index.this | 39, 123 |
| abstract_inverted_index.when | 66, 154 |
| abstract_inverted_index.wide | 16 |
| abstract_inverted_index.data, | 81, 155 |
| abstract_inverted_index.excel | 121 |
| abstract_inverted_index.fixed | 146 |
| abstract_inverted_index.large | 9 |
| abstract_inverted_index.lower | 83 |
| abstract_inverted_index.offer | 163 |
| abstract_inverted_index.range | 17 |
| abstract_inverted_index.study | 43 |
| abstract_inverted_index.their | 31, 125 |
| abstract_inverted_index.token | 136 |
| abstract_inverted_index.where | 50 |
| abstract_inverted_index.which | 109 |
| abstract_inverted_index.across | 14 |
| abstract_inverted_index.acting | 138 |
| abstract_inverted_index.begins | 111 |
| abstract_inverted_index.better | 77 |
| abstract_inverted_index.derive | 99 |
| abstract_inverted_index.lacks. | 149 |
| abstract_inverted_index.masked | 44 |
| abstract_inverted_index.models | 2, 23, 35, 46, 65, 75, 97, 120, 162 |
| abstract_inverted_index.paper, | 40 |
| abstract_inverted_index.passes | 54 |
| abstract_inverted_index.remain | 36 |
| abstract_inverted_index.tasks. | 19 |
| abstract_inverted_index.though | 30 |
| abstract_inverted_index.trains | 130 |
| abstract_inverted_index.compute | 67, 106 |
| abstract_inverted_index.driving | 12 |
| abstract_inverted_index.emerged | 25 |
| abstract_inverted_index.explain | 117 |
| abstract_inverted_index.limited | 56 |
| abstract_inverted_index.masking | 127 |
| abstract_inverted_index.models, | 11 |
| abstract_inverted_index.regime: | 124 |
| abstract_inverted_index.results | 151 |
| abstract_inverted_index.scaling | 93 |
| abstract_inverted_index.scarce. | 73 |
| abstract_inverted_index.suggest | 152 |
| abstract_inverted_index.Finally, | 115 |
| abstract_inverted_index.abundant | 69 |
| abstract_inverted_index.compute, | 157 |
| abstract_inverted_index.critical | 105 |
| abstract_inverted_index.implicit | 141 |
| abstract_inverted_index.involves | 52 |
| abstract_inverted_index.language | 10, 22 |
| abstract_inverted_index.progress | 13 |
| abstract_inverted_index.repeated | 53, 80 |
| abstract_inverted_index.settings | 49 |
| abstract_inverted_index.standard | 169 |
| abstract_inverted_index.superior | 87 |
| abstract_inverted_index.training | 51 |
| abstract_inverted_index.Diffusion | 74 |
| abstract_inverted_index.Recently, | 20 |
| abstract_inverted_index.achieving | 82 |
| abstract_inverted_index.available | 175 |
| abstract_inverted_index.diffusion | 45, 96, 110, 119, 161 |
| abstract_inverted_index.dominated | 5 |
| abstract_inverted_index.landscape | 7 |
| abstract_inverted_index.objective | 128 |
| abstract_inverted_index.paradigm. | 171 |
| abstract_inverted_index.promising | 28 |
| abstract_inverted_index.threshold | 107 |
| abstract_inverted_index.advantages | 32 |
| abstract_inverted_index.compelling | 165 |
| abstract_inverted_index.downstream | 88 |
| abstract_inverted_index.expression | 102 |
| abstract_inverted_index.implicitly | 129 |
| abstract_inverted_index.orderings, | 137 |
| abstract_inverted_index.outperform | 63, 113 |
| abstract_inverted_index.randomized | 126 |
| abstract_inverted_index.validation | 84 |
| abstract_inverted_index.alternative | 166 |
| abstract_inverted_index.bottleneck, | 160 |
| abstract_inverted_index.closed-form | 101 |
| abstract_inverted_index.alternative, | 29 |
| abstract_inverted_index.augmentation | 143 |
| abstract_inverted_index.distribution | 134 |
| abstract_inverted_index.performance. | 89 |
| abstract_inverted_index.factorization | 148 |
| abstract_inverted_index.left-to-right | 147 |
| abstract_inverted_index.significantly | 62 |
| abstract_inverted_index.Autoregressive | 0 |
| abstract_inverted_index.systematically | 42 |
| abstract_inverted_index.underexplored. | 37 |
| abstract_inverted_index.diffusion-based | 21 |
| abstract_inverted_index.data-constrained | 48 |
| abstract_inverted_index.https://diffusion-scaling.github.io. | 177 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile | |
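
The `abstract_inverted_index.*` rows in the table map each word of the abstract to the zero-based positions where it occurs. A minimal sketch of how such a mapping can be turned back into running text follows. The helper name `rebuild_text` is hypothetical, and the `subset` dictionary holds only the first twelve positions from the table, with longer position lists truncated to the entries below 12.

```python
def rebuild_text(inverted_index):
    """Invert a word -> positions mapping into a space-joined string."""
    positions = {}
    for word, indices in inverted_index.items():
        for index in indices:
            positions[index] = word
    # Emit words in position order; gaps in the index are simply skipped.
    return " ".join(positions[i] for i in sorted(positions))


# Subset of the payload covering positions 0..11 (truncated for illustration).
subset = {
    "Autoregressive": [0],
    "(AR)": [1],
    "models": [2],
    "have": [3],
    "long": [4],
    "dominated": [5],
    "the": [6],
    "landscape": [7],
    "of": [8],
    "large": [9],
    "language": [10],
    "models,": [11],
}

opening = rebuild_text(subset)
print(opening)
# Autoregressive (AR) models have long dominated the landscape of large language models,
```

Feeding the full set of rows into the same function should reproduce the abstract shown at the top of the record, since punctuation is stored as part of each word.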