Symbolic-Diffusion: Deep Learning Based Symbolic Regression with D3PM Discrete Token Diffusion

Ryan T. Tymkow, Benjamin Schnapp, Mojtaba Valipour, Ali Ghodshi

2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2510.07570

Symbolic regression refers to the task of finding a closed-form mathematical expression to fit a set of data points. Genetic programming based techniques are the most common algorithms used to tackle this problem, but recently, neural-network based approaches have gained popularity. Most of the leading neural-network based models used for symbolic regression utilize transformer-based autoregressive models to generate an equation conditioned on encoded input points. However, autoregressive generation is limited to generating tokens left-to-right, and future generated tokens are conditioned only on previously generated tokens. Motivated by the desire to generate all tokens simultaneously to produce improved closed-form equations, we propose Symbolic Diffusion, a D3PM based discrete state-space diffusion model which simultaneously generates all tokens of the equation at once using discrete token diffusion. Using the bivariate dataset developed for SymbolicGPT, we compared our diffusion-based generation approach to an autoregressive model based on SymbolicGPT, using equivalent encoder and transformer architectures. We demonstrate that our novel approach of using diffusion-based generation for symbolic regression can offer comparable and, by some metrics, improved performance over autoregressive generation in models using similar underlying architectures, opening new research opportunities in neural-network based symbolic regression.
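
To make the contrast with left-to-right generation concrete, here is a minimal sketch of the forward (noising) half of a D3PM-style discrete diffusion over equation tokens. It assumes a uniform transition kernel, under which each position is independently resampled from the vocabulary with probability `beta` at every step. The vocabulary, the prefix-notation example equation, and the `betas` schedule are all hypothetical and are not taken from the paper, which may use a different kernel and tokenization. A denoising network would be trained to undo these steps for every position in parallel, which is what allows all tokens to be generated at once.

```python
import random


def uniform_forward_step(tokens, vocab, beta, rng):
    """One forward step with a uniform transition kernel (assumed, not the paper's).

    Each position is independently replaced by a token drawn uniformly from
    `vocab` with probability `beta`; otherwise it is left unchanged.
    """
    return [rng.choice(vocab) if rng.random() < beta else tok for tok in tokens]


def corrupt(tokens, vocab, betas, seed=0):
    """Apply the forward steps in order and return every intermediate sequence."""
    rng = random.Random(seed)
    trajectory = [list(tokens)]
    for beta in betas:
        trajectory.append(uniform_forward_step(trajectory[-1], vocab, beta, rng))
    return trajectory


# Hypothetical vocabulary and prefix-notation tokens for x1*x2 + sin(x1).
VOCAB = ["add", "mul", "sin", "cos", "x1", "x2", "C", "<pad>"]
equation = ["add", "mul", "x1", "x2", "sin", "x1", "<pad>", "<pad>"]

# Hypothetical noise schedule: no noise, light noise, heavier noise, full resampling.
betas = [0.0, 0.1, 0.3, 1.0]
traj = corrupt(equation, VOCAB, betas)

for step, seq in enumerate(traj):
    print(step, " ".join(seq))
```

The sequence length never changes during corruption, so the reverse process can predict every position of the equation in a single pass per step instead of appending one token at a time.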

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2510.07570
PDF: https://arxiv.org/pdf/2510.07570
OA Status: green
OpenAlex ID: https://openalex.org/W4415318471

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W4415318471
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.48550/arxiv.2510.07570
Digital Object Identifier

Title: Symbolic-Diffusion: Deep Learning Based Symbolic Regression with D3PM Discrete Token Diffusion
Work title

Type: preprint
OpenAlex work type

Language: en
Primary language

Publication year: 2025
Year of publication

Publication date: 2025-10-08
Full publication date if available

Authors: Ryan T. Tymkow, Benjamin Schnapp, Mojtaba Valipour, Ali Ghodshi
List of authors in order

Landing page: https://arxiv.org/abs/2510.07570
Publisher landing page

PDF URL: https://arxiv.org/pdf/2510.07570
Direct link to full text PDF

Open access: Yes
Whether a free full text is available

OA status: green
Open access status per OpenAlex

OA URL: https://arxiv.org/pdf/2510.07570
Direct OA link when available

Cited by: 0
Total citation count in OpenAlex

Full payload

id	https://openalex.org/W4415318471
doi	https://doi.org/10.48550/arxiv.2510.07570
ids.doi	https://doi.org/10.48550/arxiv.2510.07570
ids.openalex	https://openalex.org/W4415318471
fwci
type	preprint
title	Symbolic-Diffusion: Deep Learning Based Symbolic Regression with D3PM Discrete Token Diffusion
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T10775
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	0.5442000031471252
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1707
topics[0].subfield.display_name	Computer Vision and Pattern Recognition
topics[0].display_name	Generative Adversarial Networks and Image Synthesis
is_xpac	False
apc_list
apc_paid
language	en
locations[0].id	pmh:oai:arXiv.org:2510.07570
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2510.07570
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2510.07570
locations[1].id	doi:10.48550/arxiv.2510.07570
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license	cc-by
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id	https://openalex.org/licenses/cc-by
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2510.07570
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5120050563
authorships[0].author.orcid
authorships[0].author.display_name	Ryan T. Tymkow
authorships[0].author_position	first
authorships[0].raw_author_name	Tymkow, Ryan T.
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5091561751
authorships[1].author.orcid	https://orcid.org/0000-0001-5031-8269
authorships[1].author.display_name	Benjamin Schnapp
authorships[1].author_position	middle
authorships[1].raw_author_name	Schnapp, Benjamin D.
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5075681233
authorships[2].author.orcid	https://orcid.org/0000-0002-5877-2869
authorships[2].author.display_name	Mojtaba Valipour
authorships[2].author_position	middle
authorships[2].raw_author_name	Valipour, Mojtaba
authorships[2].is_corresponding	False
authorships[3].author.id	https://openalex.org/A5120050564
authorships[3].author.orcid
authorships[3].author.display_name	Ali Ghodshi
authorships[3].author_position	last
authorships[3].raw_author_name	Ghodshi, Ali
authorships[3].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2510.07570
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-18T00:00:00
display_name	Symbolic-Diffusion: Deep Learning Based Symbolic Regression with D3PM Discrete Token Diffusion
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T10775
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	0.5442000031471252
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1707
primary_topic.subfield.display_name	Computer Vision and Pattern Recognition
primary_topic.display_name	Generative Adversarial Networks and Image Synthesis
cited_by_count	0
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2510.07570
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2510.07570
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2510.07570
primary_location.id	pmh:oai:arXiv.org:2510.07570
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2510.07570
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2510.07570
publication_date	2025-10-08
publication_year	2025
referenced_works_count	0
abstract_inverted_index.a	8, 14, 103
abstract_inverted_index.We	150
abstract_inverted_index.an	58, 138
abstract_inverted_index.at	118
abstract_inverted_index.by	86, 167
abstract_inverted_index.in	175, 185
abstract_inverted_index.is	68
abstract_inverted_index.of	6, 16, 42, 115, 156
abstract_inverted_index.on	61, 81, 142
abstract_inverted_index.to	3, 12, 29, 56, 70, 89, 94, 137
abstract_inverted_index.we	99, 131
abstract_inverted_index.all	91, 113
abstract_inverted_index.and	74, 147
abstract_inverted_index.are	23, 78
abstract_inverted_index.but	33
abstract_inverted_index.can	163
abstract_inverted_index.fit	13
abstract_inverted_index.for	49, 129, 160
abstract_inverted_index.new	182
abstract_inverted_index.our	133, 153
abstract_inverted_index.set	15
abstract_inverted_index.the	4, 24, 43, 87, 116, 125
abstract_inverted_index.D3PM	104
abstract_inverted_index.Most	41
abstract_inverted_index.and,	166
abstract_inverted_index.data	17
abstract_inverted_index.have	38
abstract_inverted_index.most	25
abstract_inverted_index.once	119
abstract_inverted_index.only	80
abstract_inverted_index.over	172
abstract_inverted_index.some	168
abstract_inverted_index.task	5
abstract_inverted_index.that	152
abstract_inverted_index.this	31
abstract_inverted_index.used	28, 48
abstract_inverted_index.Using	124
abstract_inverted_index.based	21, 36, 46, 105, 141, 187
abstract_inverted_index.input	63
abstract_inverted_index.model	109, 140
abstract_inverted_index.novel	154
abstract_inverted_index.offer	164
abstract_inverted_index.token	122
abstract_inverted_index.using	120, 144, 157, 177
abstract_inverted_index.which	110
abstract_inverted_index.common	26
abstract_inverted_index.desire	88
abstract_inverted_index.future	75
abstract_inverted_index.gained	39
abstract_inverted_index.models	47, 55, 176
abstract_inverted_index.refers	2
abstract_inverted_index.tackle	30
abstract_inverted_index.tokens	72, 77, 92, 114
abstract_inverted_index.Genetic	19
abstract_inverted_index.dataset	127
abstract_inverted_index.encoded	62
abstract_inverted_index.encoder	146
abstract_inverted_index.finding	7
abstract_inverted_index.leading	44
abstract_inverted_index.limited	69
abstract_inverted_index.opening	181
abstract_inverted_index.points.	18, 64
abstract_inverted_index.produce	95
abstract_inverted_index.propose	100
abstract_inverted_index.similar	178
abstract_inverted_index.tokens.	84
abstract_inverted_index.utilize	52
abstract_inverted_index.However,	65
abstract_inverted_index.Symbolic	0, 101
abstract_inverted_index.approach	136, 155
abstract_inverted_index.compared	132
abstract_inverted_index.discrete	106, 121
abstract_inverted_index.equation	59, 117
abstract_inverted_index.generate	57, 90
abstract_inverted_index.improved	96, 170
abstract_inverted_index.metrics,	169
abstract_inverted_index.problem,	32
abstract_inverted_index.research	183
abstract_inverted_index.symbolic	50, 161, 188
abstract_inverted_index.Motivated	85
abstract_inverted_index.bivariate	126
abstract_inverted_index.developed	128
abstract_inverted_index.diffusion	108
abstract_inverted_index.generated	76, 83
abstract_inverted_index.generates	112
abstract_inverted_index.recently,	34
abstract_inverted_index.Diffusion,	102
abstract_inverted_index.algorithms	27
abstract_inverted_index.approaches	37
abstract_inverted_index.comparable	165
abstract_inverted_index.diffusion.	123
abstract_inverted_index.equations,	98
abstract_inverted_index.equivalent	145
abstract_inverted_index.expression	11
abstract_inverted_index.generating	71
abstract_inverted_index.generation	67, 135, 159, 174
abstract_inverted_index.previously	82
abstract_inverted_index.regression	1, 51, 162
abstract_inverted_index.techniques	22
abstract_inverted_index.underlying	179
abstract_inverted_index.closed-form	9, 97
abstract_inverted_index.conditioned	60, 79
abstract_inverted_index.demonstrate	151
abstract_inverted_index.performance	171
abstract_inverted_index.popularity.	40
abstract_inverted_index.programming	20
abstract_inverted_index.regression.	189
abstract_inverted_index.state-space	107
abstract_inverted_index.transformer	148
abstract_inverted_index.SymbolicGPT,	130, 143
abstract_inverted_index.mathematical	10
abstract_inverted_index.opportunities	184
abstract_inverted_index.architectures,	180
abstract_inverted_index.architectures.	149
abstract_inverted_index.autoregressive	54, 66, 139, 173
abstract_inverted_index.left-to-right,	73
abstract_inverted_index.neural-network	35, 45, 186
abstract_inverted_index.simultaneously	93, 111
abstract_inverted_index.diffusion-based	134, 158
abstract_inverted_index.transformer-based	53
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	4
citation_normalized_percentile
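
The `abstract_inverted_index.*` rows in the payload map each word of the abstract to the zero-based positions at which it occurs. A minimal sketch of turning such an index back into running text follows. The function name `rebuild_abstract` is hypothetical, and the sample dictionary holds only the first six positions from the rows above, on the assumption that the index is available as a mapping from word to a list of integer positions.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {word: [positions]} inverted index.

    Every (position, word) pair is collected, then the words are joined
    in ascending position order.
    """
    by_position = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    return " ".join(by_position[pos] for pos in sorted(by_position))


# First six positions, copied from the payload rows above.
sample = {
    "Symbolic": [0],
    "regression": [1],
    "refers": [2],
    "to": [3],
    "the": [4],
    "task": [5],
}

print(rebuild_abstract(sample))  # prints "Symbolic regression refers to the task"
```

Running the same function over the complete index should reproduce the abstract shown at the top of this record, since punctuation is stored as part of each word (for example `points.` and `However,`).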