When Is Prior Knowledge Helpful? Exploring the Evaluation and Selection of Unsupervised Pretext Tasks from a Neuro-Symbolic Perspective


Lin-Han Jia, Siyu Han, Wenchao Hu, Jie-Jing Shao, Wei Wei, Zhi Zhou, Lan-Zhe Guo, Yu-Feng Li

2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2508.07299

Neuro-symbolic (Nesy) learning improves the target-task performance of models by enabling them to satisfy knowledge, while semi/self-supervised learning (SSL) improves target-task performance by designing unsupervised pretext tasks on unlabeled data so that models satisfy the corresponding assumptions. We extend Nesy theory, which is based on reliable knowledge, to the scenario of unreliable knowledge (i.e., assumptions), thereby unifying the theoretical frameworks of SSL and Nesy. Through rigorous theoretical analysis, we demonstrate that the impact of pretext tasks on target performance hinges on three factors: knowledge learnability with respect to the model, knowledge reliability with respect to the data, and knowledge completeness with respect to the target. We further propose schemes to operationalize these theoretical metrics and thereby develop a method that can predict the effectiveness of pretext tasks in advance. This is intended to change the status quo in practical applications, where the selection of unsupervised tasks is heuristic-based rather than theory-based and it is difficult to evaluate whether a pretext task is a reasonable choice before testing the model on the target task. In experiments, we verify a high correlation between the predicted performance, estimated using minimal data, and the actual performance achieved after large-scale semi-supervised or self-supervised learning, thus confirming the validity of the theory and the effectiveness of the evaluation method.



Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2508.07299
PDF: https://arxiv.org/pdf/2508.07299
OA Status: green
OpenAlex ID: https://openalex.org/W4416241982

All OpenAlex metadata


OpenAlex ID: https://openalex.org/W4416241982
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.48550/arxiv.2508.07299
Digital Object Identifier

Title: When Is Prior Knowledge Helpful? Exploring the Evaluation and Selection of Unsupervised Pretext Tasks from a Neuro-Symbolic Perspective
Work title

Type: preprint
OpenAlex work type

Language: en
Primary language

Publication year: 2025
Year of publication

Publication date: 2025-08-10
Full publication date if available

Authors: Lin-Han Jia, Siyu Han, Wenchao Hu, Jie-Jing Shao, Wei Wei, Zhi Zhou, Lan-Zhe Guo, Yu-Feng Li
List of authors in order

Landing page: https://arxiv.org/abs/2508.07299
Publisher landing page

PDF URL: https://arxiv.org/pdf/2508.07299
Direct link to full text PDF

Open access: Yes
Whether a free full text is available

OA status: green
Open access status per OpenAlex

OA URL: https://arxiv.org/pdf/2508.07299
Direct OA link when available

Cited by: 0
Total citation count in OpenAlex

Full payload

id	https://openalex.org/W4416241982
doi	https://doi.org/10.48550/arxiv.2508.07299
ids.doi	https://doi.org/10.48550/arxiv.2508.07299
ids.openalex	https://openalex.org/W4416241982
fwci
type	preprint
title	When Is Prior Knowledge Helpful? Exploring the Evaluation and Selection of Unsupervised Pretext Tasks from a Neuro-Symbolic Perspective
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
is_xpac	False
apc_list
apc_paid
language	en
locations[0].id	pmh:oai:arXiv.org:2508.07299
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2508.07299
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2508.07299
locations[1].id	doi:10.48550/arxiv.2508.07299
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license	cc-by
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id	https://openalex.org/licenses/cc-by
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2508.07299
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5100595775
authorships[0].author.orcid
authorships[0].author.display_name	Lin-Han Jia
authorships[0].author_position	first
authorships[0].raw_author_name	Jia, Lin-Han
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5006922566
authorships[1].author.orcid	https://orcid.org/0000-0001-8709-5564
authorships[1].author.display_name	Siyu Han
authorships[1].author_position	middle
authorships[1].raw_author_name	Han, Si-Yu
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5112700933
authorships[2].author.orcid
authorships[2].author.display_name	Wenchao Hu
authorships[2].author_position	middle
authorships[2].raw_author_name	Hu, Wen-Chao
authorships[2].is_corresponding	False
authorships[3].author.id	https://openalex.org/A5087294333
authorships[3].author.orcid	https://orcid.org/0000-0001-8107-114X
authorships[3].author.display_name	Jie-Jing Shao
authorships[3].author_position	middle
authorships[3].raw_author_name	Shao, Jie-Jing
authorships[3].is_corresponding	False
authorships[4].author.id	https://openalex.org/A5100323678
authorships[4].author.orcid	https://orcid.org/0000-0002-0960-7269
authorships[4].author.display_name	Wei Wei
authorships[4].author_position	middle
authorships[4].raw_author_name	Wei, Wen-Da
authorships[4].is_corresponding	False
authorships[5].author.id	https://openalex.org/A5101040069
authorships[5].author.orcid
authorships[5].author.display_name	Zhi Zhou
authorships[5].author_position	middle
authorships[5].raw_author_name	Zhou, Zhi
authorships[5].is_corresponding	False
authorships[6].author.id	https://openalex.org/A5047808444
authorships[6].author.orcid	https://orcid.org/0000-0001-8965-1288
authorships[6].author.display_name	Lan-Zhe Guo
authorships[6].author_position	middle
authorships[6].raw_author_name	Guo, Lan-Zhe
authorships[6].is_corresponding	False
authorships[7].author.id	https://openalex.org/A5100355152
authorships[7].author.orcid	https://orcid.org/0000-0002-7727-4304
authorships[7].author.display_name	Yu-Feng Li
authorships[7].author_position	last
authorships[7].raw_author_name	Li, Yu-Feng
authorships[7].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2508.07299
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	When Is Prior Knowledge Helpful? Exploring the Evaluation and Selection of Unsupervised Pretext Tasks from a Neuro-Symbolic Perspective
has_fulltext	False
is_retracted	False
updated_date	2025-11-28T09:05:27.415739
primary_topic
cited_by_count	0
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2508.07299
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2508.07299
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2508.07299
primary_location.id	pmh:oai:arXiv.org:2508.07299
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2508.07299
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2508.07299
publication_date	2025-08-10
publication_year	2025
referenced_works_count	0
abstract_inverted_index.a	120, 178
abstract_inverted_index.In	174
abstract_inverted_index.We	39, 108
abstract_inverted_index.by	10, 25
abstract_inverted_index.in	72, 130, 139
abstract_inverted_index.is	155
abstract_inverted_index.it	154
abstract_inverted_index.of	8, 51, 61, 76, 127, 145, 161, 202, 208
abstract_inverted_index.on	45, 79, 83, 170
abstract_inverted_index.or	195
abstract_inverted_index.to	13, 33, 48, 90, 97, 105, 112, 157
abstract_inverted_index.we	69, 176
abstract_inverted_index.SSL	62
abstract_inverted_index.and	63, 100, 117, 153, 205
abstract_inverted_index.are	148
abstract_inverted_index.can	123
abstract_inverted_index.for	30
abstract_inverted_index.quo	138
abstract_inverted_index.the	4, 21, 41, 49, 58, 74, 91, 98, 106, 125, 135, 143, 159, 168, 171, 182, 188, 200, 203, 206, 209
abstract_inverted_index.Nesy	42
abstract_inverted_index.This	132
abstract_inverted_index.data	32
abstract_inverted_index.high	179
abstract_inverted_index.make	34
abstract_inverted_index.task	6, 23, 164
abstract_inverted_index.than	151
abstract_inverted_index.that	122
abstract_inverted_index.them	12
abstract_inverted_index.thus	198
abstract_inverted_index.will	133
abstract_inverted_index.with	88, 95, 103
abstract_inverted_index.(SSL)	19
abstract_inverted_index.Nesy.	64
abstract_inverted_index.after	192
abstract_inverted_index.based	44
abstract_inverted_index.data,	99
abstract_inverted_index.model	169
abstract_inverted_index.task.	173
abstract_inverted_index.tasks	29, 78, 129, 147
abstract_inverted_index.that,	71
abstract_inverted_index.these	114
abstract_inverted_index.three	84
abstract_inverted_index.using	185
abstract_inverted_index.where	142
abstract_inverted_index.while	16
abstract_inverted_index.(Nesy)	1
abstract_inverted_index.(i.e.,	54
abstract_inverted_index.actual	189
abstract_inverted_index.before	166
abstract_inverted_index.change	134
abstract_inverted_index.extend	40
abstract_inverted_index.hinges	82
abstract_inverted_index.impact	75
abstract_inverted_index.method	121
abstract_inverted_index.model,	92
abstract_inverted_index.models	9, 35
abstract_inverted_index.rather	150
abstract_inverted_index.status	137
abstract_inverted_index.target	5, 22, 80, 172
abstract_inverted_index.theory	43, 204
abstract_inverted_index.verify	177
abstract_inverted_index.Through	65
abstract_inverted_index.between	181
abstract_inverted_index.current	136
abstract_inverted_index.develop	119
abstract_inverted_index.further	109
abstract_inverted_index.method.	211
abstract_inverted_index.minimal	186
abstract_inverted_index.predict	124
abstract_inverted_index.pretext	28, 77, 128, 163
abstract_inverted_index.propose	110
abstract_inverted_index.respect	89, 96, 104
abstract_inverted_index.satisfy	14, 36
abstract_inverted_index.schemes	111
abstract_inverted_index.target.	107
abstract_inverted_index.testing	167
abstract_inverted_index.theory,	73
abstract_inverted_index.thereby	56, 118
abstract_inverted_index.achieved	191
abstract_inverted_index.advance.	131
abstract_inverted_index.data-and	187
abstract_inverted_index.enabling	11
abstract_inverted_index.evaluate	158
abstract_inverted_index.factors:	85
abstract_inverted_index.improves	3, 20
abstract_inverted_index.learning	2, 18
abstract_inverted_index.metrics,	116
abstract_inverted_index.reliable	46
abstract_inverted_index.rigorous	66
abstract_inverted_index.scenario	50
abstract_inverted_index.unifying	57
abstract_inverted_index.validity	201
abstract_inverted_index.analysis,	68
abstract_inverted_index.designing	26
abstract_inverted_index.difficult	156
abstract_inverted_index.knowledge	47, 53, 86, 93, 101
abstract_inverted_index.learning,	197
abstract_inverted_index.practical	140
abstract_inverted_index.predicted	183
abstract_inverted_index.selection	165
abstract_inverted_index.unlabeled	31
abstract_inverted_index.confirming	199
abstract_inverted_index.evaluation	210
abstract_inverted_index.frameworks	60
abstract_inverted_index.knowledge,	15
abstract_inverted_index.selections	144
abstract_inverted_index.unreliable	52
abstract_inverted_index.correlation	180
abstract_inverted_index.demonstrate	70
abstract_inverted_index.large-scale	193
abstract_inverted_index.performance	7, 24, 81, 190
abstract_inverted_index.rationality	160
abstract_inverted_index.reliability	94
abstract_inverted_index.theoretical	59, 67, 115
abstract_inverted_index.assumptions.	38
abstract_inverted_index.completeness	102
abstract_inverted_index.experiments,	175
abstract_inverted_index.learnability	87
abstract_inverted_index.unsupervised	27, 146, 162
abstract_inverted_index.applications,	141
abstract_inverted_index.assumptions),	55
abstract_inverted_index.corresponding	37
abstract_inverted_index.effectiveness	126, 207
abstract_inverted_index.theory-based,	152
abstract_inverted_index.Neuro-symbolic	0
abstract_inverted_index.operationalize	113
abstract_inverted_index.heuristic-based	149
abstract_inverted_index.self-supervised	196
abstract_inverted_index.semi-supervised	194
abstract_inverted_index.semi/self-supervised	17
abstract_inverted_index.performance-estimated	184
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	8
citation_normalized_percentile
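
The `abstract_inverted_index.*` rows in the payload above store the abstract as a mapping from each token to the word positions where it occurs. The following is a minimal sketch of how such rows could be turned back into running text. The function names `parse_payload_lines` and `rebuild_abstract` are hypothetical helpers introduced here for illustration, not part of any OpenAlex client, and the parser assumes each row is a key and a comma-separated position list separated by a tab, as in the dump above.

```python
PREFIX = "abstract_inverted_index."


def parse_payload_lines(lines):
    """Collect token -> positions from flattened payload rows.

    Assumes each row looks like 'abstract_inverted_index.<token>\t<i>, <j>, ...'.
    Rows that do not start with the prefix are ignored.
    """
    index = {}
    for line in lines:
        if not line.startswith(PREFIX):
            continue
        token, _, raw_positions = line[len(PREFIX):].partition("\t")
        index[token] = [int(p) for p in raw_positions.split(",") if p.strip()]
    return index


def rebuild_abstract(inverted_index):
    """Place every token at each of its positions, then join in position order."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Tokens at positions 0-9 of the payload above; later positions of
# repeated tokens are omitted to keep the sample short.
sample = {
    "Neuro-symbolic": [0],
    "(Nesy)": [1],
    "learning": [2],
    "improves": [3],
    "the": [4],
    "target": [5],
    "task": [6],
    "performance": [7],
    "of": [8],
    "models": [9],
}

print(rebuild_abstract(sample))
```

Feeding all `abstract_inverted_index.*` rows through `parse_payload_lines` and then `rebuild_abstract` should reproduce the abstract token by token, including the punctuation attached to each token.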