GazeCLIP: Enhancing Gaze Estimation Through Text-Guided Multimodal Learning
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2401.00260
Visual gaze estimation, with its wide-ranging application scenarios, has garnered increasing attention within the research community. Although existing approaches infer gaze solely from image signals, recent advances in visual-language collaboration have demonstrated that the integration of linguistic information can significantly enhance performance across various visual tasks. Leveraging the remarkable transferability of large-scale Contrastive Language-Image Pre-training (CLIP) models, we address the open and urgent question of how to effectively apply linguistic cues to gaze estimation. In this work, we propose GazeCLIP, a novel gaze estimation framework that deeply explores text-face collaboration. Specifically, we introduce a meticulously designed linguistic description generator to produce text signals enriched with coarse directional cues. Furthermore, we present a CLIP-based backbone adept at characterizing text-face pairs for gaze estimation, complemented by a fine-grained multimodal fusion module that models the intricate interrelationships between heterogeneous inputs. Extensive experiments on three challenging datasets demonstrate the superiority of GazeCLIP, which achieves state-of-the-art accuracy. Our findings underscore the potential of using visual-language collaboration to advance gaze estimation and open new avenues for future research in multimodal learning for visual tasks. The implementation code and the pre-trained model will be made publicly available.
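The abstract describes a CLIP-based pipeline: a generator produces coarse directional text cues, a CLIP backbone encodes text-face pairs, and a fusion module combines them for gaze regression. As an illustrative aid only (not the authors' released code), the sketch below shows one plausible way to encode a face image and a directional prompt with a pretrained CLIP model and fuse them; the prompt template, module names, and fusion head are all assumptions.

```python
# Illustrative sketch only: the prompt template, fusion head, and matching
# strategy below are assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git


class TextGuidedGazeHead(nn.Module):
    """Fuses CLIP image and text embeddings and regresses (yaw, pitch)."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, 2),  # gaze as (yaw, pitch) angles
        )

    def forward(self, img_emb: torch.Tensor, txt_emb: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([img_emb, txt_emb], dim=-1))


device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical coarse directional prompts, one per rough gaze direction.
prompts = clip.tokenize(
    [f"a photo of a face gazing {d}" for d in
     ("left", "right", "up", "down", "forward")]
).to(device)

head = TextGuidedGazeHead().to(device)


@torch.no_grad()
def encode(face_batch: torch.Tensor):
    """face_batch: CLIP-preprocessed images, shape (N, 3, 224, 224)."""
    img_emb = model.encode_image(face_batch).float()
    txt_emb = model.encode_text(prompts).float()
    # Pick the best-matching direction prompt per image as the coarse cue.
    sim = img_emb @ txt_emb.T           # (N, num_prompts)
    best = txt_emb[sim.argmax(dim=-1)]  # (N, 512)
    return img_emb, best

# Usage: gaze = head(*encode(face_batch))  ->  (N, 2) yaw/pitch predictions
```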
Record Overview
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2401.00260
- PDF: https://arxiv.org/pdf/2401.00260
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4390528629
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4390528629 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2401.00260 (Digital Object Identifier)
- Title: GazeCLIP: Enhancing Gaze Estimation Through Text-Guided Multimodal Learning (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-12-30 (full publication date if available)
- Authors: Jun Wang, Hao Ruan, Mingjie Wang, Chuanghui Zhang, Chunhua Li, Jun Zhou (list of authors in order)
- Landing page: https://arxiv.org/abs/2401.00260 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2401.00260 (direct link to full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2401.00260 (direct OA link when available)
- Concepts: Gaze, Computer science, Artificial intelligence, Generator (circuit theory), Prior probability, Estimation, Machine learning, Human–computer interaction, Natural language processing, Bayesian probability, Power (physics), Management, Physics, Economics, Quantum mechanics (top concepts attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
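All of the fields above come from a single OpenAlex works record, which can be fetched from the public OpenAlex API without a key. A minimal sketch (the `mailto` address is a placeholder used only to opt into the API's polite pool):

```python
import requests

# Fetch the raw OpenAlex record for this work.
work_id = "W4390528629"
resp = requests.get(
    f"https://api.openalex.org/works/{work_id}",
    params={"mailto": "you@example.com"},  # placeholder address
    timeout=30,
)
resp.raise_for_status()
work = resp.json()

print(work["display_name"])           # title
print(work["publication_date"])       # 2023-12-30
print(work["open_access"]["oa_url"])  # direct OA link
```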
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4390528629 |
| doi | https://doi.org/10.48550/arxiv.2401.00260 |
| ids.doi | https://doi.org/10.48550/arxiv.2401.00260 |
| ids.openalex | https://openalex.org/W4390528629 |
| fwci | |
| type | preprint |
| title | GazeCLIP: Enhancing Gaze Estimation Through Text-Guided Multimodal Learning |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11707 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9994999766349792 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1709 |
| topics[0].subfield.display_name | Human-Computer Interaction |
| topics[0].display_name | Gaze Tracking and Assistive Technology |
| topics[1].id | https://openalex.org/T13731 |
| topics[1].field.id | https://openalex.org/fields/33 |
| topics[1].field.display_name | Social Sciences |
| topics[1].score | 0.9763000011444092 |
| topics[1].domain.id | https://openalex.org/domains/2 |
| topics[1].domain.display_name | Social Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/3322 |
| topics[1].subfield.display_name | Urban Studies |
| topics[1].display_name | Advanced Computing and Algorithms |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2779916870 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8124886751174927 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q14467155 |
| concepts[0].display_name | Gaze |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.7907688617706299 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C154945302 |
| concepts[2].level | 1 |
| concepts[2].score | 0.6104615926742554 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[2].display_name | Artificial intelligence |
| concepts[3].id | https://openalex.org/C2780992000 |
| concepts[3].level | 3 |
| concepts[3].score | 0.5356160402297974 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q17016113 |
| concepts[3].display_name | Generator (circuit theory) |
| concepts[4].id | https://openalex.org/C177769412 |
| concepts[4].level | 3 |
| concepts[4].score | 0.5236108899116516 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q278090 |
| concepts[4].display_name | Prior probability |
| concepts[5].id | https://openalex.org/C96250715 |
| concepts[5].level | 2 |
| concepts[5].score | 0.430474191904068 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q965330 |
| concepts[5].display_name | Estimation |
| concepts[6].id | https://openalex.org/C119857082 |
| concepts[6].level | 1 |
| concepts[6].score | 0.37378475069999695 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[6].display_name | Machine learning |
| concepts[7].id | https://openalex.org/C107457646 |
| concepts[7].level | 1 |
| concepts[7].score | 0.35526666045188904 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q207434 |
| concepts[7].display_name | Human–computer interaction |
| concepts[8].id | https://openalex.org/C204321447 |
| concepts[8].level | 1 |
| concepts[8].score | 0.3272772431373596 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[8].display_name | Natural language processing |
| concepts[9].id | https://openalex.org/C107673813 |
| concepts[9].level | 2 |
| concepts[9].score | 0.2843281030654907 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q812534 |
| concepts[9].display_name | Bayesian probability |
| concepts[10].id | https://openalex.org/C163258240 |
| concepts[10].level | 2 |
| concepts[10].score | 0.11688888072967529 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q25342 |
| concepts[10].display_name | Power (physics) |
| concepts[11].id | https://openalex.org/C187736073 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q2920921 |
| concepts[11].display_name | Management |
| concepts[12].id | https://openalex.org/C121332964 |
| concepts[12].level | 0 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[12].display_name | Physics |
| concepts[13].id | https://openalex.org/C162324750 |
| concepts[13].level | 0 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q8134 |
| concepts[13].display_name | Economics |
| concepts[14].id | https://openalex.org/C62520636 |
| concepts[14].level | 1 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q944 |
| concepts[14].display_name | Quantum mechanics |
| keywords[0].id | https://openalex.org/keywords/gaze |
| keywords[0].score | 0.8124886751174927 |
| keywords[0].display_name | Gaze |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.7907688617706299 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[2].score | 0.6104615926742554 |
| keywords[2].display_name | Artificial intelligence |
| keywords[3].id | https://openalex.org/keywords/generator |
| keywords[3].score | 0.5356160402297974 |
| keywords[3].display_name | Generator (circuit theory) |
| keywords[4].id | https://openalex.org/keywords/prior-probability |
| keywords[4].score | 0.5236108899116516 |
| keywords[4].display_name | Prior probability |
| keywords[5].id | https://openalex.org/keywords/estimation |
| keywords[5].score | 0.430474191904068 |
| keywords[5].display_name | Estimation |
| keywords[6].id | https://openalex.org/keywords/machine-learning |
| keywords[6].score | 0.37378475069999695 |
| keywords[6].display_name | Machine learning |
| keywords[7].id | https://openalex.org/keywords/human–computer-interaction |
| keywords[7].score | 0.35526666045188904 |
| keywords[7].display_name | Human–computer interaction |
| keywords[8].id | https://openalex.org/keywords/natural-language-processing |
| keywords[8].score | 0.3272772431373596 |
| keywords[8].display_name | Natural language processing |
| keywords[9].id | https://openalex.org/keywords/bayesian-probability |
| keywords[9].score | 0.2843281030654907 |
| keywords[9].display_name | Bayesian probability |
| keywords[10].id | https://openalex.org/keywords/power |
| keywords[10].score | 0.11688888072967529 |
| keywords[10].display_name | Power (physics) |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2401.00260 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2401.00260 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2401.00260 |
| locations[1].id | doi:10.48550/arxiv.2401.00260 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2401.00260 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5100712167 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-9548-0411 |
| authorships[0].author.display_name | Jun Wang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Wang, Jun |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5102473660 |
| authorships[1].author.orcid | https://orcid.org/0009-0005-5369-9462 |
| authorships[1].author.display_name | Hao Ruan |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Ruan, Hao |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5100634347 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-2419-8117 |
| authorships[2].author.display_name | Mingjie Wang |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Wang, Mingjie |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5024059870 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Chuanghui Zhang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Zhang, Chuanghui |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5100781358 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-9947-6855 |
| authorships[4].author.display_name | Chunhua Li |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Li, Chunhua |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5101917144 |
| authorships[5].author.orcid | https://orcid.org/0000-0001-9352-9584 |
| authorships[5].author.display_name | Jun Zhou |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Zhou, Jun |
| authorships[5].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2401.00260 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-01-03T00:00:00 |
| display_name | GazeCLIP: Enhancing Gaze Estimation Through Text-Guided Multimodal Learning |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11707 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9994999766349792 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1709 |
| primary_topic.subfield.display_name | Human-Computer Interaction |
| primary_topic.display_name | Gaze Tracking and Assistive Technology |
| related_works | https://openalex.org/W2580650124, https://openalex.org/W4386190339, https://openalex.org/W1880689012, https://openalex.org/W2968424575, https://openalex.org/W3142333283, https://openalex.org/W3122088529, https://openalex.org/W3014378845, https://openalex.org/W4240909707, https://openalex.org/W3041320102, https://openalex.org/W2111669074 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2401.00260 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2401.00260 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2401.00260 |
| primary_location.id | pmh:oai:arXiv.org:2401.00260 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2401.00260 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2401.00260 |
| publication_date | 2023-12-30 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index | (inverted word-position index of the abstract; plain text shown in the Abstract above) |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| citation_normalized_percentile |
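OpenAlex distributes abstracts as an inverted index (the `abstract_inverted_index` field above), mapping each word to the positions where it occurs; the plain-text abstract shown earlier can be reconstructed by sorting words by position. A minimal sketch:

```python
def reconstruct_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild plain text from an OpenAlex abstract_inverted_index."""
    positions = {}
    for word, idxs in inverted_index.items():
        for i in idxs:
            positions[i] = word
    return " ".join(positions[i] for i in sorted(positions))

# Tiny example in the same shape as the payload field:
idx = {"Visual": [0], "gaze": [1], "estimation,": [2]}
print(reconstruct_abstract(idx))  # -> "Visual gaze estimation,"
```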