Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2412.09586
We address the problem of gaze target estimation, which aims to predict where a person is looking in a scene. Predicting a person's gaze target requires reasoning about both the person's appearance and the contents of the scene. Prior works have developed increasingly complex, hand-crafted pipelines for gaze target estimation that carefully fuse features from separate scene encoders, head encoders, and auxiliary models for signals like depth and pose. Motivated by the success of general-purpose feature extractors on a variety of visual tasks, we propose Gaze-LLE, a novel transformer framework that streamlines gaze target estimation by leveraging features from a frozen DINOv2 encoder. We extract a single feature representation for the scene, and apply a person-specific positional prompt to decode gaze with a lightweight module. We demonstrate state-of-the-art performance across several gaze benchmarks and provide extensive analysis to validate our design choices. Our code is available at: http://github.com/fkryan/gazelle.
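
The decoding idea in the abstract (one shared scene representation, plus a per-person positional prompt) can be illustrated with a small sketch. This is not the authors' implementation: random arrays stand in for frozen DINOv2 features, a single linear layer with a sigmoid stands in for the lightweight module, and the names `head_mask`, `decode_gaze`, `head_embedding`, and `w_out` are hypothetical.

```python
import numpy as np


def head_mask(bbox, grid_h, grid_w):
    """Binary mask over the token grid: 1.0 where a cell overlaps the head box.

    bbox is (xmin, ymin, xmax, ymax) in normalized [0, 1] image coordinates.
    """
    xmin, ymin, xmax, ymax = bbox
    # Clamp the start cell so a box touching the right/bottom edge still maps to a cell.
    x0 = min(int(np.floor(xmin * grid_w)), grid_w - 1)
    y0 = min(int(np.floor(ymin * grid_h)), grid_h - 1)
    # Always cover at least one cell.
    x1 = max(int(np.ceil(xmax * grid_w)), x0 + 1)
    y1 = max(int(np.ceil(ymax * grid_h)), y0 + 1)
    mask = np.zeros((grid_h, grid_w), dtype=np.float32)
    mask[y0:y1, x0:x1] = 1.0
    return mask


def decode_gaze(scene_tokens, bbox, head_embedding, w_out):
    """Add the head prompt to the shared scene tokens and score every token.

    Assumption: a linear layer plus sigmoid replaces the paper's lightweight
    module, purely to keep the sketch short.
    """
    grid_h, grid_w, _ = scene_tokens.shape
    mask = head_mask(bbox, grid_h, grid_w)
    # Person-specific prompt: the embedding is added only at head-box tokens.
    prompted = scene_tokens + mask[..., None] * head_embedding
    logits = prompted @ w_out  # shape (grid_h, grid_w)
    return 1.0 / (1.0 + np.exp(-logits))  # per-token gaze heatmap in (0, 1)


rng = np.random.default_rng(0)
dim = 8
# Stand-in for features from a frozen encoder, computed once per image.
scene_tokens = rng.normal(size=(16, 16, dim)).astype(np.float32)
# In a real model these would be learned parameters.
head_embedding = rng.normal(size=dim).astype(np.float32)
w_out = rng.normal(size=dim).astype(np.float32)

# Two people in the same image reuse the same scene tokens.
heatmap_a = decode_gaze(scene_tokens, (0.10, 0.10, 0.30, 0.30), head_embedding, w_out)
heatmap_b = decode_gaze(scene_tokens, (0.60, 0.20, 0.80, 0.40), head_embedding, w_out)
```

The point of the sketch is the data flow: the scene features are computed once, and only the cheap prompt-and-decode step is repeated for each person.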
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2412.09586 (PDF: https://arxiv.org/pdf/2412.09586)
- OA Status: green
- Cited By: 2
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4405355355
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4405355355 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2412.09586 (Digital Object Identifier)
- Title: Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-12-12 (full publication date if available)
- Authors: Fiona Ryan, Ajay Bati, Sang Min Lee, Daniel Bolya, Judy Hoffman, James M. Rehg (list of authors in order)
- Landing page: https://arxiv.org/abs/2412.09586 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2412.09586 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2412.09586 (direct OA link when available)
- Concepts: Gaze, Computer science, Computer vision, Scale (ratio), Artificial intelligence, Encoder, Estimation, Geography, Cartography, Economics, Management, Operating system (top concepts, fields/topics, attached by OpenAlex)
- Cited by: 2 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 1, 2024: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4405355355 |
| doi | https://doi.org/10.48550/arxiv.2412.09586 |
| ids.doi | https://doi.org/10.48550/arxiv.2412.09586 |
| ids.openalex | https://openalex.org/W4405355355 |
| fwci | |
| type | preprint |
| title | Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11707 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9916999936103821 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1709 |
| topics[0].subfield.display_name | Human-Computer Interaction |
| topics[0].display_name | Gaze Tracking and Assistive Technology |
| topics[1].id | https://openalex.org/T11398 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9836999773979187 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1709 |
| topics[1].subfield.display_name | Human-Computer Interaction |
| topics[1].display_name | Hand Gesture Recognition Systems |
| topics[2].id | https://openalex.org/T12740 |
| topics[2].field.id | https://openalex.org/fields/22 |
| topics[2].field.display_name | Engineering |
| topics[2].score | 0.9692999720573425 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/2204 |
| topics[2].subfield.display_name | Biomedical Engineering |
| topics[2].display_name | Gait Recognition and Analysis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2779916870 |
| concepts[0].level | 2 |
| concepts[0].score | 0.93171226978302 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q14467155 |
| concepts[0].display_name | Gaze |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6223486065864563 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C31972630 |
| concepts[2].level | 1 |
| concepts[2].score | 0.5598348379135132 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[2].display_name | Computer vision |
| concepts[3].id | https://openalex.org/C2778755073 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5545628070831299 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q10858537 |
| concepts[3].display_name | Scale (ratio) |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.5468870401382446 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C118505674 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5257015228271484 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q42586063 |
| concepts[5].display_name | Encoder |
| concepts[6].id | https://openalex.org/C96250715 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4518432915210724 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q965330 |
| concepts[6].display_name | Estimation |
| concepts[7].id | https://openalex.org/C205649164 |
| concepts[7].level | 0 |
| concepts[7].score | 0.2774772047996521 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q1071 |
| concepts[7].display_name | Geography |
| concepts[8].id | https://openalex.org/C58640448 |
| concepts[8].level | 1 |
| concepts[8].score | 0.1752343475818634 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q42515 |
| concepts[8].display_name | Cartography |
| concepts[9].id | https://openalex.org/C162324750 |
| concepts[9].level | 0 |
| concepts[9].score | 0.10992354154586792 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q8134 |
| concepts[9].display_name | Economics |
| concepts[10].id | https://openalex.org/C187736073 |
| concepts[10].level | 1 |
| concepts[10].score | 0.03344103693962097 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q2920921 |
| concepts[10].display_name | Management |
| concepts[11].id | https://openalex.org/C111919701 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[11].display_name | Operating system |
| keywords[0].id | https://openalex.org/keywords/gaze |
| keywords[0].score | 0.93171226978302 |
| keywords[0].display_name | Gaze |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6223486065864563 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/computer-vision |
| keywords[2].score | 0.5598348379135132 |
| keywords[2].display_name | Computer vision |
| keywords[3].id | https://openalex.org/keywords/scale |
| keywords[3].score | 0.5545628070831299 |
| keywords[3].display_name | Scale (ratio) |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.5468870401382446 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/encoder |
| keywords[5].score | 0.5257015228271484 |
| keywords[5].display_name | Encoder |
| keywords[6].id | https://openalex.org/keywords/estimation |
| keywords[6].score | 0.4518432915210724 |
| keywords[6].display_name | Estimation |
| keywords[7].id | https://openalex.org/keywords/geography |
| keywords[7].score | 0.2774772047996521 |
| keywords[7].display_name | Geography |
| keywords[8].id | https://openalex.org/keywords/cartography |
| keywords[8].score | 0.1752343475818634 |
| keywords[8].display_name | Cartography |
| keywords[9].id | https://openalex.org/keywords/economics |
| keywords[9].score | 0.10992354154586792 |
| keywords[9].display_name | Economics |
| keywords[10].id | https://openalex.org/keywords/management |
| keywords[10].score | 0.03344103693962097 |
| keywords[10].display_name | Management |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2412.09586 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2412.09586 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2412.09586 |
| locations[1].id | doi:10.48550/arxiv.2412.09586 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2412.09586 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5016641535 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Fiona Ryan |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Ryan, Fiona |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5112972252 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Ajay Bati |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Bati, Ajay |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5100342118 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-9568-2096 |
| authorships[2].author.display_name | Sang Min Lee |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Lee, Sangmin |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5063326998 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-0223-3599 |
| authorships[3].author.display_name | Daniel Bolya |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Bolya, Daniel |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5038068459 |
| authorships[4].author.orcid | https://orcid.org/0000-0003-1971-1606 |
| authorships[4].author.display_name | Judy Hoffman |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Hoffman, Judy |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5002228469 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-1793-5462 |
| authorships[5].author.display_name | James M. Rehg |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Rehg, James M. |
| authorships[5].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2412.09586 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11707 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9916999936103821 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1709 |
| primary_topic.subfield.display_name | Human-Computer Interaction |
| primary_topic.display_name | Gaze Tracking and Assistive Technology |
| related_works | https://openalex.org/W1880689012, https://openalex.org/W3014378845, https://openalex.org/W4240909707, https://openalex.org/W2059546927, https://openalex.org/W3207760378, https://openalex.org/W1986970529, https://openalex.org/W2562758970, https://openalex.org/W1563178652, https://openalex.org/W2947492009, https://openalex.org/W2385108104 |
| cited_by_count | 2 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2412.09586 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2412.09586 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2412.09586 |
| primary_location.id | pmh:oai:arXiv.org:2412.09586 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2412.09586 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2412.09586 |
| publication_date | 2024-12-12 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.. | 148 |
| abstract_inverted_index.a | 13, 18, 21, 78, 86, 99, 105, 114, 122 |
| abstract_inverted_index.We | 0, 103, 125 |
| abstract_inverted_index.by | 70, 95 |
| abstract_inverted_index.in | 17 |
| abstract_inverted_index.is | 15, 144 |
| abstract_inverted_index.of | 4, 35, 73, 80 |
| abstract_inverted_index.on | 77 |
| abstract_inverted_index.to | 10, 118, 137 |
| abstract_inverted_index.we | 83 |
| abstract_inverted_index.Our | 142 |
| abstract_inverted_index.and | 32, 60, 67, 112, 133 |
| abstract_inverted_index.at: | 146 |
| abstract_inverted_index.for | 46, 63, 109 |
| abstract_inverted_index.our | 139 |
| abstract_inverted_index.the | 2, 29, 33, 36, 71, 110 |
| abstract_inverted_index.aims | 9 |
| abstract_inverted_index.both | 27 |
| abstract_inverted_index.code | 143 |
| abstract_inverted_index.from | 54, 98 |
| abstract_inverted_index.fuse | 52 |
| abstract_inverted_index.gaze | 5, 23, 47, 92, 120, 131 |
| abstract_inverted_index.have | 40 |
| abstract_inverted_index.head | 58 |
| abstract_inverted_index.like | 65 |
| abstract_inverted_index.that | 50, 90 |
| abstract_inverted_index.with | 121 |
| abstract_inverted_index.Prior | 38 |
| abstract_inverted_index.about | 28 |
| abstract_inverted_index.apply | 113 |
| abstract_inverted_index.depth | 66 |
| abstract_inverted_index.novel | 87 |
| abstract_inverted_index.pose. | 68 |
| abstract_inverted_index.scene | 56 |
| abstract_inverted_index.where | 12 |
| abstract_inverted_index.which | 8 |
| abstract_inverted_index.works | 39 |
| abstract_inverted_index.DINOv2 | 101 |
| abstract_inverted_index.across | 129 |
| abstract_inverted_index.decode | 119 |
| abstract_inverted_index.design | 140 |
| abstract_inverted_index.frozen | 100 |
| abstract_inverted_index.models | 62 |
| abstract_inverted_index.person | 14 |
| abstract_inverted_index.prompt | 117 |
| abstract_inverted_index.scene, | 111 |
| abstract_inverted_index.scene. | 19, 37 |
| abstract_inverted_index.single | 106 |
| abstract_inverted_index.target | 6, 24, 48, 93 |
| abstract_inverted_index.tasks, | 82 |
| abstract_inverted_index.visual | 81 |
| abstract_inverted_index.address | 1 |
| abstract_inverted_index.extract | 104 |
| abstract_inverted_index.feature | 75, 107 |
| abstract_inverted_index.looking | 16 |
| abstract_inverted_index.module. | 124 |
| abstract_inverted_index.predict | 11 |
| abstract_inverted_index.problem | 3 |
| abstract_inverted_index.propose | 84 |
| abstract_inverted_index.provide | 134 |
| abstract_inverted_index.several | 130 |
| abstract_inverted_index.signals | 64 |
| abstract_inverted_index.success | 72 |
| abstract_inverted_index.variety | 79 |
| abstract_inverted_index.analysis | 136 |
| abstract_inverted_index.choices. | 141 |
| abstract_inverted_index.complex, | 43 |
| abstract_inverted_index.contents | 34 |
| abstract_inverted_index.encoder. | 102 |
| abstract_inverted_index.features | 53, 97 |
| abstract_inverted_index.person's | 22, 30 |
| abstract_inverted_index.requires | 25 |
| abstract_inverted_index.separate | 55 |
| abstract_inverted_index.validate | 138 |
| abstract_inverted_index.Gaze-LLE, | 85 |
| abstract_inverted_index.Motivated | 69 |
| abstract_inverted_index.auxiliary | 61 |
| abstract_inverted_index.available | 145 |
| abstract_inverted_index.carefully | 51 |
| abstract_inverted_index.developed | 41 |
| abstract_inverted_index.encoders, | 57, 59 |
| abstract_inverted_index.extensive | 135 |
| abstract_inverted_index.framework | 89 |
| abstract_inverted_index.pipelines | 45 |
| abstract_inverted_index.reasoning | 26 |
| abstract_inverted_index.Predicting | 20 |
| abstract_inverted_index.appearance | 31 |
| abstract_inverted_index.benchmarks | 132 |
| abstract_inverted_index.estimation | 49, 94 |
| abstract_inverted_index.extractors | 76 |
| abstract_inverted_index.leveraging | 96 |
| abstract_inverted_index.positional | 116 |
| abstract_inverted_index.demonstrate | 126 |
| abstract_inverted_index.estimation, | 7 |
| abstract_inverted_index.lightweight | 123 |
| abstract_inverted_index.performance | 128 |
| abstract_inverted_index.streamlines | 91 |
| abstract_inverted_index.transformer | 88 |
| abstract_inverted_index.hand-crafted | 44 |
| abstract_inverted_index.increasingly | 42 |
| abstract_inverted_index.representation | 108 |
| abstract_inverted_index.general-purpose | 74 |
| abstract_inverted_index.person-specific | 115 |
| abstract_inverted_index.state-of-the-art | 127 |
| abstract_inverted_index.http://github.com/fkryan/gazelle | 147 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| citation_normalized_percentile | |
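
The `abstract_inverted_index.*` rows above store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of turning such a map back into text follows; the function name `rebuild_abstract` is hypothetical, and `sample` is a truncated subset of the payload (only the first position of each token is kept), not the full index.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {token: [positions]} inverted index."""
    by_position = {}
    for token, positions in inverted_index.items():
        for position in positions:
            by_position[position] = token
    # Emit tokens in position order, separated by single spaces.
    return " ".join(by_position[i] for i in sorted(by_position))


# Truncated subset of the rows above, covering positions 0 through 7 only.
sample = {
    "We": [0],
    "address": [1],
    "the": [2],
    "problem": [3],
    "of": [4],
    "gaze": [5],
    "target": [6],
    "estimation,": [7],
}

print(rebuild_abstract(sample))
# prints: We address the problem of gaze target estimation,
```

Because tokens keep their original punctuation, joining on spaces reproduces the abstract as tokenized, including a standalone final period where the source text had a space before it.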