Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2509.24473
Spatial intelligence spans a rich suite of abilities, including visualising and transforming shapes, mentally rotating objects, judging relational positions and containment, and estimating numerosity. However, it remains a critical unresolved challenge for Multimodal Large Language Models (MLLMs). To fill this gap, we propose treating Euclidean geometry problem-solving as a surrogate task. Specifically, we constructed a curated multimodal dataset, Euclid30K, comprising approximately 30K plane and solid geometry problems. To enable models to learn and apply Euclidean principles from these problems, we fine-tuned seven model variants (spanning 3B to 72B parameters) from the Qwen2.5VL, Qwen3VL, and RoboBrain2.0 families using Group Relative Policy Optimization (GRPO), encouraging the models to identify shapes, count and relate entities, and perform multi-step deductive reasoning with Euclidean principles. Our experiments show that the resulting models achieve substantial zero-shot gains across four spatial reasoning benchmarks (Super-CLEVR, Omni3DBench, VSI-Bench, and MindCube) without any task-specific adaptation. Notably, after training on Euclid30K, mean VSI-Bench accuracy rose from 36.6% to 41.8% (+5.2 points), and mean MindCube accuracy rose from 31.4% to 38.1% (+6.7 points). To our knowledge, this is the first systematic study showing that geometry-centric fine-tuning can confer broadly transferable spatial skills on vision-language models. Code and the Euclid30K dataset are available at https://zgca-ai4edu.github.io/Euclids_Gift.
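The training recipe named in the abstract, Group Relative Policy Optimization (GRPO), scores a group of sampled answers to the same problem with a verifiable reward and baselines each sample against the group's own statistics. The snippet below is only a minimal sketch of that group-relative advantage step, not the authors' released code; the toy exact-match reward, group size, and sample answers are illustrative assumptions.

```python
# Illustrative sketch of the group-relative advantage at the heart of GRPO:
# sample G answers per problem, score each with a verifiable reward
# (here a toy exact-match check against the reference answer), then
# normalise rewards within the group so correct samples get positive
# advantages and incorrect ones negative. Reward design and group size
# are assumptions, not the paper's actual implementation.
import numpy as np

def geometry_reward(answer: str, reference: str) -> float:
    """Toy verifiable reward: 1.0 if the final answer matches, else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> np.ndarray:
    """Normalise each reward against its group's mean and std (the GRPO baseline)."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Example: four sampled solutions to one Euclid30K-style problem, reference answer "12".
samples = ["12", "10", "12", "13"]
rewards = [geometry_reward(s, "12") for s in samples]
print(group_relative_advantages(rewards))  # correct answers receive higher advantage
```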
Related Topics: Spatial Cognition and Navigation; Multimodal Machine Learning Applications
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2509.24473
- PDF URL: https://arxiv.org/pdf/2509.24473
- OA Status: green
- OpenAlex ID: https://openalex.org/W4415336589
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415336589 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2509.24473 (Digital Object Identifier)
- Title: Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025
- Publication date: 2025-09-29 (full publication date if available)
- Authors: Shijie Lian, Changti Wu, Laurence T. Yang, Hang Yuan, B. X. Yu, Lei Zhang, Kai Chen (list of authors in order)
- Landing page: https://arxiv.org/abs/2509.24473 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2509.24473 (direct link to full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2509.24473 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4415336589 |
| doi | https://doi.org/10.48550/arxiv.2509.24473 |
| ids.doi | https://doi.org/10.48550/arxiv.2509.24473 |
| ids.openalex | https://openalex.org/W4415336589 |
| fwci | |
| type | preprint |
| title | Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11904 |
| topics[0].field.id | https://openalex.org/fields/22 |
| topics[0].field.display_name | Engineering |
| topics[0].score | 0.9717000126838684 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2203 |
| topics[0].subfield.display_name | Automotive Engineering |
| topics[0].display_name | Spatial Cognition and Navigation |
| topics[1].id | https://openalex.org/T11714 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9627000093460083 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Multimodal Machine Learning Applications |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2509.24473 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2509.24473 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2509.24473 |
| locations[1].id | doi:10.48550/arxiv.2509.24473 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2509.24473 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5112968072 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Shijie Lian |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Lian, Shijie |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5069583018 |
| authorships[1].author.orcid | https://orcid.org/0009-0009-9448-6657 |
| authorships[1].author.display_name | Changti Wu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wu, Changti |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5049154222 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-7986-4244 |
| authorships[2].author.display_name | Laurence T. Yang |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Yang, Laurence Tianruo |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5005822593 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-5536-0909 |
| authorships[3].author.display_name | Hang Yuan |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Yuan, Hang |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5107927787 |
| authorships[4].author.orcid | https://orcid.org/0009-0006-0869-4184 |
| authorships[4].author.display_name | B. X. Yu |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Yu, Bin |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5100639046 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-9586-595X |
| authorships[5].author.display_name | Lei Zhang |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Zhang, Lei |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5100437988 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-8647-1182 |
| authorships[6].author.display_name | Kai Chen |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Chen, Kai |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2509.24473 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-19T00:00:00 |
| display_name | Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-23T05:10:03.516525 |
| primary_topic.id | https://openalex.org/T11904 |
| primary_topic.field.id | https://openalex.org/fields/22 |
| primary_topic.field.display_name | Engineering |
| primary_topic.score | 0.9717000126838684 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2203 |
| primary_topic.subfield.display_name | Automotive Engineering |
| primary_topic.display_name | Spatial Cognition and Navigation |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2509.24473 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2509.24473 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2509.24473 |
| primary_location.id | pmh:oai:arXiv.org:2509.24473 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2509.24473 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2509.24473 |
| publication_date | 2025-09-29 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index | (word-to-position index of the abstract; full text given above) |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| citation_normalized_percentile | |
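The table above mirrors a single work record from the OpenAlex works API. As a minimal sketch (assuming only the public OpenAlex REST endpoint and the requests library; none of this comes from the paper), the same payload can be fetched by the work ID listed above, and the abstract_inverted_index field, stored as word-to-position lists, can be folded back into plain text:

```python
# Fetch this work's record from the OpenAlex API and rebuild its abstract.
# The work ID is the one shown in the record above; error handling is minimal.
import requests

WORK_ID = "W4415336589"

def fetch_work(work_id: str) -> dict:
    """Fetch a single work record from the OpenAlex REST API."""
    resp = requests.get(f"https://api.openalex.org/works/{work_id}", timeout=30)
    resp.raise_for_status()
    return resp.json()

def rebuild_abstract(inverted_index: dict) -> str:
    """Turn OpenAlex's word -> positions mapping back into plain abstract text."""
    positions = []
    for word, idxs in inverted_index.items():
        positions.extend((i, word) for i in idxs)
    return " ".join(word for _, word in sorted(positions))

if __name__ == "__main__":
    work = fetch_work(WORK_ID)
    print(work["title"])
    print(work["open_access"]["oa_url"])
    if work.get("abstract_inverted_index"):
        print(rebuild_abstract(work["abstract_inverted_index"]))
```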