Can Generative Video Models Help Pose Estimation?
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2412.16155
Pairwise pose estimation from images with little or no overlap is an open challenge in computer vision. Existing methods, even those trained on large-scale datasets, struggle in these scenarios due to the lack of identifiable correspondences or visual overlap. Inspired by the human ability to infer spatial relationships from diverse scenes, we propose a novel approach, InterPose, that leverages the rich priors encoded within pre-trained generative video models. We propose to use a video model to hallucinate intermediate frames between two input images, effectively creating a dense, visual transition, which significantly simplifies the problem of pose estimation. Since current video models can still produce implausible motion or inconsistent geometry, we introduce a self-consistency score that evaluates the consistency of pose predictions from sampled videos. We demonstrate that our approach generalizes among three state-of-the-art video models and show consistent improvements over the state-of-the-art DUSt3R on four diverse datasets encompassing indoor, outdoor, and object-centric scenes. Our findings suggest a promising avenue for improving pose estimation models by leveraging large generative models trained on vast amounts of video data, which is more readily available than 3D data. See our project page for results: https://inter-pose.github.io/.
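The abstract does not say how the self-consistency score is computed, so the following is only a minimal sketch of one plausible reading: estimate a relative rotation from each sampled video, then keep the estimate that agrees best with the others (the medoid under geodesic rotation distance). The function names, the medoid criterion, and the toy rotations are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def geodesic_deg(R1, R2):
    """Angle (degrees) of the relative rotation between two 3x3 rotation matrices."""
    cos = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def select_most_consistent(rotations):
    """Hypothetical self-consistency selection (an assumption, not the paper's definition).

    Scores each prediction by its mean geodesic distance to all other
    predictions and returns (index, score) of the lowest-scoring one.
    """
    n = len(rotations)
    if n < 2:
        raise ValueError("need at least two predictions to compare")
    scores = []
    for i in range(n):
        dists = [geodesic_deg(rotations[i], rotations[j]) for j in range(n) if j != i]
        scores.append(sum(dists) / len(dists))
    best = int(np.argmin(scores))
    return best, scores[best]


def rot_z(deg):
    """Rotation about the z-axis, used only to build toy inputs."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Toy stand-ins for per-video pose predictions: four roughly agree,
# one (120 degrees) plays the role of a video with implausible motion.
predictions = [rot_z(28), rot_z(30), rot_z(31), rot_z(33), rot_z(120)]
best_index, best_score = select_most_consistent(predictions)
print(best_index, round(best_score, 2))
```

In this toy setup the outlier receives by far the worst score, so a single inconsistent video does not determine the selected pose.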
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2412.16155, https://arxiv.org/pdf/2412.16155
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4405716287
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4405716287 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2412.16155 (Digital Object Identifier)
- Title: Can Generative Video Models Help Pose Estimation? (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-12-20 (full publication date if available)
- Authors: Ruojin Cai, Jason Zhang, Philipp Henzler, Zhengqi Li, Noah Snavely, Ricardo Martin-Brualla (list of authors in order)
- Landing page: https://arxiv.org/abs/2412.16155 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2412.16155 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2412.16155 (direct OA link when available)
- Concepts: Pose, Computer science, Generative grammar, Artificial intelligence, Estimation, Computer vision, Generative model, Machine learning, Engineering, Systems engineering (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4405716287 |
| doi | https://doi.org/10.48550/arxiv.2412.16155 |
| ids.doi | https://doi.org/10.48550/arxiv.2412.16155 |
| ids.openalex | https://openalex.org/W4405716287 |
| fwci | |
| type | preprint |
| title | Can Generative Video Models Help Pose Estimation? |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10531 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9979000091552734 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Advanced Vision and Imaging |
| topics[1].id | https://openalex.org/T12290 |
| topics[1].field.id | https://openalex.org/fields/22 |
| topics[1].field.display_name | Engineering |
| topics[1].score | 0.9776999950408936 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/2207 |
| topics[1].subfield.display_name | Control and Systems Engineering |
| topics[1].display_name | Human Motion and Animation |
| topics[2].id | https://openalex.org/T10812 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9754999876022339 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Human Pose and Action Recognition |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C52102323 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7021282911300659 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q1671968 |
| concepts[0].display_name | Pose |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6663997173309326 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C39890363 |
| concepts[2].level | 2 |
| concepts[2].score | 0.635427713394165 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q36108 |
| concepts[2].display_name | Generative grammar |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.6295419931411743 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C96250715 |
| concepts[4].level | 2 |
| concepts[4].score | 0.559951663017273 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q965330 |
| concepts[4].display_name | Estimation |
| concepts[5].id | https://openalex.org/C31972630 |
| concepts[5].level | 1 |
| concepts[5].score | 0.5585059523582458 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[5].display_name | Computer vision |
| concepts[6].id | https://openalex.org/C167966045 |
| concepts[6].level | 3 |
| concepts[6].score | 0.465924471616745 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q5532625 |
| concepts[6].display_name | Generative model |
| concepts[7].id | https://openalex.org/C119857082 |
| concepts[7].level | 1 |
| concepts[7].score | 0.3350677788257599 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[7].display_name | Machine learning |
| concepts[8].id | https://openalex.org/C127413603 |
| concepts[8].level | 0 |
| concepts[8].score | 0.09912300109863281 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[8].display_name | Engineering |
| concepts[9].id | https://openalex.org/C201995342 |
| concepts[9].level | 1 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q682496 |
| concepts[9].display_name | Systems engineering |
| keywords[0].id | https://openalex.org/keywords/pose |
| keywords[0].score | 0.7021282911300659 |
| keywords[0].display_name | Pose |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6663997173309326 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/generative-grammar |
| keywords[2].score | 0.635427713394165 |
| keywords[2].display_name | Generative grammar |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.6295419931411743 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/estimation |
| keywords[4].score | 0.559951663017273 |
| keywords[4].display_name | Estimation |
| keywords[5].id | https://openalex.org/keywords/computer-vision |
| keywords[5].score | 0.5585059523582458 |
| keywords[5].display_name | Computer vision |
| keywords[6].id | https://openalex.org/keywords/generative-model |
| keywords[6].score | 0.465924471616745 |
| keywords[6].display_name | Generative model |
| keywords[7].id | https://openalex.org/keywords/machine-learning |
| keywords[7].score | 0.3350677788257599 |
| keywords[7].display_name | Machine learning |
| keywords[8].id | https://openalex.org/keywords/engineering |
| keywords[8].score | 0.09912300109863281 |
| keywords[8].display_name | Engineering |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2412.16155 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2412.16155 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2412.16155 |
| locations[1].id | doi:10.48550/arxiv.2412.16155 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2412.16155 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5000551049 |
| authorships[0].author.orcid | https://orcid.org/0009-0009-8871-3016 |
| authorships[0].author.display_name | Ruojin Cai |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Cai, Ruojin |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5101875766 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-4632-7730 |
| authorships[1].author.display_name | Jason Zhang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Zhang, Jason Y. |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5066279452 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Philipp Henzler |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Henzler, Philipp |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5101700324 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-2929-8149 |
| authorships[3].author.display_name | Zhengqi Li |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Li, Zhengqi |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5085248097 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-6921-6833 |
| authorships[4].author.display_name | Noah Snavely |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Snavely, Noah |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5054383242 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-3247-9522 |
| authorships[5].author.display_name | Ricardo Martin-Brualla |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Martin-Brualla, Ricardo |
| authorships[5].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2412.16155 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Can Generative Video Models Help Pose Estimation? |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10531 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9979000091552734 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Advanced Vision and Imaging |
| related_works | https://openalex.org/W4365211920, https://openalex.org/W3014948380, https://openalex.org/W4391584540, https://openalex.org/W4380551139, https://openalex.org/W4317695495, https://openalex.org/W4395044357, https://openalex.org/W4287117424, https://openalex.org/W4387506531, https://openalex.org/W2087346071, https://openalex.org/W2967848559 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2412.16155 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2412.16155 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2412.16155 |
| primary_location.id | pmh:oai:arXiv.org:2412.16155 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2412.16155 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2412.16155 |
| publication_date | 2024-12-20 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 53, 72, 85, 111, 156 |
| abstract_inverted_index.3D | 182 |
| abstract_inverted_index.We | 68, 124 |
| abstract_inverted_index.an | 11 |
| abstract_inverted_index.by | 40, 164 |
| abstract_inverted_index.in | 14, 26 |
| abstract_inverted_index.is | 10, 177 |
| abstract_inverted_index.no | 8 |
| abstract_inverted_index.of | 33, 94, 118, 173 |
| abstract_inverted_index.on | 22, 143, 170 |
| abstract_inverted_index.or | 7, 36, 106 |
| abstract_inverted_index.to | 30, 44, 70, 75 |
| abstract_inverted_index.we | 51, 109 |
| abstract_inverted_index.Our | 153 |
| abstract_inverted_index.See | 184 |
| abstract_inverted_index.and | 135, 150 |
| abstract_inverted_index.can | 101 |
| abstract_inverted_index.due | 29 |
| abstract_inverted_index.for | 159, 188 |
| abstract_inverted_index.our | 127, 185 |
| abstract_inverted_index.the | 31, 41, 59, 92, 116, 140 |
| abstract_inverted_index.two | 80 |
| abstract_inverted_index.use | 71 |
| abstract_inverted_index.even | 19 |
| abstract_inverted_index.four | 144 |
| abstract_inverted_index.from | 3, 48, 121 |
| abstract_inverted_index.lack | 32 |
| abstract_inverted_index.more | 178 |
| abstract_inverted_index.open | 12 |
| abstract_inverted_index.over | 139 |
| abstract_inverted_index.page | 187 |
| abstract_inverted_index.pose | 1, 95, 119, 161 |
| abstract_inverted_index.rich | 60 |
| abstract_inverted_index.show | 136 |
| abstract_inverted_index.than | 181 |
| abstract_inverted_index.that | 57, 114, 126 |
| abstract_inverted_index.vast | 171 |
| abstract_inverted_index.with | 5 |
| abstract_inverted_index.Since | 97 |
| abstract_inverted_index.among | 130 |
| abstract_inverted_index.data, | 175 |
| abstract_inverted_index.data. | 183 |
| abstract_inverted_index.human | 42 |
| abstract_inverted_index.infer | 45 |
| abstract_inverted_index.input | 81 |
| abstract_inverted_index.large | 166 |
| abstract_inverted_index.model | 74 |
| abstract_inverted_index.novel | 54 |
| abstract_inverted_index.score | 113 |
| abstract_inverted_index.still | 102 |
| abstract_inverted_index.these | 27 |
| abstract_inverted_index.those | 20 |
| abstract_inverted_index.three | 131 |
| abstract_inverted_index.video | 66, 73, 99, 133, 174 |
| abstract_inverted_index.which | 89, 176 |
| abstract_inverted_index.DUSt3R | 142 |
| abstract_inverted_index.avenue | 158 |
| abstract_inverted_index.dense, | 86 |
| abstract_inverted_index.frames | 78 |
| abstract_inverted_index.images | 4 |
| abstract_inverted_index.little | 6 |
| abstract_inverted_index.models | 100, 134, 163, 168 |
| abstract_inverted_index.motion | 105 |
| abstract_inverted_index.priors | 61 |
| abstract_inverted_index.visual | 37, 87 |
| abstract_inverted_index.within | 63 |
| abstract_inverted_index.ability | 43 |
| abstract_inverted_index.amounts | 172 |
| abstract_inverted_index.between | 79 |
| abstract_inverted_index.current | 98 |
| abstract_inverted_index.diverse | 49, 145 |
| abstract_inverted_index.encoded | 62 |
| abstract_inverted_index.images, | 82 |
| abstract_inverted_index.indoor, | 148 |
| abstract_inverted_index.models. | 67 |
| abstract_inverted_index.overlap | 9 |
| abstract_inverted_index.problem | 93 |
| abstract_inverted_index.produce | 103 |
| abstract_inverted_index.project | 186 |
| abstract_inverted_index.propose | 52, 69 |
| abstract_inverted_index.readily | 179 |
| abstract_inverted_index.sampled | 122 |
| abstract_inverted_index.scenes, | 50 |
| abstract_inverted_index.scenes. | 152 |
| abstract_inverted_index.spatial | 46 |
| abstract_inverted_index.suggest | 155 |
| abstract_inverted_index.trained | 21, 169 |
| abstract_inverted_index.videos. | 123 |
| abstract_inverted_index.vision. | 16 |
| abstract_inverted_index.Existing | 17 |
| abstract_inverted_index.Inspired | 39 |
| abstract_inverted_index.Pairwise | 0 |
| abstract_inverted_index.approach | 128 |
| abstract_inverted_index.computer | 15 |
| abstract_inverted_index.creating | 84 |
| abstract_inverted_index.datasets | 146 |
| abstract_inverted_index.findings | 154 |
| abstract_inverted_index.methods, | 18 |
| abstract_inverted_index.outdoor, | 149 |
| abstract_inverted_index.overlap. | 38 |
| abstract_inverted_index.results: | 189 |
| abstract_inverted_index.struggle | 25 |
| abstract_inverted_index.approach, | 55 |
| abstract_inverted_index.available | 180 |
| abstract_inverted_index.challenge | 13 |
| abstract_inverted_index.datasets, | 24 |
| abstract_inverted_index.evaluates | 115 |
| abstract_inverted_index.geometry, | 108 |
| abstract_inverted_index.improving | 160 |
| abstract_inverted_index.introduce | 110 |
| abstract_inverted_index.leverages | 58 |
| abstract_inverted_index.promising | 157 |
| abstract_inverted_index.scenarios | 28 |
| abstract_inverted_index.InterPose, | 56 |
| abstract_inverted_index.consistent | 137 |
| abstract_inverted_index.estimation | 2, 162 |
| abstract_inverted_index.generative | 65, 167 |
| abstract_inverted_index.leveraging | 165 |
| abstract_inverted_index.simplifies | 91 |
| abstract_inverted_index.consistency | 117 |
| abstract_inverted_index.demonstrate | 125 |
| abstract_inverted_index.effectively | 83 |
| abstract_inverted_index.estimation. | 96 |
| abstract_inverted_index.generalizes | 129 |
| abstract_inverted_index.hallucinate | 76 |
| abstract_inverted_index.implausible | 104 |
| abstract_inverted_index.large-scale | 23 |
| abstract_inverted_index.pre-trained | 64 |
| abstract_inverted_index.predictions | 120 |
| abstract_inverted_index.transition, | 88 |
| abstract_inverted_index.encompassing | 147 |
| abstract_inverted_index.identifiable | 34 |
| abstract_inverted_index.improvements | 138 |
| abstract_inverted_index.inconsistent | 107 |
| abstract_inverted_index.intermediate | 77 |
| abstract_inverted_index.relationships | 47 |
| abstract_inverted_index.significantly | 90 |
| abstract_inverted_index.object-centric | 151 |
| abstract_inverted_index.correspondences | 35 |
| abstract_inverted_index.self-consistency | 112 |
| abstract_inverted_index.state-of-the-art | 132, 141 |
| abstract_inverted_index.https://inter-pose.github.io/. | 190 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| citation_normalized_percentile |
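
The `abstract_inverted_index.*` rows above store the abstract as a map from each word to the positions where it occurs, flattened here to one table row per word. As a minimal sketch, the text can be recovered by placing every word at its positions and reading the positions in order. The helper names `parse_rows` and `rebuild_words` are hypothetical, and the sample rows are copied from the table; only ten rows are included, so only the opening words are printed.

```python
PREFIX = "abstract_inverted_index."


def parse_rows(rows):
    """Turn '| abstract_inverted_index.<word> | p1, p2 |' rows into {word: [positions]}."""
    index = {}
    for row in rows:
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        if len(cells) != 2 or not cells[0].startswith(PREFIX):
            continue  # skip rows that are not part of the inverted index
        word = cells[0][len(PREFIX):]
        index[word] = [int(p) for p in cells[1].split(",")]
    return index


def rebuild_words(index):
    """Return the words ordered by position; positions missing from the index are skipped."""
    by_position = {}
    for word, positions in index.items():
        for pos in positions:
            by_position[pos] = word
    return [by_position[pos] for pos in sorted(by_position)]


# A subset of rows copied from the payload table.
rows = [
    "| abstract_inverted_index.or | 7, 36, 106 |",
    "| abstract_inverted_index.no | 8 |",
    "| abstract_inverted_index.from | 3, 48, 121 |",
    "| abstract_inverted_index.pose | 1, 95, 119, 161 |",
    "| abstract_inverted_index.with | 5 |",
    "| abstract_inverted_index.images | 4 |",
    "| abstract_inverted_index.little | 6 |",
    "| abstract_inverted_index.overlap | 9 |",
    "| abstract_inverted_index.Pairwise | 0 |",
    "| abstract_inverted_index.estimation | 2, 162 |",
]

index = parse_rows(rows)
opening = " ".join(rebuild_words(index)[:10])
print(opening)  # Pairwise pose estimation from images with little or no overlap
```

Feeding in every `abstract_inverted_index.*` row and joining all words, rather than the first ten, should reproduce the full abstract shown at the top of this record.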