Object Pose Estimation via the Aggregation of Diffusion Features
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2403.18791
Estimating the pose of objects from images is a crucial task in 3D scene understanding, and recent approaches have shown promising results on very large benchmarks. However, these methods suffer a significant performance drop when dealing with unseen objects. We believe this results from the limited generalizability of image features. To address this problem, we conduct an in-depth analysis of the features of diffusion models, e.g., Stable Diffusion, which hold substantial potential for modeling unseen objects. Based on this analysis, we introduce these diffusion features to object pose estimation. To this end, we propose three distinct architectures that effectively capture and aggregate diffusion features of different granularity, greatly improving the generalizability of object pose estimation. Our approach outperforms state-of-the-art methods by a considerable margin on three popular benchmark datasets: LM, O-LM, and T-LESS. In particular, our method achieves higher accuracy than the previous best methods on unseen objects: 97.9% vs. 93.5% on Unseen LM and 85.9% vs. 76.3% on Unseen O-LM, showing the strong generalizability of our method. Our code is released at https://github.com/Tianfu18/diff-feats-pose.
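The abstract does not spell out the three aggregation architectures. As a rough illustration only, the sketch below shows one generic way to combine features taken from several network layers (different granularities): a softmax-weighted sum of per-layer descriptors, followed by retrieving the closest reference descriptor by cosine similarity. Everything here is an assumption: the function names are hypothetical, each layer's features are assumed to be already pooled and projected to a common dimension, and the nearest-reference retrieval step is a stand-in for however the pose is finally read out. It is not the authors' implementation; the released code is the authoritative source.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_features(layer_feats: Sequence[Sequence[float]],
                       layer_logits: Sequence[float]) -> List[float]:
    """Hypothetical aggregation: weighted sum of per-layer feature vectors.

    Assumes every layer's features were already reduced to one vector of a
    shared dimension; the per-layer logits would be learned in practice.
    """
    if len(layer_feats) != len(layer_logits):
        raise ValueError("need exactly one weight logit per layer")
    dim = len(layer_feats[0])
    if any(len(f) != dim for f in layer_feats):
        raise ValueError("all layer features must share one dimension")
    weights = softmax(layer_logits)
    return [sum(w * f[i] for w, f in zip(weights, layer_feats)) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def best_reference(query: Sequence[float],
                   references: Sequence[Sequence[float]]) -> int:
    """Index of the reference descriptor most cosine-similar to the query."""
    scores = [cosine(query, r) for r in references]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Two toy "layers" with equal weight logits -> equal softmax weights.
    fused = aggregate_features([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    print(fused)  # [0.5, 0.5]
    print(best_reference(fused, [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]))  # 1
```

The softmax keeps the layer weights positive and summing to one, so the fused vector stays on the same scale as its inputs no matter how many layers are combined.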
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2403.18791 (PDF: https://arxiv.org/pdf/2403.18791)
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4393300604
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4393300604 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2403.18791 (Digital Object Identifier)
- Title: Object Pose Estimation via the Aggregation of Diffusion Features (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-03-27 (full publication date if available)
- Authors: Tianfu Wang, Guosheng Hu, Hongguang Wang (list of authors in order)
- Landing page: https://arxiv.org/abs/2403.18791 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2403.18791 (direct link to the full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2403.18791 (direct OA link when available)
- Concepts: Pose, Computer science, Object (grammar), Artificial intelligence, Estimation, Diffusion, Computer vision, Pattern recognition (psychology), Engineering, Physics, Systems engineering, Thermodynamics (top fields/topics attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4393300604 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2403.18791 |
| ids.doi | https://doi.org/10.48550/arxiv.2403.18791 |
| ids.openalex | https://openalex.org/W4393300604 |
| fwci | |
| type | preprint |
| title | Object Pose Estimation via the Aggregation of Diffusion Features |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10812 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9980999827384949 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Human Pose and Action Recognition |
| topics[1].id | https://openalex.org/T12549 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9955000281333923 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Image and Object Detection Techniques |
| topics[2].id | https://openalex.org/T10531 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9918000102043152 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Advanced Vision and Imaging |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C52102323 |
| concepts[0].level | 2 |
| concepts[0].score | 0.6537832021713257 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q1671968 |
| concepts[0].display_name | Pose |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6223524212837219 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C2781238097 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5944611430168152 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q175026 |
| concepts[2].display_name | Object (grammar) |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.5767810344696045 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C96250715 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5293874144554138 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q965330 |
| concepts[4].display_name | Estimation |
| concepts[5].id | https://openalex.org/C69357855 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5215924978256226 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q163214 |
| concepts[5].display_name | Diffusion |
| concepts[6].id | https://openalex.org/C31972630 |
| concepts[6].level | 1 |
| concepts[6].score | 0.48820143938064575 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[6].display_name | Computer vision |
| concepts[7].id | https://openalex.org/C153180895 |
| concepts[7].level | 2 |
| concepts[7].score | 0.33089759945869446 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[7].display_name | Pattern recognition (psychology) |
| concepts[8].id | https://openalex.org/C127413603 |
| concepts[8].level | 0 |
| concepts[8].score | 0.11914035677909851 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[8].display_name | Engineering |
| concepts[9].id | https://openalex.org/C121332964 |
| concepts[9].level | 0 |
| concepts[9].score | 0.07470756769180298 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[9].display_name | Physics |
| concepts[10].id | https://openalex.org/C201995342 |
| concepts[10].level | 1 |
| concepts[10].score | 0.05878084897994995 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q682496 |
| concepts[10].display_name | Systems engineering |
| concepts[11].id | https://openalex.org/C97355855 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q11473 |
| concepts[11].display_name | Thermodynamics |
| keywords[0].id | https://openalex.org/keywords/pose |
| keywords[0].score | 0.6537832021713257 |
| keywords[0].display_name | Pose |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6223524212837219 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/object |
| keywords[2].score | 0.5944611430168152 |
| keywords[2].display_name | Object (grammar) |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.5767810344696045 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/estimation |
| keywords[4].score | 0.5293874144554138 |
| keywords[4].display_name | Estimation |
| keywords[5].id | https://openalex.org/keywords/diffusion |
| keywords[5].score | 0.5215924978256226 |
| keywords[5].display_name | Diffusion |
| keywords[6].id | https://openalex.org/keywords/computer-vision |
| keywords[6].score | 0.48820143938064575 |
| keywords[6].display_name | Computer vision |
| keywords[7].id | https://openalex.org/keywords/pattern-recognition |
| keywords[7].score | 0.33089759945869446 |
| keywords[7].display_name | Pattern recognition (psychology) |
| keywords[8].id | https://openalex.org/keywords/engineering |
| keywords[8].score | 0.11914035677909851 |
| keywords[8].display_name | Engineering |
| keywords[9].id | https://openalex.org/keywords/physics |
| keywords[9].score | 0.07470756769180298 |
| keywords[9].display_name | Physics |
| keywords[10].id | https://openalex.org/keywords/systems-engineering |
| keywords[10].score | 0.05878084897994995 |
| keywords[10].display_name | Systems engineering |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2403.18791 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2403.18791 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2403.18791 |
| locations[1].id | doi:10.48550/arxiv.2403.18791 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2403.18791 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5046543349 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-1248-1214 |
| authorships[0].author.display_name | Tianfu Wang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Wang, Tianfu |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5075333422 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-9448-9892 |
| authorships[1].author.display_name | Guosheng Hu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Hu, Guosheng |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5004967576 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-8994-4523 |
| authorships[2].author.display_name | Hongguang Wang |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Wang, Hongguang |
| authorships[2].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2403.18791 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Object Pose Estimation via the Aggregation of Diffusion Features |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10812 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9980999827384949 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Human Pose and Action Recognition |
| related_works | https://openalex.org/W2123263858, https://openalex.org/W3127959533, https://openalex.org/W2894986065, https://openalex.org/W4387967917, https://openalex.org/W4287600488, https://openalex.org/W4386925306, https://openalex.org/W4387968151, https://openalex.org/W3132124459, https://openalex.org/W2946083937, https://openalex.org/W3110557940 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2403.18791 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2403.18791 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2403.18791 |
| primary_location.id | pmh:oai:arXiv.org:2403.18791 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2403.18791 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2403.18791 |
| publication_date | 2024-03-27 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 8, 30, 126 |
| abstract_inverted_index.3D | 12 |
| abstract_inverted_index.In | 138 |
| abstract_inverted_index.To | 51, 92 |
| abstract_inverted_index.We | 39 |
| abstract_inverted_index.an | 57 |
| abstract_inverted_index.at | 176 |
| abstract_inverted_index.by | 125 |
| abstract_inverted_index.is | 7, 174 |
| abstract_inverted_index.it | 42 |
| abstract_inverted_index.of | 3, 11, 48, 63, 108, 115, 169 |
| abstract_inverted_index.on | 22, 60, 78, 129, 150, 156, 162 |
| abstract_inverted_index.we | 55, 81, 95 |
| abstract_inverted_index.LM, | 134, 158 |
| abstract_inverted_index.Our | 119, 172 |
| abstract_inverted_index.and | 15, 104, 136 |
| abstract_inverted_index.can | 101 |
| abstract_inverted_index.for | 73, 88 |
| abstract_inverted_index.our | 140, 170 |
| abstract_inverted_index.the | 1, 45, 61, 113, 122, 146, 166 |
| abstract_inverted_index.vs. | 154, 160 |
| abstract_inverted_index.arts | 149 |
| abstract_inverted_index.best | 148 |
| abstract_inverted_index.code | 173 |
| abstract_inverted_index.drop | 33 |
| abstract_inverted_index.e.g. | 66 |
| abstract_inverted_index.from | 5, 44 |
| abstract_inverted_index.have | 18, 56 |
| abstract_inverted_index.hold | 70 |
| abstract_inverted_index.pose | 2, 90, 117 |
| abstract_inverted_index.task | 10 |
| abstract_inverted_index.than | 145 |
| abstract_inverted_index.that | 41, 100 |
| abstract_inverted_index.then | 82 |
| abstract_inverted_index.this | 53, 79 |
| abstract_inverted_index.very | 23 |
| abstract_inverted_index.when | 34 |
| abstract_inverted_index.with | 36 |
| abstract_inverted_index.76.3% | 161 |
| abstract_inverted_index.85.9% | 159 |
| abstract_inverted_index.93.5% | 155 |
| abstract_inverted_index.97.9% | 153 |
| abstract_inverted_index.Based | 77 |
| abstract_inverted_index.O-LM, | 135, 164 |
| abstract_inverted_index.image | 49 |
| abstract_inverted_index.large | 24 |
| abstract_inverted_index.scene | 13 |
| abstract_inverted_index.shown | 19 |
| abstract_inverted_index.these | 27, 85 |
| abstract_inverted_index.this, | 94 |
| abstract_inverted_index.three | 97, 130 |
| abstract_inverted_index.which | 69 |
| abstract_inverted_index.Stable | 67 |
| abstract_inverted_index.Unseen | 157, 163 |
| abstract_inverted_index.higher | 143 |
| abstract_inverted_index.images | 6 |
| abstract_inverted_index.margin | 128 |
| abstract_inverted_index.method | 141 |
| abstract_inverted_index.object | 89, 116 |
| abstract_inverted_index.recent | 16 |
| abstract_inverted_index.strong | 167 |
| abstract_inverted_index.unseen | 37, 75, 151 |
| abstract_inverted_index.T-LESS. | 137 |
| abstract_inverted_index.achieve | 93 |
| abstract_inverted_index.address | 52 |
| abstract_inverted_index.believe | 40 |
| abstract_inverted_index.capture | 103 |
| abstract_inverted_index.crucial | 9 |
| abstract_inverted_index.dealing | 35 |
| abstract_inverted_index.greatly | 111 |
| abstract_inverted_index.limited | 46 |
| abstract_inverted_index.method. | 171 |
| abstract_inverted_index.methods | 28, 124 |
| abstract_inverted_index.models, | 65 |
| abstract_inverted_index.objects | 4 |
| abstract_inverted_index.popular | 131 |
| abstract_inverted_index.propose | 96 |
| abstract_inverted_index.results | 21, 43 |
| abstract_inverted_index.showing | 165 |
| abstract_inverted_index.However, | 26 |
| abstract_inverted_index.accuracy | 144 |
| abstract_inverted_index.achieves | 142 |
| abstract_inverted_index.analysis | 59 |
| abstract_inverted_index.approach | 120 |
| abstract_inverted_index.distinct | 98 |
| abstract_inverted_index.features | 62, 87, 107 |
| abstract_inverted_index.in-depth | 58 |
| abstract_inverted_index.modeling | 74 |
| abstract_inverted_index.objects. | 38, 76 |
| abstract_inverted_index.objects: | 152 |
| abstract_inverted_index.previous | 147 |
| abstract_inverted_index.problem, | 54 |
| abstract_inverted_index.released | 175 |
| abstract_inverted_index.aggregate | 105 |
| abstract_inverted_index.analysis, | 80 |
| abstract_inverted_index.benchmark | 132 |
| abstract_inverted_index.datasets, | 133 |
| abstract_inverted_index.different | 109 |
| abstract_inverted_index.diffusion | 64, 86, 106 |
| abstract_inverted_index.features. | 50 |
| abstract_inverted_index.improving | 112 |
| abstract_inverted_index.introduce | 84 |
| abstract_inverted_index.potential | 72 |
| abstract_inverted_index.promising | 20 |
| abstract_inverted_index.Diffusion, | 68 |
| abstract_inverted_index.Estimating | 0 |
| abstract_inverted_index.approaches | 17 |
| abstract_inverted_index.experience | 29 |
| abstract_inverted_index.benchmarks. | 25 |
| abstract_inverted_index.effectively | 102 |
| abstract_inverted_index.estimation. | 91, 118 |
| abstract_inverted_index.outperforms | 121 |
| abstract_inverted_index.particular, | 139 |
| abstract_inverted_index.performance | 32 |
| abstract_inverted_index.significant | 31 |
| abstract_inverted_index.substantial | 71 |
| abstract_inverted_index.considerable | 127 |
| abstract_inverted_index.granularity, | 110 |
| abstract_inverted_index.innovatively | 83 |
| abstract_inverted_index.architectures | 99 |
| abstract_inverted_index.understanding, | 14 |
| abstract_inverted_index.generalizability | 47, 114, 168 |
| abstract_inverted_index.state-of-the-art | 123 |
| abstract_inverted_index.https://github.com/Tianfu18/diff-feats-pose. | 177 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile | |
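The `abstract_inverted_index.*` rows store the abstract in OpenAlex's inverted form: each word maps to the zero-based positions at which it occurs (for example, `pose` appears at positions 2, 90, and 117, and `Estimating` at position 0). Below is a minimal sketch for turning such an index back into running text. The function names are hypothetical; it assumes the index is available as a Python dict of word to list of positions, as in OpenAlex's JSON, or has been parsed from the comma-separated cells of this table.

```python
from typing import Dict, List


def parse_positions(cell: str) -> List[int]:
    """Parse a table cell such as '8, 30, 126' into [8, 30, 126]."""
    return [int(p) for p in cell.split(",") if p.strip()]


def reconstruct_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild plain text from a word -> positions inverted index.

    Punctuation is stored attached to its word (e.g. 'LM,' or 'T-LESS.'),
    so joining tokens with single spaces yields readable text. Any position
    missing from the index is simply skipped.
    """
    by_position: Dict[int, str] = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    return " ".join(by_position[pos] for pos in sorted(by_position))


if __name__ == "__main__":
    # First eight tokens of this record's index, copied from the table.
    sample = {
        "Estimating": [0], "the": [1], "pose": [2], "of": [3],
        "objects": [4], "from": [5], "images": [6], "is": [7],
    }
    print(reconstruct_abstract(sample))
    # Estimating the pose of objects from images is
```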