Object Pose Estimation via the Aggregation of Diffusion Features

Tianfu Wang, Guosheng Hu, Hongguang Wang

2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2403.18791

Estimating the pose of objects from images is a crucial task in 3D scene understanding, and recent approaches have shown promising results on very large benchmarks. However, these methods experience a significant performance drop when dealing with unseen objects. We believe that this drop results from the limited generalizability of image features. To address this problem, we conduct an in-depth analysis of the features of diffusion models, e.g. Stable Diffusion, which hold substantial potential for modeling unseen objects. Based on this analysis, we introduce these diffusion features for object pose estimation. To achieve this, we propose three distinct architectures that can effectively capture and aggregate diffusion features of different granularity, greatly improving the generalizability of object pose estimation. Our approach outperforms the state-of-the-art methods by a considerable margin on three popular benchmark datasets: LM, O-LM, and T-LESS. In particular, our method achieves higher accuracy than the previous best methods on unseen objects: 97.9% vs. 93.5% on Unseen LM and 85.9% vs. 76.3% on Unseen O-LM, showing the strong generalizability of our method. Our code is released at https://github.com/Tianfu18/diff-feats-pose.

Related Topics

Computer Science

Artificial Intelligence

Concepts

Pose, Computer science, Object (grammar), Artificial intelligence, Estimation, Diffusion, Computer vision, Pattern recognition (psychology), Engineering, Physics, Systems engineering, Thermodynamics

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2403.18791
PDF: https://arxiv.org/pdf/2403.18791
OA Status: green
Related Works: 10
OpenAlex ID: https://openalex.org/W4393300604

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W4393300604
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.48550/arxiv.2403.18791
Digital Object Identifier

Title: Object Pose Estimation via the Aggregation of Diffusion Features
Work title

Type: preprint
OpenAlex work type

Language: en
Primary language

Publication year: 2024
Year of publication

Publication date: 2024-03-27
Full publication date if available

Authors: Tianfu Wang, Guosheng Hu, Hongguang Wang
List of authors in order

Landing page: https://arxiv.org/abs/2403.18791
Publisher landing page

PDF URL: https://arxiv.org/pdf/2403.18791
Direct link to full text PDF

Open access: Yes
Whether a free full text is available

OA status: green
Open access status per OpenAlex

OA URL: https://arxiv.org/pdf/2403.18791
Direct OA link when available

Concepts: Pose, Computer science, Object (grammar), Artificial intelligence, Estimation, Diffusion, Computer vision, Pattern recognition (psychology), Engineering, Physics, Systems engineering, Thermodynamics
Top concepts (fields/topics) attached by OpenAlex

Cited by: 0
Total citation count in OpenAlex

Related works (count): 10
Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W4393300604
doi	https://doi.org/10.48550/arxiv.2403.18791
ids.doi	https://doi.org/10.48550/arxiv.2403.18791
ids.openalex	https://openalex.org/W4393300604
fwci
type	preprint
title	Object Pose Estimation via the Aggregation of Diffusion Features
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T10812
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	0.9980999827384949
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1707
topics[0].subfield.display_name	Computer Vision and Pattern Recognition
topics[0].display_name	Human Pose and Action Recognition
topics[1].id	https://openalex.org/T12549
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.9955000281333923
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1707
topics[1].subfield.display_name	Computer Vision and Pattern Recognition
topics[1].display_name	Image and Object Detection Techniques
topics[2].id	https://openalex.org/T10531
topics[2].field.id	https://openalex.org/fields/17
topics[2].field.display_name	Computer Science
topics[2].score	0.9918000102043152
topics[2].domain.id	https://openalex.org/domains/3
topics[2].domain.display_name	Physical Sciences
topics[2].subfield.id	https://openalex.org/subfields/1707
topics[2].subfield.display_name	Computer Vision and Pattern Recognition
topics[2].display_name	Advanced Vision and Imaging
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C52102323
concepts[0].level	2
concepts[0].score	0.6537832021713257
concepts[0].wikidata	https://www.wikidata.org/wiki/Q1671968
concepts[0].display_name	Pose
concepts[1].id	https://openalex.org/C41008148
concepts[1].level	0
concepts[1].score	0.6223524212837219
concepts[1].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[1].display_name	Computer science
concepts[2].id	https://openalex.org/C2781238097
concepts[2].level	2
concepts[2].score	0.5944611430168152
concepts[2].wikidata	https://www.wikidata.org/wiki/Q175026
concepts[2].display_name	Object (grammar)
concepts[3].id	https://openalex.org/C154945302
concepts[3].level	1
concepts[3].score	0.5767810344696045
concepts[3].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[3].display_name	Artificial intelligence
concepts[4].id	https://openalex.org/C96250715
concepts[4].level	2
concepts[4].score	0.5293874144554138
concepts[4].wikidata	https://www.wikidata.org/wiki/Q965330
concepts[4].display_name	Estimation
concepts[5].id	https://openalex.org/C69357855
concepts[5].level	2
concepts[5].score	0.5215924978256226
concepts[5].wikidata	https://www.wikidata.org/wiki/Q163214
concepts[5].display_name	Diffusion
concepts[6].id	https://openalex.org/C31972630
concepts[6].level	1
concepts[6].score	0.48820143938064575
concepts[6].wikidata	https://www.wikidata.org/wiki/Q844240
concepts[6].display_name	Computer vision
concepts[7].id	https://openalex.org/C153180895
concepts[7].level	2
concepts[7].score	0.33089759945869446
concepts[7].wikidata	https://www.wikidata.org/wiki/Q7148389
concepts[7].display_name	Pattern recognition (psychology)
concepts[8].id	https://openalex.org/C127413603
concepts[8].level	0
concepts[8].score	0.11914035677909851
concepts[8].wikidata	https://www.wikidata.org/wiki/Q11023
concepts[8].display_name	Engineering
concepts[9].id	https://openalex.org/C121332964
concepts[9].level	0
concepts[9].score	0.07470756769180298
concepts[9].wikidata	https://www.wikidata.org/wiki/Q413
concepts[9].display_name	Physics
concepts[10].id	https://openalex.org/C201995342
concepts[10].level	1
concepts[10].score	0.05878084897994995
concepts[10].wikidata	https://www.wikidata.org/wiki/Q682496
concepts[10].display_name	Systems engineering
concepts[11].id	https://openalex.org/C97355855
concepts[11].level	1
concepts[11].score	0.0
concepts[11].wikidata	https://www.wikidata.org/wiki/Q11473
concepts[11].display_name	Thermodynamics
keywords[0].id	https://openalex.org/keywords/pose
keywords[0].score	0.6537832021713257
keywords[0].display_name	Pose
keywords[1].id	https://openalex.org/keywords/computer-science
keywords[1].score	0.6223524212837219
keywords[1].display_name	Computer science
keywords[2].id	https://openalex.org/keywords/object
keywords[2].score	0.5944611430168152
keywords[2].display_name	Object (grammar)
keywords[3].id	https://openalex.org/keywords/artificial-intelligence
keywords[3].score	0.5767810344696045
keywords[3].display_name	Artificial intelligence
keywords[4].id	https://openalex.org/keywords/estimation
keywords[4].score	0.5293874144554138
keywords[4].display_name	Estimation
keywords[5].id	https://openalex.org/keywords/diffusion
keywords[5].score	0.5215924978256226
keywords[5].display_name	Diffusion
keywords[6].id	https://openalex.org/keywords/computer-vision
keywords[6].score	0.48820143938064575
keywords[6].display_name	Computer vision
keywords[7].id	https://openalex.org/keywords/pattern-recognition
keywords[7].score	0.33089759945869446
keywords[7].display_name	Pattern recognition (psychology)
keywords[8].id	https://openalex.org/keywords/engineering
keywords[8].score	0.11914035677909851
keywords[8].display_name	Engineering
keywords[9].id	https://openalex.org/keywords/physics
keywords[9].score	0.07470756769180298
keywords[9].display_name	Physics
keywords[10].id	https://openalex.org/keywords/systems-engineering
keywords[10].score	0.05878084897994995
keywords[10].display_name	Systems engineering
language	en
locations[0].id	pmh:oai:arXiv.org:2403.18791
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2403.18791
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2403.18791
locations[1].id	doi:10.48550/arxiv.2403.18791
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2403.18791
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5046543349
authorships[0].author.orcid	https://orcid.org/0000-0002-1248-1214
authorships[0].author.display_name	Tianfu Wang
authorships[0].author_position	first
authorships[0].raw_author_name	Wang, Tianfu
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5075333422
authorships[1].author.orcid	https://orcid.org/0000-0002-9448-9892
authorships[1].author.display_name	Guosheng Hu
authorships[1].author_position	middle
authorships[1].raw_author_name	Hu, Guosheng
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5004967576
authorships[2].author.orcid	https://orcid.org/0000-0001-8994-4523
authorships[2].author.display_name	Hongguang Wang
authorships[2].author_position	last
authorships[2].raw_author_name	Wang, Hongguang
authorships[2].is_corresponding	False
has_content.pdf	True
has_content.grobid_xml	True
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2403.18791
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	Object Pose Estimation via the Aggregation of Diffusion Features
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T10812
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	0.9980999827384949
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1707
primary_topic.subfield.display_name	Computer Vision and Pattern Recognition
primary_topic.display_name	Human Pose and Action Recognition
related_works	https://openalex.org/W2123263858, https://openalex.org/W3127959533, https://openalex.org/W2894986065, https://openalex.org/W4387967917, https://openalex.org/W4287600488, https://openalex.org/W4386925306, https://openalex.org/W4387968151, https://openalex.org/W3132124459, https://openalex.org/W2946083937, https://openalex.org/W3110557940
cited_by_count	0
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2403.18791
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2403.18791
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2403.18791
primary_location.id	pmh:oai:arXiv.org:2403.18791
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2403.18791
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2403.18791
publication_date	2024-03-27
publication_year	2024
referenced_works_count	0
abstract_inverted_index.a	8, 30, 126
abstract_inverted_index.3D	12
abstract_inverted_index.In	138
abstract_inverted_index.To	51, 92
abstract_inverted_index.We	39
abstract_inverted_index.an	57
abstract_inverted_index.at	176
abstract_inverted_index.by	125
abstract_inverted_index.is	7, 174
abstract_inverted_index.it	42
abstract_inverted_index.of	3, 11, 48, 63, 108, 115, 169
abstract_inverted_index.on	22, 60, 78, 129, 150, 156, 162
abstract_inverted_index.we	55, 81, 95
abstract_inverted_index.LM,	134, 158
abstract_inverted_index.Our	119, 172
abstract_inverted_index.and	15, 104, 136
abstract_inverted_index.can	101
abstract_inverted_index.for	73, 88
abstract_inverted_index.our	140, 170
abstract_inverted_index.the	1, 45, 61, 113, 122, 146, 166
abstract_inverted_index.vs.	154, 160
abstract_inverted_index.arts	149
abstract_inverted_index.best	148
abstract_inverted_index.code	173
abstract_inverted_index.drop	33
abstract_inverted_index.e.g.	66
abstract_inverted_index.from	5, 44
abstract_inverted_index.have	18, 56
abstract_inverted_index.hold	70
abstract_inverted_index.pose	2, 90, 117
abstract_inverted_index.task	10
abstract_inverted_index.than	145
abstract_inverted_index.that	41, 100
abstract_inverted_index.then	82
abstract_inverted_index.this	53, 79
abstract_inverted_index.very	23
abstract_inverted_index.when	34
abstract_inverted_index.with	36
abstract_inverted_index.76.3%	161
abstract_inverted_index.85.9%	159
abstract_inverted_index.93.5%	155
abstract_inverted_index.97.9%	153
abstract_inverted_index.Based	77
abstract_inverted_index.O-LM,	135, 164
abstract_inverted_index.image	49
abstract_inverted_index.large	24
abstract_inverted_index.scene	13
abstract_inverted_index.shown	19
abstract_inverted_index.these	27, 85
abstract_inverted_index.this,	94
abstract_inverted_index.three	97, 130
abstract_inverted_index.which	69
abstract_inverted_index.Stable	67
abstract_inverted_index.Unseen	157, 163
abstract_inverted_index.higher	143
abstract_inverted_index.images	6
abstract_inverted_index.margin	128
abstract_inverted_index.method	141
abstract_inverted_index.object	89, 116
abstract_inverted_index.recent	16
abstract_inverted_index.strong	167
abstract_inverted_index.unseen	37, 75, 151
abstract_inverted_index.T-LESS.	137
abstract_inverted_index.achieve	93
abstract_inverted_index.address	52
abstract_inverted_index.believe	40
abstract_inverted_index.capture	103
abstract_inverted_index.crucial	9
abstract_inverted_index.dealing	35
abstract_inverted_index.greatly	111
abstract_inverted_index.limited	46
abstract_inverted_index.method.	171
abstract_inverted_index.methods	28, 124
abstract_inverted_index.models,	65
abstract_inverted_index.objects	4
abstract_inverted_index.popular	131
abstract_inverted_index.propose	96
abstract_inverted_index.results	21, 43
abstract_inverted_index.showing	165
abstract_inverted_index.However,	26
abstract_inverted_index.accuracy	144
abstract_inverted_index.achieves	142
abstract_inverted_index.analysis	59
abstract_inverted_index.approach	120
abstract_inverted_index.distinct	98
abstract_inverted_index.features	62, 87, 107
abstract_inverted_index.in-depth	58
abstract_inverted_index.modeling	74
abstract_inverted_index.objects.	38, 76
abstract_inverted_index.objects:	152
abstract_inverted_index.previous	147
abstract_inverted_index.problem,	54
abstract_inverted_index.released	175
abstract_inverted_index.aggregate	105
abstract_inverted_index.analysis,	80
abstract_inverted_index.benchmark	132
abstract_inverted_index.datasets,	133
abstract_inverted_index.different	109
abstract_inverted_index.diffusion	64, 86, 106
abstract_inverted_index.features.	50
abstract_inverted_index.improving	112
abstract_inverted_index.introduce	84
abstract_inverted_index.potential	72
abstract_inverted_index.promising	20
abstract_inverted_index.Diffusion,	68
abstract_inverted_index.Estimating	0
abstract_inverted_index.approaches	17
abstract_inverted_index.experience	29
abstract_inverted_index.benchmarks.	25
abstract_inverted_index.effectively	102
abstract_inverted_index.estimation.	91, 118
abstract_inverted_index.outperforms	121
abstract_inverted_index.particular,	139
abstract_inverted_index.performance	32
abstract_inverted_index.significant	31
abstract_inverted_index.substantial	71
abstract_inverted_index.considerable	127
abstract_inverted_index.granularity,	110
abstract_inverted_index.innovatively	83
abstract_inverted_index.architectures	99
abstract_inverted_index.understanding,	14
abstract_inverted_index.generalizability	47, 114, 168
abstract_inverted_index.state-of-the-art	123
abstract_inverted_index.https://github.com/Tianfu18/diff-feats-pose.	177
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	3
citation_normalized_percentile
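
The abstract_inverted_index rows in the payload above store the abstract as a mapping from each token to the word positions where it occurs. The sketch below shows one way such a mapping could be turned back into running text. The function name rebuild_abstract and the trimmed sample dictionary are illustrative assumptions, not part of OpenAlex; the sample keeps only the first seven positions (0 to 6) taken from the rows above.

```python
def rebuild_abstract(inverted_index):
    """Rebuild plain text from a {token: [positions]} mapping.

    Hypothetical helper for illustration: it places every token at each
    of its positions, then joins the tokens in position order.
    """
    by_position = {}
    for token, positions in inverted_index.items():
        for position in positions:
            by_position[position] = token
    return " ".join(by_position[p] for p in sorted(by_position))


# Trimmed sample: positions 0-6 only, copied from the payload rows above.
sample = {
    "Estimating": [0],
    "the": [1],
    "pose": [2],
    "of": [3],
    "objects": [4],
    "from": [5],
    "images": [6],
}

print(rebuild_abstract(sample))
# prints: Estimating the pose of objects from images
```

Applied to the full set of rows (with each comma-separated position list parsed into integers), the same approach should reproduce the abstract shown at the top of this record, punctuation included, since punctuation is stored as part of each token.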