Geometric Features Enhanced Human-Object Interaction Detection
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2406.18691
Cameras are essential vision instruments for capturing images for pattern detection and measurement. Human-object interaction (HOI) detection is one of the most popular pattern detection approaches for captured human-centric visual scenes. Recently, Transformer-based models have become the dominant approach to HOI detection because their advanced network architectures deliver promising results. However, most of them follow the one-stage design of the vanilla Transformer, leaving rich geometric priors under-exploited and compromising performance, especially under occlusion. Given that geometric features tend to outperform visual ones in occluded scenarios and offer information that complements visual cues, we propose a novel end-to-end Transformer-style HOI detection model, the geometric features enhanced HOI detector (GeoHOI). One key part of the model is UniPointNet, a new unified self-supervised keypoint learning method that bridges the gap in consistent keypoint representation across diverse object categories, including humans. GeoHOI upgrades a Transformer-based HOI detector in two ways: keypoint similarities measure the likelihood of human-object interactions, and local keypoint patches enhance the interaction query representation, which together boost HOI predictions. Extensive experiments show that the proposed method outperforms state-of-the-art models on V-COCO and achieves competitive performance on HICO-DET. Case study results on post-disaster rescue with vision-based instruments showcase the applicability of GeoHOI in real-world applications.
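The abstract does not say how keypoint similarities are computed, so the following is only an illustrative sketch of the general idea of scoring a human-object pair from keypoint descriptors. It assumes each keypoint is represented by a plain feature vector and uses cosine similarity with a best-match average; the names `cosine` and `interaction_score` and the aggregation rule are hypothetical and are not taken from GeoHOI or UniPointNet.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def interaction_score(human_kps, object_kps):
    """Hypothetical pair score: for each human keypoint descriptor, take its
    best cosine match among the object keypoint descriptors, then average."""
    if not human_kps or not object_kps:
        return 0.0
    best = [max(cosine(h, o) for o in object_kps) for h in human_kps]
    return sum(best) / len(best)


# Toy descriptors (assumed 2-D vectors, purely for illustration).
human = [[1.0, 0.0], [0.0, 1.0]]
aligned_object = [[2.0, 0.0], [0.0, 3.0]]
unrelated_object = [[0.0, 0.0]]

print(interaction_score(human, aligned_object))
print(interaction_score(human, unrelated_object))
```

A learned model would replace both the hand-written descriptors and the fixed aggregation with trained components; the sketch only shows how a similarity matrix between two keypoint sets can be reduced to a single likelihood-style score.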
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2406.18691 (PDF: https://arxiv.org/pdf/2406.18691)
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4400141387
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4400141387 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2406.18691 (Digital Object Identifier)
- Title: Geometric Features Enhanced Human-Object Interaction Detection (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-06-26 (full publication date if available)
- Authors: Manli Zhu, Edmond S. L. Ho, Shuang Chen, Longzhi Yang, Hubert P. H. Shum (list of authors in order)
- Landing page: https://arxiv.org/abs/2406.18691 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2406.18691 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2406.18691 (direct OA link when available)
- Concepts: Object (grammar), Computer science, Computer vision, Artificial intelligence (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4400141387 |
| doi | https://doi.org/10.48550/arxiv.2406.18691 |
| ids.doi | https://doi.org/10.48550/arxiv.2406.18691 |
| ids.openalex | https://openalex.org/W4400141387 |
| fwci | |
| type | preprint |
| title | Geometric Features Enhanced Human-Object Interaction Detection |
| awards[0].id | https://openalex.org/G5090036199 |
| awards[0].funder_id | https://openalex.org/F4320334627 |
| awards[0].display_name | |
| awards[0].funder_award_id | EP/X031012/1 |
| awards[0].funder_display_name | Engineering and Physical Sciences Research Council |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10812 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9962999820709229 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Human Pose and Action Recognition |
| topics[1].id | https://openalex.org/T11398 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.991100013256073 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1709 |
| topics[1].subfield.display_name | Human-Computer Interaction |
| topics[1].display_name | Hand Gesture Recognition Systems |
| topics[2].id | https://openalex.org/T11714 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9782999753952026 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Multimodal Machine Learning Applications |
| funders[0].id | https://openalex.org/F4320334627 |
| funders[0].ror | https://ror.org/0439y7842 |
| funders[0].display_name | Engineering and Physical Sciences Research Council |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2781238097 |
| concepts[0].level | 2 |
| concepts[0].score | 0.5795375108718872 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q175026 |
| concepts[0].display_name | Object (grammar) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.5275154709815979 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C31972630 |
| concepts[2].level | 1 |
| concepts[2].score | 0.48984429240226746 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[2].display_name | Computer vision |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.47709447145462036 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| keywords[0].id | https://openalex.org/keywords/object |
| keywords[0].score | 0.5795375108718872 |
| keywords[0].display_name | Object (grammar) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.5275154709815979 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/computer-vision |
| keywords[2].score | 0.48984429240226746 |
| keywords[2].display_name | Computer vision |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.47709447145462036 |
| keywords[3].display_name | Artificial intelligence |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2406.18691 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2406.18691 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2406.18691 |
| locations[1].id | doi:10.48550/arxiv.2406.18691 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2406.18691 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5110210759 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Manli Zhu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Zhu, Manli |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5080180158 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-5862-106X |
| authorships[1].author.display_name | Edmond S. L. Ho |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Ho, Edmond S. L. |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5100443501 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-5452-194X |
| authorships[2].author.display_name | Shuang Chen |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Chen, Shuang |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5065079117 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-2115-4909 |
| authorships[3].author.display_name | Longzhi Yang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Yang, Longzhi |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5038258635 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-5651-6039 |
| authorships[4].author.display_name | Hubert P. H. Shum |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Shum, Hubert P. H. |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2406.18691 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Geometric Features Enhanced Human-Object Interaction Detection |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10812 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9962999820709229 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Human Pose and Action Recognition |
| related_works | https://openalex.org/W2058170566, https://openalex.org/W2755342338, https://openalex.org/W2772917594, https://openalex.org/W2775347418, https://openalex.org/W2166024367, https://openalex.org/W3116076068, https://openalex.org/W2229312674, https://openalex.org/W2951359407, https://openalex.org/W2079911747, https://openalex.org/W1969923398 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2406.18691 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2406.18691 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2406.18691 |
| primary_location.id | pmh:oai:arXiv.org:2406.18691 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2406.18691 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2406.18691 |
| publication_date | 2024-06-26 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 98, 119, 145 |
| abstract_inverted_index.as | 160, 162, 172 |
| abstract_inverted_index.in | 86, 213 |
| abstract_inverted_index.is | 17, 118 |
| abstract_inverted_index.of | 19, 54, 60, 115, 132, 157, 209 |
| abstract_inverted_index.on | 188, 194, 199 |
| abstract_inverted_index.so | 171 |
| abstract_inverted_index.to | 5, 43, 70, 82, 166, 173 |
| abstract_inverted_index.we | 96 |
| abstract_inverted_index.HOI | 40, 102, 109, 147, 175 |
| abstract_inverted_index.One | 112 |
| abstract_inverted_index.and | 11, 48, 68, 89, 190 |
| abstract_inverted_index.are | 1 |
| abstract_inverted_index.due | 42 |
| abstract_inverted_index.for | 8, 26, 39 |
| abstract_inverted_index.gap | 131 |
| abstract_inverted_index.key | 113 |
| abstract_inverted_index.new | 120 |
| abstract_inverted_index.one | 18 |
| abstract_inverted_index.the | 20, 36, 57, 116, 130, 151, 155, 181, 185, 200, 207, 210 |
| abstract_inverted_index.Case | 196 |
| abstract_inverted_index.from | 150 |
| abstract_inverted_index.have | 34 |
| abstract_inverted_index.most | 21, 53 |
| abstract_inverted_index.ones | 85 |
| abstract_inverted_index.part | 114 |
| abstract_inverted_index.rich | 64 |
| abstract_inverted_index.show | 179 |
| abstract_inverted_index.tend | 81 |
| abstract_inverted_index.that | 78, 92, 128, 180 |
| abstract_inverted_index.them | 55 |
| abstract_inverted_index.thus | 49 |
| abstract_inverted_index.well | 161 |
| abstract_inverted_index.when | 74 |
| abstract_inverted_index.with | 203 |
| abstract_inverted_index.(HOI) | 15 |
| abstract_inverted_index.Given | 77 |
| abstract_inverted_index.boost | 174 |
| abstract_inverted_index.cues, | 95 |
| abstract_inverted_index.i.e., | 105 |
| abstract_inverted_index.local | 163 |
| abstract_inverted_index.model | 117 |
| abstract_inverted_index.named | 126 |
| abstract_inverted_index.novel | 99 |
| abstract_inverted_index.offer | 90 |
| abstract_inverted_index.query | 169 |
| abstract_inverted_index.study | 197 |
| abstract_inverted_index.their | 44 |
| abstract_inverted_index.GeoHOI | 142, 212 |
| abstract_inverted_index.V-COCO | 189 |
| abstract_inverted_index.across | 136 |
| abstract_inverted_index.become | 35 |
| abstract_inverted_index.design | 59 |
| abstract_inverted_index.follow | 56 |
| abstract_inverted_index.images | 7 |
| abstract_inverted_index.method | 125, 183 |
| abstract_inverted_index.model, | 104 |
| abstract_inverted_index.models | 33, 187 |
| abstract_inverted_index.object | 138 |
| abstract_inverted_index.priors | 66 |
| abstract_inverted_index.rescue | 202 |
| abstract_inverted_index.vision | 3 |
| abstract_inverted_index.visual | 29, 84, 94 |
| abstract_inverted_index.Cameras | 0 |
| abstract_inverted_index.bridges | 129 |
| abstract_inverted_index.capture | 6 |
| abstract_inverted_index.diverse | 137 |
| abstract_inverted_index.enhance | 167 |
| abstract_inverted_index.humans. | 141 |
| abstract_inverted_index.leading | 69 |
| abstract_inverted_index.leaving | 63 |
| abstract_inverted_index.network | 46 |
| abstract_inverted_index.occurs. | 76 |
| abstract_inverted_index.patches | 165 |
| abstract_inverted_index.pattern | 9, 23 |
| abstract_inverted_index.popular | 22 |
| abstract_inverted_index.propose | 97 |
| abstract_inverted_index.results | 198 |
| abstract_inverted_index.scenes. | 30 |
| abstract_inverted_index.unified | 121 |
| abstract_inverted_index.vanilla | 61 |
| abstract_inverted_index.However, | 52 |
| abstract_inverted_index.achieves | 191 |
| abstract_inverted_index.advanced | 45 |
| abstract_inverted_index.approach | 38 |
| abstract_inverted_index.captured | 27 |
| abstract_inverted_index.detector | 110, 148 |
| abstract_inverted_index.dominant | 37 |
| abstract_inverted_index.enhanced | 108 |
| abstract_inverted_index.features | 80, 107 |
| abstract_inverted_index.keypoint | 123, 134, 164 |
| abstract_inverted_index.learning | 124 |
| abstract_inverted_index.occluded | 87 |
| abstract_inverted_index.proposed | 182, 211 |
| abstract_inverted_index.results. | 51 |
| abstract_inverted_index.showcase | 206 |
| abstract_inverted_index.upgrades | 144 |
| abstract_inverted_index.(GeoHOI). | 111 |
| abstract_inverted_index.Extensive | 177 |
| abstract_inverted_index.HICO-DET. | 195 |
| abstract_inverted_index.Recently, | 31 |
| abstract_inverted_index.detection | 10, 16, 24, 41, 103 |
| abstract_inverted_index.essential | 2 |
| abstract_inverted_index.geometric | 65, 79, 106 |
| abstract_inverted_index.including | 140 |
| abstract_inverted_index.keypoints | 152 |
| abstract_inverted_index.measuring | 154 |
| abstract_inverted_index.occlusion | 75 |
| abstract_inverted_index.one-stage | 58 |
| abstract_inverted_index.promising | 50 |
| abstract_inverted_index.scenarios | 88 |
| abstract_inverted_index.approaches | 25 |
| abstract_inverted_index.benefiting | 149 |
| abstract_inverted_index.consistent | 133 |
| abstract_inverted_index.end-to-end | 100 |
| abstract_inverted_index.especially | 73 |
| abstract_inverted_index.likelihood | 156 |
| abstract_inverted_index.outperform | 83 |
| abstract_inverted_index.real-world | 214 |
| abstract_inverted_index.UniPointNet | 127 |
| abstract_inverted_index.categories, | 139 |
| abstract_inverted_index.competitive | 192 |
| abstract_inverted_index.complements | 93 |
| abstract_inverted_index.compromised | 71 |
| abstract_inverted_index.effectively | 143 |
| abstract_inverted_index.experiments | 178 |
| abstract_inverted_index.information | 91 |
| abstract_inverted_index.instruments | 4, 205 |
| abstract_inverted_index.interaction | 14, 168 |
| abstract_inverted_index.outperforms | 184 |
| abstract_inverted_index.performance | 72, 193 |
| abstract_inverted_index.Human-object | 13 |
| abstract_inverted_index.Transformer, | 62 |
| abstract_inverted_index.human-object | 158 |
| abstract_inverted_index.interactions | 159 |
| abstract_inverted_index.measurement. | 12 |
| abstract_inverted_index.predictions. | 176 |
| abstract_inverted_index.similarities | 153 |
| abstract_inverted_index.vision-based | 204 |
| abstract_inverted_index.applicability | 208 |
| abstract_inverted_index.applications. | 215 |
| abstract_inverted_index.architectures | 47 |
| abstract_inverted_index.human-centric | 28 |
| abstract_inverted_index.post-disaster | 201 |
| abstract_inverted_index.representation | 135 |
| abstract_inverted_index.representation, | 170 |
| abstract_inverted_index.self-supervised | 122 |
| abstract_inverted_index.under-exploited | 67 |
| abstract_inverted_index.state-of-the-art | 186 |
| abstract_inverted_index.Transformer-based | 32, 146 |
| abstract_inverted_index.Transformer-style | 101 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile | |
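
The `abstract_inverted_index.*` rows in the table store the abstract as a map from each token to the positions where it occurs. As a minimal sketch of how the readable abstract can be recovered from such rows, the Python below parses rows in this table's two-column layout and sorts tokens by position. The helper names `parse_rows` and `reconstruct_abstract` are illustrative, not part of any OpenAlex client, and the sample uses only a handful of rows with their first positions.

```python
PREFIX = "abstract_inverted_index."


def parse_rows(rows):
    """Turn '| abstract_inverted_index.<word> | p1, p2 |' rows into {word: [positions]}."""
    index = {}
    for row in rows:
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0].startswith(PREFIX):
            word = cells[0][len(PREFIX):]
            index[word] = [int(pos) for pos in cells[1].split(",")]
    return index


def reconstruct_abstract(inverted_index):
    """Place every word at each of its positions, then join in position order."""
    by_position = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    return " ".join(by_position[pos] for pos in sorted(by_position))


# A small subset of rows (first positions only) from the table above.
sample_rows = [
    "| abstract_inverted_index.images | 7 |",
    "| abstract_inverted_index.Cameras | 0 |",
    "| abstract_inverted_index.capture | 6 |",
    "| abstract_inverted_index.are | 1 |",
    "| abstract_inverted_index.to | 5 |",
    "| abstract_inverted_index.vision | 3 |",
    "| abstract_inverted_index.essential | 2 |",
    "| abstract_inverted_index.instruments | 4 |",
]

print(reconstruct_abstract(parse_rows(sample_rows)))
# prints: Cameras are essential vision instruments to capture images
```

Applied to all of the `abstract_inverted_index.*` rows, the same two functions would yield the full abstract, with punctuation preserved because it is stored as part of each token (for example `cues,` and `HICO-DET.`).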