Disentangled Pre-training for Human-Object Interaction Detection
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2404.01725
Human-object interaction (HOI) detection has long been limited by the amount of supervised data available. Recent approaches address this issue by pre-training on pseudo-labels, which align object regions with HOI triplets parsed from image captions. However, pseudo-labeling is tricky and noisy, making HOI pre-training a complex process. Therefore, we propose an efficient disentangled pre-training method for HOI detection (DP-HOI) to address this problem. First, DP-HOI utilizes object detection and action recognition datasets to pre-train the detection and interaction decoder layers, respectively. Then, we arrange these decoder layers so that the pre-training architecture is consistent with the downstream HOI detection task, which facilitates efficient knowledge transfer. Specifically, the detection decoder identifies reliable human instances in each image of the action recognition datasets, generates one corresponding query per instance, and feeds it into the interaction decoder for verb classification. Next, we combine the verb predictions of the human instances in the same image and impose image-level supervision. The DP-HOI structure can be easily adapted to the HOI detection task, enabling effective model parameter initialization. As a result, it significantly enhances the performance of existing HOI detection models on a broad range of rare categories. The code and pre-trained weights are available at https://github.com/xingaoli/DP-HOI.
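The image-level supervision step described in the abstract can be illustrated with a small sketch. The abstract does not say how the per-instance verb predictions are combined, so the max-over-instances rule and the binary cross-entropy loss below are assumptions chosen for illustration, and the function names are hypothetical rather than taken from the DP-HOI code base.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def image_level_verb_probs(instance_logits):
    """Combine per-human verb logits into one probability per verb.

    instance_logits: one row per detected human, one column per verb.
    ASSUMPTION: a verb is present in the image if at least one human
    performs it, so we take the max probability over instances.
    """
    if not instance_logits:
        raise ValueError("need at least one human instance")
    num_verbs = len(instance_logits[0])
    return [
        max(sigmoid(row[v]) for row in instance_logits)
        for v in range(num_verbs)
    ]


def image_level_bce(instance_logits, image_labels, eps=1e-7):
    """Mean binary cross-entropy between the combined predictions and the
    image-level verb labels (ASSUMPTION: multi-label 0/1 targets)."""
    probs = image_level_verb_probs(instance_logits)
    total = 0.0
    for p, y in zip(probs, image_labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(image_labels)


# Two detected humans, two verb classes; the image is labeled with verb 0 only.
logits = [[0.0, -2.0], [2.0, -1.0]]
loss = image_level_bce(logits, [1, 0])
print(f"image-level loss: {loss:.4f}")
```

Because only image-level labels are needed, a loss of this kind can be computed on action recognition images that carry no per-person annotation.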
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2404.01725, https://arxiv.org/pdf/2404.01725
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4393929690
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4393929690 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2404.01725 (Digital Object Identifier)
- Title: Disentangled Pre-training for Human-Object Interaction Detection (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2024 (Year of publication)
- Publication date: 2024-04-02 (Full publication date if available)
- Authors: Zhuolong Li, Xing’ao Li, Changxing Ding, Xiangmin Xu (List of authors in order)
- Landing page: https://arxiv.org/abs/2404.01725 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/2404.01725 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2404.01725 (Direct OA link when available)
- Concepts: Training (meteorology), Computer science, Object (grammar), Artificial intelligence, Computer vision, Geography, Meteorology (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 0 (Total citation count in OpenAlex)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4393929690 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2404.01725 |
| ids.doi | https://doi.org/10.48550/arxiv.2404.01725 |
| ids.openalex | https://openalex.org/W4393929690 |
| fwci | |
| type | preprint |
| title | Disentangled Pre-training for Human-Object Interaction Detection |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10812 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9750999808311462 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Human Pose and Action Recognition |
| topics[1].id | https://openalex.org/T10036 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9598000049591064 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Advanced Neural Network Applications |
| topics[2].id | https://openalex.org/T11398 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9434999823570251 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1709 |
| topics[2].subfield.display_name | Human-Computer Interaction |
| topics[2].display_name | Hand Gesture Recognition Systems |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2777211547 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7344118356704712 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q17141490 |
| concepts[0].display_name | Training (meteorology) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.5402691960334778 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C2781238097 |
| concepts[2].level | 2 |
| concepts[2].score | 0.4979398250579834 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q175026 |
| concepts[2].display_name | Object (grammar) |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.46604621410369873 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C31972630 |
| concepts[4].level | 1 |
| concepts[4].score | 0.34477341175079346 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[4].display_name | Computer vision |
| concepts[5].id | https://openalex.org/C205649164 |
| concepts[5].level | 0 |
| concepts[5].score | 0.10220429301261902 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q1071 |
| concepts[5].display_name | Geography |
| concepts[6].id | https://openalex.org/C153294291 |
| concepts[6].level | 1 |
| concepts[6].score | 0.0 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q25261 |
| concepts[6].display_name | Meteorology |
| keywords[0].id | https://openalex.org/keywords/training |
| keywords[0].score | 0.7344118356704712 |
| keywords[0].display_name | Training (meteorology) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.5402691960334778 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/object |
| keywords[2].score | 0.4979398250579834 |
| keywords[2].display_name | Object (grammar) |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.46604621410369873 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/computer-vision |
| keywords[4].score | 0.34477341175079346 |
| keywords[4].display_name | Computer vision |
| keywords[5].id | https://openalex.org/keywords/geography |
| keywords[5].score | 0.10220429301261902 |
| keywords[5].display_name | Geography |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2404.01725 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2404.01725 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2404.01725 |
| locations[1].id | doi:10.48550/arxiv.2404.01725 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2404.01725 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5037772907 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Zhuolong Li |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Li, Zhuolong |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5032087629 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-7016-8883 |
| authorships[1].author.display_name | Xing’ao Li |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Li, Xingao |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5038748720 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-7232-3181 |
| authorships[2].author.display_name | Changxing Ding |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Ding, Changxing |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5007354180 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-4573-5820 |
| authorships[3].author.display_name | Xiangmin Xu |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Xu, Xiangmin |
| authorships[3].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2404.01725 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Disentangled Pre-training for Human-Object Interaction Detection |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10812 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9750999808311462 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Human Pose and Action Recognition |
| related_works | https://openalex.org/W2058170566, https://openalex.org/W2755342338, https://openalex.org/W2772917594, https://openalex.org/W2775347418, https://openalex.org/W2166024367, https://openalex.org/W3116076068, https://openalex.org/W2229312674, https://openalex.org/W2951359407, https://openalex.org/W2079911747, https://openalex.org/W1969923398 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2404.01725 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2404.01725 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2404.01725 |
| primary_location.id | pmh:oai:arXiv.org:2404.01725 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2404.01725 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2404.01725 |
| publication_date | 2024-04-02 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 45, 179 |
| abstract_inverted_index.an | 51 |
| abstract_inverted_index.at | 192 |
| abstract_inverted_index.be | 154 |
| abstract_inverted_index.by | 8, 20 |
| abstract_inverted_index.in | 114, 142 |
| abstract_inverted_index.is | 38, 93 |
| abstract_inverted_index.it | 126, 168 |
| abstract_inverted_index.of | 11, 173, 182 |
| abstract_inverted_index.on | 178 |
| abstract_inverted_index.so | 88 |
| abstract_inverted_index.to | 23, 60, 73, 157 |
| abstract_inverted_index.we | 49, 83, 135 |
| abstract_inverted_index.HOI | 30, 43, 57, 98, 159, 175 |
| abstract_inverted_index.The | 150, 185 |
| abstract_inverted_index.and | 40, 69, 77, 124, 146, 187 |
| abstract_inverted_index.are | 190 |
| abstract_inverted_index.can | 153 |
| abstract_inverted_index.for | 56, 131 |
| abstract_inverted_index.has | 4 |
| abstract_inverted_index.one | 121 |
| abstract_inverted_index.the | 9, 75, 90, 96, 107, 128, 137, 143, 158, 171 |
| abstract_inverted_index.This | 101 |
| abstract_inverted_index.been | 6 |
| abstract_inverted_index.code | 186 |
| abstract_inverted_index.data | 13 |
| abstract_inverted_index.each | 115 |
| abstract_inverted_index.from | 33 |
| abstract_inverted_index.into | 127 |
| abstract_inverted_index.long | 5 |
| abstract_inverted_index.rare | 183 |
| abstract_inverted_index.same | 144 |
| abstract_inverted_index.that | 89 |
| abstract_inverted_index.this | 18, 62 |
| abstract_inverted_index.verb | 132, 140 |
| abstract_inverted_index.with | 29, 95 |
| abstract_inverted_index.(HOI) | 3 |
| abstract_inverted_index.Next, | 134 |
| abstract_inverted_index.Then, | 82 |
| abstract_inverted_index.align | 26 |
| abstract_inverted_index.broad | 180 |
| abstract_inverted_index.feeds | 125 |
| abstract_inverted_index.human | 112, 138 |
| abstract_inverted_index.image | 34, 145 |
| abstract_inverted_index.issue | 19 |
| abstract_inverted_index.model | 164 |
| abstract_inverted_index.range | 181 |
| abstract_inverted_index.task, | 161 |
| abstract_inverted_index.task. | 100 |
| abstract_inverted_index.these | 85 |
| abstract_inverted_index.which | 25 |
| abstract_inverted_index.DP-HOI | 65, 151 |
| abstract_inverted_index.First, | 64 |
| abstract_inverted_index.Recent | 15 |
| abstract_inverted_index.action | 70, 116 |
| abstract_inverted_index.amount | 10 |
| abstract_inverted_index.easily | 155 |
| abstract_inverted_index.image, | 119 |
| abstract_inverted_index.impose | 147 |
| abstract_inverted_index.layers | 87 |
| abstract_inverted_index.making | 42 |
| abstract_inverted_index.method | 55 |
| abstract_inverted_index.models | 177 |
| abstract_inverted_index.noisy, | 41 |
| abstract_inverted_index.object | 27, 67 |
| abstract_inverted_index.parsed | 32 |
| abstract_inverted_index.query, | 123 |
| abstract_inverted_index.tricky | 39 |
| abstract_inverted_index.weight | 189 |
| abstract_inverted_index.adapted | 156 |
| abstract_inverted_index.address | 17, 61 |
| abstract_inverted_index.arrange | 84 |
| abstract_inverted_index.combine | 136 |
| abstract_inverted_index.complex | 46 |
| abstract_inverted_index.dataset | 118 |
| abstract_inverted_index.decoder | 79, 86, 109, 130 |
| abstract_inverted_index.layers, | 80 |
| abstract_inverted_index.limited | 7 |
| abstract_inverted_index.propose | 50 |
| abstract_inverted_index.regions | 28 |
| abstract_inverted_index.(DP-HOI) | 59 |
| abstract_inverted_index.However, | 36 |
| abstract_inverted_index.datasets | 72 |
| abstract_inverted_index.enabling | 162 |
| abstract_inverted_index.enhances | 170 |
| abstract_inverted_index.existing | 174 |
| abstract_inverted_index.instance | 139 |
| abstract_inverted_index.problem. | 63 |
| abstract_inverted_index.process. | 47 |
| abstract_inverted_index.reliable | 111 |
| abstract_inverted_index.triplets | 31 |
| abstract_inverted_index.utilizes | 66 |
| abstract_inverted_index.Detecting | 0 |
| abstract_inverted_index.according | 22 |
| abstract_inverted_index.available | 191 |
| abstract_inverted_index.captions. | 35 |
| abstract_inverted_index.detection | 58, 68, 76, 99, 108, 160, 176 |
| abstract_inverted_index.effective | 163 |
| abstract_inverted_index.efficient | 52, 103 |
| abstract_inverted_index.generates | 120 |
| abstract_inverted_index.instances | 113 |
| abstract_inverted_index.knowledge | 104 |
| abstract_inverted_index.parameter | 165 |
| abstract_inverted_index.pre-train | 74 |
| abstract_inverted_index.structure | 152 |
| abstract_inverted_index.transfer. | 105 |
| abstract_inverted_index.Therefore, | 48, 167 |
| abstract_inverted_index.approaches | 16 |
| abstract_inverted_index.available. | 14 |
| abstract_inverted_index.consistent | 94 |
| abstract_inverted_index.downstream | 97 |
| abstract_inverted_index.identifies | 110 |
| abstract_inverted_index.supervised | 12 |
| abstract_inverted_index.categories. | 184 |
| abstract_inverted_index.facilitates | 102 |
| abstract_inverted_index.image-level | 148 |
| abstract_inverted_index.interaction | 2, 78, 129 |
| abstract_inverted_index.performance | 172 |
| abstract_inverted_index.pre-trained | 188 |
| abstract_inverted_index.predictions | 141 |
| abstract_inverted_index.recognition | 71, 117 |
| abstract_inverted_index.architecture | 92 |
| abstract_inverted_index.disentangled | 53 |
| abstract_inverted_index.human-object | 1 |
| abstract_inverted_index.pre-training | 21, 44, 54, 91 |
| abstract_inverted_index.supervision. | 149 |
| abstract_inverted_index.Specifically, | 106 |
| abstract_inverted_index.corresponding | 122 |
| abstract_inverted_index.respectively. | 81 |
| abstract_inverted_index.significantly | 169 |
| abstract_inverted_index.pseudo-labels, | 24 |
| abstract_inverted_index.classification. | 133 |
| abstract_inverted_index.initialization. | 166 |
| abstract_inverted_index.pseudo-labeling | 37 |
| abstract_inverted_index.https://github.com/xingaoli/DP-HOI. | 193 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile | |
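
The `abstract_inverted_index.*` rows in the payload table store the abstract as a map from each word to the positions where it occurs. A minimal sketch of turning such a map back into running text follows; the function name `rebuild_abstract` is hypothetical, and the sample dictionary holds only the first seven positions copied from the table (later positions of `interaction` are omitted to keep the example short).

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a word -> list-of-positions mapping."""
    words_by_position = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            words_by_position[pos] = word
    # Emit the words in position order, separated by single spaces.
    return " ".join(words_by_position[p] for p in sorted(words_by_position))


# First seven positions of the abstract, taken from the table above.
sample = {
    "Detecting": [0],
    "human-object": [1],
    "interaction": [2],
    "(HOI)": [3],
    "has": [4],
    "long": [5],
    "been": [6],
}
print(rebuild_abstract(sample))
# prints "Detecting human-object interaction (HOI) has long been"
```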