LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2406.04659
The capacity of existing human keypoint localization models is limited by the keypoint priors provided by the training data. To alleviate this restriction and pursue a more general model, this work studies keypoint localization from a different perspective: reasoning about locations based on keypoint clues in text descriptions. We propose LocLLM, the first Large Language Model (LLM) based keypoint localization model that takes images and text instructions as inputs and outputs the desired keypoint coordinates. LocLLM leverages the strong reasoning capability of the LLM and the clues about keypoint type, location, and relationship in textual descriptions for keypoint localization. To effectively tune LocLLM, we construct localization-based instruction conversations that connect keypoint descriptions with the corresponding coordinates in the input image, and fine-tune the whole model in a parameter-efficient training pipeline. LocLLM shows remarkable performance on standard 2D/3D keypoint localization benchmarks. Moreover, incorporating language clues into localization gives LocLLM superior flexibility and generalization capability in cross-dataset keypoint localization, and even in detecting novel types of keypoints unseen during training.
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2406.04659, https://arxiv.org/pdf/2406.04659
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4399511881
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4399511881 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2406.04659 (Digital Object Identifier)
- Title: LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-06-07 (full publication date if available)
- Authors: Dongkai Wang, Shiyu Xuan, Shiliang Zhang (list of authors in order)
- Landing page: https://arxiv.org/abs/2406.04659 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2406.04659 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2406.04659 (direct OA link when available)
- Concepts: Computer science, Artificial intelligence, Language model, Natural language processing (top concepts (fields/topics) attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4399511881 |
| doi | https://doi.org/10.48550/arxiv.2406.04659 |
| ids.doi | https://doi.org/10.48550/arxiv.2406.04659 |
| ids.openalex | https://openalex.org/W4399511881 |
| fwci | |
| type | preprint |
| title | LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T13676 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.7840999960899353 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1710 |
| topics[0].subfield.display_name | Information Systems |
| topics[0].display_name | Educational and Technological Research |
| topics[1].id | https://openalex.org/T11550 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.7317000031471252 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Text and Document Classification Technologies |
| topics[2].id | https://openalex.org/T10824 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.701200008392334 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Image Retrieval and Classification Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.5266458988189697 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C154945302 |
| concepts[1].level | 1 |
| concepts[1].score | 0.473873108625412 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[1].display_name | Artificial intelligence |
| concepts[2].id | https://openalex.org/C137293760 |
| concepts[2].level | 2 |
| concepts[2].score | 0.44773900508880615 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q3621696 |
| concepts[2].display_name | Language model |
| concepts[3].id | https://openalex.org/C204321447 |
| concepts[3].level | 1 |
| concepts[3].score | 0.3730468451976776 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[3].display_name | Natural language processing |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.5266458988189697 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[1].score | 0.473873108625412 |
| keywords[1].display_name | Artificial intelligence |
| keywords[2].id | https://openalex.org/keywords/language-model |
| keywords[2].score | 0.44773900508880615 |
| keywords[2].display_name | Language model |
| keywords[3].id | https://openalex.org/keywords/natural-language-processing |
| keywords[3].score | 0.3730468451976776 |
| keywords[3].display_name | Natural language processing |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2406.04659 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2406.04659 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2406.04659 |
| locations[1].id | doi:10.48550/arxiv.2406.04659 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2406.04659 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5101445467 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-0266-4340 |
| authorships[0].author.display_name | Dongkai Wang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Wang, Dongkai |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5088499366 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-9950-6025 |
| authorships[1].author.display_name | Shiyu Xuan |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Xuan, Shiyu |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5101777591 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-9524-1602 |
| authorships[2].author.display_name | Shiliang Zhang |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Zhang, Shiliang |
| authorships[2].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2406.04659 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-06-11T00:00:00 |
| display_name | LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T13676 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.7840999960899353 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1710 |
| primary_topic.subfield.display_name | Information Systems |
| primary_topic.display_name | Educational and Technological Research |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052, https://openalex.org/W2382290278, https://openalex.org/W3204019825 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2406.04659 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2406.04659 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2406.04659 |
| primary_location.id | pmh:oai:arXiv.org:2406.04659 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2406.04659 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2406.04659 |
| publication_date | 2024-06-07 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 33, 119 |
| abstract_inverted_index.To | 18, 94 |
| abstract_inverted_index.We | 46 |
| abstract_inverted_index.as | 64 |
| abstract_inverted_index.by | 10, 14, 36 |
| abstract_inverted_index.in | 43, 88, 110, 118, 148 |
| abstract_inverted_index.is | 8 |
| abstract_inverted_index.of | 2, 78, 82, 158 |
| abstract_inverted_index.on | 40, 127 |
| abstract_inverted_index.to | 103 |
| abstract_inverted_index.we | 98 |
| abstract_inverted_index.LLM | 79 |
| abstract_inverted_index.The | 0 |
| abstract_inverted_index.and | 22, 61, 66, 80, 86, 113, 145, 153 |
| abstract_inverted_index.for | 91 |
| abstract_inverted_index.the | 15, 49, 68, 74, 115, 138 |
| abstract_inverted_index.even | 154 |
| abstract_inverted_index.from | 32 |
| abstract_inverted_index.into | 137 |
| abstract_inverted_index.more | 24 |
| abstract_inverted_index.show | 142 |
| abstract_inverted_index.text | 44, 62 |
| abstract_inverted_index.that | 58 |
| abstract_inverted_index.this | 20, 27 |
| abstract_inverted_index.tune | 96 |
| abstract_inverted_index.type | 157 |
| abstract_inverted_index.with | 107 |
| abstract_inverted_index.work | 28 |
| abstract_inverted_index.(LLM) | 53 |
| abstract_inverted_index.2D/3D | 129 |
| abstract_inverted_index.Model | 52 |
| abstract_inverted_index.based | 39, 54 |
| abstract_inverted_index.clues | 42, 81, 136 |
| abstract_inverted_index.cross | 149 |
| abstract_inverted_index.data. | 17 |
| abstract_inverted_index.first | 50 |
| abstract_inverted_index.human | 4 |
| abstract_inverted_index.input | 111 |
| abstract_inverted_index.makes | 140 |
| abstract_inverted_index.model | 57, 117 |
| abstract_inverted_index.novel | 156 |
| abstract_inverted_index.shows | 124 |
| abstract_inverted_index.takes | 59 |
| abstract_inverted_index.type, | 84 |
| abstract_inverted_index.whole | 116 |
| abstract_inverted_index.LocLLM | 72, 123, 141 |
| abstract_inverted_index.during | 161 |
| abstract_inverted_index.image, | 112 |
| abstract_inverted_index.images | 60 |
| abstract_inverted_index.inputs | 65 |
| abstract_inverted_index.model, | 26 |
| abstract_inverted_index.models | 7 |
| abstract_inverted_index.priors | 12 |
| abstract_inverted_index.pursue | 23 |
| abstract_inverted_index.strong | 75 |
| abstract_inverted_index.unseen | 160 |
| abstract_inverted_index.LocLLM, | 48, 97 |
| abstract_inverted_index.connect | 104 |
| abstract_inverted_index.dataset | 150 |
| abstract_inverted_index.desired | 69 |
| abstract_inverted_index.general | 25 |
| abstract_inverted_index.limited | 9 |
| abstract_inverted_index.outputs | 67 |
| abstract_inverted_index.propose | 47 |
| abstract_inverted_index.studies | 29 |
| abstract_inverted_index.textual | 89 |
| abstract_inverted_index.capacity | 1 |
| abstract_inverted_index.existing | 3 |
| abstract_inverted_index.keypiont | 41 |
| abstract_inverted_index.keypoint | 5, 11, 30, 55, 70, 83, 92, 105, 130, 151 |
| abstract_inverted_index.language | 135 |
| abstract_inverted_index.provided | 13 |
| abstract_inverted_index.standard | 128 |
| abstract_inverted_index.superior | 143 |
| abstract_inverted_index.training | 16, 121 |
| abstract_inverted_index.Moreover, | 133 |
| abstract_inverted_index.alleviate | 19 |
| abstract_inverted_index.construct | 99 |
| abstract_inverted_index.detecting | 155 |
| abstract_inverted_index.different | 34 |
| abstract_inverted_index.fine-tune | 114 |
| abstract_inverted_index.keypoints | 159 |
| abstract_inverted_index.leverages | 73 |
| abstract_inverted_index.location, | 85 |
| abstract_inverted_index.locations | 38 |
| abstract_inverted_index.pipeline. | 122 |
| abstract_inverted_index.reasoning | 37, 76 |
| abstract_inverted_index.training. | 162 |
| abstract_inverted_index.capability | 77, 147 |
| abstract_inverted_index.remarkable | 125 |
| abstract_inverted_index.benchmarks. | 132 |
| abstract_inverted_index.coordinates | 109 |
| abstract_inverted_index.description | 106 |
| abstract_inverted_index.effectively | 95 |
| abstract_inverted_index.flexibility | 144 |
| abstract_inverted_index.instruction | 101 |
| abstract_inverted_index.performance | 126 |
| abstract_inverted_index.perspective | 35 |
| abstract_inverted_index.restriction | 21 |
| abstract_inverted_index.coordinates. | 71 |
| abstract_inverted_index.descriptions | 90 |
| abstract_inverted_index.instructions | 63 |
| abstract_inverted_index.localization | 6, 31, 56, 131, 139 |
| abstract_inverted_index.relationship | 87 |
| abstract_inverted_index.conversations | 102 |
| abstract_inverted_index.corresponding | 108 |
| abstract_inverted_index.descriptions. | 45 |
| abstract_inverted_index.generalizable | 146 |
| abstract_inverted_index.incorporating | 134 |
| abstract_inverted_index.localization, | 152 |
| abstract_inverted_index.localization. | 93 |
| abstract_inverted_index.Large-Language | 51 |
| abstract_inverted_index.localization-based | 100 |
| abstract_inverted_index.parameter-efficient | 120 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile |
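
The `abstract_inverted_index.*` rows in the payload above store the abstract as a mapping from each word to the zero-based positions where it occurs. The following is a minimal sketch of how such a mapping could be turned back into running text. It assumes the rows have already been parsed into a Python dictionary; the names `rebuild_abstract` and `sample` are hypothetical and not part of any OpenAlex client library.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild running text from a word -> positions mapping."""
    # Flatten the mapping into (position, word) pairs.
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    # Sorting by position restores the original word order.
    positioned.sort()
    return " ".join(word for _, word in positioned)


# Illustrative subset: only positions 0-8 are copied from the rows above;
# the later positions listed for "of", "keypoint", etc. are left out.
sample = {
    "The": [0],
    "capacity": [1],
    "of": [2],
    "existing": [3],
    "human": [4],
    "keypoint": [5],
    "localization": [6],
    "models": [7],
    "is": [8],
}

print(rebuild_abstract(sample))
# prints: The capacity of existing human keypoint localization models is
```

Because the index is built from the abstract as deposited, a reconstruction from the full set of rows would reproduce the original wording verbatim, including the spelling `keypiont` at position 41.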