Efficient High-Resolution Visual Representation Learning with State Space Model for Human Pose Estimation

Hao Zhang, Yongqiang Ma, Wenqi Shao, Ping Luo, Nanning Zheng, Kaipeng Zhang

2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2410.03174

Capturing long-range dependencies while preserving high-resolution visual representations is crucial for dense prediction tasks such as human pose estimation. Vision Transformers (ViTs) have advanced global modeling through self-attention but suffer from quadratic computational complexity with respect to token count, limiting their efficiency and scalability to high-resolution inputs, especially on mobile and resource-constrained devices. State Space Models (SSMs), exemplified by Mamba, offer an efficient alternative by combining global receptive fields with linear computational complexity, enabling scalable and resource-friendly sequence modeling. However, when applied to dense prediction tasks, existing visual SSMs face key limitations: weak spatial inductive bias, long-range forgetting from hidden-state decay, and low-resolution outputs that hinder fine-grained localization. To address these issues, we propose the Dynamic Visual State Space (DVSS) block, which augments visual state space models with multi-scale convolutional operations to enhance local spatial representations and strengthen spatial inductive biases. Through architectural exploration and theoretical analysis, we incorporate a deformable operation into the DVSS block and identify it as an efficient and effective mechanism for enhancing semantic aggregation and mitigating long-range forgetting via input-dependent, adaptive spatial sampling. We embed DVSS into a multi-branch high-resolution architecture to build HRVMamba, a novel model for efficient high-resolution representation learning. Extensive experiments on human pose estimation, image classification, and semantic segmentation show that HRVMamba performs competitively against leading CNN-, ViT-, and SSM-based baselines. Code is available at https://github.com/zhanghao5201/PoseVMamba.
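The abstract names two mechanisms that are easy to show in miniature: the linear-time recurrence of a state space model, whose hidden state decays over time, and the input-dependent sampling of a deformable convolution. The NumPy sketch below is illustrative only and is not taken from the paper or from the PoseVMamba repository. It assumes a time-invariant diagonal SSM (Mamba additionally makes the step size and projections input-dependent) and a single-channel deformable convolution with hand-supplied offsets; the names `ssm_scan`, `bilinear_sample`, and `deformable_point` are hypothetical.

```python
# Illustrative sketch with hypothetical names; not the DVSS/HRVMamba code.
import numpy as np


def ssm_scan(x, a, b, c, delta):
    """Linear-time recurrence of a diagonal, time-invariant SSM (simplified).

    x: (L,) scalar input sequence; a: (N,) diagonal of A, negative for
    stability; b, c: (N,) input and output projections; delta: step size.
    Assumed discretization: A_bar = exp(delta * a), B_bar ~ delta * b.
    """
    a_bar = np.exp(delta * a)        # per-state decay factor, in (0, 1)
    b_bar = delta * b
    h = np.zeros(a.shape)            # hidden state
    y = np.empty(len(x))
    for t, x_t in enumerate(x):      # single pass: O(L * N), linear in L
        h = a_bar * h + b_bar * x_t  # old state decays, new input is added
        y[t] = c @ h                 # read out the current state
    return y


def bilinear_sample(feat, y, x):
    """Value of a 2-D map at fractional (y, x); outside pixels count as 0."""
    height, width = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < height and 0 <= xx < width:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                val += weight * feat[yy, xx]
    return val


def deformable_point(feat, weights, offsets, py, px):
    """Single-channel 3x3 deformable convolution evaluated at pixel (py, px).

    weights: (3, 3) kernel; offsets: (3, 3, 2) per-tap (dy, dx) shifts.
    In a real deformable layer the offsets are predicted from the input by
    another convolution; here they are passed in directly (assumption).
    """
    out = 0.0
    for i, dy in enumerate((-1, 0, 1)):
        for j, dx in enumerate((-1, 0, 1)):
            oy, ox = offsets[i, j]
            # Each tap reads from its regular grid position plus a learned shift.
            sample = bilinear_sample(feat, py + dy + oy, px + dx + ox)
            out += weights[i, j] * sample
    return out
```

With zero offsets, `deformable_point` reduces to an ordinary 3x3 convolution; non-zero offsets let each tap read from a fractional, content-chosen location. Feeding an impulse into `ssm_scan` shows the kind of decay the abstract refers to: the first token's contribution to the output shrinks by a factor of exp(delta * a) at every step.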

Related Topics

Computer Science

Image Resolution

Artificial Intelligence

Concepts

Computer science, State (computer science), Space (punctuation), Resolution (logic), State space, High resolution, Artificial intelligence, Algorithm, Geology, Remote sensing, Mathematics, Statistics, Operating system

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2410.03174
PDF: https://arxiv.org/pdf/2410.03174
OA Status: green
Cited By: 1
Related Works: 10
OpenAlex ID: https://openalex.org/W4403885770

Publication date: 2024-10-04
Citations by year (recent): 2025: 1