A Close Look at Spatial Modeling: From Attention to Convolution
2022 · Open Access · DOI: https://doi.org/10.48550/arxiv.2212.12552
Vision Transformers have recently shown great promise for many vision tasks, thanks to their insightful architecture design and attention mechanism. By revisiting the self-attention responses in Transformers, we empirically observe two interesting issues. First, Vision Transformers exhibit query-irrelevant behavior at deep layers: the attention maps show nearly consistent contexts across the global scope, regardless of the query patch position (and largely regardless of the head). Second, the attention maps are intrinsically sparse, with a few tokens dominating the attention weights; introducing knowledge from ConvNets largely smooths the attention and enhances performance. Motivated by these observations, we generalize the self-attention formulation to abstract a query-irrelevant global context directly, and further integrate this global context into convolutions. The resulting model, a Fully Convolutional Vision Transformer (FCViT), consists purely of convolutional layers and inherits the merits of both the attention mechanism and convolutions, including the dynamic property, weight sharing, and short- and long-range feature modeling. Experimental results demonstrate the effectiveness of FCViT. With fewer than 14M parameters, our FCViT-S12 outperforms the related ResT-Lite by 3.7% top-1 accuracy on ImageNet-1K. When scaled to larger models, FCViT still outperforms the previous state-of-the-art ConvNeXt with even fewer parameters. FCViT-based models also show promising transferability to downstream tasks such as object detection, instance segmentation, and semantic segmentation. Code and models are available at https://github.com/ma-xu/FCViT.
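The paper's exact formulation lives in the preprint and the linked repository; the following is only a minimal PyTorch-style sketch of the core idea described in the abstract, with hypothetical module and parameter names: one softmax over all token positions yields a single query-irrelevant global context, which is then fused with a depthwise convolution for short-range modeling.

```python
# Hypothetical sketch of a query-irrelevant global context fused with
# convolution, in the spirit of the abstract. Names are illustrative,
# not the authors' implementation (see https://github.com/ma-xu/FCViT).
import torch
import torch.nn as nn


class GlobalContextConv(nn.Module):
    """Replace per-query attention with one shared (query-irrelevant)
    global context, then mix it with a depthwise convolution."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        # Scores every token once; softmax over tokens yields a single
        # attention map shared by all queries.
        self.score = nn.Conv2d(dim, 1, kernel_size=1)
        self.value = nn.Conv2d(dim, dim, kernel_size=1)
        # Depthwise conv supplies the short-range, weight-shared modeling.
        self.local = nn.Conv2d(dim, dim, kernel_size,
                               padding=kernel_size // 2, groups=dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # One softmax over all H*W positions -> query-irrelevant weights.
        attn = self.score(x).flatten(2).softmax(dim=-1)   # (B, 1, HW)
        v = self.value(x).flatten(2)                      # (B, C, HW)
        ctx = torch.einsum("bnk,bck->bcn", attn, v)       # (B, C, 1)
        ctx = ctx.view(b, c, 1, 1)                        # broadcastable
        # Long-range (global context) + short-range (depthwise conv).
        return self.local(x) + ctx


# Smoke test on a ViT-like feature map.
if __name__ == "__main__":
    feats = torch.randn(2, 64, 14, 14)
    out = GlobalContextConv(64)(feats)
    print(out.shape)  # torch.Size([2, 64, 14, 14])
```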
- Type: preprint
- Language: en
- Landing page: http://arxiv.org/abs/2212.12552
- PDF: https://arxiv.org/pdf/2212.12552
- OA status: green
- Cited by: 8
- Related works: 10
- OpenAlex ID: https://openalex.org/W4312225519
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4312225519 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2212.12552 (Digital Object Identifier)
- Title: A Close Look at Spatial Modeling: From Attention to Convolution
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2022
- Publication date: 2022-12-23
- Authors: Xu Ma, Huan Wang, Can Qin, Kunpeng Li, Xingchen Zhao, Jie Fu, Yun Fu (in listed order)
- Landing page: https://arxiv.org/abs/2212.12552
- PDF URL: https://arxiv.org/pdf/2212.12552
- Open access: Yes (a free full text is available)
- OA status: green (per OpenAlex)
- OA URL: https://arxiv.org/pdf/2212.12552
- Concepts: Computer science, Segmentation, Artificial intelligence, Convolutional neural network, Transformer, Convolution (computer science), Pattern recognition (psychology), Machine learning, Artificial neural network, Physics, Voltage, Quantum mechanics (top concepts attached by OpenAlex)
- Cited by: 8 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 1, 2024: 4, 2023: 3
- Related works: 10 (other works algorithmically related by OpenAlex)
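These fields mirror the public OpenAlex API, so the same record can be fetched directly. A minimal sketch, assuming only the documented works endpoint (replace the placeholder mailto address with your own; it routes requests into OpenAlex's polite pool):

```python
# Minimal sketch: fetch this work's record from the public OpenAlex API.
# Endpoint per https://docs.openalex.org; the mailto address below is a
# placeholder, not a real contact.
import json
import urllib.request

WORK_ID = "W4312225519"
url = f"https://api.openalex.org/works/{WORK_ID}?mailto=you@example.com"

with urllib.request.urlopen(url) as resp:
    work = json.load(resp)

print(work["display_name"])           # paper title
print(work["cited_by_count"])         # citation count (8 at index time)
print(work["open_access"]["oa_url"])  # direct OA link
```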
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4312225519 |
| doi | https://doi.org/10.48550/arxiv.2212.12552 |
| ids.doi | https://doi.org/10.48550/arxiv.2212.12552 |
| ids.openalex | https://openalex.org/W4312225519 |
| fwci | |
| type | preprint |
| title | A Close Look at Spatial Modeling: From Attention to Convolution |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10036 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9994999766349792 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Advanced Neural Network Applications |
| topics[1].id | https://openalex.org/T10627 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9965000152587891 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Advanced Image and Video Retrieval Techniques |
| topics[2].id | https://openalex.org/T11307 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.995199978351593 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Domain Adaptation and Few-Shot Learning |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.8197352886199951 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C89600930 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6053475737571716 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q1423946 |
| concepts[1].display_name | Segmentation |
| concepts[2].id | https://openalex.org/C154945302 |
| concepts[2].level | 1 |
| concepts[2].score | 0.5655044913291931 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[2].display_name | Artificial intelligence |
| concepts[3].id | https://openalex.org/C81363708 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5447458028793335 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q17084460 |
| concepts[3].display_name | Convolutional neural network |
| concepts[4].id | https://openalex.org/C66322947 |
| concepts[4].level | 3 |
| concepts[4].score | 0.511532187461853 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11658 |
| concepts[4].display_name | Transformer |
| concepts[5].id | https://openalex.org/C45347329 |
| concepts[5].level | 3 |
| concepts[5].score | 0.45510780811309814 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q5166604 |
| concepts[5].display_name | Convolution (computer science) |
| concepts[6].id | https://openalex.org/C153180895 |
| concepts[6].level | 2 |
| concepts[6].score | 0.36244720220565796 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[6].display_name | Pattern recognition (psychology) |
| concepts[7].id | https://openalex.org/C119857082 |
| concepts[7].level | 1 |
| concepts[7].score | 0.36211252212524414 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[7].display_name | Machine learning |
| concepts[8].id | https://openalex.org/C50644808 |
| concepts[8].level | 2 |
| concepts[8].score | 0.23564526438713074 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q192776 |
| concepts[8].display_name | Artificial neural network |
| concepts[9].id | https://openalex.org/C121332964 |
| concepts[9].level | 0 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[9].display_name | Physics |
| concepts[10].id | https://openalex.org/C165801399 |
| concepts[10].level | 2 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q25428 |
| concepts[10].display_name | Voltage |
| concepts[11].id | https://openalex.org/C62520636 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q944 |
| concepts[11].display_name | Quantum mechanics |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.8197352886199951 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/segmentation |
| keywords[1].score | 0.6053475737571716 |
| keywords[1].display_name | Segmentation |
| keywords[2].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[2].score | 0.5655044913291931 |
| keywords[2].display_name | Artificial intelligence |
| keywords[3].id | https://openalex.org/keywords/convolutional-neural-network |
| keywords[3].score | 0.5447458028793335 |
| keywords[3].display_name | Convolutional neural network |
| keywords[4].id | https://openalex.org/keywords/transformer |
| keywords[4].score | 0.511532187461853 |
| keywords[4].display_name | Transformer |
| keywords[5].id | https://openalex.org/keywords/convolution |
| keywords[5].score | 0.45510780811309814 |
| keywords[5].display_name | Convolution (computer science) |
| keywords[6].id | https://openalex.org/keywords/pattern-recognition |
| keywords[6].score | 0.36244720220565796 |
| keywords[6].display_name | Pattern recognition (psychology) |
| keywords[7].id | https://openalex.org/keywords/machine-learning |
| keywords[7].score | 0.36211252212524414 |
| keywords[7].display_name | Machine learning |
| keywords[8].id | https://openalex.org/keywords/artificial-neural-network |
| keywords[8].score | 0.23564526438713074 |
| keywords[8].display_name | Artificial neural network |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2212.12552 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2212.12552 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2212.12552 |
| locations[1].id | doi:10.48550/arxiv.2212.12552 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2212.12552 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5018168958 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-7674-3589 |
| authorships[0].author.display_name | Xu Ma |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Ma, Xu |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5100331980 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-6951-901X |
| authorships[1].author.display_name | Huan Wang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wang, Huan |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5021042598 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-0712-5378 |
| authorships[2].author.display_name | Can Qin |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Qin, Can |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5100654289 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-5805-793X |
| authorships[3].author.display_name | Kunpeng Li |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Li, Kunpeng |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5101447875 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-6740-3432 |
| authorships[4].author.display_name | Xingchen Zhao |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Zhao, Xingchen |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5100666921 |
| authorships[5].author.orcid | https://orcid.org/0000-0001-5596-8391 |
| authorships[5].author.display_name | Jie Fu |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Fu, Jie |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5005819096 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-5098-2853 |
| authorships[6].author.display_name | Yun Fu |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Fu, Yun |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2212.12552 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | A Close Look at Spatial Modeling: From Attention to Convolution |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10036 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9994999766349792 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Advanced Neural Network Applications |
| related_works | https://openalex.org/W4293226380, https://openalex.org/W4321487865, https://openalex.org/W4313906399, https://openalex.org/W4391266461, https://openalex.org/W2590798552, https://openalex.org/W2811106690, https://openalex.org/W4239306820, https://openalex.org/W2947043951, https://openalex.org/W2964954556, https://openalex.org/W2890372105 |
| cited_by_count | 8 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 4 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 3 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2212.12552 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2212.12552 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2212.12552 |
| primary_location.id | pmh:oai:arXiv.org:2212.12552 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2212.12552 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2212.12552 |
| publication_date | 2022-12-23 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index | (word-to-positions inverted index of the abstract shown above; see the decoding sketch after this table) |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/11 |
| sustainable_development_goals[0].score | 0.6299999952316284 |
| sustainable_development_goals[0].display_name | Sustainable cities and communities |
| citation_normalized_percentile |
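The `abstract_inverted_index` field summarized in the table stores the abstract as a map from each word to the list of positions where it occurs. A minimal sketch for rebuilding plain text from that shape (the helper name is illustrative, not an OpenAlex API):

```python
# Sketch: rebuild an abstract from an OpenAlex abstract_inverted_index,
# which maps each word to the list of positions where it occurs.
# `decode_inverted_index` is an illustrative helper name, not an API.

def decode_inverted_index(index: dict[str, list[int]]) -> str:
    # Place every word at each of its positions, then join in order.
    positions: dict[int, str] = {}
    for word, locs in index.items():
        for loc in locs:
            positions[loc] = word
    return " ".join(positions[i] for i in sorted(positions))


# Tiny example in the same shape as the payload above.
sample = {"Vision": [0], "Transformers": [1], "have": [2], "shown": [3]}
print(decode_inverted_index(sample))  # -> "Vision Transformers have shown"
```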