M$^3$Net: Multilevel, Mixed and Multistage Attention Network for Salient Object Detection

2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2309.08365

Most existing salient object detection methods use U-Net or feature pyramid structures, which simply aggregate feature maps of different scales while ignoring their uniqueness, their interdependence, and their respective contributions to the final prediction. To overcome these limitations, we propose M$^3$Net, i.e., the Multilevel, Mixed and Multistage attention network for Salient Object Detection (SOD). First, we propose the Multiscale Interaction Block, which introduces cross-attention to achieve interaction between multilevel features, allowing high-level features to guide low-level feature learning and thus enhance salient regions. Second, because previous Transformer-based SOD methods locate salient regions using only global self-attention and inevitably overlook the details of complex objects, we propose the Mixed Attention Block. This block combines global self-attention and window self-attention to model context at both global and local levels and further improve the accuracy of the prediction map. Finally, we propose a multilevel supervision strategy to optimize the aggregated features stage by stage. Experiments on six challenging datasets demonstrate that the proposed M$^3$Net surpasses recent CNN and Transformer-based SOD methods in terms of four metrics. Code is available at https://github.com/I2-Multimedia-Lab/M3Net.
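The cross-attention idea in the abstract can be illustrated with a minimal single-head sketch. This is not the authors' implementation (see the linked repository for that). It assumes that queries come from low-level tokens and that keys and values come from high-level tokens, which is one plausible reading of "high-level features guide low-level feature learning". It also omits learned projections, multiple heads, and normalization. The names `cross_attention`, `low_level`, and `high_level` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    queries: tokens from one feature level (assumed: low-level).
    keys/values: tokens from another level (assumed: high-level).
    Returns one output vector per query token.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, channel by channel.
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return outputs


# Toy data: 4 low-level tokens and 2 high-level tokens, 2 channels each.
low_level = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
high_level = [[2.0, 0.0], [0.0, 2.0]]
refined = cross_attention(low_level, high_level, high_level)
```

Each low-level token receives a mixture of high-level vectors weighted by similarity, so the output keeps the low-level resolution (four tokens here) while carrying high-level content.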

Related Topics

Computer Science

Artificial Intelligence

Electrical Engineering

Philosophy

Voltage

Geometry

Concepts

Salient, Computer science, Feature (linguistics), Block (permutation group theory), Artificial intelligence, Transformer, Pattern recognition (psychology), Data mining, Machine learning, Mathematics, Engineering, Electrical engineering, Philosophy, Linguistics, Voltage, Geometry

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2309.08365
PDF: https://arxiv.org/pdf/2309.08365
OA Status: green
Cited By: 11
Related Works: 10
OpenAlex ID: https://openalex.org/W4386841537

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W4386841537
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.48550/arxiv.2309.08365
Digital Object Identifier

Title: M$^3$Net: Multilevel, Mixed and Multistage Attention Network for Salient Object Detection
Work title

Type: preprint
OpenAlex work type

Language: en
Primary language

Publication year: 2023
Year of publication

Publication date: 2023-09-15
Full publication date if available

Authors: Yuan Yao, Pan Gao, Xiaoyang Tan
List of authors in order

Landing page: https://arxiv.org/abs/2309.08365
Publisher landing page

PDF URL: https://arxiv.org/pdf/2309.08365
Direct link to full text PDF

Open access: Yes
Whether a free full text is available

OA status: green
Open access status per OpenAlex

OA URL: https://arxiv.org/pdf/2309.08365
Direct OA link when available

Concepts: Salient, Computer science, Feature (linguistics), Block (permutation group theory), Artificial intelligence, Transformer, Pattern recognition (psychology), Data mining, Machine learning, Mathematics, Engineering, Electrical engineering, Philosophy, Linguistics, Voltage, Geometry
Top concepts (fields/topics) attached by OpenAlex

Cited by: 11
Total citation count in OpenAlex

Citations by year (recent): 2025: 7, 2024: 4
Per-year citation counts (last 5 years)

Related works (count): 10
Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W4386841537
doi	https://doi.org/10.48550/arxiv.2309.08365
ids.doi	https://doi.org/10.48550/arxiv.2309.08365
ids.openalex	https://openalex.org/W4386841537
fwci
type	preprint
title	M$^3$Net: Multilevel, Mixed and Multistage Attention Network for Salient Object Detection
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T11605
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	0.9997000098228455
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1707
topics[0].subfield.display_name	Computer Vision and Pattern Recognition
topics[0].display_name	Visual Attention and Saliency Detection
topics[1].id	https://openalex.org/T10648
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.9573000073432922
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1709
topics[1].subfield.display_name	Human-Computer Interaction
topics[1].display_name	Virtual Reality Applications and Impacts
topics[2].id	https://openalex.org/T12650
topics[2].field.id	https://openalex.org/fields/28
topics[2].field.display_name	Neuroscience
topics[2].score	0.9545000195503235
topics[2].domain.id	https://openalex.org/domains/1
topics[2].domain.display_name	Life Sciences
topics[2].subfield.id	https://openalex.org/subfields/2805
topics[2].subfield.display_name	Cognitive Neuroscience
topics[2].display_name	Aesthetic Perception and Analysis
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C2780719617
concepts[0].level	2
concepts[0].score	0.8256196975708008
concepts[0].wikidata	https://www.wikidata.org/wiki/Q1030752
concepts[0].display_name	Salient
concepts[1].id	https://openalex.org/C41008148
concepts[1].level	0
concepts[1].score	0.7454817891120911
concepts[1].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[1].display_name	Computer science
concepts[2].id	https://openalex.org/C2776401178
concepts[2].level	2
concepts[2].score	0.5876179933547974
concepts[2].wikidata	https://www.wikidata.org/wiki/Q12050496
concepts[2].display_name	Feature (linguistics)
concepts[3].id	https://openalex.org/C2777210771
concepts[3].level	2
concepts[3].score	0.5414296984672546
concepts[3].wikidata	https://www.wikidata.org/wiki/Q4927124
concepts[3].display_name	Block (permutation group theory)
concepts[4].id	https://openalex.org/C154945302
concepts[4].level	1
concepts[4].score	0.5041807889938354
concepts[4].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[4].display_name	Artificial intelligence
concepts[5].id	https://openalex.org/C66322947
concepts[5].level	3
concepts[5].score	0.43872907757759094
concepts[5].wikidata	https://www.wikidata.org/wiki/Q11658
concepts[5].display_name	Transformer
concepts[6].id	https://openalex.org/C153180895
concepts[6].level	2
concepts[6].score	0.40482082962989807
concepts[6].wikidata	https://www.wikidata.org/wiki/Q7148389
concepts[6].display_name	Pattern recognition (psychology)
concepts[7].id	https://openalex.org/C124101348
concepts[7].level	1
concepts[7].score	0.39513009786605835
concepts[7].wikidata	https://www.wikidata.org/wiki/Q172491
concepts[7].display_name	Data mining
concepts[8].id	https://openalex.org/C119857082
concepts[8].level	1
concepts[8].score	0.3706473112106323
concepts[8].wikidata	https://www.wikidata.org/wiki/Q2539
concepts[8].display_name	Machine learning
concepts[9].id	https://openalex.org/C33923547
concepts[9].level	0
concepts[9].score	0.10511744022369385
concepts[9].wikidata	https://www.wikidata.org/wiki/Q395
concepts[9].display_name	Mathematics
concepts[10].id	https://openalex.org/C127413603
concepts[10].level	0
concepts[10].score	0.10018351674079895
concepts[10].wikidata	https://www.wikidata.org/wiki/Q11023
concepts[10].display_name	Engineering
concepts[11].id	https://openalex.org/C119599485
concepts[11].level	1
concepts[11].score	0.0
concepts[11].wikidata	https://www.wikidata.org/wiki/Q43035
concepts[11].display_name	Electrical engineering
concepts[12].id	https://openalex.org/C138885662
concepts[12].level	0
concepts[12].score	0.0
concepts[12].wikidata	https://www.wikidata.org/wiki/Q5891
concepts[12].display_name	Philosophy
concepts[13].id	https://openalex.org/C41895202
concepts[13].level	1
concepts[13].score	0.0
concepts[13].wikidata	https://www.wikidata.org/wiki/Q8162
concepts[13].display_name	Linguistics
concepts[14].id	https://openalex.org/C165801399
concepts[14].level	2
concepts[14].score	0.0
concepts[14].wikidata	https://www.wikidata.org/wiki/Q25428
concepts[14].display_name	Voltage
concepts[15].id	https://openalex.org/C2524010
concepts[15].level	1
concepts[15].score	0.0
concepts[15].wikidata	https://www.wikidata.org/wiki/Q8087
concepts[15].display_name	Geometry
keywords[0].id	https://openalex.org/keywords/salient
keywords[0].score	0.8256196975708008
keywords[0].display_name	Salient
keywords[1].id	https://openalex.org/keywords/computer-science
keywords[1].score	0.7454817891120911
keywords[1].display_name	Computer science
keywords[2].id	https://openalex.org/keywords/feature
keywords[2].score	0.5876179933547974
keywords[2].display_name	Feature (linguistics)
keywords[3].id	https://openalex.org/keywords/block
keywords[3].score	0.5414296984672546
keywords[3].display_name	Block (permutation group theory)
keywords[4].id	https://openalex.org/keywords/artificial-intelligence
keywords[4].score	0.5041807889938354
keywords[4].display_name	Artificial intelligence
keywords[5].id	https://openalex.org/keywords/transformer
keywords[5].score	0.43872907757759094
keywords[5].display_name	Transformer
keywords[6].id	https://openalex.org/keywords/pattern-recognition
keywords[6].score	0.40482082962989807
keywords[6].display_name	Pattern recognition (psychology)
keywords[7].id	https://openalex.org/keywords/data-mining
keywords[7].score	0.39513009786605835
keywords[7].display_name	Data mining
keywords[8].id	https://openalex.org/keywords/machine-learning
keywords[8].score	0.3706473112106323
keywords[8].display_name	Machine learning
keywords[9].id	https://openalex.org/keywords/mathematics
keywords[9].score	0.10511744022369385
keywords[9].display_name	Mathematics
keywords[10].id	https://openalex.org/keywords/engineering
keywords[10].score	0.10018351674079895
keywords[10].display_name	Engineering
language	en
locations[0].id	pmh:oai:arXiv.org:2309.08365
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2309.08365
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2309.08365
locations[1].id	doi:10.48550/arxiv.2309.08365
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2309.08365
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5103072099
authorships[0].author.orcid	https://orcid.org/0000-0002-3616-2496
authorships[0].author.display_name	Yuan Yao
authorships[0].author_position	first
authorships[0].raw_author_name	Yuan, Yao
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5101746588
authorships[1].author.orcid	https://orcid.org/0000-0002-7885-6824
authorships[1].author.display_name	Pan Gao
authorships[1].author_position	middle
authorships[1].raw_author_name	Gao, Pan
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5004478562
authorships[2].author.orcid	https://orcid.org/0000-0002-2683-8667
authorships[2].author.display_name	Xiaoyang Tan
authorships[2].author_position	last
authorships[2].raw_author_name	Tan, XiaoYang
authorships[2].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2309.08365
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	M$^3$Net: Multilevel, Mixed and Multistage Attention Network for Salient Object Detection
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T11605
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	0.9997000098228455
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1707
primary_topic.subfield.display_name	Computer Vision and Pattern Recognition
primary_topic.display_name	Visual Attention and Saliency Detection
related_works	https://openalex.org/W2329500892, https://openalex.org/W28991112, https://openalex.org/W2370726991, https://openalex.org/W2369710579, https://openalex.org/W4327728159, https://openalex.org/W4394266730, https://openalex.org/W1990856605, https://openalex.org/W2053783616, https://openalex.org/W4388913932, https://openalex.org/W4309130263
cited_by_count	11
counts_by_year[0].year	2025
counts_by_year[0].cited_by_count	7
counts_by_year[1].year	2024
counts_by_year[1].cited_by_count	4
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2309.08365
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2309.08365
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2309.08365
primary_location.id	pmh:oai:arXiv.org:2309.08365
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2309.08365
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2309.08365
publication_date	2023-09-15
publication_year	2023
referenced_works_count	0
abstract_inverted_index.a	149
abstract_inverted_index.To	36
abstract_inverted_index.at	128, 131, 184
abstract_inverted_index.in	176
abstract_inverted_index.of	18, 26, 110, 142, 178
abstract_inverted_index.on	160
abstract_inverted_index.or	9
abstract_inverted_index.to	32, 68, 78, 137, 153
abstract_inverted_index.we	39, 57, 113, 147
abstract_inverted_index.CNN	171
abstract_inverted_index.SOD	96, 174
abstract_inverted_index.and	24, 28, 47, 83, 124, 134, 172
abstract_inverted_index.are	182
abstract_inverted_index.for	51
abstract_inverted_index.six	161
abstract_inverted_index.the	22, 33, 41, 44, 65, 70, 90, 108, 115, 140, 143, 155, 166
abstract_inverted_index.use	7
abstract_inverted_index.Most	0
abstract_inverted_index.This	119
abstract_inverted_index.arts	175
abstract_inverted_index.both	132
abstract_inverted_index.fact	91
abstract_inverted_index.four	179
abstract_inverted_index.map.	145
abstract_inverted_index.maps	17
abstract_inverted_index.only	101
abstract_inverted_index.that	92, 165
abstract_inverted_index.them	27
abstract_inverted_index.thus	84
abstract_inverted_index.Block	61
abstract_inverted_index.Codes	181
abstract_inverted_index.Mixed	46, 116
abstract_inverted_index.U-Net	8
abstract_inverted_index.based	95
abstract_inverted_index.block	120
abstract_inverted_index.final	34
abstract_inverted_index.guide	79
abstract_inverted_index.i.e.,	43
abstract_inverted_index.local	135
abstract_inverted_index.terms	177
abstract_inverted_index.their	29
abstract_inverted_index.using	102
abstract_inverted_index.which	13, 62
abstract_inverted_index.while	105
abstract_inverted_index.(SOD).	55
abstract_inverted_index.Block.	118
abstract_inverted_index.Object	53
abstract_inverted_index.aiming	127
abstract_inverted_index.global	103, 122, 133
abstract_inverted_index.levels	136
abstract_inverted_index.locate	98
abstract_inverted_index.mostly	6
abstract_inverted_index.object	3
abstract_inverted_index.recent	170
abstract_inverted_index.simply	14
abstract_inverted_index.these,	38
abstract_inverted_index.window	125
abstract_inverted_index.Salient	52
abstract_inverted_index.achieve	69
abstract_inverted_index.between	72
abstract_inverted_index.complex	111
abstract_inverted_index.context	130
abstract_inverted_index.details	109
abstract_inverted_index.feature	10, 16, 81, 157
abstract_inverted_index.further	138
abstract_inverted_index.improve	139
abstract_inverted_index.methods	5, 97
abstract_inverted_index.network	50
abstract_inverted_index.propose	40, 58, 114
abstract_inverted_index.pyramid	11
abstract_inverted_index.regions	100
abstract_inverted_index.salient	2, 86, 99
abstract_inverted_index.scales,	20
abstract_inverted_index.Finally,	146
abstract_inverted_index.Firstly,	56
abstract_inverted_index.M$^3$Net	168
abstract_inverted_index.accuracy	141
abstract_inverted_index.allowing	75
abstract_inverted_index.approach	67
abstract_inverted_index.combines	121
abstract_inverted_index.datasets	163
abstract_inverted_index.existing	1
abstract_inverted_index.features	77
abstract_inverted_index.ignoring	21
abstract_inverted_index.learning	82
abstract_inverted_index.metrics.	180
abstract_inverted_index.modeling	129
abstract_inverted_index.objects,	112
abstract_inverted_index.optimize	154
abstract_inverted_index.overcome	37
abstract_inverted_index.previous	93
abstract_inverted_index.proposed	148, 167
abstract_inverted_index.regions.	87
abstract_inverted_index.strategy	152
abstract_inverted_index.Attention	117
abstract_inverted_index.Detection	54
abstract_inverted_index.M$^3$Net,	42
abstract_inverted_index.Secondly,	88
abstract_inverted_index.attention	49
abstract_inverted_index.available	183
abstract_inverted_index.detection	4
abstract_inverted_index.different	19
abstract_inverted_index.enhancing	85
abstract_inverted_index.features,	74
abstract_inverted_index.low-level	80
abstract_inverted_index.surpasses	169
abstract_inverted_index.Multiscale	59
abstract_inverted_index.Multistage	48
abstract_inverted_index.aggregated	156
abstract_inverted_index.aggregates	15
abstract_inverted_index.high-level	76
abstract_inverted_index.inevitably	106
abstract_inverted_index.introduces	64
abstract_inverted_index.multilevel	73, 150
abstract_inverted_index.prediction	144
abstract_inverted_index.respective	30
abstract_inverted_index.structure,	12
abstract_inverted_index.uniqueness	23
abstract_inverted_index.Experiments	159
abstract_inverted_index.Interaction	60
abstract_inverted_index.Multilevel,	45
abstract_inverted_index.Transformer	94
abstract_inverted_index.challenging	162
abstract_inverted_index.considering	89
abstract_inverted_index.demonstrate	164
abstract_inverted_index.interaction	71
abstract_inverted_index.overlooking	107
abstract_inverted_index.prediction.	35
abstract_inverted_index.supervision	151
abstract_inverted_index.innovatively	63
abstract_inverted_index.contributions	31
abstract_inverted_index.self-attention	104, 123
abstract_inverted_index.cross-attention	66
abstract_inverted_index.interdependence	25
abstract_inverted_index.self-attention,	126
abstract_inverted_index.stage-by-stage.	158
abstract_inverted_index.Transformer-based	173
abstract_inverted_index.https://github.com/I2-Multimedia-Lab/M3Net.	185
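The `abstract_inverted_index` rows above map each word to the positions where it occurs in the abstract. A minimal sketch of rebuilding readable text from such a mapping follows. It assumes the rows have already been parsed into a dict of word to list of integer positions. The function name `rebuild_abstract` is hypothetical, and `excerpt` holds only five entries copied from the rows above (positions 159 to 163).

```python
def rebuild_abstract(inverted_index):
    """Turn a word -> positions mapping back into running text."""
    word_at = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            word_at[pos] = word
    # Emit words in position order; gaps in the positions are simply skipped.
    return " ".join(word_at[pos] for pos in sorted(word_at))


# Five entries taken from the payload above.
excerpt = {
    "datasets": [163],
    "six": [161],
    "Experiments": [159],
    "challenging": [162],
    "on": [160],
}
text = rebuild_abstract(excerpt)
print(text)  # Experiments on six challenging datasets
```

A word that occurs several times, such as `the`, lists all of its positions under a single key, so the inner loop places it at each of them.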
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	3
citation_normalized_percentile
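The payload above is a flattened view of a nested record, with keys such as `topics[0].field.id`. A rough sketch of turning such key/value pairs back into nested dictionaries follows. It rests on assumptions that are not part of the record: list indices are kept as string keys rather than rebuilt as lists, values stay as strings, and keys whose final segment itself contains dots or brackets (for example the `abstract_inverted_index` entries for `i.e.,` or the GitHub URL) are not handled. The name `unflatten` is hypothetical.

```python
import re


def unflatten(pairs):
    """Build nested dicts from (dotted_path, value) pairs.

    Simplification: "topics[0].score" becomes
    {"topics": {"0": {"score": ...}}}, i.e. indices are dict keys.
    """
    root = {}
    for path, value in pairs:
        # Split on dots and square brackets, dropping empty pieces.
        tokens = re.findall(r"[^.\[\]]+", path)
        node = root
        for token in tokens[:-1]:
            node = node.setdefault(token, {})
        node[tokens[-1]] = value
    return root


# Three rows copied from the payload above.
rows = [
    ("type", "preprint"),
    ("topics[0].field.id", "https://openalex.org/fields/17"),
    ("topics[0].score", "0.9997000098228455"),
]
record = unflatten(rows)
```

Only the path is tokenized, so values containing dots, such as URLs and scores, pass through unchanged.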