Fusion-Mamba for Cross-modality Object Detection
2024 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2404.09146
Fusing complementary information from different modalities effectively improves object detection performance, making it more useful and robust for a wider range of applications. Existing fusion strategies either combine different types of images or merge features from different backbones through elaborate neural network modules. However, these methods neglect that modality disparities affect cross-modality fusion performance: modalities captured with different camera focal lengths, placements, and angles are hard to fuse. In this paper, we investigate cross-modality fusion by associating cross-modal features in a hidden state space based on an improved Mamba with a gating mechanism. We design a Fusion-Mamba block (FMB) to map cross-modal features into a hidden state space for interaction, thereby reducing disparities between cross-modal features and enhancing the representation consistency of fused features. FMB contains two modules: the State Space Channel Swapping (SSCS) module facilitates shallow feature fusion, and the Dual State Space Fusion (DSSF) module enables deep fusion in a hidden state space. In extensive experiments on public datasets, our proposed approach outperforms state-of-the-art methods in mAP by 5.9% on the $M^3$FD dataset and by 4.9% on the FLIR-Aligned dataset, demonstrating superior object detection performance. To the best of our knowledge, this is the first work to explore the potential of Mamba for cross-modal fusion, and it establishes a new baseline for cross-modality object detection.
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2404.09146, https://arxiv.org/pdf/2404.09146
- OA Status: green
- Cited By: 15
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4394867784
Raw OpenAlex JSON
- OpenAlex ID (canonical identifier for this work in OpenAlex): https://openalex.org/W4394867784
- DOI (Digital Object Identifier): https://doi.org/10.48550/arxiv.2404.09146
- Title (work title): Fusion-Mamba for Cross-modality Object Detection
- Type (OpenAlex work type): preprint
- Language (primary language): en
- Publication year (year of publication): 2024
- Publication date (full publication date if available): 2024-04-14
- Authors (list of authors in order): Wenhao Dong, Haodong Zhu, Shaohui Lin, Xiaoyan Luo, Yunhang Shen, Xuhui Liu, Juan Zhang, Guodong Guo, Baochang Zhang
- Landing page (publisher landing page): https://arxiv.org/abs/2404.09146
- PDF URL (direct link to full text PDF): https://arxiv.org/pdf/2404.09146
- Open access (whether a free full text is available): Yes
- OA status (open access status per OpenAlex): green
- OA URL (direct OA link when available): https://arxiv.org/pdf/2404.09146
- Concepts (top fields/topics attached by OpenAlex): Modality (human–computer interaction), Object (grammar), Fusion, Computer science, Artificial intelligence, Linguistics, Philosophy
- Cited by (total citation count in OpenAlex): 15
- Citations by year, recent (per-year citation counts, last 5 years): 2025: 9, 2024: 5, 2023: 1
- Related works, count (other works algorithmically related by OpenAlex): 10
Full payload
| id | https://openalex.org/W4394867784 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2404.09146 |
| ids.doi | https://doi.org/10.48550/arxiv.2404.09146 |
| ids.openalex | https://openalex.org/W4394867784 |
| fwci | |
| type | preprint |
| title | Fusion-Mamba for Cross-modality Object Detection |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10627 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9610000252723694 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Advanced Image and Video Retrieval Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2780226545 |
| concepts[0].level | 2 |
| concepts[0].score | 0.6727072596549988 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q6888030 |
| concepts[0].display_name | Modality (human–computer interaction) |
| concepts[1].id | https://openalex.org/C2781238097 |
| concepts[1].level | 2 |
| concepts[1].score | 0.5201376080513 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q175026 |
| concepts[1].display_name | Object (grammar) |
| concepts[2].id | https://openalex.org/C158525013 |
| concepts[2].level | 2 |
| concepts[2].score | 0.43093228340148926 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q2593739 |
| concepts[2].display_name | Fusion |
| concepts[3].id | https://openalex.org/C41008148 |
| concepts[3].level | 0 |
| concepts[3].score | 0.3918694853782654 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[3].display_name | Computer science |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.27464139461517334 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C41895202 |
| concepts[5].level | 1 |
| concepts[5].score | 0.09178414940834045 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[5].display_name | Linguistics |
| concepts[6].id | https://openalex.org/C138885662 |
| concepts[6].level | 0 |
| concepts[6].score | 0.06665870547294617 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[6].display_name | Philosophy |
| keywords[0].id | https://openalex.org/keywords/modality |
| keywords[0].score | 0.6727072596549988 |
| keywords[0].display_name | Modality (human–computer interaction) |
| keywords[1].id | https://openalex.org/keywords/object |
| keywords[1].score | 0.5201376080513 |
| keywords[1].display_name | Object (grammar) |
| keywords[2].id | https://openalex.org/keywords/fusion |
| keywords[2].score | 0.43093228340148926 |
| keywords[2].display_name | Fusion |
| keywords[3].id | https://openalex.org/keywords/computer-science |
| keywords[3].score | 0.3918694853782654 |
| keywords[3].display_name | Computer science |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.27464139461517334 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/linguistics |
| keywords[5].score | 0.09178414940834045 |
| keywords[5].display_name | Linguistics |
| keywords[6].id | https://openalex.org/keywords/philosophy |
| keywords[6].score | 0.06665870547294617 |
| keywords[6].display_name | Philosophy |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2404.09146 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2404.09146 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2404.09146 |
| locations[1].id | doi:10.48550/arxiv.2404.09146 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2404.09146 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5075084396 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-5662-5435 |
| authorships[0].author.display_name | Wenhao Dong |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Dong, Wenhao |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5055243365 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-8661-2504 |
| authorships[1].author.display_name | Haodong Zhu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Zhu, Haodong |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5043643513 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-0284-9940 |
| authorships[2].author.display_name | Shaohui Lin |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Lin, Shaohui |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5072050541 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-7256-4329 |
| authorships[3].author.display_name | Xiaoyan Luo |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Luo, Xiaoyan |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5039883116 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-3970-7519 |
| authorships[4].author.display_name | Yunhang Shen |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Shen, Yunhang |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5100666704 |
| authorships[5].author.orcid | https://orcid.org/0009-0000-2960-4930 |
| authorships[5].author.display_name | Xuhui Liu |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Liu, Xuhui |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5100390006 |
| authorships[6].author.orcid | https://orcid.org/0000-0001-5089-723X |
| authorships[6].author.display_name | Juan Zhang |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Zhang, Juan |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5085022758 |
| authorships[7].author.orcid | https://orcid.org/0000-0001-9583-0055 |
| authorships[7].author.display_name | Guodong Guo |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Guo, Guodong |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5015525872 |
| authorships[8].author.orcid | https://orcid.org/0000-0001-7396-6218 |
| authorships[8].author.display_name | Baochang Zhang |
| authorships[8].author_position | last |
| authorships[8].raw_author_name | Zhang, Baochang |
| authorships[8].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2404.09146 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Fusion-Mamba for Cross-modality Object Detection |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10627 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9610000252723694 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Advanced Image and Video Retrieval Techniques |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2385859805, https://openalex.org/W2530972254, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W2382290278, https://openalex.org/W4391913857 |
| cited_by_count | 15 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 9 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 5 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2404.09146 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2404.09146 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2404.09146 |
| primary_location.id | pmh:oai:arXiv.org:2404.09146 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2404.09146 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2404.09146 |
| publication_date | 2024-04-14 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 19, 79, 89, 94, 103, 149, 204 |
| abstract_inverted_index.In | 67 |
| abstract_inverted_index.To | 182 |
| abstract_inverted_index.We | 92 |
| abstract_inverted_index.an | 85 |
| abstract_inverted_index.as | 53 |
| abstract_inverted_index.by | 74 |
| abstract_inverted_index.in | 78, 148 |
| abstract_inverted_index.is | 189 |
| abstract_inverted_index.it | 13 |
| abstract_inverted_index.of | 22, 30, 120, 185, 197 |
| abstract_inverted_index.on | 84, 156, 166, 170, 174 |
| abstract_inverted_index.or | 32 |
| abstract_inverted_index.to | 98, 193 |
| abstract_inverted_index.we | 70 |
| abstract_inverted_index.FMB | 123 |
| abstract_inverted_index.and | 16, 62, 115, 138, 172, 202 |
| abstract_inverted_index.are | 64 |
| abstract_inverted_index.for | 18, 107, 199, 207 |
| abstract_inverted_index.map | 99 |
| abstract_inverted_index.new | 205 |
| abstract_inverted_index.our | 159, 186 |
| abstract_inverted_index.the | 117, 127, 139, 163, 183, 190, 195 |
| abstract_inverted_index.two | 125 |
| abstract_inverted_index.4.9% | 173 |
| abstract_inverted_index.5.9% | 169 |
| abstract_inverted_index.Dual | 140 |
| abstract_inverted_index.best | 184 |
| abstract_inverted_index.deep | 146 |
| abstract_inverted_index.from | 4 |
| abstract_inverted_index.into | 102 |
| abstract_inverted_index.more | 14 |
| abstract_inverted_index.that | 46 |
| abstract_inverted_index.this | 68, 188 |
| abstract_inverted_index.with | 56, 88, 168 |
| abstract_inverted_index.work | 192 |
| abstract_inverted_index.$m$AP | 167 |
| abstract_inverted_index.(FMB) | 97 |
| abstract_inverted_index.Mamba | 87, 198 |
| abstract_inverted_index.Space | 129, 142 |
| abstract_inverted_index.State | 128, 141 |
| abstract_inverted_index.based | 83 |
| abstract_inverted_index.block | 96 |
| abstract_inverted_index.first | 191 |
| abstract_inverted_index.focal | 59 |
| abstract_inverted_index.fused | 121 |
| abstract_inverted_index.merge | 33 |
| abstract_inverted_index.range | 21 |
| abstract_inverted_index.space | 82, 106 |
| abstract_inverted_index.state | 81, 105, 151 |
| abstract_inverted_index.these | 43 |
| abstract_inverted_index.types | 29 |
| abstract_inverted_index.wider | 20 |
| abstract_inverted_index.(DSSF) | 144 |
| abstract_inverted_index.(SSCS) | 132 |
| abstract_inverted_index.Fusion | 143 |
| abstract_inverted_index.affect | 49 |
| abstract_inverted_index.angles | 63 |
| abstract_inverted_index.camera | 58 |
| abstract_inverted_index.design | 93 |
| abstract_inverted_index.fused. | 66 |
| abstract_inverted_index.fusing | 1 |
| abstract_inverted_index.fusion | 25, 51, 73, 147, 201 |
| abstract_inverted_index.gating | 90 |
| abstract_inverted_index.hardly | 65 |
| abstract_inverted_index.hidden | 80, 104, 150 |
| abstract_inverted_index.images | 31 |
| abstract_inverted_index.making | 12 |
| abstract_inverted_index.module | 133 |
| abstract_inverted_index.neural | 39 |
| abstract_inverted_index.object | 9, 179, 209 |
| abstract_inverted_index.paper, | 69 |
| abstract_inverted_index.public | 157 |
| abstract_inverted_index.robust | 17 |
| abstract_inverted_index.space. | 152 |
| abstract_inverted_index.useful | 15 |
| abstract_inverted_index.$M^3FD$ | 171 |
| abstract_inverted_index.Channel | 130 |
| abstract_inverted_index.Through | 153 |
| abstract_inverted_index.between | 112 |
| abstract_inverted_index.combine | 27 |
| abstract_inverted_index.enables | 145 |
| abstract_inverted_index.explore | 194 |
| abstract_inverted_index.feature | 136 |
| abstract_inverted_index.fusion, | 137 |
| abstract_inverted_index.methods | 44, 165 |
| abstract_inverted_index.neglect | 45 |
| abstract_inverted_index.network | 40 |
| abstract_inverted_index.shallow | 135 |
| abstract_inverted_index.thereby | 109 |
| abstract_inverted_index.through | 37 |
| abstract_inverted_index.Existing | 24 |
| abstract_inverted_index.However, | 42 |
| abstract_inverted_index.Swapping | 131 |
| abstract_inverted_index.approach | 161 |
| abstract_inverted_index.backbone | 35 |
| abstract_inverted_index.baseline | 206 |
| abstract_inverted_index.contains | 124 |
| abstract_inverted_index.features | 36, 77, 101, 114 |
| abstract_inverted_index.improved | 86 |
| abstract_inverted_index.improves | 8 |
| abstract_inverted_index.lengths, | 60 |
| abstract_inverted_index.modality | 47 |
| abstract_inverted_index.modules. | 41 |
| abstract_inverted_index.modules: | 126 |
| abstract_inverted_index.proposed | 160 |
| abstract_inverted_index.reducing | 110 |
| abstract_inverted_index.superior | 178 |
| abstract_inverted_index.datasets, | 158, 176 |
| abstract_inverted_index.detection | 10, 180 |
| abstract_inverted_index.different | 5, 28, 34, 54, 57 |
| abstract_inverted_index.enhancing | 116 |
| abstract_inverted_index.establish | 203 |
| abstract_inverted_index.extensive | 154 |
| abstract_inverted_index.features. | 122 |
| abstract_inverted_index.potential | 196 |
| abstract_inverted_index.detection. | 210 |
| abstract_inverted_index.elaborated | 38 |
| abstract_inverted_index.knowledge, | 187 |
| abstract_inverted_index.mechanism. | 91 |
| abstract_inverted_index.modalities | 6, 55 |
| abstract_inverted_index.strategies | 26 |
| abstract_inverted_index.associating | 75 |
| abstract_inverted_index.consistency | 119 |
| abstract_inverted_index.cross-modal | 76, 100, 113, 200 |
| abstract_inverted_index.disparities | 48, 111 |
| abstract_inverted_index.effectively | 7 |
| abstract_inverted_index.experiments | 155 |
| abstract_inverted_index.facilitates | 134 |
| abstract_inverted_index.information | 3 |
| abstract_inverted_index.investigate | 71 |
| abstract_inverted_index.outperforms | 162 |
| abstract_inverted_index.placements, | 61 |
| abstract_inverted_index.FLIR-Aligned | 175 |
| abstract_inverted_index.Fusion-Mamba | 95 |
| abstract_inverted_index.interaction, | 108 |
| abstract_inverted_index.performance, | 11, 52 |
| abstract_inverted_index.performance. | 181 |
| abstract_inverted_index.applications. | 23 |
| abstract_inverted_index.complementary | 2 |
| abstract_inverted_index.demonstrating | 177 |
| abstract_inverted_index.Cross-modality | 0 |
| abstract_inverted_index.cross-modality | 50, 72, 208 |
| abstract_inverted_index.representation | 118 |
| abstract_inverted_index.state-of-the-art | 164 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 9 |
| citation_normalized_percentile |
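
The `abstract_inverted_index.*` rows in the payload map each token of the abstract to the zero-based positions where it occurs. As a minimal sketch of how the abstract text could be rebuilt from such a mapping, the Python below places every token at its positions and joins them in order. The function name `rebuild_abstract` and the `sample` dictionary are illustrative assumptions, not part of OpenAlex; the sample copies only the entries for positions 0 through 6 from the table, so the position lists for `different` and `modalities` are deliberately truncated.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a token -> list-of-positions mapping.

    Hypothetical helper for illustration; it assumes positions are
    zero-based word offsets, as the payload rows above suggest.
    """
    token_at = {}
    for token, positions in inverted_index.items():
        for position in positions:
            token_at[position] = token
    # Join the tokens in ascending position order.
    return " ".join(token_at[i] for i in sorted(token_at))


# First seven positions only, copied from the payload table.
# "different" and "modalities" occur again later in the abstract;
# those later positions are omitted here on purpose.
sample = {
    "Cross-modality": [0],
    "fusing": [1],
    "complementary": [2],
    "information": [3],
    "from": [4],
    "different": [5],
    "modalities": [6],
}

print(rebuild_abstract(sample))
# prints: Cross-modality fusing complementary information from different modalities
```

Passing the full set of `abstract_inverted_index` entries instead of `sample` should yield the complete abstract, with punctuation attached to tokens exactly as it is stored in the index.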