MirrorSAM2: Segment Mirror in Videos with Depth Perception
2025 · Open Access
DOI: https://doi.org/10.48550/arxiv.2509.17220
This paper presents MirrorSAM2, the first framework that adapts Segment Anything Model 2 (SAM2) to the task of RGB-D video mirror segmentation. MirrorSAM2 addresses key challenges in mirror detection, such as reflection ambiguity and texture confusion, by introducing four tailored modules: a Depth Warping Module for RGB and depth alignment, a Depth-guided Multi-Scale Point Prompt Generator for automatic prompt generation, a Frequency Detail Attention Fusion Module to enhance structural boundaries, and a Mirror Mask Decoder with a learnable mirror token for refined segmentation. By fully leveraging the complementarity between RGB and depth, MirrorSAM2 extends SAM2's capabilities to the prompt-free setting. To our knowledge, this is the first work to enable SAM2 for automatic video mirror segmentation. Experiments on the VMD and DVMD benchmarks demonstrate that MirrorSAM2 achieves SOTA performance, even under challenging conditions such as small mirrors, weak boundaries, and strong reflections.
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2509.17220
- PDF: https://arxiv.org/pdf/2509.17220
- OA Status: green
- OpenAlex ID: https://openalex.org/W4415254058
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415254058 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2509.17220 (Digital Object Identifier)
- Title: MirrorSAM2: Segment Mirror in Videos with Depth Perception
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025
- Publication date: 2025-09-21
- Authors: Mingchen Xu, Yu‐Kun Lai, Ze Ji, Jing Wu (in order)
- Landing page: https://arxiv.org/abs/2509.17220
- PDF URL: https://arxiv.org/pdf/2509.17220 (direct link to full text PDF)
- Open access: Yes (a free full text is available)
- OA status: green (per OpenAlex)
- OA URL: https://arxiv.org/pdf/2509.17220
- Cited by: 0 (total citation count in OpenAlex)
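Records like this one can be retrieved directly from the OpenAlex REST API, which serves works at `https://api.openalex.org/works/{id}` and accepts either a bare ID or the full `openalex.org` URL. A minimal sketch using only the standard library (the helper names are illustrative, not part of OpenAlex):

```python
import json
import urllib.request

OPENALEX_API = "https://api.openalex.org/works/"

def work_url(openalex_id: str) -> str:
    """Build the API URL for a work, accepting either a bare ID
    (e.g. 'W4415254058') or a full https://openalex.org/... URL."""
    return OPENALEX_API + openalex_id.rsplit("/", 1)[-1]

def fetch_work(openalex_id: str) -> dict:
    """Fetch the raw JSON record for one work (requires network access)."""
    with urllib.request.urlopen(work_url(openalex_id)) as resp:
        return json.load(resp)
```

Calling `fetch_work("https://openalex.org/W4415254058")` should return the same payload shown below, with fields such as `display_name`, `publication_date`, and `open_access`.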
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4415254058 |
| doi | https://doi.org/10.48550/arxiv.2509.17220 |
| ids.doi | https://doi.org/10.48550/arxiv.2509.17220 |
| ids.openalex | https://openalex.org/W4415254058 |
| fwci | |
| type | preprint |
| title | MirrorSAM2: Segment Mirror in Videos with Depth Perception |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10531 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.6553000211715698 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Advanced Vision and Imaging |
| topics[1].id | https://openalex.org/T10481 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.59170001745224 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1704 |
| topics[1].subfield.display_name | Computer Graphics and Computer-Aided Design |
| topics[1].display_name | Computer Graphics and Visualization Techniques |
| topics[2].id | https://openalex.org/T11211 |
| topics[2].field.id | https://openalex.org/fields/19 |
| topics[2].field.display_name | Earth and Planetary Sciences |
| topics[2].score | 0.5647000074386597 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1907 |
| topics[2].subfield.display_name | Geology |
| topics[2].display_name | 3D Surveying and Cultural Heritage |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2509.17220 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2509.17220 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2509.17220 |
| locations[1].id | doi:10.48550/arxiv.2509.17220 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2509.17220 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5101619740 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Mingchen Xu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Xu, Mingchen |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5067850699 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-2094-5680 |
| authorships[1].author.display_name | Yu‐Kun Lai |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Lai, Yukun |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5068175770 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-8968-9902 |
| authorships[2].author.display_name | Ze Ji |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Ji, Ze |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5100962724 |
| authorships[3].author.orcid | https://orcid.org/0009-0007-0942-5418 |
| authorships[3].author.display_name | Jing Wu |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Wu, Jing |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2509.17220 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-16T00:00:00 |
| display_name | MirrorSAM2: Segment Mirror in Videos with Depth Perception |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10531 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.6553000211715698 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Advanced Vision and Imaging |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2509.17220 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2509.17220 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2509.17220 |
| primary_location.id | pmh:oai:arXiv.org:2509.17220 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2509.17220 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2509.17220 |
| publication_date | 2025-09-21 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index | word → position map of the abstract (verbose entries omitted; the full abstract appears above) |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile |
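OpenAlex stores abstracts as the `abstract_inverted_index` noted in the payload: a mapping from each word to the list of positions where it occurs in the text. The plain abstract can be rebuilt by sorting words by position. A minimal sketch, using a toy index rather than the full index for this paper:

```python
def invert_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild abstract text from an OpenAlex-style abstract_inverted_index
    ({word: [positions]}) by placing each word at its recorded positions."""
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    return " ".join(word for _, word in sorted(positioned))

# Toy example (a tiny fragment, not this paper's full index):
toy = {"This": [0], "paper": [1], "presents": [2], "MirrorSAM2,": [3]}
# invert_abstract(toy) == "This paper presents MirrorSAM2,"
```

Words that occur more than once (e.g. "the", "and" in the real index) simply carry multiple positions, so the same flatten-and-sort step handles them without special cases.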