VimGeo: An Efficient Visual Model for Cross-View Geo-Localization
· 2025
· Open Access
· DOI: https://doi.org/10.3390/electronics14193906
Cross-view geo-localization is challenging because the appearance of a target scene changes significantly across viewpoints. Most existing methods adopt Transformers or ConvNeXt as backbone models, but they often suffer from high computational cost and degraded accuracy in complex scenarios. This paper therefore proposes a Vision Mamba framework, based on the state-space model (SSM), for cross-view geo-localization. Compared with existing methods, Vision Mamba is more efficient in both modeling and memory usage, and it achieves more efficient cross-view matching by combining a Siamese (twin) architecture with shared weights and multiple hybrid losses. The paper also introduces Dice Loss to handle scale differences and imbalance in cross-view images. Extensive experiments on the public cross-view dataset University-1652 show that Vision Mamba not only performs excellently on UAV target localization but also attains the highest efficiency while consuming less memory. This work offers a novel solution for cross-view geo-localization and shows strong potential as a backbone model for the next generation of cross-view geo-localization methods.
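The combination of a shared-weight twin encoder and a Dice-style loss can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say how Dice Loss is applied, so the example assumes a soft Dice loss computed between two non-negative feature vectors, and it replaces the Vision Mamba backbone with a hypothetical one-layer encoder (`encode`). All names, weights, and input values below are illustrative assumptions.

```python
import math

def encode(x, weights):
    """Hypothetical stand-in for the backbone: one linear layer plus a sigmoid.

    The same `weights` are used for every view, which is what "shared weights"
    means in a Siamese (twin) architecture.
    """
    return [
        1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, x))))
        for row in weights
    ]

def dice_loss(a, b, eps=1e-6):
    """Soft Dice loss between two non-negative vectors (assumed formulation).

    Returns a value near 0 when the vectors agree and grows as they diverge.
    """
    intersection = sum(p * q for p, q in zip(a, b))
    denominator = sum(p * p for p in a) + sum(q * q for q in b)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)

# Illustrative shared weights and toy "views" (not real image features).
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
drone_view = [1.0, 0.5, -0.3]
satellite_view = [0.9, 0.6, -0.2]    # similar to the drone view
mismatched_view = [-1.0, 2.0, 3.0]   # a different location

# Both views pass through the same encoder with the same weights.
loss_match = dice_loss(encode(drone_view, W), encode(satellite_view, W))
loss_mismatch = dice_loss(encode(drone_view, W), encode(mismatched_view, W))

print(f"matching pair loss:   {loss_match:.4f}")
print(f"mismatched pair loss: {loss_mismatch:.4f}")
```

In this toy setup the matching pair yields a smaller loss than the mismatched pair, which is the behavior a matching objective relies on. In practice such a term would be one of several losses combined during training, as the abstract indicates.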
- Type: article
- Language: en
- Landing Page: https://doi.org/10.3390/electronics14193906
- PDF: https://www.mdpi.com/2079-9292/14/19/3906/pdf?version=1759243661
- OA Status: gold
- OpenAlex ID: https://openalex.org/W4414651787