Efficient Multi-Scale Attention Module with Cross-Spatial Learning
· 2023
· Open Access
· DOI: https://doi.org/10.1109/icassp49357.2023.10096516
· OA: W4372347372
The remarkable effectiveness of channel or spatial attention mechanisms in producing more discernible feature representations has been illustrated in various computer vision tasks. However, modeling cross-channel relationships with channel dimensionality reduction may bring side effects when extracting deep visual representations. In this paper, a novel efficient multi-scale attention (EMA) module is proposed. Focusing on retaining the information of each channel while decreasing the computational overhead, we reshape part of the channels into the batch dimension and group the channel dimension into multiple sub-features, which makes the spatial semantic features well-distributed inside each feature group. Specifically, apart from encoding the global information to re-calibrate the channel-wise weights in each parallel branch, the output features of the two parallel branches are further aggregated by a cross-dimension interaction for capturing pixel-level pairwise relationships. We conduct extensive ablation studies and experiments on image classification and object detection tasks with popular benchmarks (e.g., CIFAR-100, ImageNet-1k, MS COCO, and VisDrone2019) to evaluate its performance.
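The core reshaping idea in the abstract — folding channel groups into the batch dimension so each sub-feature is re-calibrated independently, without channel dimensionality reduction — can be sketched as follows. This is a minimal NumPy illustration of the tensor layout only; the shapes, the group count `g`, and the global-pooling stand-in for EMA's parallel 1x1/3x3 branches and cross-dimension interaction are illustrative assumptions, not the paper's exact operations.

```python
import numpy as np

# Hypothetical shapes for illustration (batch, channels, height, width).
B, C, H, W = 2, 32, 8, 8
g = 8  # number of sub-feature groups (hyperparameter; must divide C)

x = np.random.rand(B, C, H, W).astype(np.float32)

# Fold the groups into the batch dimension:
# (B, C, H, W) -> (B*g, C//g, H, W).
# Each group of C//g channels is now treated as an independent sample,
# so per-channel information is retained with no dimensionality reduction.
xg = x.reshape(B * g, C // g, H, W)

# Stand-in for the per-group attention branches: global average pooling
# produces channel-wise weights that re-calibrate each sub-feature.
weights = xg.mean(axis=(2, 3), keepdims=True)  # (B*g, C//g, 1, 1)
out = xg * weights                             # re-calibrated groups

# Unfold back to the original layout: (B*g, C//g, H, W) -> (B, C, H, W).
y = out.reshape(B, C, H, W)
print(y.shape)  # -> (2, 32, 8, 8)
```

Because the grouped tensor is just a reshaped view, the per-group computation adds no channel-mixing parameters; the output shape matches the input, so such a module can be dropped into an existing backbone.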