Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation

arXiv (Cornell University) • July 2023 • Jinxiang Liu, Chen Ju, Chaofan Ma, Yanfeng Wang, Yu Wang, Ya Zhang

The goal of the audio-visual segmentation (AVS) task is to segment the sounding objects in video frames using audio cues. However, current fusion-based methods have performance limitations due to the small receptive field of convolution and inadequate fusion of audio-visual features. To overcome these issues, we propose a novel Audio-aware query-enhanced TRansformer (AuTR) to tackle the task. Unlike existing methods, our approach introduces a multimodal transformer architecture that enabl…
