arXiv (Cornell University)
Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation
July 2023 • Jinxiang Liu, Chen Ju, Chaofan Ma, Yanfeng Wang, Yu Wang, Ya Zhang
The goal of the audio-visual segmentation (AVS) task is to segment the sounding objects in video frames using audio cues. However, current fusion-based methods have limited performance because of the small receptive field of convolution and the inadequate fusion of audio and visual features. To overcome these issues, we propose a novel Audio-aware query-enhanced TRansformer (AuTR) to tackle the task. Unlike existing methods, our approach introduces a multimodal transformer architecture that enabl…
Computer Science
Transformer
Audiovisual
Artificial Intelligence
Computer Vision