Exploring foci of:
arXiv (Cornell University)
Speech Swin-Transformer: Exploring a Hierarchical Transformer with Shifted Windows for Speech Emotion Recognition
January 2024 • Yong Wang, Cheng Lu, Hailun Lian, Yan Zhao, Björn W. Schuller, Yuan Zong, Wenming Zheng
Swin-Transformer has demonstrated remarkable success in computer vision by leveraging its hierarchical feature representation based on Transformer. In speech signals, emotional information is distributed across different scales of speech features, e.\,g., word, phrase, and utterance. Drawing above inspiration, this paper presents a hierarchical speech Transformer with shifted windows to aggregate multi-scale emotion features for speech emotion recognition (SER), called Speech Swin-Transformer. Specifically, we fir…
Transformer
Computer Science
Artificial Intelligence
Engineering
Voltage
Electrical Engineering