arXiv (Cornell University)
Emotion Detection in Speech Using Lightweight and Transformer-Based Models: A Comparative and Ablation Study
November 2025 • Lucky Onyekwelu-Udoka, Md Shafiqul Islam, Mahbub Hasan
Emotion recognition from speech plays a vital role in the development of empathetic human-computer interaction systems. This paper presents a comparative analysis of two lightweight transformer-based models, DistilHuBERT and PaSST, on the task of classifying six core emotions in the CREMA-D dataset. We benchmark their performance against a traditional CNN-LSTM baseline model that uses MFCC features. DistilHuBERT demonstrates superior accuracy (70.64%) and F1 score (70.36%) while maintaining an exceptionally small model size (0.02 M…