Umberto Cappellazzo
MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition
Large language models (LLMs) have recently shown strong potential in audio-visual speech recognition (AVSR), but their high computational demands and sensitivity to token granularity limit their practicality in resource-constrained setting…
Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach
Audio-Visual Speech Recognition (AVSR) enhances robustness in noisy environments by integrating visual cues. While recent advances integrate Large Language Models (LLMs) into AVSR, their high computational cost hinders deployment in resour…
Large Language Models are Strong Audio-Visual Speech Recognition Learners
Multimodal large language models (MLLMs) have recently become a focal point of research due to their formidable multimodal understanding capabilities. For example, in the audio and speech domains, an LLM can be equipped with (automatic) sp…
Evaluating and Improving Continual Learning in Spoken Language Understanding
Continual learning has emerged as an increasingly important challenge across various tasks, including Spoken Language Understanding (SLU). In SLU, the objective is to effectively handle the emergence of new concepts and evolving environmen…
Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters
Mixture of Experts (MoE) architectures have recently started burgeoning due to their ability to scale a model's capacity while keeping the computational cost affordable. Furthermore, they can be applied to both Transformers and State Spa…
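To make the "soft mixture of adapters" idea concrete, the sketch below shows one generic way such a layer can look: a few bottleneck adapters whose outputs are softly combined with per-token softmax gating weights and added residually to frozen backbone features. This is an illustrative assumption, not the paper's implementation; all module names, dimensions, and the gating scheme are made up for the example.

```python
# Minimal sketch of a soft mixture of bottleneck adapters (illustrative only).
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Down-project -> non-linearity -> up-project, as in standard adapters."""
    def __init__(self, dim: int, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return self.up(self.act(self.down(x)))


class SoftMixtureOfAdapters(nn.Module):
    """Every token is routed softly to all adapters via softmax gating."""
    def __init__(self, dim: int, num_adapters: int = 4, bottleneck: int = 32):
        super().__init__()
        self.adapters = nn.ModuleList(
            BottleneckAdapter(dim, bottleneck) for _ in range(num_adapters)
        )
        self.gate = nn.Linear(dim, num_adapters)  # per-token gating logits

    def forward(self, x):  # x: (batch, seq_len, dim) frozen backbone features
        weights = torch.softmax(self.gate(x), dim=-1)                     # (B, T, E)
        expert_out = torch.stack([a(x) for a in self.adapters], dim=-2)   # (B, T, E, D)
        mixed = (weights.unsqueeze(-1) * expert_out).sum(dim=-2)          # (B, T, D)
        return x + mixed  # residual connection keeps the frozen path intact


if __name__ == "__main__":
    layer = SoftMixtureOfAdapters(dim=768)
    feats = torch.randn(2, 100, 768)  # e.g. audio spectrogram transformer features
    print(layer(feats).shape)         # torch.Size([2, 100, 768])
```

Because only the adapters and the gate are trained while the backbone stays frozen, the number of trainable parameters remains a small fraction of the full model, which is the appeal of this family of parameter-efficient methods.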
Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers
Parameter-efficient transfer learning (PETL) methods have emerged as a solid alternative to the standard full fine-tuning approach. They only train a few extra parameters for each downstream task, without sacrificing performance and dispen…
Continual Contrastive Spoken Language Understanding
Recently, neural networks have shown impressive progress across diverse fields, with speech processing being no exception. However, recent breakthroughs in this area require extensive offline training using large datasets and tremendous co…
Training dynamic models using early exits for automatic speech recognition on resource-constrained devices
The ability to dynamically adjust the computational load of neural models during inference is crucial for on-device processing scenarios characterised by limited and time-varying computational resources. A promising solution is presented b…
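As a rough illustration of the early-exit idea mentioned above (a generic sketch under my own assumptions, not the paper's training recipe), intermediate classifier heads can be attached after each encoder block, and inference can stop at the first exit whose prediction confidence exceeds a threshold, so the compute spent per input varies with its difficulty. The threshold, pooling, and layer sizes below are hypothetical.

```python
# Minimal early-exit encoder sketch (illustrative only).
import torch
import torch.nn as nn


class EarlyExitEncoder(nn.Module):
    def __init__(self, dim: int = 256, num_layers: int = 6, num_classes: int = 10):
        super().__init__()
        make_block = lambda: nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, batch_first=True
        )
        self.blocks = nn.ModuleList(make_block() for _ in range(num_layers))
        # one lightweight classifier ("exit") per block
        self.exits = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(num_layers))

    def forward(self, x, threshold: float = 0.9):
        """Return logits from the first sufficiently confident exit and its index."""
        for i, (block, exit_head) in enumerate(zip(self.blocks, self.exits)):
            x = block(x)
            logits = exit_head(x.mean(dim=1))  # pool over time, then classify
            confidence = torch.softmax(logits, dim=-1).max(dim=-1).values
            if bool((confidence >= threshold).all()):  # whole batch is confident
                return logits, i
        return logits, len(self.blocks) - 1  # fall through to the last exit


if __name__ == "__main__":
    model = EarlyExitEncoder().eval()
    features = torch.randn(1, 50, 256)  # e.g. frame features of a short utterance
    with torch.no_grad():
        logits, exit_idx = model(features, threshold=0.5)
    print(logits.shape, "exited at layer", exit_idx)
```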
Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding
The ability to learn new concepts sequentially is a major weakness for modern neural networks, which hinders their use in non-stationary environments. Their propensity to fit the current data distribution to the detriment of the past acqui…
An Investigation of the Combination of Rehearsal and Knowledge Distillation in Continual Learning for Spoken Language Understanding
Continual learning refers to a dynamic framework in which a model receives a stream of non-stationary data over time and must adapt to new data while preserving previously acquired knowledge. Unfortunately, neural networks fail to meet these…