Umberto Cappellazzo
MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition
Large language models (LLMs) have recently shown strong potential in audio-visual speech recognition (AVSR), but their high computational demands and sensitivity to token granularity limit their practicality in resource-constrained setting…
Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach
Audio-Visual Speech Recognition (AVSR) enhances robustness in noisy environments by integrating visual cues. While recent advances integrate Large Language Models (LLMs) into AVSR, their high computational cost hinders deployment in resour…
Large Language Models are Strong Audio-Visual Speech Recognition Learners
Multimodal large language models (MLLMs) have recently become a focal point of research due to their formidable multimodal understanding capabilities. For example, in the audio and speech domains, an LLM can be equipped with (automatic) sp…
Evaluating and Improving Continual Learning in Spoken Language Understanding
Continual learning has emerged as an increasingly important challenge across various tasks, including Spoken Language Understanding (SLU). In SLU, the objective is to effectively handle the emergence of new concepts and evolving environmen…
Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters
Mixture of Experts (MoE) architectures have recently started burgeoning due to their ability to scale a model's capacity while keeping the computational cost affordable. Furthermore, they can be applied to both Transformers and State Spa…
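To make the "soft mixture of adapters" idea concrete, the sketch below shows one generic way such a layer can look: a few bottleneck adapters whose outputs are softly combined with per-token softmax gating weights and added residually to frozen backbone features. This is an illustrative assumption, not the paper's implementation; all module names, dimensions, and the gating scheme are made up for the example.

```python
# Minimal sketch of a soft mixture of bottleneck adapters (illustrative only).
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Down-project -> non-linearity -> up-project, as in standard adapters."""
    def __init__(self, dim: int, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return self.up(self.act(self.down(x)))


class SoftMixtureOfAdapters(nn.Module):
    """Every token is routed softly to all adapters via softmax gating."""
    def __init__(self, dim: int, num_adapters: int = 4, bottleneck: int = 32):
        super().__init__()
        self.adapters = nn.ModuleList(
            BottleneckAdapter(dim, bottleneck) for _ in range(num_adapters)
        )
        self.gate = nn.Linear(dim, num_adapters)  # per-token gating logits

    def forward(self, x):  # x: (batch, seq_len, dim) frozen backbone features
        weights = torch.softmax(self.gate(x), dim=-1)                     # (B, T, E)
        expert_out = torch.stack([a(x) for a in self.adapters], dim=-2)   # (B, T, E, D)
        mixed = (weights.unsqueeze(-1) * expert_out).sum(dim=-2)          # (B, T, D)
        return x + mixed  # residual connection keeps the frozen path intact


if __name__ == "__main__":
    layer = SoftMixtureOfAdapters(dim=768)
    feats = torch.randn(2, 100, 768)  # e.g. audio spectrogram transformer features
    print(layer(feats).shape)         # torch.Size([2, 100, 768])
```

Because only the adapters and the gate are trained while the backbone stays frozen, the number of trainable parameters remains a small fraction of the full model, which is the appeal of this family of parameter-efficient methods.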
Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers
Parameter-efficient transfer learning (PETL) methods have emerged as a solid alternative to the standard full fine-tuning approach. They only train a few extra parameters for each downstream task, without sacrificing performance and dispen…
Continual Contrastive Spoken Language Understanding
Recently, neural networks have shown impressive progress across diverse fields, with speech processing being no exception. However, recent breakthroughs in this area require extensive offline training using large datasets and tremendous co…
Training dynamic models using early exits for automatic speech recognition on resource-constrained devices
The ability to dynamically adjust the computational load of neural models during inference is crucial for on-device processing scenarios characterised by limited and time-varying computational resources. A promising solution is presented b…
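As a rough illustration of the early-exit idea mentioned above (a generic sketch under my own assumptions, not the paper's training recipe), intermediate classifier heads can be attached after each encoder block, and inference can stop at the first exit whose prediction confidence exceeds a threshold, so the compute spent per input varies with its difficulty. The threshold, pooling, and layer sizes below are hypothetical.

```python
# Minimal early-exit encoder sketch (illustrative only).
import torch
import torch.nn as nn


class EarlyExitEncoder(nn.Module):
    def __init__(self, dim: int = 256, num_layers: int = 6, num_classes: int = 10):
        super().__init__()
        make_block = lambda: nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, batch_first=True
        )
        self.blocks = nn.ModuleList(make_block() for _ in range(num_layers))
        # one lightweight classifier ("exit") per block
        self.exits = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(num_layers))

    def forward(self, x, threshold: float = 0.9):
        """Return logits from the first sufficiently confident exit and its index."""
        for i, (block, exit_head) in enumerate(zip(self.blocks, self.exits)):
            x = block(x)
            logits = exit_head(x.mean(dim=1))  # pool over time, then classify
            confidence = torch.softmax(logits, dim=-1).max(dim=-1).values
            if bool((confidence >= threshold).all()):  # whole batch is confident
                return logits, i
        return logits, len(self.blocks) - 1  # fall through to the last exit


if __name__ == "__main__":
    model = EarlyExitEncoder().eval()
    features = torch.randn(1, 50, 256)  # e.g. frame features of a short utterance
    with torch.no_grad():
        logits, exit_idx = model(features, threshold=0.5)
    print(logits.shape, "exited at layer", exit_idx)
```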
Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding
The ability to learn new concepts sequentially is a major weakness for modern neural networks, which hinders their use in non-stationary environments. Their propensity to fit the current data distribution to the detriment of the past acqui…
An Investigation of the Combination of Rehearsal and Knowledge Distillation in Continual Learning for Spoken Language Understanding
Continual learning refers to a dynamic framework in which a model receives a stream of non-stationary data over time and must adapt to new data while preserving previously acquired knowledge. Unfortunately, neural networks fail to meet these…