Explanipedia

Universal Audio Generation

Antoine Laurent, Sameer Khurana, Anthony Larcher, Dominik Klement, Mickaël Rouvier, et al. · 2026

This report describes the research done during the third ESPERANTO/JSALT workshop, held from 10 June 2024 to 2 August 2024.

Discrete Audio Tokens: More Than a Survey!

Pooneh Mousavi, Gallil Maimon, Adel Moumen, Darius Petermann, Jiatong Shi, et al. · 2025

Discrete audio tokens are compact representations that aim to preserve perceptual quality, phonetic content, and speaker characteristics while enabling efficient storage and inference, as well as competitive performance across diverse down…

Late Fusion and Multi-Level Fission Amplify Cross-Modal Transfer in Text-Speech LMs

Santiago Cuervo, Adel Moumen, Yanis Labrak, Sameer Khurana, Antoine Laurent, et al. · 2025

Text-Speech Language Models (TSLMs) -- language models trained to jointly process and generate text and speech -- are commonly trained through an early modality fusion/fission approach, in which both modalities are fed and predicted from a…

An Analysis of Linear Complexity Attention Substitutes with BEST-RQ

Ryan Whetten, Titouan Parcollet, Adel Moumen, Marco Dinarelli, Yannick Estève · 2024

Self-Supervised Learning (SSL) has proven to be effective in various domains, including speech processing. However, SSL is computationally and memory expensive. This is in part due to the quadratic complexity of multi-head self-attention (MHS…

ProGRes: Prompted Generative Rescoring on ASR n-Best

Ada Defne Tur, Adel Moumen, Mirco Ravanelli · 2024

Computer science

Large Language Models (LLMs) have shown their ability to improve the performance of speech recognizers by effectively rescoring the n-best hypotheses generated during the beam search process. However, the best way to exploit recent generat…

Open-Source Conversational AI with SpeechBrain 1.0

Mirco Ravanelli, Titouan Parcollet, Adel Moumen, Sylvain de Langen, Cem Subakan, et al. · 2024

Computer science

SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes trans…

Zero-Shot End-To-End Spoken Question Answering In Medical Domain

Yanis Labrak, Adel Moumen, Richard Dufour, Mickaël Rouvier · 2024

Computer science · Mathematics · Philosophy

In the rapidly evolving landscape of spoken question-answering (SQA), the integration of large language models (LLMs) has emerged as a transformative development. Conventional approaches often entail the use of separate models for question…

Stabilising and accelerating light gated recurrent units for automatic speech recognition

Adel Moumen, Titouan Parcollet · 2023

Computer science · Engineering · Mathematics

The light gated recurrent unit (Li-GRU) is well known for achieving impressive results in automatic speech recognition (ASR) tasks while being lighter and faster to train than a standard gated recurrent unit (GRU). However, the unbounded…

Adel Moumen