Adel Moumen
Universal Audio Generation
This report describes the research done during the third ESPERANTO/JSALT workshop, held from 10 June 2024 to 2 August 2024.
Discrete Audio Tokens: More Than a Survey!
Discrete audio tokens are compact representations that aim to preserve perceptual quality, phonetic content, and speaker characteristics while enabling efficient storage and inference, as well as competitive performance across diverse down…
Late Fusion and Multi-Level Fission Amplify Cross-Modal Transfer in Text-Speech LMs
Text-Speech Language Models (TSLMs) -- language models trained to jointly process and generate text and speech -- are commonly trained through an early modality fusion/fission approach, in which both modalities are fed and predicted from a…
An Analysis of Linear Complexity Attention Substitutes with BEST-RQ
Self-Supervised Learning (SSL) has proven to be effective in various domains, including speech processing. However, SSL is computationally and memory expensive. This is in part due to the quadratic complexity of multi-head self-attention (MHS…
ProGRes: Prompted Generative Rescoring on ASR n-Best
Large Language Models (LLMs) have shown their ability to improve the performance of speech recognizers by effectively rescoring the n-best hypotheses generated during the beam search process. However, the best way to exploit recent generat…
Open-Source Conversational AI with SpeechBrain 1.0
SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes trans…
Zero-Shot End-To-End Spoken Question Answering In Medical Domain
In the rapidly evolving landscape of spoken question-answering (SQA), the integration of large language models (LLMs) has emerged as a transformative development. Conventional approaches often entail the use of separate models for question…
Stabilising and accelerating light gated recurrent units for automatic speech recognition
The light gated recurrent unit (Li-GRU) is well known for achieving impressive results in automatic speech recognition (ASR) tasks while being lighter and faster to train than a standard gated recurrent unit (GRU). However, the unbounded…