Explanipedia

In-domain SSL pre-training and streaming ASR

Jarod Duret, Salima Mdhaffar, Gaëlle Laperrière, Ryan Whetten, A. Galametz, et al. · 2025

In this study, we investigate the benefits of domain-specific self-supervised pre-training for both offline and streaming ASR in Air Traffic Control (ATC) environments. We train BEST-RQ models on 4.5k hours of unlabeled ATC data, then fine…

Towards Early Prediction of Self-Supervised Speech Model Performance

Ryan Whetten, Lucas Maison, Titouan Parcollet, Marco Dinarelli, Yannick Estève · 2025

In Self-Supervised Learning (SSL), pre-training and evaluation are resource intensive. In the speech domain, current indicators of the quality of SSL models during pre-training, such as the loss, do not correlate well with downstream perfo…

An Analysis of Linear Complexity Attention Substitutes with BEST-RQ

Ryan Whetten, Titouan Parcollet, Adel Moumen, Marco Dinarelli, Yannick Estève · 2024

Computer science · Mathematics · Geography

Self-Supervised Learning (SSL) has proven to be effective in various domains, including speech processing. However, SSL is computationally and memory expensive. This is in part due to the quadratic complexity of multi-head self-attention (MHS…

Open Implementation and Study of BEST-RQ for Speech Processing

Ryan Whetten, Titouan Parcollet, Marco Dinarelli, Yannick Estève · 2024

Computer science

Self-Supervised Learning (SSL) has proven to be useful in various speech tasks. However, these methods are generally very demanding in terms of data, memory, and computational resources. BERT-based Speech pre-Training with Random-projectio…

Severity Measures for Assessing Error in Automatic Speech Recognition

Ryan Whetten · 2023

Computer science · Engineering · Philosophy

A common metric for evaluating Automatic Speech Recognition (ASR) is Word Error Rate (WER), which solely takes into account discrepancies at the word-level. Although WER is useful, it is not guaranteed to correlate well with intelligibility…

Evaluating Automatic Speech Recognition in an Incremental Setting

Ryan Whetten, Mir Tahsin Imtiaz, Casey Kennington · 2023

Computer science · Physics · Economics

The increasing reliability of automatic speech recognition has proliferated its everyday use. However, for research purposes, it is often unclear which model one should choose for a task, particularly if there is a requirement for speed as…

Evaluating and Improving Automatic Speech Recognition using Severity

Ryan Whetten, Casey Kennington · 2023

Computer science · Engineering · Philosophy

A common metric for evaluating Automatic Speech Recognition (ASR) is Word Error Rate (WER), which solely takes into account discrepancies at the word-level. Although useful, WER is not guaranteed to correlate well with human judgment or per…
