Ryan Whetten
In-domain SSL pre-training and streaming ASR
In this study, we investigate the benefits of domain-specific self-supervised pre-training for both offline and streaming ASR in Air Traffic Control (ATC) environments. We train BEST-RQ models on 4.5k hours of unlabeled ATC data, then fine…
Towards Early Prediction of Self-Supervised Speech Model Performance
In Self-Supervised Learning (SSL), pre-training and evaluation are resource intensive. In the speech domain, current indicators of the quality of SSL models during pre-training, such as the loss, do not correlate well with downstream perfo…
An Analysis of Linear Complexity Attention Substitutes with BEST-RQ
Self-Supervised Learning (SSL) has proven to be effective in various domains, including speech processing. However, SSL is computationally expensive and memory-intensive. This is due in part to the quadratic complexity of multi-head self-attention (MHS…
Open Implementation and Study of BEST-RQ for Speech Processing
Self-Supervised Learning (SSL) has proven to be useful in various speech tasks. However, these methods are generally very demanding in terms of data, memory, and computational resources. BERT-based Speech pre-Training with Random-projectio…
Severity Measures for Assessing Error in Automatic Speech Recognition
A common metric for evaluating Automatic Speech Recognition (ASR) is Word Error Rate (WER), which takes into account only discrepancies at the word level. Although WER is useful, it is not guaranteed to correlate well with intelligibility…
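To make the word-level nature of WER concrete, here is a minimal Python sketch (not the authors' implementation) that computes WER as the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. The function name word_error_rate and the plain whitespace tokenization are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (substitutions + deletions + insertions) / reference length.

    Assumes simple whitespace tokenization; real toolkits usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    # One substituted word each, so both hypotheses get the same score (1 error / 6 words).
    print(word_error_rate(ref, "the cat sat on a mat"))
    print(word_error_rate(ref, "the dog sat on the mat"))
```

Both hypotheses in the usage example receive a WER of 1/6, even though replacing "cat" with "dog" changes the meaning far more than replacing "the" with "a". Every word error is weighted equally, which is the limitation the abstract points to.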
Evaluating Automatic Speech Recognition in an Incremental Setting
The increasing reliability of automatic speech recognition has led to its proliferation in everyday use. However, for research purposes, it is often unclear which model one should choose for a task, particularly if there is a requirement for speed as…
Evaluating and Improving Automatic Speech Recognition using Severity
A common metric for evaluating Automatic Speech Recognition (ASR) is Word Error Rate (WER), which takes into account only discrepancies at the word level. Although useful, WER is not guaranteed to correlate well with human judgment or per…