Shane Settle
Acoustic Span Embeddings For Multilingual Query-By-Example Search
Query-by-example (QbE) speech search is the task of matching spoken queries to utterances within a search collection. In low- or zero-resource settings, QbE search is often addressed with approaches based on dynamic time warping (DTW). Rec…
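The DTW baseline named in this abstract compares a spoken query with a candidate segment by finding the lowest-cost monotonic alignment between their frame sequences. The sketch below is a minimal illustration under stated assumptions, not the method of the paper: it assumes Euclidean distance between frames, full-sequence alignment, and normalization by the summed lengths. Practical QbE systems more often use cosine frame distances and subsequence DTW, which lets the query match anywhere inside a longer utterance. The name `dtw_distance` and the feature shapes are illustrative.

```python
import numpy as np


def dtw_distance(query: np.ndarray, segment: np.ndarray) -> float:
    """Length-normalized DTW alignment cost between two feature sequences.

    query: (n, d) array of frame features; segment: (m, d) array.
    Assumptions (not from the source): Euclidean frame distance,
    steps (i-1, j), (i, j-1), (i-1, j-1), cost divided by n + m.
    """
    n, m = len(query), len(segment)
    # Pairwise frame distances via broadcasting, shape (n, m).
    cost = np.linalg.norm(query[:, None, :] - segment[None, :, :], axis=-1)
    # acc[i, j] = cheapest alignment of the first i query frames
    # with the first j segment frames.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return float(acc[n, m] / (n + m))


# Usage: rank candidate segments by DTW cost against a query.
rng = np.random.default_rng(0)
query = rng.normal(size=(20, 13))  # e.g. 20 frames of 13-dim features (assumed)
candidates = [
    rng.normal(size=(25, 13)),       # unrelated segment
    np.repeat(query, 2, axis=0),     # same content spoken "twice as slowly"
]
scores = [dtw_distance(query, c) for c in candidates]
best = int(np.argmin(scores))
```

The time-stretched copy aligns with zero cost because DTW can map each query frame to two consecutive segment frames. The nested loop makes each comparison O(nm), which is the cost that fixed-dimensional embeddings aim to avoid.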
What Do Self-Supervised Speech Models Know About Words?
Many self-supervised speech models (S3Ms) have been introduced over the last few years, improving performance and data efficiency on various speech tasks. However, these empirical successes alone do not give a complete picture of what is l…
Neural approaches to spoken content embedding
Comparing spoken segments is a central operation to speech processing. Traditional approaches in this area have favored frame-level dynamic programming algorithms, such as dynamic time warping, because they require no supervision, but they…
Whole-Word Segmental Speech Recognition with Acoustic Word Embeddings
Segmental models are sequence prediction models in which scores of hypotheses are based on entire variable-length segments of frames. We consider segmental models for whole-word ("acoustic-to-word") speech recognition, with the feature vec…
Multilingual Jointly Trained Acoustic and Written Word Embeddings
Acoustic word embeddings (AWEs) are vector representations of spoken word segments. AWEs can be learned jointly with embeddings of character sequences, to generate phonetically meaningful embeddings of written words, or acoustically ground…
Acoustically Grounded Word Embeddings for Improved Acoustics-to-Word Speech Recognition
Direct acoustics-to-word (A2W) systems for end-to-end automatic speech recognition are simpler to train, and more efficient to decode with, than sub-word systems. However, A2W systems can have difficulties at training time when data is lim…
Visually Grounded Learning of Keyword Prediction from Untranscribed Speech
During language acquisition, infants have the benefit of visual cues to ground spoken language. Robots similarly have access to audio and visual sensors. Recent work has shown that images and spoken captions can be mapped into a meaningful…
Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings
Query-by-example search often uses dynamic time warping (DTW) for comparing queries and proposed matching segments. Recent work has shown that comparing speech segments by representing them as fixed-dimensional vectors --- acoustic word em…
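The contrast drawn in this abstract is that once each variable-length segment is mapped to a fixed-dimensional vector, comparing two segments reduces to a single vector distance. The sketch below illustrates only that comparison step. The embedding function is a hypothetical placeholder (mean pooling over frames), not the discriminative neural embeddings the paper trains; cosine distance is likewise an assumed choice of metric. The names `embed_segment`, `cosine_distance`, and `rank_segments` are illustrative.

```python
import numpy as np


def embed_segment(frames: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a learned acoustic word embedding model.

    Maps a variable-length (T, d) segment to a fixed d-dimensional vector
    by mean pooling. A trained neural encoder would replace this.
    """
    return frames.mean(axis=0)


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank_segments(query: np.ndarray, segments: list) -> list:
    """Return segment indices ordered from closest to farthest from the query."""
    q = embed_segment(query)
    dists = [cosine_distance(q, embed_segment(s)) for s in segments]
    return sorted(range(len(segments)), key=lambda k: dists[k])


# Usage: segments of different lengths all become 2-dim vectors.
query = np.array([[1.0, 0.0], [1.0, 0.0]])
segments = [np.array([[0.0, 1.0]] * 3), np.array([[2.0, 0.0]] * 5)]
order = rank_segments(query, segments)
```

Each comparison here costs O(d) once the embeddings are computed, and segment embeddings can be precomputed for the whole search collection, whereas DTW must realign frames for every query and segment pair.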
Discriminative Acoustic Word Embeddings: Recurrent Neural Network-Based Approaches
Acoustic word embeddings --- fixed-dimensional vector representations of variable-length spoken word segments --- have begun to be considered for tasks such as speech recognition and query-by-example search. Such embeddings can be learned …