Gil Keren
Efficient Streaming LLM for Speech Recognition
Recent works have shown that prompting large language models with audio encodings can unlock speech recognition capabilities. However, existing techniques do not scale efficiently, especially while handling long-form streaming audio inputs…
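The core idea can be sketched minimally: audio-encoder outputs are prepended to the text-token embeddings as a prompt, and long-form audio is handled in fixed-size chunks that can be streamed. All names and shapes below are illustrative assumptions, not the paper's actual architecture.

```python
def build_llm_input(audio_embeddings, text_embeddings):
    """Prompt the LLM with audio: prepend encoder outputs to the
    text-token embeddings (both are lists of vectors)."""
    return audio_embeddings + text_embeddings

def chunk_frames(frames, chunk_size):
    """Split encoder frames into fixed-size chunks so long-form audio
    can be consumed chunk by chunk instead of re-encoded whole."""
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]

audio = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 encoder frames
text = [[1.0, 0.0], [0.0, 1.0]]               # 2 text-token embeddings
assert len(build_llm_input(audio, text)) == 5  # audio prefix + text tokens
assert [len(c) for c in chunk_frames(audio, 2)] == [2, 1]
```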
M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses
The growing popularity of multi-channel wearable devices, such as smart glasses, has led to a surge of applications, including targeted speech recognition and enhanced hearing. However, current approaches to solve these tasks use independentl…
Faster Speech-LLaMA Inference with Multi-token Prediction
Large language models (LLMs) have become proficient at solving a wide variety of tasks, including those involving multi-modal inputs. In particular, instantiating an LLM (such as LLaMA) with a speech encoder and training it on paired data …
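A back-of-the-envelope sketch of why multi-token prediction speeds up inference: if each (expensive) LLM forward pass emits k tokens via k prediction heads, a T-token transcript needs ceil(T/k) passes instead of T. This toy code is illustrative only, not the paper's decoder.

```python
import math

def decoder_calls(num_tokens, k):
    """Number of LLM forward passes needed to emit num_tokens tokens
    when each pass predicts k tokens at once."""
    return math.ceil(num_tokens / k)

def multi_token_greedy_step(head_logits):
    """head_logits: one score dict (token -> score) per prediction head.
    Each head proposes its argmax, so a single pass yields k tokens."""
    return [max(d, key=d.get) for d in head_logits]

assert decoder_calls(100, 1) == 100  # standard next-token decoding
assert decoder_calls(100, 4) == 25   # 4 heads: 4x fewer passes
heads = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]
assert multi_token_greedy_step(heads) == ["a", "b"]
```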
Token-Weighted RNN-T for Learning from Flawed Data
ASR models are commonly trained with the cross-entropy criterion to increase the probability of a target token sequence. While optimizing the probability of all tokens in the target sequence is sensible, one may want to de-emphasize tokens…
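The de-emphasis idea can be illustrated with a token-weighted negative log-likelihood. Note this sketch uses a simple per-token cross-entropy-style objective rather than the full RNN-T lattice loss, and the weighting scheme shown is only an example.

```python
import math

def token_weighted_nll(step_log_probs, targets, weights):
    """Token-weighted negative log-likelihood.

    step_log_probs: per-step dict mapping token -> log-probability
    targets:        target token sequence
    weights:        per-token weight; w < 1 de-emphasizes suspect tokens
                    (e.g. likely transcription errors), while w = 1
                    everywhere recovers standard cross-entropy.
    """
    return -sum(w * lp[t] for lp, t, w in zip(step_log_probs, targets, weights))

probs = [{"a": math.log(0.5), "b": math.log(0.5)},
         {"a": math.log(0.9), "b": math.log(0.1)}]
loss_full = token_weighted_nll(probs, ["a", "b"], [1.0, 1.0])
loss_down = token_weighted_nll(probs, ["a", "b"], [1.0, 0.2])
assert loss_down < loss_full  # the suspect second token contributes less
```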
Towards Selection of Text-to-speech Data to Augment ASR Training
This paper presents a method for selecting appropriate synthetic speech samples from a given large text-to-speech (TTS) dataset as supplementary training data for an automatic speech recognition (ASR) model. We trained a neural network, wh…
Text Generation with Speech Synthesis for ASR Data Augmentation
Aiming at reducing the reliance on expensive human annotations, data synthesis for Automatic Speech Recognition (ASR) has remained an active area of research. While prior work mainly focuses on synthetic speech generation for ASR data augm…
A Token-Wise Beam Search Algorithm for RNN-T
Standard Recurrent Neural Network Transducer (RNN-T) decoding algorithms for speech recognition iterate over the time axis, decoding one time step before moving on to the next. These algorithms result in a larg…
Improving Fast-slow Encoder based Transducer with Streaming Deliberation
This paper introduces a fast-slow encoder based transducer with streaming deliberation for end-to-end automatic speech recognition. We aim to improve the recognition accuracy of the fast-slow encoder based transducer while keeping its late…
Scaling ASR Improves Zero and Few Shot Learning
With 4.5 million hours of English speech from 10 different sources across 120 countries and models of up to 10 billion parameters, we explore the frontiers of scale for automatic speech recognition. We propose data selection techniques to …
Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion
How to leverage dynamic contextual information in end-to-end speech recognition has remained an active research area. Previous solutions to this problem were either designed for specialized use cases that did not generalize well to open-do…
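The trie-based biasing component can be sketched as follows: a trie is built over dynamic contextual phrases (e.g. contact names), and hypotheses whose suffix stays on the trie receive a shallow-fusion-style score bonus. The scoring rule below is a toy assumption, not the paper's exact formulation.

```python
def build_trie(phrases):
    """Build a character trie over biasing phrases (e.g. contact names)."""
    root = {}
    for phrase in phrases:
        node = root
        for ch in phrase:
            node = node.setdefault(ch, {})
        node["$"] = True  # end-of-phrase marker
    return root

def bias_score(trie, hypothesis, bonus=1.0):
    """Reward a hypothesis whose characters trace a path in the trie,
    i.e. it is a prefix of some biasing phrase; otherwise no bonus."""
    node = trie
    for ch in hypothesis:
        if ch not in node:
            return 0.0
        node = node[ch]
    return bonus

trie = build_trie(["joe", "joanna"])
assert bias_score(trie, "jo") == 1.0   # prefix of both biasing phrases
assert bias_score(trie, "jim") == 0.0  # falls off the trie
```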
Alignment Restricted Streaming Recurrent Neural Network Transducer
There is a growing interest in the speech community in developing Recurrent Neural Network Transducer (RNN-T) models for automatic speech recognition (ASR) applications. RNN-T is trained with a loss function that does not enforce temporal …
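One way to picture alignment restriction: given a reference frame alignment for each target token, only lattice nodes (t, u) within a window around that token's aligned frame are kept valid during the loss computation. Window sizes and the mask construction below are illustrative assumptions.

```python
def valid_lattice_nodes(alignment, num_frames, left=2, right=2):
    """For each target token u with reference frame alignment[u], mark
    lattice nodes (t, u) as valid only when t falls within a window
    around the alignment; everything else is pruned from the loss."""
    valid = set()
    for u, a in enumerate(alignment):
        for t in range(max(0, a - left), min(num_frames, a + right + 1)):
            valid.add((t, u))
    return valid

nodes = valid_lattice_nodes(alignment=[3, 7], num_frames=10)
assert (3, 0) in nodes      # token 0 near its aligned frame 3
assert (9, 0) not in nodes  # token 0 far from its alignment: pruned
```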
New Avenues in Audio Intelligence: Towards Holistic Real-life Audio Understanding
Computer audition (i.e., intelligent audio) has made great strides in recent years; however, it is still far from achieving holistic hearing abilities, which more appropriately mimic human-like understanding. Within an audio scene, a human…
Deep Shallow Fusion for RNN-T Personalization
End-to-end models in general, and Recurrent Neural Network Transducer (RNN-T) in particular, have gained significant traction in the automatic speech recognition community in the last few years due to their simplicity, compactness, and exc…
Contextual RNN-T for Open Domain ASR
End-to-end (E2E) systems for automatic speech recognition (ASR), such as RNN Transducer (RNN-T) and Listen-Attend-Spell (LAS) blend the individual components of a traditional hybrid ASR system - acoustic model, language model, pronunciatio…
N-HANS: Introducing the Augsburg Neuro-Holistic Audio-eNhancement System
N-HANS is a Python toolkit for in-the-wild audio enhancement, including speech, music, and general audio denoising, separation, and selective noise or source suppression. The functionalities are realised based on two neural network models …
Towards Robust Speech Emotion Recognition Using Deep Residual Networks for Speech Enhancement
The use of deep learning (DL) architectures for speech enhancement has recently improved the robustness of voice applications under diverse noise conditions. These improvements are usually evaluated based on the perceptual quality of the en…
A Walkthrough for the Principle of Logit Separation
We consider neural network training, in applications in which there are many possible classes, but at test-time, the task is a binary classification task of determining whether the given example belongs to a specific class. We define the S…
Single-Channel Speech Separation with Auxiliary Speaker Embeddings
We present a novel source separation model to decompose a single-channel speech signal into two speech segments belonging to two different speakers. The proposed model is a neural network based on residual blocks, and uses learnt speaker em…
Scaling Speech Enhancement in Unseen Environments with Noise Embeddings
We address the problem of speech enhancement generalisation to unseen environments by performing two manipulations. First, we embed an additional recording from the environment alone, and use this embedding to alter activations in the main…
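One plausible realization of "use this embedding to alter activations" is FiLM-style conditioning, where a small network maps the noise embedding to a per-channel scale and shift. The sketch below assumes that mechanism and supplies the scale/shift directly; it is not necessarily the paper's exact method.

```python
def condition_activations(activations, gamma, beta):
    """Modulate hidden activations with a scale (gamma) and shift (beta)
    derived from the environment/noise embedding. In practice gamma and
    beta would be produced by a small conditioning network; here they
    are given directly for illustration."""
    return [g * a + b for a, g, b in zip(activations, gamma, beta)]

h = [1.0, -2.0, 0.5]
out = condition_activations(h, gamma=[1.0, 0.0, 2.0], beta=[0.0, 0.1, 0.0])
assert out == [1.0, 0.1, 1.0]  # channels scaled/shifted per the embedding
```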
Emotion Recognition in Speech with Latent Discriminative Representations Learning
Despite significant recent advances in the field of affective computing, learning meaningful representations for emotion recognition remains quite challenging. In this paper, we propose a novel feature learning approach named Latent Discriminat…
Calibrated Prediction Intervals for Neural Network Regressors
Ongoing developments in neural network models are continually advancing the state of the art in terms of system accuracy. However, the predicted labels should not be regarded as the only core output; also important is a well-calibrated est…
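A common way to obtain prediction intervals from a regressor, and to check whether they are well calibrated, is sketched below: the network outputs a mean and a log-variance per example, a Gaussian interval is formed around the mean, and calibration is measured as empirical coverage. This setup is a standard heteroscedastic-regression assumption, not necessarily the paper's exact construction.

```python
import math

def prediction_interval(mean, log_var, z=1.96):
    """95% Gaussian interval from a predicted mean and log-variance."""
    sigma = math.exp(0.5 * log_var)
    return mean - z * sigma, mean + z * sigma

def empirical_coverage(intervals, targets):
    """Calibration check: the fraction of targets inside their predicted
    intervals should match the nominal confidence level."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, targets))
    return hits / len(targets)

lo, hi = prediction_interval(0.0, 0.0)  # log_var = 0 means sigma = 1
assert abs(lo + 1.96) < 1e-9 and abs(hi - 1.96) < 1e-9
assert empirical_coverage([(-1, 1), (-1, 1)], [0.0, 2.0]) == 0.5
```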
Weakly Supervised One-Shot Detection with Attention Siamese Networks
Neural network models that are not conditioned on class identities were shown to facilitate knowledge transfer between classes and to be well-suited for one-shot learning tasks. Following this motivation, we further explore and establish s…
Weakly Supervised One-Shot Detection with Attention Similarity Networks
Neural network models that are not conditioned on class identities were shown to facilitate knowledge transfer between classes and to be well-suited for one-shot learning tasks. Following this motivation, we further explore and establish s…
CAST a database: Rapid targeted large-scale big data acquisition via small-world modelling of social media platforms
The adage that there is no data like more data is not new in affective computing; however, with recent advances in deep learning technologies, such as end-to-end learning, the need for extracting big data is greater than ever. Multimedia r…
Fast Single-Class Classification and the Principle of Logit Separation
We consider neural network training, in applications in which there are many possible classes, but at test-time, the task is a binary classification task of determining whether the given example belongs to a specific class, where the class…
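The test-time setting can be made concrete with a small sketch: computing an exact class probability requires a softmax over all classes (expensive when there are millions), whereas the single-class task only needs to threshold one logit. Thresholding a lone logit is only reliable if training kept positive and negative logits separated across examples, which is the point of the Principle of Logit Separation. The numbers below are illustrative.

```python
import math

def softmax_prob(logits, cls):
    """Exact class probability: requires normalizing over ALL classes,
    which is costly when there are very many of them."""
    z = sum(math.exp(v) for v in logits.values())
    return math.exp(logits[cls]) / z

def single_class_decision(logits, cls, threshold):
    """Fast single-class test: look only at the one logit and threshold
    it, skipping the full normalization entirely."""
    return logits[cls] >= threshold

logits = {"dog": 2.0, "cat": -1.0, "car": -3.0}
assert softmax_prob(logits, "dog") > 0.9
assert single_class_decision(logits, "dog", threshold=0.0)
assert not single_class_decision(logits, "car", threshold=0.0)
```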
Tunable Sensitivity to Large Errors in Neural Network Training
When humans learn a new concept, they might ignore examples that they cannot make sense of at first, and only later focus on such examples, when they are more useful for learning. We propose incorporating this idea of tunable sensitivity f…
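One toy way to realize tunable sensitivity (not the paper's exact loss) is to cap each example's contribution so that very large errors do not dominate early training, with a parameter that restores full sensitivity later:

```python
def tunable_loss(errors, k):
    """Toy tunable-sensitivity objective: squared error, but each
    example's contribution is capped at k**2, so errors larger than k
    are temporarily de-emphasized. Raising k over the course of
    training restores full sensitivity to large errors."""
    return sum(min(e * e, k * k) for e in errors)

errs = [0.5, 10.0]
assert tunable_loss(errs, k=1.0) == 0.25 + 1.0      # large error capped
assert tunable_loss(errs, k=100.0) == 0.25 + 100.0  # full sensitivity
```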