Explanipedia

Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential

Mohammad Samragh, Arnav Kundu, David A Harrison, Kumari Nishu, Devang Naik, et al. · 2025

Autoregressive language models are constrained by their inherently sequential nature, generating one token at a time. This paradigm limits inference speed and parallelism, especially during later stages of generation when the direction and…

From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs

Kumari Nishu, Sachin Mehta, Samira Abnar, Mehrdad Farajtabar, Maxwell Horton, et al. · 2025

Training large language models (LLMs) for different inference constraints is computationally expensive, limiting control over efficiency-accuracy trade-offs. Moreover, once trained, these models typically process tokens uniformly, regardle…

M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference

Nikhil Bhendawade, Mahyar Najibi, Devang Naik, I. I. Belousova · 2025

Residual transformations enhance the representational depth and expressive power of large language models (LLMs). However, applying static residual transformations across all tokens in auto-regressive generation leads to a suboptimal trade…

Knowledge Transfer For Efficient On-Device False Trigger Mitigation

Pranay Dighe, Erik Marchi, Srikanth Vishnubhotla, Sachin Kajarekar, Devang Naik · 2024

In this paper, we address the task of determining whether a given utterance is directed towards a voice-enabled smart-assistant device or not. An undirected utterance is termed a "false trigger" and false trigger mitigation (FTM) is ess…

An Efficient and Streaming Audio Visual Active Speaker Detection System

Arnav Kundu, Yanzi Jin, Mohammad Hossein Sekhavat, Max Horton, Danny Tormoen, et al. · 2024

This paper delves into the challenging task of Active Speaker Detection (ASD), where the system needs to determine in real-time whether a person is speaking or not in a series of video frames. While previous works have made significant str…

KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation

Minsik Cho, Mohammad Rastegari, Devang Naik · 2024

Large Language Model or LLM inference has two phases, the prompt (or prefill) phase to output the first token and the extension (or decoding) phase to generate the subsequent tokens. In this work, we propose an efficient parallelization sc…

Weight subcloning: direct initialization of transformers using larger pretrained ones

Mohammad Samragh, Mehrdad Farajtabar, Sachin Mehta, Raviteja Vemulapalli, Fartash Faghri, et al. · 2023

Training large transformer models from scratch for a target task requires lots of data and is computationally demanding. The usual practice of transfer learning overcomes this challenge by initializing the model with weights of a pretraine…

eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models

Minsik Cho, Keivan Alizadeh Vahid, Qichen Fu, Saurabh Adya, Carlo C. Del Mundo, et al. · 2023

Since Large Language Models or LLMs have demonstrated high-quality performance on many complex language tasks, there is a great interest in bringing these LLMs to mobile devices for faster responses and better privacy protection. However, …

Optimize What Matters: Training DNN-HMM Keyword Spotting Model Using End Metric

Ashish Shrivastava, Arnav Kundu, Chandra Dhir, Devang Naik, Oncel Tuzel · 2021

Deep Neural Network-Hidden Markov Model (DNN-HMM) based methods have been successfully used for many always-on keyword spotting algorithms that detect a wake word to trigger a device. The DNN predicts the state probabilities of a given sp…

On The Role of Visual Cues in Audiovisual Speech Enhancement

Zakaria Aldeneh, Anushree Prasanna Kumar, Barry-John Theobald, Erik Marchi, Sachin Kajarekar, et al. · 2021

We present an introspection of an audiovisual speech enhancement model. In particular, we focus on interpreting how a neural audiovisual speech enhancement model uses visual cues to improve the quality of the target speech signal. We show …

Complementary Language Model and Parallel Bi-LRNN for False Trigger Mitigation

Rishika Agarwal, Xiaochuan Niu, Pranay Dighe, Srikanth Vishnubhotla, Sameer Badaskar, et al. · 2020

False triggers in voice assistants are unintended invocations of the assistant, which not only degrade the user experience but may also compromise privacy. False trigger mitigation (FTM) is a process to detect the false trigger events and …

Self-supervised Learning of Visual Speech Features with Audiovisual Speech Enhancement

Zakaria Aldeneh, Anushree Prasanna Kumar, Barry-John Theobald, Erik Marchi, Sachin Kajarekar, et al. · 2020

We present an introspection of an audiovisual speech enhancement model. In particular, we focus on interpreting how a neural audiovisual speech enhancement model uses visual cues to improve the quality of the target speech signal. We show …

Multi-Task Learning for Speaker Verification and Voice Trigger Detection

Siddharth Sigtia, Erik Marchi, Sachin Kajarekar, Devang Naik, John S. Bridle · 2020

Automatic speech transcription and speaker recognition are usually treated as separate tasks even though they are interdependent. In this study, we investigate training a single network to perform both tasks jointly. We train the networ…

Lattice-Based Improvements for Voice Triggering Using Graph Neural Networks

Pranay Dighe, Saurabh Adya, Nuoyu Li, Srikanth Vishnubhotla, Devang Naik, et al. · 2020

Voice-triggered smart assistants often rely on detection of a trigger-phrase before they start listening for the user request. Mitigation of false triggers is an important aspect of building a privacy-centric non-intrusive smart assistant.…

Detecting Emotion Primitives from Speech and their use in discerning Categorical Emotions

Vasudha Kowtha, Vikramjit Mitra, Chris Bartels, Erik Marchi, Sue Booker, et al. · 2020

Emotion plays an essential role in human-to-human communication, enabling us to convey feelings such as happiness, frustration, and sincerity. While modern speech technologies rely heavily on speech recognition and natural language underst…

Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice

Vikramjit Mitra, Sue Booker, Erik Marchi, David Scott Farrar, Ute Dorothea Peitz, et al. · 2019

Millions of people reach out to digital assistants such as Siri every day, asking for information, making phone calls, seeking assistance, and much more. The expectation is that such assistants should understand the intent of the users que…

Devang Naik