Explanipedia

Device-Directed Speech Detection for Follow-up Conversations Using Large Language Models

Ognjen Rudovic, Pranay Dighe, Yi Su, Vineet Garg, et al. · 2024

Follow-up conversations with virtual assistants (VAs) enable a user to seamlessly interact with a VA without the need to repeatedly invoke it using a keyword (after the first query). Therefore, accurate Device-directed Speech Detection (DD…

Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection

Shruti Palaskar, Oggi Rudovic, Sameer Dharur, Florian Pesce, Gautam Krishna, et al. · 2024

Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-train…

Comparative Analysis of Personalized Voice Activity Detection Systems: Assessing Real-World Effectiveness

Satyam Kumar, Sai Srujana Buddi, Utkarsh Sarawgi, Vineet Garg, Sumit Ranjan, et al. · 2024

Voice activity detection (VAD) is a critical component in various applications such as speech recognition, speech enhancement, and hands-free communication systems. With the increasing demand for personalized and context-aware technologies…

IoT-Powered Hydroponics System: A Real-Time Monitoring and Control System

M. S. Shalini, Saurabh Adya, S Bhavana, H. R. Neha, Sonu Shivani · 2024

The research investigates the integration of IoT technology in hydroponics with the aim of enhancing sustainable agriculture through real-time monitoring and automated control. The IoT-powered hydroponics system utilizes sensor networks to…

Modality Dropout for Multimodal Device Directed Speech Detection using Verbal and Non-Verbal Features

Gautam Krishna, Sameer Dharur, Oggi Rudovic, Pranay Dighe, Saurabh Adya, et al. · 2023

Device-directed speech detection (DDSD) is the binary classification task of distinguishing between queries directed at a voice assistant versus side conversation or background speech. State-of-the-art DDSD systems use verbal cues, e.g. aco…

Streaming Anchor Loss: Augmenting Supervision with Temporal Significance

Utkarsh Sarawgi, John Berkowitz, Vineet Garg, Arnav Kundu, Minsik Cho, et al. · 2023

Streaming neural network models for fast frame-wise responses to various speech and sensory signals are widely adopted on resource-constrained platforms. Hence, increasing the learning capacity of such streaming models (i.e., by adding mor…

eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models

Minsik Cho, Keivan Alizadeh Vahid, Qichen Fu, Saurabh Adya, Carlo C. Del Mundo, et al. · 2023

Since Large Language Models or LLMs have demonstrated high-quality performance on many complex language tasks, there is a great interest in bringing these LLMs to mobile devices for faster responses and better privacy protection. However, …

Efficient Multimodal Neural Networks for Trigger-less Voice Assistants

Sai Srujana Buddi, Utkarsh Sarawgi, Tashweena Heeramun, Karan Sawnhey, Ed Yanosik, et al. · 2023

The adoption of multimodal interactions by Voice Assistants (VAs) is growing rapidly to enhance human-computer interactions. Smartwatches have now incorporated trigger-less methods of invoking VAs, such as Raise To Speak (RTS), where the u…

PDP: Parameter-free Differentiable Pruning is All You Need

Minsik Cho, Saurabh Adya, Devang Naik · 2023

DNN pruning is a popular way to reduce the size of a model, improve the inference latency, and minimize the power consumption on DNN accelerators. However, existing approaches might be too complex, expensive or ineffective to apply to a va…

R2 Loss: Range Restriction Loss for Model Compression and Quantization

Arnav Kundu, Chungkuk Yoo, Srijan Mishra, Minsik Cho, Saurabh Adya · 2023

Model quantization and compression are widely used techniques to reduce the use of computing resources at inference time. While state-of-the-art works have achieved reasonable accuracy at higher bit widths such as 4-bit or 8-bit, it is still…

Improving Voice Trigger Detection with Metric Learning

Prateeth Nayak, Takuya Higuchi, Anmol Gupta, Shivesh Ranjan, Stephen Shum, et al. · 2022

Voice trigger detection is an important task, which enables activating a voice assistant when a target user speaks a keyword phrase. A detector is typically trained on speech data independent of speaker information and used for the voice t…

Device-Directed Speech Detection: Regularization via Distillation for Weakly-Supervised Models

Vineet Garg, Ognjen Rudovic, Pranay Dighe, Ahmed Hussen Abdelaziz, Erik Marchi, et al. · 2022

We address the problem of detecting speech directed to a device that does not contain a specific wake-word. Specifically, we focus on audio coming from a touch-based invocation. Mitigating virtual assistants (VAs) activation due to acciden…

DKM: Differentiable K-Means Clustering Layer for Neural Network Compression

Minsik Cho, Keivan Alizadeh-Vahid, Saurabh Adya, Mohammad Rastegari · 2021

Deep neural network (DNN) model compression for efficient on-device inference is becoming increasingly important to reduce memory requirements and keep user data on-device. To this end, we propose a novel differentiable k-means clustering …

Streaming Transformer for Hardware Efficient Voice Trigger Detection and False Trigger Mitigation

Vineet Garg, Wonil Chang, Siddharth Sigtia, Saurabh Adya, Pramod Simha, et al. · 2021

We present a unified and hardware efficient architecture for two stage voice trigger detection (VTD) and false trigger mitigation (FTM) tasks. Two stage VTD systems of voice assistants can get falsely activated to audio segments acoustical…

Hybrid Transformer/CTC Networks for Hardware Efficient Voice Triggering

Saurabh Adya, Vineet Garg, Siddharth Sigtia, Pramod Simha, Chandra Dhir · 2020

We consider the design of two-pass voice trigger detection systems. We focus on the networks in the second pass that are used to re-score candidate segments obtained from the first pass. Our baseline is an acoustic model (AM), with BiLSTM l…

Lattice-Based Improvements for Voice Triggering Using Graph Neural Networks

Pranay Dighe, Saurabh Adya, Nuoyu Li, Srikanth Vishnubhotla, Devang Naik, et al. · 2020

Voice-triggered smart assistants often rely on detection of a trigger-phrase before they start listening for the user request. Mitigation of false triggers is an important aspect of building a privacy-centric non-intrusive smart assistant.…

Nonlinear Conjugate Gradients For Scaling Synchronous Distributed DNN Training

Saurabh Adya, Vinay Palakkode, Oncel Tuzel · 2018

Nonlinear conjugate gradient (NLCG) based optimizers have shown superior loss convergence properties compared to gradient descent based optimizers for traditional optimization problems. However, in Deep Neural Network (DNN) training, the d…

Democratizing Production-Scale Distributed Deep Learning

Minghuang Ma, Hadi Pouransari, Daniel L. Chao, Saurabh Adya, Santiago Akle Serrano, et al. · 2018

The interest and demand for training deep neural networks have been experiencing rapid growth, spanning a wide range of applications in both academia and industry. However, training them distributed and at scale remains difficult due to th…

Saurabh Adya