Explanipedia

SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models Open

Raghuveer Peri, Sai Muralidhar Jayanthi, Srikanth Ronanki, Anshu Bhatia, Karel Mundnich, et al. · 2024

Computer science Chemistry

Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have gained popularity lately. However, the safety and robustness of these models remains largely unclear. In this …

VoxWatch: An open-set speaker recognition benchmark on VoxCeleb Open

Raghuveer Peri, Seyed Omid Sadjadi, Daniel Garcia-Romero · 2023

Computer science Engineering Sociology

Despite its broad practical applications such as in fraud prevention, open-set speaker identification (OSI) has received less attention in the speaker recognition community compared to speaker verification (SV). OSI deals with determining …

Mel frequency spectral domain defenses against adversarial attacks on speech recognition systems Open

Nicholas Mehlman, Anirudh Sreeram, Raghuveer Peri, Shrikanth Narayanan · 2023

Computer science Mathematics Philosophy

Automatic speech recognition (ASR) systems are vulnerable to adversarial attacks due to their reliance on machine learning models. Many of the defenses explored for defending ASR systems simply adapt defense approaches developed for the im…

User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition on Federated Learning Open

Tiantian Feng, Raghuveer Peri, Shrikanth Narayanan · 2022

Computer science Philosophy

Many existing privacy-enhanced speech emotion recognition (SER) frameworks focus on perturbing the original speech data through adversarial training within a centralized machine learning setup. However, this privacy protection scheme can f…

The Silent Treatment? Changes in patient emotional expression after silence Open

Christina S. Soma, Bruce E. Wampold, Nikolaos Flemotomos, Raghuveer Peri, Shrikanth Narayanan, et al. · 2022

Psychology Computer science Philosophy

Psychotherapy can be an emotionally laden conversation, where both verbal and nonverbal interventions may impact the therapeutic process. Prior research has postulated mixed results regarding how clients emotionally react following a silen…

User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition in Federated Learning Open

Tiantian Feng, Raghuveer Peri, Shrikanth Narayanan · 2022

Computer science

Many existing privacy-enhanced speech emotion recognition (SER) frameworks focus on perturbing the original speech data through adversarial training within a centralized machine learning setup. However, this privacy protection scheme can f…

Mel Frequency Spectral Domain Defenses against Adversarial Attacks on Speech Recognition Systems Open

Nicholas Mehlman, Anirudh Sreeram, Raghuveer Peri, Shrikanth Narayanan · 2022

Computer science Chemistry

A variety of recent works have looked into defenses for deep neural networks against adversarial attacks particularly within the image processing domain. Speech processing applications such as automatic speech recognition (ASR) are increas…

To train or not to train adversarially: A study of bias mitigation strategies for speaker recognition Open

Raghuveer Peri, Krishna Somandepalli, Shrikanth Narayanan · 2022

Computer science Economics

Speaker recognition is increasingly used in several everyday applications including smart speakers, customer care centers and other speech-driven analytics. It is crucial to accurately evaluate and mitigate biases present in machine learni…

Perceptual-based deep-learning denoiser as a defense against adversarial attacks on ASR systems Open

Anirudh Sreeram, Nicholas Mehlman, Raghuveer Peri, Dillon Knox, Shrikanth Narayanan · 2021

Computer science Biology Chemistry

In this paper we investigate speech denoising as a defense against adversarial attacks on automatic speech recognition (ASR) systems. Adversarial attacks attempt to force misclassification by adding small perturbations to the original spee…

Disentanglement for Audio-Visual Emotion Recognition Using Multitask Setup Open

Raghuveer Peri, Srinivas Parthasarathy, Charles R. Bradshaw, Shiva Sundaram · 2021

Computer science Economics

Deep learning models trained on audio-visual data have been successfully used to achieve state-of-the-art performance for emotion recognition. In particular, models trained with multitask learning have shown additional performance improvem…

Adversarial Defense for Deep Speaker Recognition Using Hybrid Adversarial Training Open

Monisankha Pal, Arindam Jati, Raghuveer Peri, Chin-Cheng Hsu, Wael AbdAlmageed, et al. · 2021

Computer science Chemistry

Deep neural network based speaker recognition systems can easily be deceived by an adversary using minuscule imperceptible perturbations to the input speech samples. These adversarial attacks pose serious security threats to the speaker re…

"Am I A Good Therapist?" Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies. Open

Nikolaos Flemotomos, Víctor Martínez, Zhuohao Chen, Karan Singla, Victor Ardulov, et al. · 2021

Psychology Computer science Medicine

With the growing prevalence of psychological interventions, it is vital to have measures which rate the effectiveness of psychological care, in order to assist in training, supervision, and quality assurance of services. Traditionally, qua…

Disentanglement for audio-visual emotion recognition using multitask setup Open

Raghuveer Peri, Srinivas Parthasarathy, Charles R. Bradshaw, Shiva Sundaram · 2021

Computer science Economics

Deep learning models trained on audio-visual data have been successfully used to achieve state-of-the-art performance for emotion recognition. In particular, models trained with multitask learning have shown additional performance impro…

Meta-Learning With Latent Space Clustering in Generative Adversarial Network for Speaker Diarization Open

Monisankha Pal, Manoj Kumar, Raghuveer Peri, Tae Jin Park, So Hyun Kim, et al. · 2021

Computer science Chemistry

The performance of most speaker diarization systems with x-vector embeddings is both vulnerable to noisy environments and lacks domain robustness. Earlier work on speaker diarization using generative adversarial network (GAN) with an encod…

Meta-learning with Latent Space Clustering in Generative Adversarial Network for Speaker Diarization Open

Monisankha Pal, M. Kumar, Raghuveer Peri, Tae Jin Park, So Hyun Kim, et al. · 2020

Computer science Chemistry

The performance of most speaker diarization systems with x-vector embeddings is both vulnerable to noisy environments and lacks domain robustness. Earlier work on speaker diarization using generative adversarial network (GAN) with an en…

An Empirical Analysis of Information Encoded in Disentangled Neural Speaker Representations Open

Raghuveer Peri, Haoqi Li, Krishna Somandepalli, Arindam Jati, Shrikanth Narayanan · 2020

Computer science Chemistry Physics

The primary characteristic of robust speaker representations is that they are invariant to factors of variability not related to speaker identity. Disentanglement of speaker representations is one of the techniques used to improve robustne…

Robust Speaker Recognition Using Unsupervised Adversarial Invariance Open

Raghuveer Peri, Monisankha Pal, Arindam Jati, Krishna Somandepalli, Shrikanth Narayanan · 2020

Computer science Sociology Chemistry

In this paper, we address the problem of speaker recognition in challenging acoustic conditions using a novel method to extract robust speaker-discriminative speech representations. We adopt a recently proposed unsupervised adversarial inv…

Speaker Diarization Using Latent Space Clustering in Generative Adversarial Network Open

Monisankha Pal, Manoj Kumar, Raghuveer Peri, Tae Jin Park, So Hyun Kim, et al. · 2020

Computer science Geography

In this work, we propose deep latent space clustering for speaker diarization using generative adversarial network (GAN) backprojection with the help of an encoder network. The proposed diarization system is trained jointly with GAN loss, …

An empirical analysis of information encoded in disentangled neural speaker representations Open

Raghuveer Peri, Haoqi Li, Krishna Somandepalli, Arindam Jati, Shrikanth Narayanan · 2020

Computer science Mathematics Chemistry

The primary characteristic of robust speaker representations is that they are invariant to factors of variability not related to speaker identity. Disentanglement of speaker representations is one of the techniques used to improve robus…

A study of semi-supervised speaker diarization system using gan mixture model Open

Monisankha Pal, Manoj Kumar, Raghuveer Peri, Shrikanth Narayanan · 2019

Computer science Geology

We propose a new speaker diarization system based on a recently introduced unsupervised clustering technique namely, generative adversarial network mixture model (GANMM). The proposed system uses x-vectors as front-end representation. Spec…

Raghuveer Peri