Raghuveer Peri
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models
Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have recently gained popularity. However, the safety and robustness of these models remain largely unclear. In this …
VoxWatch: An open-set speaker recognition benchmark on VoxCeleb
Despite its broad practical applications, such as fraud prevention, open-set speaker identification (OSI) has received less attention in the speaker recognition community than speaker verification (SV). OSI deals with determining …
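As a rough, generic illustration of open-set identification as it is commonly formulated (not the VoxWatch benchmark's actual scoring protocol), the sketch below compares a test speaker embedding against enrolled speaker embeddings using cosine similarity and rejects the trial as an unknown speaker when the best score falls below a threshold. The function names, the toy 3-dimensional embeddings, and the threshold of 0.7 are illustrative assumptions.

```python
import numpy as np


def cosine_scores(test_emb: np.ndarray, enrolled: np.ndarray) -> np.ndarray:
    """Cosine similarity between one test embedding and each enrolled embedding (one per row)."""
    test = test_emb / np.linalg.norm(test_emb)
    enr = enrolled / np.linalg.norm(enrolled, axis=1, keepdims=True)
    return enr @ test


def open_set_identify(test_emb, enrolled, speaker_ids, threshold):
    """Return the best-matching enrolled speaker ID, or None when the best
    score is below the threshold (the speaker is treated as out-of-set)."""
    scores = cosine_scores(test_emb, enrolled)
    best = int(np.argmax(scores))
    if scores[best] < threshold:
        return None  # open-set rejection: unknown speaker
    return speaker_ids[best]


if __name__ == "__main__":
    # Hypothetical toy embeddings for two enrolled speakers.
    enrolled = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    ids = ["spk_a", "spk_b"]
    print(open_set_identify(np.array([0.9, 0.1, 0.0]), enrolled, ids, threshold=0.7))  # spk_a
    print(open_set_identify(np.array([0.0, 0.0, 1.0]), enrolled, ids, threshold=0.7))  # None
```

Unlike speaker verification, which tests a single claimed identity, this rule has to both choose among all enrolled speakers and decide whether the speaker is enrolled at all, so the threshold trades false acceptance of unknown speakers against false rejection of enrolled ones.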
Mel frequency spectral domain defenses against adversarial attacks on speech recognition systems
Automatic speech recognition (ASR) systems are vulnerable to adversarial attacks due to their reliance on machine learning models. Many of the defenses explored for defending ASR systems simply adapt defense approaches developed for the im…
User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition on Federated Learning
Many existing privacy-enhanced speech emotion recognition (SER) frameworks focus on perturbing the original speech data through adversarial training within a centralized machine learning setup. However, this privacy protection scheme can f…
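For readers unfamiliar with user-level differential privacy in federated learning, the following is a minimal sketch of a generic DP-FedAvg-style aggregation step, assuming each client sends one model update per round: every client's update is clipped to a maximum L2 norm, so no single user can shift the aggregate by more than a bounded amount, and Gaussian noise scaled to that bound is added to the average. This is not necessarily the mechanism used in the paper; `clip_norm`, `noise_multiplier`, and the function names are hypothetical, and privacy accounting (epsilon, delta) is omitted.

```python
import numpy as np


def clip_update(update: np.ndarray, clip_norm: float) -> np.ndarray:
    """Scale one client's update so its L2 norm is at most clip_norm.
    Bounding each user's whole contribution is what makes the guarantee user-level."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))


def dp_federated_average(client_updates, clip_norm, noise_multiplier, rng):
    """Average clipped client updates and add Gaussian noise whose standard
    deviation is noise_multiplier * (per-user sensitivity clip_norm / n).
    Privacy accounting across rounds is intentionally left out of this sketch."""
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    n = len(clipped)
    mean = np.mean(clipped, axis=0)
    noise_std = noise_multiplier * clip_norm / n
    return mean + rng.normal(0.0, noise_std, size=mean.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical per-client updates: the first exceeds the clip norm and gets scaled down.
    updates = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]
    print(dp_federated_average(updates, clip_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Setting `noise_multiplier=0.0` recovers plain clipped averaging, which is a convenient way to check the clipping logic before noise is introduced.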
The Silent Treatment? Changes in patient emotional expression after silence
Psychotherapy can be an emotionally laden conversation, where both verbal and nonverbal interventions may impact the therapeutic process. Prior research has reported mixed results regarding how clients emotionally react following a silen…
User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition in Federated Learning
Many existing privacy-enhanced speech emotion recognition (SER) frameworks focus on perturbing the original speech data through adversarial training within a centralized machine learning setup. However, this privacy protection scheme can f…
Mel Frequency Spectral Domain Defenses against Adversarial Attacks on Speech Recognition Systems
A variety of recent works have looked into defenses for deep neural networks against adversarial attacks particularly within the image processing domain. Speech processing applications such as automatic speech recognition (ASR) are increas…
To train or not to train adversarially: A study of bias mitigation strategies for speaker recognition
Speaker recognition is increasingly used in several everyday applications including smart speakers, customer care centers and other speech-driven analytics. It is crucial to accurately evaluate and mitigate biases present in machine learni…
Perceptual-based deep-learning denoiser as a defense against adversarial attacks on ASR systems
In this paper we investigate speech denoising as a defense against adversarial attacks on automatic speech recognition (ASR) systems. Adversarial attacks attempt to force misclassification by adding small perturbations to the original spee…
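The idea of forcing a misclassification with a small perturbation can be sketched with the fast gradient sign method (FGSM), a standard attack chosen here purely for illustration; it is not necessarily the attack studied in the paper, and the denoiser defense itself is not shown. The "classifier" is a hypothetical logistic-regression model over a four-sample toy signal rather than an ASR system, and `epsilon` bounds how far any single sample may move (an L-infinity constraint).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm_perturb(x, y, w, b, epsilon):
    """One-step FGSM against a toy logistic-regression classifier.

    With p = sigmoid(w.x + b) and binary cross-entropy loss for label y,
    the gradient of the loss w.r.t. the input is (p - y) * w. FGSM shifts
    every sample by +/- epsilon in the direction that increases the loss,
    so the perturbation's L-infinity norm is at most epsilon."""
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - y) * w
    return x + epsilon * np.sign(grad_x)


if __name__ == "__main__":
    w = np.array([0.5, -0.5, 0.5, -0.5])  # hypothetical toy weights
    b = 0.0
    x = np.array([0.1, -0.1, 0.1, -0.1])  # toy "speech" samples with true label 1
    x_adv = fgsm_perturb(x, y=1, w=w, b=b, epsilon=0.15)
    print(sigmoid(w @ x + b) > 0.5)      # True: clean input classified as label 1
    print(sigmoid(w @ x_adv + b) > 0.5)  # False: perturbed input flips to label 0
```

Here the perturbed signal is [-0.05, 0.05, -0.05, 0.05]: no sample moves by more than 0.15, yet the score drops from sigmoid(0.2) to sigmoid(-0.1), which is the kind of small input change a denoising front end would aim to undo.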
Disentanglement for Audio-Visual Emotion Recognition Using Multitask Setup
Deep learning models trained on audio-visual data have been successfully used to achieve state-of-the-art performance for emotion recognition. In particular, models trained with multitask learning have shown additional performance improvem…
Adversarial Defense for Deep Speaker Recognition Using Hybrid Adversarial Training
Deep neural network based speaker recognition systems can easily be deceived by an adversary using minuscule imperceptible perturbations to the input speech samples. These adversarial attacks pose serious security threats to the speaker re…
"Am I A Good Therapist?" Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies
With the growing prevalence of psychological interventions, it is vital to have measures which rate the effectiveness of psychological care, in order to assist in training, supervision, and quality assurance of services. Traditionally, qua…
Meta-Learning With Latent Space Clustering in Generative Adversarial Network for Speaker Diarization
The performance of most speaker diarization systems with x-vector embeddings is vulnerable to noisy environments and lacks domain robustness. Earlier work on speaker diarization using a generative adversarial network (GAN) with an encod…
An Empirical Analysis of Information Encoded in Disentangled Neural Speaker Representations
The primary characteristic of robust speaker representations is that they are invariant to factors of variability not related to speaker identity. Disentanglement of speaker representations is one of the techniques used to improve robustne…
Robust Speaker Recognition Using Unsupervised Adversarial Invariance
In this paper, we address the problem of speaker recognition in challenging acoustic conditions using a novel method to extract robust speaker-discriminative speech representations. We adopt a recently proposed unsupervised adversarial inv…
Speaker Diarization Using Latent Space Clustering in Generative Adversarial Network
In this work, we propose deep latent space clustering for speaker diarization using generative adversarial network (GAN) backprojection with the help of an encoder network. The proposed diarization system is trained jointly with GAN loss, …
A study of semi-supervised speaker diarization system using GAN mixture model
We propose a new speaker diarization system based on a recently introduced unsupervised clustering technique, namely the generative adversarial network mixture model (GANMM). The proposed system uses x-vectors as the front-end representation. Spec…