Victoria Mingote
Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges Open
Nowadays, the large amount of audio-visual content available has fostered the need to develop new robust automatic speaker diarization systems to analyse and characterise it. This kind of system helps to reduce the cost of doing this proce…
Improved Cross-Lingual Transfer Learning For Automatic Speech Translation Open
Research in multilingual speech-to-text translation is topical. Having a single model that supports multiple translation tasks is desirable. The goal of this work is to improve cross-lingual transfer learning in multilingual speech-to-text…
Direct Text to Speech Translation System Using Acoustic Units Open
This paper proposes a direct text to speech translation system using discrete acoustic units. This framework employs text in different source languages as input to generate speech in the target language without the need for text transcr…
Class token and knowledge distillation for multi-head self-attention speaker verification systems Open
This paper explores three novel approaches to improve the performance of speaker verification (SV) systems based on deep neural networks (DNN) using Multi-head Self-Attention (MSA) mechanisms and memory layers. Firstly, we propose the use …
Representation and Metric Learning Advances for Deep Neural Network Face and Speaker Biometric Systems Open
The growing use of technological devices and biometric recognition systems in people's everyday lives has motivated great interest in the research and development of effective and robust systems. However, …
Multimodal Diarization Systems by Training Enrollment Models as Identity Representations Open
This paper describes a post-evaluation analysis of the system developed by ViVoLAB research group for the IberSPEECH-RTVE 2020 Multimodal Diarization (MD) Challenge. This challenge focuses on the study of multimodal systems for the diariza…
aDCF Loss Function for Deep Metric Learning in End-to-End Text-Dependent Speaker Verification Systems Open
Metric learning approaches have widely expanded to the training of Speaker Verification (SV) systems based on Deep Neural Networks (DNNs), by using a loss function more consistent with the evaluation process than the traditional identifica…
Generalizing AUC Optimization to Multiclass Classification for Audio Segmentation With Limited Training Data Open
Area under the ROC curve (AUC) optimisation techniques developed for neural networks have recently demonstrated their capabilities in different audio and speech related tasks. However, due to its intrinsic nature, AUC optimisation has focu…
Log-Likelihood-Ratio Cost Function as Objective Loss for Speaker Verification Systems Open
Optimization of the area under the ROC curve using neural network supervectors for text-dependent speaker verification Open
Supervector Extraction for Encoding Speaker and Phrase Information with Neural Networks for Text-Dependent Speaker Verification Open
In this paper, we propose a new differentiable neural network with an alignment mechanism for text-dependent speaker verification. Unlike previous works, we do not extract the embedding of an utterance from the global average pooling of th…
Optimization of the Area Under the ROC Curve using Neural Network Supervectors for Text-Dependent Speaker Verification Open
This paper explores two techniques to improve the performance of text-dependent speaker verification systems based on deep neural networks. Firstly, we propose a general alignment mechanism to keep the temporal structure of each phrase …
Differentiable Supervector Extraction for Encoding Speaker and Phrase Information in Text Dependent Speaker Verification Open
In this paper, we propose a new differentiable neural network alignment mechanism for text-dependent speaker verification which uses alignment models to produce a supervector representation of an utterance. Unlike previous works with simil…