Alfonso Ortega
Sparse Autoencoders Make Audio Foundation Models more Explainable
Audio pretrained models are widely employed to solve various tasks in speech processing, sound event detection, or music information retrieval. However, the representations learned by these models are unclear, and their analysis mainly res…
There Was Never a Bottleneck in Concept Bottleneck Models
Deep learning representations are often difficult to interpret, which can hinder their deployment in sensitive applications. Concept Bottleneck Models (CBMs) have emerged as a promising approach to mitigate this issue by learning represent…
Beyond Global Metrics: A Fairness Analysis for Interpretable Voice Disorder Detection Systems
We conducted a comprehensive analysis of an Automatic Voice Disorders Detection (AVDD) system using existing voice disorder datasets with available demographic metadata. The study involved analysing system performance across various demogr…
Tunability of a microchip Yb:KGW solid-state laser self-injected with different wavelength-selective external cavities
This work investigates tunable emission in a continuous-wave Yb:KGW microchip-type solid-state laser utilizing an external cavity. While microchip lasers offer advantages like compactness and simplicity, achieving broad tunability within t…
La potestad sancionadora de la Agencia Española de Protección de Datos en materia de transferencias internacionales de datos de carácter personal
The design of an adequate, effective and efficient regulatory framework aimed at protecting the holder of the right to data protection in an international transfer of data constitutes a real challenge. This is a matter that is n…
Angular Distance Distribution Loss for Audio Classification
Classification is a pivotal task in deep learning not only because of its intrinsic importance, but also for providing embeddings with desirable properties in other tasks. To optimize these properties, a wide variety of loss functions have…
Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges
Nowadays, the large amount of audio-visual content available has fostered the need to develop new robust automatic speaker diarization systems to analyse and characterise it. This kind of system helps to reduce the cost of doing this proce…
Rethinking Disentanglement under Dependent Factors of Variation
Representation learning is an approach for discovering and extracting the factors of variation from the data. Intuitively, a representation is said to be disentangled if it separates the different factors of variation in a way that is…
Predefined Prototypes for Intra-Class Separation and Disentanglement
Prototypical Learning is based on the idea that there is a point (which we call prototype) around which the embeddings of a class are clustered. It has shown promising results in scenarios with little labeled data or to design explainable …
Explainable by-design Audio Segmentation through Non-Negative Matrix Factorization and Probing
Audio segmentation is a key task for many speech technologies, most of which are based on neural networks, usually considered as black boxes, with high-level performances. However, in many domains, among which health or forensics, there is…
Automatic Voice Disorder Detection from a Practical Perspective
Unsupervised multiple domain translation through controlled Disentanglement in variational autoencoder
Unsupervised Multiple Domain Translation through Controlled Disentanglement in Variational Autoencoder
Unsupervised Multiple Domain Translation is the task of transforming data from one domain to other domains without having paired data to train the systems. Typically, methods based on Generative Adversarial Networks (GANs) are used to addr…
An Explainable Proxy Model for Multilabel Audio Segmentation
Audio signal segmentation is a key task for automatic audio indexing. It consists of detecting the boundaries of class-homogeneous segments in the signal. In many applications, explainable AI is a vital process for transparency of decision…
Implementation of a High-Frequency Phosphor Thermometry Technique to Study the Heat Transfer of a Single Droplet Impingement
Tunable emission of an external-cavity Yb:KGW monolithic solid-state laser
Tunable microchip/monolithic lasers are expected to show promising applications. In this work, external cavity techniques are applied to a Yb:KGW monolithic laser, achieving 35 nm of tuning by balancing the Nm and Np emission polarizations.
Cross-Corpus Training Strategy for Speech Emotion Recognition Using Self-Supervised Representations
Speech Emotion Recognition (SER) plays a crucial role in applications involving human-machine interaction. However, the scarcity of suitable emotional speech datasets presents a major challenge for accurate SER systems. Deep Neural Network…
An Overview of the IberSpeech-RTVE 2022 Challenges on Speech Technologies
Evaluation campaigns provide a common framework with which the progress of speech technologies can be effectively measured. The aim of this paper is to present a detailed overview of the IberSpeech-RTVE 2022 Challenges, which were organize…
Deep Learning for chaos detection
In this article, we study how a chaos detection problem can be solved using Deep Learning techniques. We consider two classical test examples: the Logistic map as a discrete dynamical system and the Lorenz system as a continuous dynamical …
Improved Vocal Effort Transfer Vector Estimation for Vocal Effort-Robust Speaker Verification
Despite the maturity of modern speaker verification technology, its performance still significantly degrades when facing non-neutrally-phonated (e.g., shouted and whispered) speech. To address this issue, in this paper, we propose a new sp…
On the Problem of Data Availability in Automatic Voice Disorder Detection
Automatic Voice Disorder Detection Using Self-Supervised Representations
Many speech features and models, including Deep Neural Networks (DNN), are used for classification tasks between healthy and pathological speech with the Saarbruecken Voice Database (SVD). However, accuracy values of 80.71% for phrases or …
Class token and knowledge distillation for multi-head self-attention speaker verification systems
This paper explores three novel approaches to improve the performance of speaker verification (SV) systems based on deep neural networks (DNN) using Multi-head Self-Attention (MSA) mechanisms and memory layers. Firstly, we propose the use …
A Study on the Use of wav2vec Representations for Multiclass Audio Segmentation
Cross-Corpus Speech Emotion Recognition with HuBERT Self-Supervised Representation
I4U System Description for NIST SRE'20 CTS Challenge
This manuscript describes the I4U submission to the 2020 NIST Speaker Recognition Evaluation (SRE'20) Conversational Telephone Speech (CTS) Challenge. The I4U submission resulted from active collaboration among researchers across eig…
Wiener Filter and Deep Neural Networks: A Well-Balanced Pair for Speech Enhancement
This paper proposes a Deep Learning (DL) based Wiener filter estimator for speech enhancement in the framework of the classical spectral-domain speech estimator algorithm. According to the characteristics of the intermediate steps of the s…
Unsupervised Anomaly Detection Applied to Φ-OTDR
Distributed acoustic sensors (DASs) based on direct-detection Φ-OTDR use the light–matter interaction between light pulses and optical fiber to detect mechanical events in the fiber environment. The signals received in Φ-OTDR come from the…
Detección automática de emociones a partir de la voz combinando bases de datos para aumentar el entrenamiento
The voice is the most natural means of communication for humans, conveying both linguistic information and the speaker's emotional state. With the aim of increasing the accuracy of emotion recognition systems…
RODRÍGUEZ PINEAU, E. y TORRALBA MENDIOLA, E. (Dirs.), La protección de las transmisiones de datos transfronterizas, Thomson-Reuters-Aranzadi, Cizur Menor, 2022, 412 pp.
Review by Alfonso Ortega Giménez. RODRÍGUEZ PINEAU, E. y TORRALBA MENDIOLA, E. (Dirs.), La protección de las transmisiones de datos transfronterizas, Thomson-Reuters-Aranzadi, Cizur Menor, 2022, 412 pp.