Explanipedia

Sparse Autoencoders Make Audio Foundation Models more Explainable

Théo Mariotte, Martin Lebourdais, Antonio Almudévar, Marie Tahon, Alfonso Ortega, et al. · 2025

Audio pretrained models are widely employed to solve various tasks in speech processing, sound event detection, or music information retrieval. However, the representations learned by these models are unclear, and their analysis mainly res…

There Was Never a Bottleneck in Concept Bottleneck Models

Antonio Almudévar, José Miguel Hernández-Lobato, Alfonso Ortega · 2025

Deep learning representations are often difficult to interpret, which can hinder their deployment in sensitive applications. Concept Bottleneck Models (CBMs) have emerged as a promising approach to mitigate this issue by learning represent…

Beyond Global Metrics: A Fairness Analysis for Interpretable Voice Disorder Detection Systems

Mariel Estévez, Cyntia Bonomi, Dayana Ribas, Alfonso Ortega, Luciana Ferrer · 2025

We conducted a comprehensive analysis of an Automatic Voice Disorders Detection (AVDD) system using existing voice disorder datasets with available demographic metadata. The study involved analysing system performance across various demogr…

Tunability of a microchip Yb:KGW solid-state laser self-injected with different wavelength-selective external cavities

Haroldo Maestre, Miguel Cuenca, Alfonso Ortega · 2025

This work investigates tunable emission in a continuous-wave Yb:KGW microchip-type solid-state laser utilizing an external cavity. While microchip lasers offer advantages like compactness and simplicity, achieving broad tunability within t…

La potestad sancionadora de la Agencia Española de Protección de Datos en materia de transferencias internacionales de datos de carácter personal

Alfonso Ortega · 2025

The design of adequate, effective and efficient regulation aimed at protecting the holder of the right to data protection in an international transfer of data constitutes a real challenge. This is a matter that is n…

Angular Distance Distribution Loss for Audio Classification

Antonio Almudévar, Romain Serizel, Alfonso Ortega · 2024

Classification is a pivotal task in deep learning not only because of its intrinsic importance, but also for providing embeddings with desirable properties in other tasks. To optimize these properties, a wide variety of loss functions have…

Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges

Victoria Mingote, Alfonso Ortega, Antonio Miguel, Eduardo Lleida · 2024

Nowadays, the large amount of audio-visual content available has fostered the need to develop new robust automatic speaker diarization systems to analyse and characterise it. This kind of system helps to reduce the cost of doing this proce…

Rethinking Disentanglement under Dependent Factors of Variation

Antonio Almudévar, Alfonso Ortega, Luis Vicente, Antonio Miguel, Eduardo Lleida · 2024

Representation learning is an approach that makes it possible to discover and extract the factors of variation from the data. Intuitively, a representation is said to be disentangled if it separates the different factors of variation in a way that is…

Predefined Prototypes for Intra-Class Separation and Disentanglement

Antonio Almudévar, Théo Mariotte, Alfonso Ortega, Marie Tahon, Luis Vicente, et al. · 2024

Prototypical Learning is based on the idea that there is a point (which we call prototype) around which the embeddings of a class are clustered. It has shown promising results in scenarios with little labeled data or to design explainable …

Explainable by-design Audio Segmentation through Non-Negative Matrix Factorization and Probing

Martin Lebourdais, Théo Mariotte, Antonio Almudévar, Marie Tahon, Alfonso Ortega · 2024

Audio segmentation is a key task for many speech technologies, most of which are based on neural networks, usually considered black boxes, with high performance. However, in many domains, such as health or forensics, there is…

Automatic Voice Disorder Detection from a Practical Perspective

J. Muñoz Vidal, Dayana Ribas, Cyntia Bonomi, Eduardo Lleida, Luciana Ferrer, et al. · 2024

Unsupervised Multiple Domain Translation through Controlled Disentanglement in Variational Autoencoder

Antonio Almudévar, Théo Mariotte, Alfonso Ortega, Marie Tahon · 2024

Unsupervised Multiple Domain Translation is the task of transforming data from one domain to other domains without having paired data to train the systems. Typically, methods based on Generative Adversarial Networks (GANs) are used to addr…

An Explainable Proxy Model for Multilabel Audio Segmentation

Théo Mariotte, Antonio Almudévar, Marie Tahon, Alfonso Ortega · 2024

Audio signal segmentation is a key task for automatic audio indexing. It consists of detecting the boundaries of class-homogeneous segments in the signal. In many applications, explainable AI is a vital process for transparency of decision…

Implementation of a High-Frequency Phosphor Thermometry Technique to Study the Heat Transfer of a Single Droplet Impingement

Víctor Alonso Martínez, Alfonso Ortega · 2024

Tunable emission of an external-cavity Yb:KGW monolithic solid-state laser

Miguel Cuenca, Alfonso Ortega, Haroldo Maestre · 2024

Tunable microchip/monolithic lasers are expected to show promising applications. In this work, external-cavity techniques are applied to a Yb:KGW monolithic laser, achieving 35 nm of tuning by balancing the Nm and Np emission polarizations.

Cross-Corpus Training Strategy for Speech Emotion Recognition Using Self-Supervised Representations

Miguel A. Pastor, Dayana Ribas, Alfonso Ortega, Antonio Miguel, Eduardo Lleida · 2023

Speech Emotion Recognition (SER) plays a crucial role in applications involving human-machine interaction. However, the scarcity of suitable emotional speech datasets presents a major challenge for accurate SER systems. Deep Neural Network…

An Overview of the IberSpeech-RTVE 2022 Challenges on Speech Technologies

Eduardo Lleida, Luis Javier Rodríguez-Fuentes, Javier Tejedor, Alfonso Ortega, Antonio Miguel, et al. · 2023

Evaluation campaigns provide a common framework with which the progress of speech technologies can be effectively measured. The aim of this paper is to present a detailed overview of the IberSpeech-RTVE 2022 Challenges, which were organize…

Deep Learning for chaos detection

Roberto Barrio, Álvaro Lozano Rojo, Ana Mayora-Cebollero, Carmen Mayora-Cebollero, Antonio Miguel, et al. · 2023

In this article, we study how a chaos detection problem can be solved using Deep Learning techniques. We consider two classical test examples: the Logistic map as a discrete dynamical system and the Lorenz system as a continuous dynamical …

Improved Vocal Effort Transfer Vector Estimation for Vocal Effort-Robust Speaker Verification

Iván López‐Espejo, Santi Prieto, Alfonso Ortega, Eduardo Lleida · 2023

Despite the maturity of modern speaker verification technology, its performance still significantly degrades when facing non-neutrally-phonated (e.g., shouted and whispered) speech. To address this issue, in this paper, we propose a new sp…

On the Problem of Data Availability in Automatic Voice Disorder Detection

Dayana Ribas, Antonio Miguel, Alfonso Ortega, Eduardo Lleida · 2023

Automatic Voice Disorder Detection Using Self-Supervised Representations

Dayana Ribas, Miguel A. Pastor, Antonio Miguel, David Martínez, Alfonso Ortega, et al. · 2023

Many speech features and models, including Deep Neural Networks (DNN), are used for classification tasks between healthy and pathological speech with the Saarbruecken Voice Database (SVD). However, accuracy values of 80.71% for phrases or …

Class token and knowledge distillation for multi-head self-attention speaker verification systems

Victoria Mingote, Antonio Miguel, Alfonso Ortega, Eduardo Lleida · 2022

This paper explores three novel approaches to improve the performance of speaker verification (SV) systems based on deep neural networks (DNN) using Multi-head Self-Attention (MSA) mechanisms and memory layers. Firstly, we propose the use …

A Study on the Use of wav2vec Representations for Multiclass Audio Segmentation

Pablo Gimeno, Alfonso Ortega, Antonio De Miguel, Eduardo Lleida · 2022

Cross-Corpus Speech Emotion Recognition with HuBERT Self-Supervised Representation

M. Pastor, Dayana Ribas, Alfonso Ortega, Antonio De Miguel, Eduardo Lleida · 2022

I4U System Description for NIST SRE'20 CTS Challenge

Kong Aik Lee, Tomi Kinnunen, Daniele Colibro, Claudio Vair, Andreas Nautsch, et al. · 2022

This manuscript describes the I4U submission to the 2020 NIST Speaker Recognition Evaluation (SRE'20) Conversational Telephone Speech (CTS) Challenge. The I4U submission resulted from active collaboration among researchers across eig…

Wiener Filter and Deep Neural Networks: A Well-Balanced Pair for Speech Enhancement

Dayana Ribas, Antonio Miguel, Alfonso Ortega, Eduardo Lleida · 2022

This paper proposes a Deep Learning (DL) based Wiener filter estimator for speech enhancement in the framework of the classical spectral-domain speech estimator algorithm. According to the characteristics of the intermediate steps of the s…

Unsupervised Anomaly Detection Applied to Φ-OTDR

Antonio Almudévar, Pascual Sevillano, Luis Vicente, Javier Preciado-Garbayo, Alfonso Ortega · 2022

Distributed acoustic sensors (DASs) based on direct-detection Φ-OTDR use the light–matter interaction between light pulses and optical fiber to detect mechanical events in the fiber environment. The signals received in Φ-OTDR come from the…

Detección automática de emociones a partir de la voz combinando bases de datos para aumentar el entrenamiento

Miguel Ángel Pastor Yoldi, Dayana Ribas González, Alfonso Ortega · 2022

The voice is the most natural means of communication for humans, conveying both linguistic information and the speaker's emotional state. With the aim of increasing the accuracy of voice emotion recognition systems…

RODRÍGUEZ PINEAU, E. y TORRALBA MENDIOLA, E. (Dirs.), La protección de las transmisiones de datos transfronterizas, Thomson-Reuters-Aranzadi, Cizur Menor, 2022, 412 pp.

Alfonso Ortega · 2022

Review by Alfonso Ortega Giménez of RODRÍGUEZ PINEAU, E. y TORRALBA MENDIOLA, E. (Dirs.), La protección de las transmisiones de datos transfronterizas, Thomson-Reuters-Aranzadi, Cizur Menor, 2022, 412 pp.
