Patrick A. Naylor
Binaural Localization Model for Speech in Noise Open
Binaural acoustic source localization is important to human listeners for spatial awareness, communication and safety. In this paper, an end-to-end binaural localization model for speech in noise is presented. A lightweight convolutional r…
SONIVA: Speech recOgNItion Validation in Aphasia Open
Post-stroke aphasia is a major contributor to language impairment and neuro-disability worldwide, making automated assessment a critical research priority. However, the development of clinically validated automatic speech recognition (ASR)…
Observations and Biogeochemical Modeling Reveal Chlorophyll Diel Cycle With Near‐Sunset Maxima in the Red Sea Open
The Red Sea is an extremely warm tropical sea hosting diverse ecosystems, with marine organisms operating at the high end of their thermal tolerance. Therefore, in the context of global warming, it is increasingly important to understand t…
Binary Estimator Selection Methods for Hearing Aids With a Remote Microphone Open
Using a high signal-to-noise ratio remote microphone (RM) with hearing aids (HAs) is advantageous for HA users. However, the benefit depends significantly on the properties of the wireless channel. While existing literature often assumes a…
Steered Response Power for Sound Source Localization: a tutorial review Open
In the last three decades, the Steered Response Power (SRP) method has been widely used for the task of Sound Source Localization (SSL), due to its satisfactory localization performance on moderately reverberant and noisy scenarios. Many w…
Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora Open
In recent years, automatic speech recognition (ASR) models greatly improved transcription performance both in clean, low noise, acoustic conditions and in reverberant environments. However, all these systems rely on the availability of hun…
XANE Background Acoustic Embeddings: Ablation and Clustering Analysis Open
We explore the recently proposed explainable acoustic neural embedding (XANE) system that models the background acoustics of a speech signal in a non-intrusive manner. The XANE embeddings are used to estimate specific parameters related to…
XANE: eXplainable Acoustic Neural Embeddings Open
We present a novel method for extracting neural embeddings that model the background acoustics of a speech signal. The extracted embeddings are used to estimate specific parameters related to the background acoustic properties of the signa…
Steered Response Power for Sound Source Localization: A Tutorial Review Open
In the last three decades, the Steered Response Power (SRP) method has been widely used for the task of Sound Source Localization (SSL), due to its satisfactory localization performance on moderately reverberant and noisy scenarios. Many w…
The Neural-SRP method for positional sound source localization Open
Steered Response Power (SRP) is a widely used method for the task of sound source localization using microphone arrays, showing satisfactory localization performance on many practical scenarios. However, its performance is diminished under…
Binaural Speech Enhancement Using Deep Complex Convolutional Transformer Networks Open
Studies have shown that in noisy acoustic environments, providing binaural signals to the user of an assistive listening device may improve speech intelligibility and spatial awareness. This paper presents a binaural speech enhancement met…
Uncertainty Quantification in Machine Learning for Joint Speaker Diarization and Identification Open
This paper studies modulation spectrum features ($Φ$) and mel-frequency cepstral coefficients ($Ψ$) in joint speaker diarization and identification (JSID). JSID is important as speaker diarization on its own to distinguish speakers is insu…
Group Conversations in Noisy Environments (GiN) – Multimedia Recordings for Location-Aware Speech Enhancement Open
Recent years have seen a growing interest in the use of smart glasses mounted with microphones to solve the cocktail party problem using beamforming techniques or machine learning. Many such approaches could bring substantial advances in h…
Practical utility of a head-mounted gaze-directed beamforming system Open
Assistive auditory devices that enhance signal-to-noise ratio must follow the user's changing attention; errors could lead to the desired source being suppressed as noise. A method for measuring the practical benefit of attention-following…
Subspace Hybrid MVDR Beamforming for Augmented Hearing Open
Signal-dependent beamformers are advantageous over signal-independent beamformers when the acoustic scenario - be it real-world or simulated - is straightforward in terms of the number of sound sources, the ambient sound field and their dy…
Polynomial Eigenvalue Decomposition for Multichannel Broadband Signal Processing: A mathematical technique offering new insights and solutions Open
This article is devoted to the polynomial eigenvalue decomposition (PEVD) and its applications in broadband multichannel signal processing, motivated by the optimum solutions provided by the EVD for the narrowband case [1], [2]. In gener…
Binaural Speech Enhancement Using Complex Convolutional Recurrent Networks Open
From hearing aids to augmented and virtual reality devices, binaural speech enhancement algorithms have been established as state-of-the-art techniques to improve speech intelligibility and listening comfort. In this paper, we present an e…
Dual input neural networks for positional sound source localization Open
In many signal processing applications, metadata may be advantageously used in conjunction with a high dimensional signal to produce a desired output. In the case of classical Sound Source Localization (SSL) algorithms, information from a …
Observations and biogeochemical modeling reveal chlorophyll diel cycle with near-sunset maxima in the Red Sea Open
The Red Sea is an extremely warm tropical sea that hosts diverse ecosystems; thus, it is important to understand its ecology in the context of global warming. Using a coupled physical–biogeochemical model validated against in situ data, we…
Audio Signal Processing in the 21st Century: The important outcomes of the past 25 years Open
Long-term Conversation Analysis: Exploring Utility and Privacy Open
The analysis of conversations recorded in everyday life requires privacy protection. In this contribution, we explore a privacy-preserving feature extraction method based on input feature dimension reduction, spectral smoothing and the low…
Two-Stage Voice Anonymization for Enhanced Privacy Open
In recent years, the need for privacy preservation when manipulating or storing personal data, including speech, has become a major issue. In this paper, we present a system addressing the speaker-level anonymization problem. We propose a…
Graph neural networks for sound source localization on distributed microphone networks Open
Distributed Microphone Arrays (DMAs) present many challenges with respect to centralized microphone arrays. An important requirement of applications on these arrays is handling a variable number of input channels. We consider the use of Gr…
Subspace Hybrid Beamforming for Head-worn Microphone Arrays Open
A two-stage multi-channel speech enhancement method is proposed which consists of a novel adaptive beamformer, Hybrid Minimum Variance Distortionless Response (MVDR), Isotropic-MVDR (Iso), and a novel multi-channel spectral Principal Compo…
Using a single-channel reference with the MBSTOI binaural intelligibility metric Open
In order to assess the intelligibility of a target signal in a noisy environment, intrusive speech intelligibility metrics are typically used. They require a clean reference signal to be available which can be difficult to obtain especiall…
Signal Compaction Using Polynomial EVD for Spherical Array Processing With Applications Open
Multi-channel signals captured by spatially separated sensors often contain a high level of data redundancy. A compact signal representation enables more efficient storage and processing, which has been exploited for data compression, nois…
Uncovering the Potential for a Weakly Supervised End-to-End Model in Recognising Speech from Patient with Post-Stroke Aphasia Open
Post-stroke speech and language deficits (aphasia) significantly impact patients' quality of life. Many with mild symptoms remain undiagnosed, and the majority do not receive the intensive doses of therapy recommended, due to healthcare co…
The Neural-SRP Method for Universal Robust Multi-source Tracking Open
Neural networks have achieved state-of-the-art performance on the task of acoustic Direction-of-Arrival (DOA) estimation using microphone arrays. Neural models can be classified as end-to-end or hybrid, each class showing advantages and di…