Jacob Donley
Design and analysis of binaural signal matching with arbitrary microphone arrays and listener head rotations
Binaural reproduction is rapidly becoming a topic of great interest in the research community, especially with the surge of new and popular devices, such as virtual reality headsets, smart glasses, and head-tracked headphones. In order to …
Efficient Audiovisual Speech Processing via MUTUD: Multimodal Training and Unimodal Deployment
Building reliable speech systems often requires combining multiple modalities, like audio and visual cues. While such multimodal solutions frequently lead to improvements in performance and may even be critical in certain cases, they come …
Performance and Robustness of Signal-Dependent vs. Signal-Independent Binaural Signal Matching with Wearable Microphone Arrays
The increasing popularity of spatial audio in applications such as teleconferencing, entertainment, and virtual reality has led to the recent developments of binaural reproduction methods. However, only a few of these methods are well-suit…
M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses
The growing popularity of multi-channel wearable devices, such as smart glasses, has led to a surge of applications such as targeted speech recognition and enhanced hearing. However, current approaches to solve these tasks use independentl…
Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
Egocentric videos provide comprehensive contexts for user and scene understanding, spanning multisensory perception to behavioral interaction. We propose Spherical World-Locking (SWL) as a general framework for egocentric scene representat…
Design and Analysis of Binaural Signal Matching with Arbitrary Microphone Arrays and Listener Head Rotations
Binaural reproduction is rapidly becoming a topic of great interest in the research community, especially with the surge of new and popular devices, such as virtual reality headsets, smart glasses, and head-tracked headphones. In order to …
Ambisonics Encoding For Arbitrary Microphone Arrays Incorporating Residual Channels For Binaural Reproduction
In the rapidly evolving fields of virtual and augmented reality, accurate spatial audio capture and reproduction are essential. For these applications, Ambisonics has emerged as a standard format. However, existing methods for encoding Amb…
On the Importance of Neural Wiener Filter for Resource Efficient Multichannel Speech Enhancement
We introduce a time-domain framework for efficient multichannel speech enhancement, emphasizing low latency and computational efficiency. This framework incorporates two compact deep neural networks (DNNs) surrounding a multichannel neural…
Group Conversations in Noisy Environments (GiN) – Multimedia Recordings for Location-Aware Speech Enhancement
Recent years have seen a growing interest in the use of smart glasses mounted with microphones to solve the cocktail party problem using beamforming techniques or machine learning. Many such approaches could bring substantial advances in h…
Subspace Hybrid MVDR Beamforming for Augmented Hearing
Signal-dependent beamformers are advantageous over signal-independent beamformers when the acoustic scenario - be it real-world or simulated - is straightforward in terms of the number of sound sources, the ambient sound field and their dy…
Performance Analysis Of Binaural Signal Matching (BSM) in the Time-Frequency Domain
The capture and reproduction of spatial audio is becoming increasingly popular, with the mushrooming of applications in teleconferencing, entertainment and virtual reality. Many binaural reproduction methods have been developed and studied…
Subspace Hybrid Beamforming for Head-worn Microphone Arrays
A two-stage multi-channel speech enhancement method is proposed which consists of a novel adaptive beamformer, Hybrid Minimum Variance Distortionless Response (MVDR), Isotropic-MVDR (Iso), and a novel multi-channel spectral Principal Compo…
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement
Prior works on improving speech quality with visual input typically study each type of auditory distortion separately (e.g., separation, inpainting, video-to-speech) and present tailored algorithms. This paper proposes to unify these subje…
LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders
Audio-visual speech enhancement aims to extract clean speech from a noisy environment by leveraging not only the audio itself but also the target speaker's lip movements. This approach has been shown to yield improvements over audio-only s…
The Impact of Removing Head Movements on Audio-Visual Speech Enhancement
This paper investigates the impact of head movements on audio-visual speech enhancement (AVSE). Although being a common conversational feature, head movements have been ignored by past and recent studies: they challenge today's learning…
The impact of removing head movements on audio-visual speech enhancement
This paper investigates the impact of head movements on audio-visual speech enhancement (AVSE). Although being a common conversational feature, head movements have been ignored by past and recent studies: they challenge today's learning-ba…
The Spear Challenge - Review of Results
Verbal communication can be challenging in the presence of acoustic noise. To tackle this problem, microphone arrays coupled with numerous processing methods have been studied in the past few decades. Recent interest in Augmented Reality (AR…
NICE-Beam: Neural Integrated Covariance Estimators for Time-Varying Beamformers
Estimating a time-varying spatial covariance matrix for a beamforming algorithm is a challenging task, especially for wearable devices, as the algorithm must compensate for time-varying signal statistics due to rapid pose-changes. In this …
Multichannel Speech Enhancement without Beamforming
Deep neural networks are often coupled with traditional spatial filters, such as MVDR beamformers for effectively exploiting spatial information. Even though single-stage end-to-end supervised models can obtain impressive enhancement, comb…
TADRN: Triple-Attentive Dual-Recurrent Network for Ad-hoc Array Multichannel Speech Enhancement.
Deep neural networks (DNNs) have been successfully used for multichannel speech enhancement in fixed array geometries. However, challenges remain for ad-hoc arrays with unknown microphone placements. We propose a deep neural network based …
Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network
Deep neural networks (DNNs) are very effective for multichannel speech enhancement with fixed array geometries. However, it is not trivial to use DNNs for ad-hoc arrays with unknown order and placement of microphones. We propose a novel tr…
TPARN: Triple-path Attentive Recurrent Network for Time-domain Multichannel Speech Enhancement
In this work, we propose a new model called triple-path attentive recurrent network (TPARN) for multichannel speech enhancement in the time domain. TPARN extends a single-channel dual-path network to a multichannel network by adding a thir…
EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments
Augmented Reality (AR) as a platform has the potential to facilitate the reduction of the cocktail party effect. Future AR headsets could potentially leverage information from an array of sensors spanning many different modalities. Trainin…
Multi-Channel Speech Enhancement Using Graph Neural Networks
Multi-channel speech enhancement aims to extract clean speech from a noisy mixture using signals captured from multiple microphones. Recently proposed methods tackle this problem by incorporating deep neural network models with spatial fil…
Reproduction of Personal Sound in Shared Environments
The experience and utility of personal sound is a highly sought after characteristic of shared spaces. Personal sound allows individuals, or small groups of individuals, to listen to separate streams of audio content without external inter…
Coherence Based Source Counter: v1.0.1
Included verbose option to suppress rir_generator output to command window and included missing tightPlots package.
Multizone Reproduction of Speech Soundfields: A Perceptually Weighted Approach
In this paper a method for the reproduction of multizone speech soundfields using perceptual weighting criteria is proposed. Psychoacoustic models are used to derive a space-time-frequency weighting function to control leakage of perceptua…