Explanipedia

Design and analysis of binaural signal matching with arbitrary microphone arrays and listener head rotations Open

Lior Madmoni, Zamir Ben-Hur, Jacob Donley, Vladimir Tourbabin, Boaz Rafaely · 2025

Binaural reproduction is rapidly becoming a topic of great interest in the research community, especially with the surge of new and popular devices, such as virtual reality headsets, smart glasses, and head-tracked headphones. In order to …

Efficient Audiovisual Speech Processing via MUTUD: Multimodal Training and Unimodal Deployment Open

Joanna Hong, Sanjeel Parekh, Honglie Chen, Jacob Donley, Ke Tan, et al. · 2025

Computer science Geography

Building reliable speech systems often requires combining multiple modalities, like audio and visual cues. While such multimodal solutions frequently lead to improvements in performance and may even be critical in certain cases, they come …

Performance and Robustness of Signal-Dependent vs. Signal-Independent Binaural Signal Matching with Wearable Microphone Arrays Open

Ami Berger, Vladimir Tourbabin, Jacob Donley, Zamir Ben-Hur, Boaz Rafaely · 2024

Computer science Physics Mathematics

The increasing popularity of spatial audio in applications such as teleconferencing, entertainment, and virtual reality has led to the recent developments of binaural reproduction methods. However, only a few of these methods are well-suit…

M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses Open

Yufeng Yang, Desh Raj, Ju Lin, Niko Moritz, Junteng Jia, et al. · 2024

Computer science History

The growing popularity of multi-channel wearable devices, such as smart glasses, has led to a surge of applications such as targeted speech recognition and enhanced hearing. However, current approaches to solve these tasks use independentl…

Spherical World-Locking for Audio-Visual Localization in Egocentric Videos Open

Heeseung Yun, Ruohan Gao, Ishwarya Ananthabhotla, Anurag Kumar, Jacob Donley, et al. · 2024

Computer science Psychology

Egocentric videos provide comprehensive contexts for user and scene understanding, spanning multisensory perception to behavioral interaction. We propose Spherical World-Locking (SWL) as a general framework for egocentric scene representat…

Design and Analysis of Binaural Signal Matching with Arbitrary Microphone Arrays and Listener Head Rotations Open

Lior Madmoni, Zamir Ben-Hur, Jacob Donley, Vladimir Tourbabin, Boaz Rafaely · 2024

Computer science Mathematics Physics

Binaural reproduction is rapidly becoming a topic of great interest in the research community, especially with the surge of new and popular devices, such as virtual reality headsets, smart glasses, and head-tracked headphones. In order to …

Ambisonics Encoding For Arbitrary Microphone Arrays Incorporating Residual Channels For Binaural Reproduction Open

Yhonatan Gayer, Vladimir Tourbabin, Zamir Ben-Hur, Jacob Donley, Boaz Rafaely · 2024

Computer science Physics Biology

In the rapidly evolving fields of virtual and augmented reality, accurate spatial audio capture and reproduction are essential. For these applications, Ambisonics has emerged as a standard format. However, existing methods for encoding Amb…

On the Importance of Neural Wiener Filter for Resource Efficient Multichannel Speech Enhancement Open

Tsun-An Hsieh, Jacob Donley, Daniel Wong, Buye Xu, Ashutosh Pandey · 2024

Computer science

We introduce a time-domain framework for efficient multichannel speech enhancement, emphasizing low latency and computational efficiency. This framework incorporates two compact deep neural networks (DNNs) surrounding a multichannel neural…

Group Conversations in Noisy Environments (GiN) – Multimedia Recordings for Location-Aware Speech Enhancement Open

Emilie d'Olne, Alastair H. Moore, Patrick A. Naylor, Jacob Donley, Vladimir Tourbabin, et al. · 2023

Computer science Chemistry

Recent years have seen a growing interest in the use of smart glasses mounted with microphones to solve the cocktail party problem using beamforming techniques or machine learning. Many such approaches could bring substantial advances in h…

Subspace Hybrid MVDR Beamforming for Augmented Hearing Open

Sina Hafezi, Alastair H. Moore, Pierre Guiraud, Patrick A. Naylor, Jacob Donley, et al. · 2023

Computer science Philosophy Chemistry

Signal-dependent beamformers are advantageous over signal-independent beamformers when the acoustic scenario - be it real-world or simulated - is straightforward in terms of the number of sound sources, the ambient sound field and their dy…

Performance Analysis Of Binaural Signal Matching (BSM) in the Time-Frequency Domain Open

Ami Berger, Vladimir Tourbabin, Jacob Donley, Zamir Ben-Hur, Boaz Rafaely · 2023

Computer science Physics Mathematics

The capture and reproduction of spatial audio is becoming increasingly popular, with the mushrooming of applications in teleconferencing, entertainment and virtual reality. Many binaural reproduction methods have been developed and studied…

Subspace Hybrid Beamforming for Head-worn Microphone Arrays Open

Sina Hafezi, Alastair H. Moore, Pierre Guiraud, Patrick A. Naylor, Jacob Donley, et al. · 2023

Computer science Mathematics Philosophy

A two-stage multi-channel speech enhancement method is proposed which consists of a novel adaptive beamformer, Hybrid Minimum Variance Distortionless Response (MVDR), Isotropic-MVDR (Iso), and a novel multi-channel spectral Principal Compo…

ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement Open

Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi · 2022

Computer science Philosophy Physics

Prior works on improving speech quality with visual input typically study each type of auditory distortion separately (e.g., separation, inpainting, video-to-speech) and present tailored algorithms. This paper proposes to unify these subje…

LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders Open

Rodrigo Mira, Buye Xu, Jacob Donley, Anurag Kumar, Stavros Petridis, et al. · 2022

Computer science Art

Audio-visual speech enhancement aims to extract clean speech from a noisy environment by leveraging not only the audio itself but also the target speaker's lip movements. This approach has been shown to yield improvements over audio-only s…

The Impact of Removing Head Movements on Audio-Visual Speech Enhancement Open

Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda, Jacob Donley, et al. · 2022

Computer science Geology Sociology

This paper investigates the impact of head movements on audio-visual speech enhancement (AVSE). Although being a common conversational feature, head movements have been ignored by past and recent studies: they challenge today's learning…

The impact of removing head movements on audio-visual speech enhancement Open

Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda, Jacob Donley, et al. · 2022

Computer science Geology Philosophy

This paper investigates the impact of head movements on audio-visual speech enhancement (AVSE). Although being a common conversational feature, head movements have been ignored by past and recent studies: they challenge today's learning-ba…

The Spear Challenge - Review of Results Open

Vladimir Tourbabin, Pierre Guiraud, Sina Hafezi, Patrick A. Naylor, Alastair H. Moore, et al. · 2022

Computer science History

Verbal communication can be challenging in the presence of acoustic noise. To tackle this problem, microphone arrays coupled with numerous processing methods have been studied in the past few decades. Recent interest in Augmented Reality (AR…

NICE-Beam: Neural Integrated Covariance Estimators for Time-Varying Beamformers Open

Jonah Casebeer, Jacob Donley, Daniel Wong, Buye Xu, Anurag Kumar · 2021

Computer science Mathematics

Estimating a time-varying spatial covariance matrix for a beamforming algorithm is a challenging task, especially for wearable devices, as the algorithm must compensate for time-varying signal statistics due to rapid pose-changes. In this …

Multichannel Speech Enhancement without Beamforming Open

Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, et al. · 2021

Computer science Philosophy

Deep neural networks are often coupled with traditional spatial filters, such as MVDR beamformers for effectively exploiting spatial information. Even though single-stage end-to-end supervised models can obtain impressive enhancement, comb…

TADRN: Triple-Attentive Dual-Recurrent Network for Ad-hoc Array Multichannel Speech Enhancement. Open

Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, et al. · 2021

Computer science Art

Deep neural networks (DNNs) have been successfully used for multichannel speech enhancement in fixed array geometries. However, challenges remain for ad-hoc arrays with unknown microphone placements. We propose a deep neural network based …

Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network Open

Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, et al. · 2021

Computer science

Deep neural networks (DNNs) are very effective for multichannel speech enhancement with fixed array geometries. However, it is not trivial to use DNNs for ad-hoc arrays with unknown order and placement of microphones. We propose a novel tr…

TPARN: Triple-path Attentive Recurrent Network for Time-domain Multichannel Speech Enhancement Open

Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, et al. · 2021

Computer science Mathematics Biology

In this work, we propose a new model called triple-path attentive recurrent network (TPARN) for multichannel speech enhancement in the time domain. TPARN extends a single-channel dual-path network to a multichannel network by adding a thir…

EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments Open

Jacob Donley, Vladimir Tourbabin, Jung‐Suk Lee, Mark Broyles, Hao Jiang, et al. · 2021

Computer science

Augmented Reality (AR) as a platform has the potential to facilitate the reduction of the cocktail party effect. Future AR headsets could potentially leverage information from an array of sensors spanning many different modalities. Trainin…

Multi-Channel Speech Enhancement Using Graph Neural Networks Open

Panagiotis Tzirakis, Anurag Kumar, Jacob Donley · 2021

Computer science Mathematics Physics

Multi-channel speech enhancement aims to extract clean speech from a noisy mixture using signals captured from multiple microphones. Recently proposed methods tackle this problem by incorporating deep neural network models with spatial fil…

Reproduction of Personal Sound in Shared Environments Open

Jacob Donley · 2018

Computer science Biology Psychology

The experience and utility of personal sound is a highly sought after characteristic of shared spaces. Personal sound allows individuals, or small groups of individuals, to listen to separate streams of audio content without external inter…

Coherence Based Source Counter: v1.0.1 Open

Jacob Donley, Shahab Pasha, Christian Ritz · 2017

Computer science Mathematics

Included verbose option to suppress rir_generator output to command window and included missing tightPlots package.

Coherence Based Source Counter: v1.0.1 Open

Shahab Pasha, Jacob Donley, Christian Ritz · 2017

Computer science Mathematics

Included verbose option to suppress rir_generator output to command window and included missing tightPlots package.

Multizone Reproduction of Speech Soundfields: A Perceptually Weighted Approach Open

Jacob Donley, Christian Ritz · 2015

Computer science Biology Physics

In this paper a method for the reproduction of multizone speech soundfields using perceptual weighting criteria is proposed. Psychoacoustic models are used to derive a space-time-frequency weighting function to control leakage of perceptua…
