Hakan Erdoğan
Binaural Angular Separation Network
We propose a neural network model that can separate target speech sources from interfering sources at different angular regions using two microphones. The model is trained with simulated room impulse responses (RIRs) using omni-directional…
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition
We present TokenSplit, a speech separation model that acts on discrete token sequences. The model is trained on multiple tasks simultaneously: separate and transcribe each speech source, and generate speech from text. The model operates on…
Guided Speech Enhancement Network
High quality speech capture has been widely studied for both voice communication and human computer interface reasons. To improve the capture performance, we can often find multi-microphone speech enhancement techniques deployed on various…
CycleGAN-Based Unpaired Speech Dereverberation
Typically, neural network-based speech dereverberation models are trained on paired data, composed of a dry utterance and its corresponding reverberant utterance. The main limitation of this approach is that such models can only be trained…
Adapting Speech Separation to Real-World Meetings Using Mixture Invariant Training
The recently-proposed mixture invariant training (MixIT) is an unsupervised method for training single-channel sound separation models in the sense that it does not require ground-truth isolated reference sources. In this paper, we investi…
DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement
Single-channel speech enhancement (SE) is an important task in speech processing. A widely used framework combines an analysis/synthesis filterbank with a mask prediction network, such as the Conv-TasNet architecture. In such systems, the …
Sparse, Efficient, and Semantic Mixture Invariant Training: Taming In-the-Wild Unsupervised Sound Separation
Supervised neural network training has led to significant progress on single-channel sound separation. This approach relies on ground truth isolated sources, which precludes scaling to widely available mixture data and limits progress on o…
Sound Event Detection and Separation: A Benchmark on Desed Synthetic Soundscapes
We propose a benchmark of state-of-the-art sound event detection systems (SED). We designed synthetic evaluation sets to focus on specific sound event detection challenges. We analyze the performance of the submissions to DCASE 2021 task 4…
End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings
We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings. End-to-end diarization models have the advantage of handling speaker overlap and enabling straightforward handling of discr…
What’s all the Fuss about Free Universal Sound Separation Data?
We introduce the Free Universal Sound Separation (FUSS) dataset, a new corpus for experiments in separating mixtures of an unknown number of sounds from an open domain of sound types. The dataset consists of 23 hours of single-source audio…
Integration of speech separation, diarization, and recognition for multi-speaker meetings: Separated LibriCSS dataset
Dataset: This data repository contains separated audio streams for the LibriCSS dataset using the following window-based separation methods: 1. Mask-based MVDR: Takuya Yoshioka, Hakan Erdogan, Zhuo Chen, and Fil Alleva, “Multi-microphone ne…
Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording
Leveraging additional speaker information to facilitate speech separation has received increasing attention in recent years. Recent research includes extracting target speech by using the target speaker's voice snippet and jointly separati…
Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis
Multi-speaker speech recognition of unsegmented recordings has diverse applications such as meeting transcription and automatic subtitle generation. With technical advances in systems dealing with speech separation, speaker diarization, an…
Improving Sound Event Detection In Domestic Environments Using Sound Separation
Performing sound event detection on real-world recordings often implies dealing with overlapping target sound events and non-target sounds, also referred to as interference or noise. Until now these problems were mainly tackled at the clas…
Unsupervised Sound Separation Using Mixture Invariant Training
In recent years, rapid progress has been made on the problem of single-channel sound separation using supervised training of deep neural networks. In such supervised approaches, a model is trained to predict the component sources from synt…
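In MixIT, two reference mixtures are summed, the model separates the sum into several sources, and training minimizes the reconstruction error under the best binary assignment of estimated sources back to the two mixtures. A minimal NumPy sketch of that idea is below; the function names, the simple SNR objective, and the exhaustive search over all 2^M assignments are illustrative assumptions, not the paper's implementation.

```python
import itertools
import numpy as np

def snr(ref, est, eps=1e-8):
    """Signal-to-noise ratio in dB between a reference and an estimate."""
    return 10.0 * np.log10(np.sum(ref ** 2) / (np.sum((ref - est) ** 2) + eps))

def mixit_loss(est_sources, mix1, mix2):
    """Mixture invariant training loss (negative SNR), minimal sketch.

    est_sources: (M, T) array of model outputs from separating mix1 + mix2.
    mix1, mix2:  (T,) reference mixtures (no isolated ground-truth sources needed).
    Searches all 2^M binary assignments of estimates to the two mixtures and
    returns the loss under the best one.
    """
    M = est_sources.shape[0]
    best = np.inf
    for assign in itertools.product([0, 1], repeat=M):
        remix1 = sum(s for s, a in zip(est_sources, assign) if a == 0)
        remix2 = sum(s for s, a in zip(est_sources, assign) if a == 1)
        loss = -snr(mix1, remix1) - snr(mix2, remix2)
        best = min(best, loss)
    return best
```

The exhaustive assignment search is fine for small M; practical systems typically keep M small (e.g. 4 or 8) or use efficient variants, as the sparse/efficient MixIT entry above discusses.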
Evaluation set DCASE 2020 task 4 (for submissions)
This repo contains the dataset to download to submit results and be evaluated in task 4 of DCASE 2020. Please check the submission package for instructions on preparing a submission. *Note: some files are 5 mins long, so if y…
Free Universal Sound Separation Dataset
The Free Universal Sound Separation (FUSS) Dataset is a database of arbitrary sound mixtures and source-level references, for use in experiments on arbitrary sound separation. This is the official sound separation data for the DCASE2020 Ch…
Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement
This work introduces sequential neural beamforming, which alternates between neural network based spectral separation and beamforming based spatial separation. Our neural networks for separation use an advanced convolutional architecture t…
Universal Sound Separation
Recent deep learning approaches have achieved impressive performance on speech enhancement and separation tasks. However, these approaches have not been investigated for separating mixtures of arbitrary sounds of different types, a task we…
SDR – Half-baked or Well Done?
In speech enhancement and source separation, signal-to-noise ratio is a ubiquitous objective measure of denoising/separation quality. A decade ago, the BSS_eval toolkit was developed to give researchers worldwide a way to evaluate the qual…
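This paper advocates a scale-invariant variant of SDR as a more robust evaluation measure: the estimate is first projected onto the reference, and the ratio of projected-signal energy to residual energy is reported in dB. A minimal NumPy sketch of that computation follows; the function name and the epsilon regularizer are illustrative assumptions.

```python
import numpy as np

def si_sdr(ref, est, eps=1e-8):
    """Scale-invariant SDR in dB.

    Projects the estimate onto the reference so that rescaling the
    estimate does not change the score, then measures the energy ratio
    of the scaled target to the residual error.
    """
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)  # optimal scale
    target = alpha * ref
    noise = est - target
    return 10.0 * np.log10((np.sum(target ** 2) + eps) / (np.sum(noise ** 2) + eps))
```

Because of the projection, `si_sdr(x, c * x)` is (near-)infinite for any nonzero scale `c`, which is exactly the invariance property the metric is designed around.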
Low-Latency Speaker-Independent Continuous Speech Separation
Speaker independent continuous speech separation (SI-CSS) is a task of converting a continuous audio stream, which may contain overlapping voices of unknown speakers, into a fixed number of continuous signals each of which contains no over…