Keiichi Tokuda
LIMMITS'24: Multi-Speaker, Multi-Lingual INDIC TTS With Voice Cloning
The Multi-speaker, Multi-lingual Indic Text to Speech (TTS) with voice cloning (LIMMITS'24) challenge is organized as part of the ICASSP 2024 signal processing grand challenge. LIMMITS'24 aims at the development of voice cloning for the mu…
V2Coder: A Non-Autoregressive Vocoder Based on Hierarchical Variational Autoencoders
This paper introduces V2Coder, a non-autoregressive vocoder based on hierarchical variational autoencoders (VAEs). The hierarchical VAE with hierarchically extended prior and approximate posterior distributions is highly expressive for mod…
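As a rough illustration of what a hierarchically extended prior and approximate posterior can look like, the sketch below implements a generic two-level VAE in PyTorch: the posterior factors as q(z2|x)q(z1|x,z2) and the prior as p(z2)p(z1|z2). The layer sizes, latent dimensionalities, and any conditioning used by the actual V2Coder are not given in the snippet above, so every architectural choice here is an assumption, not the paper's model.

```python
# Minimal sketch of a two-level hierarchical VAE, illustrating a hierarchically
# extended prior p(z1 | z2) p(z2) and approximate posterior q(z2 | x) q(z1 | x, z2).
# All layer sizes and the MLP form are illustrative assumptions.
import torch
import torch.nn as nn


def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps with the reparameterization trick."""
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)


class TwoLevelVAE(nn.Module):
    def __init__(self, x_dim=80, z1_dim=16, z2_dim=8, h_dim=128):
        super().__init__()
        # Bottom-up inference networks: q(z2 | x) and q(z1 | x, z2)
        self.enc_z2 = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU(),
                                    nn.Linear(h_dim, 2 * z2_dim))
        self.enc_z1 = nn.Sequential(nn.Linear(x_dim + z2_dim, h_dim), nn.ReLU(),
                                    nn.Linear(h_dim, 2 * z1_dim))
        # Top-down generative path: p(z1 | z2) and p(x | z1)
        self.prior_z1 = nn.Sequential(nn.Linear(z2_dim, h_dim), nn.ReLU(),
                                      nn.Linear(h_dim, 2 * z1_dim))
        self.dec_x = nn.Sequential(nn.Linear(z1_dim, h_dim), nn.ReLU(),
                                   nn.Linear(h_dim, x_dim))

    def forward(self, x):
        # q(z2 | x)
        mu2, logvar2 = self.enc_z2(x).chunk(2, dim=-1)
        z2 = reparameterize(mu2, logvar2)
        # q(z1 | x, z2)
        mu1, logvar1 = self.enc_z1(torch.cat([x, z2], dim=-1)).chunk(2, dim=-1)
        z1 = reparameterize(mu1, logvar1)
        # Hierarchical prior p(z1 | z2); p(z2) is a standard normal.
        p_mu1, p_logvar1 = self.prior_z1(z2).chunk(2, dim=-1)
        x_hat = self.dec_x(z1)
        # Negative ELBO: reconstruction + KL(q(z1|x,z2) || p(z1|z2)) + KL(q(z2|x) || N(0, I))
        recon = ((x - x_hat) ** 2).sum(-1)
        kl1 = 0.5 * (p_logvar1 - logvar1 +
                     (logvar1.exp() + (mu1 - p_mu1) ** 2) / p_logvar1.exp() - 1).sum(-1)
        kl2 = 0.5 * (logvar2.exp() + mu2 ** 2 - 1 - logvar2).sum(-1)
        return (recon + kl1 + kl2).mean()
```

The higher latent z2 can capture coarser structure while z1 handles detail; the hierarchical prior is what makes the model more expressive than a single flat Gaussian prior.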
PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a Diffusion Probabilistic Model
This paper presents a neural vocoder based on a denoising diffusion probabilistic model (DDPM) incorporating explicit periodic signals as auxiliary conditioning signals. Recently, DDPM-based neural vocoders have gained prominence as non-au…
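To make the idea of an explicit periodic conditioning signal concrete, the sketch below upsamples a frame-level F0 contour and integrates it into a sample-level sine wave that a DDPM vocoder could take as an auxiliary input alongside its usual acoustic features. The hop size, sampling rate, single-sine form, and voiced/unvoiced handling are assumptions; the snippet does not specify how PeriodGrad constructs or injects its periodic signal.

```python
# Minimal sketch of building an explicit periodic conditioning signal from a
# frame-level F0 contour.  The hop size, the plain sine (rather than a
# harmonic stack), and the unvoiced handling are illustrative assumptions.
import numpy as np


def periodic_signal(f0_frames, hop_length=256, sample_rate=24000):
    """Upsample a frame-level F0 contour and integrate it into a sine wave.

    f0_frames: F0 values in Hz, one per frame; 0 marks unvoiced frames.
    Returns a sample-level sine excitation (zeroed in unvoiced regions).
    """
    n_samples = len(f0_frames) * hop_length
    frame_times = np.arange(len(f0_frames)) * hop_length
    sample_times = np.arange(n_samples)
    # Linear interpolation of F0 up to the sample rate.
    f0 = np.interp(sample_times, frame_times, f0_frames)
    voiced = f0 > 0
    # Instantaneous phase is the cumulative sum of per-sample frequency.
    phase = 2.0 * np.pi * np.cumsum(f0 / sample_rate)
    sine = np.sin(phase) * voiced
    return sine.astype(np.float32)


# Example: a 100-frame contour gliding from 220 Hz to 440 Hz,
# with a short unvoiced gap in the middle simply zeroed out.
f0 = np.linspace(220.0, 440.0, 100)
f0[40:50] = 0.0
cond = periodic_signal(f0)   # shape: (100 * 256,)
```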
Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech
The Lightweight, Multi-speaker, Multi-lingual Indic Text-to-Speech (LIMMITS'23) challenge is organized as part of the ICASSP 2023 Signal Processing Grand Challenge. LIMMITS'23 aims at the development of a lightweight, multi-speaker, multi-…
Singing voice synthesis based on frame-level sequence-to-sequence models considering vocal timing deviation
This paper proposes singing voice synthesis (SVS) based on frame-level sequence-to-sequence models considering vocal timing deviation. In SVS, it is essential to synchronize the timing of singing with temporal structures represented by sco…
Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism
This paper proposes a novel sequence-to-sequence (seq2seq) model with a musical note position-aware attention mechanism for singing voice synthesis (SVS). A seq2seq modeling approach that can simultaneously perform acoustic and temporal mo…
Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System
This paper integrates a classic mel-cepstral synthesis filter into a modern neural speech synthesis system towards end-to-end controllable speech synthesis. Since the mel-cepstral synthesis filter is explicitly embedded in neural waveform …
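The point of embedding the synthesis filter is that waveform generation stays differentiable with respect to the cepstral coefficients, so the whole pipeline can be trained end to end while the coefficients remain interpretable and editable. The sketch below shows only that differentiability for a plain (unwarped, zero-phase) cepstral filter applied to one excitation frame in the frequency domain; the mel-frequency warping and the filter implementation of the paper are not reproduced, so treat it as an illustration rather than the authors' method.

```python
# Minimal sketch of a differentiable cepstral synthesis filter: cepstral
# coefficients define a log-amplitude spectrum via the FFT, and the resulting
# envelope is applied to an excitation frame.  Mel warping and the minimum
# phase are dropped for brevity; this is an illustration, not the paper's filter.
import torch
import torch.nn.functional as F


def cepstral_filter(excitation, cepstrum, n_fft=1024):
    """Apply a cepstrum-defined spectral envelope to one excitation frame.

    excitation: (n_fft,) waveform frame (e.g. pulse train plus noise)
    cepstrum:   (order + 1,) cepstral coefficients c[0..order]
    Gradients flow back into `cepstrum`.
    """
    # H(e^{jw}) = exp( sum_m c[m] e^{-jwm} ); the real part of the FFT of the
    # zero-padded cepstrum gives log|H| at the FFT bins.
    padded = F.pad(cepstrum, (0, n_fft - cepstrum.numel()))
    log_mag = torch.fft.rfft(padded).real
    envelope = torch.exp(log_mag)
    # Zero-phase frequency-domain filtering of the excitation frame.
    spec = torch.fft.rfft(excitation) * envelope
    return torch.fft.irfft(spec, n=n_fft)


# Toy check that gradients reach the cepstral coefficients.
exc = torch.randn(1024)
cep = torch.zeros(25, requires_grad=True)
out = cepstral_filter(exc, cep)
out.pow(2).mean().backward()
print(cep.grad.shape)   # torch.Size([25])
```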
End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue
Recent text-to-speech (TTS) has achieved quality comparable to that of humans; however, its application to spoken dialogue has not been widely studied. This study aims to realize TTS that closely resembles human dialogue. First, we r…
Neural Sequence-to-Sequence Speech Synthesis Using a Hidden Semi-Markov Model Based Structured Attention Mechanism
This paper proposes a novel Sequence-to-Sequence (Seq2Seq) model integrating the structure of Hidden Semi-Markov Models (HSMMs) into its attention mechanism. In speech synthesis, it has been shown that methods based on Seq2Seq models using…
PeriodNet: A non-autoregressive waveform generation model with a structure separating periodic and aperiodic components
We propose PeriodNet, a non-autoregressive (non-AR) waveform generation model with a new model structure for modeling periodic and aperiodic components in speech waveforms. The non-AR waveform generation models can generate speech waveform…
PeriodNet: A Non-Autoregressive Raw Waveform Generative Model With a Structure Separating Periodic and Aperiodic Components
This paper presents PeriodNet, a non-autoregressive (non-AR) waveform generative model with a new model structure for modeling periodic and aperiodic components in speech waveforms. Non-AR raw waveform generative models have enabled the fa…
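A minimal way to picture the parallel variant of this structure is two conditional generators whose outputs are summed: one driven by a sine excitation for the periodic component and one driven by noise for the aperiodic component. The toy convolutional generators, feature dimensions, and sample-rate conditioning below are assumptions made for illustration; the actual PeriodNet generators and its series-structure variant are not shown.

```python
# Minimal sketch of the parallel periodic/aperiodic idea: one generator sees a
# sine excitation, one sees noise, both see the conditioning features, and the
# waveform is modeled as the sum of the two branch outputs.  The toy 1-D conv
# stacks and dimensions are assumptions, not the PeriodNet architecture.
import math
import torch
import torch.nn as nn


class ToyGenerator(nn.Module):
    """A small conditional 1-D conv stack standing in for a real generator."""

    def __init__(self, cond_dim=80, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1 + cond_dim, channels, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size=7, padding=3),
        )

    def forward(self, excitation, cond):
        # excitation: (B, 1, T), cond: (B, cond_dim, T) already at sample rate
        return self.net(torch.cat([excitation, cond], dim=1))


class ParallelPeriodicAperiodic(nn.Module):
    def __init__(self, cond_dim=80):
        super().__init__()
        self.periodic = ToyGenerator(cond_dim)
        self.aperiodic = ToyGenerator(cond_dim)

    def forward(self, sine, cond):
        noise = torch.randn_like(sine)
        # Periodic branch sees the sine excitation, aperiodic branch sees noise.
        return self.periodic(sine, cond) + self.aperiodic(noise, cond)


# Toy forward pass: batch of 2, 80-dim conditioning, 4096 samples.
model = ParallelPeriodicAperiodic()
sine = torch.sin(torch.linspace(0, 2 * math.pi * 100, 4096)).repeat(2, 1, 1)
cond = torch.randn(2, 80, 4096)
wav = model(sine, cond)   # shape: (2, 1, 4096)
```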
Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis
This paper proposes a hierarchical generative model with a multi-grained latent variable to synthesize expressive speech. In recent years, fine-grained latent variables have been introduced into text-to-speech synthesis that enable the fine…
Fast and High-Quality Singing Voice Synthesis System based on Convolutional Neural Networks
The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of synth…
Singing voice synthesis based on convolutional neural networks
The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of syn…
Constructing text-to-speech systems for languages with unknown pronunciations
This paper proposes a method for constructing text-to-speech (TTS) systems for languages with unknown pronunciations. One goal of speech synthesis research is to establish a framework that can be used to construct TTS systems for any writt…
A Bayesian Approach to Image Recognition Based on Separable Lattice Hidden Markov Models
This paper proposes a Bayesian approach to image recognition based on separable lattice hidden Markov models (SL-HMMs). The geometric variations of the object to be recognized, e.g., size, location, and rotation, are an essential problem i…