Keiichi Tokuda
LIMMITS'24: Multi-Speaker, Multi-Lingual INDIC TTS With Voice Cloning
The Multi-speaker, Multi-lingual Indic Text to Speech (TTS) with voice cloning (LIMMITS'24) challenge is organized as part of the ICASSP 2024 signal processing grand challenge. LIMMITS'24 aims at the development of voice cloning for the mu…
V2Coder: A Non-Autoregressive Vocoder Based on Hierarchical Variational Autoencoders
This paper introduces V2Coder, a non-autoregressive vocoder based on hierarchical variational autoencoders (VAEs). The hierarchical VAE with hierarchically extended prior and approximate posterior distributions is highly expressive for mod…
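As a rough illustration of what a hierarchically extended prior and approximate posterior can look like, the sketch below implements a generic two-level VAE in PyTorch: the posterior factors as q(z2|x)q(z1|x,z2) and the prior as p(z2)p(z1|z2). The layer sizes, latent dimensionalities, and any conditioning used by the actual V2Coder are not given in the snippet above, so every architectural choice here is an assumption, not the paper's model.

```python
# Minimal sketch of a two-level hierarchical VAE, illustrating a hierarchically
# extended prior p(z1 | z2) p(z2) and approximate posterior q(z2 | x) q(z1 | x, z2).
# All layer sizes and the MLP form are illustrative assumptions.
import torch
import torch.nn as nn


def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps with the reparameterization trick."""
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)


class TwoLevelVAE(nn.Module):
    def __init__(self, x_dim=80, z1_dim=16, z2_dim=8, h_dim=128):
        super().__init__()
        # Bottom-up inference networks: q(z2 | x) and q(z1 | x, z2)
        self.enc_z2 = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU(),
                                    nn.Linear(h_dim, 2 * z2_dim))
        self.enc_z1 = nn.Sequential(nn.Linear(x_dim + z2_dim, h_dim), nn.ReLU(),
                                    nn.Linear(h_dim, 2 * z1_dim))
        # Top-down generative path: p(z1 | z2) and p(x | z1)
        self.prior_z1 = nn.Sequential(nn.Linear(z2_dim, h_dim), nn.ReLU(),
                                      nn.Linear(h_dim, 2 * z1_dim))
        self.dec_x = nn.Sequential(nn.Linear(z1_dim, h_dim), nn.ReLU(),
                                   nn.Linear(h_dim, x_dim))

    def forward(self, x):
        # q(z2 | x)
        mu2, logvar2 = self.enc_z2(x).chunk(2, dim=-1)
        z2 = reparameterize(mu2, logvar2)
        # q(z1 | x, z2)
        mu1, logvar1 = self.enc_z1(torch.cat([x, z2], dim=-1)).chunk(2, dim=-1)
        z1 = reparameterize(mu1, logvar1)
        # Hierarchical prior p(z1 | z2); p(z2) is a standard normal.
        p_mu1, p_logvar1 = self.prior_z1(z2).chunk(2, dim=-1)
        x_hat = self.dec_x(z1)
        # Negative ELBO: reconstruction + KL(q(z1|x,z2) || p(z1|z2)) + KL(q(z2|x) || N(0, I))
        recon = ((x - x_hat) ** 2).sum(-1)
        kl1 = 0.5 * (p_logvar1 - logvar1 +
                     (logvar1.exp() + (mu1 - p_mu1) ** 2) / p_logvar1.exp() - 1).sum(-1)
        kl2 = 0.5 * (logvar2.exp() + mu2 ** 2 - 1 - logvar2).sum(-1)
        return (recon + kl1 + kl2).mean()
```

The higher latent z2 can capture coarser structure while z1 handles detail; the hierarchical prior is what makes the model more expressive than a single flat Gaussian prior.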
PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a Diffusion Probabilistic Model
This paper presents a neural vocoder based on a denoising diffusion probabilistic model (DDPM) incorporating explicit periodic signals as auxiliary conditioning signals. Recently, DDPM-based neural vocoders have gained prominence as non-au…
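To make the idea of an explicit periodic conditioning signal concrete, the sketch below upsamples a frame-level F0 contour and integrates it into a sample-level sine wave that a DDPM vocoder could take as an auxiliary input alongside its usual acoustic features. The hop size, sampling rate, single-sine form, and voiced/unvoiced handling are assumptions; the snippet does not specify how PeriodGrad constructs or injects its periodic signal.

```python
# Minimal sketch of building an explicit periodic conditioning signal from a
# frame-level F0 contour.  The hop size, the plain sine (rather than a
# harmonic stack), and the unvoiced handling are illustrative assumptions.
import numpy as np


def periodic_signal(f0_frames, hop_length=256, sample_rate=24000):
    """Upsample a frame-level F0 contour and integrate it into a sine wave.

    f0_frames: F0 values in Hz, one per frame; 0 marks unvoiced frames.
    Returns a sample-level sine excitation (zeroed in unvoiced regions).
    """
    n_samples = len(f0_frames) * hop_length
    frame_times = np.arange(len(f0_frames)) * hop_length
    sample_times = np.arange(n_samples)
    # Linear interpolation of F0 up to the sample rate.
    f0 = np.interp(sample_times, frame_times, f0_frames)
    voiced = f0 > 0
    # Instantaneous phase is the cumulative sum of per-sample frequency.
    phase = 2.0 * np.pi * np.cumsum(f0 / sample_rate)
    sine = np.sin(phase) * voiced
    return sine.astype(np.float32)


# Example: a 100-frame contour gliding from 220 Hz to 440 Hz,
# with a short unvoiced gap in the middle simply zeroed out.
f0 = np.linspace(220.0, 440.0, 100)
f0[40:50] = 0.0
cond = periodic_signal(f0)   # shape: (100 * 256,)
```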
Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech
The Lightweight, Multi-speaker, Multi-lingual Indic Text-to-Speech (LIMMITS'23) challenge is organized as part of the ICASSP 2023 Signal Processing Grand Challenge. LIMMITS'23 aims at the development of a lightweight, multi-speaker, multi-…
Singing voice synthesis based on frame-level sequence-to-sequence models considering vocal timing deviation
This paper proposes singing voice synthesis (SVS) based on frame-level sequence-to-sequence models considering vocal timing deviation. In SVS, it is essential to synchronize the timing of singing with temporal structures represented by sco…
Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism
This paper proposes a novel sequence-to-sequence (seq2seq) model with a musical note position-aware attention mechanism for singing voice synthesis (SVS). A seq2seq modeling approach that can simultaneously perform acoustic and temporal mo…
Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System
This paper integrates a classic mel-cepstral synthesis filter into a modern neural speech synthesis system towards end-to-end controllable speech synthesis. Since the mel-cepstral synthesis filter is explicitly embedded in neural waveform …
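The point of embedding the synthesis filter is that waveform generation stays differentiable with respect to the cepstral coefficients, so the whole pipeline can be trained end to end while the coefficients remain interpretable and editable. The sketch below shows only that differentiability for a plain (unwarped, zero-phase) cepstral filter applied to one excitation frame in the frequency domain; the mel-frequency warping and the filter implementation of the paper are not reproduced, so treat it as an illustration rather than the authors' method.

```python
# Minimal sketch of a differentiable cepstral synthesis filter: cepstral
# coefficients define a log-amplitude spectrum via the FFT, and the resulting
# envelope is applied to an excitation frame.  Mel warping and the minimum
# phase are dropped for brevity; this is an illustration, not the paper's filter.
import torch
import torch.nn.functional as F


def cepstral_filter(excitation, cepstrum, n_fft=1024):
    """Apply a cepstrum-defined spectral envelope to one excitation frame.

    excitation: (n_fft,) waveform frame (e.g. pulse train plus noise)
    cepstrum:   (order + 1,) cepstral coefficients c[0..order]
    Gradients flow back into `cepstrum`.
    """
    # H(e^{jw}) = exp( sum_m c[m] e^{-jwm} ); the real part of the FFT of the
    # zero-padded cepstrum gives log|H| at the FFT bins.
    padded = F.pad(cepstrum, (0, n_fft - cepstrum.numel()))
    log_mag = torch.fft.rfft(padded).real
    envelope = torch.exp(log_mag)
    # Zero-phase frequency-domain filtering of the excitation frame.
    spec = torch.fft.rfft(excitation) * envelope
    return torch.fft.irfft(spec, n=n_fft)


# Toy check that gradients reach the cepstral coefficients.
exc = torch.randn(1024)
cep = torch.zeros(25, requires_grad=True)
out = cepstral_filter(exc, cep)
out.pow(2).mean().backward()
print(cep.grad.shape)   # torch.Size([25])
```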
End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue
Recent text-to-speech (TTS) has achieved quality comparable to that of humans; however, its application to spoken dialogue has not been widely studied. This study aims to realize TTS that closely resembles human dialogue. First, we r…
Neural Sequence-to-Sequence Speech Synthesis Using a Hidden Semi-Markov Model Based Structured Attention Mechanism
This paper proposes a novel Sequence-to-Sequence (Seq2Seq) model integrating the structure of Hidden Semi-Markov Models (HSMMs) into its attention mechanism. In speech synthesis, it has been shown that methods based on Seq2Seq models using…
PeriodNet: A non-autoregressive waveform generation model with a structure separating periodic and aperiodic components
We propose PeriodNet, a non-autoregressive (non-AR) waveform generation model with a new model structure for modeling periodic and aperiodic components in speech waveforms. The non-AR waveform generation models can generate speech waveform…
PeriodNet: A Non-Autoregressive Raw Waveform Generative Model With a Structure Separating Periodic and Aperiodic Components
This paper presents PeriodNet, a non-autoregressive (non-AR) waveform generative model with a new model structure for modeling periodic and aperiodic components in speech waveforms. Non-AR raw waveform generative models have enabled the fa…
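A minimal way to picture the parallel variant of this structure is two conditional generators whose outputs are summed: one driven by a sine excitation for the periodic component and one driven by noise for the aperiodic component. The toy convolutional generators, feature dimensions, and sample-rate conditioning below are assumptions made for illustration; the actual PeriodNet generators and its series-structure variant are not shown.

```python
# Minimal sketch of the parallel periodic/aperiodic idea: one generator sees a
# sine excitation, one sees noise, both see the conditioning features, and the
# waveform is modeled as the sum of the two branch outputs.  The toy 1-D conv
# stacks and dimensions are assumptions, not the PeriodNet architecture.
import math
import torch
import torch.nn as nn


class ToyGenerator(nn.Module):
    """A small conditional 1-D conv stack standing in for a real generator."""

    def __init__(self, cond_dim=80, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1 + cond_dim, channels, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size=7, padding=3),
        )

    def forward(self, excitation, cond):
        # excitation: (B, 1, T), cond: (B, cond_dim, T) already at sample rate
        return self.net(torch.cat([excitation, cond], dim=1))


class ParallelPeriodicAperiodic(nn.Module):
    def __init__(self, cond_dim=80):
        super().__init__()
        self.periodic = ToyGenerator(cond_dim)
        self.aperiodic = ToyGenerator(cond_dim)

    def forward(self, sine, cond):
        noise = torch.randn_like(sine)
        # Periodic branch sees the sine excitation, aperiodic branch sees noise.
        return self.periodic(sine, cond) + self.aperiodic(noise, cond)


# Toy forward pass: batch of 2, 80-dim conditioning, 4096 samples.
model = ParallelPeriodicAperiodic()
sine = torch.sin(torch.linspace(0, 2 * math.pi * 100, 4096)).repeat(2, 1, 1)
cond = torch.randn(2, 80, 4096)
wav = model(sine, cond)   # shape: (2, 1, 4096)
```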
Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis
This paper proposes a hierarchical generative model with a multi-grained latent variable to synthesize expressive speech. In recent years, fine-grained latent variables have been introduced into text-to-speech synthesis that enable the fine…
Fast and High-Quality Singing Voice Synthesis System based on Convolutional Neural Networks
The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of synth…
Singing voice synthesis based on convolutional neural networks
The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of syn…
Constructing text-to-speech systems for languages with unknown pronunciations
This paper proposes a method for constructing text-to-speech (TTS) systems for languages with unknown pronunciations. One goal of speech synthesis research is to establish a framework that can be used to construct TTS systems for any writt…
A Bayesian Approach to Image Recognition Based on Separable Lattice Hidden Markov Models
This paper proposes a Bayesian approach to image recognition based on separable lattice hidden Markov models (SL-HMMs). The geometric variations of the object to be recognized, e.g., size, location, and rotation, are an essential problem i…