Yusuke Ijima
Voice Impression Control in Zero-Shot TTS
Para-/non-linguistic information in speech is pivotal in shaping the listeners' impression. Although zero-shot text-to-speech (TTS) has achieved high speaker fidelity, modulating subtle para-/non-linguistic information to control perceived…
One's own recorded voice is more intelligible than the voices of others in the presence of competing speech
Lightweight Zero-shot Text-to-Speech with Mixture of Adapters
The advancements in zero-shot text-to-speech (TTS) methods, based on large-scale models, have demonstrated high fidelity in reproducing speaker characteristics. However, these models are too large for practical daily use. We propose a ligh…
What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis
Self-supervised learning (SSL) has attracted increased attention for learning meaningful speech representations. Speech SSL models, such as WavLM, employ masked prediction training to encode general-purpose representations. In contrast, sp…
Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
The zero-shot text-to-speech (TTS) method, based on speaker embeddings extracted from reference speech using self-supervised learning (SSL) speech representations, can reproduce speaker characteristics very accurately. However, this approa…
Effect of Personal Traits on Impressions of One's Own Recorded Voice
StyleCap: Automatic Speaking-Style Captioning from Speech Based on Speech and Language Self-supervised Learning Models
We propose StyleCap, a method to generate natural language descriptions of speaking styles appearing in speech. Although most conventional techniques for para-/non-linguistic information recognition focus on the category classification …
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
Self-supervised learning (SSL) for speech representation has been successfully applied in various downstream tasks, such as speech and speaker recognition. More recently, speech SSL models have also been shown to be beneficial in advancing…
Expressive Text-to-Speech Synthesis using Text Chat Dataset with Speaking Style Information
This paper aims to generate expressive speech for integration with robot and AI-character dialogue systems. To generate expressive speech, some researchers have proposed using labels that express specific dialogue acts and emotions (i.e.…
Perceived emotional states mediate willingness to buy from advertising speech
Previous studies have shown that stimulus-organism-response (SOR) theory can well explain the willingness to buy from stores, products, and advertising-related stimuli. However, few studies have investigated advertising speech stimulus tha…
SIMD-size aware weight regularization for fast neural vocoding on CPU
This paper proposes weight regularization for a faster neural vocoder. Pruning time-consuming DNN modules is a promising way to realize a real-time vocoder on a CPU (e.g. WaveRNN, LPCNet). Regularization that encourages sparsity is also ef…
Non-parallel and many-to-many voice conversion using variational autoencoders integrating speech recognition and speaker verification
We propose non-parallel and many-to-many voice conversion (VC) using variational autoencoders (VAEs) that constructs VC models for converting arbitrary speakers' characteristics into those of other arbitrary speakers without parallel speec…
Saxe: Text-to-Speech Synthesis Engine Applicable to Diverse Use Cases
…from input text and a speech-synthesis section that generates synthesized speech from the…
Model architectures to extrapolate emotional expressions in DNN-based text-to-speech
Estimating Sentence Final Tone Labels using Dialogue-Act Information for Text-to-Speech Synthesis within a Spoken Dialogue System
This paper proposes a novel sentence final tone labels estimation method using dialogue-act (DA) information for text-to-speech synthesis within a spoken dialogue system. Estimating appropriate sentence final tone labels is considered essent…
DNN-based Speech Synthesis using Dialogue-Act Information and Its Evaluation with Respect to Illocutionary Act Naturalness
This paper aims at improving naturalness of synthesized speech generated by a text-to-speech (TTS) system within a spoken dialogue system with respect to “how natural the system’s intention is perceived via the synthesized speech”. We call t…
V2S attack: building DNN-based voice conversion from automatic speaker verification
This paper presents a new voice impersonation attack using voice conversion (VC). Enrolling personal voices for automatic speaker verification (ASV) offers natural and flexible biometric authentication systems. Basically, the ASV systems d…
DNN-Based Speech Synthesis Using Speaker Codes
Deep neural network (DNN)-based speech synthesis can produce more natural synthesized speech than conventional HMM-based speech synthesis. However, it has not been shown whether the synthesized speech quality can be improved by utilizing …
Similar Speaker Selection Technique Based on Distance Metric Learning Using Highly Correlated Acoustic Features with Perceptual Voice Quality Similarity
This paper analyzes the correlation between various acoustic features and perceptual voice quality similarity, and proposes a perceptually similar speaker selection technique based on distance metric learning. To analyze the relationship b…