Jonathan Shen
TRACE: A Time-Relational Approximate Cubing Engine for Fast Data Insights
A large class of data questions can be modeled as identifying important slices of data driven by user-defined metrics. This paper presents TRACE, a Time-Relational Approximate Cubing Engine that enables interactive analysis on such slices …
Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks
Transfer tasks in text-to-speech (TTS) synthesis - in which one or more aspects of the speech of one set of speakers are transferred to another set of speakers that do not originally feature these aspects - remain challenging. One of t…
Examining Scaling and Transfer of Language Model Architectures for Machine Translation
Natural language understanding and generation models follow one of the two dominant architectural paradigms: language models (LMs) that process concatenated sequences in a single stack of layers, and encoder-decoder models (EncDec) that ut…
Sports at Play in American Politics
Sports have been a vital element of American entertainment for decades and are only gaining in popularity. Various sporting events allow Americans to temporarily escape the stress associated with their social lives and the divisiveness of par…
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
This paper introduces PnG BERT, a new encoder model for neural TTS. This model is augmented from the original BERT model by taking both phoneme and grapheme representations of text as input, as well as the word-level alignment between the…
Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
This paper introduces Parallel Tacotron 2, a non-autoregressive neural text-to-speech model with a fully differentiable duration model which does not require supervised duration signals. The duration model is based on a novel attention mec…
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Although neural end-to-end text-to-speech models can synthesize highly natural speech, there is still room to improve their efficiency and naturalness. This paper proposes a non-autoregressive neural text-to-speech model augmented w…
Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling
This paper presents Non-Attentive Tacotron based on the Tacotron 2 text-to-speech model, replacing the attention mechanism with an explicit duration predictor. This improves robustness significantly as measured by unaligned duration ratio …
Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling
Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus on sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible an…
Hierarchical Generative Modeling for Controllable Speech Synthesis
This paper proposes a neural sequence-to-sequence text-to-speech (TTS) model which can control latent attributes in the generated speech that are rarely annotated in the training data, such as speaking style, accent, background noise, and …
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
We describe a neural network-based system for text-to-speech (TTS) synthesis that is able to generate speech audio in the voice of many different speakers, including those unseen during training. Our system consists of three independently …
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spect…
In Teacher We Trust: Learning Compressed Models for Pedestrian Detection
Deep convolutional neural networks continue to advance the state-of-the-art in many domains as they grow bigger and more complex. It has been observed that many of the parameters of a large network are redundant, allowing for the possibili…