Heewoo Jun
Shap-E: Generating Conditional 3D Implicit Functions
We present Shap-E, a conditional generative model for 3D assets. Unlike recent work on 3D generative models which produce a single output representation, Shap-E directly generates the parameters of implicit functions that can be rendered a…
Point-E: A System for Generating 3D Point Clouds from Complex Prompts
While recent work on text-conditional 3D object generation has shown promising results, the state-of-the-art methods typically require multiple GPU-hours to produce a single sample. This is in stark contrast to state-of-the-art generative …
Efficient Training of Language Models to Fill in the Middle
We show that autoregressive language models can learn to infill text after we apply a straightforward transformation to the dataset, which simply moves a span of text from the middle of a document to its end. While this data augmentation h…
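The data transformation described above can be sketched in a few lines: cut a random middle span out of a document and move it to the end, marking the pieces with sentinel tokens. The sentinel names here are illustrative, not the paper's exact tokens:

```python
import random

def fim_transform(doc: str,
                  sentinel_pre: str = "<PRE>",
                  sentinel_suf: str = "<SUF>",
                  sentinel_mid: str = "<MID>") -> str:
    """Move a random middle span of `doc` to the end, so an autoregressive
    model trained left-to-right learns to infill. Sentinel names are
    placeholders for whatever special tokens the tokenizer defines."""
    # Pick two distinct cut points and sort them into (i, j).
    i, j = sorted(random.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    # Training example: prefix, then suffix, then the moved middle.
    return f"{sentinel_pre}{prefix}{sentinel_suf}{suffix}{sentinel_mid}{middle}"
```

At inference time the model is prompted with `<PRE>prefix<SUF>suffix<MID>` and generates the missing middle left to right.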
Evaluating Large Language Models Trained on Code
We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we…
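HumanEval scores functional correctness with the pass@k metric; a minimal sketch of the unbiased estimator commonly associated with that evaluation, 1 - C(n-c, k)/C(n, k), where n samples are generated per problem and c of them pass the unit tests:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of the probability that at least one of k samples,
    drawn without replacement from n generated samples (c correct), passes."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k must
        # include at least one correct sample.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)
```

With n = 2 samples of which c = 1 is correct, pass@1 is 0.5, as expected for a uniform draw of one sample.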
Scaling Laws for Autoregressive Generative Modeling
We identify empirical scaling laws for the cross-entropy loss in four domains: generative image modeling, video modeling, multimodal image↔text models, and mathematical problem solving. In all cases autoregressive Transform…
Jukebox: A Generative Model for Music
We introduce Jukebox, a model that generates music with singing in the raw audio domain. We tackle the long context of raw audio using a multi-scale VQ-VAE to compress it to discrete codes, and modeling those using autoregressive Transform…
Fast Spectrogram Inversion Using Multi-Head Convolutional Neural Networks
We propose the multi-head convolutional neural network (MCNN) architecture for waveform synthesis from spectrograms. Nonlinear interpolation in MCNN is employed with transposed convolution layers in parallel heads. MCNN achieves more th…
Language Modeling at Scale
We show how Zipf's Law can be used to scale up language modeling (LM) to take advantage of more training data and more GPUs. LM plays a key role in many important natural language applications such as speech recognition and machine transla…
Cold Fusion: Training Seq2Seq Models Together with Language Models
Sequence-to-sequence (Seq2Seq) models with attention have excelled at tasks which involve generating natural language sentences such as machine translation, image captioning and speech recognition. Performance has further been improved by …
Robust Speech Recognition Using Generative Adversarial Networks
This paper describes a general, scalable, end-to-end framework that uses the generative adversarial network (GAN) objective to enable robust speech recognition. Encoders trained with the proposed approach enjoy improved invariance by learn…
Deep Learning Scaling is Predictable, Empirically
Deep learning (DL) creates impactful advances following a virtuous recipe: model architecture search, creating large training data sets, and scaling computation. It is widely believed that growing training sets and models should improve ac…
Reducing Bias in Production Speech Models
Replacing hand-engineered pipelines with end-to-end deep learning systems has enabled strong results in applications like speech and object recognition. However, the causality and latency constraints of production systems put end-to-end sp…