Tero Karras
A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation
Both text-to-image generation and large language models (LLMs) have made significant advancements. However, many text-to-image models still employ the somewhat outdated T5 and CLIP as their text encoders. In this work, we investigate the e…
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models
We introduce Edify Image, a family of diffusion models capable of generating photorealistic image content with pixel-perfect accuracy. Edify Image utilizes cascaded pixel-space diffusion models trained using a novel Laplacian diffusion pro…
Guiding a Diffusion Model with a Bad Version of Itself
The primary axes of interest in image-generating diffusion models are image quality, the amount of variation in the results, and how well the results align with a given condition, e.g., a class label or a text prompt. The popular classifie…
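The guidance scheme the abstract hints at can be sketched as an extrapolation away from a degraded version of the model. A minimal illustration, with toy callables standing in for trained denoisers (names and the guidance form shown are a sketch, not the paper's full method):

```python
import numpy as np

def autoguidance(d_main, d_weak, x, sigma, w):
    """Extrapolate the main denoiser away from a weaker version of itself:
        D(x) = D_weak(x) + w * (D_main(x) - D_weak(x))
    w = 1 recovers the main model unchanged; w > 1 strengthens guidance.
    """
    main = d_main(x, sigma)
    weak = d_weak(x, sigma)
    return weak + w * (main - weak)

# Toy denoisers for illustration only (not trained networks).
d_main = lambda x, sigma: 0.9 * x
d_weak = lambda x, sigma: 0.5 * x
x = np.ones(4)
guided = autoguidance(d_main, d_weak, x, sigma=1.0, w=1.0)  # equals d_main(x)
```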
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models
Guidance is a crucial technique for extracting the best performance out of image-generating diffusion models. Traditionally, a constant guidance weight has been applied throughout the sampling chain of an image. We show that guidance is cl…
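The idea of restricting guidance to part of the sampling chain can be sketched as classifier-free guidance gated on the noise level. The interval endpoints and toy denoisers below are placeholders, not the paper's tuned values:

```python
import numpy as np

def guided_denoise(d_cond, d_uncond, x, sigma, w, lo=0.3, hi=3.0):
    """Classifier-free guidance restricted to a noise-level interval.

    Inside [lo, hi] the usual extrapolation
        D = D_uncond + w * (D_cond - D_uncond)
    is applied; outside it, the conditional model is used unguided.
    """
    c = d_cond(x, sigma)
    if lo <= sigma <= hi:
        u = d_uncond(x, sigma)
        return u + w * (c - u)
    return c

# Toy denoisers standing in for trained networks.
d_cond = lambda x, sigma: 0.8 * x
d_uncond = lambda x, sigma: 0.2 * x
x = np.ones(2)
```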
Analyzing and Improving the Training Dynamics of Diffusion Models
Diffusion models currently dominate the field of data-driven image synthesis with their unparalleled scaling to large datasets. In this paper, we identify and rectify several causes for uneven and ineffective training in the popular ADM di…
Generative Novel View Synthesis with 3D-Aware Diffusion Models
We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image. Our model samples from the distribution of possible renderings consistent with the input and, even in the presence of ambi…
StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
Text-to-image synthesis has recently seen significant progress thanks to large pretrained language models, large-scale training data, and the introduction of scalable model families such as diffusion and autoregressive models. However, the…
Simulator-Based Self-Supervision for Learned 3D Tomography Reconstruction
We propose a deep learning method for 3D volumetric reconstruction in low-dose helical cone-beam computed tomography. Prior machine learning approaches require reference reconstructions computed by another algorithm for training. In contra…
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis. Starting from random noise, such text-to-image diffusion models gradually synthesize images in an iterative fashio…
Generating Long Videos of Dynamic Scenes
We present a video generation model that accurately reproduces object motion, changes in camera viewpoint, and new content that arises over time. Existing video generation methods often fail to produce new content as a function of time whi…
Elucidating the Design Space of Diffusion-Based Generative Models
We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. This lets u…
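The design-space framing admits a compact reference implementation. A sketch of the noise-level schedule and a deterministic second-order (Heun) sampler in the style of this paper, with a placeholder denoiser standing in for the trained network:

```python
import numpy as np

def edm_sigmas(n=18, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise-level schedule: polynomial interpolation in sigma^(1/rho)."""
    i = np.arange(n)
    sig = (sigma_max ** (1 / rho)
           + i / (n - 1) * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho
    return np.append(sig, 0.0)  # final step lands exactly at sigma = 0

def heun_sample(denoise, x, sigmas):
    """Deterministic Heun sampler; denoise(x, sigma) approximates D(x; sigma)."""
    for s, s_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoise(x, s)) / s                 # ODE derivative dx/dsigma
        x_euler = x + (s_next - s) * d
        if s_next > 0:                              # 2nd-order correction,
            d_next = (x_euler - denoise(x_euler, s_next)) / s_next
            x = x + (s_next - s) * 0.5 * (d + d_next)
        else:                                       # except at the last step
            x = x_euler
    return x
```

With a perfect denoiser for data concentrated at zero, `denoise = lambda x, s: np.zeros_like(x)`, the sampler contracts any starting noise exactly to zero.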
The Role of ImageNet Classes in Fréchet Inception Distance
Fréchet Inception Distance (FID) is the primary metric for ranking models in data-driven generative modeling. While remarkably successful, the metric is known to sometimes disagree with human judgement. We investigate a root cause of these…
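For reference, the distance the metric is built on is the Fréchet distance between two Gaussians fitted to Inception features. A sketch using the eigenvalue form of the trace term (real implementations feed in feature statistics from many images):

```python
import numpy as np

def frechet_distance(mu1, cov1, mu2, cov2):
    """Fréchet distance between Gaussians (mu1, C1) and (mu2, C2):
        ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2})
    Tr((C1 C2)^{1/2}) equals the sum of square roots of the eigenvalues of
    C1 C2, which are real and non-negative for valid covariances
    (clipped here to guard against numerical noise).
    """
    diff = mu1 - mu2
    eigvals = np.linalg.eigvals(cov1 @ cov2)
    covmean_trace = np.sqrt(np.clip(eigvals.real, 0, None)).sum()
    return diff @ diff + np.trace(cov1) + np.trace(cov2) - 2 * covmean_trace
```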
Efficient Geometry-aware 3D Generative Adversarial Networks
Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or make approximation…
Alias-Free Generative Adversarial Networks
We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. This manifests itself as, e.g., detail appearin…
Modular primitives for high-performance differentiable rendering
We present a modular differentiable renderer design that yields performance superior to previous methods by leveraging existing, highly optimized hardware graphics pipelines. Our design supports all crucial operations in a modern graphics …
Training Generative Adversarial Networks with Limited Data
Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmentation mechanism that significantly stabilizes train…
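The adaptive part can be sketched as a simple feedback controller on the augmentation probability `p`, driven by an overfitting heuristic `r` (the step size and target below are illustrative stand-ins, not the paper's exact update schedule):

```python
def update_ada_p(p, r, target=0.6, speed=0.01):
    """Adjust augmentation probability p from an overfitting heuristic r.

    r in [-1, 1] estimates discriminator overfitting (e.g. the mean sign
    of its outputs on real images). If r exceeds the target, augment
    more; otherwise, less. p is kept in [0, 1].
    """
    p += speed if r > target else -speed
    return min(max(p, 0.0), 1.0)
```

In a training loop this would be called every few minibatches, so `p` tracks how strongly the discriminator is overfitting the (limited) real data.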
Analyzing and Improving the Image Quality of StyleGAN
The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architectu…
Semi-Supervised StyleGAN for Disentanglement Learning
Disentanglement learning is crucial for obtaining disentangled representations and controllable generation. Current disentanglement methods face several inherent limitations: difficulty with high-resolution images, primarily focusing on le…
Few-Shot Unsupervised Image-to-Image Translation
Unsupervised image-to-image translation methods learn to map images in a given class to an analogous image in a different class, drawing on unstructured (non-registered) datasets of images. While remarkably successful, current methods requ…
A Style-Based Generator Architecture for Generative Adversarial Networks
We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g.,…
Improved Precision and Recall Metric for Assessing Generative Models
The ability to automatically estimate the quality and coverage of the samples produced by a generative model is a vital requirement for driving algorithm research. We present an evaluation metric that can separately and reliably measure bo…
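The two sides of the metric can be sketched with explicit k-NN hyperspheres in feature space. Brute-force distances for clarity; a practical implementation works on deep-network features with batched nearest-neighbor search:

```python
import numpy as np

def knn_radii(feats, k=3):
    """Distance from each point to its k-th nearest neighbor (excluding itself)."""
    d = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
    return np.sort(d, axis=1)[:, k]  # column 0 is the zero self-distance

def manifold_fraction(query, ref, k=3):
    """Fraction of query points inside the k-NN hypersphere of any ref point.

    precision = manifold_fraction(generated, real)   # quality
    recall    = manifold_fraction(real, generated)   # coverage
    """
    radii = knn_radii(ref, k)
    d = np.linalg.norm(query[:, None] - ref[None, :], axis=-1)
    return np.mean((d <= radii[None, :]).any(axis=1))
```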
High-Quality Self-Supervised Deep Image Denoising
We describe a novel method for training high-quality image denoising models based on unorganized collections of corrupted images. The training does not need access to clean reference images, or explicit pairs of corrupted images, and can t…
Texture Level of Detail Strategies for Real-Time Ray Tracing
Unlike rasterization, where one can rely on pixel quad partial derivatives, an alternative approach must be taken for filtered texturing during ray tracing. We describe two methods for computing texture level of detail for ray tracing. The…
Noise2Noise: Learning Image Restoration without Clean Data
We apply basic statistical reasoning to signal reconstruction by machine learning -- learning to map corrupted observations to clean signals -- with a simple and powerful conclusion: it is possible to learn to restore images by only lookin…
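The statistical core is that the L2 minimizer is a mean, and zero-mean corruption does not shift a mean — so noisy targets suffice. A toy one-parameter demonstration (signal value and noise model chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
clean = 3.0                                        # unknown clean value to recover
n = 100_000
noisy_targets = clean + rng.normal(0.0, 1.0, n)    # corrupted observations only

# Minimizing sum((theta - noisy_targets)**2) over a scalar theta gives the
# sample mean of the targets -- which converges to the clean value because
# the corruption is zero-mean, even though no clean data was ever used.
theta = noisy_targets.mean()
```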
Progressive Growing of GANs for Improved Quality, Stability, and Variation
We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details …
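During each resolution transition, the newly added layers are blended in smoothly rather than switched on at once. A sketch, assuming `coarse` is the upsampled output of the previous stage and `fine` the new layer's output:

```python
def fade_in(coarse, fine, alpha):
    """Linear fade-in used when a new resolution is added: alpha ramps
    from 0 (old stage only) to 1 (new layers fully active) over the
    transition, keeping training stable as model capacity grows."""
    return (1.0 - alpha) * coarse + alpha * fine
```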
Pruning Convolutional Neural Networks for Resource Efficient Inference
We propose a new formulation for pruning convolutional kernels in neural networks to enable efficient inference. We interleave greedy criteria-based pruning with fine-tuning by backpropagation - a computationally efficient procedure that m…
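One common variant of the first-order Taylor criterion behind such greedy pruning can be sketched as follows; the score estimates the loss change if a channel were removed, and the example data is purely illustrative:

```python
import numpy as np

def taylor_criterion(activations, gradients):
    """First-order Taylor criterion for channel pruning (one variant).

    Scores each channel by the absolute value of the mean of
    activation * gradient over batch and spatial positions; channels
    with the smallest score are pruned first.
    Shapes assumed: (batch, channels, height, width).
    """
    return np.abs((activations * gradients).mean(axis=(0, 2, 3)))

# Illustrative ranking: the zero-gradient channel has no loss impact.
acts = np.ones((2, 3, 4, 4))
grads = np.zeros((2, 3, 4, 4))
grads[:, 0] = 1.0
grads[:, 2] = 0.5
scores = taylor_criterion(acts, grads)
to_prune = scores.argmin()   # the channel with the smallest estimated impact
```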