Robin Rombach
FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space
We present evaluation results for FLUX.1 Kontext, a generative flow matching model that unifies image generation and editing. The model generates novel output views by incorporating semantic context from text and image inputs. Using a simp…
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained for 2D image synthesis have been turned into ge…
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
Diffusion models are the main driver of progress in image and video synthesis, but suffer from slow inference speed. Distillation methods, like the recently introduced adversarial diffusion distillation (ADD), aim to shift the model from ma…
SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion
We present Stable Video 3D (SV3D) -- a latent video diffusion model for high-resolution, image-to-multi-view generation of orbital videos around a 3D object. Recent work on 3D generation proposes techniques to adapt 2D generative models for…
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Diffusion models create data from noise by inverting the forward paths of data towards noise and have emerged as a powerful generative modeling technique for high-dimensional, perceptual data such as images and videos. Rectified flow is a …
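The rectified-flow idea the abstract alludes to connects data and noise along a straight line and regresses a constant velocity. This is not code from the paper — just a minimal NumPy sketch of the interpolant and its velocity target, with toy arrays standing in for images:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for a data sample and a Gaussian noise sample
# (real models would use image tensors, not 4x8 arrays).
x0 = rng.normal(loc=2.0, scale=0.5, size=(4, 8))  # data sample x_0
x1 = rng.normal(size=(4, 8))                      # noise sample x_1

def rectified_flow_pair(x0, x1, t):
    """Straight-line interpolant x_t = (1 - t) * x_0 + t * x_1 and the
    constant velocity v = x_1 - x_0 used as the regression target."""
    xt = (1.0 - t) * x0 + t * x1
    v = x1 - x0
    return xt, v

t = 0.3
xt, v = rectified_flow_pair(x0, x1, t)

# Sanity check: moving from x_0 with velocity v for time t reproduces x_t.
assert np.allclose(xt, x0 + t * v)
```

A network trained on this objective predicts v from (x_t, t); sampling then integrates the learned velocity field from noise back to data.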
aMUSEd: An Open MUSE Reproduction
We present aMUSEd, an open-source, lightweight masked image model (MIM) for text-to-image generation based on MUSE. With 10 percent of MUSE's parameters, aMUSEd is focused on fast image generation. We believe MIM is under-explored compared…
DiffusionSat: A Generative Foundation Model for Satellite Imagery
Diffusion models have achieved state-of-the-art results on many modalities including images, speech, and video. However, existing models are not tailored to support remote sensing data, which is widely used in important applications includ…
Adversarial Diffusion Distillation
We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1-4 steps while maintaining high image quality. We use score distillation to …
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention b…
NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models
Automatically generating high-quality real world 3D scenes is of enormous interest for applications such as virtual reality and robotics simulation. Towards this goal, we introduce NeuralField-LDM, a generative model capable of synthesizin…
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. Here, we apply the LDM paradigm to high-resolution vi…
Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models
Novel architectures have recently improved generative image synthesis leading to excellent visual quality in various tasks. Of particular note is the field of "AI-Art", which has seen unprecedented growth with the emergence of powerful m…
Semi-Parametric Neural Image Synthesis
Novel architectures have recently improved generative image synthesis leading to excellent visual quality in various tasks. Much of this success is due to the scalability of these architectures and hence caused by a dramatic increase in mo…
Invertible Neural Networks for Understanding Semantics of Invariances of CNN Representations
To tackle increasingly complex tasks, it has become an essential ability of neural networks to learn abstract representations. These task-specific representations and, particularly, the invariances they capture turn neural networks into bl…
High-Resolution Image Synthesis with Latent Diffusion Models
By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a gu…
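The key move in latent diffusion is to run the (expensive) denoising process in a compressed latent space rather than on pixels. The sketch below is purely illustrative — the `encode`/`decode` stand-ins are hypothetical toy maps, whereas a real LDM uses a learned, regularized autoencoder — but the forward noising formula `z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps` is the standard DDPM forward process applied in latent space:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for a trained autoencoder; a real LDM learns these.
def encode(image):
    return image[:, ::4, ::4]  # crude 4x spatial downsampling "latent"

def decode(latent):
    return np.repeat(np.repeat(latent, 4, axis=1), 4, axis=2)

def forward_diffuse(z0, alpha_bar_t, rng):
    """DDPM forward process applied to the latent z_0:
    z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps, eps ~ N(0, I)."""
    eps = rng.normal(size=z0.shape)
    zt = np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return zt, eps

image = rng.normal(size=(3, 64, 64))  # toy "image"
z0 = encode(image)                    # diffusion operates on this 3x16x16 latent
zt, eps = forward_diffuse(z0, alpha_bar_t=0.5, rng=rng)
assert zt.shape == z0.shape == (3, 16, 16)
assert decode(z0).shape == image.shape
```

Because the denoiser only ever sees the 16x16 latent instead of the 64x64 image, compute per diffusion step drops dramatically while the autoencoder handles perceptual detail.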
Geometry-Free View Synthesis: Transformers and no 3D Priors
Is a geometric model required to synthesize novel views from a single image? Being bound to local convolutions, CNNs need explicit 3D biases to model geometric transformations. In contrast, we demonstrate that a transformer-based model can…
ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis
Autoregressive models and their sequential factorization of the data likelihood have recently demonstrated great potential for image representation and synthesis. Nevertheless, they incorporate image context in a linear 1D order by attendi…
Taming Transformers for High-Resolution Image Synthesis
Designed to learn long-range interactions on sequential data, transformers continue to show state-of-the-art results on a wide variety of tasks. In contrast to CNNs, they contain no inductive bias that prioritizes local interactions. This …
Stochastic Image-to-Video Synthesis using cINNs
Video understanding calls for a model to learn the characteristic interplay between static scene content and its dynamics: Given an image, the model must be able to predict a future progression of the portrayed scene and, conversely, a vid…
High-Resolution Complex Scene Synthesis with Transformers
The use of coarse-grained layouts for controllable synthesis of complex scene images via deep generative models has recently gained popularity. However, results of current approaches still fall short of their promise of high-resolution syn…
A Note on Data Biases in Generative Models
It is tempting to think that machines are less prone to unfairness and prejudice. However, machine learning approaches compute their outputs based on data. While biases can enter at any stage of the development pipeline, models are particu…
Making Sense of CNNs: Interpreting Deep Representations & Their Invariances with INNs
To tackle increasingly complex tasks, it has become an essential ability of neural networks to learn abstract representations. These task-specific representations and, particularly, the invariances they capture turn neural networks into bl…
A Disentangling Invertible Interpretation Network for Explaining Latent Representations
Neural networks have greatly boosted performance in computer vision by learning powerful representations of input data. The drawback of end-to-end training for maximal overall performance is black-box models whose hidden representations a…
Network-to-Network Translation with Conditional Invertible Neural Networks
Given the ever-increasing computational costs of modern machine learning models, we need to find new ways to reuse such expert models and thus tap into the resources that have been invested in their creation. Recent work suggests that the …
Network Fusion for Content Creation with Conditional INNs
Artificial Intelligence for Content Creation has the potential to reduce the amount of manual content creation work significantly. While automation of laborious work is welcome, it is only useful if it allows users to control aspects of th…