Lvmin Zhang
Mixture of Contexts for Long Video Generation
Long video generation is fundamentally a long context memory problem: models must retain and retrieve salient events across a long range without collapsing or drifting. However, scaling diffusion transformers to generate long-context video…
Captain Cinema: Towards Short Movie Generation
We present Captain Cinema, a generation framework for short movie generation. Given a detailed textual description of a movie storyline, our approach first generates a sequence of keyframes that outline the entire narrative, which ensure…
Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation
Recent advances in diffusion models have enabled high-quality video generation, but the additional temporal dimension significantly increases computational costs, making training and inference on long videos prohibitively expensive. In thi…
Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models
We present a neural network structure, FramePack, to train next-frame (or next-frame-section) prediction models for video generation. FramePack compresses input frame contexts with frame-wise importance so that more frames can be encoded w…
Instance Segmentation of Scene Sketches Using Natural Image Priors
Sketch segmentation involves grouping pixels within a sketch that belong to the same object or instance. It serves as a valuable tool for sketch editing tasks, such as moving, scaling, or removing specific components. While image segmentat…
Transparent Image Layer Diffusion using Latent Transparency
We present LayerDiffuse, an approach enabling large-scale pretrained latent diffusion models to generate transparent images. The method allows generation of single transparent images or of multiple transparent layers. The method learns a "…
Adding Conditional Control to Text-to-Image Diffusion Models
We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust e…
Style Transfer for Anime Sketches with Enhanced Residual U-net and Auxiliary Classifier GAN
Recently, with the revolutionary neural style transferring methods, creditable paintings can be synthesized automatically from content images and style images. However, when it comes to the task of applying a painting's style to an anime s…