Lvmin Zhang
Mixture of Contexts for Long Video Generation
Long video generation is fundamentally a long context memory problem: models must retain and retrieve salient events across a long range without collapsing or drifting. However, scaling diffusion transformers to generate long-context video…
Captain Cinema: Towards Short Movie Generation
We present Captain Cinema, a generation framework for short movie generation. Given a detailed textual description of a movie storyline, our approach first generates a sequence of keyframes that outline the entire narrative, which ensure…
Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation
Recent advances in diffusion models have enabled high-quality video generation, but the additional temporal dimension significantly increases computational costs, making training and inference on long videos prohibitively expensive. In thi…
Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models
We present a neural network structure, FramePack, to train next-frame (or next-frame-section) prediction models for video generation. FramePack compresses input frame contexts with frame-wise importance so that more frames can be encoded w…
Instance Segmentation of Scene Sketches Using Natural Image Priors
Sketch segmentation involves grouping pixels within a sketch that belong to the same object or instance. It serves as a valuable tool for sketch editing tasks, such as moving, scaling, or removing specific components. While image segmentat…
Transparent Image Layer Diffusion using Latent Transparency
We present LayerDiffuse, an approach enabling large-scale pretrained latent diffusion models to generate transparent images. The method allows generation of single transparent images or of multiple transparent layers. The method learns a "…
Adding Conditional Control to Text-to-Image Diffusion Models
We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust e…
Style Transfer for Anime Sketches with Enhanced Residual U-net and Auxiliary Classifier GAN
Recently, with the revolutionary neural style transferring methods, creditable paintings can be synthesized automatically from content images and style images. However, when it comes to the task of applying a painting's style to an anime s…