Eli Shechtman
RELIC: Interactive Video World Model with Long-Horizon Memory
A truly interactive world model requires three key ingredients: real-time long-horizon streaming, consistent spatial memory, and precise user control. However, most existing approaches address only one of these aspects in isolation, as ach…
MoGAN: Improving Motion Quality in Video Diffusion via Few-Step Motion Adversarial Post-Training
Video diffusion models achieve strong frame-level fidelity but still struggle with motion coherence and realism, often producing jitter, ghosting, or implausible dynamics. A key limitation is that the standard denoising MSE objec…
Fine-grained Defocus Blur Control for Generative Image Models
Current text-to-image diffusion models excel at generating diverse, high-quality images, yet they struggle to incorporate fine-grained camera metadata such as precise aperture settings. In this work, we introduce a novel text-to-image diff…
Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos
We propose a generative model that, given a coarsely edited image, synthesizes a photorealistic output that follows the prescribed layout. Our method transfers fine details from the original image and preserves the identity of its parts. Ye…
Identifying Prompted Artist Names from Generated Images
A common and controversial use of text-to-image models is to generate pictures by explicitly naming artists, such as "in the style of Greg Rutkowski". We introduce a benchmark for prompted-artist recognition: predicting which artist names …
Long-Context State-Space Video World Models
Video diffusion models have recently shown promise for world modeling through autoregressive frame prediction conditioned on actions. However, they struggle to maintain long-term memory due to the high computational cost associated with pr…
SliderSpace: Decomposing the Visual Capabilities of Diffusion Models
We present SliderSpace, a framework for automatically decomposing the visual capabilities of diffusion models into controllable and human-understandable directions. Unlike existing control methods that require a user to specify attributes …
Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers
Diffusion Transformers (DiTs) have achieved state-of-the-art (SOTA) image generation quality but suffer from high latency and memory inefficiency, making them difficult to deploy on resource-constrained devices. One major efficiency bottle…
From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
Current video diffusion models achieve impressive generation quality but struggle in interactive applications due to bidirectional attention dependencies. The generation of a single frame requires the model to process the entire sequence, …
Removing Distributional Discrepancies in Captions Improves Image-Text Alignment
In this paper, we introduce a model designed to improve the prediction of image-text alignment, targeting the challenge of compositional understanding in current visual-language models. Our approach focuses on generating high-quality train…
TurboEdit: Instant text-based image editing
We address the challenges of precise image inversion and disentangled image editing in the context of few-step diffusion models. We introduce an encoder based iterative inversion technique. The inversion network is conditioned on the input…
Image Neural Field Diffusion Models
Diffusion models have shown an impressive ability to model complex data distributions, with several key advantages over GANs, such as stable training, better coverage of the training distribution's modes, and the ability to solve inverse p…
Improved Distribution Matching Distillation for Fast Image Synthesis
Recent approaches have shown promise in distilling diffusion models into efficient one-step generators. Among them, Distribution Matching Distillation (DMD) produces one-step generators that match their teacher in distribution, without enfor…
Distilling Diffusion Models into Conditional GANs
We propose a method to distill a complex multistep diffusion model into a single-step conditional GAN student model, dramatically accelerating inference, while preserving image quality. Our approach interprets diffusion distillation as a p…
Editable Image Elements for Controllable Synthesis
Diffusion models have made significant advances in text-guided synthesis tasks. However, editing user-provided images remains challenging, as the high dimensional noise input space of diffusion models is not naturally suited for image inve…
Lazy Diffusion Transformer for Interactive Image Editing
We introduce a novel diffusion transformer, LazyDiffusion, that generates partial image updates efficiently. Our approach targets interactive image editing applications in which, starting from a blank canvas or an image, a user specifies a…
Customizing Text-to-Image Diffusion with Object Viewpoint Control
Model customization introduces new concepts to existing text-to-image models, enabling the generation of these new concepts/objects in novel contexts. However, such methods lack accurate camera view control with respect to the new object, …
VideoGigaGAN: Towards Detail-rich Video Super-Resolution
Video super-resolution (VSR) approaches have shown impressive temporal consistency in upsampled videos. However, these approaches tend to generate blurrier results than their image counterparts as they are limited in their generative capab…
Jump Cut Smoothing for Talking Heads
A jump cut offers an abrupt, sometimes unwanted change in the viewing experience. We present a novel framework for smoothing these jump cuts, in the context of talking head videos. We leverage the appearance of the subject from the other s…
NewMove: Customizing text-to-video models with novel motions
We introduce an approach for augmenting text-to-video generation models with customized motions, extending their capabilities beyond the motions depicted in the original training data. By leveraging a few video samples demonstrating specif…
One-step Diffusion with Distribution Matching Distillation
Diffusion models generate high-quality images but require dozens of forward passes. We introduce Distribution Matching Distillation (DMD), a procedure to transform a diffusion model into a one-step image generator with minimal impact on im…
Perceptual Artifacts Localization for Image Synthesis Tasks
Recent advancements in deep generative models have facilitated the creation of photo-realistic images across various tasks. However, these generated images often exhibit perceptual artifacts in specific regions, necessitating manual correc…
Diffusion Image Analogies
In this paper we present Diffusion Image Analogies—an example-based image editing approach that builds upon the concept of image analogies originally introduced by Hertzmann et al. [2001]. Given a pair of images that specify the intent of …
DIFF-NST: Diffusion Interleaving For deFormable Neural Style Transfer
Neural Style Transfer (NST) is the field of study applying neural techniques to modify the artistic appearance of a content image to match the style of a reference style image. Traditionally, NST methods have focused on texture-based image…
Realistic Saliency Guided Image Enhancement
Common editing operations performed by professional photographers include the cleanup operations: de-emphasizing distracting elements and enhancing subjects. These edits are challenging, requiring a delicate balance between manipulating th…
SimpSON: Simplifying Photo Cleanup with Single-Click Distracting Object Segmentation Network
In photo editing, it is common practice to remove visual distractions to improve the overall image quality and highlight the primary subject. However, manually selecting and removing these small and dense distracting regions can be a labor…
NeAT: Neural Artistic Tracing for Beautiful Style Transfer
Style transfer is the task of reproducing the semantic contents of a source image in the artistic style of a second target image. In this paper, we present NeAT, a new state-of-the art feed-forward style transfer method. We re-formulate fe…
Automatic High Resolution Wire Segmentation and Removal
Wires and powerlines are common visual distractions that often undermine the aesthetics of photographs. The manual process of precisely segmenting and removing them is extremely tedious and may take up hours, especially on high-resolution …