Kfir Aberman
ComposeMe: Attribute-Specific Image Prompts for Controllable Human Image Generation
Generating high-fidelity images of humans with fine-grained control over attributes such as hairstyle and clothing remains a core challenge in personalized text-to-image synthesis. While prior methods emphasize identity preservation from a…
Scaling Group Inference for Diverse and High-Quality Generation
Generative models typically sample outputs independently, and recent inference-time guidance and scaling algorithms focus on improving the quality of individual samples. However, in real-world applications, users are often presented with a…
Zero-Shot Dynamic Concept Personalization with Grid-Based LoRA
Recent advances in text-to-video generation have enabled high-quality synthesis from text and image prompts. While the personalization of dynamic concepts, which capture subject-specific appearance and motion from a single video, is now fe…
3D PixBrush: Image-Guided Local Texture Synthesis
We present 3D PixBrush, a method for performing image-driven edits of local regions on 3D meshes. 3D PixBrush predicts a localization mask and a synthesized texture that faithfully portray the object in the reference image. Our predicted l…
Be Decisive: Noise-Induced Layouts for Multi-Subject Generation
Generating multiple distinct subjects remains a challenge for existing text-to-image diffusion models. Complex prompts often lead to subject leakage, causing inaccuracies in quantities, attributes, and visual features. Preventing leakage a…
Dynamic Concepts Personalization from Single Videos
Personalizing generative text-to-image models has seen remarkable progress, but extending this personalization to text-to-video models presents unique challenges. Unlike static concepts, personalizing text-to-video models has the potential…
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
This paper presents ThinkDiff, a novel alignment paradigm that empowers text-to-image diffusion models with multimodal in-context understanding and reasoning capabilities by integrating the strengths of vision-language models (VLMs). Exist…
Multi-subject Open-set Personalization in Video Generation
Video personalization methods allow us to synthesize videos with specific concepts such as people, pets, and places. However, existing methods often focus on limited domains, require time-consuming optimization per subject, or support only…
Object-level Visual Prompts for Compositional Image Generation
We introduce a method for composing object-level visual prompts within a text-to-image diffusion model. Our approach addresses the task of generating semantically coherent compositions across diverse scenes and styles, similar to the versa…
Nested Attention: Semantic-aware Attention Values for Concept Personalization
Personalizing text-to-image models to generate images of specific subjects across diverse scenes and styles is a rapidly advancing field. Current approaches often face challenges in maintaining a balance between identity preservation and a…
Omni-ID: Holistic Identity Representation Designed for Generative Tasks
We introduce Omni-ID, a novel facial representation designed specifically for generative tasks. Omni-ID encodes holistic information about an individual's appearance across diverse expressions and poses within a fixed-size representation. …
InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention
Face image restoration aims to enhance degraded facial images while addressing challenges such as diverse degradation types, real-time processing demands, and, most crucially, the preservation of identity-specific features. Existing method…
Stable Flow: Vital Layers for Training-Free Image Editing
Diffusion models have revolutionized the field of content synthesis and editing. Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT), and employed flow-matching for improved training and sampl…
RealFill: Reference-Driven Generation for Authentic Image Completion
Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions. However, the content these models hallucinate is necessarily inauthentic,…
Efficient Training with Denoised Neural Weights
Good weight initialization serves as an effective measure to reduce the training cost of a deep neural network (DNN) model. The choice of how to initialize parameters is challenging and may require manual tuning, which can be time-consumin…
TEDi: Temporally-Entangled Diffusion for Long-Term Motion Synthesis
The gradual nature of a diffusion process that synthesizes samples in small increments constitutes a key ingredient of Denoising Diffusion Probabilistic Models (DDPM), which have presented unprecedented quality in image synthesis and been …
Interpreting the Weight Space of Customized Diffusion Models
We investigate the space of weights spanned by a large collection of customized diffusion models. We populate this space by creating a dataset of over 60,000 models, each of which is a base model fine-tuned to insert a different person's v…
MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation
We introduce a new architecture for personalization of text-to-image diffusion models, coined Mixture-of-Attention (MoA). Inspired by the Mixture-of-Experts mechanism utilized in large language models (LLMs), MoA distributes the generation…
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation
Text-to-image diffusion models have an unprecedented ability to generate diverse and high-quality images. However, they often struggle to faithfully capture the intended semantics of complex input prompts that include multiple subjects. Re…
MyVLM: Personalizing VLMs for User-Specific Queries
Recent large-scale vision-language models (VLMs) have demonstrated remarkable capabilities in understanding and generating textual descriptions for visual content. However, these models lack an understanding of user-specific concepts. In t…
AToM: Amortized Text-to-Mesh using 2D Diffusion
We introduce Amortized Text-to-Mesh (AToM), a feed-forward text-to-mesh framework optimized across multiple text prompts simultaneously. In contrast to existing text-to-3D methods that often entail time-consuming per-prompt optimization an…
E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation
One highly promising direction for enabling flexible real-time on-device image editing is utilizing data distillation by leveraging large-scale text-to-image diffusion models to generate paired datasets used for training generative adversa…
Personalized Restoration via Dual-Pivot Tuning
Generative diffusion models can serve as a prior which ensures that solutions of image restoration systems adhere to the manifold of natural images. However, for restoring facial images, a personalized prior is necessary to accurately repr…
Break-A-Scene: Extracting Multiple Concepts from a Single Image
Text-to-image model personalization aims to introduce a user-provided concept to the model, allowing its synthesis in diverse contexts. However, current methods primarily focus on the case of learning a single concept from multiple images …
Orthogonal Adaptation for Modular Customization of Diffusion Models
Customization techniques for text-to-image models have paved the way for a wide range of previously unattainable applications, enabling the generation of specific concepts across diverse contexts and styles. While existing methods facilita…
3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation
In this work we develop 3D Paintbrush, a technique for automatically texturing local semantic regions on meshes via text descriptions. Our method is designed to operate directly on meshes, producing texture maps which seamlessly integrate …
State of the Art on Diffusion Models for Visual Computing
The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. …