Kfir Aberman
ComposeMe: Attribute-Specific Image Prompts for Controllable Human Image Generation
Generating high-fidelity images of humans with fine-grained control over attributes such as hairstyle and clothing remains a core challenge in personalized text-to-image synthesis. While prior methods emphasize identity preservation from a…
Scaling Group Inference for Diverse and High-Quality Generation
Generative models typically sample outputs independently, and recent inference-time guidance and scaling algorithms focus on improving the quality of individual samples. However, in real-world applications, users are often presented with a…
Zero-Shot Dynamic Concept Personalization with Grid-Based LoRA
Recent advances in text-to-video generation have enabled high-quality synthesis from text and image prompts. While the personalization of dynamic concepts, which capture subject-specific appearance and motion from a single video, is now fe…
3D PixBrush: Image-Guided Local Texture Synthesis
We present 3D PixBrush, a method for performing image-driven edits of local regions on 3D meshes. 3D PixBrush predicts a localization mask and a synthesized texture that faithfully portray the object in the reference image. Our predicted l…
Be Decisive: Noise-Induced Layouts for Multi-Subject Generation
Generating multiple distinct subjects remains a challenge for existing text-to-image diffusion models. Complex prompts often lead to subject leakage, causing inaccuracies in quantities, attributes, and visual features. Preventing leakage a…
Dynamic Concepts Personalization from Single Videos
Personalizing generative text-to-image models has seen remarkable progress, but extending this personalization to text-to-video models presents unique challenges. Unlike static concepts, personalizing text-to-video models has the potential…
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
This paper presents ThinkDiff, a novel alignment paradigm that empowers text-to-image diffusion models with multimodal in-context understanding and reasoning capabilities by integrating the strengths of vision-language models (VLMs). Exist…
Multi-subject Open-set Personalization in Video Generation
Video personalization methods allow us to synthesize videos with specific concepts such as people, pets, and places. However, existing methods often focus on limited domains, require time-consuming optimization per subject, or support only…
Object-level Visual Prompts for Compositional Image Generation
We introduce a method for composing object-level visual prompts within a text-to-image diffusion model. Our approach addresses the task of generating semantically coherent compositions across diverse scenes and styles, similar to the versa…
Nested Attention: Semantic-aware Attention Values for Concept Personalization
Personalizing text-to-image models to generate images of specific subjects across diverse scenes and styles is a rapidly advancing field. Current approaches often face challenges in maintaining a balance between identity preservation and a…
Omni-ID: Holistic Identity Representation Designed for Generative Tasks
We introduce Omni-ID, a novel facial representation designed specifically for generative tasks. Omni-ID encodes holistic information about an individual's appearance across diverse expressions and poses within a fixed-size representation. …
InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention
Face image restoration aims to enhance degraded facial images while addressing challenges such as diverse degradation types, real-time processing demands, and, most crucially, the preservation of identity-specific features. Existing method…
Stable Flow: Vital Layers for Training-Free Image Editing
Diffusion models have revolutionized the field of content synthesis and editing. Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT), and employed flow-matching for improved training and sampl…
RealFill: Reference-Driven Generation for Authentic Image Completion
Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions. However, the content these models hallucinate is necessarily inauthentic,…
Efficient Training with Denoised Neural Weights
Good weight initialization serves as an effective measure to reduce the training cost of a deep neural network (DNN) model. The choice of how to initialize parameters is challenging and may require manual tuning, which can be time-consumin…
TEDi: Temporally-Entangled Diffusion for Long-Term Motion Synthesis
The gradual nature of a diffusion process that synthesizes samples in small increments constitutes a key ingredient of Denoising Diffusion Probabilistic Models (DDPM), which have presented unprecedented quality in image synthesis and been …
Interpreting the Weight Space of Customized Diffusion Models
We investigate the space of weights spanned by a large collection of customized diffusion models. We populate this space by creating a dataset of over 60,000 models, each of which is a base model fine-tuned to insert a different person's v…
MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation
We introduce a new architecture for personalization of text-to-image diffusion models, coined Mixture-of-Attention (MoA). Inspired by the Mixture-of-Experts mechanism utilized in large language models (LLMs), MoA distributes the generation…
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation
Text-to-image diffusion models have an unprecedented ability to generate diverse and high-quality images. However, they often struggle to faithfully capture the intended semantics of complex input prompts that include multiple subjects. Re…
MyVLM: Personalizing VLMs for User-Specific Queries
Recent large-scale vision-language models (VLMs) have demonstrated remarkable capabilities in understanding and generating textual descriptions for visual content. However, these models lack an understanding of user-specific concepts. In t…
AToM: Amortized Text-to-Mesh using 2D Diffusion
We introduce Amortized Text-to-Mesh (AToM), a feed-forward text-to-mesh framework optimized across multiple text prompts simultaneously. In contrast to existing text-to-3D methods that often entail time-consuming per-prompt optimization an…
E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation
One highly promising direction for enabling flexible real-time on-device image editing is utilizing data distillation by leveraging large-scale text-to-image diffusion models to generate paired datasets used for training generative adversa…
Personalized Restoration via Dual-Pivot Tuning
Generative diffusion models can serve as a prior which ensures that solutions of image restoration systems adhere to the manifold of natural images. However, for restoring facial images, a personalized prior is necessary to accurately repr…
Break-A-Scene: Extracting Multiple Concepts from a Single Image
Text-to-image model personalization aims to introduce a user-provided concept to the model, allowing its synthesis in diverse contexts. However, current methods primarily focus on the case of learning a single concept from multiple images …
Orthogonal Adaptation for Modular Customization of Diffusion Models
Customization techniques for text-to-image models have paved the way for a wide range of previously unattainable applications, enabling the generation of specific concepts across diverse contexts and styles. While existing methods facilita…
3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation
In this work we develop 3D Paintbrush, a technique for automatically texturing local semantic regions on meshes via text descriptions. Our method is designed to operate directly on meshes, producing texture maps which seamlessly integrate …
State of the Art on Diffusion Models for Visual Computing
The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. …