Amit H. Bermano
PractiLight: Practical Light Control Using Foundational Diffusion Models
Light control in generated images is a difficult task, posing specific challenges, spanning over the entire image and frequency spectrum. Most approaches tackle this problem by training on extensive yet domain-specific datasets, limiting t…
Express4D: Expressive, Friendly, and Extensible 4D Facial Motion Generation Benchmark
Dynamic facial expression generation from natural language is a crucial task in Computer Graphics, with applications in Animation, Virtual Avatars, and Human-Computer Interaction. However, current generative models suffer from datasets tha…
Attention (as Discrete-Time Markov) Chains
We introduce a new interpretation of the attention matrix as a discrete-time Markov chain. Our interpretation sheds light on common operations involving attention scores such as selection, summation, and averaging in a unified framework. I…
HOIDiNi: Human-Object Interaction through Diffusion Noise Optimization
We present HOIDiNi, a text-driven diffusion framework for synthesizing realistic and plausible human-object interaction (HOI). HOI generation is extremely challenging since it induces strict contact accuracies alongside a diverse motion ma…
AnyTop: Character Animation Diffusion with Any Topology
Generating motion for arbitrary skeletons is a longstanding challenge in computer graphics, remaining largely unexplored due to the scarcity of diverse datasets and the irregular nature of the data. In this work, we introduce AnyTop, a dif…
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation
Diffusion models enable high-quality and diverse visual content synthesis. However, they struggle to generate rare or unseen concepts. To address this challenge, we explore the usage of Retrieval-Augmented Generation (RAG) with image gener…
Data Efficient Molecular Image Representation Learning using Foundation Models
Deep learning (DL) in chemistry has made significant progress, yet its applicability is limited by the scarcity of large, labeled datasets and the difficulty of extracting meaningful molecular features. Recently, molecular representation l…
Instant3dit: Multiview Inpainting for Fast Editing of 3D Objects
We propose a generative technique to edit 3D shapes, represented as meshes, NeRFs, or Gaussian Splats, in approximately 3 seconds, without the need for running an SDS type of optimization. Our key insight is to cast 3D editing as a multivi…
CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control
Motion diffusion models and Reinforcement Learning (RL) based control for physics-based simulations have complementary strengths for human motion generation. The former is capable of generating a wide variety of motions, adhering to intuit…
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation
The practical use of text-to-image generation has evolved from simple, monolithic models to complex workflows that combine multiple specialized components. While workflow-based approaches can lead to improved image quality, crafting effect…
Casper DPM: Cascaded Perceptual Dynamic Projection Mapping onto Hands
We present a technique for dynamically projecting 3D content onto human hands with short perceived motion-to-photon latency. Computing the pose and shape of human hands accurately and quickly is a challenging task due to their articulated …
Not Every Image is Worth a Thousand Words: Quantifying Originality in Stable Diffusion
This work addresses the challenge of quantifying originality in text-to-image (T2I) generative diffusion models, with a focus on copyright originality. We begin by evaluating T2I models' ability to innovate and generalize through controlle…
Masked Extended Attention for Zero-Shot Virtual Try-On In The Wild
Virtual Try-On (VTON) is a highly active line of research with increasing demand. It aims to replace a piece of garment in an image with a garment from another image, while preserving person and garment characteristics as well as image fidelity. Curr…
V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data
Diffusion-based generative models have recently shown remarkable image and video editing capabilities. However, local video editing, particularly removal of small attributes like glasses, remains a challenge. Existing methods either alter …
Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer
Given the remarkable results of motion synthesis with diffusion models, a natural question arises: how can we effectively leverage these models for motion editing? Existing diffusion-based motion editing methods overlook the profound poten…
LCM-Lookahead for Encoder-based Text-to-Image Personalization
Recent advancements in diffusion models have introduced fast sampling methods that can effectively produce high-quality images in just one or a few denoising steps. Interestingly, when these are distilled from existing diffusion models, th…
Not All Similarities Are Created Equal: Leveraging Data-Driven Biases to Inform GenAI Copyright Disputes
The advent of Generative Artificial Intelligence (GenAI) models, including GitHub Copilot, OpenAI GPT, and Stable Diffusion, has revolutionized content creation, enabling non-professionals to produce high-quality content across various dom…
MagicClay: Sculpting Meshes With Generative Neural Fields
Recent developments in neural fields have brought phenomenal capabilities to the field of shape generation, but they lack crucial properties, such as incremental control - a fundamental requirement for artistic work. Triangular meshes,…
Breathing Life Into Sketches Using Text-to-Video Priors
A sketch is one of the most intuitive and versatile tools humans use to convey their ideas visually. An animated sketch opens another dimension to the expression of ideas and is widely used by designers for a variety of purposes. Animating…
MAS: Multi-view Ancestral Sampling for 3D motion generation using 2D diffusion
We introduce Multi-view Ancestral Sampling (MAS), a method for 3D motion generation that uses 2D diffusion models trained on motions obtained from in-the-wild videos. As such, MAS opens opportunities to exciting and diverse fields o…
State of the Art on Diffusion Models for Visual Computing
The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. …
OMG-ATTACK: Self-Supervised On-Manifold Generation of Transferable Evasion Attacks
Evasion Attacks (EA) test the robustness of trained neural networks by distorting input data to misguide the model into incorrect classifications. Creating these attacks is a challenging task, especially with the ever-increasin…