Ron Mokady
Image Generation from Contextually-Contradictory Prompts Open
Text-to-image diffusion models excel at generating high-quality, diverse images from natural language prompts. However, they often fail to produce semantically accurate results when the prompt contains concept combinations that contradict …
Null-text Inversion for Editing Real Images using Guided Diffusion Models Open
Recent text-guided diffusion models provide powerful image generation capabilities. Currently, a massive effort is given to enable the modification of these images using text only as means to offer intuitive and versatile editing. To edit …
Text-Only Training for Image Captioning using Noise-Injected CLIP Open
We consider the task of image-captioning using only the CLIP model and additional text data at training time, and no additional captioned images. Our approach relies on the fact that CLIP is trained to make visual and textual embeddings si…
Prompt-to-Prompt Image Editing with Cross Attention Control Open
Recent large-scale text-driven synthesis models have attracted much attention thanks to their remarkable capabilities of generating highly diverse images that follow given text prompts. Such text-based synthesis methods are particularly ap…
Self-Distilled StyleGAN: Towards Generation from Internet Photos Open
StyleGAN is known to produce high-fidelity images, while also offering unprecedented semantic editing. However, these fascinating abilities have been demonstrated only on a limited set of datasets, which are usually structurally aligned an…
JOKR: Joint Keypoint Representation for Unsupervised Video Retargeting Open
In unsupervised video retargeting, content is transferred from one video to another while preserving the original appearance and style, without any additional annotations. While this challenge has seen substantial advancements through the …
State-of-the-Art in the Architecture, Methods and Applications of StyleGAN Open
Generative Adversarial Networks (GANs) have established themselves as a prevalent approach to image synthesis. Of these, StyleGAN offers a fascinating case study, owing to its remarkable visual quality and an ability to support a large arr…
Stitch it in Time: GAN-Based Facial Editing of Real Videos Open
The ability of Generative Adversarial Networks to encode rich semantics within their latent space has been widely adopted for facial image editing. However, replicating their success with videos has proven challenging. Sets of high-quality…
HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing Open
The inversion of real images into StyleGAN's latent space is a well-studied problem. Nevertheless, applying existing approaches to real-world scenarios remains an open challenge, due to an inherent trade-off between reconstruction and edit…
ClipCap: CLIP Prefix for Image Captioning Open
Image captioning is a fundamental task in vision-language understanding, where the model predicts a textual informative caption to a given input image. In this paper, we present a simple approach to address this task. We use CLIP encoding …
JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting Open
The task of unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks. While early works concentrated on specific object priors such as a human face or body, recent work considered …
Pivotal Tuning for Latent-based Editing of Real Images Open
Recently, a surge of advanced facial editing techniques have been proposed that leverage the generative power of a pre-trained StyleGAN. To successfully edit an image this way, one must first project (or invert) the image into the pre-trai…
Structural Analogy from a Single Image Pair Open
The task of unsupervised image-to-image translation has seen substantial advancements in recent years through the use of deep neural networks. Typically, the proposed solutions learn the characterizing distribution of two large, unpaired c…
Masked Based Unsupervised Content Transfer Open
We consider the problem of translating, in an unsupervised manner, between two domains where one contains some additional information compared to the other. The proposed method disentangles the common and separate parts of these domains an…