K J Joseph
Teleportraits: Training-Free People Insertion into Any Scene
The task of realistically inserting a human from a reference image into a background scene is highly challenging, requiring the model to (1) determine the correct location and poses of the person and (2) perform high-quality personalizatio…
Do It Yourself (DIY): Modifying Images for Poems in a Zero-Shot Setting Using Weighted Prompt Manipulation
Poetry is an expressive form of art that invites multiple interpretations, as readers often bring their own emotions, experiences, and cultural backgrounds into their understanding of a poem. Recognizing this, we aim to generate images for…
Poetry in Pixels: Prompt Tuning for Poem Image Generation via Diffusion Models
The task of text-to-image generation has encountered significant challenges when applied to literary works, especially poetry. Poems are a distinct form of literature, with meanings that frequently transcend the literal words. To ad…
Design-o-meter: Towards Evaluating and Refining Graphic Designs
Graphic designs are an effective medium for visual communication. They range from greeting cards to corporate flyers and beyond. Of late, machine learning techniques are able to generate such designs, which accelerates the rate of content…
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Music is a universal language that can communicate emotions and feelings. It forms an essential part of the whole spectrum of creative media, ranging from movies to social media posts. Machine learning models that can synthesize music are …
CoPL: Contextual Prompt Learning for Vision-Language Understanding
Recent advances in multimodal learning have resulted in powerful vision-language models, whose representations are generalizable across a variety of downstream tasks. Recently, their generalization ability has been further extended by incor…
Iterative Multi-granular Image Editing using Diffusion Models
Recent advances in text-guided image synthesis have dramatically changed how creative professionals generate artistic and aesthetically pleasing visual assets. To fully support such creative endeavors, the process should possess the ability…
A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis
While recent developments in text-to-image generative models have led to a suite of high-performing methods capable of producing creative imagery from free-form text, there are several limitations. By analyzing the cross-attention represen…
Incremental Object Detection via Meta-Learning
In a real-world setting, object instances from new classes can be continuously encountered by object detectors. When existing object detectors are applied to such scenarios, their performance on old classes deteriorates significantly. A fe…
Towards Open World Object Detection
Humans have a natural instinct to identify unknown object instances in their environments. The intrinsic curiosity about these unknown instances aids in learning about them, when the corresponding knowledge is eventually available. This mo…
Meta-Consolidation for Continual Learning
The ability to continuously learn and adapt itself to new tasks without losing grasp of already acquired knowledge is a hallmark of biological learning systems, which current deep learning systems fall short of. In this work, we present a…
Zero Shot Domain Generalization
The standard supervised learning setting assumes that training data and test data come from the same distribution (domain). Domain generalization (DG) methods try to learn a model that, when trained on data from multiple domains, would generali…
Submodular Batch Selection for Training Deep Neural Networks
Mini-batch gradient descent based methods are the de facto algorithms for training neural network architectures today. We introduce a mini-batch selection strategy based on submodular function maximization. Our novel submodular formulation …
MASON: A Model AgnoStic ObjectNess Framework
This paper proposes a simple, yet very effective method to localize dominant foreground objects in an image, to pixel-level precision. The proposed method 'MASON' (Model-AgnoStic ObjectNess) uses a deep convolutional network to generate ca…
C4Synth: Cross-Caption Cycle-Consistent Text-to-Image Synthesis
Generating an image from its description is a challenging task worth solving because of its numerous practical applications ranging from image editing to virtual reality. All existing methods use one single caption to generate a plausible …