Jaegul Choo
MM-SeR: Multimodal Self-Refinement for Lightweight Image Captioning
Systems such as video chatbots and navigation robots often depend on streaming image captioning to interpret visual inputs. Existing approaches typically employ large multimodal language models (MLLMs) for this purpose, but their substanti…
The 1st International Workshop on Disentangled Representation Learning for Controllable Generation (DRL4Real): Methods and Results
This paper reviews the 1st International Workshop on Disentangled Representation Learning for Controllable Generation (DRL4Real), held in conjunction with ICCV 2025. The workshop aimed to bridge the gap between the theoretical promise of D…
DesignLab: Designing Slides Through Iterative Detection and Correction
Designing high-quality presentation slides can be challenging for non-experts due to the complexity involved in navigating various design choices. Numerous automated tools can suggest layouts and color schemes, yet often lack the ability t…
From Wardrobe to Canvas: Wardrobe Polyptych LoRA for Part-level Controllable Human Image Generation
Recent diffusion models achieve personalization by learning specific subjects, allowing learned attributes to be integrated into generated images. However, personalized human image generation remains challenging due to the need for precise…
Fair Generation without Unfair Distortions: Debiasing Text-to-Image Generation with Entanglement-Free Attention
Recent advancements in diffusion-based text-to-image (T2I) models have enabled the generation of high-quality and photorealistic images from text. However, they often exhibit societal biases related to gender, race, and socioeconomic statu…
Good Noise Makes Good Edits: A Training-Free Diffusion-Based Video Editing with Image and Text Prompts
We propose VINO, the first zero-shot, training-free video editing method conditioned on both image and text. Our approach introduces ρ-start sampling and dilated dual masking to construct structured noise maps that enable coherent and ac…
Cross-Frame Representation Alignment for Fine-Tuning Video Diffusion Models
Fine-tuning Video Diffusion Models (VDMs) at the user level to generate videos that reflect specific attributes of training data presents notable challenges, yet remains underexplored despite its practical importance. Meanwhile, recent wor…
Temporal In-Context Fine-Tuning with Temporal Reasoning for Versatile Control of Video Diffusion Models
Recent advances in text-to-video diffusion models have enabled high-quality video synthesis, but controllable generation remains challenging, particularly under limited data and compute. Existing fine-tuning methods for conditional generat…
Exploring In-context Example Generation for Machine Translation
Large language models (LLMs) have demonstrated strong performance across various tasks, leveraging their exceptional in-context learning ability with only a few examples. Accordingly, the selection of optimal in-context examples has been a…
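As generic background on the few-shot in-context setup described in this abstract (a sketch of the standard prompting recipe, not the paper's example-generation method), a translation prompt is usually assembled by prepending a handful of source-target pairs to the query sentence; the template and example pairs below are illustrative only.

```python
def build_mt_prompt(examples, query, src_lang="German", tgt_lang="English"):
    """Assemble a few-shot machine translation prompt from (source, target)
    pairs. Any instruction-following LLM can then be asked to complete it."""
    blocks = [f"{src_lang}: {src}\n{tgt_lang}: {tgt}" for src, tgt in examples]
    blocks.append(f"{src_lang}: {query}\n{tgt_lang}:")
    return "\n\n".join(blocks)

# Illustrative usage with made-up example pairs.
prompt = build_mt_prompt(
    [("Guten Morgen.", "Good morning."), ("Wie geht es dir?", "How are you?")],
    "Das Wetter ist heute schön.",
)
```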
Revisiting LLMs as Zero-Shot Time-Series Forecasters: Small Noise Can Break Large Models
Large Language Models (LLMs) have shown remarkable performance across diverse tasks without domain-specific training, fueling interest in their potential for time-series forecasting. While LLMs have shown potential in zero-shot forecasting…
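For context on the zero-shot setup referenced above, LLM-based forecasting typically serializes the numeric history as text and asks the model to continue the sequence. The sketch below illustrates that generic recipe under our own assumptions, not the paper's evaluation protocol; the perturbation line only hints at the kind of small input noise the title alludes to.

```python
import random

def serialize_series(values, digits=2):
    """Render a numeric history as comma-separated text so an LLM can be
    prompted to continue it (zero-shot forecasting)."""
    return ", ".join(f"{v:.{digits}f}" for v in values)

history = [12.1, 12.4, 12.9, 13.3, 13.8]
noisy = [v + random.gauss(0.0, 0.05) for v in history]  # tiny perturbation

prompt = f"Continue this sequence: {serialize_series(noisy)},"
```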
Talk to Your Slides: Language-Driven Agents for Efficient Slide Editing
Editing presentation slides remains one of the most common and time-consuming tasks faced by millions of users daily, despite significant advances in automated slide generation. Existing approaches have successfully demonstrated slide edit…
Beyond the Mirror: Personal Analytics through Visual Juxtaposition with Other People's Data
An individual's data can reveal facets of behavior and identity, but its interpretation is context dependent. We can easily identify various self-tracking applications that help people reflect on their lives. However, self-tracking confine…
SphereDiff: Tuning-free 360° Static and Dynamic Panorama Generation via Spherical Latent Representation
The increasing demand for AR/VR applications has highlighted the need for high-quality content, such as 360° live wallpapers. However, generating high-quality 360° panoramic contents remains a challenging task due to the severe distortions…
What to Preserve and What to Transfer: Faithful, Identity-Preserving Diffusion-based Hairstyle Transfer
Hairstyle transfer is a challenging task in the image editing field that modifies the hairstyle of a given face image while preserving its other appearance and background features. The existing hairstyle transfer approaches heavily rely on…
Enabling Region-Specific Control via Lassos in Point-Based Colorization
Point-based interactive colorization techniques allow users to effortlessly colorize grayscale images using user-provided color hints. However, point-based methods often face challenges when different colors are given to semantically simil…
Zero-Shot Head Swapping in Real-World Scenarios
With growing demand in media and social networks for personalized images, the need for advanced head-swapping techniques, integrating an entire head from the head image with the body from the body image, has increased. However, traditional…
GaussianMotion: End-to-End Learning of Animatable Gaussian Avatars with Pose Guidance from Text
In this paper, we introduce GaussianMotion, a novel human rendering model that generates fully animatable scenes aligned with textual descriptions using Gaussian Splatting. Although existing methods achieve reasonable text-to-3D generation…
PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask
Recent virtual try-on approaches have advanced by fine-tuning pre-trained text-to-image diffusion models to leverage their powerful generative ability. However, the use of text prompts in virtual try-on remains underexplored. This paper tac…
Sparse autoencoders reveal selective remapping of visual concepts during adaptation
Adapting foundation models for specific purposes has become a standard approach to build machine learning systems for downstream applications. Yet, it is an open question which mechanisms take place during adaptation. Here we develop a new…
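For readers unfamiliar with the core tool named in the title: a sparse autoencoder decomposes model activations into an overcomplete set of sparsely active features. The sketch below is a generic minimal version (sizes and penalty weight are illustrative assumptions), not the paper's trained model.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal sparse autoencoder over activation vectors: an overcomplete
    linear dictionary with an L1 penalty encouraging sparse feature codes."""
    def __init__(self, d_model=768, d_dict=8192, l1_coeff=1e-3):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)
        self.l1_coeff = l1_coeff

    def forward(self, acts):
        codes = torch.relu(self.encoder(acts))   # sparse feature activations
        recon = self.decoder(codes)              # reconstructed activations
        loss = ((recon - acts) ** 2).mean() + self.l1_coeff * codes.abs().mean()
        return codes, recon, loss
```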
Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling
Diffusion models have emerged as a powerful tool for generating high-quality images, videos, and 3D content. While sampling guidance techniques like CFG improve quality, they reduce diversity and motion. Autoguidance mitigates these issues…
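As background for the guidance trade-off mentioned above, classifier-free guidance (CFG) extrapolates from an unconditional to a conditional denoiser prediction; a larger scale sharpens prompt adherence at the cost of diversity (and, in video, motion). The sketch below is a generic illustration with a hypothetical `denoiser` callable, not the paper's spatiotemporal skip guidance.

```python
def cfg_predict(denoiser, x_t, t, cond, guidance_scale=7.5):
    """Classifier-free guidance: combine conditional and unconditional
    noise predictions. `denoiser(x_t, t, cond)` is a stand-in for any
    diffusion noise-prediction network; cond=None denotes the null prompt."""
    eps_cond = denoiser(x_t, t, cond)
    eps_uncond = denoiser(x_t, t, None)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```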
Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling
Predicting future international events from textual information, such as news articles, has tremendous potential for applications in global policy, strategic decision-making, and geopolitics. However, existing datasets available for this t…
Evaluating Automatic Speech Recognition Systems for Korean Meteorological Experts
This paper explores integrating Automatic Speech Recognition (ASR) into natural language query systems to improve weather forecasting efficiency for Korean meteorologists. We address challenges in developing ASR systems for the Korean weat…
Imagining the Unseen: Generative Location Modeling for Object Placement
Location modeling, or determining where non-existing objects could feasibly appear in a scene, has the potential to benefit numerous computer vision tasks, from automatic object insertion to scene creation in virtual reality. Yet, this cap…
Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning
Large language models (LLMs) serve as giant information stores, often including personal or copyrighted data, and retraining them from scratch is not a viable option. This has led to the development of various fast, approximate unlearning …
SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars
Recent advancements in head avatar rendering using Gaussian primitives have achieved significantly high-fidelity results. Although precise head geometry is crucial for applications like mesh reconstruction and relighting, current methods s…