Explanipedia

Part-Aware Bottom-Up Group Reasoning for Fine-Grained Social Interaction Detection

Dong-Keun Kim, Minsu Cho, Suha Kwak · 2025

Social interactions often emerge from subtle, fine-grained cues such as facial expressions, gaze, and gestures. However, existing methods for social interaction detection overlook such nuanced cues and primarily rely on holistic representa…

GroupCoOp: Group-robust Fine-tuning via Group Prompt Learning

Nayeong Kim, Seong Joon Oh, Suha Kwak · 2025

Parameter-efficient fine-tuning (PEFT) of vision-language models (VLMs) excels in various vision tasks thanks to the rich knowledge and generalization ability of VLMs. However, recent studies revealed that such fine-tuned VLMs are vulnerab…

GaRA-SAM: Robustifying Segment Anything Model with Gated-Rank Adaptation

Yeho Gwon, Lukas Hoyer, Suha Kwak · 2025

Improving robustness of the Segment Anything Model (SAM) to input degradations is critical for its deployment in high-stakes applications such as autonomous driving and robotics. Our approach to this challenge prioritizes three key aspects…

TestDG: Test-time Domain Generalization for Continual Test-time Adaptation

SoHyun Lee, Na-Yeong Kim, Juwon Kang, Seong Joon Oh, Suha Kwak · 2025

This paper studies continual test-time adaptation (CTTA), the task of adapting a model to constantly changing unseen domains in testing while preserving previously learned knowledge. Existing CTTA methods mostly focus on adaptation to the …

Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval

Boseung Jeong, Jicheol Park, Sung‐Yeon Kim, Suha Kwak · 2025

Video-text retrieval, the task of retrieving videos based on a textual query or vice versa, is of paramount importance for video understanding and multimodal information retrieval. Recent methods in this area rely primarily on visual and t…

Enhancing Cost Efficiency in Active Learning with Candidate Set Query

Yeho Gwon, Sehyun Hwang, Hoyoung Kim, Jungseul Ok, Suha Kwak · 2025

This paper introduces a cost-efficient active learning (AL) framework for classification, featuring a novel query design called candidate set query. Unlike traditional AL queries requiring the oracle to examine all possible classes, our me…

Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens

Dongwon Kim, Ju He, Qihang Yu, Chenglin Yang, Xiaohui Shen, et al. · 2025

Image tokenizers form the foundation of modern text-to-image generative models but are notoriously difficult to train. Furthermore, most existing text-to-image models rely on large-scale, high-quality private datasets, making them challeng…

Improving Text-based Person Search via Part-level Cross-modal Correspondence

Jicheol Park, Boseung Jeong, Dongwon Kim, Suha Kwak · 2024

Text-based person search is the task of finding person images that are the most relevant to the natural language text description given as query. The main challenge of this task is a large gap between the target images and text queries, wh…

ActFusion: a Unified Diffusion Model for Action Segmentation and Anticipation

Dayoung Gong, Suha Kwak, Minsu Cho · 2024

Temporal action segmentation and long-term action anticipation are two popular vision tasks for the temporal analysis of actions in videos. Despite apparent relevance and potential complementarity, these two problems have been investigated…

Bootstrapping Top-down Information for Self-modulating Slot Attention

Dongwon Kim, Seo-Yeon Kim, Suha Kwak · 2024

Object-centric learning (OCL) aims to learn representations of individual objects within visual scenes without manual supervision, facilitating efficient and effective visual reasoning. Traditional OCL methods primarily employ bottom-up ap…

PLOT: Text-based Person Search with Part Slot Attention for Corresponding Part Discovery

Jicheol Park, Dong‐Won Kim, Boseung Jeong, Suha Kwak · 2024

Text-based person search, employing free-form text queries to identify individuals within a vast image collection, presents a unique challenge in aligning visual and textual representations, particularly at the human part level. Existing m…

Efficient and Versatile Robust Fine-Tuning of Zero-shot Models

Sung‐Yeon Kim, Boseung Jeong, Donghyun Kim, Suha Kwak · 2024

Large-scale image-text pre-trained models enable zero-shot classification and provide consistent accuracy across various data distributions. Nonetheless, optimizing these models in downstream tasks typically requires fine-tuning, which red…

Online Temporal Action Localization with Memory-Augmented Transformer

Youngkil Song, Dong-Keun Kim, Minsu Cho, Suha Kwak · 2024

Online temporal action localization (On-TAL) is the task of identifying multiple action instances given a streaming video. Since existing methods take as input only a video segment of fixed size per iteration, they are limited in consideri…

Classification Matters: Improving Video Action Detection with Class-Specific Attention

Jinsung Lee, Taeoh Kim, Inwoong Lee, Minho Shim, Dongyoon Wee, et al. · 2024

Video action detection (VAD) aims to detect actors and classify their actions in a video. We figure that VAD suffers more from classification rather than localization of actors. Hence, we analyze how prevailing methods form features for cl…

FREST: Feature RESToration for Semantic Segmentation under Multiple Adverse Conditions

SoHyun Lee, Namyup Kim, Sungyeon Kim, Suha Kwak · 2024

Robust semantic segmentation under adverse conditions is crucial in real-world applications. To address this challenging task in practical scenarios where labeled normal condition images are not accessible in training, we propose FREST, a …

Extreme Point Supervised Instance Segmentation

Hyunjun Lee, Sehyun Hwang, Suha Kwak · 2024

This paper introduces a novel approach to learning instance segmentation using extreme points, i.e., the topmost, leftmost, bottommost, and rightmost points, of each object. These points are readily available in the modern bounding box ann…

Active Label Correction for Semantic Segmentation with Foundation Models

Hoyoung Kim, Sehyun Hwang, Suha Kwak, Jungseul Ok · 2024

Training and validating models for semantic segmentation require datasets with pixel-wise annotations, which are notoriously labor-intensive. Although useful priors such as foundation models or crowdsourced datasets are available, they are…

Activity Grammars for Temporal Action Segmentation

Dayoung Gong, Joonseok Lee, Deunsol Jung, Suha Kwak, Minsu Cho · 2023

Sequence prediction on temporal data requires the ability to understand compositional structures of multi-level semantics beyond individual and contextual properties. The task of temporal action segmentation, which aims at translating an u…

Towards More Practical Group Activity Detection: A New Benchmark and Model

Dong-Keun Kim, Youngkil Song, Minsu Cho, Suha Kwak · 2023

Group activity detection (GAD) is the task of identifying members of each group and classifying the activity of the group at the same time in a video. While GAD has been studied recently, there is still much room for improvement in both da…

Active Learning for Semantic Segmentation with Multi-class Label Query

Sehyun Hwang, SoHyun Lee, Hoyoung Kim, Minhyeon Oh, Jungseul Ok, et al. · 2023

This paper proposes a new active learning method for semantic segmentation. The core of our method lies in a new annotation query design. It samples informative local image regions (e.g., superpixels), and for each of such regions, asks an…

Learning Unified Distance Metric Across Diverse Data Distributions with Parameter-Efficient Transfer Learning

Sung‐Yeon Kim, Donghyun Kim, Suha Kwak · 2023

A common practice in metric learning is to train and test an embedding model for each dataset. This dataset-specific approach fails to simulate real-world scenarios that involve multiple heterogeneous distributions of data. In this regard,…

Shatter and Gather: Learning Referring Image Segmentation with Text Supervision

Dong-Won Kim, Namyup Kim, Cuiling Lan, Suha Kwak · 2023

Referring image segmentation, the task of segmenting any arbitrary entities described in free-form texts, opens up a variety of vision applications. However, manual labeling of training data for this task is prohibitively costly, leading t…

SYNAuG: Exploiting Synthetic Data for Data Imbalance Problems

Moon Ye-Bin, Nam Hyeon-Woo, Wonseok Choi, Nayeong Kim, Suha Kwak, et al. · 2023

Data imbalance in training data often leads to biased predictions from trained models, which in turn causes ethical and social issues. A straightforward solution is to carefully curate training data, but given the enormous scale of modern …

PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization

Junhyeong Cho, Gilhyun Nam, Sungyeon Kim, Hunmin Yang, Suha Kwak · 2023

In a joint vision-language space, a text feature (e.g., from "a photo of a dog") could effectively represent its relevant image features (e.g., from dog photos). Also, a recent study has demonstrated the cross-modal transferability phenome…

Adaptive Superpixel for Active Learning in Semantic Segmentation

Hoyoung Kim, Minhyeon Oh, Sehyun Hwang, Suha Kwak, Jungseul Ok · 2023

Learning semantic segmentation requires pixel-wise annotations, which can be time-consuming and expensive. To reduce the annotation cost, we propose a superpixel-based active learning (AL) framework, which collects a dominant label per sup…

Human Pose Estimation in Extremely Low-Light Conditions

SoHyun Lee, Jaesung Rim, Boseung Jeong, Geonu Kim, Byungju Woo, et al. · 2023

We study human pose estimation in extremely low-light images. This task is challenging due to the difficulty of collecting real low-light images with accurate labels, and severely corrupted inputs that degrade prediction quality significan…

HIER: Metric Learning Beyond Class Labels via Hierarchical Regularization

Sung‐Yeon Kim, Boseung Jeong, Suha Kwak · 2022

Supervision for metric learning has long been given in the form of equivalence between human-labeled classes. Although this type of supervision has been a basis of metric learning for decades, we argue that it hinders further advances in t…

Learning to Detect Semantic Boundaries with Image-level Class Labels

Namyup Kim, Sehyun Hwang, Suha Kwak · 2022

This paper presents the first attempt to learn semantic boundary detection using image-level class labels as supervision. Our method starts by estimating coarse areas of object classes through attentions drawn by an image classification ne…

Improving Cross-Modal Retrieval with Set of Diverse Embeddings

Dong-Won Kim, Namyup Kim, Suha Kwak · 2022

Cross-modal retrieval across image and text modalities is a challenging task due to its inherent ambiguity: An image often exhibits various situations, and a caption can be coupled with diverse images. Set-based embedding has been studied …

Cross-Domain Ensemble Distillation for Domain Generalization

Kyungmoon Lee, Sung‐Yeon Kim, Suha Kwak · 2022

Domain generalization is the task of learning models that generalize to unseen target domains. We propose a simple yet effective method for domain generalization, named cross-domain ensemble distillation (XDED), that learns domain-invarian…
