Bumsub Ham
Efficient Few-Shot Neural Architecture Search by Counting the Number of Nonlinear Functions
Neural architecture search (NAS) enables finding the best-performing architecture from a search space automatically. Most NAS methods exploit an over-parameterized network (i.e., a supernet) containing all possible architectures (i.e., sub…
Maximizing the Position Embedding for Vision Transformers with Global Average Pooling
In vision transformers, position embedding (PE) plays a crucial role in capturing the order of tokens. However, the expressiveness of PE in vision transformer architectures is limited by the structure where position embeddi…
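A minimal numpy sketch of the two ingredients the title refers to: adding a learnable position embedding to patch tokens, and pooling tokens with global average pooling instead of a class token. Function names are illustrative, not from the paper.

```python
import numpy as np

def add_position_embedding(tokens, pos_embed):
    """Add a learnable position embedding to each patch token.

    tokens:    (num_tokens, dim) patch embeddings
    pos_embed: (num_tokens, dim) learnable parameters, one per position
    """
    return tokens + pos_embed

def global_average_pool(tokens):
    """Pool token features into one image representation
    (used in place of a [CLS] token)."""
    return tokens.mean(axis=0)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(196, 64))     # 14x14 patches, 64-dim each
pos_embed = rng.normal(size=(196, 64))  # one embedding per position
feat = global_average_pool(add_position_embedding(tokens, pos_embed))
print(feat.shape)  # (64,)
```

With average pooling, every token (and hence every position embedding) contributes to the final feature, which is what makes the interaction between PE and the pooling scheme worth studying.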
Subnet-Aware Dynamic Supernet Training for Neural Architecture Search
N-shot neural architecture search (NAS) exploits a supernet containing all candidate subnets for a given search space. The subnets are typically trained with a static training strategy (e.g., using the same learning rate (LR) scheduler and…
ELITE: Enhanced Language-Image Toxicity Evaluation for Safety
Current Vision Language Models (VLMs) remain vulnerable to malicious prompts that induce harmful outputs. Existing safety benchmarks for VLMs primarily rely on automated evaluation methods, but these methods struggle to detect implicit har…
Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients
Network quantization generally converts full-precision weights and/or activations into low-bit fixed-point values in order to accelerate an inference process. Recent approaches to network quantization further discretize the gradients into …
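A generic illustration of what "discretizing gradients into low-bit fixed-point values" means: a symmetric uniform INT4 quantizer applied to a gradient tensor. This is the textbook baseline, not the method proposed in the paper.

```python
import numpy as np

def quantize_int4(x):
    """Map a full-precision tensor to signed 4-bit fixed-point values.

    Symmetric uniform quantization: scale by the max magnitude so
    values fall in [-8, 7], round to the nearest integer code, then
    de-quantize. The gap between x and the de-quantized result is the
    quantization error the paper analyzes.
    """
    qmin, qmax = -8, 7                       # signed INT4 range
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), qmin, qmax).astype(np.int8)
    return q * scale, q, scale

grads = np.array([0.30, -0.11, 0.02, -0.29])
dequant, q, s = quantize_int4(grads)
print(q)  # integer codes in [-8, 7]
```

Note that small-magnitude gradients (here 0.02) collapse to the zero code, one reason gradient quantization is harder than weight or activation quantization.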
FYI: Flip Your Images for Dataset Distillation
Dataset distillation synthesizes a small set of images from a large-scale real dataset such that synthetic and real images share similar behavioral properties (e.g., distributions of gradients or features) during a training process. Through…
Scheduling Weight Transitions for Quantization-Aware Training
Quantization-aware training (QAT) simulates a quantization process during training to lower the bit-precision of weights/activations. It learns quantized weights indirectly by updating latent weights, i.e., full-precision inputs to a quantizer,…
Instance-Aware Group Quantization for Vision Transformers
Post-training quantization (PTQ) is an efficient model compression technique that quantizes a pretrained full-precision model using only a small calibration set of unlabeled samples without retraining. PTQ methods for convolutional neural …
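The calibration step described here can be sketched in a few lines: derive a scale and zero-point from the statistics of a small unlabeled batch, then quantize with them. This shows only the simplest min/max variant; PTQ methods differ precisely in how these statistics are gathered.

```python
import numpy as np

def calibrate_uniform(activations, num_bits=8):
    """Derive scale and zero-point from a small calibration set
    (plain min/max calibration; no retraining involved)."""
    qmax = 2 ** num_bits - 1
    lo, hi = activations.min(), activations.max()
    scale = (hi - lo) / qmax
    zero_point = int(round(-lo / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, num_bits=8):
    """Asymmetric uniform quantization to unsigned integer codes."""
    q = np.clip(np.round(x / scale) + zero_point, 0, 2 ** num_bits - 1)
    return q.astype(np.uint8)

calib = np.array([-1.0, 0.0, 2.0, 1.5])  # tiny calibration batch
scale, zp = calibrate_uniform(calib)
codes = quantize(calib, scale, zp)
```

For transformers, per-tensor statistics like these are often too coarse, which motivates grouping activations, and, as in this paper, doing so per input instance.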
RankMixup: Ranking-Based Mixup Training for Network Calibration
Network calibration aims to accurately estimate confidence levels, which is particularly important for employing deep neural networks in real-world systems. Recent approaches leverage mixup to calibrate the network's predictions dur…
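For reference, standard mixup (Zhang et al.), which the calibration approaches mentioned above build on: convex-combine two samples and their one-hot labels with a mixing coefficient, usually drawn from a Beta distribution during training.

```python
import numpy as np

def mixup(x1, y1, x2, y2, lam):
    """Convex combination of two inputs and their one-hot labels.

    lam in [0, 1]; in practice lam ~ Beta(alpha, alpha) per batch.
    The soft label is what makes mixup useful for calibration.
    """
    x = lam * x1 + (1 - lam) * x2
    y = lam * y1 + (1 - lam) * y2
    return x, y

a, ya = np.ones(4), np.array([1.0, 0.0])
b, yb = np.zeros(4), np.array([0.0, 1.0])
x, y = mixup(a, ya, b, yb, lam=0.7)
print(y)  # [0.7 0.3]
```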
Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification
We present a novel unsupervised domain adaptation method for person re-identification (reID) that generalizes a model trained on a labeled source domain to an unlabeled target domain. We introduce a camera-driven curriculum learning (CaCL) f…
ACLS: Adaptive and Conditional Label Smoothing for Network Calibration
We address the problem of network calibration, adjusting the miscalibrated confidences of deep neural networks. Many approaches to network calibration adopt a regularization-based method that exploits a regularization term to smooth the miscali…
ALIFE: Adaptive Logit Regularizer and Feature Replay for Incremental Semantic Segmentation
We address the problem of incremental semantic segmentation (ISS), recognizing novel object/stuff categories continually without forgetting previous ones that have been learned. The catastrophic forgetting problem is particularly severe in …
Decomposed Knowledge Distillation for Class-Incremental Semantic Segmentation
Class-incremental semantic segmentation (CISS) labels each pixel of an image with a corresponding object/stuff class continually. To this end, it is crucial to learn novel classes incrementally without forgetting previously learned knowled…
Bi-directional Contrastive Learning for Domain Adaptive Semantic Segmentation
We present a novel unsupervised domain adaptation method for semantic segmentation that generalizes a model trained with source images and corresponding ground-truth labels to a target domain. A key to domain adaptive semantic segmentation…
OIMNet++: Prototypical Normalization and Localization-aware Learning for Person Search
We address the task of person search, that is, localizing and re-identifying query persons from a set of raw scene images. Recent approaches are typically built upon OIMNet, a pioneering work on person search that learns joint person represe…
Disentangled Representations for Short-Term and Long-Term Person Re-Identification
We address the problem of person re-identification (reID), that is, retrieving person images from a large dataset, given a query image of the person of interest. A key challenge is to learn person representations robust to intra-class vari…
Video-based Person Re-identification with Spatial and Temporal Memory Networks
Video-based person re-identification (reID) aims to retrieve person videos with the same identity as a query person across multiple cameras. Spatial and temporal distractors in person videos, such as background clutter and partial occlusio…
Learning by Aligning: Visible-Infrared Person Re-identification using Cross-Modal Correspondences
We address the problem of visible-infrared person re-identification (VI-reID), that is, retrieving a set of person images, captured by visible or infrared cameras, in a cross-modal setting. Two main challenges in VI-reID are intra-class va…
Distance-aware Quantization
We address the problem of network quantization, that is, reducing bit-widths of weights and/or activations to lighten network architectures. Quantization methods use a rounding function to map full-precision values to the nearest quantized…
Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation
We address the problem of generalized zero-shot semantic segmentation (GZS3), predicting pixel-wise semantic labels for seen and unseen classes. Most GZS3 methods adopt a generative approach that synthesizes visual features of unseen classe…
Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation
We address the problem of weakly-supervised semantic segmentation (WSSS) using bounding box annotations. Although object bounding boxes are good indicators to segment corresponding objects, they do not specify object boundaries, making it …
HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection
We address the problem of 3D object detection, that is, estimating 3D object bounding boxes from point clouds. 3D object detection methods exploit either voxel-based or point-based features to represent 3D objects in a scene. Voxel-based f…
Network Quantization with Element-wise Gradient Scaling
Network quantization aims at reducing bit-widths of weights and/or activations, particularly important for implementing deep neural networks with limited hardware resources. Most methods use the straight-through estimator (STE) to train qu…
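The straight-through estimator (STE) mentioned here is simple to state: the forward pass rounds latent weights to the quantization grid, while the backward pass pretends the rounding was an identity, so the upstream gradient reaches the latent weights unchanged. A numpy sketch of that baseline (element-wise gradient scaling replaces the identity with per-element scaling, which is not shown):

```python
import numpy as np

def quantize_forward(w, scale=1.0):
    """Forward pass: snap latent weights to the quantization grid."""
    return np.round(w / scale) * scale

def ste_backward(grad_output):
    """Backward pass under STE: round() is treated as identity, so the
    gradient w.r.t. the latent weights equals the upstream gradient."""
    return grad_output

w = np.array([0.4, 1.6, -0.7])          # latent (full-precision) weights
print(quantize_forward(w))              # [ 0.  2. -1.]
g = ste_backward(np.array([0.1, -0.2, 0.3]))
```

Because round() has zero gradient almost everywhere, some surrogate like this is unavoidable; how faithfully the surrogate reflects the true loss surface is exactly what gradient-scaling methods try to improve.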
Learning Semantic Correspondence Exploiting an Object-Level Prior
We address the problem of semantic correspondence, that is, establishing a dense flow field between images depicting different instances of the same object or scene category. We propose to use images annotated with binary foreground masks …