Junmo Kim
Prior knowledge of layer-specific pruning numbers guarantees effective random pruning at initialization
Several pruning methods prune a neural network at initialization. These methods carefully determine the importance of each weight, retaining only the important ones and pruning the others. However, subsequent studies have shown that random…
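For context on what "layer-specific pruning numbers" means in practice, below is a minimal sketch of random pruning at initialization when per-layer sparsities are given in advance. It assumes a PyTorch model; the function and argument names are illustrative, not from the paper.

```python
import torch

def random_prune_at_init(model, layer_sparsity):
    """Randomly prune each weight tensor to a prescribed per-layer sparsity.

    `layer_sparsity` maps parameter names to the fraction of weights to remove;
    which weights are removed within a layer is chosen uniformly at random.
    """
    masks = {}
    for name, param in model.named_parameters():
        sparsity = layer_sparsity.get(name, 0.0)
        num_prune = int(sparsity * param.numel())
        # Pick the pruned positions uniformly at random.
        idx = torch.randperm(param.numel(), device=param.device)[:num_prune]
        mask = torch.ones(param.numel(), device=param.device)
        mask[idx] = 0.0
        masks[name] = mask.view_as(param)
        param.data.mul_(masks[name])  # zero out the pruned weights
    return masks
```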
Preference Distillation via Value based Reinforcement Learning
Direct Preference Optimization (DPO) is a powerful paradigm to align language models with human preferences using pairwise comparisons. However, its binary win-or-loss supervision often proves insufficient for training small models with li…
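As a reminder of the pairwise win-or-loss supervision the snippet refers to, here is a minimal sketch of the standard DPO objective (variable names are illustrative): the loss rewards the policy for increasing its log-likelihood margin on the preferred response relative to a frozen reference model.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss over a batch of (chosen, rejected) pairs.

    Each argument is a (batch,) tensor of summed token log-probabilities;
    `beta` controls how far the policy may drift from the reference model.
    """
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Binary win-or-loss supervision: push the chosen ratio above the rejected one.
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```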
Comparison Reveals Commonality: Customized Image Generation through Contrastive Inversion
The recent demand for customized image generation raises a need for techniques that effectively extract the common concept from small sets of images. Existing methods typically rely on additional guidance, such as text prompts or spatial m…
DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization
Diffusion models have achieved remarkable success in image generation but come with significant computational costs, posing challenges for deployment in resource-constrained environments. Recent post-training quantization (PTQ) methods hav…
FairASR: Fair Audio Contrastive Learning for Automatic Speech Recognition
Large-scale ASR models have achieved remarkable gains in accuracy and robustness. However, fairness issues remain largely unaddressed despite their critical importance in real-world applications. In this work, we introduce FairASR, a syste…
InfiniteAudio: Infinite-Length Audio Generation with Consistency
This paper presents InfiniteAudio, a simple yet effective strategy for generating infinite-length audio using diffusion-based text-to-audio methods. Current approaches face memory constraints because the output size increases with input le…
PRISM: Video Dataset Condensation with Progressive Refinement and Insertion for Sparse Motion
Video dataset condensation has emerged as a critical technique for addressing the computational challenges associated with large-scale video data processing in deep learning applications. While significant progress has been made in image d…
DAM: Domain-Aware Module for Multi-Domain Dataset Condensation
Dataset Condensation (DC) has emerged as a promising solution to mitigate the computational and storage burdens associated with training deep learning models. However, existing DC methods largely overlook the multi-domain nature of modern …
Enhancing self-supervised visual representation learning through adversarially generated examples
Self-supervised learning has emerged as a powerful paradigm for leveraging unlabeled data to learn rich feature representations. However, the efficacy of self-supervised models is often limited by the degree and complexity of the augmentat…
SFLD: Reducing the content bias for AI-generated Image Detection
Identifying AI-generated content is critical for the safe and ethical use of generative AI. Recent research has focused on developing detectors that generalize to unknown generators, with popular methods relying either on high-level featur…
Instruct-4DGS: Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation
Recent 4D dynamic scene editing methods require editing thousands of 2D images used for dynamic scene synthesis and updating the entire scene with additional training loops, resulting in several hours of processing to edit a single dynamic…
Tailored Channel Pruning: Achieve Targeted Model Complexity Through Adaptive Sparsity Regularization
In deep learning, the size and complexity of neural networks have rapidly increased in pursuit of higher performance. However, this poses a challenge in resource-limited environments, such as mobile devices, particularly wh…
Lacticaseibacillus casei IDCC 3451 alleviates cognitive and behavioral functions by reshaping the gut microbiome and regulating intestinal barrier integrity in chronic stress animal models
Lacticaseibacillus casei IDCC 3451 (3451) was evaluated for its effects on the gut-brain axis using Caenorhabditis elegans (C. elegans) and mouse models of stress and inflammation. In C. elegans, 3451 extended lifespans by 25 %, improved m…
Unlocking the Capabilities of Masked Generative Models for Image Synthesis via Self-Guidance
Masked generative models (MGMs) have shown impressive generative ability while requiring an order of magnitude fewer sampling steps than continuous diffusion models. However, MGMs still underperform in image synthesis compared t…
StablePrompt: Automatic Prompt Tuning using Reinforcement Learning for Large Language Models
Finding appropriate prompts for a specific task has become an important issue as the use of Large Language Models (LLMs) has expanded. Reinforcement Learning (RL) is widely used for prompt tuning, but its inherent instability and enviro…
Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
In this paper, we propose a new method to enhance compositional understanding in pre-trained vision and language models (VLMs) without sacrificing performance in zero-shot multi-modal tasks. Traditional fine-tuning approaches often improve…
Pretrained Patient Trajectories for Adverse Drug Event Prediction Using Common Data Model-based Electronic Health Records
Background Pretraining electronic health record (EHR) data using language models by treating patient trajectories as natural language sentences has enhanced performance across various medical tasks. However, EHR pretraining models have nev…
Beta Sampling is All You Need: Efficient Image Generation Strategy for Diffusion Models using Stepwise Spectral Analysis
Generative diffusion models have emerged as a powerful tool for high-quality image synthesis, yet their iterative nature demands significant computational resources. This paper proposes an efficient time step sampling method based on an im…
AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning
In recent years, advancements in representation learning and language models have propelled Automated Captioning (AC) to new heights, enabling the generation of human-level descriptions. Leveraging these advancements, we propose AVCap, an …
Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition
Vision and language models (VLMs) such as CLIP have showcased remarkable zero-shot recognition abilities yet face challenges in visio-linguistic compositionality, particularly in linguistic comprehension and fine-grained image-text alignme…
Towards Understanding Dual BN In Hybrid Adversarial Training
There is a growing concern about applying batch normalization (BN) in adversarial training (AT), especially when the model is trained on both adversarial samples and clean samples (termed Hybrid-AT). With the assumption that adversarial an…
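For readers unfamiliar with the dual BN setup named in the title, the sketch below shows the usual construction in Hybrid-AT (a generic setup, not necessarily the paper's exact design; the class name is illustrative): two batch-norm branches share one feature extractor, and each batch is routed to the branch matching its sample type so clean and adversarial statistics stay separate.

```python
import torch.nn as nn

class DualBatchNorm2d(nn.Module):
    """Two BN branches over the same features: one for clean inputs,
    one for adversarial inputs, as commonly used in Hybrid-AT."""

    def __init__(self, num_features):
        super().__init__()
        self.bn_clean = nn.BatchNorm2d(num_features)
        self.bn_adv = nn.BatchNorm2d(num_features)

    def forward(self, x, adversarial: bool = False):
        # Route the batch to the branch matching its sample type.
        return self.bn_adv(x) if adversarial else self.bn_clean(x)
```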
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
We establish rigorous benchmarks for visual perception robustness. Synthetic images such as ImageNet-C, ImageNet-9, and Stylized ImageNet provide a specific type of evaluation over synthetic corruptions, backgrounds, and textures, yet those…
FRED: Towards a Full Rotation-Equivariance in Aerial Image Object Detection
Rotation-equivariance is an essential yet challenging property in oriented object detection. While general object detectors naturally gain robustness to spatial shifts from the translation-equivariance of conventional CNNs, achie…
Modeling Stereo-Confidence out of the End-to-End Stereo-Matching Network via Disparity Plane Sweep
We propose a novel stereo-confidence that can be measured externally to various stereo-matching networks, offering an alternative input modality choice of the cost volume for learning-based approaches, especially in safety-critical systems…
Foreseeing Reconstruction Quality of Gradient Inversion: An Optimization Perspective
Gradient inversion attacks can leak data privacy when clients share weight updates with the server in federated learning (FL). Existing studies mainly use L2 or cosine distance as the loss function for gradient matching in the attack. Our …
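As background for the gradient-matching loss the snippet mentions, the sketch below shows the standard attack formulation (function and argument names are illustrative): the attacker optimizes a dummy input and label so that their gradients match the client's shared gradients under an L2 or cosine distance.

```python
import torch
import torch.nn.functional as F

def gradient_matching_loss(dummy_grads, true_grads, metric="l2"):
    """Distance between the attacker's dummy gradients and the gradients
    shared by a client, as minimized in gradient inversion attacks.

    Both arguments are lists of per-parameter gradient tensors.
    """
    if metric == "l2":
        return sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
    # Cosine variant: 1 - cosine similarity over the flattened gradients.
    dg = torch.cat([g.flatten() for g in dummy_grads])
    tg = torch.cat([g.flatten() for g in true_grads])
    return 1.0 - F.cosine_similarity(dg, tg, dim=0)
```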
EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
Recent advancements in self-supervised audio-visual representation learning have demonstrated its potential to capture rich and comprehensive representations. However, despite the advantages of data augmentation verified in many learning m…
Stereo-Matching Knowledge Distilled Monocular Depth Estimation Filtered by Multiple Disparity Consistency
In stereo-matching knowledge distillation methods for self-supervised monocular depth estimation, the stereo-matching network's knowledge is distilled into a monocular depth network through pseudo-depth maps. In these methods, the learn…
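To make the pseudo-depth distillation step concrete, here is a minimal sketch of the generic setup the snippet describes (illustrative names; the paper's disparity-consistency filtering is only represented by a validity mask): a frozen stereo-matching network supplies pseudo-depth targets, and the monocular network regresses to them on pixels judged reliable.

```python
import torch

def pseudo_depth_distillation_loss(mono_depth, pseudo_depth, valid_mask):
    """L1 distillation of stereo pseudo-depth into a monocular depth network,
    restricted to pixels where the pseudo-depth target is trusted."""
    diff = torch.abs(mono_depth - pseudo_depth)
    return (diff * valid_mask).sum() / valid_mask.sum().clamp(min=1)

@torch.no_grad()
def make_pseudo_depth(stereo_net, left_img, right_img):
    # The frozen stereo-matching network provides the pseudo-depth targets.
    return stereo_net(left_img, right_img)
```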