Wei-Chen Chiu
Controllable Collision Scenario Generation via Collision Pattern Prediction
Evaluating the safety of autonomous vehicles (AVs) requires diverse, safety-critical scenarios, with collisions being especially important yet rare and unsafe to collect in the real world. Therefore, the community has been focusing on gene…
RC-AutoCalib: An End-to-End Radar-Camera Automatic Calibration Network
This paper presents a groundbreaking approach - the first online automatic geometric calibration method for radar and camera systems. Given the significant data sparsity and measurement uncertainty in radar height data, achieving automatic…
Boosting Diffusion Guidance via Learning Degradation-Aware Models for Blind Super Resolution
Recently, diffusion-based blind super-resolution (SR) methods have shown great ability to generate high-resolution images with abundant high-frequency detail, but the detail is often achieved at the expense of fidelity. Meanwhile, another …
Exemplar Masking for Multimodal Incremental Learning
Multimodal incremental learning needs to digest the information from multiple modalities while concurrently learning new knowledge without forgetting the previously learned information. There are numerous challenges for this task, mainly i…
In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models
Text-to-image (T2I) models have shown remarkable progress, but their potential to generate harmful content remains a critical concern in the ML community. While various safety mechanisms have been developed, the field lacks systematic tool…
T2Vs Meet VLMs: A Scalable Multimodal Dataset for Visual Harmfulness Recognition
To address the risks of encountering inappropriate or harmful content, researchers managed to incorporate several harmful contents datasets with machine learning methods to detect harmful concepts. However, existing harmful datasets are cu…
Two Heads Better Than One: Dual Degradation Representation for Blind Super-Resolution
Previous methods have demonstrated remarkable performance in single image super-resolution (SISR) tasks with known and fixed degradation (e.g., bicubic downsampling). However, when the actual degradation deviates from these assumptions, th…
Identifying and Clustering Counter Relationships of Team Compositions in PvP Games for Efficient Balance Analysis
How can balance be quantified in game settings? This question is crucial for game designers, especially in player-versus-player (PvP) games, where analyzing the strength relations among predefined team compositions-such as hero combination…
Perceptual Similarity for Measuring Decision-Making Style and Policy Diversity in Games
Defining and measuring decision-making styles, also known as playstyles, is crucial in gaming, where these styles reflect a broad spectrum of individuality and diversity. However, finding a universally applicable measure for these styles p…
A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting
Class agnostic counting (CAC) is a vision task that can be used to count the total occurrence number of any given reference objects in the query image. The task is usually formulated as a density map estimation problem through similarity c…
MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes
Recent advancements in post-hoc and inherently interpretable methods have markedly enhanced the explanations of black box classifier models. These methods operate either through post-analysis or by integrating concept learning during model…
Improving Robustness for Joint Optimization of Camera Pose and Decomposed Low-Rank Tensorial Radiance Fields
In this paper, we propose an algorithm that allows joint refinement of camera pose and scene geometry represented by decomposed low-rank tensor, using only 2D images as supervision. First, we conduct a pilot study based on a 1D signal and …
Improving Robustness for Joint Optimization of Camera Poses and Decomposed Low-Rank Tensorial Radiance Fields
In this paper, we propose an algorithm that allows joint refinement of camera pose and scene geometry represented by decomposed low-rank tensor, using only 2D images as supervision. First, we conduct a pilot study based on a 1D signal and …
AntifakePrompt: Prompt-Tuned Vision-Language Models are Fake Image Detectors
Deep generative models can create remarkably photorealistic fake images while raising concerns about misinformation and copyright infringement, known as deepfake threats. Deepfake detection technique is developed to distinguish between rea…
Skin the sheep not only once: Reusing Various Depth Datasets to Drive the Learning of Optical Flow
Optical flow estimation is crucial for various applications in vision and robotics. As the difficulty of collecting ground truth optical flow in real-world scenarios, most of the existing methods of learning optical flow still adopt synthe…
MENTOR: Multilingual Text Detection Toward Learning by Analogy
Text detection is frequently used in vision-based mobile robots when they need to interpret texts in their surroundings to perform a given task. For instance, delivery robots in multilingual cities need to be capable of doing multilingu…
Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where
While image data starts to enjoy the simple-but-effective self-supervised learning scheme built upon masking and self-reconstruction objective thanks to the introduction of tokenization procedure and vision transformer backbone, convolutio…
Transformer-based Image Compression with Variable Image Quality Objectives
This paper presents a Transformer-based image compression system that allows for a variable image quality objective according to the user's preference. Optimizing a learned codec for different quality objectives leads to reconstructed imag…
Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts
Text-to-image diffusion models, e.g. Stable Diffusion (SD), lately have shown remarkable ability in high-quality content generation, and become one of the representatives for the recent wave of transformative AI. Nevertheless, such advance…
Scalable Spatial Memory for Scene Rendering and Navigation
Neural scene representation and rendering methods have shown promise in learning the implicit form of scene structure without supervision. However, the implicit representation learned in most existing methods is non-expandable and cannot b…
TransTIC: Transferring Transformer-based Image Compression from Human Perception to Machine Perception
This work aims for transferring a Transformer-based image compression codec from human perception to machine perception without fine-tuning the codec. We propose a transferable Transformer-based image compression framework, termed TransTIC…
Transformer-based Variable-rate Image Compression with Region-of-interest Control
This paper proposes a transformer-based learned image compression system. It is capable of achieving variable-rate compression with a single model while supporting the region-of-interest (ROI) functionality. Inspired by prompt tuning, we i…
Multimodal Prompting with Missing Modalities for Visual Recognition
In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) when missing-modality occurs either during training or testing in real-world situations; and 2) when the computation resources are not available to f…
Mitigating Forgetting in Online Continual Learning via Contrasting Semantically Distinct Augmentations
Online continual learning (OCL) aims to enable model learning from a non-stationary data stream to continuously acquire new knowledge as well as retain the learnt one, under the constraints of having limited system size and computational c…
RPG: Learning Recursive Point Cloud Generation
In this paper we propose a novel point cloud generator that is able to reconstruct and generate 3D point clouds composed of semantic parts. Given a latent representation of the target 3D model, the generation starts from a single point and…
3D-PL: Domain Adaptive Depth Estimation with 3D-aware Pseudo-Labeling
For monocular depth estimation, acquiring ground truths for real data is not easy, and thus domain adaptation methods are commonly adopted using the supervised synthetic data. However, this may still incur a large domain gap due to the lac…
Adaptively-Realistic Image Generation from Stroke and Sketch with Diffusion Model
Generating images from hand-drawings is a crucial and fundamental task in content creation. The translation is difficult as there exist infinite possibilities and the different users usually expect different outcomes. Therefore, we propose…
Vector Quantized Image-to-Image Translation
Current image-to-image translation methods formulate the task with conditional generation models, leading to learning only the recolorization or regional changes as being constrained by the rich structural information provided by the condi…