Zheng-Jun Zha
Geometry-Aware Enhancement and Data Augmentation for Street-to-Satellite Geo-localization
SafeCFG: Controlling Harmful Features with Dynamic Safe Guidance for Safe Generation
VisionSelector: End-to-End Learnable Visual Token Compression for Efficient Multimodal LLMs
Multimodal Large Language Models (MLLMs) encounter significant computational and memory bottlenecks from the massive number of visual tokens generated by high-resolution images or multi-image inputs. Previous token compression techniques a…
AIM 2025 Challenge on High FPS Motion Deblurring: Methods and Results
This paper presents a comprehensive review of the AIM 2025 High FPS Non-Uniform Motion Deblurring Challenge, highlighting the proposed solutions and final results. The objective of this challenge is to identify effective networks capable o…
AIM 2025 challenge on Inverse Tone Mapping Report: Methods and Results
This paper presents a comprehensive review of the AIM 2025 Challenge on Inverse Tone Mapping (ITM). The challenge aimed to push forward the development of effective ITM algorithms for HDR image reconstruction from single LDR inputs, focusi…
Likelihood-Aware Semantic Alignment for Full-Spectrum Out-of-Distribution Detection
Full-spectrum out-of-distribution (F-OOD) detection aims to accurately recognize in-distribution (ID) samples while encountering semantic and covariate shifts simultaneously. However, existing out-of-distribution (OOD) detectors tend to ov…
Fractional Spike Differential Equations Neural Network with Efficient Adjoint Parameters Training
Spiking Neural Networks (SNNs) draw inspiration from biological neurons to create realistic models for brain-like computation, demonstrating effectiveness in processing temporal information with energy efficiency and biological realism. Mo…
NTIRE 2025 Image Shadow Removal Challenge Report
This work examines the findings of the NTIRE 2025 Shadow Removal Challenge. A total of 306 participants have registered, with 17 teams successfully submitting their solutions during the final evaluation phase. Following the last two editio…
Towards Better De-raining Generalization via Rainy Characteristics Memorization and Replay
Current image de-raining methods primarily learn from a limited dataset, leading to inadequate performance in varied real-world rainy conditions. To tackle this, we introduce a new framework that enables networks to progressively expand th…
PAID: Pairwise Angular-Invariant Decomposition for Continual Test-Time Adaptation
Continual Test-Time Adaptation (CTTA) aims to online adapt a pre-trained model to changing environments during inference. Most existing methods focus on exploiting target data, while overlooking another crucial source of information, the p…
Efficient Test-time Adaptive Object Detection via Sensitivity-Guided Pruning
Continual test-time adaptive object detection (CTTA-OD) aims to online adapt a source pre-trained detector to ever-changing environments during inference under continuous domain shifts. Most existing CTTA-OD methods prioritize effectivenes…
NTIRE 2025 Challenge on RAW Image Restoration and Super-Resolution
This paper reviews the NTIRE 2025 RAW Image Restoration and Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Restoration and Super-Resolution could be essential in modern Image Signal Process…
Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning
The rapid spread of multimodal misinformation on social media has raised growing concerns, while research on video misinformation detection remains limited due to the lack of large-scale, diverse datasets. Existing methods often overfit to…
A Lottery Ticket Hypothesis Approach with Sparse Fine-tuning and MAE for Image Forgery Detection and Localization
The rise in sophisticated image forgery techniques, driven by advancements in image editing and generation, has posed new security challenges. Traditional methods, designed for specific tampering artifacts, struggle with out-of-distributio…
SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation
The iterative sampling procedure employed by diffusion models (DMs) often leads to significant latency. To address this, we propose Stochastic Consistency Distillation (SCott) to enable accelerated text-to-image generation, where high-qual…
Boosting Image De-Raining via Central-Surrounding Synergistic Convolution
Rainy images suffer from quality degradation due to the synergistic effect of rain streaks and accumulation. The rain streaks are anisotropic and show a specific directional arrangement, while the rain accumulation is isotropic and shows a…
EventMamba: Enhancing Spatio-Temporal Locality with State Space Models for Event-Based Video Reconstruction
Leveraging its robust linear global modeling capability, Mamba has notably excelled in computer vision. Despite its success, existing Mamba-based vision models have overlooked the nuances of event-driven tasks, especially in video reconstr…
HOIMamba: Efficient Mamba-based Disentangled Progressive Learning for HOI Detection
Human-object interaction (HOI) detection aims to detect the spatial positions of human-object pairs and recognize their interactions. Existing single-branch, two-branch, and three-branch methods are challenging to make an appropriate trade…
DCTMamba: Advancing JPEG Image Restoration Through Long-Sequence Modeling and Adaptive Frequency Strategy
Despite the advanced long-sequence modeling of Mamba, which has expanded its applications in image restoration, there remains a lack of exploration combining its strengths with the specific characteristics of JPEG image restoration, where …
SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets
3D human digitization has long been a highly pursued yet challenging task. Existing methods aim to generate high-quality 3D digital humans from single or multiple views, but remain primarily constrained by current paradigms and the scarcit…
HERO: Human Reaction Generation from Videos
Human reaction generation represents a significant research domain for interactive AI, as humans constantly interact with their surroundings. Previous works focus mainly on synthesizing the reactive motion given a human motion sequence. Th…
WeGen: A Unified Model for Interactive Multimodal Generation as We Chat
Existing multimodal generative models fall short as qualified design copilots, as they often struggle to generate imaginative outputs once instructions are less detailed or lack the ability to maintain consistency with the provided referen…
PFSD: A Multi-Modal Pedestrian-Focus Scene Dataset for Rich Tasks in Semi-Structured Environments
Recent advancements in autonomous driving perception have revealed exceptional capabilities within structured environments dominated by vehicular traffic. However, current perception models exhibit significant limitations in semi-structure…
Dexmedetomidine ameliorates hepatic ischemia reperfusion injury via modulating SIRT3 mediated mitochondrial quality control
Ischaemia-reperfusion (IR) damage is an inevitable adverse effect of liver surgery. Recent research has found that IR damage is involved in severe mitochondrial dysfunction. Mitochondrial biosynthesis and dynamics control mitochondrial mas…
Optimizing Large Language Model Training Using FP4 Quantization
The growing computational demands of training large language models (LLMs) necessitate more efficient methods. Quantized training presents a promising solution by enabling low-bit arithmetic operations to reduce these costs. While FP8 prec…
Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration
Image restoration aims to recover details and enhance contrast in degraded images. With the growing demand for high-quality imaging (e.g., 4K and 8K), achieving a balance between restoration quality and computational efficiency ha…
VanGogh: A Unified Multimodal Diffusion-based Framework for Video Colorization
Video colorization aims to transform grayscale videos into vivid color representations while maintaining temporal consistency and structural integrity. Existing video colorization methods often suffer from color bleeding and lack comprehen…
Non-causal selective state space model for image restoration
RAIN: Real-time Animation of Infinite Video Stream
Live animation has gained immense popularity for enhancing online engagement, yet achieving high-quality, real-time, and stable animation with diffusion models remains challenging, especially on consumer-grade GPUs. Existing methods strugg…