Explanipedia

Geometry-Aware Enhancement and Data Augmentation for Street-to-Satellite Geo-localization Open

Xingbo Wang, Yongchao Xu, Yuanfei Bao, Xueyang Fu, Zheng-Jun Zha · 2025

SafeCFG: Controlling Harmful Features with Dynamic Safe Guidance for Safe Generation Open

Jiadong Pan, Liang Li, H. Gao, Zheng-Jun Zha, Qingming Huang, et al. · 2025

VisionSelector: End-to-End Learnable Visual Token Compression for Efficient Multimodal LLMs Open

Junjie Zhu, Yurui Zhu, Xin Lü, Wenrui Yan, Dong Li, et al. · 2025

Multimodal Large Language Models (MLLMs) encounter significant computational and memory bottlenecks from the massive number of visual tokens generated by high-resolution images or multi-image inputs. Previous token compression techniques a…

AIM 2025 Challenge on High FPS Motion Deblurring: Methods and Results Open

George Ciubotariu, Florin-Alexandru Vasluianu, Zhuyun Zhou, Nancy Mehta, Radu Timofte, et al. · 2025

This paper presents a comprehensive review of the AIM 2025 High FPS Non-Uniform Motion Deblurring Challenge, highlighting the proposed solutions and final results. The objective of this challenge is to identify effective networks capable o…

AIM 2025 challenge on Inverse Tone Mapping Report: Methods and Results Open

Chao Wang, Francesco Banterle, Bin Ren, Radu Timofte, Xin Lu, et al. · 2025

This paper presents a comprehensive review of the AIM 2025 Challenge on Inverse Tone Mapping (ITM). The challenge aimed to push forward the development of effective ITM algorithms for HDR image reconstruction from single LDR inputs, focusi…

Likelihood-Aware Semantic Alignment for Full-Spectrum Out-of-Distribution Detection Open

Lu Fan, Kai Zhu, Wei Zhai, Yang Cao, Zheng-Jun Zha · 2025

Full-spectrum out-of-distribution (F-OOD) detection aims to accurately recognize in-distribution (ID) samples while encountering semantic and covariate shifts simultaneously. However, existing out-of-distribution (OOD) detectors tend to ov…

Fractional Spike Differential Equations Neural Network with Efficient Adjoint Parameters Training Open

Chengjie Ge, Yufeng Peng, Xueyang Fu, Qiyu Kang, Xuhao Li, et al. · 2025

Spiking Neural Networks (SNNs) draw inspiration from biological neurons to create realistic models for brain-like computation, demonstrating effectiveness in processing temporal information with energy efficiency and biological realism. Mo…

NTIRE 2025 Image Shadow Removal Challenge Report Open

Florin-Alexandru Vasluianu, Tim Seizinger, Zhuyun Zhou, Cailian Chen, Zongwei Wu, et al. · 2025

This work examines the findings of the NTIRE 2025 Shadow Removal Challenge. A total of 306 participants have registered, with 17 teams successfully submitting their solutions during the final evaluation phase. Following the last two editio…

Towards Better De-raining Generalization via Rainy Characteristics Memorization and Replay Open

Kunyu Wang, Xueyang Fu, Chengzhi Cao, Chengjie Ge, Wei Zhai, et al. · 2025

Current image de-raining methods primarily learn from a limited dataset, leading to inadequate performance in varied real-world rainy conditions. To tackle this, we introduce a new framework that enables networks to progressively expand th…

PAID: Pairwise Angular-Invariant Decomposition for Continual Test-Time Adaptation Open

Kunyu Wang, Xueyang Fu, Yīmíng Bào, Chengjie Ge, Chengzhi Cao, et al. · 2025

Continual Test-Time Adaptation (CTTA) aims to online adapt a pre-trained model to changing environments during inference. Most existing methods focus on exploiting target data, while overlooking another crucial source of information, the p…

Efficient Test-time Adaptive Object Detection via Sensitivity-Guided Pruning Open

Kunyu Wang, Xueyang Fu, Xinzheng Lu, Chengjie Ge, Chengzhi Cao, et al. · 2025

Continual test-time adaptive object detection (CTTA-OD) aims to online adapt a source pre-trained detector to ever-changing environments during inference under continuous domain shifts. Most existing CTTA-OD methods prioritize effectivenes…

NTIRE 2025 Challenge on RAW Image Restoration and Super-Resolution Open

Marcos V. Conde, Radu Timofte, Zhihong Lu, Xiang‐Yu Kong, Xiaoxia Xing, et al. · 2025

This paper reviews the NTIRE 2025 RAW Image Restoration and Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Restoration and Super-Resolution could be essential in modern Image Signal Process…

Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning Open

Fanrui Zhang, Dian Li, Qiang Zhang, Jun Chen, Gang Liu, et al. · 2025

The rapid spread of multimodal misinformation on social media has raised growing concerns, while research on video misinformation detection remains limited due to the lack of large-scale, diverse datasets. Existing methods often overfit to…

A Lottery Ticket Hypothesis Approach with Sparse Fine-tuning and MAE for Image Forgery Detection and Localization Open

Jiaying Zhu, Li Dong, Xueyang Fu, Gege Shi, Jie Xiao, et al. · 2025

The rise in sophisticated image forgery techniques, driven by advancements in image editing and generation, has posed new security challenges. Traditional methods, designed for specific tampering artifacts, struggle with out-of-distributio…

SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation Open

Hongjian Liu, Qingsong Xie, Tianbo Ye, Zhijie Deng, Chen Chen, et al. · 2025

The iterative sampling procedure employed by diffusion models (DMs) often leads to significant latency. To address this, we propose Stochastic Consistency Distillation (SCott) to enable accelerated text-to-image generation, where high-qual…

Boosting Image De-Raining via Central-Surrounding Synergistic Convolution Open

Long Peng, Yang Wang, Xin Di, Peizhe Xia, Xueyang Fu, et al. · 2025

Rainy images suffer from quality degradation due to the synergistic effect of rain streaks and accumulation. The rain streaks are anisotropic and show a specific directional arrangement, while the rain accumulation is isotropic and shows a…

EventMamba: Enhancing Spatio-Temporal Locality with State Space Models for Event-Based Video Reconstruction Open

Chengjie Ge, Xueyang Fu, Peng He, Kunyu Wang, Chengzhi Cao, et al. · 2025

Leveraging its robust linear global modeling capability, Mamba has notably excelled in computer vision. Despite its success, existing Mamba-based vision models have overlooked the nuances of event-driven tasks, especially in video reconstr…

HOIMamba: Efficient Mamba-based Disentangled Progressive Learning for HOI Detection Open

Yongchao Xu, Jiawei Liu, Shuman Tao, Qiang Zhang, Zheng-Jun Zha · 2025

Human-object interaction (HOI) detection aims to detect the spatial positions of human-object pairs and recognize their interactions. Existing single-branch, two-branch, and three-branch methods are challenging to make an appropriate trade…

DCTMamba: Advancing JPEG Image Restoration Through Long-Sequence Modeling and Adaptive Frequency Strategy Open

Xi Wang, Xueyang Fu, Liang Li, Zheng-Jun Zha · 2025

Despite the advanced long-sequence modeling of Mamba, which has expanded its applications in image restoration, there remains a lack of exploration combining its strengths with the specific characteristics of JPEG image restoration, where …

SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets Open

Yuhang Yang, Qin Zhao, P.C. Wu, Yang Cao, Zheng-Jun Zha · 2025

3D human digitization has long been a highly pursued yet challenging task. Existing methods aim to generate high-quality 3D digital humans from single or multiple views, but remain primarily constrained by current paradigms and the scarcit…

HERO: Human Reaction Generation from Videos Open

Chengjun Yu, Wei Zhai, Yuhang Yang, Yiling Cao, Zheng-Jun Zha · 2025

Human reaction generation represents a significant research domain for interactive AI, as humans constantly interact with their surroundings. Previous works focus mainly on synthesizing the reactive motion given a human motion sequence. Th…

WeGen: A Unified Model for Interactive Multimodal Generation as We Chat Open

Zhipeng Huang, Shaobin Zhuang, Canmiao Fu, Binxin Yang, Ying Zhang, et al. · 2025

Existing multimodal generative models fall short as qualified design copilots, as they often struggle to generate imaginative outputs once instructions are less detailed or lack the ability to maintain consistency with the provided referen…

PFSD: A Multi-Modal Pedestrian-Focus Scene Dataset for Rich Tasks in Semi-Structured Environments Open

Yuzi Liu, Hanshi Wang, Zheng-Jun Zha, Weiming Hu, Jin Gao · 2025

Recent advancements in autonomous driving perception have revealed exceptional capabilities within structured environments dominated by vehicular traffic. However, current perception models exhibit significant limitations in semi-structure…

Dexmedetomidine ameliorates hepatic ischemia reperfusion injury via modulating SIRT3 mediated mitochondrial quality control Open

Xiaqing Ning, Jilang Tang, Xueqin Li, Jiaqi Wang, Zheng-Jun Zha, et al. · 2025

Ischaemia-reperfusion (IR) damage is an inevitable adverse effect of liver surgery. Recent research has found that IR damage is involved in severe mitochondrial dysfunction. Mitochondrial biosynthesis and dynamics control mitochondrial mas…

Optimizing Large Language Model Training Using FP4 Quantization Open

Ruizhe Wang, Yeyun Gong, Xiao Liu, Guoshuai Zhao, Ziyue Yang, et al. · 2025

The growing computational demands of training large language models (LLMs) necessitate more efficient methods. Quantized training presents a promising solution by enabling low-bit arithmetic operations to reduce these costs. While FP8 prec…

Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration Open

Long Peng, Xin Di, Zhengqian Feng, Wenbo Li, Renjing Pei, et al. · 2025

Image restoration aims to recover details and enhance contrast in degraded images. With the growing demand for high-quality imaging (e.g., 4K and 8K), achieving a balance between restoration quality and computational efficiency ha…

VanGogh: A Unified Multimodal Diffusion-based Framework for Video Colorization Open

Zixun Fang, Zhiheng Liu, Kai Zhu, Yu Liu, Ka Leong Cheng, et al. · 2025

Video colorization aims to transform grayscale videos into vivid color representations while maintaining temporal consistency and structural integrity. Existing video colorization methods often suffer from color bleeding and lack comprehen…

Non-causal selective state space model for image restoration Open

Jie Xiao, Fan Zihao, Dong Li, Xueyang Fu, Zheng-Jun Zha · 2025

RAIN: Real-time Animation of Infinite Video Stream Open

Zhilei Shu, Ruili Feng, Yang Cao, Zheng-Jun Zha · 2024

Live animation has gained immense popularity for enhancing online engagement, yet achieving high-quality, real-time, and stable animation with diffusion models remains challenging, especially on consumer-grade GPUs. Existing methods strugg…

Zheng-Jun Zha