Explanipedia

Revealing nanostructures in high-entropy alloys via machine-learning accelerated scalable Monte Carlo simulation Open

Xianglin Liu, Kai Yang, Yongxiang Liu, Fei Zhou, Dengdong Fan, et al. · 2025

First-principles Monte Carlo (MC) simulations at finite temperatures are computationally prohibitive for large systems due to the high cost of quantum calculations and poor parallelizability of sequential Markov chains in MC algorithms. We…

A Self-Ensemble Inspired Approach for Effective Training of Binary-Weight Spiking Neural Networks Open

Qingyan Meng, Mingqing Xiao, Zhengyu Ma, Huihui Zhou, Yonghong Tian, et al. · 2025

Spiking Neural Networks (SNNs) are a promising approach to low-power applications on neuromorphic hardware due to their energy efficiency. However, training SNNs is challenging because of the non-differentiable spike generation function. T…

SGEMM-cube: Emulating FP32 GEMM on Ascend NPUs Using FP16 Cube Units with Precision Recovery Open

Weicheng Xue, B. D. Xu, Kai Yang, Yongxiang Liu, Deyuan Fan, et al. · 2025

Low-precision matrix engines, such as FP16 cube, offer high throughput but lack support for full-precision computation. In this work, we propose SGEMM-cube, a high-performance algorithm for emulating FP32 general matrix-matrix multiplicati…

Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations Open

Hao Li, He Cao, Bin Feng, Yanjun Shao, Xiangru Tang, et al. · 2025

While large language models (LLMs) with Chain-of-Thought (CoT) reasoning excel in mathematics and coding, their potential for systematic reasoning in chemistry, a domain demanding rigorous structural analysis for real-world tasks like drug…

Multi-Timescale Motion-Decoupled Spiking Transformer for Audio-Visual Zero-Shot Learning Open

Wenrui Li, Penghong Wang, Xingtao Wang, Wangmeng Zuo, Xiaopeng Fan, et al. · 2025

Audio-visual zero-shot learning (ZSL) has been extensively researched for its capability to classify video data from unseen classes during training. Nevertheless, current methodologies often struggle with background scene biases and inadeq…

GS2E: Gaussian Splatting is an Effective Data Generator for Event Stream Generation Open

Yuchen Li, Chaoran Feng, Zhenyu Tang, Kaiyuan Deng, Wangbo Yu, et al. · 2025

We introduce GS2E (Gaussian Splatting to Event), a large-scale synthetic event dataset for high-fidelity event vision tasks, captured from real-world sparse multi-view RGB images. Existing event datasets are often synthesized from dense RG…

Dynamic Graph Induced Contour-aware Heat Conduction Network for Event-based Object Detection Open

Xiao Wang, Yu Jin, Lan Chen, Bo Jiang, Lin Zhu, et al. · 2025

Event-based Vision Sensors (EVS) have demonstrated significant advantages over traditional RGB frame-based cameras in low-light conditions, high-speed motion capture, and low latency. Consequently, object detection based on EVS has attract…

Towards Low-Latency Event Stream-based Visual Object Tracking: A Slow-Fast Approach Open

Shiao Wang, Xiaoping Wang, Liye Jin, Bo Jiang, Lin Zhu, et al. · 2025

Existing tracking algorithms typically rely on low-frame-rate RGB cameras coupled with computationally intensive deep neural network architectures to achieve effective tracking. However, such frame-based methods inherently face challenges …

SAKPE: A Site Attention Kinetic Parameters Prediction Method for Enzyme Engineering Open

Jiuchun Qiu, Zhaoxian Lin, Kewei Chen, Tian‐Yu Sun, Xian Zhang, et al. · 2025

The quantitative determination of enzyme kinetic parameters traditionally relies on experimental methods that are both time-intensive and costly. Machine learning models have demonstrated significant potential for predicting en…

Toward general object search in open reality Open

Gang Shen, Wenjun Ma, Guangyao Chen, Yonghong Tian · 2025

Real-world scenarios are inherently dynamic and open-ended, necessitating that current deep models adapt to general objects in open realities to be practically useful. In this paper, we extend a valuable computer vision task called Genera…

Retina-Inspired Models Enhance Visual Saliency Prediction Open

Gang Shen, Wenjun Ma, Wenguang Zhai, Xuefei Lv, Guangyao Chen, et al. · 2025

Biologically inspired retinal preprocessing improves visual perception by efficiently encoding and reducing entropy in images. In this study, we introduce a new saliency prediction framework that combines a retinal model with deep neural n…

AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scenes Open

Chaoran Feng, Wangbo Yu, Xinhua Cheng, Zhenyu Tang, Junwu Zhang, et al. · 2025

Compared to frame-based methods, computational neuromorphic imaging using event cameras offers significant advantages, such as minimal motion blur, enhanced temporal resolution, and high dynamic range. The multi-view consistency of Neural …

Visual Reinforcement Learning with Residual Action Open

Zhenxian Liu, Peixi Peng, Yonghong Tian · 2025

Learning a control policy in a continuous action space from visual observations is a fundamental and challenging task in reinforcement learning (RL). An essential problem is how to accurately map the high-dimensional images to the optimal acti…

How to Detect and Defeat Molecular Mirage: A Metric-Driven Benchmark for Hallucination in LLM-based Molecular Comprehension Open

Hao Li, Liuzhenghao Lv, He Cao, Zijing Liu, Yu Wang, et al. · 2025

Large language models are increasingly used in scientific domains, especially for molecular understanding and analysis. However, existing models are affected by hallucination issues, resulting in errors in drug design and utilization. In t…

Content-Distortion High-Order Interaction for Blind Image Quality Assessment Open

Qingyu Mao, Jiacong Chen, Yonghong Tian, Yongsheng Liang · 2025

Content and distortion are widely recognized as the two primary factors affecting the visual quality of an image. While existing No-Reference Image Quality Assessment (NR-IQA) methods have modeled these factors, they fail to capture th…

ShiftLIC: Lightweight Learned Image Compression with Spatial-Channel Shift Operations Open

Wen Tan, Chuanmin Jia, Mu Li, Yongsheng Liang, Yonghong Tian · 2025

Learned Image Compression (LIC) has attracted considerable attention due to its outstanding rate-distortion (R-D) performance and flexibility. However, the substantial computational cost poses challenges for practical deployment. The iss…

Bridging human emotion processing and deep neural networks: insights from representational similarity analysis Open

Lu Nie, Ke Chen, Ting Li, Yonghong Tian, Yixuan Ku · 2025

Emotion is a complex psychophysiological response to external stimuli, essential for human survival, social interaction, and human-computer interaction. Emotion recognition plays a critical role in both biological systems and artificial ag…

Revealing Nanostructures in High-Entropy Alloys via Machine-Learning Accelerated Scalable Monte Carlo Simulation Open

Xianglin Liu, Kai Yang, Yongxiang Liu, Fei Zhou, Dengdong Fan, et al. · 2025

The computational cost of traditional first-principles methods quickly becomes prohibitive as the number of atoms increases. This challenge is further amplified by the need to evaluate finite-temperature properties with Monte Ca…

Delta-Triplane Transformers as Occupancy World Models Open

Haoran Xu, Peixi Peng, Guang Tan, Yang-Lang Chang, Yulong Zhao, et al. · 2025

Occupancy World Models (OWMs) aim to predict future scenes via 3D voxelized representations of the environment to support intelligent motion planning. Existing approaches typically generate full future occupancy states from VAE-style laten…

Deep neural networks and fractional grey lag Goose optimization for music genre identification Open

Yonghong Tian · 2025

Magic 1-For-1: Generating One Minute Video Clips within One Minute Open

Hongwei Yi, Shitong Shao, Ye Tian, J. Y. Zhao, Qing‐Zhu Yin, et al. · 2025

In this technical report, we present Magic 1-For-1 (Magic141), an efficient video generation model with optimized memory consumption and inference latency. The key idea is simple: factorize the text-to-video generation task into two separa…

Navigating Chemical-Linguistic Sharing Space with Heterogeneous Molecular Encoding Open

Liangyu Lv, Hao Li, Yu Wang, Zhiyuan Yan, Zijun Chen, et al. · 2025

Chemical language models (CLMs) are prominent for their effectiveness in exploring chemical space and enabling molecular design and engineering. However, while exploring chemical-linguistic space, CLMs suffer from the semantic gap between …

From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors via LLM-guided Symbolic Reasoning Open

Yuhui Zeng, Haoxiang Wu, Wenjie Nie, Guangyao Chen, Xiawu Zheng, et al. · 2025

Current object detectors excel at entity localization and classification, yet exhibit inherent limitations in event recognition capabilities. This deficiency arises from their architecture's emphasis on discrete object identification rathe…

Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark Open

Shiao Wang, Xiaoping Wang, Chao Wang, Liye Jin, Lin Zhu, et al. · 2025

We then introduce a novel hierarchical knowledge distillation strategy that incorporates the similarity matrix, feature representation, and response map-based distillation to guide the learning of the student Transformer network. We also e…

Multiplication-Free Parallelizable Spiking Neurons with Efficient Spatio-Temporal Dynamics Open

Peng Xue, Fang Wei, Zhenqiang Ma, Zihan Huang, Zhaokun Zhou, et al. · 2025

Spiking Neural Networks (SNNs) are distinguished from Artificial Neural Networks (ANNs) for their complex neuronal dynamics and sparse binary activations (spikes) inspired by the biological neural system. Traditional neuron models use iter…

Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation Open

Xiao Wang, Fuling Wang, Haowen Wang, Bo Jiang, Chuanfu Li, et al. · 2025

X-ray image-based medical report generation has achieved significant progress in recent years with the help of large language models. However, these models have not fully exploited the effective information in visual image regions, resultin…

AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scene Open

Chaoran Feng, Wangbo Yu, Xinhua Cheng, Zhenyu Tang, Junwu Zhang, et al. · 2025

Compared to frame-based methods, computational neuromorphic imaging using event cameras offers significant advantages, such as minimal motion blur, enhanced temporal resolution, and high dynamic range. The multi-view consistency of Neural …

VELoRA: A Low-Rank Adaptation Approach for Efficient RGB-Event based Recognition Open

Chen Lan, Haoxiang Yang, Pengpeng Shao, Haoyu Song, Xiao Wang, et al. · 2025

Efficiently Training Time-to-First-Spike Spiking Neural Networks from Scratch Open

Kaiwei Che, Wei Fang, Zhengyu Ma, Yifan Huang, Timothée Masquelier, et al. · 2025

VELoRA: A Low-Rank Adaptation Approach for Efficient RGB-Event based Recognition Open

Chen Lan, Haoxiang Yang, Pengpeng Shao, Haoyu Song, Xiao Wang, et al. · 2024

Pattern recognition leveraging both RGB and Event cameras can significantly enhance performance by deploying deep neural networks that utilize a fine-tuning strategy. Inspired by the successful application of large models, the introduction…

Yonghong Tian