Yonghong Tian
Revealing nanostructures in high-entropy alloys via machine-learning accelerated scalable Monte Carlo simulation Open
First-principles Monte Carlo (MC) simulations at finite temperatures are computationally prohibitive for large systems due to the high cost of quantum calculations and poor parallelizability of sequential Markov chains in MC algorithms. We…
A Self-Ensemble Inspired Approach for Effective Training of Binary-Weight Spiking Neural Networks Open
Spiking Neural Networks (SNNs) are a promising approach to low-power applications on neuromorphic hardware due to their energy efficiency. However, training SNNs is challenging because of the non-differentiable spike generation function. T…
SGEMM-cube: Emulating FP32 GEMM on Ascend NPUs Using FP16 Cube Units with Precision Recovery Open
Low-precision matrix engines, such as FP16 cube, offer high throughput but lack support for full-precision computation. In this work, we propose SGEMM-cube, a high-performance algorithm for emulating FP32 general matrix-matrix multiplicati…
Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations Open
While large language models (LLMs) with Chain-of-Thought (CoT) reasoning excel in mathematics and coding, their potential for systematic reasoning in chemistry, a domain demanding rigorous structural analysis for real-world tasks like drug…
Multi-Timescale Motion-Decoupled Spiking Transformer for Audio-Visual Zero-Shot Learning Open
Audio-visual zero-shot learning (ZSL) has been extensively researched for its capability to classify video data from unseen classes during training. Nevertheless, current methodologies often struggle with background scene biases and inadeq…
GS2E: Gaussian Splatting is an Effective Data Generator for Event Stream Generation Open
We introduce GS2E (Gaussian Splatting to Event), a large-scale synthetic event dataset for high-fidelity event vision tasks, captured from real-world sparse multi-view RGB images. Existing event datasets are often synthesized from dense RG…
Dynamic Graph Induced Contour-aware Heat Conduction Network for Event-based Object Detection Open
Event-based Vision Sensors (EVS) have demonstrated significant advantages over traditional RGB frame-based cameras in low-light conditions, high-speed motion capture, and low latency. Consequently, object detection based on EVS has attract…
Towards Low-Latency Event Stream-based Visual Object Tracking: A Slow-Fast Approach Open
Existing tracking algorithms typically rely on low-frame-rate RGB cameras coupled with computationally intensive deep neural network architectures to achieve effective tracking. However, such frame-based methods inherently face challenges …
SAKPE: A Site Attention Kinetic Parameters Prediction Method for Enzyme Engineering Open
The quantitative determination of enzyme kinetic parameters traditionally relies on experimental methods that are both time-intensive and costly. Machine learning models have demonstrated significant potential for predicting en…
Toward general object search in open reality Open
Real-world scenarios are inherently dynamic and open-ended, necessitating that current deep models adapt to general objects in open realities to be practically useful. In this paper, we extend a valuable computer vision task called Genera…
Retina-Inspired Models Enhance Visual Saliency Prediction Open
Biologically inspired retinal preprocessing improves visual perception by efficiently encoding and reducing entropy in images. In this study, we introduce a new saliency prediction framework that combines a retinal model with deep neural n…
AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scenes Open
Compared to frame-based methods, computational neuromorphic imaging using event cameras offers significant advantages, such as minimal motion blur, enhanced temporal resolution, and high dynamic range. The multi-view consistency of Neural …
Visual Reinforcement Learning with Residual Action Open
Learning control policy from continuous action space by visual observations is a fundamental and challenging task in reinforcement learning (RL). An essential problem is how to accurately map the high-dimensional images to the optimal acti…
How to Detect and Defeat Molecular Mirage: A Metric-Driven Benchmark for Hallucination in LLM-based Molecular Comprehension Open
Large language models are increasingly used in scientific domains, especially for molecular understanding and analysis. However, existing models are affected by hallucination issues, resulting in errors in drug design and utilization. In t…
Content-Distortion High-Order Interaction for Blind Image Quality Assessment Open
The content and distortion are widely recognized as the two primary factors affecting the visual quality of an image. While existing No-Reference Image Quality Assessment (NR-IQA) methods have modeled these factors, they fail to capture th…
ShiftLIC: Lightweight Learned Image Compression with Spatial-Channel Shift Operations Open
Learned Image Compression (LIC) has attracted considerable attention due to its outstanding rate-distortion (R-D) performance and flexibility. However, the substantial computational cost poses challenges for practical deployment. The iss…
Bridging human emotion processing and deep neural networks: insights from representational similarity analysis Open
Emotion is a complex psychophysiological response to external stimuli, essential for human survival, social interaction, and human-computer interaction. Emotion recognition plays a critical role in both biological systems and artificial ag…
Revealing Nanostructures in High-Entropy Alloys via Machine-Learning Accelerated Scalable Monte Carlo Simulation Open
The computational cost of traditional first-principles methods quickly becomes prohibitively expensive as the number of atoms increases. This challenge is further amplified by the need to evaluate finite-temperature properties with Monte Ca…
Delta-Triplane Transformers as Occupancy World Models Open
Occupancy World Models (OWMs) aim to predict future scenes via 3D voxelized representations of the environment to support intelligent motion planning. Existing approaches typically generate full future occupancy states from VAE-style laten…
Deep neural networks and fractional grey lag Goose optimization for music genre identification Open
Magic 1-For-1: Generating One Minute Video Clips within One Minute Open
In this technical report, we present Magic 1-For-1 (Magic141), an efficient video generation model with optimized memory consumption and inference latency. The key idea is simple: factorize the text-to-video generation task into two separa…
Navigating Chemical-Linguistic Sharing Space with Heterogeneous Molecular Encoding Open
Chemical language models (CLMs) are prominent for their effectiveness in exploring chemical space and enabling molecular design and engineering. However, while exploring chemical-linguistic space, CLMs suffer from the semantic gap between …
From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors via LLM-guided Symbolic Reasoning Open
Current object detectors excel at entity localization and classification, yet exhibit inherent limitations in event recognition capabilities. This deficiency arises from their architecture's emphasis on discrete object identification rathe…
Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark Open
We then introduce a novel hierarchical knowledge distillation strategy that incorporates the similarity matrix, feature representation, and response map-based distillation to guide the learning of the student Transformer network. We also e…
Multiplication-Free Parallelizable Spiking Neurons with Efficient Spatio-Temporal Dynamics Open
Spiking Neural Networks (SNNs) are distinguished from Artificial Neural Networks (ANNs) for their complex neuronal dynamics and sparse binary activations (spikes) inspired by the biological neural system. Traditional neuron models use iter…
Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation Open
X-ray image-based medical report generation has achieved significant progress in recent years with the help of large language models. However, these models have not fully exploited the effective information in visual image regions, resultin…
AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scene Open
Compared to frame-based methods, computational neuromorphic imaging using event cameras offers significant advantages, such as minimal motion blur, enhanced temporal resolution, and high dynamic range. The multi-view consistency of Neural …
Efficiently Training Time-to-First-Spike Spiking Neural Networks from Scratch Open
VELoRA: A Low-Rank Adaptation Approach for Efficient RGB-Event based Recognition Open
Pattern recognition leveraging both RGB and Event cameras can significantly enhance performance by deploying deep neural networks that utilize a fine-tuning strategy. Inspired by the successful application of large models, the introduction…