Yutao Yue
ANT: Adaptive Neural Temporal-Aware Text-to-Motion Model
While diffusion models advance text-to-motion generation, their static semantic conditioning ignores temporal-frequency demands: early denoising requires structural semantics for motion foundations while later stages need localized details…
NUMINA: A Natural Understanding Benchmark for Multi-dimensional Intelligence and Numerical Reasoning Abilities
Recent advancements in 2D multimodal large language models (MLLMs) have significantly improved performance in vision-language tasks. However, extending these capabilities to 3D environments remains a distinct challenge due to the complexit…
Text2Weight: Bridging Natural Language and Neural Network Weight Spaces
How far are we really from automatically generating neural networks? While neural network weight generation shows promise, current approaches struggle with generalization to unseen tasks and practical application exploration. To address th…
Da Yu: Towards USV-Based Image Captioning for Waterway Surveillance and Scene Understanding
Automated waterway environment perception is crucial for enabling unmanned surface vessels (USVs) to understand their surroundings and make informed decisions. Most existing waterway perception models primarily focus on instance-level obje…
From High-SNR Radar Signal to ECG: A Transfer Learning Model with Cardio-Focusing Algorithm for Scenarios with Limited Data
Electrocardiogram (ECG), as a crucial fine-grained cardiac feature, has been successfully recovered from radar signals in the literature, but the performance heavily relies on the high-quality radar signal and numerous radar-ECG pairs for …
IMTS is Worth Time $\times$ Channel Patches: Visual Masked Autoencoders for Irregular Multivariate Time Series Prediction
Irregular Multivariate Time Series (IMTS) forecasting is challenging due to the unaligned nature of multi-channel signals and the prevalence of extensive missing data. Existing methods struggle to capture reliable temporal patterns from su…
Robust Hypothesis Generation: LLM-Automated Language Bias for Inductive Logic Programming
Automating robust hypothesis generation in open environments is pivotal for AI cognition. We introduce a novel framework integrating a multi-agent system, powered by Large Language Models (LLMs), with Inductive Logic Programming (ILP). Our…
An Empirical Study of the Anchoring Effect in LLMs: Existence, Mechanism, and Potential Mitigations
The rise of Large Language Models (LLMs) like ChatGPT has advanced natural language processing, yet concerns about cognitive biases are growing. In this paper, we investigate the anchoring effect, a cognitive bias where the mind relies hea…
Cognitive Disentanglement for Referring Multi-Object Tracking
As a significant application of multi-source information fusion in intelligent transportation perception systems, Referring Multi-Object Tracking (RMOT) involves localizing and tracking specific objects in video sequences based on language…
Talk2PC: Enhancing 3D Visual Grounding through LiDAR and Radar Point Clouds Fusion for Autonomous Driving
Embodied outdoor scene understanding forms the foundation for autonomous agents to perceive, analyze, and react to dynamic driving environments. However, existing 3D understanding is predominantly based on 2D Vision-Language Models (VLMs),…
BioD2C: A Dual-level Semantic Consistency Constraint Framework for Biomedical VQA
Biomedical visual question answering (VQA) has been widely studied and has demonstrated significant application value and potential in fields such as assistive medical diagnosis. Despite their success, current biomedical VQA models perform…
Physics-Informed Representation Alignment for Sparse Radio-Map Reconstruction
Radio map reconstruction is essential for enabling advanced applications, yet challenges such as complex signal propagation and sparse observational data hinder accurate reconstruction in practical scenarios. Existing methods often fail to…
Free-T2M: Robust Text-to-Motion Generation for Humanoid Robots via Frequency-Domain
Enabling humanoid robots to synthesize complex, physically coherent motions from natural language commands is a cornerstone of autonomous robotics and human-robot interaction. While diffusion models have shown promise in this text-to-motio…
RadarNeXt: Real-Time and Reliable 3D Object Detector Based On 4D mmWave Imaging Radar
3D object detection is crucial for Autonomous Driving (AD) and Advanced Driver Assistance Systems (ADAS). However, most 3D detectors prioritize detection accuracy, often overlooking network inference speed in practical applications. In thi…
RePaIR: Repaired pruning at initialization resilience
Over the past decade, the size of neural network models has gradually increased in both breadth and depth, leading to a growing interest in the application of neural network pruning. Unstructured pruning provides fine-grained sparsity and …
CFSSeg: Closed-Form Solution for Class-Incremental Semantic Segmentation of 2D Images and 3D Point Clouds
2D images and 3D point clouds are foundational data types for multimedia applications, including real-time video analysis, augmented reality (AR), and 3D scene understanding. Class-incremental semantic segmentation (CSS) requires increment…
Guarding the Gate: ConceptGuard Battles Concept-Level Backdoors in Concept Bottleneck Models
The increasing complexity of AI models, especially in deep learning, has raised concerns about transparency and accountability, particularly in high-stakes applications like medical diagnostics, where opaque models can undermine trust. Exp…
Learning New Concepts, Remembering the Old: Continual Learning for Multimodal Concept Bottleneck Models
Concept Bottleneck Models (CBMs) enhance the interpretability of AI systems, particularly by bridging visual input with human-understandable concepts, effectively acting as a form of multimodal interpretability model. However, existing CBM…
Beyond Task Vectors: Selective Task Arithmetic Based on Importance Metrics
Pretrained models have revolutionized deep learning by enabling significant performance improvements across a wide range of tasks, leveraging large-scale, pre-learned knowledge representations. However, deploying these models in real-world…
DRIVE: Dual-Robustness via Information Variability and Entropic Consistency in Source-Free Unsupervised Domain Adaptation
Adapting machine learning models to new domains without labeled data, especially when source data is inaccessible, is a critical challenge in applications like medical imaging, autonomous driving, and remote sensing. This task, known as So…
Maintaining Informative Coherence: Migrating Hallucinations in Large Language Models via Absorbing Markov Chains
Large Language Models (LLMs) are powerful tools for text generation, translation, and summarization, but they often suffer from hallucinations, instances where they fail to maintain the fidelity and coherence of contextual information durin…
radarODE-MTL: A Multi-Task Learning Framework with Eccentric Gradient Alignment for Robust Radar-Based ECG Reconstruction
Millimeter-wave radar shows promise for robust, accurate, and unobtrusive vital sign monitoring. However, the radar signal may be distorted during propagation by ambient noise or random body movement, ruining the subtle card…
Wolf2Pack: The AutoFusion Framework for Dynamic Parameter Fusion
In the rapidly evolving field of deep learning, specialized models have driven significant advancements in tasks such as computer vision and natural language processing. However, this specialization leads to a fragmented ecosystem where mo…
MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models
In recent years, multimodal large language models (MLLMs) have significantly advanced, integrating more modalities into diverse applications. However, the lack of explainability remains a major barrier to their use in scenarios requiring d…
CAT: Concept-level backdoor ATtacks for Concept Bottleneck Models
Despite the transformative impact of deep learning across multiple domains, the inherent opacity of these models has driven the development of Explainable Artificial Intelligence (XAI). Among these efforts, Concept Bottleneck Models (CBMs)…
UniBEVFusion: Unified Radar-Vision BEVFusion for 3D Object Detection
4D millimeter-wave (MMW) radar, which provides both height information and dense point cloud data over 3D MMW radar, has become increasingly popular in 3D object detection. In recent years, radar-vision fusion models have demonstrated perf…
DRIVE: Dependable Robust Interpretable Visionary Ensemble Framework in Autonomous Driving
Recent advancements in autonomous driving have seen a shift towards end-to-end learning paradigms, which map sensory inputs directly to driving actions, thereby enhancing the robustness and adaptability of autonomous vehicles. How…
PEPL: Precision-Enhanced Pseudo-Labeling for Fine-Grained Image Classification in Semi-Supervised Learning
Fine-grained image classification has witnessed significant advancements with the advent of deep learning and computer vision technologies. However, the scarcity of detailed annotations remains a major challenge, especially in scenarios wh…
Structural, magnetic, and electrical properties of LaMnO$_3$ doped with Na by microwave-assisted synthesis method
The electric and magnetic transport properties of La$_{1-x}$Na$_x$MnO$_3$ (x = 0.15, 0.2, 0.25, 0.3) are studied using X-ray diffraction (XRD), a vibrating sample magnetometer (VSM), and the four-probe resistivity measurement method. The samples a…
NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar
Recently, visual grounding and multi-sensor settings have been incorporated into perception systems for terrestrial autonomous driving systems and Unmanned Surface Vehicles (USVs), yet the high complexity of modern learning-based visual gro…