Xiangmin Xu
MimicParts: Part-aware Style Injection for Speech-Driven 3D Motion Generation
Generating stylized 3D human motion from speech signals presents substantial challenges, primarily due to the intricate and fine-grained relationships among speech signals, individual styles, and the corresponding body movements. Current s…
CATCH: A Novel Data Synthesis Framework for High Therapy Fidelity and Memory-Driven Planning Chain of Thought in AI Counseling
Recently, advancements in AI counseling based on large language models have shown significant progress. However, existing studies employ a one-time generation approach to synthesize multi-turn dialogue samples, resulting in low therapy fid…
HVI-Based Spatial–Frequency-Domain Multi-Scale Fusion for Low-Light Image Enhancement
Low-light image enhancement aims to restore images captured under extreme low-light conditions. Existing methods demonstrate that fusing Fourier transform magnitude and phase information within the RGB color space effectively improves enha…
HD-PPT: Hierarchical Decoding of Content- and Prompt-Preference Tokens for Instruction-based TTS
Large Language Model (LLM)-based Text-to-Speech (TTS) models have already reached a high degree of naturalness. However, the precision control of TTS inference is still challenging. Although instruction-based Text-to-Speech (Instruct-TTS) …
Road Surface State Change Detection Based on Binocular Vision for Autonomous Driving System
Road surface condition monitoring is crucial for enhancing transportation safety and efficiency, with applications in autonomous driving and urban infrastructure management. Existing methods often rely on single-camera setups or manual ins…
New “Quality” Driving New “Governance”: A Dual-Driven Approach to Rural Digital Governance from the Perspective of Adapting Production Relations
The innovative concept of “new quality productive forces” highlights technological innovation as the core driving force behind high-quality economic development. With the increasing empowerment of modern information technologies such as di…
Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection
Open vocabulary Human-Object Interaction (HOI) detection is a challenging task that detects all triplets of interest in an image, even those that are not pre-defined in the training set. Existing approaches typically rely on output feature…
S2SBench: A Benchmark for Quantifying Intelligence Degradation in Speech-to-Speech Large Language Models
End-to-end speech large language models (LLMs) extend the capabilities of text-based models to directly process and generate audio tokens. However, this often leads to a decline in reasoning and generation performance compared to text in…
HedgeAgents: A Balanced-aware Multi-agent Financial Trading System
Early autism diagnosis based on path signature and Siamese unsupervised feature compressor
Autism spectrum disorder has been emerging as a growing public health threat. Early diagnosis of autism spectrum disorder is crucial for timely, effective intervention and treatment. However, conventional diagnosis methods based on communi…
Revitalising Aging Oocytes: Echinacoside Restores Mitochondrial Function and Cellular Homeostasis Through Targeting GJA1/SIRT1 Pathway
As maternal age increases, the decline in oocyte quality emerges as a critical factor contributing to reduced reproductive capacity, highlighting the urgent need for effective strategies to combat oocyte aging. This study investigated the …
Transient Synchronization Stability in Grid-Following Converters: Mechanistic Insights and Technological Prospects—A Review
This paper investigates the transient synchronization stability mechanisms and technological advancements associated with grid-following (GFL) converters, providing a systematic review of the current research landscape and future direction…
Ion Transport Mechanism in the Sub-Nano Channels of Edge-Capping Modified Transition Metal Carbides/Nitride Membranes
Edge-capping modified MXene membranes with new channels created by lateral nanosheets are of great research significance. After introducing tripolyphosphate (STPP) to Ti edges of Ti₃C₂Tₓ nanosheets and fabricating the STPP-MXene membranes …
Wearable fall risk assessment by discriminating recessive weak foot individual
Background: Sensor-based technologies have been widely used in fall risk assessment. To enhance the model's robustness and reliability, it is crucial to analyze and discuss the factors contributing to the misclassification of certain indivi…
Spatial profiling of the interplay between cell type- and vision-dependent transcriptomic programs in the visual cortex
How early sensory experience during “critical periods” of postnatal life affects the organization of the mammalian neocortex at the resolution of neuronal cell types is poorly understood. We previously reported that the functional and mole…
Uncertainty-Aware Cross Entropy for Robust Learning with Noisy Labels
PsyDT: Using LLMs to Construct the Digital Twin of Psychological Counselor with Personalized Counseling Style for Psychological Counseling
Currently, large language models (LLMs) have made significant progress in the field of psychological counseling. However, existing mental health LLMs overlook a critical issue where they do not consider the fact that different psychologica…
ViTGaze: gaze following with interaction features in vision transformers
Gaze following aims to interpret human-scene interactions by predicting the person’s focal point of gaze. Prevailing approaches often adopt a two-stage framework, whereby multi-modality information is extracted in the initial stage for gaz…
Multi-Scale Temporal Transformer For Speech Emotion Recognition
Speech emotion recognition plays a crucial role in human-machine interaction systems. Recently various optimized Transformers have been successfully applied to speech emotion recognition. However, the existing Transformer architectures foc…
Online Multi-level Contrastive Representation Distillation for Cross-Subject fNIRS Emotion Recognition
Utilizing functional near-infrared spectroscopy (fNIRS) signals for emotion recognition is a significant advancement in understanding human emotions. However, due to the lack of artificial intelligence data and algorithms in this field, cu…
The Application of Blockchain Technology in the Financial Field
The advent of the digital age has made innovative technologies exceptionally important, and many research institutions and businesses are continuously increasing their investments in the field of new digital technologies. Blockchain, as one of…
RetrievalMMT: Retrieval-Constrained Multi-Modal Prompt Learning for Multi-Modal Machine Translation
As an extension of machine translation, the primary objective of multi-modal machine translation is to optimize the utilization of visual information. Technically, image information is integrated into multi-modal fusion and alignment as an…
Disentangled Pre-training for Human-Object Interaction Detection
Detecting human-object interaction (HOI) has long been limited by the amount of supervised data available. Recent approaches address this issue by pre-training according to pseudo-labels, which align object regions with HOI triplets parsed…
Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On
Image-based virtual try-on is an increasingly important task for online shopping. It aims to synthesize images of a specific person wearing a specified garment. Diffusion model-based approaches have recently become popular, as they are exc…
Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset
We introduce HARPER, a novel dataset for 3D body pose estimation and forecasting in dyadic interactions between users and Spot, the quadruped robot manufactured by Boston Dynamics. The key novelty is the focus on the robot's perspective, i.e.…
ViTGaze: Gaze Following with Interaction Features in Vision Transformers
Gaze following aims to interpret human-scene interactions by predicting the person's focal point of gaze. Prevailing approaches often adopt a two-stage framework, whereby multi-modality information is extracted in the initial stage for gaz…
A joint brain extraction and image quality assessment framework for fetal brain MRI slices
Brain extraction and image quality assessment are two fundamental steps in fetal brain magnetic resonance imaging (MRI) 3D reconstruction and quantification. However, the randomness of fetal position and orientation, the variability of fet…
A novel double-sided fabric strain sensor array fabricated with a facile and cost-effective process
Electronic textiles face challenges in fabricating stretchable, double-sided circuits with reliable interfaces. In this study, a double-sided strain sensor array was designed and prepared on an elastic fabric substrate by printing the sensi…