Xiu Li
ICE: Intercede Concept Erasure in Text-to-Image Diffusion Models
A Motion is Worth a Hybrid Sentence: Taming Language Model for Unified Motion Generation by Fine-grained Planning
ASPO: Asymmetric Importance Sampling Policy Optimization
Recent Large Language Model (LLM) post-training methods rely on token-level clipping mechanisms during Reinforcement Learning (RL). However, we identify a fundamental flaw in this Outcome-Supervised RL (OSRL) paradigm: the Importance Sampl…
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models
Reinforcement Learning (RL) has shown remarkable success in enhancing the reasoning capabilities of Large Language Models (LLMs). Process-Supervised RL (PSRL) has emerged as a more effective paradigm compared to outcome-based RL. However, …
Mycobacterium tuberculosis infection status and associated factors among household close contacts of rifampicin-resistant pulmonary tuberculosis patients: A single-center cross-sectional study
Reversible Authentication Watermarking Based on Improved 2D Histogram and Adaptive Difference Expansion
To address the limitations of low authentication accuracy and ineffective protection for complex-texture images/regions in existing reversible schemes, an improved algorithm based on a two-dimensional (2D) histogram and difference expansion …
Enhancing Online Video Recommendation via a Coarse-to-fine Dynamic Uplift Modeling Framework
Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance
Vision-Language-Action (VLA) models pre-trained on large, diverse datasets show remarkable potential for general-purpose robotic manipulation. However, a primary bottleneck remains in adapting these models to downstream tasks, especially w…
One policy to rule them all: Handling multiple emergent accidents in nuclear power plants with ensemble-based behavior cloning
Development of a multi-indicator risk prediction model for cervical cancer associated with benzo[a]pyrene and nicotine exposure: A multi-omics study integrating toxicological analyses and molecular docking
Risk prediction models based on multi-omics data and machine learning algorithms provide potential reference targets for prognosis prediction and personalised treatment of cervical cancer patients. The results of this study provide importa…
S$^2$-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models
Classifier-free Guidance (CFG) is a widely used technique in modern diffusion models for enhancing sample quality and prompt adherence. However, through an empirical analysis on Gaussian mixture modeling with a closed-form solution, we obs…
X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention
We propose X-NeMo, a novel zero-shot diffusion-based portrait animation pipeline that animates a static portrait using facial movements from a driving video of a different individual. Our work first identifies the root causes of the key is…
Advancing Financial Engineering with Foundation Models: Progress, Applications, and Challenges
The advent of foundation models (FMs) - large-scale pre-trained models with strong generalization capabilities - has opened new frontiers for financial engineering. While general-purpose FMs such as GPT-4 and Gemini have demonstrated promi…
Segment Concealed Objects With Incomplete Supervision
Incompletely-Supervised Concealed Object Segmentation (ISCOS) involves segmenting objects that seamlessly blend into their surrounding environments, utilizing incompletely annotated data, such as weak and semi-annotations, for model traini…
SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning
Leveraging multimodal large models for image segmentation has become a prominent research direction. However, existing approaches typically rely heavily on manually annotated datasets that include explicit reasoning processes, which are co…
Large language model-based multimodal system for detecting and grading ocular surface diseases from smartphone images
Background: The development of medical artificial intelligence (AI) models is primarily driven by the need to address healthcare resource scarcity, particularly in underserved regions. Proposing an affordable, accessible, interpretable, and…
CreativeSynth: Cross-Art-Attention for Artistic Image Synthesis With Multimodal Diffusion
Although remarkable progress has been made in image style transfer, style is just one of the components of artistic paintings. Directly transferring extracted style features to natural images often results in outputs with obvious synthetic…
InterAnimate: Taming Region-aware Diffusion Model for Realistic Human Interaction Animation
Recent video generation research has focused heavily on isolated actions, leaving interactive motions, such as hand-face interactions, largely unexamined. These interactions are essential for emerging biometric authentication systems, which …
Separate to Collaborate: Dual-Stream Diffusion Model for Coordinated Piano Hand Motion Synthesis
Automating the synthesis of coordinated bimanual piano performances poses significant challenges, particularly in capturing the intricate choreography between the hands while preserving their distinct kinematic signatures. In this paper, w…
Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation
In the domain of computer vision, Parameter-Efficient Tuning (PET) is increasingly replacing the traditional paradigm of pre-training followed by full fine-tuning. PET is particularly favored for its effectiveness in large foundation model…
Combinatorial Optimization Perspective based Framework for Multi-behavior Recommendation
IQPFR: An Image Quality Prior for Blind Face Restoration and Beyond
Blind Face Restoration (BFR) addresses the challenge of reconstructing degraded low-quality (LQ) facial images into high-quality (HQ) outputs. Conventional approaches predominantly rely on learning feature representations from ground-truth…
Multi-Omics Analysis Revealed That TAOK1 Can Be Used as a Prognostic Marker and Target in a Variety of Tumors, Especially in Cervical Cancer
TAOK1 serves as a promising prognostic biomarker and potential therapeutic target, especially for cervical cancer. These results support its clinical potential in cancer prognosis and treatment strategies.
LETSmix: a spatially informed and learning-based domain adaptation method for cell-type deconvolution in spatial transcriptomics
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
Multi-layer image generation is a fundamental task that enables users to isolate, select, and edit specific image layers, thereby revolutionizing interactions with generative models. In this paper, we introduce the Anonymous Region Transfo…
Diffusion Models in Low-Level Vision: A Survey
Deep generative models have gained considerable attention in low-level vision tasks due to their powerful generative capabilities. Among these, diffusion model-based approaches, which employ a forward diffusion process to degrade an image …
Integrating Extra Modality Helps Segmentor Find Camouflaged Objects Well
Camouflaged Object Segmentation (COS) remains challenging because camouflaged objects exhibit only subtle visual differences from their backgrounds and single-modality RGB methods provide limited cues, leading researchers to explore multim…
VLP: Vision-Language Preference Learning for Embodied Manipulation
Reward engineering is one of the key challenges in Reinforcement Learning (RL). Preference-based RL effectively addresses this issue by learning from human feedback. However, it is both time-consuming and expensive to collect human prefere…
STViT+: improving self-supervised multi-camera depth estimation with spatial-temporal context and adversarial geometry regularization
Multi-camera depth estimation has gained significant attention in autonomous driving due to its importance in perceiving complex environments. However, extending monocular self-supervised methods to multi-camera setups introduces unique ch…