Explanipedia

RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models Open

Hongyin Zhang, Shuo Zhang, Jun Jin, Qingyi Zeng, Bin Jiang, et al. · 2025

Vision-Language-Action (VLA) models have recently emerged as powerful general-purpose policies for robotic manipulation, benefiting from large-scale multi-modal pre-training. However, they often fail to generalize reliably in out-of-distri…

VCoT-Grasp: Grasp Foundation Models with Visual Chain-of-Thought Reasoning for Language-driven Grasp Generation Open

Ranran Haoran Zhang, Shuanghao Bai, Wanqi Zhou, Yu Zhang, Qi Zhang, et al. · 2025

Robotic grasping is one of the most fundamental tasks in robotic manipulation, and grasp detection/generation has long been the subject of extensive research. Recently, language-driven grasp generation has emerged as a promising direction …

VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators Open

Hanyang Li, Pengxiang Ding, Runze Suo, Yihao Wang, Zirui Ge, et al. · 2025

Vision-Language-Action (VLA) models enable embodied decision-making but rely heavily on imitation learning, leading to compounding errors and poor robustness under distribution shift. Reinforcement learning (RL) can mitigate these issues y…

Robust Online Residual Refinement via Koopman-Guided Dynamics Modeling Open

Z. Gong, Su-Xiang Lyu, Pengxiang Ding, Wei Xiao, Donglin Wang · 2025

Imitation learning (IL) enables efficient skill acquisition from demonstrations but often struggles with long-horizon tasks and high-precision control due to compounding errors. Residual policy learning offers a promising, model-agnostic s…

GCHR: Goal-Conditioned Hindsight Regularization for Sample-Efficient Reinforcement Learning Open

Lei Xing, Wenyan Yang, Kaiqiang Ke, Shangzong Yang, Xuetao Zhang, et al. · 2025

Goal-conditioned reinforcement learning (GCRL) with sparse rewards remains a fundamental challenge in reinforcement learning. While hindsight experience replay (HER) has shown promise by relabeling collected trajectories with achieved goal…

Multi-Task Multi-Agent Reinforcement Learning via Skill Graphs Open

Guobin Zhu, Rui Zhou, Wenkang Ji, Hongyin Zhang, Donglin Wang, et al. · 2025

Multi-task multi-agent reinforcement learning (MT-MARL) has recently gained attention for its potential to enhance MARL's adaptability across multiple tasks. However, it is challenging for existing multi-task learning methods to handle com…

SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning Open

Yang Liu, Ming Ma, Xiaomin Yu, Pengxiang Ding, Han Zhao, et al. · 2025

Despite impressive advancements in Visual-Language Models (VLMs) for multi-modal tasks, their reliance on RGB inputs limits precise spatial understanding. Existing methods for integrating spatial cues, such as point clouds or depth, either…

OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation Open

Can Cui, Pengxiang Ding, Wenxuan Song, Shuanghao Bai, Xinyang Tong, et al. · 2025

Dual-system VLA (Vision-Language-Action) architectures have become a hot topic in embodied intelligence research, but there is a lack of sufficient open-source work for further performance analysis and optimization. To address this problem…

MGDA: Model-based Goal Data Augmentation for Offline Goal-conditioned Weighted Supervised Learning Open

Lei Xing, Xuetao Zhang, Donglin Wang · 2025

Recently, a state-of-the-art series of algorithms—Goal-Conditioned Weighted Supervised Learning (GCWSL) methods—has been introduced to address the challenges inherent in offline goal-conditioned reinforcement learning (RL). GCWSL optimizes…

CSL-L2M: Controllable Song-Level Lyric-to-Melody Generation Based on Conditional Transformer with Fine-Grained Lyric and Musical Controls Open

Li Chai, Donglin Wang · 2025

Lyric-to-melody generation is a highly challenging task in the field of AI music generation. Due to the difficulty of learning strict yet weak correlations between lyrics and melodies, previous methods have suffered from weak controllabili…

TDMPBC: Self-Imitative Reinforcement Learning for Humanoid Robot Control Open

Zifeng Zhuang, Diyuan Shi, Runze Suo, Xiaolei He, Hongyin Zhang, et al. · 2025

Complex high-dimensional spaces with high Degree-of-Freedom and complicated action spaces, such as humanoid robots equipped with dexterous hands, pose significant challenges for reinforcement learning (RL) algorithms, which need to wisely …

Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration Open

Pengxiang Ding, Jianfei Ma, Xinyang Tong, B. S. Zou, Xiangjian Luo, et al. · 2025

This paper addresses the limitations of current humanoid robot control frameworks, which primarily rely on reactive mechanisms and lack autonomous interaction capabilities due to data scarcity. We propose Humanoid-VLA, a novel framework th…

VLAS: Vision-Language-Action Model With Speech Instructions For Customized Robot Manipulation Open

Wei Zhao, Pengxiang Ding, Min Zhang, Zheng Gong, Shuanghao Bai, et al. · 2025

Vision-language-action models (VLAs) have become increasingly popular in robot manipulation for their end-to-end design and remarkable performance. However, existing VLAs rely heavily on vision-language models (VLMs) that only support text…

GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation Open

Hongyin Zhang, Pengxiang Ding, Shangke Lyu, Ying Peng, Donglin Wang · 2025

With the rapid development of embodied artificial intelligence, significant progress has been made in vision-language-action (VLA) models for general robot decision-making. However, the majority of existing VLAs fail to account for the ine…

Rethinking Latent Redundancy in Behavior Cloning: An Information Bottleneck Approach for Robot Manipulation Open

Shiwen Bai, Wanqi Zhou, Pengxiang Ding, Wei Zhao, Donglin Wang, et al. · 2025

Behavior Cloning (BC) is a widely adopted visual imitation learning method in robot manipulation. Current BC approaches often enhance generalization by leveraging large datasets and incorporating additional visual and textual modalities to…

Enhancing Adversarial Transferability via Component-Wise Transformation Open

Hangyu Liu, Bo Peng, Pengxiang Ding, Donglin Wang · 2025

Deep Neural Networks (DNNs) are highly vulnerable to adversarial examples, which pose significant challenges in security-sensitive applications. Among various adversarial attack strategies, input transformation-based attacks have demonstra…

Pyrolysis-Tuned Apple Branch-Derived Carbon Catalysts for Persulfate Activation: Dominance of ¹O₂ Non-Radical Pathways Open

Lang Li, Nan Chen, Ning An, Haolei Mou, Chuanping Feng, et al. · 2025

QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning Open

Xinyang Tong, Pengxiang Ding, Donglin Wang, Wenjie Zhang, Can Cui, et al. · 2024

This paper addresses the inherent inference latency challenges associated with deploying multimodal large language models (MLLM) in quadruped vision-language-action (QUAR-VLA) tasks. Our investigation reveals that conventional parameter re…

MGDA: Model-based Goal Data Augmentation for Offline Goal-conditioned Weighted Supervised Learning Open

Lei Xing, Xuetao Zhang, Donglin Wang · 2024

Recently, a state-of-the-art family of algorithms, known as Goal-Conditioned Weighted Supervised Learning (GCWSL) methods, has been introduced to tackle challenges in offline goal-conditioned reinforcement learning (RL). GCWSL optimizes a …

CSL-L2M: Controllable Song-Level Lyric-to-Melody Generation Based on Conditional Transformer with Fine-Grained Lyric and Musical Controls Open

Li Chai, Donglin Wang · 2024

Lyric-to-melody generation is a highly challenging task in the field of AI music generation. Due to the difficulty of learning strict yet weak correlations between lyrics and melodies, previous methods have suffered from weak controllabili…

CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction Open

Zheng Gong, Pengxiang Ding, Shangke Lyu, Siteng Huang, Mingyang Sun, et al. · 2024

In robotic visuomotor policy learning, diffusion-based models have achieved significant success in improving the accuracy of action trajectory generation compared to traditional autoregressive models. However, they suffer from inefficiency…

Nash CoT: Multi-Path Inference with Preference Equilibrium Open

Ziqi Zhang, Cunxiang Wang, Xiong Xiao, Yue Zhang, Donglin Wang · 2024

Chain of thought (CoT) is a reasoning framework that can enhance the performance of Large Language Models (LLMs) on complex inference tasks. In particular, among various studies related to CoT, multi-path inference stands out as a simple y…

DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding Open

Ting Liu, Xuyang Liu, Siteng Huang, Honggang Chen, Quanjun Yin, et al. · 2024

Visual grounding (VG) is a challenging task to localize an object in an image based on a textual description. Recent surge in the scale of VG models has substantially improved performance, but also introduced a significant burden on comput…

Expressive Forecasting of 3D Whole-Body Human Motions Open

Pengxiang Ding, Qiongjie Cui, Haofan Wang, Min Zhang, Mengyuan Liu, et al. · 2024

Human motion forecasting, with the goal of estimating future human behavior over a period of time, is a fundamental task in many real-world applications. However, existing works typically concentrate on foretelling the major joints of the …

GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot Open

Wenxuan Song, Han Zhao, Pengxiang Ding, Can Cui, Shangke Lyu, et al. · 2024

Multi-task robot learning holds significant importance in tackling diverse and complex scenarios. However, current approaches are hindered by performance issues and difficulties in collecting training datasets. In this paper, we propose Ge…

Context-Former: Stitching via Latent Conditioned Sequence Modeling Open

Ziqi Zhang, Jingzehua Xu, Zifeng Zhuang, Jinxin Liu, Donglin Wang · 2024

Offline reinforcement learning (RL) algorithms can learn better decision-making compared to behavior policies by stitching the suboptimal trajectories to derive more optimal ones. Meanwhile, Decision Transformer (DT) abstracts the RL as se…

Continual Reinforcement Learning for Quadruped Robot Locomotion Open

Sibo Gai, Shangke Lyu, Hongyin Zhang, Donglin Wang · 2024

The ability to learn continuously is crucial for a robot to achieve a high level of intelligence and autonomy. In this paper, we consider continual reinforcement learning (RL) for quadruped robots, which includes the ability to continuousl…

Broad-leaved forest’s impact on spontaneous activities of mice and their mental state Open

Donglin Wang, Qian Wang, Weining Du, Yuebin Wang, Jianhua Shu, et al. · 2024

Empirical studies on the effects of urban forests on the health of humans and other animals are needed to rationalize the construction of urban forests for healthcare. The effects of urban forests (coniferous, broad-leaved, and mixed conif…

Off-Dynamics Inverse Reinforcement Learning Open

Yachen Kang, Jinxin Liu, Donglin Wang · 2024

Imitation learning is a widely-used paradigm for decision making that learns from expert demonstrations. Existing imitation algorithms often require multiple interactions between the agent and the environment from which the demonstration i…

QUAR-VLA: Vision-Language-Action Model for Quadruped Robots Open

Pengxiang Ding, Han Zhao, Zhitao Wang, Zhenyu Wei, Shangke Lyu, et al. · 2023

The important manifestation of robot intelligence is the ability to naturally interact and autonomously make decisions. Traditional approaches to robot control often compartmentalize perception, planning, and decision-making, simplifying s…

Donglin Wang