Donglin Wang
RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models
Vision-Language-Action (VLA) models have recently emerged as powerful general-purpose policies for robotic manipulation, benefiting from large-scale multi-modal pre-training. However, they often fail to generalize reliably in out-of-distri…
VCoT-Grasp: Grasp Foundation Models with Visual Chain-of-Thought Reasoning for Language-driven Grasp Generation
Robotic grasping is one of the most fundamental tasks in robotic manipulation, and grasp detection/generation has long been the subject of extensive research. Recently, language-driven grasp generation has emerged as a promising direction …
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
Vision-Language-Action (VLA) models enable embodied decision-making but rely heavily on imitation learning, leading to compounding errors and poor robustness under distribution shift. Reinforcement learning (RL) can mitigate these issues y…
Robust Online Residual Refinement via Koopman-Guided Dynamics Modeling
Imitation learning (IL) enables efficient skill acquisition from demonstrations but often struggles with long-horizon tasks and high-precision control due to compounding errors. Residual policy learning offers a promising, model-agnostic s…
GCHR: Goal-Conditioned Hindsight Regularization for Sample-Efficient Reinforcement Learning
Goal-conditioned reinforcement learning (GCRL) with sparse rewards remains a fundamental challenge in reinforcement learning. While hindsight experience replay (HER) has shown promise by relabeling collected trajectories with achieved goal…
Multi-Task Multi-Agent Reinforcement Learning via Skill Graphs
Multi-task multi-agent reinforcement learning (MT-MARL) has recently gained attention for its potential to enhance MARL's adaptability across multiple tasks. However, it is challenging for existing multi-task learning methods to handle com…
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Despite impressive advancements in Vision-Language Models (VLMs) for multi-modal tasks, their reliance on RGB inputs limits precise spatial understanding. Existing methods for integrating spatial cues, such as point clouds or depth, either…
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation
Dual-system VLA (Vision-Language-Action) architectures have become a hot topic in embodied intelligence research, but there is a lack of sufficient open-source work for further performance analysis and optimization. To address this problem…
MGDA: Model-based Goal Data Augmentation for Offline Goal-conditioned Weighted Supervised Learning
Recently, a state-of-the-art series of algorithms—Goal-Conditioned Weighted Supervised Learning (GCWSL) methods—has been introduced to address the challenges inherent in offline goal-conditioned reinforcement learning (RL). GCWSL optimizes…
CSL-L2M: Controllable Song-Level Lyric-to-Melody Generation Based on Conditional Transformer with Fine-Grained Lyric and Musical Controls
Lyric-to-melody generation is a highly challenging task in the field of AI music generation. Due to the difficulty of learning strict yet weak correlations between lyrics and melodies, previous methods have suffered from weak controllabili…
TDMPBC: Self-Imitative Reinforcement Learning for Humanoid Robot Control
Complex high-dimensional spaces with many degrees of freedom and complicated action spaces, such as humanoid robots equipped with dexterous hands, pose significant challenges for reinforcement learning (RL) algorithms, which need to wisely …
Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration
This paper addresses the limitations of current humanoid robot control frameworks, which primarily rely on reactive mechanisms and lack autonomous interaction capabilities due to data scarcity. We propose Humanoid-VLA, a novel framework th…
VLAS: Vision-Language-Action Model With Speech Instructions For Customized Robot Manipulation
Vision-language-action models (VLAs) have become increasingly popular in robot manipulation for their end-to-end design and remarkable performance. However, existing VLAs rely heavily on vision-language models (VLMs) that only support text…
GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation
With the rapid development of embodied artificial intelligence, significant progress has been made in vision-language-action (VLA) models for general robot decision-making. However, the majority of existing VLAs fail to account for the ine…
Rethinking Latent Redundancy in Behavior Cloning: An Information Bottleneck Approach for Robot Manipulation
Behavior Cloning (BC) is a widely adopted visual imitation learning method in robot manipulation. Current BC approaches often enhance generalization by leveraging large datasets and incorporating additional visual and textual modalities to…
Enhancing Adversarial Transferability via Component-Wise Transformation
Deep Neural Networks (DNNs) are highly vulnerable to adversarial examples, which pose significant challenges in security-sensitive applications. Among various adversarial attack strategies, input transformation-based attacks have demonstra…
Pyrolysis-Tuned Apple Branch-Derived Carbon Catalysts for Persulfate Activation: Dominance of ¹O₂ Non-Radical Pathways
QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning
This paper addresses the inherent inference latency challenges associated with deploying multimodal large language models (MLLMs) in quadruped vision-language-action (QUAR-VLA) tasks. Our investigation reveals that conventional parameter re…
MGDA: Model-based Goal Data Augmentation for Offline Goal-conditioned Weighted Supervised Learning
Recently, a state-of-the-art family of algorithms, known as Goal-Conditioned Weighted Supervised Learning (GCWSL) methods, has been introduced to tackle challenges in offline goal-conditioned reinforcement learning (RL). GCWSL optimizes a …
CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction
In robotic visuomotor policy learning, diffusion-based models have achieved significant success in improving the accuracy of action trajectory generation compared to traditional autoregressive models. However, they suffer from inefficiency…
Nash CoT: Multi-Path Inference with Preference Equilibrium
Chain of thought (CoT) is a reasoning framework that can enhance the performance of Large Language Models (LLMs) on complex inference tasks. In particular, among various studies related to CoT, multi-path inference stands out as a simple y…
DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding
Visual grounding (VG) is the challenging task of localizing an object in an image based on a textual description. The recent surge in the scale of VG models has substantially improved performance, but has also introduced a significant burden on comput…
Expressive Forecasting of 3D Whole-Body Human Motions
Human motion forecasting, with the goal of estimating future human behavior over a period of time, is a fundamental task in many real-world applications. However, existing works typically concentrate on foretelling the major joints of the …
GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot
Multi-task robot learning holds significant importance in tackling diverse and complex scenarios. However, current approaches are hindered by performance issues and difficulties in collecting training datasets. In this paper, we propose Ge…
Context-Former: Stitching via Latent Conditioned Sequence Modeling
Offline reinforcement learning (RL) algorithms can learn better decision-making than the behavior policies by stitching suboptimal trajectories into better ones. Meanwhile, Decision Transformer (DT) abstracts RL as se…
Continual Reinforcement Learning for Quadruped Robot Locomotion
The ability to learn continuously is crucial for a robot to achieve a high level of intelligence and autonomy. In this paper, we consider continual reinforcement learning (RL) for quadruped robots, which includes the ability to continuousl…
Broad-leaved forest’s impact on spontaneous activities of mice and their mental state
Empirical studies on the effects of urban forests on the health of humans and other animals are needed to rationalize the construction of urban forests for healthcare. The effects of urban forests (coniferous, broad-leaved, and mixed conif…
Off-Dynamics Inverse Reinforcement Learning
Imitation learning is a widely used paradigm for decision making that learns from expert demonstrations. Existing imitation algorithms often require multiple interactions between the agent and the environment from which the demonstration i…
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
An important manifestation of robot intelligence is the ability to interact naturally and make decisions autonomously. Traditional approaches to robot control often compartmentalize perception, planning, and decision-making, simplifying s…