Explanipedia

A Comparative User Evaluation of XRL Explanations using Goal Identification

M J Towers, Yali Du, Christopher Freeman, Timothy J. Norman · 2025

Debugging is a core application of explainable reinforcement learning (XRL) algorithms; however, limited comparative evaluations have been conducted to understand their relative performance. We propose a novel evaluation methodology to tes…

Self-Verifying Reflection Helps Transformers with CoT Reasoning

Zhongwei Yu, W. Xia, Yanbo Xue, Bo Xu, Haifeng Zhang, et al. · 2025

Advanced large language models (LLMs) frequently reflect in reasoning chain-of-thoughts (CoTs), where they self-verify the correctness of current solutions and explore alternatives. However, given recent findings that LLMs detect limited e…

Towards Communication Efficient Multi-Agent Cooperations: Reinforcement Learning and LLM

Yang Su, Yali Du, Yansha Deng, Mischa Döhler · 2025

RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors

Fengshuo Bai, Runze Liu, Yali Du, Ying Wen, Yaodong Yang · 2025

Evaluating deep reinforcement learning (DRL) agents against targeted behavior attacks is critical for assessing their robustness. These attacks aim to manipulate the victim into specific behaviors that align with the attacker’s objectives,…

VLP: Vision-Language Preference Learning for Embodied Manipulation

Runze Liu, Chenjia Bai, Jiafei Lyu, Steven Sun, Yali Du, et al. · 2025

Reward engineering is one of the key challenges in Reinforcement Learning (RL). Preference-based RL effectively addresses this issue by learning from human feedback. However, it is both time-consuming and expensive to collect human prefere…

Quantifying the Self-Interest Level of Markov Social Dilemmas

Richard H. Willis, Yali Du, Joel Z. Leibo, G. Flucke · 2025

This paper introduces a novel method for estimating the self-interest level of Markov social dilemmas. We extend the concept of self-interest level from normal-form games to Markov games, providing a quantitative measure of the minimum rew…

Will Systems of LLM Agents Cooperate: An Investigation into a Social Dilemma

Richard H. Willis, Yali Du, Joel Z. Leibo, G. Flucke · 2025

As autonomous agents become more prevalent, understanding their collective behaviour in strategic interactions is crucial. This study investigates the emergent cooperative tendencies of systems of Large Language Model (LLM) agents in a soc…

Computational Hermeneutics: Evaluating Generative AI as a Cultural Technology

Cody Kommers, Ruth Ahnert, Maria Antoniak, Steve Benford, Mercedes Bunz, et al. · 2025

NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning

W. M. Liu, Siya Qi, X. J. Wang, Chen Qian, Yali Du, et al. · 2025

RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors

Fengshuo Bai, Runze Liu, Yali Du, Ying Wen, Yaodong Yang · 2024

Evaluating deep reinforcement learning (DRL) agents against targeted behavior attacks is critical for assessing their robustness. These attacks aim to manipulate the victim into specific behaviors that align with the attacker's objectives,…

Resolving social dilemmas with minimal reward transfer

Richard H. Willis, Yali Du, Joel Z. Leibo, G. Flucke · 2024

Social dilemmas present a significant challenge in multi-agent cooperation because individuals are incentivised to behave in ways that undermine socially optimal outcomes. Consequently, self-interested agents often avoid collective behavio…

A Review of Safe Reinforcement Learning: Methods, Theories, and Applications

Shangding Gu, Yang Long, Yali Du, Guang Chen, Florian Walter, et al. · 2024

Reinforcement Learning (RL) has achieved tremendous success in many complex decision-making tasks. However, safety concerns are raised during deploying RL in real-world applications, leading to a growing demand for safe RL algorithms, such…

Intermediate dimensions of Moran sets and their visualization

Yali Du, Jinlin Miao, Tao Wang, Haojie Xu · 2024

Intermediate dimensions are a class of new fractal dimensions which provide a spectrum of dimensions interpolating between the Hausdorff and box-counting dimensions. In this paper, we study the intermediate dimensions of Moran sets. Moran …

Efficient and scalable reinforcement learning for large-scale network control

Chengdong Ma, Aming Li, Yali Du, Hao Dong, Yaodong Yang · 2024

The primary challenge in the development of large-scale artificial intelligence (AI) systems lies in achieving scalable decision-making—extending the AI models while maintaining sufficient performance. Existing research indicates that dist…

Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey

Ruiqi Zhang, Jing Hou, Florian Walter, Shangding Gu, Jiayi Guan, et al. · 2024

Reinforcement Learning (RL) is a potent tool for sequential decision-making and has achieved performance surpassing human capabilities across many challenging real-world tasks. As the extension of RL in the multi-agent system domain, multi…

Explaining an Agent's Future Beliefs through Temporally Decomposing Future Reward Estimators

M J Towers, Yali Du, Christopher Freeman, Timothy J. Norman · 2024

Future reward estimation is a core component of reinforcement learning agents; i.e., Q-value and state-value functions, predicting an agent's sum of future rewards. Their scalar output, however, obfuscates when or what individual future re…

Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf

Xuanfa Jin, Ziyan Wang, Yali Du, Meng Fang, Haifeng Zhang, et al. · 2024

Communication is a fundamental aspect of human society, facilitating the exchange of information and beliefs among people. Despite the advancements in large language models (LLMs), recent agents built with these often neglect the control o…

Human-Guided Moral Decision Making in Text-Based Games

Zijing Shi, Meng Fang, Ling Chen, Yali Du, Jun Wang · 2024

Training reinforcement learning (RL) agents to achieve desired goals while also acting morally is a challenging problem. Transformer-based language models (LMs) have shown some promise in moral awareness, but their use in different context…

STAS: Spatial-Temporal Return Decomposition for Solving Sparse Rewards Problems in Multi-agent Reinforcement Learning

Sirui Chen, Zhaowei Zhang, Yaodong Yang, Yali Du · 2024

Centralized Training with Decentralized Execution (CTDE) has been proven to be an effective paradigm in cooperative multi-agent reinforcement learning (MARL). One of the major challenges is credit assignment, which aims to credit agents by…

TAPE: Leveraging Agent Topology for Cooperative Multi-Agent Policy Gradient

Xingzhou Lou, Junge Zhang, Timothy J. Norman, Kaiqi Huang, Yali Du · 2024

Multi-Agent Policy Gradient (MAPG) has made significant progress in recent years. However, centralized critics in state-of-the-art MAPG methods still face the centralized-decentralized mismatch (CDM) issue, which means sub-optimal actions …

All Language Models Large and Small

Zhixun Chen, Yali Du, David Mguni · 2024

Many leading language models (LMs) use high-intensity computational resources both during training and execution. This poses the challenge of lowering resource costs for deployment and faster execution of decision-making tasks among others…

Aligning Individual and Collective Objectives in Multi-Agent Cooperation

Yang Li, Wenhao Zhang, Jianhong Wang, Shao Zhang, Yali Du, et al. · 2024

Among the research topics in multi-agent learning, mixed-motive cooperation is one of the most prominent challenges, primarily due to the mismatch between individual and collective goals. The cutting-edge research is focused on incorporati…

Natural Language Reinforcement Learning

Xidong Feng, Ziyu Wan, Mengyue Yang, Ziyan Wang, Girish A. Koushik, et al. · 2024

Reinforcement Learning (RL) has shown remarkable abilities in learning policies for decision-making tasks. However, RL is often hindered by issues such as low sample efficiency, lack of interpretability, and sparse supervision signals. To …

Safe Reinforcement Learning with Free-form Natural Language Constraints and Pre-Trained Language Models

Xingzhou Lou, Junge Zhang, Ziyan Wang, Kaiqi Huang, Yali Du · 2024

Safe reinforcement learning (RL) agents accomplish given tasks while adhering to specific constraints. Employing constraints expressed via easily-understandable human language offers considerable potential for real-world applications due t…

Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game

Zijing Shi, Meng Fang, Shunfeng Zheng, Shilong Deng, Ling Chen, et al. · 2023

Multi-agent collaboration with Large Language Models (LLMs) demonstrates proficiency in basic tasks, yet its efficiency in more complex scenarios remains unexplored. In gaming environments, these agents often face situations without establ…

TAPE: Leveraging Agent Topology for Cooperative Multi-Agent Policy Gradient

Xingzhou Lou, Junge Zhang, Timothy J. Norman, Kaiqi Huang, Yali Du · 2023

Multi-Agent Policy Gradient (MAPG) has made significant progress in recent years. However, centralized critics in state-of-the-art MAPG methods still face the centralized-decentralized mismatch (CDM) issue, which means sub-optimal actions …

A Review of Cooperation in Multi-agent Learning

Yali Du, Joel Z. Leibo, Usman Islam, Richard H. Willis, Peter Sunehag · 2023

Cooperation in multi-agent learning (MAL) is a topic at the intersection of numerous disciplines, including game theory, economics, social sciences, and evolutionary biology. Research in this area aims to understand both how agents can coo…

MACCA: Offline Multi-agent Reinforcement Learning with Causal Credit Assignment

Ziyan Wang, Yali Du, Yudi Zhang, Meng Fang, Biwei Huang · 2023

Offline Multi-agent Reinforcement Learning (MARL) is valuable in scenarios where online interaction is impractical or risky. While independent learning in MARL offers flexibility and scalability, accurately assigning credit to individual a…

A human-centered safe robot reinforcement learning framework with interactive behaviors

Shangding Gu, Alap Kshirsagar, Yali Du, Guang Chen, Jan Peters, et al. · 2023

Deployment of Reinforcement Learning (RL) algorithms for robotics applications in the real world requires ensuring the safety of the robot and its environment. Safe Robot RL (SRRL) is a crucial step toward achieving human-robot coexistence…
