Mathematics • Vol 10 • No 15
Noise-Regularized Advantage Value for Multi-Agent Reinforcement Learning
August 2022 • Siying Wang, Wenyu Chen, Jian Hu, Siyue Hu, Liwei Huang
Leveraging global state information to enhance policy optimization is a common approach in multi-agent reinforcement learning (MARL). Even when supplemented with state information, agents still suffer from insufficient exploration during training. Moreover, training on batch-sampled examples from the replay buffer induces policy overfitting, i.e., multi-agent proximal policy optimization (MAPPO) may not perform as well as independent PPO (IPPO) even with additional information in the ce…