Exploring foci of:
arXiv (Cornell University)
B2MAPO: A Batch-by-Batch Multi-Agent Policy Optimization to Balance Performance and Efficiency
July 2024 • Wenjing Zhang, Wei Zhang, Wenqing Hu, Yifan Wang
Most multi-agent reinforcement learning approaches adopt one of two policy optimization methods, updating the agents' policies either simultaneously or sequentially. Updating the policies of all agents simultaneously introduces a non-stationarity problem. Although updating policies sequentially, agent by agent in an appropriate order, improves policy performance, it is inefficient because of the sequential execution, which lengthens model training and execution time. Intuitively, partitioning policies of all agents accordin…
Computer Science
Economics
Neuroscience