Multi-robot Cooperation Learning Based on Powell Deep Deterministic Policy Gradient
Published in International Conference on Intelligent Robotics and Applications, 2022
Recommended citation: Li Z, Xiao C, Liu Z, et al. Multi-robot Cooperation Learning Based on Powell Deep Deterministic Policy Gradient[C]//International Conference on Intelligent Robotics and Applications. Springer, Cham, 2022: 77-87.
Abstract: Model-free deep reinforcement learning algorithms have been successfully applied to a range of challenging sequential decision-making and control tasks. However, these methods often perform poorly in multi-agent environments due to the instability of teammates’ strategies. In this paper, a novel reinforcement learning method called Powell Deep Deterministic Policy Gradient (PDDPG) is proposed, which integrates Powell’s unconstrained optimization method with deep deterministic policy gradient. Specifically, each agent is regarded as a one-dimensional variable, and the process of multi-robot cooperation learning corresponds to searching for an optimal vector. A conjugate direction, as in Powell’s method, is constructed and used to update the agents’ policies. Finally, the proposed method is validated in a dogfight-like multi-agent environment. The results suggest that the proposed method substantially outperforms independent Deep Deterministic Policy Gradient (IDDPG), pointing to a promising way of achieving high-quality independent learning.
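For readers unfamiliar with the optimization idea the abstract builds on, the following is a minimal sketch of the classical Powell conjugate-direction method on a toy two-variable function. It is not the paper's PDDPG implementation: there are no neural networks or policies here, and the names `toy_cost`, `golden_section`, `line_min`, and `powell_minimize` are hypothetical, chosen only for illustration. In the abstract's analogy, each coordinate would stand for one agent, and the extra search along the net displacement plays the role of the constructed conjugate direction.

```python
import math

def golden_section(phi, lo=-10.0, hi=10.0, tol=1e-8):
    """Minimize a 1-D unimodal function phi on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if phi(c) < phi(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2

def line_min(f, x, direction):
    """Move x to the (approximate) minimizer of f along the given direction."""
    alpha = golden_section(
        lambda a: f([xi + a * di for xi, di in zip(x, direction)])
    )
    return [xi + alpha * di for xi, di in zip(x, direction)]

def powell_minimize(f, x0, n_iter=20, tol=1e-9):
    """Classical Powell method: sweep the direction set, then search along
    the net displacement and swap it into the direction set."""
    n = len(x0)
    # Start from the coordinate axes (one axis per "agent" in the analogy).
    dirs = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x = list(x0)
    for _ in range(n_iter):
        start = list(x)
        for d in dirs:
            x = line_min(f, x, d)
        # Net displacement of the sweep: the new (conjugate) direction.
        new_dir = [xi - si for xi, si in zip(x, start)]
        if math.sqrt(sum(v * v for v in new_dir)) < tol:
            break
        dirs = dirs[1:] + [new_dir]  # drop the oldest direction
        x = line_min(f, x, new_dir)
    return x

def toy_cost(x):
    """Hypothetical coupled quadratic; the 0.5*x*y term couples the two variables."""
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2 + 0.5 * x[0] * x[1]

best = powell_minimize(toy_cost, [0.0, 0.0])
print(best)
```

Because the two variables are coupled, minimizing one coordinate at a time only zigzags toward the optimum, whereas the search along the net displacement moves both variables jointly. The fixed search bracket of [-10, 10] is an assumption that suits this toy function and would need adjusting for other problems.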