Code for the paper "Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning" (ICML 2023).
We study von Neumann's ratio game, a very simple stochastic game. We implement and compare two algorithms:
(1) Our algorithm with sequential policy updates
(2) An independent policy gradient algorithm (see, e.g., this paper).
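
As a rough illustration of the difference between the two update schemes, the sketch below assumes the usual ratio-game payoff V(x, y) = (x^T R y) / (x^T S y), with S entrywise positive and x, y mixed strategies on the simplex. It is not the implementation in this repository: the matrices `R` and `S`, the helper names, the use of projected gradient ascent, and the assumption that both players ascend the same value are all illustrative choices.

```python
import numpy as np

def ratio_value(x, y, R, S):
    """Ratio-game value V(x, y) = (x^T R y) / (x^T S y)."""
    return (x @ R @ y) / (x @ S @ y)

def ratio_grads(x, y, R, S):
    """Gradients of V with respect to x and y (quotient rule)."""
    num, den = x @ R @ y, x @ S @ y
    gx = ((R @ y) * den - num * (S @ y)) / den**2
    gy = ((R.T @ x) * den - num * (S.T @ x)) / den**2
    return gx, gy

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def simultaneous_step(x, y, R, S, eta):
    """Independent update: both players use gradients at the same (x, y)."""
    gx, gy = ratio_grads(x, y, R, S)
    return project_simplex(x + eta * gx), project_simplex(y + eta * gy)

def sequential_step(x, y, R, S, eta):
    """Sequential update: player 2 sees player 1's already-updated policy."""
    gx, _ = ratio_grads(x, y, R, S)
    x_new = project_simplex(x + eta * gx)
    _, gy = ratio_grads(x_new, y, R, S)
    return x_new, project_simplex(y + eta * gy)

# Hypothetical 2x2 game, chosen only for illustration (S must be positive).
R = np.array([[1.0, 0.2], [0.3, 0.8]])
S = np.array([[1.0, 0.5], [0.6, 1.2]])

if __name__ == "__main__":
    x = y = np.array([0.5, 0.5])  # uniform initialization
    xs, ys = x.copy(), y.copy()
    for _ in range(200):
        x, y = simultaneous_step(x, y, R, S, eta=0.02)
        xs, ys = sequential_step(xs, ys, R, S, eta=0.02)
    print("simultaneous:", ratio_value(x, y, R, S))
    print("sequential:  ", ratio_value(xs, ys, R, S))
```

The only difference between the two steps is which policy of the first player the second player's gradient is evaluated at; everything else (step size, projection) is shared, so the two schemes can be compared at the same step size and initialization.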
Results are shown below.
Policies are initialized close to the stationary point; the step size is 0.001.
Policies are initialized close to the stationary point; the step size is 0.02.
Policies are initialized close to the stationary point; the step size is 0.05.
Both policies are initialized uniformly; the step size is 0.001.