We provide examples of how to run CSP-MARL training on a single machine for a couple of benchmark environments.
Pong-2p is a simple environment that is well suited to sanity checks. It is a two-agent competitive version of Pong. For each agent, the observation is an (84, 84, 4) stack of screen-pixel frames, and the action space is Discrete(6). See Appendix I of https://arxiv.org/abs/1907.09467
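To make the two-agent interface concrete, here is a minimal interaction sketch. `DummyPong2p` is a made-up stand-in with the same observation and action shapes described above; it is not the real Arena environment class.

```python
import numpy as np

# Hypothetical stand-in for the Arena Pong-2p environment: two agents,
# each with an (84, 84, 4) stacked-frame observation and a Discrete(6) action.
class DummyPong2p:
    n_agents = 2
    n_actions = 6

    def reset(self):
        # one observation per agent
        return [np.zeros((84, 84, 4), dtype=np.uint8) for _ in range(self.n_agents)]

    def step(self, actions):
        assert len(actions) == self.n_agents
        obs = [np.zeros((84, 84, 4), dtype=np.uint8) for _ in range(self.n_agents)]
        rewards = [0.0, 0.0]  # zero-sum in the real competitive game
        done = False
        return obs, rewards, done

env = DummyPong2p()
obs = env.reset()
rng = np.random.default_rng(0)
for _ in range(3):
    # each agent independently picks one of its 6 discrete actions
    actions = [int(rng.integers(env.n_actions)) for _ in range(env.n_agents)]
    obs, rewards, done = env.step(actions)

print(obs[0].shape)  # (84, 84, 4)
```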
See the following examples for training with Pong-2p:
See the following examples for Imitation Learning:
Soccer is a multi-agent environment with continuous control. To run the soccer examples, one should additionally install dm_control when installing Arena; see the Arena docs here.
See the following:
TLeague also works for pure RL, which can be viewed as a special case of MARL where the number of agents equals one.
Here we provide examples for how TLeague trains with Atari based on gym.
Make sure you've installed the correct dependencies, e.g., pip install gym[atari]==0.12.1. See the following:
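The single-agent-as-MARL reduction can be sketched as a thin wrapper that exposes a gym-style environment through a multi-agent step signature, so the training machinery sees a "team" of one. The wrapper and the toy environment below are illustrative assumptions, not TLeague's actual interface.

```python
# Sketch: wrap a single-agent gym-style env so that observations, actions,
# and rewards become lists of length one, matching a multi-agent signature.
class SingleAgentAsMARL:
    def __init__(self, env):
        self.env = env

    def reset(self):
        return [self.env.reset()]          # list of per-agent observations

    def step(self, actions):
        obs, rew, done, info = self.env.step(actions[0])
        return [obs], [rew], done, info    # lists of length one

class ToyEnv:
    """Minimal gym-like stub standing in for an Atari env."""
    def reset(self):
        return 0
    def step(self, action):
        return 0, 1.0, True, {}

menv = SingleAgentAsMARL(ToyEnv())
obs = menv.reset()
obs, rews, done, info = menv.step([0])
print(len(obs), rews[0], done)  # 1 1.0 True
```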
NOTE: the purpose of these examples is to show how TLeague trains in the single-agent RL case.
To get reasonable performance, one needs careful preprocessing.
Refer to the openai/baselines code,
as well as the paper here.
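A common Atari preprocessing pipeline converts each RGB frame to grayscale, resizes it to 84x84, and stacks the last four frames. The sketch below implements this with plain NumPy (nearest-neighbor resize); the real baselines code uses a more careful pipeline, so treat this as an illustration of the idea only.

```python
import numpy as np

def to_gray(frame):
    # luminance-weighted grayscale; frame is (H, W, 3) uint8
    return (frame @ np.array([0.299, 0.587, 0.114])).astype(np.uint8)

def resize_nn(img, h=84, w=84):
    # nearest-neighbor resize via integer index arrays
    rows = np.arange(h) * img.shape[0] // h
    cols = np.arange(w) * img.shape[1] // w
    return img[rows][:, cols]

class FrameStacker:
    """Keeps the last k preprocessed frames as an (84, 84, k) observation."""
    def __init__(self, k=4):
        self.k = k
        self.frames = None

    def push(self, raw_frame):
        f = resize_nn(to_gray(raw_frame))
        if self.frames is None:
            self.frames = [f] * self.k   # repeat first frame to fill the stack
        else:
            self.frames = self.frames[1:] + [f]
        return np.stack(self.frames, axis=-1)

stacker = FrameStacker()
# a raw Atari frame is (210, 160, 3); zeros here stand in for real pixels
obs = stacker.push(np.zeros((210, 160, 3), dtype=np.uint8))
print(obs.shape)  # (84, 84, 4)
```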
Throughout all the examples above, we use the following terminology:

unroll_length: how long the trajectory is when computing the RL Value Function using bootstrapping. It must be a multiple of batch_size.
rollout_length: the length for RNN BPTT. rollout_length = rollout_len in the policy_config.
rm_size: the size of the Replay Memory. It must be a multiple of unroll_length.
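The two divisibility constraints above can be validated before launching a run. The function below is a small sketch; the parameter names mirror the terminology, and the concrete values in the call are made up.

```python
# Sketch: validate the divisibility constraints on the training config.
def check_config(batch_size, unroll_length, rollout_len, rm_size):
    assert unroll_length % batch_size == 0, \
        "unroll_length must be a multiple of batch_size"
    assert rm_size % unroll_length == 0, \
        "rm_size must be a multiple of unroll_length"
    return True

# hypothetical values chosen only to satisfy the constraints
print(check_config(batch_size=32, unroll_length=32, rollout_len=8, rm_size=64000))
# True
```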