
Extended Python MARL framework - EPyMARL

EPyMARL is an extension of PyMARL, and includes

  • New! Support for training in environments with individual rewards for all agents (for all algorithms that support such settings)
  • New! Support for logging to weights and biases (W&B)
  • Additional algorithms (IA2C, IPPO, MADDPG, MAA2C and MAPPO)
  • Support for Gym environments (on top of the existing SMAC support)
  • Option to disable parameter sharing between agents (the original PyMARL only allowed parameter sharing)
  • Flexibility with extra implementation details (e.g. hard/soft updates, reward standardisation, and more)
  • Consistency of implementations between different algorithms (fair comparisons)

See our blog post here: https://agents.inf.ed.ac.uk/blog/epymarl/

Update as of June 2024!

Support for training in environments with individual rewards for all agents

Previously PyMARL and EPyMARL only supported training of MARL algorithms in common-reward environments. To support environments which naturally provide individual rewards for agents (e.g. LBF and RWARE), we previously scalarised the rewards of all agents using a sum operation to obtain a single common reward that was then given to all agents. We are glad to announce that EPyMARL now supports training in general-sum reward environments (for all algorithms that are sound to train in general-sum reward settings)!

  • Algorithms that support general-sum reward envs: IA2C, IPPO, MAA2C, MAPPO, IQL, PAC
  • Algorithms that only support common-reward envs: COMA, VDN, QMIX, QTRAN

By default, EPyMARL runs experiments with common rewards (as done previously). To run an experiment with individual rewards for all agents, set common_reward=False. For example, to run MAPPO on an LBF task with individual rewards:

python3 src/main.py --config=mappo --env-config=gymma with env_args.time_limit=25 env_args.key="lbforaging:Foraging-8x8-2p-3f-v2" common_reward=False

When using the common_reward=True setup in environments which naturally provide individual rewards, by default we scalarise the rewards into a common reward by summing up all rewards. This is now configurable and we support the mean operation as an alternative scalarisation. To use the mean scalarisation, set reward_scalarisation="mean".
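For example, to run MAPPO on the same LBF task with common rewards obtained by averaging the individual rewards, the relevant parameters can be set on the command line as before:

python3 src/main.py --config=mappo --env-config=gymma with env_args.time_limit=25 env_args.key="lbforaging:Foraging-8x8-2p-3f-v2" common_reward=True reward_scalarisation="mean"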

Plotting script

We have added a simple plotting script under plot_results.py to load data from sacred logs and visualise them for executed experiments. The script supports plotting of any logged metric, can apply simple window-smoothing, aggregates results across multiple runs of the same algorithm, and can filter which results to plot based on algorithm and environment names.

If multiple configs of the same algorithm exist within the loaded data and you only want to plot the best config per algorithm, add the --best_per_alg argument! If this argument is not set, the script visualises all configs of each (filtered) algorithm and indicates in the legend the hyperparameter values that differ across the present configs.
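For reference, the snippet below sketches the kind of processing the script performs: it loads a single metric from the metrics.json file that sacred writes for each run and applies simple window smoothing. The run directory layout and the return_mean metric name are assumptions here; adjust them to your own results and logged metrics.

import json

import matplotlib.pyplot as plt
import numpy as np

# Hypothetical path to a single sacred run directory; adjust to your results folder.
run_dir = "results/sacred/qmix/lbforaging:Foraging-8x8-2p-3f-v2/1"

with open(f"{run_dir}/metrics.json") as f:
    metrics = json.load(f)

steps = np.array(metrics["return_mean"]["steps"])
values = np.array(metrics["return_mean"]["values"])

# Simple moving-average smoothing over a window of 5 logged points.
window = 5
smoothed = np.convolve(values, np.ones(window) / window, mode="valid")

plt.plot(steps[window - 1:], smoothed)
plt.xlabel("environment steps")
plt.ylabel("return_mean")
plt.show()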

Weights and Biases (W&B) Logging

We now support logging to W&B! To log data to W&B, you need to install the library with pip install wandb and set up W&B (see their documentation). To tell EPyMARL to log data to W&B, you then need to specify the following config parameters:

use_wandb: True # Log results to W&B
wandb_team: null # W&B team name
wandb_project: null # W&B project name

Set wandb_team and wandb_project to the team and project you wish to log to within your account, and set use_wandb=True. By default, we log all W&B runs in "offline" mode, i.e. the data is only stored locally and can be uploaded to your W&B account via wandb sync .... To log runs online directly, specify wandb_mode="online" within the config.
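As with the other config parameters, these can also be overridden on the command line, for example (team and project names are placeholders):

python3 src/main.py --config=mappo --env-config=gymma with env_args.time_limit=25 env_args.key="lbforaging:Foraging-8x8-2p-3f-v2" use_wandb=True wandb_team="your-team" wandb_project="your-project" wandb_mode="online"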

We also support logging all stored models directly to W&B so you can download and inspect these from the W&B online dashboard. To do so, use the following config parameters:

wandb_save_model: True # Save models to W&B (only done if use_wandb is True and save_model is True)
save_model: True # Save the models to disk
save_model_interval: 50000

Note that models are only saved if save_model=True; to additionally log them to W&B, you need to set use_wandb=True, wandb_team, wandb_project, and wandb_save_model=True.
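Putting this together, a run that saves models every 50,000 timesteps and uploads them to W&B could be launched as follows (again with placeholder team and project names):

python3 src/main.py --config=qmix --env-config=gymma with env_args.time_limit=25 env_args.key="lbforaging:Foraging-8x8-2p-3f-v2" use_wandb=True wandb_team="your-team" wandb_project="your-project" save_model=True save_model_interval=50000 wandb_save_model=True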

Update as of 15th July 2023!

We have released our Pareto Actor-Critic (PAC) algorithm, accepted at TMLR, as part of the EPyMARL source code.

Find the paper here: https://arxiv.org/abs/2209.14344

Pareto Actor-Critic (Pareto-AC) is an actor-critic algorithm that utilises a simple principle of no-conflict games (and, in turn, cooperative games with identical rewards): each agent can assume the others will choose actions that lead to a Pareto-optimal equilibrium. Pareto-AC works especially well in environments with multiple suboptimal equilibria (a problem also known as relative over-generalisation). We have seen impressive results in a diverse set of multi-agent games with suboptimal equilibria, including the matrix games of the MARL benchmark as well as LBF variations with high penalties.

PAC introduces additional dependencies specified in pac_requirements.txt. To install its dependencies, run

pip install -r pac_requirements.txt

To run Pareto-AC in an environment, for example the Penalty game, you can run:

python3 src/main.py --config=pac_ns --env-config=gymma with env_args.time_limit=1 env_args.key=matrixgames:penalty-100-nostate-v0

Installation & Run instructions

For information on installing and using this codebase with SMAC, we suggest visiting and reading the original PyMARL README. Here, we maintain information on using the extra features EPyMARL offers. To install the codebase, clone this repo and run:

pip install -r requirements.txt

Note that the PAC algorithm and environments introduce additional dependencies. To install these dependencies, use the provided requirements files:

# install PAC dependencies
pip install -r pac_requirements.txt
# install environments
pip install -r env_requirements.txt

Installing Environments

In Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks we introduce the Level-Based Foraging (LBF) and Multi-Robot Warehouse (RWARE) environments, and additionally evaluate on SMAC, the Multi-Agent Particle environments (MPE), and a set of matrix games.

To install all environments, you can use the provided env_requirements.txt:

pip install -r env_requirements.txt

which will install LBF, RWARE, SMAC, our MPE fork, and matrix games.

To install these environments individually, please visit their respective repositories.

Example of using LBF:

python3 src/main.py --config=qmix --env-config=gymma with env_args.time_limit=25 env_args.key="lbforaging:Foraging-8x8-2p-3f-v2"

Example of using RWARE:

python3 src/main.py --config=qmix --env-config=gymma with env_args.time_limit=500 env_args.key="rware:rware-tiny-2ag-v1"

For MPE, our fork is needed. Essentially, all it does (other than fixing some Gym compatibility issues) is (i) register the environments with the Gym interface when imported as a package, (ii) seed the environments correctly, and (iii) make the action space compatible with Gym (the original MPE appears to use a one-hot encoding of the actions).

The environment names in MPE are:

...
    "multi_speaker_listener": "MultiSpeakerListener-v0",
    "simple_adversary": "SimpleAdversary-v0",
    "simple_crypto": "SimpleCrypto-v0",
    "simple_push": "SimplePush-v0",
    "simple_reference": "SimpleReference-v0",
    "simple_speaker_listener": "SimpleSpeakerListener-v0",
    "simple_spread": "SimpleSpread-v0",
    "simple_tag": "SimpleTag-v0",
    "simple_world_comm": "SimpleWorldComm-v0",
...

Therefore, after installing MPE, you can run these environments using:

python3 src/main.py --config=qmix --env-config=gymma with env_args.time_limit=25 env_args.key="mpe:SimpleSpeakerListener-v0"

Pretrained agents for the adversary and tag environments are included in this repo. You can use them with:

python3 src/main.py --config=qmix --env-config=gymma with env_args.time_limit=25 env_args.key="mpe:SimpleAdversary-v0" env_args.pretrained_wrapper="PretrainedAdversary"

and

python3 src/main.py --config=qmix --env-config=gymma with env_args.time_limit=25 env_args.key="mpe:SimpleTag-v0" env_args.pretrained_wrapper="PretrainedTag"

Installing MARBLER

MARBLER is a gym built for the Robotarium to enable free and effortless Sim2Real evaluation of algorithms. Clone it and follow the instructions on its GitHub page to install it.

Example of using MARBLER:

python3 src/main.py --config=qmix --env-config=gymma with env_args.time_limit=10000 env_args.key="robotarium_gym:PredatorCapturePrey-v0"

Using A Custom Gym Environment

EPyMARL supports environments that have been registered with Gym. The only difference from the standard Gym interface is that the returned rewards should be a tuple (one reward for each agent). In the common-reward (cooperative) setting, these rewards are summed together.

Environments that are supported out of the box are the ones that are registered in Gym automatically. Examples are: Level-Based Foraging and RWARE.

To register a custom environment with Gym, use the template below (taken from Level-Based Foraging).

from gym.envs.registration import registry, register, make, spec
register(
    id="Foraging-8x8-2p-3f-v2",                     # Environment ID
    entry_point="lbforaging.foraging:ForagingEnv",  # Entry point for the environment class
    kwargs={
        ...                                         # Arguments passed to ForagingEnv's __init__ function
    },
)
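For a fully custom environment, the registered class should follow the convention described above: per-agent observations and one reward per agent. The sketch below is a minimal illustration under that assumption; the class, module path, and observation/action dimensions are hypothetical.

from gym import Env, spaces
from gym.envs.registration import register
import numpy as np

class TwoAgentDummyEnv(Env):
    """Hypothetical two-agent environment that returns one reward per agent."""

    def __init__(self, n_agents=2, episode_limit=25):
        self.n_agents = n_agents
        self.episode_limit = episode_limit
        self._t = 0
        # One observation and action space entry per agent.
        self.observation_space = spaces.Tuple(
            tuple(spaces.Box(low=0.0, high=1.0, shape=(4,), dtype=np.float32) for _ in range(n_agents))
        )
        self.action_space = spaces.Tuple(tuple(spaces.Discrete(5) for _ in range(n_agents)))

    def reset(self):
        self._t = 0
        return tuple(np.zeros(4, dtype=np.float32) for _ in range(self.n_agents))

    def step(self, actions):
        self._t += 1
        obs = tuple(np.random.rand(4).astype(np.float32) for _ in range(self.n_agents))
        rewards = tuple(0.0 for _ in range(self.n_agents))  # one reward per agent
        done = self._t >= self.episode_limit
        return obs, rewards, done, {}

register(
    id="TwoAgentDummy-v0",
    entry_point="my_envs:TwoAgentDummyEnv",  # hypothetical module path
)

Once the module providing such an environment is importable, it would be selected with env_args.key="my_envs:TwoAgentDummy-v0", analogous to the command below.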

Run an experiment on a Gym environment

python3 src/main.py --config=qmix --env-config=gymma with env_args.time_limit=50 env_args.key="lbforaging:Foraging-8x8-2p-3f-v2"

In the above command, --env-config=gymma (in contrast to sc2) uses a Gym-compatible wrapper. env_args.time_limit=50 sets the maximum episode length to 50, and env_args.key="..." provides the Gym environment ID. In the ID, the lbforaging: part is the module name (i.e. import lbforaging is run automatically).

The config files act as defaults for an algorithm or environment.

They are all located in src/config. --config refers to the config files in src/config/algs, while --env-config refers to the config files in src/config/envs.

All results will be stored in the Results folder.

Run a hyperparameter search

We include a script named search.py which reads a search configuration file (e.g. the included search.config.example.yaml) and runs a hyperparameter search in one or more tasks. The script can be run using

python search.py run --config=search.config.example.yaml --seeds 5 locally

In a cluster environment where one run should go to a single process, it can also be called in a batch script like:

python search.py run --config=search.config.example.yaml --seeds 5 single 1

where 1 is the index of the particular hyperparameter configuration to run and can take values from 1 to the number of different combinations.

Saving and loading learnt models

Saving models

You can save the learnt models to disk by setting save_model=True (False by default). The frequency of saving models can be adjusted with the save_model_interval configuration. Models are saved in the results directory, under the folder called models. The directory corresponding to each run contains models saved throughout the experiment, each within a folder named after the number of timesteps elapsed since the start of training.

Loading models

Learnt models can be loaded using the checkpoint_path and load_step parameters. checkpoint_path should point to a directory stored for a run by EPyMARL as stated above, containing sub-directories for the various timesteps at which checkpoints were stored. If load_step is not provided (load_step=0 by default), the last checkpoint of that run is loaded. Otherwise, the checkpoint closest to the requested load_step is loaded. After loading, learning proceeds from the corresponding timestep.

To only evaluate loaded models without any training, set the checkpoint_path and load_step parameters accordingly for the loading, and additionally set evaluate=True. Then, the loaded checkpoint will be evaluated for test_nepisode episodes before terminating the run.
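For example, to evaluate a saved QMIX checkpoint on the LBF task used above without further training (the checkpoint path is a placeholder for the models directory of your run):

python3 src/main.py --config=qmix --env-config=gymma with env_args.time_limit=25 env_args.key="lbforaging:Foraging-8x8-2p-3f-v2" checkpoint_path="results/models/<run>" evaluate=True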

Citing EPyMARL and PyMARL

The Extended PyMARL (EPyMARL) codebase was used in Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks.

Georgios Papoudakis, Filippos Christianos, Lukas Schäfer, & Stefano V. Albrecht. Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks, Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS), 2021

In BibTeX format:

@inproceedings{papoudakis2021benchmarking,
   title={Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks},
   author={Georgios Papoudakis and Filippos Christianos and Lukas Schäfer and Stefano V. Albrecht},
   booktitle = {Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS)},
   year={2021},
   url = {http://arxiv.org/abs/2006.07869},
   openreview = {https://openreview.net/forum?id=cIrPX-Sn5n},
   code = {https://github.com/uoe-agents/epymarl},
}

If you use the original PyMARL in your research, please cite the SMAC paper.

M. Samvelyan, T. Rashid, C. Schroeder de Witt, G. Farquhar, N. Nardelli, T.G.J. Rudner, C.-M. Hung, P.H.S. Torr, J. Foerster, S. Whiteson. The StarCraft Multi-Agent Challenge, CoRR abs/1902.04043, 2019.

In BibTeX format:

@article{samvelyan19smac,
  title = {{The} {StarCraft} {Multi}-{Agent} {Challenge}},
  author = {Mikayel Samvelyan and Tabish Rashid and Christian Schroeder de Witt and Gregory Farquhar and Nantas Nardelli and Tim G. J. Rudner and Chia-Man Hung and Philiph H. S. Torr and Jakob Foerster and Shimon Whiteson},
  journal = {CoRR},
  volume = {abs/1902.04043},
  year = {2019},
}

License

All the source code taken from the PyMARL repository was licensed (and remains so) under the Apache License v2.0 (included in the LICENSE file). Any new code is also licensed under the Apache License v2.0.
