This repo contains the source code to reproduce the results in the paper "A Closer Look at Invalid Action Masking in Policy Gradient Algorithms".
If you have pyenv and poetry:
pyenv install -s $(sed "s/\/envs.*//" .python-version)
pyenv virtualenv $(sed "s/\/envs\// /" .python-version)
pyenv activate $(cat .python-version)
poetry install
rm ~/microrts -fR && mkdir ~/microrts && \
wget -O ~/microrts/microrts.zip http://microrts.s3.amazonaws.com/microrts/artifacts/202004222224.microrts.zip && \
unzip ~/microrts/microrts.zip -d ~/microrts/ && \
rm ~/microrts/microrts.zip
Otherwise, you can install the dependencies via `pip install -r requirements.txt`.
python invalid_action_masking/ppo_10x10.py
python invalid_action_masking/ppo_no_adj_10x10.py
python invalid_action_masking/ppo_no_mask_10x10.py
python ppo.py # newer & recommended PPO implementation that matches implementation details in `openai/baselines`
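For readers new to the technique, invalid action masking can be sketched as follows: before sampling, the logits of invalid actions are replaced with a large negative number, so their probability after the softmax is effectively zero. This is a minimal, dependency-free illustration and not the code used in the scripts above; the function name `masked_softmax`, the `mask_value` of `-1e8`, and the example logits are assumptions chosen for illustration.

```python
import math

def masked_softmax(logits, valid_mask, mask_value=-1e8):
    """Return action probabilities with invalid actions masked out.

    `masked_softmax` and `mask_value` are illustrative names, not taken
    from this repo's scripts.
    """
    # Replace the logit of every invalid action with a large negative number.
    masked = [l if ok else mask_value for l, ok in zip(logits, valid_mask)]
    # Numerically stable softmax over the masked logits.
    top = max(masked)
    exps = [math.exp(l - top) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Four actions; actions 1 and 3 are invalid in this hypothetical state.
probs = masked_softmax([1.0, 2.0, 0.5, 3.0], [True, False, True, False])
print(probs)
```

The invalid actions end up with zero probability, and the remaining probability mass is renormalized over the valid actions only.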
We have tested these scripts, but they may still contain bugs or rely on assumptions specific to our environment. If you cannot reproduce our results, please file an issue and we will address it as soon as the double-blind review is over.