Source code for the paper: Trust Region-Guided Proximal Policy Optimization (TRGPPO). The original code was forked from OpenAI Baselines.
The method is tested on MuJoCo continuous control tasks and Atari discrete game tasks in OpenAI Gym. Networks are trained with TensorFlow 1.10 and Python 3.6.
git clone --recursive https://github.com/wangyuhuix/TRGPPO
cd TRGPPO
pip install -r requirements.txt
- env: environment ID
- seed: random seed
- num_timesteps: number of training timesteps
python -m baselines.ppo2_AdaClip.run --env=InvertedPendulum-v2 --seed=0
python -m baselines.ppo2_AdaClip.run --env=BeamRiderNoFrameskip-v4 --seed=0 --isatari
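The arguments listed above can also be combined in a single invocation; a sketch (the timestep count here is illustrative, not a value from the paper):

```shell
# MuJoCo task with an explicit training budget and a non-default seed
python -m baselines.ppo2_AdaClip.run --env=Hopper-v2 --seed=1 --num_timesteps=1000000

# Atari task: pass --isatari so the Atari network and preprocessing are used
python -m baselines.ppo2_AdaClip.run --env=BreakoutNoFrameskip-v4 --seed=1 --isatari
```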