Refactor PPG and PPO for procgen #108

vwxyzjn · 2022-02-03T15:20:30Z

This PR refactors PPO and PPG. Specifically,

In PPO:

Match the learning rate of 5e-4: see the screenshot from the procgen paper.
Add reward normalization per the screenshot from the procgen paper.
Add layer initialization for output layers: while the IMPALA network itself uses the default layer initialization (common/models.py#L28-L69), the actor and critic heads still use the corresponding initialization.
Change the distribution mode from hard to easy (to save compute) and correspondingly reduce total_timesteps from 100M to 25M.
Turn off learning rate annealing by default, per the screenshot.
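
To make the reward-normalization and learning-rate items above concrete, here is a minimal, dependency-free sketch. It is not the code from this PR. It assumes rewards are scaled by the running standard deviation of the discounted return, which is a common approach; the PR text does not say how normalization is implemented. The names `RunningVariance`, `RewardNormalizer`, and `learning_rate`, and the values `gamma=0.999` and `epsilon=1e-8`, are hypothetical and chosen for illustration. Only the 5e-4 learning rate and the annealing-off default come from the description.

```python
import math


class RunningVariance:
    """Tracks the mean and variance of a stream using Welford's algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance; falls back to 1.0 until two samples are seen.
        return self.m2 / self.count if self.count > 1 else 1.0


class RewardNormalizer:
    """Scales rewards by the running std of the discounted return (assumed scheme)."""

    def __init__(self, gamma=0.999, epsilon=1e-8):  # illustrative values
        self.gamma = gamma
        self.epsilon = epsilon
        self.discounted_return = 0.0
        self.stats = RunningVariance()

    def normalize(self, reward, done):
        # Accumulate the discounted return, then update its running variance.
        self.discounted_return = self.discounted_return * self.gamma + reward
        self.stats.update(self.discounted_return)
        scaled = reward / math.sqrt(self.stats.variance + self.epsilon)
        if done:
            # Reset the accumulator at episode boundaries.
            self.discounted_return = 0.0
        return scaled


def learning_rate(update, num_updates, base_lr=5e-4, anneal=False):
    """Constant learning rate by default; linear decay when anneal=True.

    `update` is 1-indexed, so the first update uses the full base_lr.
    """
    if not anneal:
        return base_lr
    frac = 1.0 - (update - 1.0) / num_updates
    return frac * base_lr
```

With `anneal=False`, `learning_rate` returns 5e-4 for every update. Passing `anneal=True` gives the linear decay that is turned off by default here.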

In PPG:

In light of this, we would also require a new benchmark. Starting a PR to track this progress.

gitpod-io · 2022-02-03T15:20:32Z

vwxyzjn · 2022-02-05T01:02:45Z

vwxyzjn · 2022-02-06T22:10:25Z

vwxyzjn added 2 commits February 3, 2022 10:12

Refactor PPG and PPO for procgen

67b025f

Change distribution mode

d7d1c00

vwxyzjn added 4 commits February 3, 2022 10:22

Add a warning

946472c

Add a warning

ebee455

Turn off learning rate annealin by default

c02dbdc

quick fix

bca5080

vwxyzjn merged commit af6daa6 into master Feb 6, 2022

vwxyzjn deleted the PPG-refactor branch February 6, 2022 22:10
