Refactor PPG and PPO for procgen #108

Merged
merged 6 commits into from
Feb 6, 2022

vwxyzjn (Owner) commented Feb 3, 2022

This PR refactors PPO and PPG. Specifically,

In PPO:

  • Match the learning rate of 5e-4: see the screenshot from the procgen paper.
  • Add reward normalization per the screenshot from the procgen paper.
  • Add layer initialization for output layers: while the IMPALA network itself uses the default layer initialization (common/models.py#L28-L69), the actor and critic heads still use the corresponding initialization.
  • Change the distribution mode from hard to easy (to save compute) and correspondingly change total_timesteps from 100M to 25M.
  • Turn off learning rate annealing by default, per the screenshot.
    [image: screenshot from the procgen paper]
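The reward normalization and learning rate items above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in this PR: the description does not say how rewards are normalized, so the sketch uses one common approach (dividing each reward by the running standard deviation of the discounted return). The names `RunningVariance`, `RewardNormalizer`, and `learning_rate`, and the default `gamma=0.999`, are hypothetical choices made for the example.

```python
import math


class RunningVariance:
    """Welford's online algorithm for the mean and variance of a scalar stream."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance; falls back to 1.0 until two samples are seen.
        return self.m2 / self.count if self.count > 1 else 1.0


class RewardNormalizer:
    """Hypothetical helper: scale rewards by the running std of the discounted return."""

    def __init__(self, gamma=0.999, epsilon=1e-8):  # gamma value is an assumption
        self.gamma = gamma
        self.epsilon = epsilon
        self.discounted_return = 0.0
        self.stats = RunningVariance()

    def __call__(self, reward, done):
        # Accumulate the discounted return and track its variance.
        self.discounted_return = self.discounted_return * self.gamma + reward
        self.stats.update(self.discounted_return)
        normalized = reward / math.sqrt(self.stats.variance + self.epsilon)
        if done:
            # Start a fresh return estimate for the next episode.
            self.discounted_return = 0.0
        return normalized


def learning_rate(update, num_updates, base_lr=5e-4, anneal=False):
    """Constant rate by default; linear decay toward zero when anneal=True.

    `update` is 1-indexed, so the first update always uses `base_lr`.
    """
    if not anneal:
        return base_lr
    frac = 1.0 - (update - 1) / num_updates
    return frac * base_lr
```

With `anneal=False`, which mirrors the new default described above, `learning_rate` returns 5e-4 for every update; passing `anneal=True` restores a linear decay for comparison runs.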

In PPG:

  • Refactor according to the new format
  • Rescale the image by dividing by 255 (previously not done)
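The rescaling step maps 8-bit pixel values into the range [0, 1] before they reach the network. A minimal sketch of that idea follows; `rescale_pixels` is a hypothetical helper working on nested lists for illustration, whereas an actual agent would more likely divide a tensor by 255.0 inside its forward pass.

```python
def rescale_pixels(frame):
    """Map 8-bit pixel values in [0, 255] to floats in [0.0, 1.0].

    `frame` is assumed to be a nested list shaped [height][width][channels].
    """
    return [[[value / 255.0 for value in pixel] for pixel in row] for row in frame]


# A 1x1 RGB frame, purely for illustration.
frame = [[[0, 128, 255]]]
scaled = rescale_pixels(frame)
```

Keeping inputs in [0, 1] avoids feeding raw values as large as 255 into the first convolution, which is the usual motivation for this step.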

In light of this, we would also require a new benchmark. Starting a PR to track this progress.

gitpod-io bot commented Feb 3, 2022

vwxyzjn (Owner, Author) commented Feb 5, 2022

vwxyzjn (Owner, Author) commented Feb 6, 2022

Based on the results in https://wandb.ai/costa-huang/cleanRL/reports/Procgen-Our-PPO-vs-openai-baselines-PPO--VmlldzoxNTIyNjA5, this refactoring was successful. Merging now.


vwxyzjn merged commit af6daa6 into master Feb 6, 2022
vwxyzjn deleted the PPG-refactor branch February 6, 2022 22:10