Add atari ppo example #523

nuance1979 · 2022-02-08T19:02:35Z

I have marked all applicable categories:
- exception-raising fix
- algorithm implementation fix
- documentation modification
- new feature
I have reformatted the code using make format (required)
I have checked the code using make commit-checks (required)
If applicable, I have mentioned the relevant/related issue(s)
If applicable, I have listed every item in this Pull Request below

I needed a policy gradient baseline myself and it has been requested several times (#497, #374, #440). I used https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari.py as a reference for hyper-parameters.

Note that using lr=2.5e-4 results in an "Invalid Value" error for 2 games. The fix is to reduce the learning rate, which is why I set the default lr to 1e-4. See the discussion in DLR-RM/rl-baselines3-zoo#156.

Trinkle23897 · 2022-02-08T21:29:19Z

Things to check:

- obs/255
- shared policy-value network till which MLP layer

codecov-commenter · 2022-02-08T21:44:15Z

Codecov Report

Merging #523 (a6be940) into master (3d697aa) will increase coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #523      +/-   ##
==========================================
+ Coverage   94.33%   94.35%   +0.02%     
==========================================
  Files          63       63              
  Lines        4251     4251              
==========================================
+ Hits         4010     4011       +1     
+ Misses        241      240       -1

Flag	Coverage Δ
unittests	`94.35% <ø> (+0.02%)`	⬆️

Impacted Files	Coverage Δ
tianshou/policy/modelfree/npg.py	`98.85% <0.00%> (+1.14%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

nuance1979 · 2022-02-10T04:31:27Z

> Things to check:
>
> obs/255

I ran some experiments (see below), and normalizing obs did not seem to help. Since our previous experiments do not normalize obs, I'd suggest keeping it as an option ("--scale-obs") but turning it off by default.
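For readers who want to see what an opt-in switch like this might look like, here is a minimal sketch. Only the `--scale-obs` name comes from the comment above; the `store_true` behaviour, the `build_parser` and `maybe_scale` helpers, and the flat-list frame representation are assumptions made for illustration and are not the code in this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; only the --scale-obs name appears in this thread."""
    parser = argparse.ArgumentParser()
    # Off unless the flag is passed, matching the "off by default" suggestion.
    parser.add_argument(
        "--scale-obs",
        action="store_true",
        help="divide raw pixel observations by 255",
    )
    return parser


def maybe_scale(frame, scale_obs: bool):
    """Map pixels to [0, 1] when scale_obs is set; otherwise return them unchanged.

    `frame` is assumed to be a flat list of uint8 pixel values (0..255).
    """
    if not scale_obs:
        return list(frame)
    return [pixel / 255.0 for pixel in frame]


# Explicit argument lists keep the example independent of sys.argv.
default_args = build_parser().parse_args([])
scaled_args = build_parser().parse_args(["--scale-obs"])

raw = maybe_scale([0, 128, 255], default_args.scale_obs)    # pixels untouched
unit = maybe_scale([0, 128, 255], scaled_args.scale_obs)    # pixels in [0, 1]
```

Keeping the raw path as the default means earlier runs without scaling stay comparable.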

> shared policy-value network till which MLP layer

I changed the network structure to match other implementations (sb3, cleanrl): the NatureCNN plus one Linear-512 layer is shared, and the actor and critic are each a single linear layer on top. The results seem to be about the same as with my old structure. I'll rerun the experiments and update the plots.

With sb3 network structure: best_reward=1240

With sb3 network structure + obs/255: best_reward=888

With my old network structure: best_reward=1184

With my old network structure + obs/255: best_reward=787
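To make the shared-trunk layout described above concrete, the following sketch works out layer shapes and parameter counts in plain Python. It assumes the convolution sizes from the Nature DQN paper (32 filters 8x8 stride 4, 64 filters 4x4 stride 2, 64 filters 3x3 stride 1, no padding) and a 4x84x84 stacked-frame input; neither is stated in this thread, and the helper names and the action count of 6 are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int) -> int:
    """Output width of an unpadded convolution along one spatial dimension."""
    return (size - kernel) // stride + 1


def shared_trunk_params(frames: int = 4, side: int = 84, hidden: int = 512):
    """Return (flattened feature size, parameter count) of the shared trunk.

    The trunk is the assumed Nature-DQN CNN followed by one Linear(hidden).
    """
    layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]  # (out_channels, kernel, stride)
    channels, total = frames, 0
    for out_channels, kernel, stride in layers:
        # Each filter has in_channels * k * k weights plus one bias.
        total += out_channels * (channels * kernel * kernel + 1)
        side = conv_out(side, kernel, stride)
        channels = out_channels
    flat = channels * side * side
    total += hidden * (flat + 1)  # the shared Linear layer
    return flat, total


def head_params(hidden: int, n_actions: int):
    """Parameters of the two single-layer heads: actor logits and scalar value."""
    actor = n_actions * (hidden + 1)
    critic = hidden + 1
    return actor, critic


flat, trunk = shared_trunk_params()
actor, critic = head_params(hidden=512, n_actions=6)  # 6 actions is a made-up example
```

Under these assumptions almost all parameters sit in the shared trunk, and each head adds only a few hundred to a few thousand weights.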

nuance1979 and others added 2 commits February 8, 2022 10:26

Add atari ppo example

729b35b

Merge branch 'master' into atari_ppo

a6be940

nuance1979 added 2 commits February 9, 2022 14:58

update network architecture to match sb3 and other repos

07d129a

Merge branch 'atari_ppo' of https://github.com/nuance1979/tianshou into atari_ppo

dda6be5

Trinkle23897 added 2 commits February 10, 2022 14:53

revert variable name

1f3ce07

revert variable name

abefef7

This was linked to issues Feb 10, 2022

Atari example for policy-based method like A2C, PPO #374

Closed

Atari CNN Input Scale Issue #440

Closed

PPO example for Spaceinvaders #497

Closed

Trinkle23897 approved these changes Feb 10, 2022

Trinkle23897 merged commit 40289b8 into thu-ml:master Feb 10, 2022

Trinkle23897 linked an issue Feb 10, 2022 that may be closed by this pull request

test_reward doesn't change in example #272

Closed

nuance1979 deleted the atari_ppo branch March 6, 2022 23:14
