Add atari ppo example #523
Conversation
Things to check:
- make format (required)
- make commit-checks (required)

I needed a policy gradient baseline myself and it has been requested several times (#497, #374, #440). I used https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari.py as a reference for hyper-parameters. Note that using lr=2.5e-4 results in an "Invalid Value" error for 2 games; the fix is to reduce the learning rate, which is why I set the default lr to 1e-4. See the discussion in DLR-RM/rl-baselines3-zoo#156.
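As a rough illustration of the hyper-parameter setup described above, here is a minimal argparse sketch in the spirit of cleanrl's ppo_atari.py. Flag names and all defaults other than lr=1e-4 are assumptions for illustration, not necessarily the exact values used in this PR.

```python
# Illustrative PPO-on-Atari defaults; only lr=1e-4 is taken from the PR
# discussion, the other names/values are assumptions based on cleanrl.
import argparse


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--task", type=str, default="PongNoFrameskip-v4")
    # Lowered from 2.5e-4 to avoid "Invalid Value" errors on some games.
    parser.add_argument("--lr", type=float, default=1e-4)
    parser.add_argument("--gamma", type=float, default=0.99)
    parser.add_argument("--gae-lambda", type=float, default=0.95)
    parser.add_argument("--eps-clip", type=float, default=0.1)
    parser.add_argument("--vf-coef", type=float, default=0.5)
    parser.add_argument("--ent-coef", type=float, default=0.01)
    parser.add_argument("--max-grad-norm", type=float, default=0.5)
    # Observation scaling is kept as an option but off by default
    # (see the later comment on "--scale-obs").
    parser.add_argument("--scale-obs", type=int, default=0)
    return parser.parse_args()
```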
|
Codecov Report
Coverage Diff (master vs. #523):

|          | master | #523   | +/-    |
|----------|--------|--------|--------|
| Coverage | 94.33% | 94.35% | +0.02% |
| Files    | 63     | 63     |        |
| Lines    | 4251   | 4251   |        |
| Hits     | 4010   | 4011   | +1     |
| Misses   | 241    | 240    | -1     |
|
I did some experiments (see below) and it seems that normalizing observations does not help. Since our previous experiments do not normalize observations, I'd suggest keeping it as an option ("--scale-obs") but turning it off by default; a rough sketch of what that option would toggle is below.
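For illustration only, a hypothetical observation wrapper that "--scale-obs" could enable: it rescales uint8 Atari frames to float32 in [0, 1]. The actual PR may implement scaling elsewhere (e.g. inside the Atari wrapper or the network), so this is just a sketch of the idea.

```python
# Hypothetical sketch: rescale uint8 pixel observations to [0, 1].
import gym
import numpy as np


class ScaledFloatFrame(gym.ObservationWrapper):
    """Convert uint8 frames to float32 and divide by 255."""

    def __init__(self, env):
        super().__init__(env)
        shape = env.observation_space.shape
        self.observation_space = gym.spaces.Box(
            low=np.zeros(shape, dtype=np.float32),
            high=np.ones(shape, dtype=np.float32),
            dtype=np.float32,
        )

    def observation(self, observation):
        return np.asarray(observation, dtype=np.float32) / 255.0
```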
I made the change so that the network structure matches other (sb3, cleanrl) implementations, i.e., a shared NatureCNN + one Linear-512 layer, with both the actor and the critic being just one linear layer on top (see the sketch below). The results seem to be about the same as with my old structure. I'll rerun the experiments and update the plots.
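A rough PyTorch sketch of the architecture described above, assuming 4 stacked 84x84 frames as input; class and attribute names are made up for illustration and the action-dimension plumbing is an assumption.

```python
# Sketch: shared NatureCNN + Linear-512 trunk, single-linear actor and critic heads.
import torch
import torch.nn as nn


class NatureCNNActorCritic(nn.Module):
    def __init__(self, action_dim: int, in_channels: int = 4):
        super().__init__()
        # Shared trunk: NatureCNN (Mnih et al., 2015) followed by one 512-unit linear layer.
        self.feature = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),  # 7x7 spatial size for 84x84 inputs
        )
        # Actor and critic are each a single linear layer on the shared features.
        self.actor = nn.Linear(512, action_dim)
        self.critic = nn.Linear(512, 1)

    def forward(self, obs: torch.Tensor):
        feat = self.feature(obs)  # obs: (batch, 4, 84, 84)
        return self.actor(feat), self.critic(feat)
```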