
Fix Atari PPO example #780

Merged
9 commits merged into thu-ml:master from fix_atari_ppo on Dec 4, 2022

Conversation

nuance1979
Collaborator

- [x] I have marked all applicable categories:
  - [ ] exception-raising fix
  - [x] algorithm implementation fix
  - [ ] documentation modification
  - [ ] new feature
- [x] I have reformatted the code using `make format` (**required**)
- [x] I have checked the code using `make commit-checks` (**required**)
- [x] If applicable, I have mentioned the relevant/related issue(s)
- [x] If applicable, I have listed every item in this Pull Request below

While trying to debug Atari PPO+LSTM, I found a significant gap between our Atari PPO example and [CleanRL's Atari PPO w/ EnvPool](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_envpoolpy). I tried to align our implementation with CleanRL's version, mostly in hyperparameter choices, and got significant gains in Breakout, Qbert, and SpaceInvaders while staying on par in other games. After this fix, I would suggest updating the PPO experiments in our [Atari Benchmark](https://tianshou.readthedocs.io/en/master/tutorials/benchmark.html).
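For context, here is a rough sketch of the kind of hyperparameter configuration CleanRL documents for `ppo_atari_envpool.py`. The values below follow CleanRL's published defaults and are illustrative of the alignment only, not necessarily the exact numbers adopted in this PR:

```python
# Illustrative PPO-on-Atari hyperparameters in the spirit of CleanRL's
# ppo_atari_envpool.py defaults (assumed values; the numbers merged in
# this PR may differ slightly).
ppo_atari_hparams = dict(
    learning_rate=2.5e-4,   # typically with linear decay over training
    num_envs=8,             # parallel EnvPool environments
    num_steps=128,          # rollout length per env before each update
    gamma=0.99,
    gae_lambda=0.95,
    num_minibatches=4,
    update_epochs=4,
    clip_coef=0.1,          # PPO clipping epsilon
    ent_coef=0.01,
    vf_coef=0.5,
    max_grad_norm=0.5,
)
```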

A few interesting findings:

  • Layer initialization helps stabilize training and enables the use of larger learning rates; without it, larger learning rates trigger NaN gradients very quickly (see the initialization sketch after this list);
  • ppo.py#L97-L101: this change improves training stability for reasons I do not understand; it also increases GPU utilization.
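A minimal sketch of the CleanRL-style orthogonal layer initialization referred to above; the `layer_init` helper, the gains, and the network shapes are illustrative, not the literal code merged in this PR:

```python
import numpy as np
import torch
import torch.nn as nn


def layer_init(layer: nn.Module, std: float = np.sqrt(2), bias_const: float = 0.0) -> nn.Module:
    """Orthogonal weight init with constant bias, as popularized by CleanRL."""
    torch.nn.init.orthogonal_(layer.weight, std)
    torch.nn.init.constant_(layer.bias, bias_const)
    return layer


# Example: Nature-CNN feature extractor for 84x84x4 Atari frames,
# with every layer passed through layer_init.
feature_net = nn.Sequential(
    layer_init(nn.Conv2d(4, 32, kernel_size=8, stride=4)),
    nn.ReLU(),
    layer_init(nn.Conv2d(32, 64, kernel_size=4, stride=2)),
    nn.ReLU(),
    layer_init(nn.Conv2d(64, 64, kernel_size=3, stride=1)),
    nn.ReLU(),
    nn.Flatten(),
    layer_init(nn.Linear(64 * 7 * 7, 512)),
    nn.ReLU(),
)
# Policy/value heads usually get smaller gains (0.01 and 1.0) so the initial
# policy is near-uniform and the initial value estimate is near zero.
actor_head = layer_init(nn.Linear(512, 6), std=0.01)   # 6 = example action count
critic_head = layer_init(nn.Linear(512, 1), std=1.0)
```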

Shoutout to [CleanRL](https://github.com/vwxyzjn/cleanrl) for a well-tuned Atari PPO reference implementation!

@codecov-commenter

codecov-commenter commented Nov 30, 2022

Codecov Report

Merging #780 (d2f422e) into master (929508b) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #780      +/-   ##
==========================================
- Coverage   91.08%   91.07%   -0.01%     
==========================================
  Files          71       71              
  Lines        5082     5077       -5     
==========================================
- Hits         4629     4624       -5     
  Misses        453      453              
| Flag | Coverage | Δ |
| --- | --- | --- |
| unittests | 91.07% <100.00%> | -0.01% ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage | Δ |
| --- | --- | --- |
| tianshou/trainer/base.py | 96.90% <ø> | ø |
| tianshou/policy/modelfree/ppo.py | 92.06% <100.00%> | -0.59% ⬇️ |


@Trinkle23897 Trinkle23897 merged commit 662af52 into thu-ml:master Dec 4, 2022
@nuance1979 nuance1979 deleted the fix_atari_ppo branch December 5, 2022 21:35
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024