Prototype Envpool Support #100

Merged
merged 5 commits into from
Feb 8, 2022


vwxyzjn
Owner

@vwxyzjn vwxyzjn commented Jan 9, 2022

This PR adds an envpool example. Interestingly, after increasing num_envs to 32, I was able to solve Pong in 10 minutes :D

See the tracked experiment in costa-huang/cleanRL/runs/3rx432mj

self.num_envs = getattr(env, "num_envs", 1)
self.episode_returns = None
self.episode_lengths = None
self.is_vector_env = True
Collaborator

is_vector_env is not referenced anywhere except on line 185, which is a comment.
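
For context, the attributes in the snippet above look like the state of a wrapper that tracks per-environment episode returns and lengths across a vectorized environment. The following is a minimal pure-Python sketch of that bookkeeping, not the code in this PR: the class name `EpisodeStatsTracker` and its `reset`/`update` methods are hypothetical, and it deliberately carries no `is_vector_env` flag since nothing would read it.

```python
class EpisodeStatsTracker:
    """Hypothetical helper: accumulates returns and lengths for num_envs parallel envs."""

    def __init__(self, num_envs=1):
        self.num_envs = num_envs
        self.episode_returns = None
        self.episode_lengths = None

    def reset(self):
        # One running return and one step counter per sub-environment.
        self.episode_returns = [0.0] * self.num_envs
        self.episode_lengths = [0] * self.num_envs

    def update(self, rewards, dones):
        """Record one step; return (env_index, return, length) for each finished episode."""
        finished = []
        for i, (reward, done) in enumerate(zip(rewards, dones)):
            self.episode_returns[i] += reward
            self.episode_lengths[i] += 1
            if done:
                finished.append((i, self.episode_returns[i], self.episode_lengths[i]))
                # Start counting the next episode for this sub-environment.
                self.episode_returns[i] = 0.0
                self.episode_lengths[i] = 0
        return finished
```

Calling `reset()` once and then `update(rewards, dones)` after every vectorized step yields the finished-episode statistics without any per-environment wrapper objects.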

@yooceii
Collaborator

yooceii commented Jan 9, 2022

I wonder how the performance compares with https://github.com/NVlabs/cule when using a large number of envs.

@vwxyzjn
Owner Author

vwxyzjn commented Jan 9, 2022

I ran a hyperparameter sweep (sweeps/nfrd091p) overnight; now I can solve Pong in ~5 minutes, according to runs/opk2dmta, with these hyperparameters:

--clip-coef=0.2 --num-envs=16 --num-minibatches=8 --num-steps=128 --update-epochs=3
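
To make the scale of these settings concrete, here is a small sketch of the rollout arithmetic they imply. It assumes the common PPO convention that the batch size is num_envs × num_steps and that each epoch splits the batch into num_minibatches equal parts; the `parse_args` helper is hypothetical and its defaults simply mirror the flags quoted above.

```python
import argparse


def parse_args(argv):
    # Hypothetical parser; only the five flags quoted in the comment are modeled.
    parser = argparse.ArgumentParser()
    parser.add_argument("--clip-coef", type=float, default=0.2)
    parser.add_argument("--num-envs", type=int, default=16)
    parser.add_argument("--num-minibatches", type=int, default=8)
    parser.add_argument("--num-steps", type=int, default=128)
    parser.add_argument("--update-epochs", type=int, default=3)
    args = parser.parse_args(argv)
    # Assumed convention: one rollout collects num_steps transitions from every env.
    args.batch_size = args.num_envs * args.num_steps
    args.minibatch_size = args.batch_size // args.num_minibatches
    return args


flags = "--clip-coef=0.2 --num-envs=16 --num-minibatches=8 --num-steps=128 --update-epochs=3"
args = parse_args(flags.split())
updates_per_rollout = args.update_epochs * args.num_minibatches
print(args.batch_size, args.minibatch_size, updates_per_rollout)  # 2048 256 24
```

Under that assumption, each rollout gathers 2048 transitions and performs 24 gradient updates on minibatches of 256.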

Trinkle23897 pushed a commit to sail-sg/envpool that referenced this pull request Jan 9, 2022
Collaborator

@dosssman dosssman left a comment

While I am not too familiar with envpool, I gather that it essentially concerns the environments, and the PPO logic does not seem to have changed.

My attempt converged in around 30 minutes, but I used a weaker CPU server than yours, so I suspect the wall-time efficiency is highly dependent on hardware. Nevertheless, it is still faster than ppo_atari.py, which does not use envpool (using the same hyperparameters) and has yet to converge stably after 1h30.

For reference, ppo_atari_envpool.py has an SPS of around 1729, while ppo_atari.py has an SPS of 489.
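
As a rough illustration of what those numbers mean, the sketch below computes steps per second (SPS) as environment steps divided by elapsed wall time and then takes the ratio of the two figures quoted above. The `steps_per_second` helper is hypothetical; only the two SPS values come from this comment.

```python
import time


def steps_per_second(global_step, start_time, now=None):
    """Hypothetical helper: environment steps processed per second of wall time."""
    now = time.time() if now is None else now
    elapsed = now - start_time
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return global_step / elapsed


# SPS figures quoted in the comment above.
envpool_sps = 1729
baseline_sps = 489
speedup = envpool_sps / baseline_sps
print(f"{speedup:.2f}x")  # 3.54x
```

On this hardware the envpool script therefore processes roughly 3.5 times as many environment steps per second as the baseline.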

In any case, this PR looks good to me.
Great work.

CPU specs:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          20
On-line CPU(s) list:             0-19
Thread(s) per core:              1
Core(s) per socket:              10
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           79
Model name:                      Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
Stepping:                        1
CPU MHz:                         1200.179
CPU max MHz:                     3400.0000
CPU min MHz:                     1200.0000
BogoMIPS:                        4800.00
Virtualization:                  VT-x
L1d cache:                       640 KiB
L1i cache:                       640 KiB
L2 cache:                        5 MiB
L3 cache:                        50 MiB

GPU specs: 1080

@vwxyzjn
Owner Author

vwxyzjn commented Jan 17, 2022

https://wandb.ai/vwxyzjn/ppo-details/reports/Envpool--VmlldzoxNDM3ODQz @dosssman

@dosssman
Collaborator

This time it seems to take around 50 minutes for Pong.

Is PPO solving Pong-v5 in 5 minutes really due to the hyperparameters mentioned above?

Also, I noticed that you used the same machine for all the runs, so I was wondering whether running the training scripts concurrently could have had some impact on the overall performance too...

@vwxyzjn
Owner Author

vwxyzjn commented Jan 19, 2022

@dosssman it was largely down to a bit of hyperparameter tuning. Also, I was running these scripts one at a time, so there were no concurrency issues.

@vwxyzjn vwxyzjn merged commit 57fdf35 into master Feb 8, 2022
@vwxyzjn vwxyzjn deleted the new-envpool branch February 8, 2022 21:51