PPO - Add hyperparam tuning with ray.tune #54

xeviknal · 2021-04-13T19:52:56Z

PPO has far more hyperparams than REINFORCE.

It is really easy to make the model overfit by doing many passes over the very same transitions.
The loss must be watched closely, inspecting each of its 3 components (and therefore its 2 coefficients).
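For reference, here is a minimal sketch of a loss with 3 components and 2 coefficients. It assumes the standard PPO formulation (clipped policy surrogate, value loss, entropy bonus); the description above does not name the components, and the function name, argument names and default values are hypothetical, not taken from this repo. It returns each component separately so they can be logged and inspected one by one.

```python
import math


def ppo_loss(new_logp, old_logp, advantages, values, returns, entropies,
             clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
    """Return (total, policy_loss, value_loss, entropy), averaged over the batch.

    Assumed formulation (standard PPO, not confirmed by this PR):
        total = policy_loss + value_coef * value_loss - entropy_coef * entropy
    """
    n = len(advantages)

    # Component 1: clipped surrogate objective, negated so it can be minimized.
    surrogate = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        surrogate += min(ratio * adv, clipped * adv)
    policy_loss = -surrogate / n

    # Component 2: mean squared error between value estimates and returns.
    value_loss = sum((v - r) ** 2 for v, r in zip(values, returns)) / n

    # Component 3: mean policy entropy, subtracted to encourage exploration.
    entropy = sum(entropies) / n

    total = policy_loss + value_coef * value_loss - entropy_coef * entropy
    return total, policy_loss, value_loss, entropy
```

Logging the three returned components on separate curves makes it easier to see which term dominates when the two coefficients change between trials.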

This PR adds hyperparam tuning tooling to help us find which configuration works best.
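A rough sketch of what such a setup can look like, assuming the function-based API of Ray 1.x that was current when this PR was opened (`tune.run`, `tune.report`, `tune.choice`, `tune.loguniform`); newer Ray releases changed parts of this API. The search space, the metric name `mean_reward` and the `evaluate_config` stand-in are hypothetical and are not the code in this branch.

```python
def evaluate_config(config):
    """Hypothetical stand-in for a full PPO training run.

    Returns a toy score so the sketch runs without an environment; the real
    trainable would train the agent and return something like mean reward.
    """
    return -abs(config["clip_eps"] - 0.2) - 0.01 * config["ppo_epochs"]


def run_tuning(num_samples=10):
    """Launch `num_samples` trials and return the best configuration found."""
    # Imported here so the rest of the module can be used without Ray installed.
    from ray import tune

    def trainable(config):
        # Report the metric that Tune uses to compare trials (Ray 1.x style).
        tune.report(mean_reward=evaluate_config(config))

    # Assumed search space: learning rate, clipping epsilon, PPO epochs.
    search_space = {
        "lr": tune.loguniform(1e-5, 1e-3),
        "clip_eps": tune.choice([0.1, 0.2, 0.3]),
        "ppo_epochs": tune.choice([4, 8, 16]),
    }

    analysis = tune.run(
        trainable,
        config=search_space,
        num_samples=num_samples,
        # Fractional GPUs let several trials share one device concurrently.
        resources_per_trial={"cpu": 1, "gpu": 0},
        metric="mean_reward",
        mode="max",
    )
    return analysis.best_config
```

Calling `run_tuning()` starts the trials; by default Tune writes per-trial logs under `~/ray_results/`, which can then be compared in TensorBoard.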

xeviknal · 2021-04-13T20:43:40Z

Up and running! Hope it gets the 10 experiments done 🤞

xeviknal · 2021-04-14T14:23:25Z

xeviknal · 2021-04-15T10:08:04Z

Results in yann lecun -> ~/ray_results/train_2021-04-14_10-01-45/

xeviknal · 2021-04-18T17:35:46Z

xeviknal added 4 commits April 13, 2021 21:06

Adding ray for hyperparam tuning

80c8276

Adding ray.tune to tune hyperparams

6837c4f

Updating num of GPU used concurrently

b9fc980

Adding cpu and gpu usage

21604a6

xeviknal added 6 commits April 13, 2021 20:45

Making tensorboard aggregable

f4c3e61

Adding suffix to params path to avoid clashing between experiments

520e168

fixup

fb5a4ad

2k runs per trial

5aa51df

Use more resources and try the ppo-epoch hyperparam

832a23c

More samples. Store a vid of every sample

be6f89b

Updating install script

8fdd359

xeviknal force-pushed the ppo-hyperparam-tuning branch from 065044d to 8fdd359 Compare April 14, 2021 21:52

xeviknal added 4 commits April 15, 2021 11:57

Add reporting to tune + recording video

9774921

Changing epsilon

1a57a14

Add tune report results

f0bbaed

Exp-name changed

8a4c04b

re-design of trials: different seeds, less epochs

b47a357
