added gamma to reward normalization wrappers #209
|
So here is the tricky part - the original implementation actually uses This would cause a performance change unfortunately. There are two ways to go forward
@Howuhh what do you think we should do? |
|
@vwxyzjn To be honest, I think this is a bug in the original code, not a feature, so it would be more accurate to rerun for correct results. However, procgen is an image-based env, and for now I don't have the resources to train on images. |
|
Ok, no worries. I will take care of it from here. @Dipamc77 I don't have the GPU memory to run the PPG experiments. Would you mind running them with this PR? I can take care of the ppo procgen experiments. |
|
Running the PPO experiments now. Also tried a fun thing by adding a wandb tag like which produces runs like @dosssman I think this tagging system could somehow help us phase out past openrlbenchmark experiments without deleting them. I will have to think about the workflow a bit more. |
|
The bigfish performance degradation could easily be due to a random seed. |
|
@vwxyzjn Seems okay to me. Thanks for redoing the experiments btw. |

Description
Fixes the incorrect gamma in the reward normalization wrapper when a non-default gamma is used. See #203.
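To illustrate why the gamma matters here, below is a minimal sketch of a reward normalizer that scales rewards by the running standard deviation of the discounted return. It is not the code changed in this PR and not the gym wrapper itself; `RunningVariance`, `RewardNormalizer`, and `last_scaled_reward` are hypothetical names used only for illustration. The point is that the discounted return is computed with the normalizer's own gamma, so if it differs from the agent's gamma the rewards end up on a different scale.

```python
import math


class RunningVariance:
    """Welford-style running variance of a scalar stream (illustrative helper)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Population variance; fall back to 1.0 before any sample is seen.
        return self.m2 / self.count if self.count else 1.0


class RewardNormalizer:
    """Hypothetical normalizer: scales rewards by the std of the discounted return."""

    def __init__(self, gamma=0.99, epsilon=1e-8):
        self.gamma = gamma  # should match the gamma the agent trains with
        self.epsilon = epsilon
        self.ret = 0.0  # running discounted return of the current episode
        self.stats = RunningVariance()

    def normalize(self, reward, done):
        # Accumulate the discounted return using the normalizer's gamma.
        self.ret = self.gamma * self.ret + reward
        self.stats.update(self.ret)
        scaled = reward / math.sqrt(self.stats.variance() + self.epsilon)
        if done:
            # Start a fresh discounted return for the next episode.
            self.ret = 0.0
        return scaled


def last_scaled_reward(gamma, steps=2000):
    """Feed a constant reward of 1.0 and return the last normalized reward."""
    normalizer = RewardNormalizer(gamma=gamma)
    scaled = 0.0
    for _ in range(steps):
        scaled = normalizer.normalize(1.0, done=False)
    return scaled


if __name__ == "__main__":
    # Same reward stream, different gamma: the normalized rewards differ in scale.
    print(last_scaled_reward(0.99), last_scaled_reward(0.999))
```

With a constant reward stream, the larger gamma produces a larger spread of discounted returns and therefore smaller normalized rewards, which is why a mismatch between the two gammas can change training results.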
Types of changes
Checklist:
pre-commit run --all-files passes (required).
If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.
--capture-video flag toggled on (required).
mkdocs serve.
width=500 and height=300).