Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added gamma to reward normalization wrappers #209

Merged
merged 5 commits into from
Jul 7, 2022

Conversation

Howuhh
Copy link
Contributor

@Howuhh Howuhh commented Jun 20, 2022

Description

Fixes incorrect gamma in reward normalization wrapper for non-default gamma's. See #203.

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have added additional documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have added the learning curves (in PNG format with width=500 and height=300).
    • I have added links to the tracked experiments.

@vercel
Copy link

vercel bot commented Jun 20, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated
cleanrl ✅ Ready (Inspect) Visit Preview Jul 6, 2022 at 7:59PM (UTC)

@vwxyzjn
Copy link
Owner

vwxyzjn commented Jun 20, 2022

So here is the tricky part - the original implementation actually uses 0.999 for gamma, but 0.99 for the normalization wrapper. See https://github.com/openai/train-procgen/blob/1a2ae2194a61f76a733a39339530401c024c3ad8/train_procgen/train.py#L43

This would cause a performance change unfortunately. There are two ways to go forward

  1. re-run the procgen benchmark experiments with gym.wrappers.NormalizeReward(envs, gamma=args.gamma).

    cleanrl/benchmark/ppo.sh

    Lines 39 to 44 in 6387191

    poetry install -E procgen
    xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids starpilot bossfight bigfish \
    --command "poetry run python cleanrl/ppo_procgen.py --track --capture-video" \
    --num-seeds 3 \
    --workers 1

    poetry install -E procgen
    xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids starpilot bossfight bigfish \
    --command "poetry run python cleanrl/ppg_procgen.py --track --capture-video" \
    --num-seeds 3 \
    --workers 1
  2. keep the procgen scripts untouched.

@Howuhh what do you think we should do?

@Howuhh
Copy link
Contributor Author

Howuhh commented Jun 20, 2022

@vwxyzjn To be honest, I think this is a bug in original code, not a feature, so it will be more accurate to rerun for correct results. However, procgen is image based env and for now I don't have resources to train on images.

@vwxyzjn
Copy link
Owner

vwxyzjn commented Jun 20, 2022

Ok, no worries. I will take care from here. @Dipamc77 I don't have the GPU memory to run the PPG experiments. Would you mind running them with this PR? I can take care of the ppo procgen experiments.

poetry install -E procgen
xvfb-run -a python -m cleanrl_utils.benchmark \
--env-ids starpilot bossfight bigfish \
--command "poetry run python cleanrl/ppg_procgen.py --track --capture-video" \
--num-seeds 3 \
--workers 1

@vwxyzjn
Copy link
Owner

vwxyzjn commented Jun 21, 2022

Running the PPO experiments now. Also tried a fun thing by adding a wandb tag like

WANDB_TAGS=$(git describe --tags)  xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids starpilot bossfight bigfish \
    --command "poetry run python cleanrl/ppo_procgen.py --track --capture-video" \
    --num-seeds 3 \
    --workers 1

which produces runs like

image

@dosssman I think this tagging system could somehow help us phase out past openrlbenchmark experiments without deleting them. I will have to think about the workflow a bit more.

@vwxyzjn
Copy link
Owner

vwxyzjn commented Jun 30, 2022

Following up here

image

image

image

@vwxyzjn
Copy link
Owner

vwxyzjn commented Jun 30, 2022

The bigfish performance degradation could easily be due to a random seed.

@vwxyzjn
Copy link
Owner

vwxyzjn commented Jul 6, 2022

image

image

image

No major performance regression. Going to document this change and merge.

@vwxyzjn
Copy link
Owner

vwxyzjn commented Jul 6, 2022

I have just updated all of the experiments and documentation. @Howuhh @dosssman could you give it a pass, please? Thank you!

@Howuhh
Copy link
Contributor Author

Howuhh commented Jul 7, 2022

@vwxyzjn Seems okay to me. Thanks for redoing the experiments btw.

@vwxyzjn vwxyzjn merged commit cd2011c into vwxyzjn:master Jul 7, 2022
@vwxyzjn vwxyzjn mentioned this pull request Oct 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants