
fix: ddpg action bias #299

Merged: 4 commits merged on Nov 3, 2022


sdpkjc (Collaborator) commented Oct 22, 2022

Description

Fixes the first part of #297
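The thread does not show the code change itself. As a hypothetical illustration only, assuming "action bias" refers to the offset that maps a tanh-squashed actor output onto a `Box` action range `[low, high]`, the NumPy sketch below shows that affine mapping plus zero-mean exploration noise. The bounds and the names `action_scale`, `action_bias`, `actor_output_to_action`, and `explore` are illustrative assumptions, not code quoted from this PR.

```python
import numpy as np

# Hypothetical Box action space with an asymmetric range (not taken from the PR).
low = np.array([0.0, -1.0])
high = np.array([2.0, 3.0])

# Affine map from tanh's output range [-1, 1] onto [low, high].
action_scale = (high - low) / 2.0  # [1.0, 2.0]
action_bias = (high + low) / 2.0   # [1.0, 1.0]

def actor_output_to_action(pre_tanh):
    """Squash a raw actor output with tanh, then rescale it into [low, high]."""
    return np.tanh(pre_tanh) * action_scale + action_bias

def explore(action, rng, exploration_noise=0.1):
    """Add zero-mean Gaussian noise scaled per dimension, then clip to the bounds.

    The action already contains action_bias; drawing the noise with mean=action_bias
    instead of 0 would shift every exploratory action by the bias a second time.
    """
    noise = rng.normal(0.0, action_scale * exploration_noise)
    return np.clip(action + noise, low, high)

rng = np.random.default_rng(0)
midpoint = actor_output_to_action(np.zeros(2))  # tanh(0) = 0, so this is the range midpoint
print(midpoint)  # [1. 1.]
print(explore(midpoint, rng))
```

Keeping the noise centered at zero means the only offset applied to an action is the single `action_bias` term inside the actor's output mapping.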

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the documentation and previewed the changes via mkdocs serve.
  • I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

  • I have contacted vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have added additional documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm.
    • I have created a table comparing my results against those from reputable sources (e.g., the original paper or other reference implementations).
    • I have added the learning curves (in PNG format with width=500 and height=300).
    • I have added links to the tracked experiments.
    • I have updated the overview sections in the docs and the repo.
  • I have updated the tests accordingly (if applicable).

vwxyzjn (Owner) commented Nov 1, 2022

Thanks for the PR. Running some benchmark experiments now.

vwxyzjn (Owner) commented Nov 1, 2022

Using the following snippet from #307

python rlops.py --exp-name ddpg_continuous_action \
    --wandb-project-name cleanrl \
    --wandb-entity openrlbenchmark \
    --tags  pr-299 rlops-pilot \
    --env-ids Hopper-v2 Walker2d-v2 HalfCheetah-v2 \
    --output-filename compare.png \
    --report

This generates the following figures:

[Figures: benchmark comparison plots]

Discussion

  • The matplotlib figures are drawn from subsampled wandb run data, which sometimes seems to produce slightly inaccurate curves.
  • This PR improves performance on HalfCheetah-v2.
  • Runs are slightly faster, probably because I am now using --worker 1 instead of --worker 3.
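To illustrate the first point, here is a small NumPy sketch on synthetic data (no real runs): keeping only every 10th logged point can hide a short-lived dip that the full history shows. The fixed stride is an assumed stand-in for whatever sampling the W&B API applies. W&B's public API `Run.history()` returns a sampled subset of rows by default, while `Run.scan_history()` iterates over all of them, which is presumably why the later rlops command in this thread adds `--scan-history`.

```python
import numpy as np

# Synthetic learning curve (hypothetical data, not from the benchmark runs):
# an upward trend plus Gaussian noise.
rng = np.random.default_rng(0)
steps = np.arange(1000)
returns = steps * 10.0 + rng.normal(0.0, 200.0, size=steps.size)
returns[503] -= 10000.0  # a brief collapse that only the full history records

# Keep every 10th point, a simple stand-in for a subsampled history.
sub_steps = steps[::10]
sub_returns = returns[::10]

print("lowest return, full history:", round(float(returns.min()), 1))
print("lowest return, subsampled:  ", round(float(sub_returns.min()), 1))
```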

What remains is to update the documentation and optionally run more experiments in more envs.

vwxyzjn (Owner) commented Nov 3, 2022

The experiments are done and the docs are updated. The following command from #307 generates the table and figure below:

python -m cleanrl_utils.rlops --exp-name ddpg_continuous_action \
    --wandb-project-name cleanrl \
    --wandb-entity openrlbenchmark \
    --tags 'pr-299' 'rlops-pilot' \
    --env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 InvertedPendulum-v2 Humanoid-v2 Pusher-v2 \
    --output-filename compare.png \
    --scan-history \
    --metric-last-n-average-window 100 \
    --report
                    CleanRL's ddpg_continuous_action (pr-299) CleanRL's ddpg_continuous_action (rlops-pilot)
HalfCheetah-v2                              10210.57 ± 196.22                              9205.65 ± 1093.88
Walker2d-v2                                  1661.14 ± 250.01                               1447.09 ± 260.24
Hopper-v2                                    1007.44 ± 148.29                               1126.37 ± 278.02
InvertedPendulum-v2                            684.61 ± 94.41                                 544.77 ± 50.98
Humanoid-v2                                    910.61 ± 97.58                                 849.05 ± 40.64
Pusher-v2                                       -39.39 ± 9.54                                  -32.52 ± 2.03
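Each table cell reads as "mean ± std". As a rough sketch, assuming `--metric-last-n-average-window 100` averages the last 100 logged episodic returns of each run and that the ± term is the standard deviation of those per-run averages across seeds (neither detail is confirmed in this thread, and using the population standard deviation with `ddof=0` is a further assumption), a cell could be computed as follows. The function names and the toy runs are hypothetical.

```python
import numpy as np

def last_n_average(episodic_returns, n=100):
    """Mean of a run's final n logged episodic returns (all of them if fewer than n)."""
    tail = np.asarray(episodic_returns, dtype=float)[-n:]
    return float(tail.mean())

def summarize(runs, n=100):
    """Format 'mean ± std' over runs (e.g., seeds) of each run's last-n average."""
    per_run = np.array([last_n_average(r, n) for r in runs])
    # np.std defaults to ddof=0 (population standard deviation); an assumption here.
    return f"{per_run.mean():.2f} ± {per_run.std():.2f}"

# Three toy runs: 50 early episodes at 0, then 100 episodes at a constant level.
toy_runs = [[0.0] * 50 + [level] * 100 for level in (10.0, 20.0, 30.0)]
print(summarize(toy_runs))  # 20.00 ± 8.16
```

Passing `ddof=1` to `std` would report the sample standard deviation instead, which gives a slightly larger ± term for the same seeds.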

[Figure: learning curves (compare.png)]

vwxyzjn (Owner) commented Nov 3, 2022

Thanks @sdpkjc for this PR and raising the issue.

vwxyzjn merged commit 023eaea into vwxyzjn:master on Nov 3, 2022.