
Add ddpg_continuous_action.py docs #137

Merged (10 commits, Mar 21, 2022)
Conversation

@vwxyzjn (Owner) commented Mar 13, 2022

This PR adds docs for ddpg_continuous_action.py.

Checklist for ddpg_continuous_action.py:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured `pre-commit run --all-files` passes (required).
  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with the `--capture-video` flag toggled on (required).
  • I have updated the documentation and previewed the changes via `mkdocs serve`.
    • I have explained noteworthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm.
    • I have created a table comparing my results against those from reputable sources (e.g., the original paper or other reference implementations).
    • I have added the learning curves (in PNG format with width=500 and height=300).
    • I have added links to the tracked experiments.
  • I have updated the tests accordingly (if applicable).

@vercel bot commented Mar 13, 2022

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/vwxyzjn/cleanrl/CCjtcjP4CWC3iPu2nuVKoyG9EuMD
✅ Preview: https://cleanrl-git-ddpg-docs-vwxyzjn.vercel.app


@vwxyzjn (Owner, Author) commented Mar 16, 2022

Hey @yooceii @dosssman, the PR is ready for review. Could you take a look at https://cleanrl-git-ddpg-docs-vwxyzjn.vercel.app/rl-algorithms/ddpg/? Thank you

* `losses/qf1_loss`: the MSE between the Q values at timestep $t$ and the target Q values bootstrapped from timestep $t+1$; minimizing it reduces the temporal difference error.
* `losses/actor_loss`: implemented as `-qf1(data.observations, actor(data.observations)).mean()`; it is the *negative* average of the Q values calculated from 1) the observations and 2) the actions the actor computes for those observations. By minimizing `actor_loss`, the optimizer updates the actor's parameters using the following gradient (Lillicrap et al., 2016, Equation 6)[^1]:

$$ \mathbb{E}_{s_{t} \sim \rho^{\beta}}\left[\left.\left.\nabla_{a} Q\left(s, a \mid \theta^{Q}\right)\right|_{s=s_{t}, a=\mu\left(s_{t}\right)} \nabla_{\theta^{\mu}} \mu\left(s \mid \theta^{\mu}\right)\right|_{s=s_{t}}\right]$$
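For concreteness, here is a minimal, self-contained sketch of how these two logged losses can be computed. The dummy networks, tensor shapes, and the `q_values` helper are illustrative stand-ins rather than the script's actual modules; only the `actor_loss` expression is taken from the script itself.

```python
import torch
import torch.nn.functional as F

# Illustrative stand-ins for the script's pieces (names mirror
# ddpg_continuous_action.py; shapes and networks are dummies).
obs_dim, act_dim, batch_size, gamma = 3, 1, 32, 0.99
actor = torch.nn.Sequential(torch.nn.Linear(obs_dim, act_dim), torch.nn.Tanh())
qf1 = torch.nn.Linear(obs_dim + act_dim, 1)
# The real script keeps separate, slowly-updated target copies of both networks.
target_actor, qf1_target = actor, qf1

observations = torch.randn(batch_size, obs_dim)
actions = torch.randn(batch_size, act_dim)
rewards = torch.randn(batch_size)
dones = torch.zeros(batch_size)
next_observations = torch.randn(batch_size, obs_dim)

def q_values(net, obs, act):
    # The Q network takes the concatenated (observation, action) pair.
    return net(torch.cat([obs, act], dim=1)).view(-1)

# losses/qf1_loss: MSE between Q(s_t, a_t) and the bootstrapped TD target.
with torch.no_grad():
    next_q = q_values(qf1_target, next_observations, target_actor(next_observations))
    td_target = rewards + (1 - dones) * gamma * next_q
qf1_loss = F.mse_loss(q_values(qf1, observations, actions), td_target)

# losses/actor_loss: negative mean of Q evaluated at the actor's own actions.
actor_loss = -q_values(qf1, observations, actor(observations)).mean()
print(qf1_loss.item(), actor_loss.item())
```

Each loss is then backpropagated through its own optimizer, so the critic and the actor are updated separately.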
Collaborator commented:

What's $$\rho^{\beta}$$? Are you referring to the buffer $$R$$?

Owner (Author) commented:

$$\rho^{\beta}$$ is defined as follows:

[image: the definition of $$\rho^{\beta}$$ from Lillicrap et al. (2016), the discounted state visitation distribution under the behavior policy $$\beta$$]

I didn't add much explanation for this because I referenced equation 6 from the original paper. If you think this could confuse readers, I can remove it and just cite equation 6 without copying it into our docs.

Collaborator commented:

But equation 6 is DPG's update method. I think DDPG uses

$$ \frac{1}{N}\sum_{i}\left.\left.\nabla_{a} Q\left(s, a \mid \theta^{Q}\right)\right|_{s=s_{i}, a=\mu\left(s_{i}\right)} \nabla_{\theta^{\mu}} \mu\left(s \mid \theta^{\mu}\right)\right|_{s_{i}} $$

as shown in

[image: the actor update line from Algorithm 1 of Lillicrap et al. (2016)]
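To connect the two forms (writing $$f(s)$$ as shorthand for the bracketed gradient term): the minibatch expression is the sample (Monte Carlo) estimate of the expectation in equation 6, with minibatch draws from the replay buffer $$R$$ standing in for samples from $$\rho^{\beta}$$:

$$ \mathbb{E}_{s \sim \rho^{\beta}}\left[f(s)\right] \approx \frac{1}{N} \sum_{i=1}^{N} f(s_{i}), \qquad s_{i} \sim R $$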

Owner (Author) commented:

You're absolutely right. Thanks for the catch; I have just fixed it.

Resolved review threads:

  • docs/rl-algorithms/ddpg.md (4 threads)
  • cleanrl/ddpg_continuous_action.py (1 thread)
@vwxyzjn (Owner, Author) commented Mar 20, 2022

Hey @yooceii, thanks for reviewing the PR :) Let me know if there are any other issues.

@vwxyzjn (Owner, Author) commented Mar 21, 2022

Given that all of @yooceii's comments are addressed, I am merging the PR as is, but I'm happy to open follow-up PRs if anything else is needed.
