
Add ddpg_continuous_action.py docs #137

Merged (10 commits, Mar 21, 2022)
Conversation

@vwxyzjn (Owner) commented Mar 13, 2022

This PR adds docs for ddpg_continuous_action.py.

Checklist for ddpg_continuous_action.py:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured `pre-commit run --all-files` passes (required).
  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with the `--capture-video` flag toggled on (required).
  • I have updated the documentation and previewed the changes via `mkdocs serve`.
    • I have explained noteworthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm.
    • I have created a table comparing my results against those from reputable sources (e.g., the original paper or other reference implementations).
    • I have added the learning curves (in PNG format with width=500 and height=300).
    • I have added links to the tracked experiments.
  • I have updated the tests accordingly (if applicable).

@vercel bot commented Mar 13, 2022

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/vwxyzjn/cleanrl/CCjtcjP4CWC3iPu2nuVKoyG9EuMD
✅ Preview: https://cleanrl-git-ddpg-docs-vwxyzjn.vercel.app


@vwxyzjn (Owner, Author) commented Mar 16, 2022

Hey @yooceii @dosssman, the PR is ready for review. Could you take a look at https://cleanrl-git-ddpg-docs-vwxyzjn.vercel.app/rl-algorithms/ddpg/? Thank you

* `losses/qf1_loss`: the MSE between the Q values at timestep $t$ and the target Q values bootstrapped from timestep $t+1$; minimizing it reduces the temporal difference error.
* `losses/actor_loss`: implemented as `-qf1(data.observations, actor(data.observations)).mean()`; it is the *negative* average of the Q values calculated from 1) the observations and 2) the actions the actor computes for those observations. By minimizing `actor_loss`, the optimizer updates the actor's parameters using the following gradient (Lillicrap et al., 2016, Equation 6)[^1]:

$$ \mathbb{E}_{s_{t} \sim \rho^{\beta}}\left[\left.\left.\nabla_{a} Q\left(s, a \mid \theta^{Q}\right)\right|_{s=s_{t}, a=\mu\left(s_{t}\right)} \nabla_{\theta^{\mu}} \mu\left(s \mid \theta^{\mu}\right)\right|_{s=s_{t}}\right]$$
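For concreteness, here is a minimal, self-contained sketch of how these two logged losses can be computed. The dummy networks, tensor shapes, and the `q_values` helper are illustrative stand-ins rather than the script's actual modules; only the `actor_loss` expression is taken from the script itself.

```python
import torch
import torch.nn.functional as F

# Illustrative stand-ins for the script's pieces (names mirror
# ddpg_continuous_action.py; shapes and networks are dummies).
obs_dim, act_dim, batch_size, gamma = 3, 1, 32, 0.99
actor = torch.nn.Sequential(torch.nn.Linear(obs_dim, act_dim), torch.nn.Tanh())
qf1 = torch.nn.Linear(obs_dim + act_dim, 1)
# The real script keeps separate, slowly-updated target copies of both networks.
target_actor, qf1_target = actor, qf1

observations = torch.randn(batch_size, obs_dim)
actions = torch.randn(batch_size, act_dim)
rewards = torch.randn(batch_size)
dones = torch.zeros(batch_size)
next_observations = torch.randn(batch_size, obs_dim)

def q_values(net, obs, act):
    # The Q network takes the concatenated (observation, action) pair.
    return net(torch.cat([obs, act], dim=1)).view(-1)

# losses/qf1_loss: MSE between Q(s_t, a_t) and the bootstrapped TD target.
with torch.no_grad():
    next_q = q_values(qf1_target, next_observations, target_actor(next_observations))
    td_target = rewards + (1 - dones) * gamma * next_q
qf1_loss = F.mse_loss(q_values(qf1, observations, actions), td_target)

# losses/actor_loss: negative mean of Q evaluated at the actor's own actions.
actor_loss = -q_values(qf1, observations, actor(observations)).mean()
print(qf1_loss.item(), actor_loss.item())
```

Each loss is then backpropagated through its own optimizer, so the critic and the actor are updated separately.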
Collaborator commented:

What's $$\rho^{\beta}$$? Are you referring to the buffer $$R$$?

Owner (Author) commented:

$$\rho^{\beta}$$ is defined as follows:

[image: the definition of $$\rho^{\beta}$$ from Lillicrap et al. (2016), the discounted state visitation distribution under the behavior policy $$\beta$$]

I didn't add much explanation for this because I referenced equation 6 from the original paper. If you think this could confuse readers, I can remove it and just cite equation 6 without copying it into our docs.

Collaborator commented:

But equation 6 is DPG's update method. I think DDPG uses

$$ \frac{1}{N}\sum_{i}\left.\left.\nabla_{a} Q\left(s, a \mid \theta^{Q}\right)\right|_{s=s_{i}, a=\mu\left(s_{i}\right)} \nabla_{\theta^{\mu}} \mu\left(s \mid \theta^{\mu}\right)\right|_{s_{i}} $$

as shown in

[image: the actor update line from Algorithm 1 of Lillicrap et al. (2016)]
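To connect the two forms (writing $$f(s)$$ as shorthand for the bracketed gradient term): the minibatch expression is the sample (Monte Carlo) estimate of the expectation in equation 6, with minibatch draws from the replay buffer $$R$$ standing in for samples from $$\rho^{\beta}$$:

$$ \mathbb{E}_{s \sim \rho^{\beta}}\left[f(s)\right] \approx \frac{1}{N} \sum_{i=1}^{N} f(s_{i}), \qquad s_{i} \sim R $$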

Owner (Author) commented:

You're absolutely right. Thanks for the catch; I have just fixed it.

Resolved review threads:

  • docs/rl-algorithms/ddpg.md (4 threads)
  • cleanrl/ddpg_continuous_action.py (1 thread)
@vwxyzjn (Owner, Author) commented Mar 20, 2022

Hey @yooceii, thanks for reviewing the PR :) Let me know if there are any other issues.

@vwxyzjn (Owner, Author) commented Mar 21, 2022

Given that all of @yooceii's comments are addressed, I am merging the PR as is, but I'm happy to open follow-up PRs if anything else is needed.
