
Added support for clipping to DQNPolicy. #642

Merged · 3 commits into thu-ml:master on May 18, 2022

Conversation

michalgregor
Contributor

Added support for clipping to DQNPolicy.
* When clip_delta=True is passed, smooth_l1_loss is used
instead of the MSE loss.

  • I have marked all applicable categories:
    • exception-raising fix
    • algorithm implementation fix
    • documentation modification
    • new feature
  • I have reformatted the code using make format (required)
  • I have checked the code using make commit-checks (required)
  • If applicable, I have mentioned the relevant/related issue(s)
  • If applicable, I have listed every item in this Pull Request below

    * When clip_delta=True is passed, smooth_l1_loss is used
    instead of the MSE loss.
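
In code terms, the change amounts to switching the regression loss in the learn step. A minimal sketch of the idea, assuming a simplified DQN update (the `clip_delta` flag and the choice of loss come from the description above; the helper itself is illustrative, not tianshou's actual code):

```python
import torch
import torch.nn.functional as F

def dqn_loss(q: torch.Tensor, returns: torch.Tensor,
             clip_delta: bool = False) -> torch.Tensor:
    # Smooth L1 is quadratic for small errors and linear for large ones,
    # which bounds the gradient magnitude (the "clipping" effect).
    if clip_delta:
        return F.smooth_l1_loss(q, returns)
    # Default behaviour: plain mean squared error.
    return F.mse_loss(q, returns)
```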
@Trinkle23897
Collaborator

Is it possible to add this feature to the whole DQN family?

@michalgregor
Contributor Author

Is it possible to add this feature to the whole DQN family?

Yes, I can go through them and make the changes. Do you have any idea which ones I will need to look into? E.g. there is a component in SAC that is trained DQN-style – should I also look there, for example?

Also, I am currently using smooth L1 (I saw it being used in another implementation), but PyTorch also has Huber loss...

Finally, the implementation is a bit redundant – the loss expects the output and the target – but since we also need td_error for the prio buffer, the way I implemented this, we do subtraction twice. But I am not sure how best to avoid that...
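
For reference: since both losses depend only on the residual, one way to avoid the double subtraction is to compute the TD error once and evaluate the loss against a zero target. A sketch under that assumption, not what this PR does:

```python
import torch
import torch.nn.functional as F

def weighted_td_loss(q, returns, weights, clip_delta=False):
    # The TD error is computed once and reused both for the loss and
    # for updating priorities in the prioritized replay buffer.
    td_error = q - returns
    if clip_delta:
        # smooth_l1_loss(q, returns) == smooth_l1_loss(td_error, 0)
        per_sample = F.smooth_l1_loss(
            td_error, torch.zeros_like(td_error), reduction="none")
    else:
        per_sample = td_error.pow(2)
    return (per_sample * weights).mean(), td_error.detach()
```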

@Trinkle23897
Collaborator

Trinkle23897 commented May 17, 2022

Do you have any idea which ones I will need to look into? E.g. there is a component in SAC that is trained DQN-style – should I also look there, for example?

After a second check of the current codebase, I think it's fine for this PR to modify only the DQN loss. Other methods, e.g., IQN / FQF / QRDQN, use a distribution to calculate the loss, so it's non-trivial to apply this smooth_l1_loss directly. (BTW, QRDQN already uses smooth_l1_loss to some extent; see the sketch below.)
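
For context, QRDQN's quantile regression loss wraps an element-wise Huber term in asymmetric quantile weights, which is why a plain loss swap doesn't carry over. A sketch following the QR-DQN formulation of Dabney et al. (illustrative, not tianshou's actual implementation):

```python
import torch
import torch.nn.functional as F

def quantile_huber_loss(pred, target, taus, kappa=1.0):
    # pred/target: (batch, n_quantiles) quantile values;
    # taus: (n_quantiles,) quantile fractions for `pred`.
    # Pairwise TD errors, shape (batch, n_target, n_pred).
    u = target.unsqueeze(2) - pred.unsqueeze(1)
    huber = F.huber_loss(u, torch.zeros_like(u),
                         reduction="none", delta=kappa)
    # Asymmetric weight |tau - 1{u < 0}| from quantile regression.
    weight = (taus.view(1, 1, -1) - (u < 0).float()).abs()
    # Sum over predicted quantiles, average over the rest.
    return (weight * huber / kappa).sum(-1).mean()
```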

Also, I am currently using smooth L1 (I saw it being used in another implementation), but PyTorch also has Huber loss...

Could you please elaborate more on it?

Finally, the implementation is a bit redundant – the loss expects the output and the target – but since we also need td_error for the prio buffer, the way I implemented this, we do subtraction twice. But I am not sure how best to avoid that...

That's fine.

@michalgregor
Contributor Author

michalgregor commented May 17, 2022

Also, I am currently using smooth L1 (I saw it being used in another implementation), but PyTorch also has Huber loss...

Could you please elaborate more on it?

I have seen DQN code using the smooth L1 loss, but as I recall, the DQN paper uses the Huber loss – so I figured someone here might know why one would choose one over the other. Looking into it properly just now, I see that with the default parameters (beta=1 for smooth L1 and delta=1 for Huber), they are exactly the same, so there is no difference. Sorry for the confusion.
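
This is easy to verify numerically; both functions live in torch.nn.functional in recent PyTorch versions:

```python
import torch
import torch.nn.functional as F

x, y = torch.randn(1000), torch.randn(1000)

# Identical at the defaults beta = delta = 1.
assert torch.allclose(F.smooth_l1_loss(x, y, beta=1.0),
                      F.huber_loss(x, y, delta=1.0))

# More generally, huber(delta=d) == d * smooth_l1(beta=d).
d = 2.0
assert torch.allclose(d * F.smooth_l1_loss(x, y, beta=d),
                      F.huber_loss(x, y, delta=d))
```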

@codecov-commenter
Copy link

codecov-commenter commented May 17, 2022

Codecov Report

❗ No coverage uploaded for pull request base (master@c87b9f4).
The diff coverage is 50.00%.

@@            Coverage Diff            @@
##             master     #642   +/-   ##
=========================================
  Coverage          ?   93.56%           
=========================================
  Files             ?       71           
  Lines             ?     4755           
  Branches          ?        0           
=========================================
  Hits              ?     4449           
  Misses            ?      306           
  Partials          ?        0           
Flag      | Coverage Δ
unittests | 93.56% <50.00%> (?)

Flags with carried forward coverage won't be shown.

Impacted Files                   | Coverage Δ
tianshou/policy/modelfree/dqn.py | 94.38% <50.00%> (ø)


* Made the argument's name more descriptive;
* Replaced the smooth L1 loss with the Huber loss, which has an identical form under the default parametrization but seems to be better known in this context;
* Added a fuller description to the docstring;
@Trinkle23897 Trinkle23897 merged commit 277138c into thu-ml:master May 18, 2022
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
* When clip_loss_grad=True is passed, Huber loss is used instead of the MSE loss.
* Made the argument's name more descriptive;
* Replaced the smooth L1 loss with the Huber loss, which has an identical form to the default parametrization, but seems to be better known in this context;
* Added a fuller description to the docstring;
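
For anyone landing here later, enabling the option looks roughly like this (a sketch; constructor arguments other than clip_loss_grad are the usual DQNPolicy ones and may differ between tianshou versions):

```python
import torch
from tianshou.policy import DQNPolicy
from tianshou.utils.net.common import Net

# Shapes are illustrative; take them from your environment's spaces.
net = Net(state_shape=4, action_shape=2, hidden_sizes=[128, 128])
optim = torch.optim.Adam(net.parameters(), lr=1e-3)

policy = DQNPolicy(
    net,
    optim,
    discount_factor=0.99,
    target_update_freq=320,
    clip_loss_grad=True,  # Huber loss instead of MSE
)
```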