Added support for clipping to DQNPolicy. #642
Conversation
* When clip_delta=True is passed, smooth_l1_loss is used instead of the MSE loss.
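The "clipping" in the PR title refers to the gradient: smooth L1 (equivalently, Huber) loss has a gradient bounded to [-1, 1], whereas the MSE gradient grows linearly with the error. A small PyTorch illustration of the difference (not part of the PR's code):

```python
import torch
import torch.nn.functional as F

# A single Q-value that is far from its target.
q = torch.tensor([10.0], requires_grad=True)
target = torch.tensor([0.0])

# Smooth L1: in the linear region the gradient magnitude is capped at 1.
F.smooth_l1_loss(q, target).backward()
print(q.grad)  # tensor([1.]) -- "clipped"

# MSE: the gradient is 2 * error, unbounded.
q.grad = None
F.mse_loss(q, target).backward()
print(q.grad)  # tensor([20.])
```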
Is it possible to add this feature for the whole DQN family?
Yes, I can go through them and make the changes. Do you have any idea which ones I will need to look into? E.g., there is a component in SAC that is trained DQN-style – should I also look there? Also, I am currently using smooth L1 (I saw it being used in another implementation), but PyTorch also has a Huber loss. Finally, the implementation is a bit redundant: the loss function expects the output and the target, but since we also need td_error for the prioritized replay buffer, the way I implemented this we do the subtraction twice. I am not sure how best to avoid that, though.
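One hypothetical way to avoid the double subtraction mentioned above – a sketch only, not the PR's actual code – is to compute the TD error once and derive both the scalar loss and the priority signal from it, since both Huber and MSE losses depend only on the residual:

```python
import torch

def dqn_loss(q: torch.Tensor, target: torch.Tensor, clip_loss_grad: bool):
    """Sketch: compute target - q once, reuse it for loss and prio weights."""
    td_error = target - q  # single subtraction
    if clip_loss_grad:
        # Elementwise Huber with delta=1, written directly on the residual;
        # equivalent to F.huber_loss(q, target) because the loss is a
        # function of (target - q) alone.
        abs_td = td_error.abs()
        loss = torch.where(abs_td <= 1.0,
                           0.5 * td_error.pow(2),
                           abs_td - 0.5).mean()
    else:
        loss = td_error.pow(2).mean()  # MSE
    # Detached td_error can feed the prioritized replay buffer.
    return loss, td_error.detach()
```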
After a second check of the current codebase, I think it's fine for this PR to only modify the DQN loss. Other methods, e.g., IQN / FQF / QRDQN, use a distribution to calculate the loss, and it's non-trivial to apply this smooth_l1_loss there directly. (BTW, QRDQN already uses smooth_l1_loss to some extent.)
Could you please elaborate more on it?
That's fine.
I have seen DQN code using the smooth L1 loss, but as I recall, the DQN paper uses the Huber loss – so I figured someone here might know why one would choose one over the other. Looking into it properly just now, I see that with the default parameters (beta=1 for smooth L1 and delta=1 for Huber), they are exactly the same, so there is no difference. Sorry for the confusion.
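The equivalence claimed above is easy to verify numerically (assuming a PyTorch version with F.huber_loss, i.e., 1.9+):

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-3.0, 3.0, steps=101)
zeros = torch.zeros_like(x)

# Elementwise losses with the default parameters beta=1 and delta=1.
s = F.smooth_l1_loss(x, zeros, beta=1.0, reduction="none")
h = F.huber_loss(x, zeros, delta=1.0, reduction="none")
assert torch.allclose(s, h)

# Note they diverge for other settings: Huber scales the linear region by
# delta, while smooth L1 divides the quadratic region by beta.
```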
Codecov Report
@@           Coverage Diff            @@
##             master     #642   +/-   ##
=========================================
  Coverage          ?   93.56%
=========================================
  Files             ?       71
  Lines             ?     4755
  Branches          ?        0
=========================================
  Hits              ?     4449
  Misses            ?      306
  Partials          ?        0
=========================================
* Made the argument's name more descriptive;
* Replaced the smooth L1 loss with the Huber loss, which has an identical form under the default parametrization but seems to be better known in this context;
* Added a fuller description to the docstring.
* When clip_loss_grad=True is passed, the Huber loss is used instead of the MSE loss;
* Made the argument's name more descriptive;
* Replaced the smooth L1 loss with the Huber loss, which has an identical form under the default parametrization but seems to be better known in this context;
* Added a fuller description to the docstring.
Added support for clipping to DQNPolicy.
* When clip_delta=True is passed, smooth_l1_loss is used instead of the MSE loss.
make format (required)
make commit-checks (required)