
Added support for clipping to DQNPolicy. #642

Merged · 3 commits into thu-ml:master on May 18, 2022

Conversation

michalgregor
Contributor

Added support for clipping to DQNPolicy.
* When clip_delta=True is passed, smooth_l1_loss is used
instead of the MSE loss.

  • I have marked all applicable categories:
    • exception-raising fix
    • algorithm implementation fix
    • documentation modification
    • new feature
  • I have reformatted the code using make format (required)
  • I have checked the code using make commit-checks (required)
  • If applicable, I have mentioned the relevant/related issue(s)
  • If applicable, I have listed every item in this Pull Request below

    * When clip_delta=True is passed, smooth_l1_loss is used
    instead of the MSE loss.
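
In code terms, the change amounts to switching the regression loss in the learn step. A minimal sketch of the idea, assuming a simplified DQN update (the `clip_delta` flag and the choice of loss come from the description above; the helper itself is illustrative, not tianshou's actual code):

```python
import torch
import torch.nn.functional as F

def dqn_loss(q: torch.Tensor, returns: torch.Tensor,
             clip_delta: bool = False) -> torch.Tensor:
    # Smooth L1 is quadratic for small errors and linear for large ones,
    # which bounds the gradient magnitude (the "clipping" effect).
    if clip_delta:
        return F.smooth_l1_loss(q, returns)
    # Default behaviour: plain mean squared error.
    return F.mse_loss(q, returns)
```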
@Trinkle23897
Collaborator

Is it possible to add this feature to the whole DQN family?

@michalgregor
Contributor Author

Is it possible to add this feature to the whole DQN family?

Yes, I can go through them and make the changes. Do you have any idea which ones I will need to look into? E.g. there is a component in SAC that is trained DQN-style – should I also look there, for example?

Also, I am currently using smooth L1 (I saw it being used in another implementation), but PyTorch also has Huber loss...

Finally, the implementation is a bit redundant – the loss expects the output and the target – but since we also need td_error for the prio buffer, the way I implemented this, we do subtraction twice. But I am not sure how best to avoid that...
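
For reference: since both losses depend only on the residual, one way to avoid the double subtraction is to compute the TD error once and evaluate the loss against a zero target. A sketch under that assumption, not what this PR does:

```python
import torch
import torch.nn.functional as F

def weighted_td_loss(q, returns, weights, clip_delta=False):
    # The TD error is computed once and reused both for the loss and
    # for updating priorities in the prioritized replay buffer.
    td_error = q - returns
    if clip_delta:
        # smooth_l1_loss(q, returns) == smooth_l1_loss(td_error, 0)
        per_sample = F.smooth_l1_loss(
            td_error, torch.zeros_like(td_error), reduction="none")
    else:
        per_sample = td_error.pow(2)
    return (per_sample * weights).mean(), td_error.detach()
```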

@Trinkle23897
Collaborator

Trinkle23897 commented May 17, 2022

Do you have any idea which ones I will need to look into? E.g. there is a component in SAC that is trained DQN-style – should I also look there, for example?

After a second check of the current codebase, I think it's fine for this PR to modify only the DQN loss. Other methods, e.g., IQN / FQF / QRDQN, use a distribution to calculate the loss, so it's non-trivial to apply this smooth_l1_loss directly. (BTW, QRDQN already uses smooth_l1_loss to some extent; see the sketch below.)
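
For context, QRDQN's quantile regression loss wraps an element-wise Huber term in asymmetric quantile weights, which is why a plain loss swap doesn't carry over. A sketch following the QR-DQN formulation of Dabney et al. (illustrative, not tianshou's actual implementation):

```python
import torch
import torch.nn.functional as F

def quantile_huber_loss(pred, target, taus, kappa=1.0):
    # pred/target: (batch, n_quantiles) quantile values;
    # taus: (n_quantiles,) quantile fractions for `pred`.
    # Pairwise TD errors, shape (batch, n_target, n_pred).
    u = target.unsqueeze(2) - pred.unsqueeze(1)
    huber = F.huber_loss(u, torch.zeros_like(u),
                         reduction="none", delta=kappa)
    # Asymmetric weight |tau - 1{u < 0}| from quantile regression.
    weight = (taus.view(1, 1, -1) - (u < 0).float()).abs()
    # Sum over predicted quantiles, average over the rest.
    return (weight * huber / kappa).sum(-1).mean()
```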

Also, I am currently using smooth L1 (I saw it being used in another implementation), but PyTorch also has Huber loss...

Could you please elaborate more on it?

Finally, the implementation is a bit redundant – the loss expects the output and the target – but since we also need td_error for the prio buffer, the way I implemented this, we do subtraction twice. But I am not sure how best to avoid that...

That's fine.

@michalgregor
Contributor Author

michalgregor commented May 17, 2022

Also, I am currently using smooth L1 (I saw it being used in another implementation), but PyTorch also has Huber loss...

Could you please elaborate more on it?

I have seen DQN code using the smooth L1 loss, but as I recall, the DQN paper uses the Huber loss – so I figured someone here might know why one would choose one over the other. Looking into it properly just now, I see that with the default parameters (beta=1 for smooth L1 and delta=1 for Huber), they are exactly the same, so there is no difference. Sorry for the confusion.
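
This is easy to verify numerically; both functions live in torch.nn.functional in recent PyTorch versions:

```python
import torch
import torch.nn.functional as F

x, y = torch.randn(1000), torch.randn(1000)

# Identical at the defaults beta = delta = 1.
assert torch.allclose(F.smooth_l1_loss(x, y, beta=1.0),
                      F.huber_loss(x, y, delta=1.0))

# More generally, huber(delta=d) == d * smooth_l1(beta=d).
d = 2.0
assert torch.allclose(d * F.smooth_l1_loss(x, y, beta=d),
                      F.huber_loss(x, y, delta=d))
```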

@codecov-commenter
Copy link

codecov-commenter commented May 17, 2022

Codecov Report

❗ No coverage uploaded for pull request base (master@c87b9f4).
The diff coverage is 50.00%.

@@            Coverage Diff            @@
##             master     #642   +/-   ##
=========================================
  Coverage          ?   93.56%           
=========================================
  Files             ?       71           
  Lines             ?     4755           
  Branches          ?        0           
=========================================
  Hits              ?     4449           
  Misses            ?      306           
  Partials          ?        0           
Flag      | Coverage Δ
unittests | 93.56% <50.00%> (?)

Flags with carried forward coverage won't be shown.

Impacted Files                   | Coverage Δ
tianshou/policy/modelfree/dqn.py | 94.38% <50.00%> (ø)


* Made the argument's name more descriptive;
* Replaced the smooth L1 loss with the Huber loss, which has an identical form under the default parametrization but seems to be better known in this context;
* Added a fuller description to the docstring;
@Trinkle23897 Trinkle23897 merged commit 277138c into thu-ml:master May 18, 2022
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
* When clip_loss_grad=True is passed, Huber loss is used instead of the MSE loss.
* Made the argument's name more descriptive;
* Replaced the smooth L1 loss with the Huber loss, which has an identical form to the default parametrization, but seems to be better known in this context;
* Added a fuller description to the docstring;
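
For anyone landing here later, enabling the option looks roughly like this (a sketch; constructor arguments other than clip_loss_grad are the usual DQNPolicy ones and may differ between tianshou versions):

```python
import torch
from tianshou.policy import DQNPolicy
from tianshou.utils.net.common import Net

# Shapes are illustrative; take them from your environment's spaces.
net = Net(state_shape=4, action_shape=2, hidden_sizes=[128, 128])
optim = torch.optim.Adam(net.parameters(), lr=1e-3)

policy = DQNPolicy(
    net,
    optim,
    discount_factor=0.99,
    target_update_freq=320,
    clip_loss_grad=True,  # Huber loss instead of MSE
)
```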