policy gradient learn function casting episode returns to int when using discrete actions #520
In the `learn(...)` function of the `PGPolicy` class, the `to_torch_as(...)` function is used on the return values of the input batch to convert them to a tensor with the same characteristics (notably the dtype) as the actions. When using discrete actions, this causes a problem: the continuous rewards/returns are cast to integers.

tianshou/tianshou/policy/modelfree/pg.py, line 134 in c25926d

This can be fixed by checking `self.action_type` of the `PGPolicy` object and transforming the return values accordingly.

Comments

How about `ret = to_torch(minibatch.returns, result.act.device, torch.float)`

works as well :)

Thanks for pointing it out! Are you willing to submit a pull request to fix this issue? (I believe it occurs in many places.)

Yes, I will take a look at it :)

Done for `PGPolicy`. The parameter order is actually

I fixed the code style, ran tests, etc., and also checked all other occurrences of
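To illustrate the pitfall being discussed, here is a minimal sketch of the dtype-truncation behavior. This is not Tianshou's actual code; it uses a NumPy analogue of a `to_torch_as`-style helper (cast one array to another array's dtype) to show how float returns get silently truncated when the reference tensor holds discrete (integer) actions, and how requesting a float dtype explicitly avoids it:

```python
import numpy as np

def to_array_as(x, y):
    # Hypothetical analogue of a to_torch_as-style helper:
    # convert x to an array with the same dtype as y.
    return np.asarray(x, dtype=y.dtype)

discrete_actions = np.array([0, 2, 1])        # integer dtype, as with discrete actions
returns = np.array([1.7, -0.3, 2.9])          # continuous episode returns

# Casting "as the actions" truncates the returns to integers.
bad = to_array_as(returns, discrete_actions)

# Requesting a float dtype explicitly preserves the values,
# mirroring the suggested to_torch(..., torch.float) fix.
good = np.asarray(returns, dtype=np.float32)

print(bad.tolist())   # truncated values
print(good.dtype)
```

The same truncation happens with PyTorch tensors, since converting to an integer dtype discards the fractional part rather than raising an error, which is why the bug is silent.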