
policy gradient learn function casting episode returns to int when using discrete actions #520

Closed
Kenneth-Schroeder opened this issue Feb 4, 2022 · 6 comments · Fixed by #521
Labels
bug Something isn't working

Comments

@Kenneth-Schroeder
Contributor

In the learn(...) function of the PGPolicy class, to_torch_as(...) is applied to the returns of the input batch to convert them into a tensor with the same dtype and device as the actions. When the actions are discrete, this causes a problem: the continuous rewards/returns are cast to integers.

ret = to_torch_as(minibatch.returns, result.act)

This can be fixed by checking self.action_type of the PGPolicy object and transforming the return values accordingly.
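The truncation can be reproduced with plain PyTorch; the tensors below are stand-ins for `minibatch.returns` and `result.act` from the snippet above, and `type_as` mimics what a "convert to match this tensor" helper such as `to_torch_as` effectively does:

```python
import torch

# Discrete actions are stored as an integer tensor.
act = torch.tensor([0, 1, 1, 0])  # dtype: torch.int64

# Episode returns are continuous floats.
returns = torch.tensor([0.9, 1.7, -0.3, 2.5])

# Matching the actions' dtype truncates the returns toward zero.
ret_buggy = returns.type_as(act)  # tensor([0, 1, 0, 2])

# Forcing a float dtype (while still matching the device) keeps them intact.
ret_fixed = returns.to(device=act.device, dtype=torch.float)
```

This is why the fix keys off the action type (or an explicit dtype) instead of blindly matching the action tensor's characteristics.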

@Trinkle23897
Collaborator

How about

ret = to_torch(minibatch.returns, result.act.device, torch.float)

Trinkle23897 added the bug (Something isn't working) label on Feb 4, 2022
@Kenneth-Schroeder
Contributor Author

works as well :)

Kenneth-Schroeder changed the title from "BUG - policy gradient learn function casting episode returns to int when using discrete actions" to "policy gradient learn function casting episode returns to int when using discrete actions" on Feb 4, 2022
@Trinkle23897
Collaborator

Thanks for pointing it out! Would you be willing to submit a pull request to fix this issue? (I believe this pattern occurs in several places.)

@Kenneth-Schroeder
Contributor Author

Yes, I will take a look at it :)

@Kenneth-Schroeder
Contributor Author

Done for PGPolicy. The parameter order is actually ret = to_torch(minibatch.returns, torch.float, result.act.device) :)
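The dtype-before-device order can be sketched with a hypothetical stand-in for the helper (`to_torch_sketch` below is an illustration under assumed semantics, not Tianshou's actual implementation of `to_torch`):

```python
import torch

def to_torch_sketch(x, dtype=None, device="cpu"):
    # Hypothetical sketch of a to_torch-style helper: convert the input to
    # a tensor, cast to the requested dtype if given, then move to device.
    t = torch.as_tensor(x)
    if dtype is not None:
        t = t.to(dtype)
    return t.to(device)

# Positional arguments must follow the (x, dtype, device) order:
returns = [0.9, 1.7, -0.3]
ret = to_torch_sketch(returns, torch.float, "cpu")
```

Swapping the two positional arguments would pass the device string where a dtype is expected, which is why the parameter order matters here.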

@Kenneth-Schroeder
Contributor Author

Kenneth-Schroeder commented Feb 4, 2022

I fixed the code style, ran the tests, and also checked all other occurrences of to_torch_as(...); none of the others looked suspicious. The PR should be ready to merge. @Trinkle23897
