policy gradient learn function casting episode returns to int when using discrete actions #520
In the `learn(...)` function of the `PGPolicy` class, the `to_torch_as(...)` function is used on the return values of the input batch to convert them to a tensor with the same characteristics (notably the dtype) as the actions. When using discrete actions, this causes a problem: the continuous rewards/returns are cast to integers.

tianshou/tianshou/policy/modelfree/pg.py, line 134 in c25926d

This can be fixed by checking `self.action_type` of the `PGPolicy` object and transforming the return values accordingly.

Comments

How about `ret = to_torch(minibatch.returns, result.act.device, torch.float)`

works as well :)

Thanks for pointing it out! Are you willing to submit a pull request to fix this issue? (I believe it occurs in many places.)

Yes, I will take a look at it :)

Done for `PGPolicy`. The parameter order is actually

I fixed the code style, ran tests, etc., and also checked all other occurrences of
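To illustrate the pitfall being discussed, here is a minimal sketch of the dtype-truncation behavior. This is not Tianshou's actual code; it uses a NumPy analogue of a `to_torch_as`-style helper (cast one array to another array's dtype) to show how float returns get silently truncated when the reference tensor holds discrete (integer) actions, and how requesting a float dtype explicitly avoids it:

```python
import numpy as np

def to_array_as(x, y):
    # Hypothetical analogue of a to_torch_as-style helper:
    # convert x to an array with the same dtype as y.
    return np.asarray(x, dtype=y.dtype)

discrete_actions = np.array([0, 2, 1])        # integer dtype, as with discrete actions
returns = np.array([1.7, -0.3, 2.9])          # continuous episode returns

# Casting "as the actions" truncates the returns to integers.
bad = to_array_as(returns, discrete_actions)

# Requesting a float dtype explicitly preserves the values,
# mirroring the suggested to_torch(..., torch.float) fix.
good = np.asarray(returns, dtype=np.float32)

print(bad.tolist())   # truncated values
print(good.dtype)
```

The same truncation happens with PyTorch tensors, since converting to an integer dtype discards the fractional part rather than raising an error, which is why the bug is silent.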