The choice of the action decoder #11
Hi there, thank you for your congratulatory words. To answer your questions:
Theoretically, there is no difference between using a categorical distribution and an MLP + softmax. Personally, I found using torch distributions convenient since they implement uniform interfaces that work with different strategies for modeling action heads.
Sure, in the discrete case, let's say …
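For the discrete case, a minimal sketch of what the reply describes might look like the following. This is not the repository's actual code; the names `ActionHead`, `hidden_dim`, and `n_actions` are illustrative assumptions.

```python
# Hypothetical sketch: a discrete action head built on torch.distributions.
# Names (ActionHead, hidden_dim, n_actions) are illustrative, not from the repo.
import torch
import torch.nn as nn
from torch.distributions import Categorical


class ActionHead(nn.Module):
    def __init__(self, hidden_dim: int, n_actions: int):
        super().__init__()
        self.logits = nn.Linear(hidden_dim, n_actions)

    def forward(self, h: torch.Tensor) -> Categorical:
        # Categorical normalizes raw logits internally, so no explicit softmax.
        return Categorical(logits=self.logits(h))


# Training: the NLL loss is simply the negative log-probability of the
# ground-truth action under the predicted distribution.
head = ActionHead(hidden_dim=32, n_actions=4)
h = torch.randn(8, 32)              # batch of hidden states
target = torch.randint(0, 4, (8,))  # ground-truth action indices
dist = head(h)
loss = -dist.log_prob(target).mean()  # equivalent to cross-entropy on logits
loss.backward()
```

One convenience of this interface is that swapping in a different action-modeling strategy (e.g. a different distribution family) leaves the training loop unchanged, since every `Distribution` exposes the same `log_prob` method.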
Great, thanks for your reply and the informative hints, @yunfanjiang!
Thanks for the follow-up. To answer them:
Hope these are helpful.
Hi, I noticed that you used torch.distributions.Distribution after the MLP to get the final output. Could you share some insight into this choice? What is the advantage compared with directly using an MLP and softmax?
Also, for the training procedure, should we ignore that head and directly apply the NLL loss to the output of the MLP, or should we apply the NLL to the probability from that distribution? If so, could you give some simple code snippets demonstrating the training usage?
BTW, congrats on the ICML acceptance, well done!
Best,