optimal policy distribution #26

Closed
kubic71 opened this issue Oct 15, 2019 · 1 comment

Comments

kubic71 commented Oct 15, 2019

Is pi*(s) a probability distribution, or does it return just the best action? Slide 5 says that a policy pi computes a probability distribution...

Member

foxik commented Oct 17, 2019

In general, π is an arbitrary probability distribution over actions. If a state has only one optimal action, the optimal policy must choose it deterministically. If several actions are optimal, the optimal policy can be any distribution over them. I added a note to the slide saying that when multiple actions maximize q_*(s, a), the policy can stochastically choose any of them.
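To make the tie case concrete, here is a minimal sketch in Python. It is not taken from the course materials: the function names, the tolerance `tol`, and the example q-values are all assumptions made for illustration. It builds two policies for a single state, one deterministic and one uniform over the tied actions, and checks that both reach the same expected value.

```python
def optimal_policies(q_values, tol=1e-9):
    """Return two optimal policies for one state, given q_*(s, a) per action.

    q_values is a hypothetical list indexed by action; tol is an assumed
    tolerance for treating two q-values as tied.
    """
    best = max(q_values)
    # Every action whose value matches the maximum is optimal.
    optimal = [a for a, q in enumerate(q_values) if abs(q - best) <= tol]
    n = len(q_values)
    # Deterministic policy: all probability mass on one optimal action.
    deterministic = [1.0 if a == optimal[0] else 0.0 for a in range(n)]
    # Stochastic policy: uniform over all optimal actions.
    uniform = [1.0 / len(optimal) if a in optimal else 0.0 for a in range(n)]
    return deterministic, uniform


def expected_value(policy, q_values):
    """Expected q-value when actions are sampled from the policy."""
    return sum(p * q for p, q in zip(policy, q_values))


# Illustrative values only: actions 1 and 2 are tied for the maximum.
q = [1.0, 3.0, 3.0, 0.5]
deterministic, uniform = optimal_policies(q)
```

Both `deterministic` and `uniform` put zero probability on the suboptimal actions, so `expected_value` is the same for each. Any other split of probability between actions 1 and 2 would be optimal as well.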

foxik closed this as completed Oct 17, 2019