optimal policy distribution #26

kubic71 · 2019-10-15T21:16:34Z

Is π*(s) a probability distribution, or does it return just the best action? Slide 5 says the policy π computes a probability distribution...

foxik · 2019-10-17T09:43:20Z

In general, π can be any probability distribution over actions. If a state has only one optimal action, the optimal policy must choose that action deterministically. If there are multiple optimal actions, the optimal policy can be any distribution over them. I added a note to the slide saying that when multiple actions maximize q_*(s, a), the policy can stochastically choose any of them.
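
To make the point concrete, here is a minimal sketch for a single state, assuming the optimal action values q_*(s, a) are already known and stored in a plain list indexed by action. The helper names `optimal_actions`, `uniform_optimal_policy`, and `is_optimal_in_state`, and the tolerance `tol`, are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
def optimal_actions(q_values, tol=1e-9):
    """Return the indices of all actions whose value equals max_a q_*(s, a)."""
    best = max(q_values)
    return [a for a, q in enumerate(q_values) if best - q <= tol]


def uniform_optimal_policy(q_values, tol=1e-9):
    """One possible optimal policy: uniform over the maximizing actions."""
    best_actions = optimal_actions(q_values, tol)
    prob = 1.0 / len(best_actions)
    return [prob if a in best_actions else 0.0 for a in range(len(q_values))]


def is_optimal_in_state(policy, q_values, tol=1e-9):
    """Check that the policy puts probability only on maximizing actions."""
    best = max(q_values)
    return all(p <= tol or best - q <= tol for p, q in zip(policy, q_values))


# Hypothetical action values for one state: actions 1 and 2 are tied.
q = [1.0, 3.0, 3.0]
print(optimal_actions(q))                        # [1, 2]
print(uniform_optimal_policy(q))                 # [0.0, 0.5, 0.5]
print(is_optimal_in_state([0.0, 0.2, 0.8], q))   # True: any split over the ties
print(is_optimal_in_state([0.0, 1.0, 0.0], q))   # True: deterministic choice
print(is_optimal_in_state([0.1, 0.9, 0.0], q))   # False: mass on a worse action
```

With a single maximizing action, `optimal_actions` returns one index and the only policy that passes the check is the deterministic one, which matches the first case described above.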

foxik closed this as completed Oct 17, 2019
