Vault backup: 2023-10-24 21:47:55
Darren Wong committed Oct 24, 2023
1 parent 6813b16 commit dfd561f
Showing 1 changed file with 11 additions and 2 deletions.
statistics/reinforcement-learning/4 RL Model-Free Prediction.md

- Last lecture: how do we solve an MDP (find the optimal behaviour that maximises reward) where we already know the dynamics and rewards.
- Use DP to evaluate a policy, then use that as an inner loop to find the optimal policy.
- This lecture: model-free prediction, go directly from the experience the agent has to a value function/policy with no prior knowledge of the MDP.
- Will break this down into policy evaluation, then use our methods for policy evaluation to help us do control.
- This lecture will focus on the policy evaluation/prediction; what is the value of a given policy.
- Next lecture: model-free control, find the optimal value function in the MDP.

## Monte-Carlo Reinforcement Learning

Monte-Carlo learning describes a class of methods in which the agent follows a complete trajectory, then estimates the value of each state/action from the sample returns observed along it.

- Learn directly from episodes of experience, so we don't need a model prior to learning.
- Learns from *complete* episodes (i.e. play the full scenario and propagate rewards backwards). In other words, we *do not bootstrap*.
- Hence this only works for episodic MDPs - you need to terminate the episode for this to work.
- MC uses the simplest possible idea to estimate the value function. Take sample returns, and then estimate the value as the mean of observed returns.
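The "mean of observed returns" idea above can be sketched as first-visit MC policy evaluation. This is a minimal illustration, not the lecture's own code: the `sample_episode` interface (a function returning one complete episode as `(state, reward)` pairs, where the reward is the one received on leaving that state) is an assumption.

```python
from collections import defaultdict

def mc_first_visit_eval(sample_episode, num_episodes=1000, gamma=1.0):
    """First-visit Monte-Carlo policy evaluation.

    sample_episode() is assumed to return one *complete* episode under the
    policy being evaluated, as a list of (state, reward) pairs.
    """
    returns_sum = defaultdict(float)    # sum of returns observed per state
    returns_count = defaultdict(int)    # number of first visits per state
    V = defaultdict(float)              # value estimate = mean observed return
    for _ in range(num_episodes):
        episode = sample_episode()
        # Work backwards through the episode to compute G_t = R_{t+1} + gamma * G_{t+1}.
        G = 0.0
        returns = []
        for state, reward in reversed(episode):
            G = gamma * G + reward
            returns.append((state, G))
        returns.reverse()
        seen = set()
        for state, G in returns:
            if state in seen:
                continue  # first-visit: only the first occurrence per episode counts
            seen.add(state)
            returns_sum[state] += G
            returns_count[state] += 1
            V[state] = returns_sum[state] / returns_count[state]
    return V
```

Note that the backwards pass is what "propagate rewards backwards" means in practice, and that nothing here needs the MDP's dynamics, only sampled episodes.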
