# 6 Value Function Approximation

## Large-Scale Reinforcement Learning
Reinforcement learning can be used to solve large problems, e.g.
* Backgammon: $10^{20}$ states 
* Computer Go: $10^{170}$ states 
* Helicopter: continuous state space

How can we **scale up** the model-free methods for prediction and control from the last two lectures?

So far we have represented the value function by a **lookup table**:
* Every state $s$ has an entry $V(s)$
* Or every state-action pair $(s, a)$ has an entry $Q(s, a)$

Problem with large MDPs:
* There are too many states and/or actions to store in memory 
* It is too slow to learn the value of each state individually

## Value Function Approximation
Solution for large MDPs:
* Estimate the value function with function approximation

$$ \hat{v}(s, w) \approx v_\pi(s) $$

or

$$ \hat{q}(s, a, w) \approx q_\pi(s, a) $$

* Generalise from seen states to unseen states
* Update the parameter $w$ using MC or TD learning
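As a rough illustration of the last two points, the sketch below assumes a linear approximator $\hat{v}(s, w) = x(s)^\top w$ and a TD(0) target. The feature vectors, the step size `alpha`, the discount `gamma` and the helper names are hypothetical choices made for this example, not something fixed by these notes.

```python
def v_hat(x, w):
    """Linear value estimate: dot product of features x(s) and weights w."""
    return sum(xi * wi for xi, wi in zip(x, w))


def td0_update(w, x, reward, x_next, alpha=0.1, gamma=0.9, terminal=False):
    """One TD(0) step for a linear approximator (assumed form).

    For a linear v_hat, the gradient with respect to w is simply x(s).
    """
    target = reward if terminal else reward + gamma * v_hat(x_next, w)
    td_error = target - v_hat(x, w)
    return [wi + alpha * td_error * xi for wi, xi in zip(w, x)]


# Hypothetical transition: features of s, a reward of 1, features of s'.
w = [0.0, 0.0]
w = td0_update(w, x=[1.0, 0.5], reward=1.0, x_next=[0.0, 1.0])

# A state that was never updated directly, but shares the second feature.
unseen = v_hat([0.0, 1.0], w)
print(w, unseen)
```

Because the weights are shared across states, the single update also moves the estimate for the state with features `[0.0, 1.0]`, which is the sense in which the approximator generalises from seen to unseen states.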

<img width=600 src="images/rl-fa-types.png" />
<img width=600 src="images/rl-fa-choose.png" />

## 6.1 Incremental Methods

### 6.1.1 Gradient Descent
<img width=600 src="images/rl-fa-gd.png" />
<img width=600 src="images/rl-fa-sgd.png" />

### 6.1.2 Linear Function Approximation
<img width=600 src="images/rl-fa-feature.png" />

<img width=600 src="images/rl-fa-linear.png" />

<img width=600 src="images/rl-fa-table-lookup-features.png" />
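The figure above concerns table lookup features. Assuming it refers to the standard observation that a lookup table is a special case of linear function approximation with one-hot features, a minimal sketch looks like this (the state count and weights are made up for illustration):

```python
def one_hot(state, n_states):
    """Table lookup features: 1 for the current state, 0 elsewhere."""
    x = [0.0] * n_states
    x[state] = 1.0
    return x


def v_hat(x, w):
    """Linear value estimate: dot product of features and weights."""
    return sum(xi * wi for xi, wi in zip(x, w))


# Hypothetical 4-state problem with one weight per state.
n_states = 4
w = [0.5, -1.0, 2.0, 0.0]
values = [v_hat(one_hot(s, n_states), w) for s in range(n_states)]
print(values)
```

With these features each weight is exactly the value of one state, so the linear approximator reduces to the table and no generalisation between states takes place.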


### 6.1.3 Incremental Prediction Algorithms
<img width=600 src="images/rl-fa-incremental-prediction-algo.png" />

<img width=600 src="images/rl-fa-mc.png" />

<img width=600 src="images/rl-fa-td0.png" />

<img width=600 src="images/rl-fa-td-lambda2.png" />


### 6.1.4 Incremental Control Algorithms
<img width=600 src="images/rl-fa-control.png" />

<img width=600 src="images/rl-fa-action-value-fa.png" />

<img width=600 src="images/rl-fa-linear-action-value-fa.png" />

<img width=600 src="images/rl-fa-incremental-control-algo.png" />


### 6.1.5 Convergence
<img width=600 src="images/rl-fa-convergence-prediction.png" />
<img width=600 src="images/rl-fa-convergence-gradient-td.png" />
<img width=600 src="images/rl-fa-convergence-control.png" />

## 6.2 Batch Methods
### Batch Reinforcement Learning
* Gradient descent is simple and appealing
* But it is not sample efficient
* Batch methods seek to find the best-fitting value function given the agent's experience ("training data")
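One way to read "best fitting" is as a least-squares fit over the whole batch. The sketch below assumes a linear approximator and a batch of (feature vector, target value) pairs, where the targets stand in for value estimates such as observed returns. That pairing, the data and the helper names are assumptions for illustration only.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting.

    Assumes A is square and invertible.
    """
    n = len(b)
    # Augmented matrix [A | b]; rows are copied so inputs stay untouched.
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w


def least_squares_fit(features, targets):
    """Minimise sum_t (target_t - x_t . w)^2 via the normal equations."""
    n = len(features[0])
    A = [[0.0] * n for _ in range(n)]   # sum of x x^T
    b = [0.0] * n                       # sum of x * target
    for x, v in zip(features, targets):
        for i in range(n):
            b[i] += x[i] * v
            for j in range(n):
                A[i][j] += x[i] * x[j]
    return solve(A, b)


# Hypothetical batch: a bias feature plus one state feature.
features = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
targets = [1.0, 3.0, 5.0, 7.0]  # generated by v = 1 + 2 * x[1]
w = least_squares_fit(features, targets)
print(w)
```

Unlike a single gradient step, this uses every stored sample at once, at the cost of building and solving a system whose size grows with the number of features.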

### 6.2.1 Least Squares Prediction

### 6.2.2 Least Squares Control


# 7 Policy Gradient Methods
# 8 Integrating Learning and Planning
# 9 Exploration and Exploitation
# 10 Case study - RL in games
