# The Bias - Variance Tradeoff
How do we quantify the types of error in our model?

One metric is the Sum of Squared Errors (SSE), also known as the Residual Sum of Squares (RSS).

![](https://wikimedia.org/api/rest_v1/media/math/render/svg/dc6fa99c7596bf227b458ab49937b57c99f0a50c)

$x_i$ -- a given x value

$y_i$ -- actual y value

$f(x_i)$ -- the model's predicted y value

$$ SSE = \sum_i (y_i - \hat y_i)^2 $$

$\hat y_i$ -- predicted y value for $x_i$, i.e. $f(x_i)$
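As a quick illustration of the formula, here is a minimal sketch that computes SSE with NumPy. The helper name `sum_squared_errors` and the three data points are made up for this example.

```python
import numpy as np

def sum_squared_errors(y_true, y_pred):
    """Return the sum of squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sum((y_true - y_pred) ** 2))

# Made-up numbers, only to show the arithmetic.
y_actual = [3.0, 5.0, 7.0]
y_predicted = [2.5, 5.0, 8.0]

sse = sum_squared_errors(y_actual, y_predicted)
print(sse)  # (0.5)^2 + 0^2 + (-1)^2 = 1.25
```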

### SSE can be decomposed into error due to Bias and Variance
What's bias?
*Your model makes assumptions about the shape of the data and consistently gets it wrong*

What's variance?
*Imagine building your model many times, on different slices of data. Variance is how much your prediction for a given $x_i$ differs from one fitted model to the next*

$$ \frac{\sum(\hat y - \operatorname{mean}(\hat y))^2}{N} $$

$\hat y $ -- predicted y value
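To make "building your model many times" concrete, the sketch below refits a straight line on many fresh samples and measures how much the prediction at one fixed x value moves around. The curve `true_f`, the noise level, the sample sizes, and the point `x0` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # Hypothetical ground-truth curve, chosen only for this illustration.
    return x ** 2

x0 = 1.5          # the fixed x value we predict at after every refit
n_refits = 200
predictions_at_x0 = np.empty(n_refits)

for k in range(n_refits):
    # Draw a fresh noisy training sample, i.e. a new "slice" of data.
    x_train = rng.uniform(-3, 3, size=30)
    y_train = true_f(x_train) + rng.normal(0, 1, size=30)
    # Fit a straight line and record its prediction at x0.
    coeffs = np.polyfit(x_train, y_train, deg=1)
    predictions_at_x0[k] = np.polyval(coeffs, x0)

# Mean squared deviation of the predictions from their own mean.
variance = np.sum((predictions_at_x0 - predictions_at_x0.mean()) ** 2) / n_refits
print(variance)
```

Here N in the formula is the number of refits, and the result matches `np.var(predictions_at_x0)`.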

### Graphical Representation
Imagine each dart throw is a new prediction
![](../assets/images/bullseyes.png)

**A high-bias model**
![](../assets/images/linear-fit-quadratic.png)

**If we increase the complexity, the bias decreases**
![](../assets/images/quadratic-fit-quadratic.png)

All datasets contain some error. Here's a subset of that same data in which one erroneous point shows up as an outlier.

We can fit a high-degree polynomial to these points to fit the training set perfectly.
![](../assets/images/variance_1.png)

However, when we apply that fitted model to new data, its predictions are off. This is error due to variance.
![](../assets/images/variance_2.png)
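The pictures above can be reproduced roughly in code. The sketch below is not the data behind the figures: it assumes a made-up quadratic trend with Gaussian noise, fits polynomials of degree 1, 2 and 9 to a small training set, and compares the error on the training points with the error on fresh points.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    # Hypothetical data: a quadratic trend plus Gaussian noise.
    x = rng.uniform(-3, 3, size=n)
    y = 0.5 * x ** 2 + rng.normal(0, 1, size=n)
    return x, y

x_train, y_train = make_data(12)   # small training set
x_new, y_new = make_data(200)      # fresh data the model has not seen

def mse(coeffs, x, y):
    # Mean squared error of a fitted polynomial on the points (x, y).
    return float(np.mean((y - np.polyval(coeffs, x)) ** 2))

results = {}
for degree in (1, 2, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_new, y_new))
    print(degree, results[degree])  # (training error, new-data error)
```

The training error can only go down as the degree increases, because each higher-degree polynomial can reproduce any lower-degree fit. The new-data error typically does not follow: the straight line tends to miss the curve (bias), while the degree-9 fit tends to chase the noise in the 12 training points (variance).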

## The tradeoff:
The expected squared error of a prediction decomposes into squared bias + variance + irreducible (random) error

![](https://camo.githubusercontent.com/be96d619bff8883343cf541ed1405a8f7f5991cc/68747470733a2f2f75706c6f61642e77696b696d656469612e6f72672f6d6174682f632f622f632f63626336353331306430396136656661363330643863316633336364666138382e706e67)
![](https://camo.githubusercontent.com/34d8f46b4220c71b359f55db15ed9124474b397d/687474703a2f2f73636f74742e666f72746d616e6e2d726f652e636f6d2f646f63732f646f63732f4269617356617269616e63652f6269617376617269616e63652e706e67)

## Conceptual definition
Take 5 minutes to read the definition of Bias and Variance error at this link:
http://scott.fortmann-roe.com/docs/BiasVariance.html

Jot down a definition for each in your own words

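### Here's a rough sample of code for how to get variance and bias from an sklearn model

This is a back-of-the-envelope split that backs the bias out of the decomposition above. It assumes `X` and `Y` already exist and that the random error is known.

    import numpy as np
    from sklearn import linear_model

    # X (2-D feature array) and Y (1-D target array) must be defined beforehand.
    regr = linear_model.LinearRegression()
    regr.fit(X, Y)

    yhat = regr.predict(X)
    mse = np.mean((Y - yhat) ** 2)  # mean squared error on the training data
    var = np.var(yhat)              # spread of the predictions, a crude proxy for variance across refits
    noise = 0.01                    # assumed irreducible error, not estimated from the data
    bias_sq = mse - var - noise     # remainder attributed to squared bias (rough, can come out negative)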
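Because variance is defined over repeated fits, a single fit cannot really separate it from bias. One way to see the decomposition at work is a simulation in which the true function is known. The sketch below assumes a hypothetical `true_f`, a chosen noise level `noise_sd`, and synthetic samples; with real data the true function is unknown and these quantities can only be approximated, for example by resampling.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
noise_sd = 0.5  # assumed noise level of the synthetic data

def true_f(x):
    # Hypothetical ground truth; with real data this is unknown.
    return x ** 2

x_grid = np.linspace(-2, 2, 50).reshape(-1, 1)  # fixed points to predict at
n_refits = 300
preds = np.empty((n_refits, len(x_grid)))

for k in range(n_refits):
    # New training sample for every refit.
    X = rng.uniform(-2, 2, size=(40, 1))
    Y = true_f(X[:, 0]) + rng.normal(0, noise_sd, size=40)
    model = LinearRegression().fit(X, Y)
    preds[k] = model.predict(x_grid)

f_grid = true_f(x_grid[:, 0])
mean_pred = preds.mean(axis=0)
bias_sq = np.mean((mean_pred - f_grid) ** 2)  # squared bias, averaged over the grid
variance = np.mean(preds.var(axis=0))         # variance across refits, averaged over the grid

# Squared error against fresh noisy targets at the grid points.
y_fresh = f_grid + rng.normal(0, noise_sd, size=preds.shape)
expected_error = np.mean((y_fresh - preds) ** 2)

print(bias_sq, variance, noise_sd ** 2, expected_error)
```

In this setup the straight line cannot follow the curve, so the squared bias dominates, and the three components add up to approximately the expected squared error. Swapping in a more flexible model should shift error from the bias term to the variance term.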