## How good is our algorithm for this data set?

Here are four of the main error metrics for linear regression. In each formula, $y_i$ is the actual value, $\hat{y}_i$ is our model's predicted value, and $n$ is the number of observations:
- __Mean Absolute Error:__ This metric weights each error in proportion to its size. In other words, a major outlier does not have a scaled-up effect on the total error when compared to smaller error terms.

### $$\epsilon_{mae} = \frac{1}{n}\sum_{i=1}^n |y_i - \hat{y}_i|$$


- __Mean Squared Error:__ This metric penalizes larger errors more heavily by squaring each error term. As a result, one or two outliers in the data can dominate the total and make the model look worse than it is on typical points. On the other hand, if large errors are particularly undesirable, this metric will flag them more readily.

### $$\epsilon_{mse} = \frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2$$

- __Root Mean Squared Error:__ The square root of Mean Squared Error. Its main advantage over MSE is that it is expressed in the same units as the dependent variable, which makes it easier to interpret. It ranks models in the same order as MSE does.

### $$\epsilon_{rmse} = \sqrt{\frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2}$$

- __$R^2$ AKA Coefficient of Determination:__ Conceptually, $R^2$ is a measure of performance that tells us how much better (or worse) our model is than a model that just predicts the average every time. Mathematically, we take the difference between two quantities: the variance of the dependent variable around its average value, and the variance of the actual values around our model's predicted values. We want to see which one is larger. In the formula below, note that if the variance around the mean is larger than the variance around the predicted values, we get a positive number. If the variance of the data around our predicted values is larger, we get a negative number.

### $$R^2 = \frac{Var(Y) - Var(Y)_{reg}}{Var(Y)} = 1 - \frac{Var(Y)_{reg}}{Var(Y)} $$
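As a rough illustration, the four formulas can be written out by hand in plain Python. This is a minimal sketch, not the Scikit-Learn implementation: the function names (`mae`, `mse`, `rmse`, `r_squared`) and the toy data are made up for this example.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average of |y_i - y_hat_i|."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error: average of (y_i - y_hat_i)^2."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

def r_squared(y_true, y_pred):
    """R^2 = 1 - Var(Y)_reg / Var(Y), with Var(Y)_reg equal to the MSE."""
    mean_y = sum(y_true) / len(y_true)
    var_y = sum((y - mean_y) ** 2 for y in y_true) / len(y_true)
    return 1 - mse(y_true, y_pred) / var_y

# Hypothetical toy data: residuals are 1, 0, -1, -2
y_true = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.0, 5.0, 8.0, 11.0]

print("MAE :", mae(y_true, y_hat))        # 1.0
print("MSE :", mse(y_true, y_hat))        # 1.5
print("RMSE:", rmse(y_true, y_hat))       # square root of 1.5
print("R^2 :", r_squared(y_true, y_hat))  # about 0.7
```

Note how the single residual of size 2 contributes 4 of the 6 units of squared error, but only 2 of the 4 units of absolute error.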

## Further Explanation of $R^2$ Values

To explain $Var(Y)_{reg}$, the "variance of the regression model", let's look at the general definition of variance using $N$ equally likely outcomes:

### $$\text{Var}(Y) = E[(Y - E[Y])^2] = \frac{1}{N}\sum  ({y_i} - {\mu})^2 = \sigma^2$$

In this general definition, we are looking at the expectation of the squared difference between the variable and its average value, $E[(Y - \mu)^2]$. When we talk about the variance of the regression model, we mean the expectation of the squared difference between the actual values and our predicted ones. The sum in this expression is technically termed the __residual sum of squares__; dividing it by $N$ gives the mean:

### $$\text{Var}(Y)_{reg} = \frac{1}{N}\sum ({y_i} - \hat{y}_i)^2 = \epsilon_{mse}$$

Notice that this is just the definition of _mean squared error_ from above. 
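To make the sign of $R^2$ concrete, here is a small sketch that evaluates $1 - \epsilon_{mse} / Var(Y)$ for three sets of predictions. The helper names and the data are hypothetical, chosen only so the arithmetic is easy to follow: a model that always predicts the mean has an MSE equal to $Var(Y)$, so its $R^2$ is exactly 0.

```python
def variance(values):
    """Population variance: mean squared deviation from the mean."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)

def mse(y_true, y_pred):
    """Mean squared error, i.e. Var(Y)_reg in the notation above."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    return 1 - mse(y_true, y_pred) / variance(y_true)

# Hypothetical data with mean 3 and variance 2
y = [1.0, 2.0, 3.0, 4.0, 5.0]
mean_y = sum(y) / len(y)

baseline = [mean_y] * len(y)             # always predict the average
close_fit = [1.1, 1.9, 3.2, 3.8, 5.0]    # small residuals
reversed_fit = [5.0, 4.0, 3.0, 2.0, 1.0] # worse than the baseline

r2_baseline = r_squared(y, baseline)     # 0.0
r2_close = r_squared(y, close_fit)       # just under 1
r2_reversed = r_squared(y, reversed_fit) # -3.0

print(r2_baseline, r2_close, r2_reversed)
```

The third case shows that $R^2$ has no lower bound: predictions that are further from the data than the mean produce a negative value.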

### The values for these various error metrics can easily be calculated by Scikit-Learn:

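```python
import numpy as np
from sklearn import metrics
from IPython.utils import io

# Re-run the previous notebook quietly (IPython magic); it is assumed to
# define y_test (actual values) and y_pred (model predictions).
with io.capture_output() as captured:
    %run linear_regression_02.ipynb

mae = metrics.mean_absolute_error(y_test, y_pred)
mse = metrics.mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # RMSE is the square root of MSE
r2 = metrics.r2_score(y_test, y_pred)

r2
```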

0.7031965697808809

Deciding what kind of error metric to use is application-specific. If your data has outliers that are just noise rather than actual values that you care about, MAE might be a better choice. It is less sensitive to outliers and therefore gives a more representative measure of error when outliers matter less. However, if your data has large outliers that are particularly meaningful, you will want to know how they impact your model. In that case, MSE is the more appropriate choice.
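The difference in outlier sensitivity is easy to see on a toy example. The sketch below uses made-up residuals (not the data from this notebook) and compares how MAE and MSE react when one residual becomes large.

```python
def mae(residuals):
    """Mean absolute error computed directly from residuals."""
    return sum(abs(r) for r in residuals) / len(residuals)

def mse(residuals):
    """Mean squared error computed directly from residuals."""
    return sum(r ** 2 for r in residuals) / len(residuals)

# Hypothetical residuals (y_i - y_hat_i); the second list has one outlier
clean = [1.0, -1.0, 1.0, -1.0, 1.0]
with_outlier = [1.0, -1.0, 1.0, -1.0, 11.0]

print("MAE:", mae(clean), "->", mae(with_outlier))  # 1.0 -> 3.0
print("MSE:", mse(clean), "->", mse(with_outlier))  # 1.0 -> 25.0
```

With a single outlier, MAE triples while MSE grows by a factor of 25. Whether that reaction is desirable depends on whether the outlier is noise or a value you care about.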