## How good is our algorithm for this data set?

Here are four of the main error metrics for linear regression:
- __Mean Absolute Error:__ This metric weights every error in proportion to its magnitude. In other words, a major outlier doesn't have an up-scaled effect on the total error when compared to smaller error terms.

### $$\epsilon_{mae} = \frac{1}{n}\sum_{i=1}^n |y_i - \hat{y}_i|$$


- __Mean Squared Error:__ This metric penalizes error terms by squaring them, so large errors count disproportionately. As a result, one or two outliers in the data can dominate the total and inflate the reported error of the model. On the other hand, if large errors are particularly undesirable, this metric flags them more strongly.

### $$\epsilon_{mse} = \frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2$$
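
To make the outlier behaviour concrete, here is a minimal sketch that applies both formulas to two made-up residual vectors. The arrays and the helper names `mae_of_errors` and `mse_of_errors` are hypothetical and are not part of the notebook's data.

```python
import numpy as np

# Hypothetical residuals (y - y_hat): four small errors, then the same set
# with the last error replaced by a single large outlier.
errors_clean = np.array([1.0, -1.0, 1.0, -1.0])
errors_outlier = np.array([1.0, -1.0, 1.0, 10.0])


def mae_of_errors(errors):
    """Mean absolute error of a residual vector."""
    return np.mean(np.abs(errors))


def mse_of_errors(errors):
    """Mean squared error of a residual vector."""
    return np.mean(errors ** 2)


# Without the outlier both metrics equal 1.0.
print(mae_of_errors(errors_clean), mse_of_errors(errors_clean))

# With the outlier MAE rises to 3.25, while MSE jumps to 25.75.
print(mae_of_errors(errors_outlier), mse_of_errors(errors_outlier))
```

A single error of 10 roughly triples the MAE but multiplies the MSE by more than 25, which is the sensitivity described above.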

- __Root Mean Squared Error:__ The square root of the Mean Squared Error. It penalizes large errors in the same way as MSE, but it is expressed in the same units as the target variable, which makes it easier to interpret than MSE.

### $$\epsilon_{rmse} = \sqrt{\frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2}$$
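
One practical difference between MSE and RMSE is units. The sketch below uses invented numbers (prices in thousands of dollars, then the same prices in dollars) to show that rescaling the target rescales RMSE by the same factor and MSE by its square. The data and the helper names `mse_between` and `rmse_between` are assumptions made for illustration.

```python
import numpy as np

# Hypothetical targets and predictions, in thousands of dollars.
y_true_k = np.array([200.0, 250.0, 300.0, 350.0])
y_hat_k = np.array([210.0, 240.0, 310.0, 340.0])

# The same values expressed in dollars.
y_true_usd = y_true_k * 1000
y_hat_usd = y_hat_k * 1000


def mse_between(y, y_hat):
    """Mean squared error between targets and predictions."""
    return np.mean((y - y_hat) ** 2)


def rmse_between(y, y_hat):
    """Root mean squared error: square root of the MSE."""
    return np.sqrt(mse_between(y, y_hat))


# Every prediction is off by 10 (thousand dollars), so RMSE is 10
# in the units of the target, while MSE is 100 in squared units.
print(mse_between(y_true_k, y_hat_k), rmse_between(y_true_k, y_hat_k))

# Switching to dollars multiplies RMSE by 1,000 and MSE by 1,000,000.
print(mse_between(y_true_usd, y_hat_usd), rmse_between(y_true_usd, y_hat_usd))
```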

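- __$R^2$ AKA Coefficient of Determination:__ The fraction of the variance in $y$ that the model explains. For a simple linear regression fitted by least squares with an intercept, and evaluated on the data used for fitting, this is just the square of the correlation coefficient between $x$ and $y$. In other words $R^2 = \rho^2$. On held-out data the two can differ, and $R^2$ can even be negative. It can be calculated with the following formula, where $\sigma^2_{mean}$ is the variance of $y$ around its mean $\bar{y}$ and $\sigma^2_{regression}$ is the variance of $y$ around the regression line:

### $$R^2 = \frac{\sigma^2_{mean} - \sigma^2_{regression}}{\sigma^2_{mean}} = 1 - \frac{\sigma^2_{regression}}{\sigma^2_{mean}} = 1 - \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{\sum_{i=1}^n (y_i - \bar{y})^2}$$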
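
The relationship between $R^2$ and $\rho$ can be checked numerically. The following is a small sketch on invented data: it fits a line with `np.polyfit`, computes $R^2$ from the sums of squares, and compares it with the squared correlation from `np.corrcoef`. The arrays `x_toy` and `y_toy` and the name `r2_manual` are hypothetical, and the check is done in-sample, which is the setting where the identity is expected to hold.

```python
import numpy as np

# Hypothetical toy data, roughly linear with some noise.
x_toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_toy = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Least-squares line with an intercept; polyfit returns the
# highest-degree coefficient first, i.e. (slope, intercept).
slope, intercept = np.polyfit(x_toy, y_toy, deg=1)
y_toy_hat = slope * x_toy + intercept

# Sum of squared errors around the regression line and around the mean.
ss_regression = np.sum((y_toy - y_toy_hat) ** 2)
ss_mean = np.sum((y_toy - y_toy.mean()) ** 2)
r2_manual = 1 - ss_regression / ss_mean

# Pearson correlation coefficient between x and y.
rho = np.corrcoef(x_toy, y_toy)[0, 1]

# The two values agree up to floating-point error.
print(r2_manual, rho ** 2)
```

The $1/n$ factors of the two variances cancel in the ratio, which is why the code works directly with sums of squares.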

### The values for these error metrics can easily be calculated by Scikit-Learn:

In [29]:

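import numpy as np  # harmless if linear_regression_02.ipynb already imports it
from IPython.utils import io
from sklearn import metrics

with io.capture_output() as captured:  # hide the output of the earlier notebook
    %run linear_regression_02.ipynb

mae = metrics.mean_absolute_error(y_test, y_pred)  # y_test and y_pred are expected to come from linear_regression_02.ipynb
mse = metrics.mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # RMSE is the square root of MSE
r2 = metrics.r2_score(y_test, y_pred)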

r2

0.7031965697808809

Deciding which error metric to use is application-specific. If your data contains outl