# Supervised Learning Model Evaluation Lab

Complete the exercises below to solidify your knowledge and understanding of supervised learning model evaluation.

In [1]:
import pandas as pd

## Regression Model Evaluation

Load the Boston dataset using sklearn and build `X` (the predictor variables) and `y` (the target).

In [2]:
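# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older scikit-learn release.
from sklearn.datasets import load_boston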

boston = load_boston()

print(boston.DESCR)

.. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pu

In [3]:
X = boston.data
y = boston.target

### Split this data set into training (80%) and testing (20%) sets.

The `MEDV` field represents the median value of owner-occupied homes (in $1000's) and is the target variable that we will want to predict.

In [4]:
boston_full = pd.DataFrame(data=boston.data, columns=boston.feature_names)
boston_full['MEDV'] = boston.target

In [5]:
boston_full.head()

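|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |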


In [6]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2)
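To make the 80/20 split concrete, here is a minimal standard-library sketch of a shuffled hold-out split. The helper `simple_train_test_split` is hypothetical, and rounding the test count up is an assumption about how fractional sizes are handled. It will not reproduce the exact rows that `train_test_split` selects, because the shuffling differs.

```python
import math
import random

def simple_train_test_split(rows, test_size=0.2, seed=42):
    """Shuffle a copy of `rows`, then hold out a `test_size` fraction as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)          # reproducible shuffle
    n_test = math.ceil(len(shuffled) * test_size)  # assumption: round the test count up
    return shuffled[n_test:], shuffled[:n_test]    # (train, test)

rows = list(range(506))  # stand-in indices for the 506 Boston rows
train_rows, test_rows = simple_train_test_split(rows)
print(len(train_rows), len(test_rows))
```

Fixing the seed plays the same role as `random_state=42` in the cell above: the split is random but repeatable.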

### Train a `LinearRegression` model on this data set and generate predictions on both the training and the testing set.

In [7]:
from sklearn.linear_model import LinearRegression

lr_model = LinearRegression()

lr_model.fit(X_train, y_train)

y_pred_train = lr_model.predict(X_train)
y_pred_test = lr_model.predict(X_test)

### Calculate and print R-squared for both the training and the testing set.

In [8]:
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

"""
Root mean squared error (RMSE). As the name suggests, it is the square root of the MSE. Because the MSE is squared, 
its units do not match those of the original output. Researchers often use the RMSE to convert the error metric back 
into the original units, making interpretation easier. Since the MSE and RMSE both square the residuals, they are 
similarly affected by outliers. The RMSE is analogous to the standard deviation (as the MSE is to the variance) and 
measures how widely your residuals are spread. Both MAE and MSE can range from 0 to positive infinity and depend on 
the scale of the target, so a large value on its own is hard to interpret.
"""

print(f'R-squared for training set: {round(r2_score(y_train, y_pred_train), 3)}')
print(f'R-squared for testing set: {round(r2_score(y_test, y_pred_test), 3)}')

R-squared for training set: 0.751
R-squared for testing set: 0.669
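As a rough illustration of what R-squared measures, the sketch below computes it from scratch as 1 minus the ratio of the residual sum of squares to the total sum of squares. The helper `r_squared` and the predictions are hypothetical; the true values are the first five `MEDV` entries shown earlier.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation
    return 1 - ss_res / ss_tot

y_true = [24.0, 21.6, 34.7, 33.4, 36.2]  # first five MEDV values from boston_full.head()
y_pred = [25.0, 22.0, 33.0, 32.0, 35.0]  # hypothetical predictions, for illustration only

print(round(r_squared(y_true, y_pred), 3))

# Always predicting the mean explains none of the variation, so R^2 is 0.
baseline = [sum(y_true) / len(y_true)] * len(y_true)
print(round(r_squared(y_true, baseline), 3))
```

Read this way, the scores above say the linear model explains roughly 75% of the variation in the training targets and roughly 67% in the held-out targets.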


### Calculate and print mean squared error for both the training and the testing set.

In [9]:
"""
The mean squared error (MSE), or mean squared deviation (MSD), is just like the MAE, but squares each difference 
before averaging instead of taking its absolute value. It measures the average squared difference between the 
estimated values and the true values.

Because we are squaring the differences, the MSE is usually larger than the MAE whenever typical residuals exceed 1 
in magnitude, and it is expressed in squared units. For this reason, we cannot directly compare the MAE to the MSE. 
We can only compare our model’s error metrics to those of a competing model. The effect of the square term is most 
apparent when there are outliers in our data. While each residual in the MAE contributes proportionally to the total 
error, the error grows quadratically in the MSE. Outliers therefore contribute much more to the total error in the MSE 
than they would in the MAE: large differences between actual and predicted values are punished more heavily.
"""

print(f'Mean squared error for training set: {round(mean_squared_error(y_train, y_pred_train), 3)}')
print(f'Mean squared error for testing set: {round(mean_squared_error(y_test, y_pred_test), 3)}')

Mean squared error for training set: 21.641
Mean squared error for testing set: 24.291
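Since the MSE is in squared units, taking its square root gives an error in the same units as `MEDV` ($1000s). The sketch below defines hypothetical from-scratch helpers and then converts the two MSE values printed above; it is an illustration, not a replacement for the sklearn functions.

```python
import math

def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error_manual(y_true, y_pred):
    """Square root of the MSE, expressed in the units of the target."""
    return math.sqrt(mean_squared_error_manual(y_true, y_pred))

# Convert the MSE values printed above to RMSE (in $1000s, like MEDV).
rmse_train = math.sqrt(21.641)
rmse_test = math.sqrt(24.291)
print(round(rmse_train, 2), round(rmse_test, 2))
```

The results, about 4.65 for training and 4.93 for testing, are directly comparable to the house prices themselves.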


### Calculate and print mean absolute error for both the training and the testing set.

In [10]:
"""
The mean absolute error (MAE) is the simplest regression error metric to understand. We calculate the residual for 
every data point, taking only the absolute value of each so that negative and positive residuals do not cancel out. 
We then take the average of all these residuals. Effectively, MAE describes the typical magnitude of the residuals.

Because we use the absolute value of the residual, the MAE does not indicate the direction of the error (whether 
the model undershoots or overshoots the actual data). Each residual contributes proportionally to the total amount 
of error, meaning that larger errors contribute linearly to the overall error. A small MAE suggests the model 
predicts well, while a large MAE suggests that your model may have trouble in certain areas. An MAE of 0 means that 
your model is a perfect predictor of the outputs (but this will almost never happen).

While the MAE is easily interpretable, using the absolute value of the residual is often less desirable than squaring 
the difference. Depending on how you want your model to treat outliers, or extreme values, in your data, you may want 
to bring more attention to these outliers or downplay them. The issue of outliers can play a major role in which error 
metric you use.
"""

print(f'Mean absolute error for training set: {round(mean_absolute_error(y_train, y_pred_train), 3)}')
print(f'Mean absolute error for testing set: {round(mean_absolute_error(y_test, y_pred_test), 3)}')

Mean absolute error for training set: 3.315
Mean absolute error for testing set: 3.189
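The different treatment of outliers is easiest to see on a toy example. The sketch below uses made-up numbers and hypothetical helpers `mae` and `mse`: the two prediction sets differ only in one badly missed point.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [10, 10, 10, 10]
pred_clean = [11, 11, 11, 11]    # every prediction is off by 1
pred_outlier = [11, 11, 11, 19]  # one prediction is off by 9

print(mae(y_true, pred_clean), mse(y_true, pred_clean))      # 1.0 1.0
print(mae(y_true, pred_outlier), mse(y_true, pred_outlier))  # 3.0 21.0
```

A single large miss triples the MAE but multiplies the MSE by 21, which is the quadratic penalty in action.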


## Classification Model Evaluation

Load the iris dataset using sklearn and build `X` (the predictor variables) and `y` (the target).

In [11]:
from sklearn.datasets import load_iris

iris = load_iris()

X = iris.data
y = iris.target

### Split this data set into training (80%) and testing (20%) sets.

The `class` field represents the type of flower and is the target variable that we will want to predict.

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y, test_size=0.2)

### Train a `LogisticRegression` model on this data set and generate predictions on both the training and the testing set.

In [13]:
from sklearn.linear_model import LogisticRegression

logr = LogisticRegression()
logr.fit(X_train, y_train)

y_pred_train = logr.predict(X_train)
y_pred_test = logr.predict(X_test)



### Calculate and print the accuracy score for both the training and the testing set.

In [14]:
print("Training set accuracy: {:.3f}".format(logr.score(X_train, y_train)))
print("Test set accuracy: {:.3f}".format(logr.score(X_test, y_test)))

Training set accuracy: 0.958
Test set accuracy: 0.967
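Accuracy is simply the share of correct predictions. As a sanity check, the sketch below recomputes it from the confusion matrices printed at the end of this lab, where the diagonal holds the correctly classified samples. The helper `accuracy_from_confusion` is a hypothetical name.

```python
def accuracy_from_confusion(cm):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

# Confusion matrices copied from the outputs at the end of this lab.
cm_train = [[40, 0, 0], [0, 36, 4], [0, 1, 39]]
cm_test = [[10, 0, 0], [0, 9, 1], [0, 0, 10]]

print(f"{accuracy_from_confusion(cm_train):.3f}")  # 0.958
print(f"{accuracy_from_confusion(cm_test):.3f}")   # 0.967
```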


### Calculate and print the balanced accuracy score for both the training and the testing set.

In [15]:
from sklearn.metrics import balanced_accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

print("Training set balanced accuracy: {:.3f}".format(balanced_accuracy_score(y_train, y_pred_train)))
print("Test set balanced accuracy: {:.3f}".format(balanced_accuracy_score(y_test, y_pred_test)))

Training set balanced accuracy: 0.958
Test set balanced accuracy: 0.967
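Balanced accuracy is the unweighted mean of the per-class recalls. The stratified iris split has equal class sizes, which is why it matches plain accuracy here. The sketch below shows this on the test confusion matrix from this lab, then on a hypothetical imbalanced matrix where the two metrics diverge; the helper names are illustrative.

```python
def accuracy(cm):
    """Share of all samples that sit on the diagonal."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)

def balanced_accuracy(cm):
    """Mean of per-class recall; each row of cm holds one true class."""
    recalls = [cm[k][k] / sum(cm[k]) for k in range(len(cm))]
    return sum(recalls) / len(recalls)

cm_test = [[10, 0, 0], [0, 9, 1], [0, 0, 10]]  # test matrix from this lab
cm_skewed = [[90, 0], [8, 2]]                  # hypothetical imbalanced problem

print(round(accuracy(cm_test), 3), round(balanced_accuracy(cm_test), 3))
print(round(accuracy(cm_skewed), 3), round(balanced_accuracy(cm_skewed), 3))
```

In the skewed case the model looks strong on accuracy (0.92) while missing most of the minority class, and balanced accuracy (0.6) exposes that.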


### Calculate and print the precision score for both the training and the testing set.

In [16]:
print("Training set precision score: {:.3f}".format(precision_score(y_train, y_pred_train, average='weighted')))
print("Test set precision score: {:.3f}".format(precision_score(y_test, y_pred_test, average='weighted')))

Training set precision score: 0.960
Test set precision score: 0.970
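With `average='weighted'`, the per-class precisions are averaged using each class's share of the true labels as its weight. The following from-scratch sketch (a hypothetical helper, fed with the confusion matrices from this lab) reproduces the numbers above.

```python
def weighted_precision(cm):
    """Support-weighted mean of per-class precision (rows = true, columns = predicted)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    score = 0.0
    for k in range(n):
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum: predicted as class k
        support_k = sum(cm[k])                         # row sum: truly class k
        precision_k = cm[k][k] / predicted_k if predicted_k else 0.0
        score += (support_k / total) * precision_k
    return score

cm_train = [[40, 0, 0], [0, 36, 4], [0, 1, 39]]
cm_test = [[10, 0, 0], [0, 9, 1], [0, 0, 10]]

print(f"{weighted_precision(cm_train):.3f}")  # 0.960
print(f"{weighted_precision(cm_test):.3f}")   # 0.970
```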


### Calculate and print the recall score for both the training and the testing set.

In [17]:
print("Training set recall score: {:.3f}".format(recall_score(y_train, y_pred_train, average='weighted')))
print("Test set recall score: {:.3f}".format(recall_score(y_test, y_pred_test, average='weighted')))

Training set recall score: 0.958
Test set recall score: 0.967
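The weighted recall equals the accuracy, and this is not a coincidence: weighting each class's recall by its support cancels the per-class denominators, leaving correct predictions over total predictions. A small sketch with hypothetical helper names, using the test confusion matrix from this lab:

```python
def weighted_recall(cm):
    """Support-weighted mean of per-class recall."""
    total = sum(sum(row) for row in cm)
    score = 0.0
    for k in range(len(cm)):
        support_k = sum(cm[k])            # samples that truly belong to class k
        recall_k = cm[k][k] / support_k   # share of them predicted correctly
        score += (support_k / total) * recall_k
    return score

def accuracy(cm):
    """Diagonal over total."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)

cm_test = [[10, 0, 0], [0, 9, 1], [0, 0, 10]]
print(round(weighted_recall(cm_test), 3), round(accuracy(cm_test), 3))
```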


### Calculate and print the F1 score for both the training and the testing set.

In [18]:
print("Training set F1 score: {:.3f}".format(f1_score(y_train, y_pred_train, average='weighted')))
print("Test set F1 score: {:.3f}".format(f1_score(y_test, y_pred_test, average='weighted')))

Training set F1 score: 0.958
Test set F1 score: 0.967
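The F1 score of a class is the harmonic mean of its precision and recall, which simplifies to 2·TP / (row sum + column sum) in confusion-matrix terms. The sketch below (hypothetical helper, matrices from this lab) averages the per-class values weighted by support.

```python
def weighted_f1(cm):
    """Support-weighted mean of per-class F1 (rows = true, columns = predicted)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    score = 0.0
    for k in range(n):
        true_k = sum(cm[k])                            # TP + FN
        predicted_k = sum(cm[i][k] for i in range(n))  # TP + FP
        denom = true_k + predicted_k
        f1_k = 2 * cm[k][k] / denom if denom else 0.0  # harmonic mean of P and R
        score += (true_k / total) * f1_k
    return score

cm_train = [[40, 0, 0], [0, 36, 4], [0, 1, 39]]
cm_test = [[10, 0, 0], [0, 9, 1], [0, 0, 10]]

print(f"{weighted_f1(cm_train):.3f}")  # 0.958
print(f"{weighted_f1(cm_test):.3f}")   # 0.967
```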


### Generate confusion matrices for both the training and the testing set.

In [19]:
# train

print(f'Confusion Matrix for train data:\n{confusion_matrix(y_train, y_pred_train)}')

Confusion Matrix for train data:
[[40  0  0]
 [ 0 36  4]
 [ 0  1 39]]


In [20]:
# test

print(f'Confusion Matrix for test data:\n{confusion_matrix(y_test, y_pred_test)}')

Confusion Matrix for test data:
[[10  0  0]
 [ 0  9  1]
 [ 0  0 10]]
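To read these matrices, recall that sklearn puts the true class on the rows and the predicted class on the columns. In the test matrix, for example, one true class-1 flower was predicted as class 2. The following sketch builds a matrix the same way from toy labels; `build_confusion_matrix` and the labels are hypothetical.

```python
def build_confusion_matrix(y_true, y_pred, n_classes):
    """Count (true, predicted) pairs: rows are true classes, columns are predictions."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]  # one class-1 sample is mistaken for class 2

for row in build_confusion_matrix(y_true, y_pred, n_classes=3):
    print(row)
```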


## Bonus: For each of the data sets in this lab, try training with some of the other models you have learned about, recalculate the evaluation metrics, and compare to determine which models perform best on each data set.