## 6.2.1 Estimation Model Evaluation using Python

The following Python code demonstrates how to derive the error measures used to evaluate a model's performance. In the example, a for loop splits the initial data set into training and test sets using three different test set sizes, i.e., 20% (ratio of 0.2), 30% (ratio of 0.3), and 40% (ratio of 0.4). We then compare the model's performance across the different test sets.

The comments embedded in the code explain the rationale behind the programming logic.

In [1]:
# import necessary libraries
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
from sklearn.model_selection import train_test_split 
from sklearn import metrics 
import math
from sklearn.linear_model import LinearRegression

# load the dataset from its source file
df = pd.read_csv('data/ChurnFinal.csv')

# convert the categorical label (yes/no) to numeric (1/0) for estimation modeling
df.loc[df['Churn'] == 'yes', 'Churn'] = 1
df.loc[df['Churn'] == 'no', 'Churn'] = 0
df['Churn'] = pd.to_numeric(df['Churn'], errors='coerce').astype('float')

# specify inputs and label
df_inputs = pd.get_dummies(df[['Gender', 'Age', 'PostalCode', 'Cash', 'CreditCard', 
           'Cheque', 'SinceLastTrx', 'SqrtTotal', 'SqrtMax', 'SqrtMin']])
df_label = df['Churn']

# create a multiple linear regression model object
model = LinearRegression() 

# model performance on different test sets 20%, 30%, 40% 
lowest = 1
best_sample = 0.0
for test_sample in (0.2, 0.3, 0.4): 
        # split the data, then fit the linear regression model on the training set
        X_train, X_test, y_train, y_test = train_test_split(df_inputs, df_label, 
                test_size=test_sample, random_state=7) 
        model.fit(X_train, y_train)
        # predict the response for the test set
        y_predict = model.predict(X_test)
        # compute MSE once and reuse it for RMSE
        mse = metrics.mean_squared_error(y_test, y_predict)
        print('Test set size ', test_sample,
           ': MSE =', round(mse, 3), #MSE
           ', RMSE =', round(math.sqrt(mse), 3), #RMSE
           ', MAE =', round(metrics.mean_absolute_error(y_test, y_predict), 3), #MAE
           ', R2 =', round(metrics.r2_score(y_test, y_predict), 3)) #R2

Test set size  0.2 : MSE = 0.197 , RMSE = 0.444 , MAE = 0.388 , R2 = 0.21
Test set size  0.3 : MSE = 0.182 , RMSE = 0.426 , MAE = 0.372 , R2 = 0.271
Test set size  0.4 : MSE = 0.19 , RMSE = 0.436 , MAE = 0.377 , R2 = 0.239


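The results show that the model trained on the churn data with a 0.7 ratio for the training set (i.e., 0.3 for the test set) makes the smallest errors (MSE = 0.182, RMSE = 0.426, MAE = 0.372) and achieves the highest R2 (0.271), compared to the training set ratios of 0.8 and 0.6. Note that the differences between the three splits are small and come from a single random split (random_state=7), so a different seed may change the ranking.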
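The comparison above is made by reading the printed output. As a minimal sketch, the same comparison can be done in code by storing one error measure per test set size and picking the smallest. The data below is synthetic and hypothetical (it stands in for ChurnFinal.csv so that the example runs on its own), and the `results` dictionary is a name introduced only for this illustration.

```python
import math

import numpy as np
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 200 rows, 3 numeric inputs, noisy linear label.
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 3))
y = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.3, size=200)

model = LinearRegression()
results = {}  # test set ratio -> RMSE measured on that test set

for test_sample in (0.2, 0.3, 0.4):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_sample, random_state=7)
    model.fit(X_train, y_train)
    y_predict = model.predict(X_test)
    results[test_sample] = math.sqrt(
        metrics.mean_squared_error(y_test, y_predict))

# the ratio with the lowest RMSE is the best-performing split in this run
best_sample = min(results, key=results.get)
print('Best test set size:', best_sample,
      ', RMSE =', round(results[best_sample], 3))
```

RMSE is used as the selection criterion here only as an example; MAE or R2 (taking the maximum instead of the minimum) could be used in the same way.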

**NOTE**: For a detailed explanation of the scikit-learn metrics functions and their parameters, refer to the official documentation: https://scikit-learn.org/stable/modules/model_evaluation.html
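To make the four error measures concrete, the following sketch computes them directly from their standard definitions on a few hypothetical values and checks the results against the scikit-learn functions used in the example above. The numbers in `y_test` and `y_predict` are made up for illustration and are not taken from the churn data.

```python
import math

import numpy as np
from sklearn import metrics

# Hypothetical actual and predicted values, chosen only for illustration.
y_test = np.array([1.0, 0.0, 1.0, 0.0])
y_predict = np.array([0.8, 0.3, 0.6, 0.1])

errors = y_test - y_predict  # 0.2, -0.3, 0.4, -0.1

# MSE: mean of the squared errors, (0.04 + 0.09 + 0.16 + 0.01) / 4
mse = np.mean(errors ** 2)
# RMSE: square root of MSE, in the same unit as the label
rmse = math.sqrt(mse)
# MAE: mean of the absolute errors, (0.2 + 0.3 + 0.4 + 0.1) / 4
mae = np.mean(np.abs(errors))
# R2: 1 - (sum of squared errors / total sum of squares around the mean)
r2 = 1 - np.sum(errors ** 2) / np.sum((y_test - y_test.mean()) ** 2)

# the hand-computed values agree with scikit-learn
assert math.isclose(mse, metrics.mean_squared_error(y_test, y_predict))
assert math.isclose(mae, metrics.mean_absolute_error(y_test, y_predict))
assert math.isclose(r2, metrics.r2_score(y_test, y_predict))

print('MSE =', round(mse, 3), ', RMSE =', round(rmse, 3),
      ', MAE =', round(mae, 3), ', R2 =', round(r2, 3))
```

Lower MSE, RMSE, and MAE indicate smaller prediction errors, while a higher R2 indicates that the model explains more of the variation in the label.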