## Model Evaluation

**Goals of model evaluation**<br>
**1.** Find the best model that represents the data<br>
**2.** Determine how well the model will work in the future 

Evaluating a model on the same data that was used to train it is not acceptable in data mining. The model has already seen those examples, so the estimate is overly optimistic and cannot reveal overfitting.
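To illustrate why training-set scores are misleading, here is a minimal sketch that scores the same model on its own training data and on held-out data. The choice of an unpruned `DecisionTreeClassifier` and a 70/30 split of the digits dataset is an illustrative assumption, not part of the original workflow; a fully grown tree tends to memorise its training data, which makes the gap easy to see.

```python
# Sketch: training-set accuracy vs. held-out accuracy for the same model.
# Assumption: the tree model and the 70/30 split are illustrative choices.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

# Keep 30% of the data away from the model during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# An unpruned tree can grow until it fits the training set (almost) perfectly
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)  # accuracy on data the model was fitted on
test_acc = tree.score(X_test, y_test)     # accuracy on data the model never saw

print(f"training accuracy: {train_acc:.3f}")
print(f"test accuracy:     {test_acc:.3f}")
```

The training accuracy is the overoptimistic number; the test accuracy is the one that says something about future performance.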

**Methods used for model evaluation**<br>
**1.** Hold-Out<br>
**2.** Cross Validation<br>

**Model performance evaluation falls into two types, depending on the task**<br>
**1.** Classification Evaluation<br>
**2.** Regression Evaluation<br>
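The two types differ in what is compared: classification compares predicted labels with true labels, while regression measures how far predicted numbers are from true numbers. A minimal sketch on small hand-made arrays (the data is invented purely for illustration) using standard scikit-learn metric functions:

```python
# Sketch: typical classification vs. regression metrics on toy data.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             mean_absolute_error, mean_squared_error, r2_score)

# Classification: predicted class labels vs. true class labels
y_true_cls = [0, 1, 1, 0, 1]
y_pred_cls = [0, 1, 0, 0, 1]
acc = accuracy_score(y_true_cls, y_pred_cls)    # 4 of 5 correct -> 0.8
cm = confusion_matrix(y_true_cls, y_pred_cls)   # rows = true class, columns = predicted class

# Regression: predicted values vs. true values
y_true_reg = [3.0, 5.0, 2.0, 7.0]
y_pred_reg = [2.5, 5.0, 3.0, 8.0]
mae = mean_absolute_error(y_true_reg, y_pred_reg)  # mean |error| = 0.625
mse = mean_squared_error(y_true_reg, y_pred_reg)   # mean error^2 = 0.5625
r2 = r2_score(y_true_reg, y_pred_reg)              # 1 - SS_res / SS_tot

print("accuracy:", acc)
print("confusion matrix:\n", cm)
print(f"MAE={mae:.4f}  MSE={mse:.4f}  R^2={r2:.4f}")
```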


**Hold-Out** - The dataset (ideally a large one) is randomly divided into three subsets:<br>
**1.** The training set is the subset used to build predictive models.<br>
**2.** The validation set is the subset used to assess the performance of the model built in the training phase. It provides a test platform for fine-tuning the model's parameters and selecting the best-performing model. Not all modeling algorithms need a validation set.<br>
**3.** The test set (unseen examples) is the subset used to assess the likely future performance of a model. If a model fits the training set much better than it fits the test set, overfitting is probably the cause.<br>
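A minimal sketch of the three-way hold-out procedure, assuming a roughly 60/20/20 split of the digits data and using the validation set to pick logistic regression's regularisation strength `C`. The split ratios, the candidate `C` values and `max_iter=1000` are illustrative assumptions, not values from the original notebook.

```python
# Sketch: hold-out evaluation with training, validation and test sets.
# Assumptions: 60/20/20 split, candidate C values and max_iter are illustrative.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

# Split off 20% as the test set, then split the remainder 75/25:
# roughly 60% training, 20% validation, 20% test overall
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=1, stratify=y_rest)

# Validation set: compare candidate values of the regularisation strength C
best_model, best_c, best_val_acc = None, None, -1.0
for c in [0.01, 0.1, 1.0, 10.0]:
    model = make_pipeline(StandardScaler(), LogisticRegression(C=c, max_iter=1000))
    model.fit(X_train, y_train)            # fitted on the training set only
    val_acc = model.score(X_val, y_val)    # accuracy on the validation set
    if val_acc > best_val_acc:
        best_model, best_c, best_val_acc = model, c, val_acc

# Test set: used once, for the selected model only
test_acc = best_model.score(X_test, y_test)
print(f"best C={best_c}  validation acc={best_val_acc:.3f}  test acc={test_acc:.3f}")
```

Touching the test set only after model selection is what keeps its score an honest estimate of future performance.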

**Cross Validation** - The data is divided into k subsets (folds) of roughly equal size. A model is built k times; each time one fold is left out of training and used as the test set, and the k test scores are then averaged to estimate performance.

<img src='./CrossValidation1.png' width = 400  height=400 >
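The partitioning step can be sketched from scratch with NumPy. `kfold_indices` is a hypothetical helper written only for illustration (scikit-learn's `KFold`, used below, does this for real); the point is that every sample lands in exactly one test fold and is never in the training and test sets at the same time.

```python
# Sketch: how k-fold cross-validation partitions sample indices.
# `kfold_indices` is a hypothetical illustration helper, not a library function.
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k folds of near-equal size."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)   # shuffle the sample order once
    folds = np.array_split(indices, k)     # k folds whose sizes differ by at most 1
    for i in range(k):
        test_idx = folds[i]                                    # held-out fold
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

splits = list(kfold_indices(n_samples=10, k=3))
for i, (train_idx, test_idx) in enumerate(splits):
    print(f"fold {i}: train={sorted(train_idx.tolist())}  test={sorted(test_idx.tolist())}")
```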

```python
# Cross-Validation
# Load libraries
import numpy as np
from sklearn import datasets
from sklearn import metrics
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
```

```python
# Load the digits dataset
digits = datasets.load_digits()
```

```python
# Create the features matrix
X = digits.data
X, X.shape
```

```
(array([[ 0.,  0.,  5., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ..., 10.,  0.,  0.],
        [ 0.,  0.,  0., ..., 16.,  9.,  0.],
        ...,
        [ 0.,  0.,  1., ...,  6.,  0.,  0.],
        [ 0.,  0.,  2., ..., 12.,  0.,  0.],
        [ 0.,  0., 10., ..., 12.,  1.,  0.]]), (1797, 64))
```

```python
# Create the target vector
y = digits.target
y, y.shape
```

```
(array([0, 1, 2, ..., 8, 9, 8]), (1797,))
```

```python
# Create Pipeline
# Create standardizer
standardizer = StandardScaler()

# Create logistic regression
logit = LogisticRegression()

# Create a pipeline that standardizes, then runs logistic regression
pipeline = make_pipeline(standardizer, logit)
```
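To show what the pipeline does, here is a sketch that runs the same two steps by hand and checks that the predictions agree. The 75/25 split and `max_iter=1000` (to avoid convergence warnings) are assumptions made for this illustration only. Inside cross-validation this ordering matters: the scaler's mean and standard deviation are learned from the training fold only, so no statistics from the test fold leak into training.

```python
# Sketch: make_pipeline(StandardScaler(), LogisticRegression()) vs. the same steps done manually.
# Assumptions: the train/test split and max_iter=1000 are illustrative choices.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Pipeline: fit() fits the scaler on X_train, transforms X_train, then fits the classifier
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

# Manual equivalent of the same steps
scaler = StandardScaler().fit(X_train)                 # statistics from training data only
clf = LogisticRegression(max_iter=1000).fit(scaler.transform(X_train), y_train)
manual_pred = clf.predict(scaler.transform(X_test))    # test data is only transformed

print("identical predictions:", np.array_equal(pipe_pred, manual_pred))
```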

```python
# Create k-Fold cross-validation
kf = KFold(n_splits=10, shuffle=True, random_state=1)
kf
```

```
KFold(n_splits=10, random_state=1, shuffle=True)
```
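A quick way to see what this `KFold` object produces is to iterate over `kf.split(X)` and record the test-fold sizes. The 1,797 digits samples do not divide evenly by 10, so seven folds get 180 samples and three get 179. This is consistent with the last three per-fold accuracies in the `cv_results` output further down being fractions with denominator 179 (for example 173/179 ≈ 0.9665).

```python
# Sketch: inspect the folds produced by KFold(n_splits=10, shuffle=True, random_state=1).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import KFold

X, y = load_digits(return_X_y=True)
kf = KFold(n_splits=10, shuffle=True, random_state=1)

test_sizes = []
seen = np.zeros(len(X), dtype=int)   # counts how often each sample is in a test fold
for train_idx, test_idx in kf.split(X):
    test_sizes.append(len(test_idx))
    seen[test_idx] += 1

print("test fold sizes:", test_sizes)
print("every sample tested exactly once:", bool((seen == 1).all()))
```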

```python
# Conduct k-fold cross-validation
cv_results = cross_val_score(pipeline,              # Pipeline to evaluate
                             X,                     # Feature matrix
                             y,                     # Target vector
                             cv=kf,                 # Cross-validation splitter
                             scoring="accuracy",    # Metric computed on each test fold
                             n_jobs=-1)             # Use all CPU cores
# cross_val_score returns an array with one score per fold
```
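To make explicit what `cross_val_score` does, here is a hand-written equivalent of the loop: for each fold it fits a fresh, unfitted copy of the pipeline (via `sklearn.base.clone`) on the training indices and scores accuracy on the test indices. `max_iter=1000` is an assumption added to avoid convergence warnings, so the scores may differ slightly from the notebook output; the check is only that the manual loop and `cross_val_score` agree with each other.

```python
# Sketch: manual k-fold loop compared with cross_val_score on the same setup.
# Assumption: max_iter=1000 is added for convergence; not part of the original pipeline.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
kf = KFold(n_splits=10, shuffle=True, random_state=1)

manual_scores = []
for train_idx, test_idx in kf.split(X):
    model = clone(pipe)                          # fresh, unfitted copy for every fold
    model.fit(X[train_idx], y[train_idx])        # scaler and classifier see the training fold only
    y_pred = model.predict(X[test_idx])          # predict the held-out fold
    manual_scores.append(accuracy_score(y[test_idx], y_pred))
manual_scores = np.array(manual_scores)

library_scores = cross_val_score(pipe, X, y, cv=kf, scoring="accuracy")
print(manual_scores.round(4))
print("matches cross_val_score:", np.allclose(manual_scores, library_scores))
```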

```python
# Inspect the score of each fold
cv_results, cv_results.shape
```

```
(array([0.97222222, 0.97777778, 0.95555556, 0.95      , 0.95555556,
        0.98333333, 0.97777778, 0.96648045, 0.96089385, 0.94972067]), (10,))
```

```python
# Calculate mean
cv_results.mean()
```

```
0.964931719428926
```
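The mean alone hides how much the score varies from fold to fold. A common way to report a cross-validation result is the mean together with the standard deviation across folds; this sketch computes both from the ten fold scores printed above (copied as literals, so it runs on its own).

```python
# Sketch: summarise the fold scores from cv_results with their spread.
import numpy as np

# Fold accuracies copied from the cv_results output above
scores = np.array([0.97222222, 0.97777778, 0.95555556, 0.95, 0.95555556,
                   0.98333333, 0.97777778, 0.96648045, 0.96089385, 0.94972067])

mean = scores.mean()
std = scores.std(ddof=1)   # sample standard deviation across the folds
print(f"accuracy: {mean:.4f} +/- {std:.4f} (mean +/- std over {len(scores)} folds)")
```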