# Cross-Validation on W3Schools

Perform cross-validation on the Iris dataset with a decision tree classifier, comparing several splitting strategies.

## Prepare shared components

### Import dataset

In [1]:
from sklearn import datasets

X, y = datasets.load_iris(return_X_y=True)

### Build model

In [2]:
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(random_state=42)

### Display results function

In [3]:
def display_results(scores):
    print("Cross Validation scores:", scores)
    print("Average CV score:", scores.mean())
    print("Number of CV scores used in average:", len(scores))

## K-Fold method

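Split the data into k smaller sets (folds). The model is trained on k - 1 folds and validated on the remaining fold, rotating until every fold has served once as the validation set.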
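
To make the splitting concrete, here is a minimal sketch that applies `KFold` to a toy array of ten samples and prints the index sets for each fold. The toy array and the names `X_toy`, `kf`, and `folds` are illustrative assumptions and do not appear in the tutorial.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy data: ten samples with one feature each.
X_toy = np.arange(10).reshape(-1, 1)

kf = KFold(n_splits=5)  # shuffle defaults to False, so folds are contiguous

# Collect the (train, test) index pairs produced by the splitter.
folds = [(train_idx, test_idx) for train_idx, test_idx in kf.split(X_toy)]

for i, (train_idx, test_idx) in enumerate(folds):
    print(f"Fold {i}: train={train_idx}, test={test_idx}")
```

Because `shuffle` defaults to `False`, each test fold is a contiguous block of rows. This matters for datasets such as Iris, whose rows are ordered by class.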

### Import cross-validation method

In [4]:
from sklearn.model_selection import KFold, cross_val_score

### Evaluate model

In [5]:
k_folds = KFold(n_splits=5)

scores = cross_val_score(clf, X, y, cv=k_folds)

### Display results

In [6]:
display_results(scores)

Cross Validation scores: [1.         1.         0.83333333 0.93333333 0.8       ]
Average CV score: 0.9133333333333333
Number of CV scores used in average: 5


## Stratified K-Fold method

Preserve the class proportions of the full dataset in every fold, so that the train and test sets are not imbalanced across classes.
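
The effect of stratification can be checked by counting how many samples of each class land in every test fold. The sketch below compares an unshuffled `KFold` with `StratifiedKFold` on Iris; the helper `class_counts_per_test_fold` is a hypothetical name introduced only for this illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import KFold, StratifiedKFold

X, y = datasets.load_iris(return_X_y=True)


def class_counts_per_test_fold(splitter):
    """Return the per-class sample counts of each test fold (illustrative helper)."""
    return [
        np.bincount(y[test_idx], minlength=3).tolist()
        for _, test_idx in splitter.split(X, y)
    ]


kfold_counts = class_counts_per_test_fold(KFold(n_splits=5))
stratified_counts = class_counts_per_test_fold(StratifiedKFold(n_splits=5))

print("KFold:          ", kfold_counts)
print("StratifiedKFold:", stratified_counts)
```

Since the Iris rows are ordered by class, the unshuffled K-Fold test folds are dominated by one or two classes, while every stratified test fold contains ten samples of each class.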

### Import cross-validation method

In [7]:
from sklearn.model_selection import StratifiedKFold

### Evaluate model

In [8]:
sk_folds = StratifiedKFold(n_splits=5)

scores = cross_val_score(clf, X, y, cv=sk_folds)

### Display results

In [9]:
display_results(scores)

Cross Validation scores: [0.96666667 0.96666667 0.9        0.93333333 1.        ]
Average CV score: 0.9533333333333334
Number of CV scores used in average: 5


## Leave-One-Out

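Hold out a single observation for validation and train the model on the remaining n - 1 observations, repeating this once for each of the n observations.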
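
As a quick check of this description, the following sketch inspects the splitter itself rather than the model: it counts the splits and looks at the sizes of the first train/test pair. The variable names are assumptions chosen for the example.

```python
from sklearn import datasets
from sklearn.model_selection import LeaveOneOut

X, y = datasets.load_iris(return_X_y=True)

loo = LeaveOneOut()

# One split per observation in the dataset.
n_splits = loo.get_n_splits(X)

# Inspect the first split: a single test index, everything else for training.
train_idx, test_idx = next(loo.split(X))

print("Number of splits:", n_splits)
print("Train size:", len(train_idx), "Test size:", len(test_idx))
```

Each test set holds exactly one sample, which is why every individual Leave-One-Out score is either 0 or 1.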

### Import cross-validation method

In [10]:
from sklearn.model_selection import LeaveOneOut

### Evaluate model

In [12]:
loo = LeaveOneOut()

scores = cross_val_score(clf, X, y, cv=loo)

### Display results

In [13]:
display_results(scores)

Cross Validation scores: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1.
 1. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 1. 1. 0. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1.]
Average CV score: 0.94
Number of CV scores used in average: 150


## Leave-P-Out

Similar to Leave-One-Out, but holds out p observations at a time and evaluates every possible combination of p observations.
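
Because every combination of p observations is used as a test set, the number of splits is the binomial coefficient "n choose p". A minimal sketch, assuming p = 2 on the 150-sample Iris dataset, compares the splitter's own count with `math.comb`; the variable names are illustrative.

```python
from math import comb

from sklearn import datasets
from sklearn.model_selection import LeavePOut

X, y = datasets.load_iris(return_X_y=True)

lpo = LeavePOut(p=2)

# Number of splits reported by the splitter for this dataset.
n_splits = lpo.get_n_splits(X)

# Number of ways to choose 2 test samples out of len(X).
expected = comb(len(X), 2)

print("Splits from LeavePOut:", n_splits)
print("n choose p:", expected)
```

The count grows quickly with n and p, so Leave-P-Out is usually practical only for small datasets and small values of p.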

### Import cross-validation method

In [14]:
from sklearn.model_selection import LeavePOut

### Evaluate model

In [16]:
lpo = LeavePOut(p=2)

scores = cross_val_score(clf, X, y, cv=lpo)

### Display results

In [17]:
display_results(scores)

Cross Validation scores: [1. 1. 1. ... 1. 1. 1.]
Average CV score: 0.9382997762863534
Number of CV scores used in average: 11175
