## Cross Validation

- **Cross-validation estimates how well a model will perform on data it has not seen. It has the following steps -**<br>
<ul>
    <li>Reserve a sample data set from the full data set</li>
    <li>Train the model using the remaining part of the data set</li>
    <li>Use the reserved sample data set to validate the trained model.</li>
    <li>If the model scores well on the validation data, it is likely to be an effective model on new data.</li>
</ul>
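
The steps above can be sketched with scikit-learn's `train_test_split`. This is a minimal illustration, not the notebook's own method: the 30% validation share, `random_state=0`, stratified splitting, and `max_iter=1000` are assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Step 1: reserve a sample of the data (30% here, an arbitrary choice)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Step 2: train the model on the remaining part of the data
model = LogisticRegression(max_iter=1000)  # max_iter raised to avoid convergence warnings
model.fit(X_train, y_train)

# Step 3: validate on the reserved sample (accuracy on unseen rows)
val_accuracy = model.score(X_val, y_val)
print("Validation accuracy: {:.2f}".format(val_accuracy))
```

A single split like this gives only one estimate, which depends on which rows happened to land in the validation sample.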

- **Common methods for Cross Validation:**<br>
<ul>
    <li> Validation Set Approach -<br> Reserve 50% of the dataset for validation and use the remaining 50% for model training. A disadvantage is that the model is trained on only half of the data set, so it may miss important patterns in the data, i.e. higher bias.</li>
    <li> Leave-one-out cross-validation (LOOCV) -<br> Reserve only one data point of the available data set and use the rest for training, repeating this once for every data point. Bias is low because almost all of the data is used for training. The disadvantages are higher variation in the test results, since each result depends on a single data point, and the cost of fitting the model once per data point.</li>
    <li> k-fold cross-validation -<br> Randomly split the entire dataset into k folds. For each fold in turn, train the model on the other k – 1 folds and test it on the held-out fold. The average of the k recorded errors is the cross-validation error.</li>
</ul>
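
To make the k-fold procedure concrete, here is a hedged sketch that writes the loop out by hand with scikit-learn's `KFold`. The choices `n_splits=5`, `shuffle=True`, `random_state=0`, and `max_iter=1000` are assumptions for illustration. The error is taken as 1 minus accuracy.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)

# Shuffle before splitting because the iris rows are ordered by class
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

fold_errors = []
for train_idx, test_idx in kfold.split(X):
    # Train on the other k - 1 folds
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    # Test on the held-out fold; error rate = 1 - accuracy
    fold_errors.append(1.0 - model.score(X[test_idx], y[test_idx]))

# The cross-validation error is the average of the k recorded errors
cv_error = np.mean(fold_errors)
print("Per-fold error: {}".format(np.round(fold_errors, 3)))
print("Cross-validation error: {:.3f}".format(cv_error))
```

In practice `cross_val_score` performs this loop internally. Writing it out mainly helps to show where each fold's score comes from.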

<img src='./CrossValidation1.png' width = 400  height = 400 >

- The data is split into training data and test data. The training set contains known outputs, and the model learns from this data so that it can generalize to other data later on. The test set is used to check the model's predictions on data it did not see during training.

<img src='./CrossValidation2.png' width = 400  height = 400 >

In [1]:
# importing libraries

from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

In [6]:
# loading dataset

iris = load_iris()

logreg = LogisticRegression()

# cv=3 matches the output below; the default was 3 before scikit-learn 0.22 and is 5 since
scores = cross_val_score(logreg, iris.data, iris.target, cv=3)
print("Cross-validation scores: {}".format(scores))

Cross-validation scores: [0.96078431 0.92156863 0.95833333]


In [3]:
scores = cross_val_score(logreg, iris.data, iris.target, cv=5)
print("Cross-validation scores: {}".format(scores))

Cross-validation scores: [1.         0.96666667 0.93333333 0.9        1.        ]


In [4]:
# calculating score mean
print("Average cross-validation score: {:.2f}".format(scores.mean()))

Average cross-validation score: 0.96
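
The `cv` argument also accepts a splitter object, so the LOOCV method described earlier can be run through the same `cross_val_score` call. The sketch below assumes `LeaveOneOut` from `sklearn.model_selection` and sets `max_iter=1000`, which is not in the original cells. It fits the model once per data point (150 fits for iris).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

iris = load_iris()
logreg = LogisticRegression(max_iter=1000)

# Each split holds out exactly one sample, so every score is 0.0 or 1.0
loo_scores = cross_val_score(logreg, iris.data, iris.target, cv=LeaveOneOut())

print("Number of LOOCV iterations: {}".format(len(loo_scores)))
print("Average LOOCV score: {:.2f}".format(loo_scores.mean()))
```

Individual LOOCV scores carry little information on their own. Only their mean is a useful estimate.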
