## Cross-validation: evaluating estimator performance

#### In scikit-learn a random split into training and test sets can be quickly computed with the train_test_split helper function. Let’s load the iris data set to fit a linear support vector machine on it:

In [2]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn import svm

In [6]:
iris = datasets.load_iris()
iris.data.shape, iris.target.shape

((150, 4), (150,))

In [8]:
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=1)

In [11]:
X_train.shape, y_train.shape

((90, 4), (90,))

In [12]:
X_test.shape, y_test.shape

((60, 4), (60,))

In [13]:
clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)

In [14]:
clf.score(X_test, y_test)    

0.9833333333333333

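#### Evaluating on a single held-out test set has a weakness: when hyperparameters such as C are tuned against that test set, knowledge of it leaks into the model, and the reported score no longer reflects generalization. Holding out a separate validation set avoids this, but reduces the number of samples available for training. A solution to this problem is a procedure called cross-validation (CV for short). A test set should still be held out for final evaluation, but the validation set is no longer needed when doing CV. In the basic approach, called k-fold CV, the training set is split into k smaller sets (other approaches are described below, but generally follow the same principles). The following procedure is followed for each of the k “folds”: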

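    - A model is trained using k-1 of the folds as training data (together, the k folds make up the whole data set D);
    - the resulting model is validated on the remaining part of the data (i.e., it is used as a test set to compute a performance measure such as accuracy).
    - The k scores are then averaged. In the run below the mean is 0.98 and two standard deviations come to 0.03, so the small spread across folds suggests the accuracy estimate is reasonably stable, though with only five folds of 30 samples each it is a rough guide.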
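
The k-fold procedure above can be sketched by hand. This is a minimal illustration, assuming stratified folds without shuffling (which is what cross_val_score uses for a classifier when cv is an integer). The names kfold, fold_scores and reference are introduced here for illustration only.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import StratifiedKFold, cross_val_score

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Split the data into k = 5 folds, keeping class proportions in each fold
kfold = StratifiedKFold(n_splits=5)

fold_scores = []
for train_idx, val_idx in kfold.split(X, y):
    # Train a fresh model on k-1 folds ...
    model = svm.SVC(kernel='linear', C=1)
    model.fit(X[train_idx], y[train_idx])
    # ... and validate it on the remaining fold (accuracy)
    fold_scores.append(model.score(X[val_idx], y[val_idx]))

fold_scores = np.array(fold_scores)

# For comparison: the helper function should produce the same per-fold scores
reference = cross_val_score(svm.SVC(kernel='linear', C=1), X, y, cv=5)

print(fold_scores, fold_scores.mean())
```

Writing the loop out makes it explicit that every sample is used for validation exactly once and for training k-1 times.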
    

In [18]:
from sklearn.model_selection import cross_val_score
clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, iris.data, iris.target, cv=5)
scores                                    

array([0.96666667, 1.        , 0.96666667, 0.96666667, 1.        ])

#### The mean score and an approximate 95% interval of the score estimate (the mean plus or minus two standard deviations of the fold scores) are hence given by:

In [19]:
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

Accuracy: 0.98 (+/- 0.03)
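
To make the arithmetic behind the printed line explicit, here is a small sketch that recomputes it from the five fold scores shown above. The variable names cv_scores and half_width are illustrative, and the two-standard-deviation band is a rough heuristic with so few folds, not a formal confidence interval.

```python
import numpy as np

# Fold scores copied from the cross_val_score output above
cv_scores = np.array([0.96666667, 1.0, 0.96666667, 0.96666667, 1.0])

mean = cv_scores.mean()
std = cv_scores.std()      # population standard deviation (ddof=0), NumPy's default
half_width = 2 * std       # the "+/-" value in the printed line

print("mean=%.2f  std=%.3f  2*std=%.2f" % (mean, std, half_width))
```

The reported 0.03 is therefore twice the standard deviation of the fold scores, not the standard deviation itself.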


#### Just as it is important to test a predictor on data held out from training, preprocessing steps (such as standardization or feature selection) and similar data transformations should be learnt from the training set and then applied to held-out data for prediction:

In [20]:
from sklearn import preprocessing
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)
# Learn the scaling statistics from the training split only
scaler = preprocessing.StandardScaler().fit(X_train)
X_train_transformed = scaler.transform(X_train)
clf = svm.SVC(C=1).fit(X_train_transformed, y_train)
# Reuse the training statistics to transform the held-out data
X_test_transformed = scaler.transform(X_test)
clf.score(X_test_transformed, y_test)

0.9333333333333333
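
As a quick check of what "learnt from a training set" means here, the following sketch confirms that the fitted scaler stores the mean and standard deviation of the training split alone. The names stats_match, train_means and test_means are introduced for illustration.

```python
import numpy as np
from sklearn import datasets, preprocessing
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)

scaler = preprocessing.StandardScaler().fit(X_train)

# The learned statistics are those of the training split alone
stats_match = (np.allclose(scaler.mean_, X_train.mean(axis=0))
               and np.allclose(scaler.scale_, X_train.std(axis=0)))

# Training columns are centred on zero; test columns are shifted and
# scaled with the same training statistics, so their means are only near zero
train_means = scaler.transform(X_train).mean(axis=0)
test_means = scaler.transform(X_test).mean(axis=0)

print(stats_match)
print(train_means.round(3), test_means.round(3))
```

Because the test rows never influence scaler.mean_ or scaler.scale_, no information about the held-out data reaches the model before scoring.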

In [39]:
X_train_transformed[0:5]

array([[ 0.18206758,  0.71103882,  0.45664061,  0.5584109 ],
       [-1.17402201,  0.00522823, -1.10334891, -1.19183221],
       [-0.04394735, -0.93585257,  0.77939706,  0.93346299],
       [-0.26996228, -0.93585257,  0.29526238,  0.1833588 ],
       [-0.26996228, -0.46531217, -0.02749407,  0.1833588 ]])

In [40]:
X_train[0:5]

array([[6. , 3.4, 4.5, 1.6],
       [4.8, 3.1, 1.6, 0.2],
       [5.8, 2.7, 5.1, 1.9],
       [5.6, 2.7, 4.2, 1.3],
       [5.6, 2.9, 3.6, 1.3]])

In [47]:
from sklearn.pipeline import make_pipeline
# The pipeline re-fits the scaler on the training folds of every split
clf = make_pipeline(preprocessing.StandardScaler(), svm.SVC(C=1))
cv = 5
cross_val_score(clf, iris.data, iris.target, cv=cv)

array([0.96666667, 0.96666667, 0.96666667, 0.93333333, 1.        ])

#### The cross_validate function and multiple metric evaluation
- The cross_validate function differs from cross_val_score in two ways:

-    It allows specifying multiple metrics for evaluation.
-    It returns a dict containing fit times and score times (and, optionally, training scores) in addition to the test score.

- For single metric evaluation, where the scoring parameter is a string, callable or None, the keys will be ['test_score', 'fit_time', 'score_time'].

- For multiple metric evaluation, the return value is a dict with the following keys: ['test_<scorer1_name>', 'test_<scorer2_name>', 'test_<scorer...>', 'fit_time', 'score_time'].

- return_train_score controls whether train score keys are added for all the scorers. It defaulted to True in older scikit-learn releases and defaults to False since version 0.21, so set it explicitly when train scores are (or are not) needed.

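-    precision: of all samples predicted positive, the fraction that are truly positive, Precision = TP/(TP+FP).
-    recall: of all samples that are truly positive, the fraction that were predicted positive, Recall = TP/(TP+FN). FN (false negative) counts samples predicted negative whose true label is positive, i.e. the prediction is wrong.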
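
The two definitions, and the macro averaging used by the 'precision_macro' and 'recall_macro' scorers, can be worked through on a tiny hand-made example. The labels y_true and y_pred below are invented for illustration and are not taken from the iris run.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Toy three-class labels (made up for this illustration)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
tp = np.diag(cm)

precision_per_class = tp / cm.sum(axis=0)   # TP / (TP + FP): column sums
recall_per_class = tp / cm.sum(axis=1)      # TP / (TP + FN): row sums

# Macro averaging: unweighted mean over the classes
macro_precision = precision_per_class.mean()
macro_recall = recall_per_class.mean()

print(macro_precision, precision_score(y_true, y_pred, average='macro'))
print(macro_recall, recall_score(y_true, y_pred, average='macro'))
```

Each printed pair should agree, since macro averaging treats every class equally regardless of how many samples it has.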

In [50]:
from sklearn.model_selection import cross_validate
from sklearn.metrics import recall_score
scoring = ['precision_macro', 'recall_macro']
clf = svm.SVC(kernel='linear', C=1, random_state=0)
scores = cross_validate(clf, iris.data, iris.target, scoring=scoring,
                        cv=5, return_train_score=False)
sorted(scores)

['fit_time', 'score_time', 'test_precision_macro', 'test_recall_macro']

In [54]:
scores['fit_time']

array([0.00095034, 0.00079513, 0.0005827 , 0.00054693, 0.00042701])
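
For contrast with the call above, this sketch requests train scores as well and summarises each entry of the returned dict. The names results and keys are illustrative; the timing values will differ from machine to machine.

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_validate

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1, random_state=0)

# Ask explicitly for train scores in addition to test scores
results = cross_validate(clf, iris.data, iris.target,
                         scoring=['precision_macro', 'recall_macro'],
                         cv=5, return_train_score=True)

keys = sorted(results)
for name in keys:
    # Every value is an array with one entry per fold
    print(name, results[name].mean())
```

Comparing the train_ and test_ entries for the same metric is a simple way to spot overfitting: a large gap means the model does much better on data it has seen.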

#### Obtaining predictions by cross-validation
The function cross_val_predict has a similar interface to cross_val_score, but returns, for each element in the input, the prediction that was obtained for that element when it was in the test set. Only cross-validation strategies that assign all elements to a test set exactly once can be used (otherwise, an exception is raised).

In [56]:
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score
predicted = cross_val_predict(clf, iris.data, iris.target, cv=10)
accuracy_score(iris.target, predicted)

0.9733333333333334
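
Since cross_val_predict returns one out-of-fold prediction per sample, the result can also feed other diagnostics. The sketch below builds a confusion matrix from those predictions; cm and pooled_accuracy are names chosen for this example.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_predict

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1, random_state=0)

# One prediction per sample, each made while that sample was in a test fold
predicted = cross_val_predict(clf, iris.data, iris.target, cv=10)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(iris.target, predicted)

# Accuracy pooled over all samples: correct predictions on the diagonal
pooled_accuracy = np.trace(cm) / cm.sum()

print(cm)
print(pooled_accuracy, accuracy_score(iris.target, predicted))
```

Note that this pooled accuracy is computed over all samples at once, so it can differ slightly from the mean of per-fold scores returned by cross_val_score when folds have unequal sizes.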