### sklearn.model_selection.KFold
* _class_ sklearn.model_selection.KFold(_n_splits=5_, _*_, _shuffle=False_, _random_state=None_)[[source]](https://github.com/scikit-learn/scikit-learn/blob/f3f51f9b6/sklearn/model_selection/_split.py#L365)

Parameters:

**n_splits**: int, default=5

Number of folds. Must be at least 2.

Changed in version 0.22: `n_splits` default value changed from 3 to 5.

**shuffle**: bool, default=False

Whether to shuffle the data before splitting into batches. Note that the samples within each split will not be shuffled.

**random_state**: int, RandomState instance or None, default=None

When `shuffle` is True, `random_state` affects the ordering of the indices, which controls the randomness of each fold. Otherwise, this parameter has no effect. Pass an int for reproducible output across multiple function calls. See [Glossary](https://scikit-learn.org/1.1/glossary.html#term-random_state).
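
The effect of `shuffle` and `random_state` can be seen on a small toy array. This is a minimal sketch, not part of the original notebook: the array `X` and the names `plain_folds`, `first`, and `second` are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten toy samples; the values are placeholders, only the row count matters.
X = np.arange(10).reshape(10, 1)

# Without shuffling, each validation fold is a consecutive block of indices.
plain_folds = [valid.tolist() for _, valid in KFold(n_splits=5).split(X)]
print(plain_folds)  # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]

# With shuffle=True and an integer random_state, the assignment of samples
# to folds is randomized but repeats exactly on every call.
shuffled = KFold(n_splits=5, shuffle=True, random_state=0)
first = [valid.tolist() for _, valid in shuffled.split(X)]
second = [valid.tolist() for _, valid in shuffled.split(X)]
print(first == second)  # True
```

Every sample still lands in exactly one validation fold in both cases; shuffling only changes which samples are grouped together.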

In [5]:
from sklearn.datasets import load_iris
iris = load_iris()

In [6]:
iris

{'data': array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2],
        [5.4, 3.9, 1.7, 0.4],
        [4.6, 3.4, 1.4, 0.3],
        [5. , 3.4, 1.5, 0.2],
        [4.4, 2.9, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.1],
        [5.4, 3.7, 1.5, 0.2],
        [4.8, 3.4, 1.6, 0.2],
        [4.8, 3. , 1.4, 0.1],
        [4.3, 3. , 1.1, 0.1],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [5.4, 3.9, 1.3, 0.4],
        [5.1, 3.5, 1.4, 0.3],
        [5.7, 3.8, 1.7, 0.3],
        [5.1, 3.8, 1.5, 0.3],
        [5.4, 3.4, 1.7, 0.2],
        [5.1, 3.7, 1.5, 0.4],
        [4.6, 3.6, 1. , 0.2],
        [5.1, 3.3, 1.7, 0.5],
        [4.8, 3.4, 1.9, 0.2],
        [5. , 3. , 1.6, 0.2],
        [5. , 3.4, 1.6, 0.4],
        [5.2, 3.5, 1.5, 0.2],
        [5.2, 3.4, 1.4, 0.2],
        [4.7, 3.2, 1.6, 0.2],
        [4.8, 3.1, 1.6, 0.2],
        [5.4, 3.4, 1.5, 0.4],
        [5.2, 4.1, 1.5, 0.1],
  

In [7]:
iris.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

### pandas.DataFrame
_class_ pandas.DataFrame(_data=None_, _index=None_, _columns=None_, _dtype=None_, _copy=None_)[[source]](https://github.com/pandas-dev/pandas/blob/v1.5.3/pandas/core/frame.py#L475-L11996)

In [8]:
import pandas as pd

iris_df = pd.DataFrame(iris.data , columns=iris.feature_names )

In [None]:
iris_df

In [9]:
iris_df['label'] = iris.target

In [10]:
iris_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
 4   label              150 non-null    int32  
dtypes: float64(4), int32(1)
memory usage: 5.4 KB
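
As an alternative to building the DataFrame by hand, `load_iris` accepts `as_frame=True`, which fills the `frame` key listed by `iris.keys()` above. The sketch below assumes pandas is installed; note that the label column is then named `target` rather than `label`, and `iris_frame` is a name chosen here for illustration.

```python
from sklearn.datasets import load_iris

# as_frame=True makes the returned Bunch carry a ready-made DataFrame
# with the four feature columns plus a 'target' column.
iris_frame = load_iris(as_frame=True).frame

print(iris_frame.shape)          # (150, 5)
print(list(iris_frame.columns))  # feature names followed by 'target'
```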


In [11]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

In [12]:
dt_clf = DecisionTreeClassifier(random_state=0)
kf = KFold(n_splits=5)

### sklearn.model_selection.train_test_split
sklearn.model_selection.train_test_split(_*arrays_, _test_size=None_, _train_size=None_, _random_state=None_, _shuffle=True_, _stratify=None_)[[source]](https://github.com/scikit-learn/scikit-learn/blob/ff1023fda/sklearn/model_selection/_split.py#L2463)

In [13]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
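
With `test_size=0.2`, the 150 iris samples are divided into 120 training rows and 30 test rows. The sketch below checks those shapes and, as an optional variation that the notebook does not use, passes `stratify` so that each class keeps the same share in both parts. The `_s` suffixed names are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# Same call as in the notebook: 80% for training, 20% held out.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)

# Variation (an assumption, not the notebook's setup): stratify on the
# labels so each of the three classes is equally represented in the test set.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)
print(np.bincount(y_test_s))  # [10 10 10]
```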

In [14]:
y_train[24]

2

In [15]:
kf.split(X_train)

<generator object _BaseKFold.split at 0x0000014A6C711900>
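
`split()` returns a generator, so nothing is computed until it is iterated. A minimal sketch of inspecting its contents follows; it uses a placeholder array with the same shape as `X_train` (120 rows, 4 columns) rather than the real data, and `splits`, `train_idx`, and `valid_idx` are illustrative names.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.zeros((120, 4))  # placeholder with the same shape as X_train
kf = KFold(n_splits=5)

# Materialize the generator: one (train_indices, valid_indices) pair per fold.
splits = list(kf.split(X))
train_idx, valid_idx = splits[0]

print(len(splits), train_idx.shape, valid_idx.shape)  # 5 (96,) (24,)
print(valid_idx[:3], train_idx[:3])                   # [0 1 2] [24 25 26]
```

The yielded values are row positions, not data, which is why they are used to index `X_train` and `y_train` in the cells that follow.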

In [16]:
# Print the training rows selected for each fold
for train_index, valid_index in kf.split(X_train):
    print(X_train[train_index])

[[7.7 2.8 6.7 2. ]
 [5.8 2.7 4.1 1. ]
 [5.2 3.4 1.4 0.2]
 [5.  3.5 1.3 0.3]
 [5.1 3.8 1.9 0.4]
 [5.  2.  3.5 1. ]
 [6.3 2.7 4.9 1.8]
 [4.8 3.4 1.9 0.2]
 [5.  3.  1.6 0.2]
 [5.1 3.3 1.7 0.5]
 [5.6 2.7 4.2 1.3]
 [5.1 3.4 1.5 0.2]
 [5.7 3.  4.2 1.2]
 [7.7 3.8 6.7 2.2]
 [4.6 3.2 1.4 0.2]
 [6.2 2.9 4.3 1.3]
 [5.7 2.5 5.  2. ]
 [5.5 4.2 1.4 0.2]
 [6.  3.  4.8 1.8]
 [5.8 2.7 5.1 1.9]
 [6.  2.2 4.  1. ]
 [5.4 3.  4.5 1.5]
 [6.2 3.4 5.4 2.3]
 [5.5 2.3 4.  1.3]
 [5.4 3.9 1.7 0.4]
 [5.  2.3 3.3 1. ]
 [6.4 2.7 5.3 1.9]
 [5.  3.3 1.4 0.2]
 [5.  3.2 1.2 0.2]
 [5.5 2.4 3.8 1.1]
 [6.7 3.  5.  1.7]
 [4.9 3.1 1.5 0.2]
 [5.8 2.8 5.1 2.4]
 [5.  3.4 1.5 0.2]
 [5.  3.5 1.6 0.6]
 [5.9 3.2 4.8 1.8]
 [5.1 2.5 3.  1.1]
 [6.9 3.2 5.7 2.3]
 [6.  2.7 5.1 1.6]
 [6.1 2.6 5.6 1.4]
 [7.7 3.  6.1 2.3]
 [5.5 2.5 4.  1.3]
 [4.4 2.9 1.4 0.2]
 [4.3 3.  1.1 0.1]
 [6.  2.2 5.  1.5]
 [7.2 3.2 6.  1.8]
 [4.6 3.1 1.5 0.2]
 [5.1 3.5 1.4 0.3]
 [4.4 3.  1.3 0.2]
 [6.3 2.5 4.9 1.5]
 [6.3 3.4 5.6 2.4]
 [4.6 3.4 1.4 0.3]
 [6.8 3.  5.

In [17]:
import numpy as np

cv_accuracy = []
n_iter = 0
# Calling split() on the KFold object yields, for each fold, the row indices
# of the training and validation sets as arrays
for train_index, valid_index in kf.split(X_train):
    # Use the indices returned by kf.split() to extract the training and validation data
    x_train1, x_valid = X_train[train_index], X_train[valid_index]
    y_train1, y_valid = y_train[train_index], y_train[valid_index]
    # Fit and predict
    dt_clf.fit(x_train1, y_train1)
    pred = dt_clf.predict(x_valid)
    n_iter += 1
    # Measure accuracy on each iteration
    # Equivalent to accuracy = np.round(np.mean(pred == y_valid), 4).
    accuracy = np.round(accuracy_score(y_valid, pred), 4)
    print(accuracy)
    cv_accuracy.append(accuracy)
    train_size = x_train1.shape[0]
    valid_size = x_valid.shape[0]
    print(f'{n_iter} cross-validation accuracy: {accuracy}, training set size: {train_size}, validation set size: {valid_size}')
    print(f'{n_iter} validation set indices: {valid_index}')


0.9583
1 cross-validation accuracy: 0.9583, training set size: 96, validation set size: 24
1 validation set indices: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
1.0
2 cross-validation accuracy: 1.0, training set size: 96, validation set size: 24
2 validation set indices: [24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47]
0.8333
3 cross-validation accuracy: 0.8333, training set size: 96, validation set size: 24
3 validation set indices: [48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71]
0.9583
4 cross-validation accuracy: 0.9583, training set size: 96, validation set size: 24
4 validation set indices: [72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95]
0.9167
5 cross-validation accuracy: 0.9167, training set size: 96, validation set size: 24
5 validation set indices: [ 96  97  98  99 100 101 102 103 104 105 106 107 108 109 110 111 112 113
 114 115 116 117 118 119]
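
The comment in the loop notes that `accuracy_score` matches the mean of element-wise comparisons between predictions and labels. A small sketch with made-up labels (`y_true` and `y_pred` are not from the notebook) shows the two computations agreeing:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels: four of the five predictions are correct.
y_true = np.array([0, 1, 2, 2, 1])
y_pred = np.array([0, 1, 1, 2, 1])

acc = accuracy_score(y_true, y_pred)
manual = np.mean(y_pred == y_true)  # fraction of matching positions

print(acc, manual)  # 0.8 0.8
```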


In [18]:
cv_accuracy

[0.9583, 1.0, 0.8333, 0.9583, 0.9167]

In [19]:
import numpy as np
np.mean(cv_accuracy)

0.9333199999999999
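
The manual loop above can likely be condensed with `cross_val_score`, which takes the same estimator and `KFold` object and returns one score per fold. This is a sketch of that alternative, not the notebook's own method. It rebuilds the data and estimator so that it runs on its own, and it reports unrounded scores, so the mean may differ slightly from the average of the rounded values above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

dt_clf = DecisionTreeClassifier(random_state=0)
kf = KFold(n_splits=5)

# cross_val_score clones the estimator, fits it on each training fold,
# and scores it on the matching validation fold.
scores = cross_val_score(dt_clf, X_train, y_train, scoring="accuracy", cv=kf)

print(scores.round(4))  # per-fold accuracy
print(scores.mean())    # mean cross-validation accuracy
```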