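K-fold cross-validation evaluates a machine learning model by splitting the dataset into *k* subsets (folds). The model is trained on *k-1* folds and tested on the remaining fold, and this is repeated *k* times so that each fold serves as the test set exactly once. The final score is the average across all *k* iterations, which gives a more reliable estimate of generalization performance than a single train/test split. Cross-validation does not reduce overfitting by itself; it makes overfitting easier to detect, because every score is computed on data the model did not see during training.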
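To make the splitting step concrete, here is a minimal pure-Python sketch of how the indices could be partitioned. The helper `k_fold_indices` is a hypothetical name introduced for illustration and is not part of scikit-learn; it assumes the first `n_samples % k` folds receive one extra sample when the data does not divide evenly. The sample count of 178 is chosen to match scikit-learn's Wine dataset.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) index lists for each of the k folds.

    Hypothetical helper for illustration only, not scikit-learn's API.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly

    # Spread the remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


splits = list(k_fold_indices(n_samples=178, k=5))
for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"Fold {fold}: train={len(train_idx)}, test={len(test_idx)}")
# Fold 1: train=142, test=36
# Fold 2: train=142, test=36
# Fold 3: train=142, test=36
# Fold 4: train=143, test=35
# Fold 5: train=143, test=35
```

Every sample lands in exactly one test fold, so each observation is used for testing once and for training *k-1* times.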

In [1]:
from sklearn.datasets import load_wine
from sklearn.model_selection import KFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Load the Wine dataset (178 samples, 13 features, 3 classes)
wine = load_wine()

X, y = wine.data, wine.target

# Define the K-Fold cross-validator with 5 folds
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Initialize a RandomForest classifier
model = RandomForestClassifier(n_estimators=100, random_state=42)

# Perform cross-validation and compute accuracy for each fold
scores = cross_val_score(model, X, y, cv=kf, scoring='accuracy')

# Print accuracy scores for each fold and mean accuracy
print(f"Accuracy for each fold: {scores}")
print(f"Mean accuracy: {np.mean(scores):.4f}")


Accuracy for each fold: [1.         1.         0.94444444 0.97142857 1.        ]
Mean accuracy: 0.9832
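
The mean alone hides how much the score varies between folds. As a small sketch, the per-fold accuracies printed above can be summarized with both their mean and their spread. The values are copied from the output as printed, and the choice of the population standard deviation (`statistics.pstdev`) is an assumption made here, not something the original run reports.

```python
import statistics

# Per-fold accuracies copied from the printed output above.
fold_scores = [1.0, 1.0, 0.94444444, 0.97142857, 1.0]

mean_acc = statistics.mean(fold_scores)
# Population standard deviation across the 5 folds (an assumed choice;
# statistics.stdev would give the sample standard deviation instead).
std_acc = statistics.pstdev(fold_scores)

print(f"Mean accuracy: {mean_acc:.4f}")
print(f"Std across folds: {std_acc:.4f}")
```

A small standard deviation relative to the mean suggests the estimate is stable across different train/test partitions.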
