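Monte Carlo Cross-Validation (MCCV), also known as repeated random subsampling, randomly splits the dataset into training and test sets multiple times. In each iteration the model is trained on the training portion and evaluated on the held-out test portion, and the performance metrics are then averaged across iterations. In k-fold cross-validation the number of folds also fixes the test fraction (1/k). MCCV instead lets you choose the number of iterations and the train-test proportion independently. The price of this flexibility is that test sets from different iterations may overlap, and some samples may never be used for testing at all.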
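To make the procedure concrete, here is a minimal hand-written sketch of the loop, assuming NumPy and scikit-learn are installed. The helper `monte_carlo_cv` is hypothetical (it is not a scikit-learn function). Each iteration draws a fresh random permutation, holds out the first 20% of indices for testing (rounding up), fits a fresh clone of the model on the rest, and records the accuracy. It also counts shared and never-tested samples to illustrate the overlap described above.

```python
import math

import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def monte_carlo_cv(model, X, y, n_splits=10, test_size=0.2, seed=0):
    """Hypothetical helper: repeated random subsampling written out by hand."""
    rng = np.random.default_rng(seed)
    n_samples = len(X)
    n_test = math.ceil(test_size * n_samples)  # round the test size up
    scores, test_sets = [], []
    for _ in range(n_splits):
        perm = rng.permutation(n_samples)  # independent reshuffle every iteration
        test_idx, train_idx = perm[:n_test], perm[n_test:]
        fitted = clone(model).fit(X[train_idx], y[train_idx])  # fresh, unfitted copy each time
        scores.append(accuracy_score(y[test_idx], fitted.predict(X[test_idx])))
        test_sets.append(set(test_idx.tolist()))
    return np.array(scores), test_sets


X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)
scores, test_sets = monte_carlo_cv(model, X, y)

# Splits are drawn independently, so two test sets can share samples...
shared = len(test_sets[0] & test_sets[1])
# ...and some samples may never appear in any test set.
never_tested = len(X) - len(set().union(*test_sets))

print(f"Mean accuracy: {scores.mean():.4f}")
print(f"Samples shared by the first two test sets: {shared}")
print(f"Samples never used for testing: {never_tested}")
```

The exact numbers depend on the seed. With 10 independent 20% draws, each sample has roughly a 0.8^10 (about 11%) chance of never being tested. That is the practical cost of decoupling the iteration count from the split proportion.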

In [1]:
from sklearn.datasets import load_iris
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Define the Monte Carlo cross-validator (ShuffleSplit): 10 random splits, each holding out 20% of the samples for testing
mc_cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=42)

# Initialize a random forest classifier with 100 trees (fixed seed for reproducibility)
model = RandomForestClassifier(n_estimators=100, random_state=42)

# Perform Monte Carlo cross-validation and compute accuracy for each iteration
scores = cross_val_score(model, X, y, cv=mc_cv, scoring='accuracy')

# Print accuracy scores for each iteration and mean accuracy
print(f"Accuracy for each iteration: {scores}")
print(f"Mean accuracy: {np.mean(scores):.4f}")


Accuracy for each iteration: [1.         0.96666667 0.96666667 0.93333333 0.93333333 1.
 0.9        0.96666667 1.         0.93333333]
Mean accuracy: 0.9600
