# Exercise 09

Run the individual classifiers from the previous exercise to make predictions on the validation set,
and create a new training set with the resulting predictions: each training instance is a vector containing
the set of predictions from all your classifiers for an image, and the target is the image's class.
Train a classifier on this new training set. Congratulations - you have just trained a blender, and
together with the classifiers it forms a stacking ensemble! Now evaluate the ensemble on the test set.
For each image in the test set, make predictions with all your classifiers, then feed the predictions
to the blender to get the ensemble's predictions. How does it compare to the voting classifier you trained
earlier? Now try again using a `StackingClassifier` instead. Do you get better performance? If so, why?

## Imports

In [27]:
from sklearn.svm import SVC
from sklearn.ensemble import (
    RandomForestClassifier,
    ExtraTreesClassifier,
    StackingClassifier,
)
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import numpy as np

## Common Functions

In [12]:
def train_and_evaluate_classifiers(
    classifiers, X_train, y_train, X_validation, y_validation
):
    """
    Train each classifier on the training set and print its accuracy on the validation set.

    Returns:
        trained_models: List of tuples (name, trained_classifier)
    """
    trained_models = []

    for name, clf in classifiers:
        fitted_clf, accuracy = evaluate_single_classifier(
            X_train, y_train, X_validation, y_validation, clf
        )
        print(f"{name}'s accuracy on the validation set is {accuracy:.4f}")

        trained_models.append((name, fitted_clf))

    return trained_models

def evaluate_single_classifier(X_train, y_train, X_validation, y_validation, clf):
    """Fit clf on the training data and return it with its accuracy on the held-out data."""
    clf.fit(X_train, y_train)
    accuracy = clf.score(X_validation, y_validation)
    return clf, accuracy

In [6]:
def load_mnist_dataset():
    """Load the MNIST dataset"""
    mnist = fetch_openml("mnist_784", version=1, as_frame=False, parser="auto")
    X = mnist.data
    y = mnist.target
    return X, y


def split_mnist_dataset(
    X, y, validation_size=10_000, test_size=10_000, random_state=42
):
    """
    Split MNIST dataset into training, validation, and test sets.

    Args:
        X: Features (MNIST images)
        y: Labels (digits 0-9)
        validation_size: Number of samples for validation set (default: 10_000)
        test_size: Number of samples for test set (default: 10_000)
        random_state: Random seed for reproducibility

    Returns:
        X_train, X_validation, X_test, y_train, y_validation, y_test: Split datasets

    Note:
        The training set will contain all remaining samples not in the validation or test sets.
    """
    # First split to get the test set
    X_temp, X_test, y_temp, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    # Split the remaining data to get training and validation sets
    X_train, X_validation, y_train, y_validation = train_test_split(
        X_temp, y_temp, test_size=validation_size, random_state=random_state
    )
    print(f"Training set size: {X_train.shape[0]}")
    print(f"Validation set size: {X_validation.shape[0]}")
    print(f"Test set size: {X_test.shape[0]}")

    return X_train, X_validation, X_test, y_train, y_validation, y_test

## Step 1. Split the data

In [8]:
X, y = load_mnist_dataset()

X_train, X_validation, X_test, y_train, y_validation, y_test = split_mnist_dataset(
    X, y
)


Training set size: 50000
Validation set size: 10000
Test set size: 10000
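
The sizes above follow from passing integer counts (rather than fractions) to `train_test_split` twice: whatever is left after the test and validation sets are carved off becomes the training set. A minimal sketch on a hypothetical 70-row toy array (a stand-in for the 70,000 MNIST images, with all `*_toy` names invented for illustration) shows the same two-stage split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 70 samples, 3 features
X_toy = np.arange(70 * 3).reshape(70, 3)
y_toy = np.arange(70) % 10

# Stage 1: hold out 10 samples for the test set
X_temp, X_test_toy, y_temp, y_test_toy = train_test_split(
    X_toy, y_toy, test_size=10, random_state=42
)
# Stage 2: hold out 10 of the remaining 60 samples for validation
X_train_toy, X_val_toy, y_train_toy, y_val_toy = train_test_split(
    X_temp, y_temp, test_size=10, random_state=42
)

split_sizes = (X_train_toy.shape[0], X_val_toy.shape[0], X_test_toy.shape[0])
print(split_sizes)  # (50, 10, 10)
```

Because the second split is applied only to what remains after the first, the three sets cannot share any sample.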


## Step 2. Evaluate individual classifiers

In [11]:
classifiers = [
    ("random_forest", RandomForestClassifier(random_state=42)),
    ("extra_trees", ExtraTreesClassifier(random_state=42)),
    ("svc", SVC(probability=True, random_state=42)),
]
trained_models = train_and_evaluate_classifiers(
    classifiers, X_train, y_train, X_validation, y_validation
)

random_forest's accuracy on the validation set is 0.97
extra_trees's accuracy on the validation set is 0.97
svc's accuracy on the validation set is 0.98


## Step 3. Create the blender's training set from the individual classifiers' out-of-sample predictions

In [23]:
# The blender needs a feature matrix in which each row is a validation instance.
# Each classifier contributes one block of columns: its predicted probability
# for every class, so the matrix has n_classifiers * n_classes columns.
validation_meta_features = np.column_stack(
    [clf.predict_proba(X_validation) for _, clf in trained_models]
)
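
Because `predict_proba` returns one column per class, stacking the outputs side by side gives the blender `n_classifiers * n_classes` features per image (3 × 10 = 30 here), rather than one predicted label per classifier as the exercise text describes. The sketch below contrasts the two layouts. It assumes scikit-learn's small `load_digits` dataset and two lightweight classifiers purely to keep it fast; neither is the MNIST setup used in this notebook.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X_small, y_small = load_digits(return_X_y=True)
X_fit, X_held_out, y_fit, y_held_out = train_test_split(
    X_small, y_small, test_size=300, random_state=42
)

toy_models = [
    RandomForestClassifier(n_estimators=20, random_state=42).fit(X_fit, y_fit),
    ExtraTreesClassifier(n_estimators=20, random_state=42).fit(X_fit, y_fit),
]

# Probability layout: 10 class probabilities per classifier, side by side
proba_features = np.column_stack([m.predict_proba(X_held_out) for m in toy_models])
# Label layout: a single predicted digit per classifier
label_features = np.column_stack([m.predict(X_held_out) for m in toy_models])

print(proba_features.shape)  # (300, 20)
print(label_features.shape)  # (300, 2)
```

The probability layout keeps each classifier's confidence. With the label layout, a linear blender would treat the digits as ordered numbers, so the labels would usually be one-hot encoded before training.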

## Steps 4 and 5. Train a classifier on the new training set and evaluate it on the test set

In [34]:
# Use LogisticRegression as the blender (the model trained on the classifiers' outputs)
blender = LogisticRegression(random_state=42)

# The test set must go through the same transformation as the validation set,
# so build its meta-features from the same trained classifiers
test_meta_features = np.column_stack(
    [clf.predict_proba(X_test) for _, clf in trained_models]
)
# Fit the blender on the validation meta-features and score it on the test meta-features
trained_blender, blender_acc = evaluate_single_classifier(
    validation_meta_features, y_validation, test_meta_features, y_test, blender
)

In [40]:
print(
    f"Blender's (LogisticRegression model) accuracy on the test set is {blender_acc * 100:.4f}%"
)

Blender's (LogisticRegression model) accuracy on the test set is 97.7600%
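
The two-stage prediction used above (base classifiers first, then the blender) can be wrapped in a helper so that new images always follow the same path. The sketch below is illustrative only: `build_meta_features` and `ensemble_predict` are hypothetical names, and it assumes the small `load_digits` dataset with lightweight models instead of the MNIST classifiers trained in this notebook.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def build_meta_features(models, X):
    """Stack every model's class probabilities side by side."""
    return np.column_stack([model.predict_proba(X) for model in models])


def ensemble_predict(models, blender, X):
    """Hypothetical helper: base-model probabilities in, blender labels out."""
    return blender.predict(build_meta_features(models, X))


X_all, y_all = load_digits(return_X_y=True)
# Three disjoint parts: base-model training, blender training, final test
X_rest, X_te, y_rest, y_te = train_test_split(
    X_all, y_all, test_size=300, random_state=42
)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_rest, y_rest, test_size=300, random_state=42
)

base_models = [
    RandomForestClassifier(n_estimators=20, random_state=42).fit(X_tr, y_tr),
    ExtraTreesClassifier(n_estimators=20, random_state=42).fit(X_tr, y_tr),
]
# The blender is trained only on predictions for data the base models never saw
toy_blender = LogisticRegression(max_iter=1000).fit(
    build_meta_features(base_models, X_val), y_val
)

test_predictions = ensemble_predict(base_models, toy_blender, X_te)
toy_accuracy = float(np.mean(test_predictions == y_te))
print(f"Toy stacking accuracy: {toy_accuracy:.4f}")
```

Keeping the blender's training data separate from the base models' training data matters: predictions on already-seen images would look overconfident and mislead the blender.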


## Step 6. Evaluate stacking with Scikit-Learn's StackingClassifier

In [38]:
stacking_clf = StackingClassifier(
    estimators=[
        ("random_forest", RandomForestClassifier(random_state=42)),
        ("extra_trees", ExtraTreesClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),
    ],
    cv=5,
)

In [39]:
stacking_clf.fit(X_train, y_train)
stacking_clf_acc = stacking_clf.score(X_test, y_test)
print(f"StackingClassifier's accuracy on the test set is {stacking_clf_acc * 100:.4f}%")

StackingClassifier's accuracy on the test set is 97.7500%
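
Two details help when comparing this result with the manual blender. When `final_estimator` is not given, `StackingClassifier` uses a `LogisticRegression` as the blender. It also builds the blender's training set from cross-validated (out-of-fold) predictions on `X_train` and then refits the base classifiers on all of `X_train`, whereas the manual blender above was trained on predictions for the separate validation set. The sketch below inspects these fitted attributes; it assumes the small `load_digits` dataset and lightweight estimators for speed, not the MNIST configuration used in this notebook.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression

X_digits, y_digits = load_digits(return_X_y=True)

toy_stack = StackingClassifier(
    estimators=[
        ("random_forest", RandomForestClassifier(n_estimators=20, random_state=42)),
        ("extra_trees", ExtraTreesClassifier(n_estimators=20, random_state=42)),
    ],
    cv=3,
)
toy_stack.fit(X_digits, y_digits)

# Blender that was fitted because final_estimator was left unset
print(type(toy_stack.final_estimator_).__name__)
# Method chosen for each base estimator to produce the blender's inputs
print(toy_stack.stack_method_)
# Width of the blender's input: n_estimators * n_classes
meta_shape = toy_stack.transform(X_digits).shape
print(meta_shape)
```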
