# Ensemble Learning Techniques on MNIST

In this kernel we introduce different ensemble techniques and show how powerful ensembling is as a machine learning tool.
As a first step, we load the MNIST dataset with scikit-learn's fetch_openml function and split it into training, validation, and test sets.

## Loading the data

In [1]:
from sklearn.datasets import fetch_openml
import numpy as np

In [2]:
mnist = fetch_openml('mnist_784', version=1)

In [3]:
X, y = mnist['data'], mnist['target']

In [4]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=10000, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=10000, random_state=42)

In [5]:
len(X_train), len(X_test), len(X_val)

(50000, 10000, 10000)
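
The two chained train_test_split calls above can be wrapped in a small helper. This is only a sketch on toy data: the name three_way_split and the stratify option are our own additions, not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def three_way_split(X, y, val_size, test_size, seed=42):
    """Hypothetical helper: chain two splits to get train/val/test sets."""
    # First cut off the test set, keeping class proportions (stratify).
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)
    # Then cut the validation set out of what remains.
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=val_size, random_state=seed, stratify=y_rest)
    return X_train, X_val, X_test, y_train, y_val, y_test


# Toy stand-in for MNIST: 700 samples, 10 balanced classes.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(700, 8))
y_toy = np.repeat(np.arange(10), 70)

X_tr, X_va, X_te, y_tr, y_va, y_te = three_way_split(
    X_toy, y_toy, val_size=100, test_size=100)
print(len(X_tr), len(X_va), len(X_te))  # 500 100 100
```

With stratify, each split keeps the class proportions of the full data, which can make validation scores a little more stable.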

## Voting Classifiers

An ensemble classifier consists of more than one predictor: the different predictors are combined to build the ensemble model.
In this section we use **DecisionTreeClassifier**, **LogisticRegression**, and **SGDClassifier**. Combining these three predictors with a soft-voting **VotingClassifier**, we achieve about
**91% accuracy** on the validation set, slightly below the best single model here (**LogisticRegression**, about 92%).
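
To make the difference between hard and soft voting concrete, here is a small NumPy sketch. The probability values are made up for illustration, and hard_vote and soft_vote are hypothetical helpers that only mimic the idea, not scikit-learn's actual implementation.

```python
import numpy as np

# Made-up class probabilities from three classifiers
# for two samples and three classes (each row sums to 1).
probas = np.array([
    [[0.55, 0.45, 0.00], [0.10, 0.20, 0.70]],  # classifier A
    [[0.55, 0.45, 0.00], [0.20, 0.20, 0.60]],  # classifier B
    [[0.05, 0.95, 0.00], [0.30, 0.40, 0.30]],  # classifier C
])


def hard_vote(probas):
    """Majority vote over each classifier's most likely class."""
    votes = probas.argmax(axis=2)  # shape (n_classifiers, n_samples)
    n_classes = probas.shape[2]
    return np.array([np.bincount(sample_votes, minlength=n_classes).argmax()
                     for sample_votes in votes.T])


def soft_vote(probas):
    """Average the probabilities, then pick the most likely class."""
    return probas.mean(axis=0).argmax(axis=1)


print(hard_vote(probas))  # [0 2]
print(soft_vote(probas))  # [1 2]
```

For the first sample, two classifiers weakly prefer class 0 while the third is very confident in class 1, so hard voting picks class 0 and soft voting picks class 1.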

In [6]:
from sklearn.metrics import accuracy_score

In [7]:
from sklearn.tree import DecisionTreeClassifier
dt_clf = DecisionTreeClassifier()
dt_clf.fit(X_train, y_train)
print(dt_clf.__class__.__name__, accuracy_score(y_val, dt_clf.predict(X_val)))

DecisionTreeClassifier 0.8685


In [8]:
from sklearn.linear_model import LogisticRegression
log_clf = LogisticRegression(penalty="l1", solver="saga", tol=0.1)
log_clf.fit(X_train, y_train)
print(log_clf.__class__.__name__, accuracy_score(y_val, log_clf.predict(X_val)))

LogisticRegression 0.9218


In [9]:
from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(loss='modified_huber')
sgd_clf.fit(X_train, y_train)
print(sgd_clf.__class__.__name__, accuracy_score(y_val, sgd_clf.predict(X_val)))

SGDClassifier 0.8903


In [10]:
from sklearn.ensemble import VotingClassifier
voting_clf = VotingClassifier(
    estimators=[('dtree', dt_clf), ('log_clf', log_clf), ('sgd', sgd_clf)],
    voting='soft'
)
voting_clf.fit(X_train, y_train)
print(voting_clf.__class__.__name__, accuracy_score(y_val, voting_clf.predict(X_val)))

VotingClassifier 0.9095


## Stacking

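**Blending** is a **stacking** method. In stacking, we train a predictor (the blender) whose input features are the outputs of the other predictors. Here we use **ExtraTreesClassifier** as the blender. Note that the blender below is trained on the base models' predictions for their own training data, where a fully grown decision tree is almost always right, so the blender likely learns to copy the tree; this would explain why its validation score matches the decision tree's 0.8685.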

In [11]:
X_train_blender = np.c_[dt_clf.predict(X_train), log_clf.predict(X_train), sgd_clf.predict(X_train)]
X_val_blender = np.c_[dt_clf.predict(X_val), log_clf.predict(X_val), sgd_clf.predict(X_val)]

In [12]:
from sklearn.ensemble import ExtraTreesClassifier
ext_clf = ExtraTreesClassifier(n_estimators=5000)
ext_clf.fit(X_train_blender, y_train)
print(ext_clf.__class__.__name__, accuracy_score(y_val, ext_clf.predict(X_val_blender)))

ExtraTreesClassifier 0.8685
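
One common way to avoid training the blender on predictions the base models made for their own training data is to use out-of-fold predictions. The sketch below is not the code that produced the score above: it assumes the small load_digits dataset (to keep it fast), two base models, and cross_val_predict with 5 folds. These choices are ours, and the scores will differ from the MNIST numbers.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small stand-in dataset (8x8 digits) so the sketch runs in seconds.
X_small, y_small = load_digits(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_small, y_small, test_size=0.25, random_state=42)

base_models = [
    DecisionTreeClassifier(random_state=42),
    LogisticRegression(max_iter=2000),
]

# Out-of-fold predictions: every training sample is predicted by a
# copy of the model that did not see it during fitting.
oof_features = np.column_stack([
    cross_val_predict(model, X_tr, y_tr, cv=5) for model in base_models
])

# Refit each base model on the full training set to build the
# blender's validation features.
val_features = np.column_stack([
    model.fit(X_tr, y_tr).predict(X_va) for model in base_models
])

blender = ExtraTreesClassifier(n_estimators=100, random_state=42)
blender.fit(oof_features, y_tr)
blender_acc = accuracy_score(y_va, blender.predict(val_features))
print("blender", blender_acc)
```

Because the decision tree is no longer perfect on the blender's training features, the blender has a chance to learn when to trust each base model.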


## Using More Powerful predictors

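The performance of the ensemble depends on its predictors.
For example, in this section we use stronger models, **SVM**, **RandomForest**, and **ExtraTrees**: the voting classifier reaches about 98% validation accuracy, and a blender trained on the validation-set predictions reaches about 96% accuracy on the test set.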

In [13]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import VotingClassifier

In [14]:
rnd_clf = RandomForestClassifier()
svc_clf = SVC(probability=True)
ext_clf = ExtraTreesClassifier()
vot_clf = VotingClassifier(
    estimators=[('rndf', rnd_clf), ('svc', svc_clf), ('ext', ext_clf)],
    voting='soft',
)

In [15]:
for clf in (rnd_clf, svc_clf, ext_clf, vot_clf):
    clf.fit(X_train, y_train)
    print(clf.__class__.__name__, accuracy_score(y_val, clf.predict(X_val)))

RandomForestClassifier 0.9696
SVC 0.9788
ExtraTreesClassifier 0.9714
VotingClassifier 0.9793


In [16]:
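# The blender's training features are the base models' predictions on the
# validation set, which the base models did not see during fitting.
X_train_blender = np.c_[rnd_clf.predict(X_val), svc_clf.predict(X_val), ext_clf.predict(X_val)]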

In [17]:
X_test_blender = np.c_[rnd_clf.predict(X_test), svc_clf.predict(X_test), ext_clf.predict(X_test)]

In [18]:
svc_blender = SVC()
svc_blender.fit(X_train_blender, y_val)

SVC()

In [19]:
print(svc_blender.__class__.__name__, accuracy_score(y_test, svc_blender.predict(X_test_blender)))

SVC 0.9646
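
scikit-learn also ships a StackingClassifier that builds the blender's training features internally through cross-validation. A minimal sketch, run on load_digits rather than MNIST to keep it fast, with LogisticRegression as an assumed final estimator (the notebook itself uses SVC as the blender):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small stand-in dataset so the sketch runs quickly.
X_small, y_small = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_small, y_small, test_size=0.25, random_state=42)

stack_clf = StackingClassifier(
    estimators=[
        ('rndf', RandomForestClassifier(n_estimators=50, random_state=42)),
        ('ext', ExtraTreesClassifier(n_estimators=50, random_state=42)),
    ],
    # The final estimator is trained on cross-validated outputs of the
    # base models (class probabilities here, since both provide them).
    final_estimator=LogisticRegression(max_iter=2000),
    cv=5,
)
stack_clf.fit(X_tr, y_tr)
stack_acc = stack_clf.score(X_te, y_te)
print("StackingClassifier", stack_acc)
```

Compared with the manual blender, this keeps all of the training data available to the base models and avoids reserving a separate set just for the blender.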


There are also other ensembling techniques, such as bagging and pasting, boosting, and others, that you might try yourself.
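
As a starting point, bagging, pasting, and boosting can be sketched with scikit-learn as follows. The dataset (load_digits) and all hyperparameters are our own assumptions, picked only so the example runs quickly; they are not tuned.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_small, y_small = load_digits(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_small, y_small, test_size=0.25, random_state=42)

models = {
    # Bagging: each tree sees a random sample drawn WITH replacement.
    'bagging': BaggingClassifier(
        DecisionTreeClassifier(random_state=42), n_estimators=30,
        max_samples=0.8, bootstrap=True, random_state=42),
    # Pasting: same idea, but sampling WITHOUT replacement.
    'pasting': BaggingClassifier(
        DecisionTreeClassifier(random_state=42), n_estimators=30,
        max_samples=0.8, bootstrap=False, random_state=42),
    # Boosting: trees are added sequentially, each one correcting
    # the errors of the ensemble built so far.
    'boosting': GradientBoostingClassifier(n_estimators=30, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = model.score(X_va, y_va)
    print(name, scores[name])
```

The same three models can be fitted on the MNIST splits from this notebook, although training will take much longer there.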



I hope you enjoyed this notebook. Please tell me in the comments how I can improve my skills. 🥰