# Voting Classifier

## Table of Contents

- [Imports](#imports)
- [Data](#data)
- [Train Test Split](#train-test-split)
- [Voting Classifiers](#voting-classifiers)
  - [Hard Voting](#hard-voting)
  - [Soft Voting](#soft-voting)

## Imports

In [2]:
# Interactive shell
from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = "all"

# Machine learning
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

## Data

In [42]:
X, y = make_moons(n_samples=500, noise=0.30, random_state=1227)
X.shape, y.shape

((500, 2), (500,))

In [43]:
X[:10, :]

array([[ 1.61891829, -0.27995175],
       [ 0.95702628,  0.21523992],
       [-0.97787042, -0.16568298],
       [ 1.92790511,  0.03238679],
       [ 0.94535378, -0.74096263],
       [ 0.47255047, -0.40597825],
       [ 1.1077164 , -0.14934513],
       [ 1.2052587 ,  0.3850397 ],
       [ 2.05821764,  0.82129983],
       [-0.20037365,  1.33628222]])

In [44]:
y[:10]

array([1, 1, 0, 1, 1, 1, 1, 0, 1, 0])

## Train Test Split

In [45]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1227)
X_train.shape, y_train.shape

((375, 2), (375,))
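
The 375 training rows follow from the default behaviour of `train_test_split`: when neither `test_size` nor `train_size` is passed, 25% of the samples are held out. A quick arithmetic check in plain Python (the variable names are illustrative only and not part of the notebook):

```python
import math

n_samples = 500
test_fraction = 0.25  # scikit-learn's default when no size is passed

# The test count is rounded up; the training set gets the remainder
n_test = math.ceil(test_fraction * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

So the test set used for the accuracy scores later on holds 125 samples.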

## Voting Classifiers

In [46]:
# Limited-memory Broyden–Fletcher–Goldfarb–Shanno for optimization
log_clf = LogisticRegression(solver="lbfgs", random_state=1227)
# 100 trees in the forest
rnd_clf = RandomForestClassifier(n_estimators=100, random_state=1227)
# Radial basis function kernel; gamma="scale" sets gamma to 1 / (n_features * X.var())
svm_clf = SVC(kernel="rbf", gamma="scale", random_state=1227)
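
As a rough illustration of the `gamma="scale"` formula, the sketch below evaluates `1 / (n_features * X.var())` by hand on a tiny made-up matrix. `X_toy` and its values are hypothetical, and the standard library's `pvariance` stands in for NumPy's `X.var()`, which by default is the population variance over all entries.

```python
from statistics import pvariance

# Toy feature matrix (hypothetical values): 2 samples, 2 features
X_toy = [[0.0, 1.0], [2.0, 3.0]]

n_features = len(X_toy[0])
# Flatten the matrix, since X.var() with no axis uses every entry
flat = [value for row in X_toy for value in row]
gamma = 1.0 / (n_features * pvariance(flat))

print(gamma)
```

Because the variance sits in the denominator, features with a larger spread lead to a smaller `gamma`, that is, a wider RBF kernel.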

### Hard Voting

In [47]:
hard_voting_clf = VotingClassifier(
    # Accepts a list of tuples, each tuple is a pair of a classifier and its name
    estimators=[("lr", log_clf), ("rf", rnd_clf), ("svc", svm_clf)],
    # Hard voting uses predicted class labels for majority rule voting
    voting="hard",
    # None gives every classifier the same weight
    weights=None,
    # None means 1 job
    n_jobs=None,
    # If voting='soft' and flatten_transform=True, transform method returns matrix with shape (n_samples, n_classifiers * n_classes)
    flatten_transform=True,
)

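After `fit` has been called, the estimator exposes the following attributes (the cells below ran after the `fit` cell further down, as their execution counts show):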

* estimators_ : list of classifiers
  
    The collection of fitted sub-estimators as defined in estimators that are not 'drop'.

In [50]:
hard_voting_clf.estimators_

[LogisticRegression(random_state=1227),
 RandomForestClassifier(random_state=1227),
 SVC(random_state=1227)]

* named_estimators_ : `sklearn.utils.Bunch` (a subclass of dictionary)
  
  Attribute to access any fitted sub-estimators by name.

In [53]:
hard_voting_clf.named_estimators_

{'lr': LogisticRegression(random_state=1227),
 'rf': RandomForestClassifier(random_state=1227),
 'svc': SVC(random_state=1227)}

* classes_ : ndarray of shape (n_classes,)
  
    The class labels.

In [54]:
hard_voting_clf.classes_

array([0, 1])

In [48]:
# Fit estimator
hard_voting_clf.fit(X_train, y_train)

VotingClassifier(estimators=[('lr', LogisticRegression(random_state=1227)),
                             ('rf', RandomForestClassifier(random_state=1227)),
                             ('svc', SVC(random_state=1227))])

In [40]:
InteractiveShell.ast_node_interactivity = "last_expr"

In [49]:
# Accuracy scores
for clf in (log_clf, rnd_clf, svm_clf, hard_voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, "has score", accuracy_score(y_test, y_pred))

LogisticRegression has score 0.912
RandomForestClassifier has score 0.936
SVC has score 0.96
VotingClassifier has score 0.96
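
To make the majority rule concrete, here is a minimal pure-Python sketch of hard voting. It is not scikit-learn's implementation: `hard_vote` and the prediction values are hypothetical, and ties go to the smallest label, which mirrors the ascending-order tie-break described in the scikit-learn documentation.

```python
from collections import Counter

def hard_vote(labels):
    """Return the most common label; ties go to the smallest label."""
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)

# Hypothetical predictions for three test points, one list per classifier
predictions = {
    "lr":  [0, 1, 1],
    "rf":  [1, 1, 0],
    "svc": [1, 0, 0],
}

# Transpose so each tuple holds the three votes for one sample
votes_per_sample = list(zip(*predictions.values()))
ensemble = [hard_vote(votes) for votes in votes_per_sample]
print(ensemble)
```

Each sample gets the label that at least two of the three classifiers agree on, regardless of how confident any individual classifier was.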


### Soft Voting

In [55]:
log_clf = LogisticRegression(solver="lbfgs", random_state=1227)
rnd_clf = RandomForestClassifier(n_estimators=100, random_state=1227)
# Enable probability estimates, which slows down training but is required for soft voting
svm_clf = SVC(gamma="scale", probability=True, random_state=1227)

In [58]:
soft_voting_clf = VotingClassifier(
    estimators=[("lr", log_clf), ("rf", rnd_clf), ("svc", svm_clf)],
    voting="soft",
    # None gives every classifier the same weight
    weights=None,
    # None means 1 job
    n_jobs=None,
    # If voting='soft' and flatten_transform=True, transform method returns matrix with shape (n_samples, n_classifiers * n_classes)
    flatten_transform=True,
)

In [59]:
soft_voting_clf.fit(X_train, y_train)

VotingClassifier(estimators=[('lr', LogisticRegression(random_state=1227)),
                             ('rf', RandomForestClassifier(random_state=1227)),
                             ('svc', SVC(probability=True, random_state=1227))],
                 voting='soft')

In [60]:
# Accuracy scores
for clf in (log_clf, rnd_clf, svm_clf, soft_voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, "has score", accuracy_score(y_test, y_pred))

LogisticRegression has score 0.912
RandomForestClassifier has score 0.936
SVC has score 0.96
VotingClassifier has score 0.968


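The soft voting classifier scores slightly higher (0.968) than the hard voting classifier (0.96). On this 125-sample test set that is a difference of a single prediction, so it should not be over-interpreted. When soft voting does help, it is usually because averaging the predicted class probabilities gives more weight to highly confident votes.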
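
The effect of a confident vote can be sketched in plain Python. The example below assumes equal weights and uses made-up probabilities; `soft_vote` is a hypothetical helper, not the scikit-learn implementation. Two classifiers lean weakly towards class 0 while one is very sure of class 1.

```python
def soft_vote(probas):
    """Average per-class probabilities and return (winner, averages)."""
    n_classes = len(probas[0])
    avg = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return avg.index(max(avg)), avg

# Hypothetical [P(class 0), P(class 1)] from three classifiers for one sample
probas = [[0.55, 0.45], [0.55, 0.45], [0.05, 0.95]]

# Hard voting only sees each classifier's most likely class: 0, 0, 1
hard_labels = [p.index(max(p)) for p in probas]
hard_winner = max(set(hard_labels), key=hard_labels.count)

soft_winner, avg = soft_vote(probas)
print(hard_winner, soft_winner)
```

Here the majority rule picks class 0, while the averaged probabilities favour class 1 because the single confident classifier outweighs the two hesitant ones.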