# Introducing ensemble methods

## Lecture 8

### GRA 4160
### Predictive modelling with machine learning

#### Lecturer: Vegard H. Larsen

Ensemble methods are machine learning techniques that combine the predictions of multiple models, often producing more accurate predictions than any single model in the ensemble. They are particularly useful when the individual models have high variance, that is, when their predictions change substantially with small changes in the training data. Common types of ensemble methods include boosting, bagging (short for bootstrap aggregating), and stacking.

One popular type of ensemble method is boosting, in which a series of weak models is trained sequentially, with each model attempting to correct the mistakes of the models before it. The final prediction is made by combining the predictions of all the models in the ensemble, usually as a weighted vote or sum. Boosting algorithms include AdaBoost and Gradient Boosting.

Another type of ensemble method is bagging, in which a group of models is trained independently on different random samples of the training data (typically bootstrap samples, drawn with replacement). The final prediction is made by averaging the predictions of all the models in the ensemble, or by a majority vote in classification. Algorithms built on this idea include Random Forests and Extra Trees.

Ensemble methods have been successful in a wide range of applications, including image classification and speech recognition.

### Voting Classifier

A voting classifier combines multiple classifiers and uses a voting scheme to make predictions. The voting scheme can be either **hard** or **soft**, depending on how the final prediction is made.

In a hard voting scheme, the final prediction is the mode of the predictions of the individual classifiers.
In other words, each classifier casts a "vote" for its predicted class, and the class that receives the most votes is chosen as the final prediction.
This is a simple majority vote (strictly speaking a plurality vote when there are more than two classes).
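
To make the counting concrete, here is a minimal sketch of hard voting done by hand with NumPy. The `votes` array is made-up illustration data (three hypothetical classifiers, four samples), not output from any model in this lecture.

```python
import numpy as np

# Hypothetical predicted class labels from three classifiers for four samples
# (rows = classifiers, columns = samples). Illustration data only.
votes = np.array([
    [0, 1, 2, 1],
    [0, 2, 2, 1],
    [1, 1, 2, 0],
])
n_classes = 3

# For each sample, count how many votes each class received ...
vote_counts = np.array([
    np.bincount(votes[:, j], minlength=n_classes)
    for j in range(votes.shape[1])
])

# ... and pick the class with the most votes (the mode).
hard_prediction = vote_counts.argmax(axis=1)

print(vote_counts)
print(hard_prediction)  # [0 1 2 1]
```

Note that `argmax` resolves ties in favour of the lowest class index, so with an even number of classifiers the tie-breaking rule can matter.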

In a soft voting scheme, the final prediction is based on the predicted class probabilities rather than on the predicted labels.
In other words, each classifier produces a probability for each class, and these probabilities are averaged across all the classifiers.
The class with the highest average probability is chosen as the final prediction.
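
The two schemes can disagree. The sketch below uses made-up probabilities for a single sample (three hypothetical classifiers, two classes) to show a case where two lukewarm votes for class 0 win under hard voting, while one confident vote for class 1 wins under soft voting.

```python
import numpy as np

# Hypothetical class probabilities for ONE sample from three classifiers
# (rows = classifiers, columns = classes 0 and 1). Illustration data only.
probas = np.array([
    [0.55, 0.45],
    [0.60, 0.40],
    [0.10, 0.90],
])

# Hard voting: each classifier votes for its most likely class,
# and the most common vote wins.
hard_votes = probas.argmax(axis=1)
hard_prediction = np.bincount(hard_votes, minlength=2).argmax()

# Soft voting: average the probabilities across classifiers,
# then pick the class with the highest average.
mean_proba = probas.mean(axis=0)
soft_prediction = mean_proba.argmax()

print(f'Hard voting predicts class {hard_prediction}')
print(f'Soft voting predicts class {soft_prediction}')
print(f'Average probabilities: {mean_proba.round(3)}')
```

Soft voting therefore only makes sense when every classifier in the ensemble can produce class probabilities, and it works best when those probabilities are reasonably well calibrated.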

```python
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset and split it into training and testing sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40, random_state=42)

# Define the base classifiers
clf1 = LogisticRegression(random_state=10, solver='lbfgs', max_iter=1000)
clf2 = DecisionTreeClassifier(random_state=42)

# Define the VotingClassifier with hard voting
voting_clf = VotingClassifier(estimators=[('lr', clf1), ('dt', clf2)], voting='hard')

# Train the LogisticRegression and DecisionTreeClassifier on their own
# (the VotingClassifier fits its own clones, so this is only for comparison)
clf1.fit(X_train, y_train)
clf2.fit(X_train, y_train)

# Evaluate the LogisticRegression and DecisionTreeClassifier on the testing set
y_pred1 = clf1.predict(X_test)
y_pred2 = clf2.predict(X_test)
accuracy1 = accuracy_score(y_test, y_pred1)
accuracy2 = accuracy_score(y_test, y_pred2)

# Train the VotingClassifier
voting_clf.fit(X_train, y_train)

# Evaluate the VotingClassifier on the testing set
y_pred = voting_clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f'Accuracy logistic regression: {accuracy1:.3f}')
print(f'Accuracy decision tree: {accuracy2:.3f}')
print(f'Accuracy voting classifier: {accuracy:.3f}')
```

## Weak and strong learners

A weak learner is a model that performs only slightly better than random guessing.
For example, a decision tree with only one split (a decision stump) or a linear regression model with low-degree polynomial features can be considered a weak learner.
Although weak learners may not perform well individually, they can be combined in various ways to create a strong learner.
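
As a rough illustration, the sketch below fits a decision stump on the Iris data and compares it with a baseline that always predicts the most frequent training class. The use of `DummyClassifier` as the "guessing" reference and the choice of split are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40, random_state=42)

# Baseline: always predict the most frequent class in the training set
baseline = DummyClassifier(strategy='most_frequent').fit(X_train, y_train)

# Weak learner: a decision tree restricted to a single split (a "stump")
stump = DecisionTreeClassifier(max_depth=1, random_state=42).fit(X_train, y_train)

baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
stump_acc = accuracy_score(y_test, stump.predict(X_test))

print(f'Accuracy baseline: {baseline_acc:.3f}')
print(f'Accuracy decision stump: {stump_acc:.3f}')
```

A stump has only two leaves, so on a three-class problem it can predict at most two of the classes. That structural limit is what keeps it weak here.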

A strong learner, on the other hand, is a model that can make accurate predictions on a given task with high confidence.
A strong learner can be created by combining multiple weak learners using ensemble methods such as boosting, bagging, and stacking.

In boosting, weak learners are trained sequentially, with each subsequent learner focused on the samples that the previous learner got wrong.
By doing so, boosting can increase the accuracy of the model and create a strong learner from a collection of weak learners.
Examples of boosting algorithms include AdaBoost and Gradient Boosting.
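
A minimal sketch of boosting with scikit-learn's `AdaBoostClassifier`, compared against a single decision stump on the Iris data. The number of estimators and the split are arbitrary choices for illustration. When no base estimator is passed, `AdaBoostClassifier` uses a decision tree with `max_depth=1`, so the ensemble here is built from stumps.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40, random_state=42)

# A single weak learner for reference
stump = DecisionTreeClassifier(max_depth=1, random_state=42).fit(X_train, y_train)

# AdaBoost trains stumps one after another, reweighting the training
# samples so that later stumps concentrate on earlier mistakes
boosted = AdaBoostClassifier(n_estimators=50, random_state=42).fit(X_train, y_train)

stump_acc = accuracy_score(y_test, stump.predict(X_test))
boosted_acc = accuracy_score(y_test, boosted.predict(X_test))

print(f'Accuracy single stump: {stump_acc:.3f}')
print(f'Accuracy AdaBoost ({len(boosted.estimators_)} stumps): {boosted_acc:.3f}')
```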

In bagging, weak learners are trained independently on different subsets of the data, and their predictions are aggregated using a voting scheme or an average.
Bagging can reduce the variance of the model and create a strong learner from a collection of unstable weak learners.
Examples of bagging algorithms include the Bagging classifier (covered today) and Random Forest and Extra Trees (both covered in the next lecture).
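
The mechanics of bagging can be sketched by hand: draw bootstrap samples, fit one tree per sample, and combine the trees with a majority vote. This is a simplified illustration, with the number of trees and the seeds chosen arbitrarily. In practice scikit-learn's `BaggingClassifier` packages the same idea.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40, random_state=42)

rng = np.random.default_rng(42)
n_trees = 25                    # arbitrary choice for this sketch
n_train = len(X_train)
n_classes = len(np.unique(y))

all_preds = []
for i in range(n_trees):
    # Bootstrap sample: draw n_train row indices with replacement
    idx = rng.integers(0, n_train, size=n_train)
    tree = DecisionTreeClassifier(random_state=i).fit(X_train[idx], y_train[idx])
    all_preds.append(tree.predict(X_test))

# Shape (n_trees, n_test_samples): one row of predictions per tree
all_preds = np.array(all_preds)

# Majority vote across the trees for each test sample
bagged_pred = np.apply_along_axis(
    lambda col: np.bincount(col, minlength=n_classes).argmax(), 0, all_preds
)

bagged_acc = accuracy_score(y_test, bagged_pred)
print(f'Accuracy hand-rolled bagging: {bagged_acc:.3f}')
```

Because each tree sees a different bootstrap sample, the trees make partly different errors, and the vote averages some of that variation away.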

# Exercises

1. Replace the `DecisionTreeClassifier` with another classifier (such as a Support Vector Machine or a Random Forest), or add an additional classifier to the ensemble. Observe how this change impacts the overall accuracy of the VotingClassifier.

2. Change the voting scheme from `hard` to `soft` and observe how this change impacts the overall accuracy of the VotingClassifier.

3. Use additional performance metrics (like precision, recall, F1-score, or confusion matrix) to evaluate the LogisticRegression, DecisionTreeClassifier, and VotingClassifier.