# VotingClassifier
VotingClassifier is a scikit-learn ensemble class (not a function) that combines the predictions of several classifiers, which can make the resulting model more robust and accurate than its individual members. It supports two modes: hard voting, where each classifier casts one vote and the majority class wins, and soft voting, where the predicted class probabilities are averaged and the class with the highest average wins.
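
The difference between the two voting modes can be worked through by hand. The sketch below uses made-up probabilities for a single sample and three hypothetical classifiers; it illustrates the arithmetic only and is not scikit-learn's internal code.

```python
import numpy as np

# Hypothetical class probabilities for ONE sample (values are made up).
# Rows: classifiers A, B, C. Columns: classes 0, 1, 2.
probas = np.array([
    [0.45, 0.55, 0.00],   # A slightly prefers class 1
    [0.40, 0.60, 0.00],   # B slightly prefers class 1
    [0.90, 0.10, 0.00],   # C strongly prefers class 0
])

# Hard voting: each classifier casts one vote for its most likely class.
votes = probas.argmax(axis=1)
hard_prediction = np.bincount(votes, minlength=probas.shape[1]).argmax()

# Soft voting: average the probabilities, then pick the most likely class.
mean_proba = probas.mean(axis=0)
soft_prediction = mean_proba.argmax()

print("votes:", votes)
print("hard voting picks class", hard_prediction)
print("soft voting picks class", soft_prediction)
```

Here the two modes disagree: two weak votes for class 1 win the hard vote, while the single confident estimate for class 0 dominates the averaged probabilities.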

## How It Works
- Initialize Individual Classifiers: Define the individual classifiers you want to use in the ensemble.
- Create VotingClassifier: Combine the individual classifiers using VotingClassifier.
- Fit the Ensemble Model: Train the ensemble model on the training data.
- Make Predictions: Use the ensemble model to make predictions on new data.

## Simple Example

In [None]:
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Define individual classifiers
clf1 = LogisticRegression()
clf2 = DecisionTreeClassifier()

# Create VotingClassifier
voting_clf = VotingClassifier(estimators=[
    ('lr', clf1), 
    ('dt', clf2)], voting='hard')

# Fit ensemble model
voting_clf.fit(X_train, y_train)

# Make predictions
predictions = voting_clf.predict(X_test)

# Evaluate model
print(f"Accuracy: {accuracy_score(y_test, predictions)}")


- Uses LogisticRegression and DecisionTreeClassifier.
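- Demonstrates hard voting, where the majority class is chosen as the final prediction. With only two estimators, any disagreement is a tie, which scikit-learn resolves in favor of the class label that sorts first.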
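
To see the individual votes behind a hard-voting prediction, the fitted ensemble's `transform` method can be inspected. This is a minimal sketch that assumes the same iris split as above; `max_iter=1000` and `random_state=0` are added here only to keep the run quiet and repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",
)
voting_clf.fit(X_train, y_train)

# With voting='hard', transform returns one column of predicted labels
# per estimator: shape (n_samples, n_estimators).
votes = voting_clf.transform(X_test)
final = voting_clf.predict(X_test)

disagree = votes[:, 0] != votes[:, 1]
print("Votes shape:", votes.shape)
print("Samples where the two estimators disagree:", int(disagree.sum()))
```

Because a two-member ensemble can only agree or tie, the final prediction here always equals the smaller of the two voted labels. An odd number of estimators avoids most ties.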

## Complex Example

In [None]:
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Define individual classifiers
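clf1 = LogisticRegression(max_iter=10000)  # Raised from the default of 100, which tends to be too low for the unscaled wine features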
clf2 = DecisionTreeClassifier()
clf3 = SVC(probability=True)  # Enable probability estimates for soft voting

# Create VotingClassifier
voting_clf = VotingClassifier(estimators=[
    ('lr', clf1), 
    ('dt', clf2), 
    ('svc', clf3)], voting='soft')

# Fit ensemble model
voting_clf.fit(X_train, y_train)

# Make predictions
predictions = voting_clf.predict(X_test)

# Evaluate model
print(f"Accuracy: {accuracy_score(y_test, predictions)}")


- Uses LogisticRegression, DecisionTreeClassifier, and SVC.
- Demonstrates soft voting where the average of predicted probabilities is used for the final prediction.
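
The averaging step can be reproduced from the fitted members, which are exposed through the `estimators_` attribute. The sketch below assumes the same wine split as above; the `max_iter` and `random_state` values are choices made for this illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=10000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
    ],
    voting="soft",
)
voting_clf.fit(X_train, y_train)

# Average the per-estimator probabilities by hand (no weights are set,
# so a plain mean matches what the ensemble computes).
manual_proba = np.mean(
    [est.predict_proba(X_test) for est in voting_clf.estimators_], axis=0)
manual_pred = voting_clf.classes_[manual_proba.argmax(axis=1)]

same_proba = np.allclose(manual_proba, voting_clf.predict_proba(X_test))
same_pred = np.array_equal(manual_pred, voting_clf.predict(X_test))
print("Probabilities match:", same_proba)
print("Predictions match:", same_pred)
```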

## Very Complex Example

In [None]:
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score

# Load data
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Define individual classifiers
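clf1 = LogisticRegression(max_iter=10000)  # Raised from the default of 100, which tends to be too low for the unscaled features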
clf2 = DecisionTreeClassifier()
clf3 = SVC(probability=True)  # Enable probability estimates for soft voting
clf4 = RandomForestClassifier()

# Create VotingClassifier
voting_clf = VotingClassifier(estimators=[
    ('lr', clf1), 
    ('dt', clf2), 
    ('svc', clf3), 
    ('rf', clf4)], voting='soft')

# Define parameter grid
param_grid = {
    'lr__C': [0.1, 1, 10],
    'dt__max_depth': [None, 10, 20],
    'svc__C': [0.1, 1, 10],
    'rf__n_estimators': [10, 50, 100]
}

# Create GridSearchCV
grid_search = GridSearchCV(voting_clf, param_grid, cv=5)

# Fit ensemble model
grid_search.fit(X_train, y_train)

# Make predictions
predictions = grid_search.predict(X_test)

# Evaluate model
print(f"Accuracy: {accuracy_score(y_test, predictions)}")


- Uses LogisticRegression, DecisionTreeClassifier, SVC, and RandomForestClassifier.
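- Integrates GridSearchCV to tune one hyperparameter of each classifier in the ensemble. The grid is searched jointly, so 3 × 3 × 3 × 3 = 81 combinations are each evaluated with 5-fold cross-validation.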
- Uses soft voting to average the predicted probabilities.
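
The grid keys follow the pattern `<estimator name>__<parameter>`, where the name is the label given in the `estimators` list. The sketch below uses a deliberately smaller grid, two estimators, and 3-fold cross-validation so that it runs quickly; these reductions are assumptions made for illustration and not a tuned setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=10000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="soft",
)

# 'lr__C' reaches the C parameter of the estimator named 'lr', and so on.
param_grid = {"lr__C": [0.1, 1], "dt__max_depth": [3, None]}
n_candidates = len(ParameterGrid(param_grid))  # 2 x 2 combinations
print("Candidate combinations:", n_candidates)

grid_search = GridSearchCV(voting_clf, param_grid, cv=3)
grid_search.fit(X, y)

print("Best parameters:", grid_search.best_params_)
print("Best mean CV accuracy:", grid_search.best_score_)
```

After fitting, `best_params_` reports the winning combination with the same prefixed keys, and `best_estimator_` holds the ensemble refitted with those values.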

## Test the examples

In [None]:
import unittest
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.datasets import load_iris, load_wine, load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.metrics import accuracy_score

### test_simple example

In [None]:
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

clf1 = LogisticRegression()
clf2 = DecisionTreeClassifier()

voting_clf = VotingClassifier(estimators=[
    ('lr', clf1), 
    ('dt', clf2)], voting='hard')

voting_clf.fit(X_train, y_train)
predictions = voting_clf.predict(X_test)

# A plain assert is used because `self` only exists inside a unittest.TestCase method
assert len(predictions) == len(y_test)
print(f"Simple Voting Classifier Accuracy: {accuracy_score(y_test, predictions)}")

scores = cross_val_score(voting_clf, data.data, data.target, cv=5)
print(f"Simple Voting Classifier Cross-Validation Scores: {scores}")

- Uses LogisticRegression and DecisionTreeClassifier.
- Checks that the number of predictions equals the number of test samples.
- Evaluates the performance using cross_val_score.
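
If the check should run under `unittest` (imported in the setup cell above), it can be wrapped in a test case. The sketch below shows one way to do that; the class name `TestSimpleVoting` is hypothetical, and the suite is run explicitly because `unittest.main()` is awkward to call from a notebook.

```python
import unittest

from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


class TestSimpleVoting(unittest.TestCase):  # hypothetical name
    def test_simple(self):
        data = load_iris()
        X_train, X_test, y_train, y_test = train_test_split(
            data.data, data.target, test_size=0.2, random_state=42)

        voting_clf = VotingClassifier(
            estimators=[
                ("lr", LogisticRegression(max_iter=1000)),
                ("dt", DecisionTreeClassifier(random_state=0)),
            ],
            voting="hard",
        )
        voting_clf.fit(X_train, y_train)
        predictions = voting_clf.predict(X_test)

        # Inside a TestCase method, self.assertEqual is available.
        self.assertEqual(len(predictions), len(y_test))


# Load and run the suite explicitly so the notebook kernel is not exited.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSimpleVoting)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```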

### test_complex example

In [None]:
data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

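clf1 = LogisticRegression(max_iter=10000)  # Raised from the default of 100, which tends to be too low for the unscaled wine features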
clf2 = DecisionTreeClassifier()
clf3 = SVC(probability=True)

voting_clf = VotingClassifier(estimators=[
    ('lr', clf1), 
    ('dt', clf2), 
    ('svc', clf3)], voting='soft')

voting_clf.fit(X_train, y_train)
predictions = voting_clf.predict(X_test)

# A plain assert is used because `self` only exists inside a unittest.TestCase method
assert len(predictions) == len(y_test)
print(f"Complex Voting Classifier Accuracy: {accuracy_score(y_test, predictions)}")

scores = cross_val_score(voting_clf, data.data, data.target, cv=5)
print(f"Complex Voting Classifier Cross-Validation Scores: {scores}")

- Uses LogisticRegression, DecisionTreeClassifier, and SVC.
- Checks that the number of predictions equals the number of test samples.
- Evaluates the performance using cross_val_score.

### test_very_complex example

In [None]:
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

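clf1 = LogisticRegression(max_iter=10000)  # Raised from the default of 100, which tends to be too low for the unscaled features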
clf2 = DecisionTreeClassifier()
clf3 = SVC(probability=True)
clf4 = RandomForestClassifier()

voting_clf = VotingClassifier(estimators=[
    ('lr', clf1), 
    ('dt', clf2), 
    ('svc', clf3), 
    ('rf', clf4)], voting='soft')

param_grid = {
    'lr__C': [0.1, 1, 10],
    'dt__max_depth': [None, 10, 20],
    'svc__C': [0.1, 1, 10],
    'rf__n_estimators': [10, 50, 100]
}

grid_search = GridSearchCV(voting_clf, param_grid, cv=5)
grid_search.fit(X_train, y_train)
predictions = grid_search.predict(X_test)

# A plain assert is used because `self` only exists inside a unittest.TestCase method
assert len(predictions) == len(y_test)
print(f"Very Complex Voting Classifier Accuracy: {accuracy_score(y_test, predictions)}")

scores = cross_val_score(grid_search, data.data, data.target, cv=5)
print(f"Very Complex Voting Classifier Cross-Validation Scores: {scores}")

- Uses LogisticRegression, DecisionTreeClassifier, SVC, and RandomForestClassifier.
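- Integrates GridSearchCV to tune one hyperparameter of each classifier in the ensemble (81 combinations in total).
- Checks that the number of predictions equals the number of test samples.
- Evaluates the performance by passing the grid search itself to cross_val_score. This is nested cross-validation: the full search is repeated once per outer fold, so it is slow.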

## VotingClassifier with BaggingClassifier

In [None]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import VotingClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Load the iris dataset
data = load_iris()
X, y = data.data, data.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base models wrapped with BaggingClassifier
# The estimator is passed positionally: the keyword was `base_estimator` before scikit-learn 1.2 and is `estimator` in current releases
model1 = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10, random_state=42)
model2 = BaggingClassifier(KNeighborsClassifier(), n_estimators=10, random_state=42)
model3 = BaggingClassifier(LogisticRegression(max_iter=200), n_estimators=10, random_state=42)

# Combine the models into a VotingClassifier
voting_classifier = VotingClassifier(estimators=[('bag_dt', model1), ('bag_knn', model2), ('bag_lr', model3)], voting='soft')

# Train the VotingClassifier
voting_classifier.fit(X_train, y_train)

# Evaluate the performance using cross-validation
scores = cross_val_score(voting_classifier, X, y, cv=5)

# Print cross-validation scores
print("Cross-validation scores for VotingClassifier with BaggingClassifier:", scores)

# Make predictions on the test set
predictions = voting_classifier.predict(X_test)

# Print the predictions
print("Predictions:", predictions)


- We use BaggingClassifier with three different base classifiers (DecisionTreeClassifier, KNeighborsClassifier, LogisticRegression).
- These bagged models are combined into a VotingClassifier.
- We evaluate the classifier using cross-validation and print the predictions on the test set.

## BaggingClassifier with VotingClassifier

In [None]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Load the iris dataset
data = load_iris()
X, y = data.data, data.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base models for VotingClassifier
model1 = DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression(max_iter=200)

# Combine the models into a VotingClassifier
voting_classifier = VotingClassifier(estimators=[('dt', model1), ('knn', model2), ('lr', model3)], voting='soft')

# Use the VotingClassifier as the base estimator for BaggingClassifier
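# Passed positionally: the keyword was `base_estimator` before scikit-learn 1.2 and is `estimator` in current releases
bagging_classifier = BaggingClassifier(voting_classifier, n_estimators=10, random_state=42)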

# Train the BaggingClassifier
bagging_classifier.fit(X_train, y_train)

# Evaluate the performance using cross-validation
scores = cross_val_score(bagging_classifier, X, y, cv=5)

# Print cross-validation scores
print("Cross-validation scores for BaggingClassifier with VotingClassifier:", scores)

# Make predictions on the test set
predictions = bagging_classifier.predict(X_test)

# Print the predictions
print("Predictions:", predictions)


- We create a VotingClassifier using DecisionTreeClassifier, KNeighborsClassifier, and LogisticRegression.
- We then use this VotingClassifier as the base estimator for a BaggingClassifier.
- We train the BaggingClassifier, evaluate its performance using cross-validation, and make predictions on the test set.