<a href="https://colab.research.google.com/github/zhangou888/NN/blob/main/AdaBoostclassifier_comparison_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AdaBoost classifier example (comparison of two AdaBoost classifiers)

**AdaBoost (Adaptive Boosting)** is an ensemble learning method that combines multiple weak learners (typically decision trees with a maximum depth of 1) to create a strong learner.

It works by iteratively training weak learners, with each learner focusing on the instances that were misclassified by the previous learners.

AdaBoost assigns weights to both the training instances and the weak learners. Instances that are difficult to classify receive higher weights, and more accurate learners receive higher weights in the final ensemble.
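
The weighting scheme described above can be sketched for a single boosting round. This is a minimal illustration of the classic binary (discrete) AdaBoost update, assuming labels in {-1, +1}. The toy arrays are made up for the example, and scikit-learn's own implementation differs in its details.

```python
import numpy as np

# Toy labels in {-1, +1} and the predictions of one hypothetical weak learner.
y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, -1, 1])  # the second instance is misclassified

# Start with uniform instance weights.
w = np.full(len(y_true), 1.0 / len(y_true))

# Weighted error of this learner and its weight (alpha) in the final ensemble.
miss = y_pred != y_true
err = np.sum(w[miss])
alpha = 0.5 * np.log((1.0 - err) / err)

# Raise the weights of misclassified instances, lower the rest, then renormalize.
w_new = w * np.exp(-alpha * y_true * y_pred)
w_new /= w_new.sum()

print(f"weighted error = {err:.2f}, learner weight = {alpha:.3f}")
print("updated instance weights:", np.round(w_new, 3))
```

After this round the single misclassified instance carries half of the total weight (0.5), while each correctly classified instance drops to 0.125, so the next weak learner is pushed toward fixing that mistake.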

In [None]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.base import BaseEstimator, ClassifierMixin
import numpy as np

In [None]:
# 1. Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=15, n_redundant=5,
                           random_state=42)

In [None]:
# 2. Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

## Approach 1: Basic AdaBoostClassifier with Decision Trees (Stumps)
- Description: This is the "classic" AdaBoost setup. It uses decision trees with a maximum depth of 1 (often called "decision stumps") as the weak learners. AdaBoost iteratively trains these stumps, weighting misclassified instances higher in each iteration. The final prediction is a weighted combination of the predictions from all the stumps.
- Pros:
    - Simple and fast to train: Decision stumps are very simple, so each iteration of AdaBoost is relatively quick.
    - Few parameters to tune: The main parameters are the number of estimators (n_estimators) and the learning rate (learning_rate).
    - Good for high-dimensional data: Can handle data with many features reasonably well.
    - Resistant to overfitting (to a point): AdaBoost can be less prone to overfitting than a single complex decision tree, especially in lower dimensional spaces.
    - Easy to interpret: Decision stumps are inherently interpretable.
- Cons:
    - Weak learners may limit accuracy: Decision stumps are very weak learners. While AdaBoost combines them effectively, the overall accuracy might be limited, especially on complex datasets.
    - Sensitive to noisy data and outliers: AdaBoost can be sensitive to noisy data because it focuses on misclassified instances. Outliers can receive high weights and disproportionately influence the model.
    - Can underfit: May not be able to capture complex relationships in the data, leading to underfitting.
    - May require careful tuning of n_estimators and learning_rate: Selecting the right combination is important. Too few estimators can lead to underfitting; too many, or a learning rate that is too high, can lead to overfitting, especially if the data is noisy.
- Suitable Use Cases:
    - Binary classification problems where speed and simplicity are important.
    - High-dimensional datasets where interpretability is desired.
    - Situations where a quick baseline model is needed before trying more complex approaches.
    - Problems where the relationships between features and target are relatively simple.
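
The point about tuning n_estimators can be explored without refitting the model many times. The sketch below uses `staged_score`, which reports accuracy after each boosting iteration. The validation split and the values `n_estimators=200` and `learning_rate=0.5` are assumptions chosen for illustration, not settings taken from this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Same synthetic data and train/test split as the notebook.
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=15, n_redundant=5,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Hold out part of the training data so the test set is not used for tuning
# (this extra split is an assumption made for this sketch).
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=42)

# The default base estimator is a depth-1 decision tree (a stump).
clf = AdaBoostClassifier(n_estimators=200, learning_rate=0.5, random_state=42)
clf.fit(X_fit, y_fit)

# staged_score yields the validation accuracy after each boosting iteration.
staged_acc = list(clf.staged_score(X_val, y_val))
best_n = max(range(len(staged_acc)), key=staged_acc.__getitem__) + 1

print(f"Best validation accuracy {max(staged_acc):.3f} at {best_n} estimators")
```

If the best iteration sits far below the maximum, a smaller n_estimators is probably enough. If accuracy is still climbing at the last iteration, more estimators or a higher learning rate may be worth trying.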

In [None]:
# 3. Approach 1: Basic AdaBoostClassifier (the default base estimator is a depth-1 decision tree, i.e. a stump)
ada_boost_dt = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=42)
ada_boost_dt.fit(X_train, y_train)

y_pred_dt = ada_boost_dt.predict(X_test)
accuracy_dt = accuracy_score(y_test, y_pred_dt)

print("Approach 1: Basic AdaBoost with Decision Trees")
print(f"Accuracy: {accuracy_dt}")
print(classification_report(y_test, y_pred_dt))

Approach 1: Basic AdaBoost with Decision Trees
Accuracy: 0.8366666666666667
              precision    recall  f1-score   support

           0       0.85      0.84      0.85       160
           1       0.82      0.84      0.83       140

    accuracy                           0.84       300
   macro avg       0.84      0.84      0.84       300
weighted avg       0.84      0.84      0.84       300
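
To make the interpretability claim for stumps concrete, here is a hedged sketch that inspects a fitted ensemble: the single split of the first stump and the feature importances aggregated over all stumps. It refits the same model as Approach 1 so that it runs on its own. Which features come out on top depends on the data and the scikit-learn version.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Same synthetic data, split, and model settings as Approach 1.
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=15, n_redundant=5,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=42)
clf.fit(X_train, y_train)

# Each stump has exactly one split: a feature index and a threshold at the root.
first = clf.estimators_[0].tree_
print(f"First stump splits on feature {first.feature[0]} "
      f"at threshold {first.threshold[0]:.3f}")

# Importances aggregated over all stumps; they sum to 1.
top = np.argsort(clf.feature_importances_)[::-1][:5]
for i in top:
    print(f"feature {i}: importance {clf.feature_importances_[i]:.3f}")
```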



## Approach 2: AdaBoost with Support Vector Machine (SVM) and Hyperparameter Tuning
- Description: This approach combines a Support Vector Machine (SVM) with AdaBoost in two stages. First, an SVM is tuned with GridSearchCV inside a Pipeline that includes StandardScaler, because SVMs are sensitive to feature scaling. The tuned SVM's predicted class probabilities are then used as the input features for a default AdaBoostClassifier, so the SVM acts as a first-stage model rather than as AdaBoost's weak learner. SVMs are more powerful and flexible than decision stumps, but they are also more complex and require careful tuning.
- Pros:
    - Potentially higher accuracy: SVMs are capable of capturing more complex relationships in the data than decision stumps.
    - Flexibility: SVMs have different kernel functions (linear, RBF, polynomial, etc.) that can be chosen to suit the data.
    - Good for complex datasets: Can handle datasets with non-linear relationships between features and the target.
    - Regularization: SVMs have a regularization parameter (C) that helps prevent overfitting.
- Cons:
    - Slower to train: SVMs are generally slower to train than decision stumps, especially on large datasets. The hyperparameter tuning process adds even more computational cost.
    - More parameters to tune: SVMs have several hyperparameters that need to be tuned, such as the kernel type, regularization parameter (C), and kernel coefficient (gamma).
    - More prone to overfitting: SVMs are more prone to overfitting than decision stumps, especially if the hyperparameters are not tuned properly.
    - Less interpretable: SVMs are generally less interpretable than decision trees. The decision boundary learned by an SVM can be difficult to visualize and understand.
    - Requires careful preprocessing: SVMs are sensitive to feature scaling, so it's important to scale the data before training.
    - Choice of kernel is critical: The "right" kernel depends heavily on the data; if the data is naturally linear, a linear kernel is often best, but for more complex data, RBF or polynomial kernels may be needed.
- Suitable Use Cases:
    - Problems where high accuracy is the primary goal, even if it comes at the cost of increased training time and complexity.
    - Datasets with complex, non-linear relationships between features and the target.
    - Situations where interpretability is not a major concern.
    - Problems where feature scaling is possible and can improve performance.
    - When computational resources are available to perform hyperparameter tuning.

In [None]:
# 3. Tune the SVC separately
# Create a pipeline with StandardScaler and SVC
pipeline = Pipeline([
    ('scaler', StandardScaler()),  # Feature scaling
    ('svm', SVC(probability=True, random_state=42))  # probability=True enables predict_proba, whose output feeds AdaBoost below
])

# Define the parameter grid for GridSearchCV - for the SVM *within the pipeline*
param_grid = {
    'svm__C': [0.1, 1, 10],  # Regularization parameter
    'svm__kernel': ['linear', 'rbf'],  # Kernel type
    'svm__gamma': ['scale', 'auto']  # Kernel coefficient
}

# Perform GridSearchCV for hyperparameter tuning - DIRECTLY ON THE PIPELINE
grid_search = GridSearchCV(pipeline, param_grid, cv=3, scoring='accuracy', n_jobs=-1)  # n_jobs=-1 uses all available cores

# Fit the GridSearchCV object to the training data
grid_search.fit(X_train, y_train)

# Print the best parameters and the best score - FOR THE PIPELINE
print("\nTuning the SVC:")
print("Best parameters (for the Pipeline/SVM):", grid_search.best_params_)
print("Best score (for the Pipeline/SVM):", grid_search.best_score_)

# Get the best estimator from the grid search - THIS IS THE TUNED PIPELINE
best_pipeline = grid_search.best_estimator_

# 4. Get predictions from the tuned SVC
# GridSearchCV refits the best pipeline on the full training set by default (refit=True),
# so this explicit fit is redundant but harmless
best_pipeline.fit(X_train, y_train)

# Get the probability predictions from the tuned SVC
# Note: train_predictions are in-sample, so they are more optimistic than the
# probabilities the SVC produces on unseen data
train_predictions = best_pipeline.predict_proba(X_train)
test_predictions = best_pipeline.predict_proba(X_test)

# 5. Create a *new* AdaBoost model that uses the SVC's predicted probabilities as its features
ada_boost = AdaBoostClassifier(random_state=42)

# Train the AdaBoost model using the SVC predictions as features
ada_boost.fit(train_predictions, y_train)

# Make predictions using the AdaBoost model
y_pred = ada_boost.predict(test_predictions)

# Evaluate the model
accuracy_svm = accuracy_score(y_test, y_pred)
print(f"Accuracy (Stacked AdaBoost with tuned SVC): {accuracy_svm}")
print(classification_report(y_test, y_pred))


Tuning the SVC:
Best parameters (for the Pipeline/SVM): {'svm__C': 1, 'svm__gamma': 'scale', 'svm__kernel': 'rbf'}
Best score (for the Pipeline/SVM): 0.9342834085323356
Accuracy (Stacked AdaBoost with tuned SVC): 0.9533333333333334
              precision    recall  f1-score   support

           0       0.98      0.93      0.96       160
           1       0.93      0.98      0.95       140

    accuracy                           0.95       300
   macro avg       0.95      0.95      0.95       300
weighted avg       0.95      0.95      0.95       300
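
The cell above trains AdaBoost on probabilities that the SVC produced for its own training data. One alternative, sketched here as an assumption rather than as the notebook's method, is scikit-learn's `StackingClassifier`, which trains the final estimator on out-of-fold predictions instead. The SVC settings are the best parameters reported by the grid search above, and `cv=5` is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, StackingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same synthetic data and split as the notebook.
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=15, n_redundant=5,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# First stage: scaled SVC with the parameters the grid search selected.
svm_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(C=1, kernel='rbf', gamma='scale',
                probability=True, random_state=42)),
])

# Second stage: AdaBoost fitted on out-of-fold SVC probabilities (cv=5).
stack = StackingClassifier(
    estimators=[('svm', svm_pipeline)],
    final_estimator=AdaBoostClassifier(random_state=42),
    stack_method='predict_proba',
    cv=5,
)
stack.fit(X_train, y_train)

acc_stack = accuracy_score(y_test, stack.predict(X_test))
print(f"Accuracy (StackingClassifier, out-of-fold): {acc_stack:.4f}")
```

Because the second stage only sees predictions for samples the SVC was not fitted on, its training inputs look more like what it will receive at test time.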



In [None]:
# 6. Comparison and Discussion
print("\nComparison:")
print(f"Approach 1 (Decision Tree): Accuracy = {accuracy_dt}")
print(f"Approach 2 (SVM with Tuning): Accuracy = {accuracy_svm}")

print("\nDiscussion:")
print("The features AdaBoost is trained on, and the tuning of the model that produces them, significantly impact performance.")
print("SVMs, being more complex, can potentially achieve higher accuracy but require careful tuning.")
print("The Pipeline ensures proper scaling of data before being fed to the SVM.")
print("GridSearchCV automates the process of finding the best hyperparameters for the SVM before AdaBoost is trained on its predicted probabilities.")

# Add more comparisons based on your own observations


Comparison:
Approach 1 (Decision Tree): Accuracy = 0.8366666666666667
Approach 2 (SVM with Tuning): Accuracy = 0.9533333333333334

Discussion:
The features AdaBoost is trained on, and the tuning of the model that produces them, significantly impact performance.
SVMs, being more complex, can potentially achieve higher accuracy but require careful tuning.
The Pipeline ensures proper scaling of data before being fed to the SVM.
GridSearchCV automates the process of finding the best hyperparameters for the SVM before AdaBoost is trained on its predicted probabilities.
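
A single 70/30 split gives only one accuracy figure per approach. As a rough sketch of a steadier comparison, the code below cross-validates the stump-based AdaBoost against the scaled SVC on its own (not the stacked model), which helps show how much of the gap comes from the SVC itself. The 5-fold setting and the SVC parameters, copied from the grid-search result above, are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same synthetic data as the notebook.
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=15, n_redundant=5,
                           random_state=42)

# Stump-based AdaBoost, as in Approach 1.
ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=42)

# Scaled SVC with the parameters the grid search selected.
svm_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(C=1, kernel='rbf', gamma='scale', random_state=42)),
])

# 5-fold cross-validated accuracy for each model.
scores_ada = cross_val_score(ada, X, y, cv=5, scoring='accuracy')
scores_svm = cross_val_score(svm_pipeline, X, y, cv=5, scoring='accuracy')

print(f"AdaBoost (stumps): {scores_ada.mean():.3f} +/- {scores_ada.std():.3f}")
print(f"Scaled SVC alone:  {scores_svm.mean():.3f} +/- {scores_svm.std():.3f}")
```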
