# **ML Lab - AdaBoost Algorithm**
Urlana Suresh Kumar - 22071A6662

In this notebook, we explore the implementation of **AdaBoost (Adaptive Boosting)**, a powerful ensemble learning technique that combines multiple "weak" classifiers into a "strong" classifier. It trains classifiers sequentially, with each classifier focusing more on the samples its predecessors misclassified: the weights of misclassified samples are increased so that the next classifier gives them more attention.
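
To make the re-weighting idea concrete, here is a minimal sketch of a single boosting round in the classic binary (labels in {-1, +1}) formulation of AdaBoost. The toy labels and predictions are hypothetical, and scikit-learn's multi-class implementation differs in its details, so treat this as an illustration of the principle rather than the library's exact procedure.

```python
import numpy as np

# Hypothetical toy data: 5 samples with labels in {-1, +1}
y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, -1, 1])  # the weak learner gets sample 1 wrong

# Every sample starts with the same weight
w = np.full(len(y_true), 1 / len(y_true))

# Weighted error of the weak learner
miss = y_pred != y_true
err = np.sum(w[miss]) / np.sum(w)

# Learner weight (alpha): larger when the weighted error is smaller
alpha = 0.5 * np.log((1 - err) / err)

# Misclassified samples are scaled up, correct ones scaled down, then renormalized
w_new = w * np.exp(-alpha * y_true * y_pred)
w_new /= w_new.sum()

print(f"weighted error = {err:.2f}, alpha = {alpha:.3f}")
print("updated weights:", np.round(w_new, 3))
```

After this round the single misclassified sample carries half of the total weight, so the next weak learner is strongly encouraged to classify it correctly.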

In this implementation, we will use the AdaBoost algorithm to classify the Iris dataset, a popular dataset for machine learning tasks. The goal is to predict the species of an iris flower from its four measurements: sepal length, sepal width, petal length, and petal width.


## Step 1: Import Libraries and Load Data

In [1]:
# Importing necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import load_iris
import warnings

# Suppress only FutureWarning messages (e.g. library deprecation notices)
warnings.filterwarnings("ignore", category=FutureWarning)

# Load the Iris dataset
iris = load_iris()

# Convert data to a DataFrame
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Display the first few rows
print(iris_df.head())

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   

  species  
0  setosa  
1  setosa  
2  setosa  
3  setosa  
4  setosa  


## Step 2: Prepare Features and Target Variables

In [2]:
# Splitting the features (X) and target (y)
X = iris_df.iloc[:, :-1]  # All columns except the last one (features)
y = iris_df.iloc[:, -1]   # The last column (target)

# Check dataset shape
print(f"Shape of X: {X.shape}, Shape of y: {y.shape}")

# Display class distribution
print("Class distribution:")
print(y.value_counts())

Shape of X: (150, 4), Shape of y: (150,)
Class distribution:
species
setosa        50
versicolor    50
virginica     50
Name: count, dtype: int64


## Step 3: Split Data and Train AdaBoost Classifier

In [3]:
# Splitting the dataset into training and validation sets
X_train, X_val, Y_train, Y_val = train_test_split(X, y, test_size=0.25, random_state=28)

# Create and train the AdaBoost classifier
adb = AdaBoostClassifier()
adb_model = adb.fit(X_train, Y_train)
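
The classifier above is created with default settings. As a rough sketch of what those defaults mean, the self-contained example below writes out `n_estimators=50` and `learning_rate=1.0` (scikit-learn's documented defaults) and tracks validation accuracy as weak learners are added. It loads Iris as NumPy arrays instead of the DataFrame used above, and `random_state=0` on the model is an assumption added for reproducibility, so its numbers may differ from the notebook's.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=28
)

# Defaults written out explicitly; random_state=0 is an assumption for reproducibility
adb = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
adb.fit(X_train, y_train)

# Boosting may stop early, so the fitted count can be below n_estimators
print("Weak learners fitted:", len(adb.estimators_))

# staged_score yields the validation accuracy after each boosting iteration
staged = list(adb.staged_score(X_val, y_val))
for i in (0, len(staged) // 2, len(staged) - 1):
    print(f"after {i + 1:2d} learners: validation accuracy = {staged[i]:.3f}")
```

Watching the staged scores is a cheap way to judge whether fewer estimators would do the same job, or whether accuracy is still changing at the end of training.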


## Step 4: Evaluate the Model

In [4]:
# Evaluate the model on the validation set
accuracy = adb_model.score(X_val, Y_val)
print(f"The accuracy of the model on the validation set is: {accuracy}")


The accuracy of the model on the validation set is: 0.9210526315789473
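
A single accuracy figure on a small validation set hides which classes are confused and how much the score depends on the particular split. The sketch below, under the same assumptions as the notebook's split (`test_size=0.25`, `random_state=28`), adds a confusion matrix and a 5-fold stratified cross-validation. The model's `random_state=0` and the fold settings are assumptions introduced here, and the data is loaded as NumPy arrays with integer class labels.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=28
)

# random_state=0 on the model is an assumption for reproducibility
model = AdaBoostClassifier(random_state=0).fit(X_train, y_train)

# Rows are true classes, columns are predicted classes (0=setosa, 1=versicolor, 2=virginica)
cm = confusion_matrix(y_val, model.predict(X_val), labels=[0, 1, 2])
print("Confusion matrix:")
print(cm)

# 5-fold stratified cross-validation gives a less split-dependent estimate
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(AdaBoostClassifier(random_state=0), X, y, cv=cv)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Off-diagonal entries in the matrix show exactly which species are mistaken for one another, and the spread of the cross-validation scores indicates how much a single split can over- or understate performance.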


## Conclusion

The **AdaBoost algorithm** was successfully implemented on the Iris dataset, achieving a validation accuracy of **92.1%** (35 of the 38 held-out samples classified correctly). This shows that AdaBoost can handle a multi-class classification problem by combining weak learners into a strong ensemble model.

### Key Takeaways
- AdaBoost adapts to the data by focusing on misclassified samples, improving overall accuracy.
- The Iris dataset is balanced (50 samples per class), so plain accuracy is a reasonable metric here and no single class dominates the score.
- While AdaBoost is powerful, it can be sensitive to noise and outliers, making careful data preprocessing essential.
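
One way to probe the noise sensitivity mentioned above is to corrupt some training labels on purpose and compare validation accuracy. The sketch below is a small experiment under assumptions that are not part of the notebook: a hypothetical 20% label-noise rate, a fixed NumPy seed, and `random_state=0` on both models. The outcome depends on these choices and on the scikit-learn version, so no particular result is claimed.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=28
)

# Hypothetical noise level: corrupt 20% of the training labels
rng = np.random.default_rng(0)
noise_rate = 0.2
n_flip = int(noise_rate * len(y_train))
flip_idx = rng.choice(len(y_train), size=n_flip, replace=False)

# Shift each chosen label to the next class (0->1, 1->2, 2->0) so it is always wrong
y_noisy = y_train.copy()
y_noisy[flip_idx] = (y_noisy[flip_idx] + 1) % 3

# Train on clean and noisy labels, evaluate both on the same clean validation set
clean_acc = AdaBoostClassifier(random_state=0).fit(X_train, y_train).score(X_val, y_val)
noisy_acc = AdaBoostClassifier(random_state=0).fit(X_train, y_noisy).score(X_val, y_val)

print(f"validation accuracy, clean labels: {clean_acc:.3f}")
print(f"validation accuracy, {n_flip} flipped labels: {noisy_acc:.3f}")
```

Repeating the experiment over several seeds and noise rates would give a fairer picture than a single run.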

This implementation shows that a working ensemble classifier can be built and evaluated in only a few lines of scikit-learn code.
