# **ML Lab - XGBoost Algorithm**
Urlana Suresh Kumar - 22071A6662

In this notebook, we implement XGBoost for multi-class classification on the Iris dataset. Iris is a classic machine learning dataset of 150 samples, each with four measurements (sepal length, sepal width, petal length, petal width) taken from one of three Iris species. The objective is to predict the species from these measurements.


## Step 1: Import Libraries and Load Dataset
We start by importing the necessary libraries, loading the Iris dataset into a DataFrame, and encoding the species names as integer class labels, since XGBoost expects numeric targets.


In [1]:
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
import xgboost as xgb

# Load Dataset
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Features (X) and Target (y)
X = iris_df.iloc[:, :-1]  # Features (all columns except target)
y = iris_df['species']    # Target (species column)

# Encode the target labels to numeric values
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)
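
The encoding step can be inspected directly. The sketch below is a minimal illustration that rebuilds the string labels from `iris.target_names` instead of going through the DataFrame (an assumption made only to keep it self-contained). It shows that `LabelEncoder` assigns codes in sorted order of the class names, which for Iris happens to match the dataset's original integer targets.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import LabelEncoder

iris = load_iris()
# Rebuild string labels from the integer targets (illustrative shortcut)
species = iris.target_names[iris.target]

encoder = LabelEncoder()
encoded = encoder.fit_transform(species)

# classes_ holds the original labels; the label at position i gets code i
mapping = {str(label): int(code) for code, label in enumerate(encoder.classes_)}
print(mapping)

# For Iris, the encoded values coincide with the built-in integer targets
print(bool((encoded == iris.target).all()))
```

Keeping the fitted encoder around matters later: `inverse_transform` relies on the same `classes_` ordering to turn predicted codes back into species names.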

## Step 2: Split Data and Prepare for XGBoost
We hold out 30% of the data for testing and wrap both splits in `DMatrix`, the data structure that XGBoost's native training API consumes.

In [2]:
# Split into Training and Testing Data
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.3, random_state=42)

# Create DMatrix for XGBoost
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)


## Step 3: Define Parameters and Train the XGBoost Model
We define the model's parameters and train it on the training data for 100 boosting rounds.

In [3]:
# Define XGBoost Parameters
params = {
    'objective': 'multi:softmax',  # For multi-class classification
    'num_class': 3,               # 3 classes in the Iris dataset
    'eval_metric': 'mlogloss',    # Multiclass log-loss
    'max_depth': 3,               # Maximum depth of a tree
    'eta': 0.1,                   # Learning rate
    'subsample': 0.8,             # Subsampling ratio
    'colsample_bytree': 0.8,      # Subsample ratio of columns for each tree
}

# Train the XGBoost model
num_round = 100
bst = xgb.train(params, dtrain, num_round)


## Step 4: Make Predictions and Evaluate the Model
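We predict on the test data, map the numeric predictions back to species names, and compute the model's accuracy. With the `multi:softmax` objective, `predict` returns class indices as floats, hence the cast to `int` before calling `inverse_transform`.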

In [4]:
# Make Predictions
y_pred = bst.predict(dtest)

# Convert numerical predictions back to class labels
y_pred_labels = label_encoder.inverse_transform(y_pred.astype(int))

# Check actual labels from the test set
y_test_labels = label_encoder.inverse_transform(y_test)

# Calculate Accuracy
accuracy = accuracy_score(y_test_labels, y_pred_labels)

# Display Results
print("Predictions (original class labels):", y_pred_labels[:10])  # First 10 predictions
print("Actual labels:", y_test_labels[:10])  # First 10 actual labels
print(f'Accuracy: {accuracy * 100:.2f}%')


Predictions (original class labels): ['versicolor' 'setosa' 'virginica' 'versicolor' 'versicolor' 'setosa'
 'versicolor' 'virginica' 'versicolor' 'versicolor']
Actual labels: ['versicolor' 'setosa' 'virginica' 'versicolor' 'versicolor' 'setosa'
 'versicolor' 'virginica' 'versicolor' 'versicolor']
Accuracy: 100.00%


## Conclusion
In this notebook, we implemented and evaluated XGBoost for multi-class classification on the Iris dataset. The model classified all 45 held-out test samples correctly (100% accuracy). Iris is a small, well-separated dataset, so this result mainly confirms that the pipeline works; it says little about how XGBoost performs on harder problems, and accuracy on a single small split can vary with the random seed.