# Overview of the data

The **Iris Flower Classification** project is based on a dataset sourced from **CodeAlpha**. This dataset provides detailed measurements of iris flowers and is widely used for understanding and implementing classification techniques in machine learning. The goal is to classify iris flowers into different species based on their physical characteristics, such as sepal and petal dimensions.

**Dataset Description**

The dataset contains the following features:

**SepalLengthCm**: Represents the length of the sepal measured from its base to the tip (in centimeters).

**SepalWidthCm**: Represents the width of the sepal measured at its widest point (in centimeters).

**PetalLengthCm**: Represents the length of the petal measured from its base to the tip (in centimeters).

**PetalWidthCm**: Represents the width of the petal measured at its widest point (in centimeters).

**Species**: The target variable that indicates the species of the iris flower. Each species represents a distinct subgroup of flowers that share similar characteristics and can breed within the same group.
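
As a rough illustration of this schema, the sketch below builds a frame with the same column names from scikit-learn's bundled copy of the Iris data. Using `load_iris` as a stand-in for the project's CSV file is an assumption, and `load_iris_like_csv` is a hypothetical helper name; the bundled copy has no `Id` column and its values may differ slightly from the CSV.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_like_csv() -> pd.DataFrame:
    """Hypothetical helper: mimic the CSV's columns using scikit-learn's bundled data."""
    bunch = load_iris(as_frame=True)
    frame = bunch.data.copy()
    frame.columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
    # Map integer targets back to species names in the CSV's "Iris-<name>" style.
    frame["Species"] = ["Iris-" + str(bunch.target_names[i]) for i in bunch.target]
    return frame


iris = load_iris_like_csv()
print(iris.dtypes)
print(iris["Species"].value_counts())
```

The four measurement columns are floating-point and `Species` holds strings, which is why the target needs to be encoded before training.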

# Objectives of the project

**1. Dataset Selection**: Use the Iris flower dataset containing measurements of sepal and petal dimensions as input features for classification.

**2. Model Training**: Train a machine learning classification model using the given input features to accurately predict the species of Iris flowers.

**3. Simplified Data Access & Model Building**: Utilize user-friendly Python libraries to ensure easy dataset loading, preprocessing, and model development.

**4. Model Evaluation**: Evaluate the trained model using test data to measure its predictive performance.

**5. Confusion Matrix Analysis**: Generate a confusion matrix to analyze the classification results and understand how well the model distinguishes between different species.

**6. Accuracy Measurement**: Assess the overall performance of the model by calculating accuracy scores on both training and testing datasets.

In [3]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
# Read the CSV file (raw string so the backslash in the Windows path is not treated as an escape)
df = pd.read_csv(r"D:\Iris.csv")
df.head()

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa


In [5]:
# Displays the unique classes of the target variable (Species)
df['Species'].unique()

array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)

In [6]:
# Checks the total number of missing values in each column
df.isnull().sum()

Id               0
SepalLengthCm    0
SepalWidthCm     0
PetalLengthCm    0
PetalWidthCm     0
Species          0
dtype: int64

In [7]:
# Filters the dataset to exclude Iris-setosa samples
df = df[df['Species'] != 'Iris-setosa']

In [8]:
# Preview the dataset after excluding the Iris-setosa samples
df.head()

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
50,51,7.0,3.2,4.7,1.4,Iris-versicolor
51,52,6.4,3.2,4.5,1.5,Iris-versicolor
52,53,6.9,3.1,4.9,1.5,Iris-versicolor
53,54,5.5,2.3,4.0,1.3,Iris-versicolor
54,55,6.5,2.8,4.6,1.5,Iris-versicolor


In [9]:
# Encodes the Species column by mapping the full class names to numerical labels
# (the result is only displayed here, not assigned back to df)
df['Species'].map({'Iris-versicolor': 0, 'Iris-virginica': 1})

50     0
51     0
52     0
53     0
54     0
      ..
145    1
146    1
147    1
148    1
149    1
Name: Species, Length: 100, dtype: int64
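
`Series.map` returns `NaN` for any value without a matching dictionary key, so it helps to check for unmapped labels before relying on the encoded column. The sketch below uses a small hand-made Series as a stand-in for `df['Species']` (an assumption), and `label_map` is a hypothetical name for the dictionary.

```python
import pandas as pd

# Toy stand-in for df['Species'] after the setosa rows are removed.
species = pd.Series(
    ["Iris-versicolor", "Iris-virginica", "Iris-versicolor"],
    index=[50, 100, 51],
    name="Species",
)

# Keys must match the stored labels exactly, including the "Iris-" prefix.
label_map = {"Iris-versicolor": 0, "Iris-virginica": 1}
encoded = species.map(label_map)

# Any label missing from label_map shows up as NaN in the mapped result.
unmapped = species[encoded.isna()].unique()
if len(unmapped) > 0:
    raise ValueError(f"Unmapped labels: {list(unmapped)}")

encoded = encoded.astype(int)
print(encoded.tolist())  # prints [0, 1, 0]
```

With keys such as `'versicolor'` instead of `'Iris-versicolor'`, every row would be unmapped and the check would raise instead of silently producing a column of `NaN`.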

In [10]:
# Preview the filtered data (Species is unchanged because the mapping above was not assigned back)
df.head()

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
50,51,7.0,3.2,4.7,1.4,Iris-versicolor
51,52,6.4,3.2,4.5,1.5,Iris-versicolor
52,53,6.9,3.1,4.9,1.5,Iris-versicolor
53,54,5.5,2.3,4.0,1.3,Iris-versicolor
54,55,6.5,2.8,4.6,1.5,Iris-versicolor


In [11]:
# Split the dataset into independent (X) and dependent (y) features
# Note: X still includes the Id column at this point
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

In [12]:
# Independent variables (input features)
X

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm
50,51,7.0,3.2,4.7,1.4
51,52,6.4,3.2,4.5,1.5
52,53,6.9,3.1,4.9,1.5
53,54,5.5,2.3,4.0,1.3
54,55,6.5,2.8,4.6,1.5
...,...,...,...,...,...
145,146,6.7,3.0,5.2,2.3
146,147,6.3,2.5,5.0,1.9
147,148,6.5,3.0,5.2,2.0
148,149,6.2,3.4,5.4,2.3


In [14]:
# Dependent variable (target)
y

50     Iris-versicolor
51     Iris-versicolor
52     Iris-versicolor
53     Iris-versicolor
54     Iris-versicolor
            ...       
145     Iris-virginica
146     Iris-virginica
147     Iris-virginica
148     Iris-virginica
149     Iris-virginica
Name: Species, Length: 100, dtype: object

In [28]:
# Encodes the target variable (Species) into numerical labels using LabelEncoder
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(df['Species'])


In [29]:
# Shows the mapping of class names to encoded values and previews encoded labels
print("Classes:", le.classes_)
print("Encoded Labels:", y[:10])

Classes: ['Iris-versicolor' 'Iris-virginica']
Encoded Labels: [0 0 0 0 0 0 0 0 0 0]
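
To make the encoding explicit, the following sketch shows how a fitted `LabelEncoder` can be inspected and reversed. The three-element `labels` array is made up for illustration and is not taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

labels = np.array(["Iris-virginica", "Iris-versicolor", "Iris-virginica"])

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

# classes_ is sorted, so the position of each name is its integer code.
mapping = {str(name): code for code, name in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform recovers the original names from the integer codes.
restored = encoder.inverse_transform(codes)
print(list(restored))
```

Because the codes follow the sorted class names, `Iris-versicolor` becomes 0 and `Iris-virginica` becomes 1, which matches the printed output above.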


In [43]:
# Import libraries for data manipulation, train-test splitting, logistic regression modeling, and performance evaluation
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report


In [31]:
# Reload the full CSV file (all three species) before training the model
df = pd.read_csv(r"D:\Iris.csv")
df

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa
...,...,...,...,...,...,...
145,146,6.7,3.0,5.2,2.3,Iris-virginica
146,147,6.3,2.5,5.0,1.9,Iris-virginica
147,148,6.5,3.0,5.2,2.0,Iris-virginica
148,149,6.2,3.4,5.4,2.3,Iris-virginica


In [44]:
# Defines independent variables used for predicting the species
X = df[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']]

In [45]:
# Encodes the target variable (Species) into numerical labels using LabelEncoder
le = LabelEncoder()
y = le.fit_transform(df['Species'])

In [46]:
# Splits the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
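
A plain random split does not guarantee that each species is equally represented in the test set. One common option, shown here only as a sketch, is to pass `stratify` to `train_test_split`. The example uses scikit-learn's bundled Iris data as a stand-in for the CSV (an assumption), and the `_demo` variable names are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_demo, y_demo = load_iris(return_X_y=True)

# stratify=y_demo keeps the class proportions the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With 50 samples per class and a 20% test split, each class contributes 10 test rows.
print(np.bincount(y_te))
```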

In [47]:
# Initializes and trains the Logistic Regression model on the training data
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

In [48]:
# Predicts the target labels for the test dataset
y_pred = model.predict(X_test)
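
Beyond predicting the test set, a fitted classifier can also score a single new measurement. The sketch below is an illustration under assumptions: it trains on scikit-learn's bundled Iris data rather than the CSV, and `new_flower` is a hypothetical sample.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

data = load_iris()
clf = LogisticRegression(max_iter=200).fit(data.data, data.target)

# One hypothetical flower: sepal length, sepal width, petal length, petal width (cm).
new_flower = np.array([[5.1, 3.5, 1.4, 0.2]])

predicted_code = clf.predict(new_flower)[0]
probabilities = clf.predict_proba(new_flower)[0]

print("Predicted species:", data.target_names[predicted_code])
for name, p in zip(data.target_names, probabilities):
    print(f"  {name}: {p:.3f}")
```

`predict_proba` returns one probability per class, which shows how confident the model is rather than only the winning label.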

In [49]:
# Evaluates model performance using accuracy score and confusion matrix
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

In [50]:
# Displays the model accuracy and the confusion matrix results
print("Model Accuracy:", accuracy)
print("\nConfusion Matrix:\n", cm)

Model Accuracy: 1.0

Confusion Matrix:
 [[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
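
In the matrix above, rows are the true classes and columns are the predicted classes, in the order of `le.classes_`. A labelled table can make that easier to read. The sketch below uses small made-up labels with one deliberate error (not the notebook's results), and `cm_demo` and `cm_table` are hypothetical names.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

class_names = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

# Small made-up labels with one versicolor sample misclassified as virginica.
y_true = [0, 0, 1, 1, 2, 2]
y_hat = [0, 0, 1, 2, 2, 2]

cm_demo = confusion_matrix(y_true, y_hat, labels=[0, 1, 2])

# Rows are true classes, columns are predicted classes.
cm_table = pd.DataFrame(
    cm_demo,
    index=[f"true {n}" for n in class_names],
    columns=[f"pred {n}" for n in class_names],
)
print(cm_table)
```

Off-diagonal cells count the mistakes; here the single error appears in the `true Iris-versicolor` row under `pred Iris-virginica`.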


In [51]:
# Displays the original class labels learned by the LabelEncoder
print("\nClasses:", le.classes_)


Classes: ['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']


In [52]:
# Prints accuracy, confusion matrix, and classification report for model evaluation
print("Model Accuracy:", accuracy_score(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred, target_names=le.classes_))

Model Accuracy: 1.0

Confusion Matrix:
 [[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]

Classification Report:
                  precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        10
Iris-versicolor       1.00      1.00      1.00         9
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30



In [53]:
# Computes and displays the confusion matrix for the model predictions
cm = confusion_matrix(y_test, y_pred)
print("\nConfusion Matrix:\n", cm)


Confusion Matrix:
 [[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]


In [55]:
# Measures how accurately the model predicts the target classes
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy:", accuracy)

Model Accuracy: 1.0
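
The objectives mention accuracy on both the training and testing data, while the cells above report the test score only. The sketch below shows one way both could be computed, with 5-fold cross-validation added because a 30-sample test set is small. It uses scikit-learn's bundled Iris data as a stand-in for the CSV (an assumption), so the numbers may not match the notebook exactly.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X_all, y_all = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42
)

clf = LogisticRegression(max_iter=200).fit(X_tr, y_tr)

# score() returns the mean accuracy on the data it is given.
train_acc = clf.score(X_tr, y_tr)
test_acc = clf.score(X_te, y_te)
print(f"Train accuracy: {train_acc:.3f}")
print(f"Test accuracy:  {test_acc:.3f}")

# 5-fold cross-validation gives an estimate that depends less on one particular split.
cv_scores = cross_val_score(LogisticRegression(max_iter=200), X_all, y_all, cv=5)
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

A large gap between training and test accuracy would suggest overfitting; similar values on both suggest the model generalizes reasonably on this data.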


In [56]:
# Finalise the project
print("Project execution complete!")

Project execution complete!
