Iris Flower Classification

1. Importing Required Libraries

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB


2. Load the Dataset

The dataset contains four numerical features (sepal and petal measurements) and a target column, species, that identifies one of three flower types.

In [None]:
df = pd.read_csv('/content/drive/MyDrive/IRIS.csv')
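
Before going further, it can help to confirm that the data really has the shape described above. The sketch below does not read IRIS.csv; it assumes scikit-learn's bundled copy of the Iris data is an acceptable stand-in, and the name demo_df and the renamed species column are illustrative only. The column names in the actual CSV may differ.

```python
from sklearn.datasets import load_iris

# Stand-in for IRIS.csv (assumption): scikit-learn's bundled Iris data.
iris = load_iris(as_frame=True)
demo_df = iris.frame.rename(columns={'target': 'species'})

# Replace the integer class codes with readable species names.
demo_df['species'] = iris.target_names[demo_df['species'].to_numpy()]

print(demo_df.shape)                      # rows, columns
print(demo_df.dtypes)                     # four numeric columns plus the label
print(demo_df['species'].value_counts())  # samples per class
```

The same three calls (shape, dtypes, value_counts) can be run on df after loading the CSV.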


3. Handle Missing Values

If there are missing values in the numerical columns, we fill them using the column mean.

For the categorical column (species), we fill missing values with the most frequent value (the mode).

In [None]:
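# Numeric columns: replace NaN with the column mean.
df = df.fillna(df.mean(numeric_only=True))
# Target column: replace missing labels with the most frequent species.
if df['species'].isnull().sum() > 0:
    df['species'] = df['species'].fillna(df['species'].mode().iloc[0])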
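
To see what the two fill rules do, here is a minimal sketch on a small hand-made frame. The frame toy, its values, and its column names are hypothetical and only stand in for IRIS.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; values and column names are assumptions.
toy = pd.DataFrame({
    'sepal_length': [5.1, np.nan, 6.3, 5.8],
    'petal_length': [1.4, 4.7, np.nan, 5.1],
    'species': ['Iris-setosa', 'Iris-versicolor', None, 'Iris-versicolor'],
})

# Count missing values per column before imputing.
missing_before = toy.isnull().sum()

# Numeric columns: replace NaN with the column mean.
toy = toy.fillna(toy.mean(numeric_only=True))

# Categorical column: replace missing labels with the most frequent value.
toy['species'] = toy['species'].fillna(toy['species'].mode().iloc[0])

missing_after = toy.isnull().sum()
print(missing_before)
print(missing_after)
print(toy)
```

Checking isnull().sum() before and after is a quick way to confirm that no gaps remain.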

4. Split Features and Target

We separate:

X: input features (sepal and petal measurements)

y: target labels (species type)



In [None]:
X = df.drop('species', axis=1)
y = df['species']


5. Split Dataset into Training and Testing

We split the dataset into:

80% training data

20% testing data

Setting random_state=42 fixes the random seed, so the same split is produced on every run.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
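
With only about 30 test samples, a purely random split can leave the three species unevenly represented in the test set. One option, not used in the cell above, is the stratify argument of train_test_split. The sketch below assumes scikit-learn's bundled Iris data as a stand-in for IRIS.csv; X_demo, y_demo and the split names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Stand-in for IRIS.csv (assumption): scikit-learn's bundled Iris data.
iris = load_iris(as_frame=True)
X_demo = iris.data
y_demo = iris.target

# stratify=y_demo keeps the class proportions equal in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print(y_te.value_counts().sort_index())  # test samples per class
```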


6. Standardize the Features

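We scale the features so that each has mean 0 and standard deviation 1 on the training data. This matters most for models that depend on distances or gradient-based optimization (KNN, SVM, Logistic Regression); tree-based models are largely unaffected by scaling. The scaler is fitted on the training set only and then applied to the test set, so no information from the test data leaks into training.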

In [None]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
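
A quick sanity check of the scaling step is sketched below, again assuming scikit-learn's bundled Iris data as a stand-in for IRIS.csv; demo_scaler and the other names are illustrative. The training columns should come out with mean 0 and standard deviation 1, while the test columns will only be close to that, because they are transformed with statistics learned from the training set.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in for IRIS.csv (assumption): scikit-learn's bundled Iris data.
X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, _, _ = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

demo_scaler = StandardScaler()
X_tr_scaled = demo_scaler.fit_transform(X_tr)  # learn mean/std from training data
X_te_scaled = demo_scaler.transform(X_te)      # reuse the training statistics

print(np.round(X_tr_scaled.mean(axis=0), 6))  # per-feature means of training data
print(np.round(X_tr_scaled.std(axis=0), 6))   # per-feature std of training data
print(np.round(X_te_scaled.mean(axis=0), 3))  # test means: near 0, not exactly 0
```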


7. Define and Train Multiple Models

We define a dictionary containing six different classification models to compare their performance on the same data.

In [None]:
models = {
    'Logistic Regression': LogisticRegression(max_iter=200),
    'KNN': KNeighborsClassifier(),
    'SVM': SVC(),
    'Random Forest': RandomForestClassifier(random_state=42),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'Naive Bayes': GaussianNB()
}


8. Evaluate Model Accuracy

We train each model and evaluate it using accuracy score. This helps us identify which model performs best on the test data.

In [None]:
print("📊 Model Performance Summary:\n")
for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    y_pred = model.predict(X_test_scaled)
    acc = accuracy_score(y_test, y_pred)
    print(f" {name}: {acc:.4f} accuracy")
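
Accuracy is a single number and hides per-class behaviour. The classification_report function imported in section 1 gives precision, recall and F1 for each species. The sketch below is one way to use it for a single model, assuming scikit-learn's bundled Iris data as a stand-in for IRIS.csv; clf, y_hat and the other names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in for IRIS.csv (assumption): scikit-learn's bundled Iris data.
iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Same preprocessing as in the notebook: fit on train, apply to test.
demo_scaler = StandardScaler()
X_tr_s = demo_scaler.fit_transform(X_tr)
X_te_s = demo_scaler.transform(X_te)

clf = LogisticRegression(max_iter=200).fit(X_tr_s, y_tr)
y_hat = clf.predict(X_te_s)

# Per-class precision, recall, F1 and support.
report = classification_report(y_te, y_hat, target_names=iris.target_names)
print(report)
```

In the notebook itself, the equivalent call inside the loop would be classification_report(y_test, y_pred).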


9. Predict on New Input

We give a new flower's measurements as input and use the trained Random Forest model to predict its species. The input is scaled using the same StandardScaler used earlier.

In [None]:
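# Wrap the input in a DataFrame with the training column names so the scaler
# does not warn about missing feature names.
new_input = pd.DataFrame([[5.8, 2.6, 4, 1.2]], columns=X.columns)
new_input_scaled = scaler.transform(new_input)
best_model = models['Random Forest']
prediction = best_model.predict(new_input_scaled)
print(f"\n🌸 Predicted species for input {new_input.iloc[0].tolist()}: {prediction[0]}")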
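
Because every new input must pass through the same scaler before prediction, one alternative is to bundle the scaler and the model into a scikit-learn Pipeline so the scaling step cannot be forgotten. This is a minimal sketch rather than the notebook's method: it assumes scikit-learn's bundled Iris data as a stand-in for IRIS.csv and, for brevity, fits on all samples instead of the training split. The names pipe and new_flower are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for IRIS.csv (assumption): scikit-learn's bundled Iris data.
iris = load_iris(as_frame=True)
X_demo = iris.data
y_demo = iris.target_names[iris.target.to_numpy()]  # species names as labels

# The pipeline scales inputs and then classifies them in one call.
pipe = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=42))
pipe.fit(X_demo, y_demo)

# New measurements, using the same column names as the training data.
new_flower = pd.DataFrame([[5.8, 2.6, 4.0, 1.2]], columns=X_demo.columns)
pred = pipe.predict(new_flower)
proba = pipe.predict_proba(new_flower)

print(pred[0])
print(dict(zip(pipe.classes_, proba[0].round(3))))  # class probabilities
```

predict_proba also shows how confident the model is, which a bare predict call does not.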
