# Supervised Learning Project: Flower Species Classification

**Project Description**

The objective of this project is to develop a machine learning model that classifies images of flowers into their respective species based on visual features. The dataset comprises 210 images, each 128x128 pixels, covering 10 species of flowering plants. The images are in .png format, and the species labels are provided as integers in a separate file, flower_labels.csv.

**Objectives**

- Classify each flower image into one of 10 species.
- Utilize supervised learning techniques to learn from labeled data.

# EDA Procedure

**Data Collection**

- The dataset comprises 210 images of flowers in .png format, sized 128x128 pixels, across 10 species.
- Labels are provided in a separate flower_labels.csv file.

**Data Inspection**

- Checked image quality and resolution.
- Ensured labels match species count and image specifications.
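
The inspection checks above could be automated along the following lines. This is a minimal sketch, not the notebook's actual inspection code: the helper name `inspect_dataset` and its parameters are hypothetical, and the data is a synthetic stand-in for the real images.

```python
import numpy as np

def inspect_dataset(images, labels, expected_size=(128, 128), n_classes=10):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Every image needs exactly one label.
    if len(images) != len(labels):
        problems.append(f"{len(images)} images but {len(labels)} labels")
    # Compare only height and width so the channel count does not matter here.
    for idx, image in enumerate(images):
        if image.shape[:2] != expected_size:
            problems.append(f"image {idx} is {image.shape[0]}x{image.shape[1]}")
    # The number of distinct labels should equal the number of species.
    found = len(np.unique(labels))
    if found != n_classes:
        problems.append(f"found {found} distinct labels, expected {n_classes}")
    return problems

# Synthetic stand-in: 20 correctly sized images plus one that is too small.
images = [np.zeros((128, 128, 3)) for _ in range(20)] + [np.zeros((64, 64, 3))]
labels = np.arange(21) % 10
print(inspect_dataset(images, labels))
```

Returning a list of problems rather than raising on the first one makes it easier to see every issue in a single pass over the data.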

**Data Preprocessing**

- The images in this dataset are already a uniform size; a resize step was added so the same code can be reused with other datasets.
- Normalized pixel values to the [0, 1] range to aid model training.
- Split data into training (80%) and testing (20%) sets.
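
The normalization and split steps can be sketched as follows. The helper `normalize_pixels` is hypothetical, the images are random 8x8 stand-ins, and the balanced 21-images-per-species layout is an assumption made only for illustration. The `stratify` argument is an optional refinement that keeps class proportions equal across the two sets; the notebook's own split does not use it.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def normalize_pixels(images: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values from [0, 255] to floats in [0.0, 1.0]."""
    return images.astype(np.float32) / 255.0

# Synthetic stand-in: 210 small RGB images, 21 per species (assumed balance).
rng = np.random.default_rng(0)
raw_images = rng.integers(0, 256, size=(210, 8, 8, 3), dtype=np.uint8)
labels = np.repeat(np.arange(10), 21)

scaled = normalize_pixels(raw_images)
X = scaled.reshape(len(scaled), -1)  # one flat feature row per image

# 80/20 split; stratify=labels keeps each species' share similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42, stratify=labels
)
print(X_train.shape, X_test.shape)
```

With 210 images, a 20% test split leaves 168 images for training and 42 for testing.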

**Data Visualization**

- Displayed sample images from each species to understand variations within classes.
- Analyzed label distribution to identify any class imbalances.
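
One simple way to examine the label distribution is to count the images per class and compare the largest class with the smallest. The sketch below uses hypothetical helper names and a toy label array rather than the real flower labels.

```python
import numpy as np

def class_distribution(labels: np.ndarray) -> dict:
    """Return {class_label: count} for an array of integer labels."""
    values, counts = np.unique(labels, return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}

def imbalance_ratio(labels: np.ndarray) -> float:
    """Largest class count divided by the smallest; 1.0 means perfectly balanced."""
    counts = np.array(list(class_distribution(labels).values()))
    return float(counts.max() / counts.min())

# Toy labels: class 0 appears 3 times, class 1 twice, class 2 four times.
example_labels = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
print(class_distribution(example_labels))
print(imbalance_ratio(example_labels))
```

A ratio well above 1.0 would suggest using a stratified split or class weights.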

# Analysis

**Model Building and Training**

**Feature Extraction**
- Utilized raw pixel values as features.
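
Using raw pixels as features means flattening each image into a single row. The sketch below shows the resulting feature count, assuming three colour channels (RGB); PNG files with an alpha channel would have four channels and therefore more features per image.

```python
import numpy as np

# Hypothetical batch: 4 images of 128x128 pixels with 3 colour channels (RGB assumed).
images = np.zeros((4, 128, 128, 3), dtype=np.float32)

# Flatten each image into one row; -1 lets NumPy infer the row length.
X = images.reshape(images.shape[0], -1)

print(X.shape)  # (4, 49152), since 128 * 128 * 3 = 49152
```

Each image contributes tens of thousands of features while the dataset has only 210 samples, which is one reason a raw-pixel model may struggle to generalize.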

**Model Selection**
- Chose RandomForestClassifier for its simplicity and because it works directly on the flattened pixel vectors without further feature engineering.

**Training**
- Trained the model on the training set, using cross-validation to tune hyperparameters (the notebook cell in the Results section shows only the final fit with `n_estimators=100`).
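
A cross-validated hyperparameter search of this kind might look like the sketch below. It is an illustration only: the parameter grid, the 3-fold setting, and the synthetic data from `make_classification` are all assumptions, not the values used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the training set (168 samples, 3 classes).
X_train, y_train = make_classification(
    n_samples=168, n_features=40, n_informative=10,
    n_classes=3, random_state=42,
)

# Hypothetical grid; real candidate values would depend on the data.
param_grid = {"n_estimators": [25, 50], "max_depth": [None, 5]}

# Stratified folds keep the class proportions similar in every fold.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=cv,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print(search.best_params_)
print(f"Mean cross-validated accuracy: {search.best_score_:.2f}")
```

Because the search only ever sees the training set, the held-out test set remains an unbiased check of the chosen configuration.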

**Evaluation**
- Assessed the model's performance on the test set using accuracy as the primary metric.


# Results

**Model Performance**

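- The RandomForestClassifier achieved an accuracy of about 55% on the test set. The notebook prints 0.55, rounded to two decimals; with a 42-image test set (20% of 210) this would correspond to 23 correct predictions.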
- Performance metrics like precision, recall, and F1-score were calculated for each species.
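
Per-class precision, recall, and F1-score can be obtained from scikit-learn as sketched below. The label arrays are a toy example with three classes, not the project's predictions.

```python
from sklearn.metrics import classification_report, precision_recall_fscore_support

# Toy ground truth and predictions for three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# One value per class, in the order given by `labels`.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2], zero_division=0
)
for cls, p, r, f in zip([0, 1, 2], precision, recall, f1):
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")

# The same numbers as a formatted text table.
print(classification_report(y_true, y_pred, zero_division=0))
```

Reading these per-class values alongside overall accuracy shows which species the model handles well and which it misses.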

**Error Analysis**

- Misclassifications were examined to understand potential model weaknesses, such as confusion between visually similar species.
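
One way to find which species are confused with each other is to look at the largest off-diagonal cells of the confusion matrix. The helper `most_confused_pairs` below is a hypothetical sketch, run here on toy labels.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def most_confused_pairs(y_true, y_pred, top_k=3):
    """Return up to top_k (true, predicted, count) tuples for the largest errors."""
    cm = confusion_matrix(y_true, y_pred)
    # confusion_matrix orders rows and columns by the sorted unique labels.
    classes = np.unique(np.concatenate([y_true, y_pred]))
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)  # ignore correct predictions
    order = np.argsort(off_diag, axis=None)[::-1][:top_k]
    pairs = []
    for flat_index in order:
        i, j = np.unravel_index(flat_index, off_diag.shape)
        if off_diag[i, j] > 0:  # skip cells with no errors
            pairs.append((int(classes[i]), int(classes[j]), int(off_diag[i, j])))
    return pairs

y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 1, 2, 0, 2])
print(most_confused_pairs(y_true, y_pred))
```

In this toy example class 0 is predicted as class 1 twice and class 2 is predicted as class 0 once, so those two pairs are reported.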

```python
import os

import numpy as np
import pandas as pd
from skimage.io import imread
from skimage.transform import resize
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Label file with one row per image: a 'file' column and an integer 'label' column.
labels_df = pd.read_csv('./flower_labels.csv')

image_folder = './flower_images'

# Target (height, width) for every image.
image_size = (128, 128)


def load_images_and_labels(image_folder, labels_df, image_size):
    """Load every image listed in labels_df and return (images, labels) arrays."""
    images = []
    labels = labels_df['label'].values
    for file in labels_df['file']:
        image_path = os.path.join(image_folder, file)
        image = imread(image_path)
        # resize() also converts integer pixel values to floats in [0, 1],
        # which serves as the normalization step.
        image = resize(image, image_size, anti_aliasing=True)
        images.append(image)
    return np.array(images), labels


images, labels = load_images_and_labels(image_folder, labels_df, image_size)

# Flatten each image into a single feature vector of raw pixel values.
X = images.reshape(images.shape[0], -1)
# Map the species labels to consecutive integers starting at 0.
y = LabelEncoder().fit_transform(labels)

# Hold out 20% of the images for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Model accuracy: {accuracy:.2f}")
```

```
Model accuracy: 0.55
```


# Discussion and Conclusion

**Discussion**

- The RandomForestClassifier provided a usable baseline for flower species classification: about 55% test accuracy, against roughly 10% expected from random guessing over 10 classes (assuming balanced classes).
- The dataset contains only 210 images. Adding more training images would likely improve accuracy.
- Identified challenges included distinguishing between species with similar color patterns and shapes.
- A model better suited to image data could yield more accurate classification results.


**Conclusion**

- The supervised learning approach learned to separate flower species from images well above chance level, although roughly 55% accuracy leaves substantial room for improvement.
- With higher accuracy, a model of this kind could support tasks such as botanical research and educational tools for recognizing and classifying flowers.

**Future Work**
- Explore deep learning techniques, specifically Convolutional Neural Networks (CNNs), for potentially higher accuracy and better feature extraction from images.
- Add more training data and extend the dataset to more plant species.
- Consider the integration of this model into a mobile app for real-time flower species classification.