# Flower Species Recognition and Image Classification Analysis

Date: 12/03/2025

Team Members:
- Karrie Butcher
- Nicko Lomelin
- Thanh Tuan Pham

Dataset: https://www.kaggle.com/datasets/alxmamaev/flowers-recognition

The dataset supports classifying flower images into one of five species ('daisy', 'dandelion', 'rose', 'sunflower', or 'tulip') from unstructured visual data: raw RGB pixel intensities that encode shapes, textures, and color patterns. The primary business value of building a predictive model on this data is automated botanical identification. In an agricultural or gardening context, an accurate model enables "intelligent" robotic systems that can distinguish decorative plants (e.g., tulips and roses) from invasive species or weeds (e.g., dandelions). That distinction supports targeted automated weeding, which reduces the need for blanket herbicide application. The same model could also power educational apps that let users instantly identify flora in their environment.

Prediction Task: **Multi-Class Image Classification**

We will build a Convolutional Neural Network (CNN) to predict `flower_class` (the target variable) as one of five species categories from the input image tensor.

---
## 1. Preparation

****
### **1.1 Chosen Metric(s) and Justification**

#### Chosen Metrics: **Macro-Averaged F1-Score** and **Confusion Matrix**

#### Justification: 

We will use the **Macro-Averaged F1-Score** as our primary metric to balance the cost of accidentally spraying crops (Precision) against the cost of missing weeds (Recall), and a **Confusion Matrix** to diagnose specific inter-class errors.

**Why Accuracy is Insufficient:**
We cannot rely on simple accuracy for this task because the dataset is imbalanced (e.g., *Dandelion* has ~1,052 images while *Sunflower* has only ~733). Accuracy counts every image equally, so a model that does well on the larger classes can post a respectable overall score while performing poorly on the smaller ones. This is unacceptable for our objective.

**Why F1-Score (Macro) is Appropriate:**
The F1-Score is the harmonic mean of Precision and Recall. In our proposed business case of **Automated Weeding**, both types of errors carry significant, distinct costs:

* **Precision (Cost of False Positives):** If our model falsely classifies a decorative *Rose* as a weed (*Dandelion*), the automated system might spray herbicide on it, destroying a valuable crop. High precision is required to prevent this economic loss.
* **Recall (Cost of False Negatives):** If our model fails to identify a *Dandelion* (classifying it as a *Rose*), the weed remains and competes for nutrients, reducing yield. High recall is required to ensure effectiveness.
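
For reference, the standard per-class definitions behind these two error types can be written as follows. The symbols $TP_c$, $FP_c$, and $FN_c$ (true positives, false positives, and false negatives for class $c$) are notation introduced here, not taken from the dataset or any library.

$$
\mathrm{Precision}_c = \frac{TP_c}{TP_c + FP_c}, \qquad
\mathrm{Recall}_c = \frac{TP_c}{TP_c + FN_c}, \qquad
F1_c = \frac{2 \cdot \mathrm{Precision}_c \cdot \mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c}
$$

With $c$ = *Dandelion*, a false positive is any other flower flagged as a dandelion (and potentially sprayed), and a false negative is a dandelion that goes undetected.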

We specifically select the **Macro-Average** (calculating F1 for each class independently and then averaging) rather than the Weighted-Average. This ensures that the model's performance on rarer flowers (like Sunflowers) is treated as equally important to its performance on abundant weeds.
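
To make the difference between the two averages concrete, here is a minimal pure-Python sketch. The function names and the toy labels are made up for illustration and are not our model's output; in practice a library routine such as scikit-learn's `f1_score` with `average="macro"` would likely be used instead.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1}, treating each label one-vs-rest."""
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


def weighted_f1(y_true, y_pred, labels):
    """Mean of per-class F1 weighted by how many true samples each class has."""
    scores = per_class_f1(y_true, y_pred, labels)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in labels) / len(y_true)


# Toy data (hypothetical): 8 dandelions, 2 sunflowers, and a degenerate
# model that predicts "dandelion" for every image.
labels = ["dandelion", "sunflower"]
y_true = ["dandelion"] * 8 + ["sunflower"] * 2
y_pred = ["dandelion"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy    = {accuracy:.3f}")
print(f"weighted F1 = {weighted_f1(y_true, y_pred, labels):.3f}")
print(f"macro F1    = {macro_f1(y_true, y_pred, labels):.3f}")
```

On this toy input, accuracy is 0.800 and weighted F1 is about 0.711, while macro F1 drops to about 0.444 because the completely missed sunflower class contributes a full zero to the average.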

**Why Confusion Matrix:**
Finally, we will visualize the results using a Confusion Matrix. This allows us to look "under the hood" to see exactly which pairs of flowers are being confused (e.g., are we consistently confusing *Tulips* with *Roses*?), allowing us to diagnose if specific visual features (like color or shape) are causing model failure.
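
As a rough sketch of what the matrix contains, the snippet below builds one from scratch with rows as true classes and columns as predicted classes, then reports the most frequent off-diagonal confusion. The helper names and the sample predictions are hypothetical and chosen only to illustrate the idea; a plotting library would be layered on top for the actual visualization.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count matrix where matrix[i][j] = samples of true class i predicted as class j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def most_confused_pair(matrix, labels):
    """Return (count, true_label, predicted_label) for the largest off-diagonal cell."""
    best = (0, None, None)
    for i, true_label in enumerate(labels):
        for j, pred_label in enumerate(labels):
            if i != j and matrix[i][j] > best[0]:
                best = (matrix[i][j], true_label, pred_label)
    return best


labels = ["daisy", "dandelion", "rose", "sunflower", "tulip"]

# Hypothetical predictions in which roses are often mistaken for tulips.
y_true = ["rose", "rose", "rose", "tulip", "tulip", "tulip",
          "daisy", "dandelion", "sunflower"]
y_pred = ["rose", "tulip", "tulip", "tulip", "rose", "tulip",
          "daisy", "dandelion", "sunflower"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>10}: {row}")

count, true_label, pred_label = most_confused_pair(matrix, labels)
print(f"Most frequent error: {true_label} predicted as {pred_label} ({count} times)")
```

Reading along a row shows where a species' images end up (recall-type errors); reading down a column shows what gets mistaken for that species (precision-type errors).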


***

### **1.2 Chosen Method for Dividing Data and Justification**

#### Chosen Method: Stratified Shuffle Split (80% Training / 20% Testing)

#### Justification:

We will use a **Stratified Shuffle Split**, allocating 80% of the data for training and 20% for final testing. During model training, we will further hold out a portion of the 80% training split (e.g., 10% of it) as a **validation set** to monitor convergence and detect overfitting early.

**Why Stratified?**
Our dataset is imbalanced (e.g., the *Sunflower* class makes up only ~17% of the data, while *Dandelion* is ~24%). A simple random shuffle runs the risk of creating a Test Set that has disproportionately few Sunflowers. If our Test Set does not contain a representative number of all species, our evaluation will be statistically unreliable. Stratified splitting forces the Train and Test sets to preserve **the same percentage** of each flower class as the original full dataset, up to rounding.
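
The idea can be sketched in a few lines of pure Python: shuffle within each class, then take the same fraction from every class. This is an illustrative from-scratch version, not necessarily the routine we will use (scikit-learn's `train_test_split` with its `stratify` argument does the same job). The function name is hypothetical, and only the dandelion and sunflower counts come from the approximate figures above; the other three counts are placeholders.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, holdout_fraction, seed=0):
    """Return (keep_indices, holdout_indices), preserving class shares up to rounding."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    keep, holdout = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within the class only
        n_holdout = round(len(indices) * holdout_fraction)
        holdout.extend(indices[:n_holdout])
        keep.extend(indices[n_holdout:])
    rng.shuffle(keep)
    rng.shuffle(holdout)
    return keep, holdout


# Approximate counts for dandelion and sunflower; the rest are PLACEHOLDERS.
class_counts = {"daisy": 750, "dandelion": 1052, "rose": 800,
                "sunflower": 733, "tulip": 1000}
labels = [name for name, n in class_counts.items() for _ in range(n)]

# 80% train / 20% test.
train_idx, test_idx = stratified_split(labels, holdout_fraction=0.2, seed=0)

# Carve a validation set (10% of the training split) out of the training indices.
train_labels = [labels[i] for i in train_idx]
fit_rel, val_rel = stratified_split(train_labels, holdout_fraction=0.1, seed=1)
fit_idx = [train_idx[i] for i in fit_rel]
val_idx = [train_idx[i] for i in val_rel]

full_share = Counter(labels)
test_share = Counter(labels[i] for i in test_idx)
for name in class_counts:
    print(f"{name:>10}: full {full_share[name] / len(labels):.3f}"
          f"  test {test_share[name] / len(test_idx):.3f}")
```

The printed class shares for the full dataset and the test set should agree to within rounding, which is exactly the property a plain random shuffle does not guarantee.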

**Why not 10-Fold Cross-Validation?**
While 10-Fold Cross-Validation is standard for lighter algorithms (like SVMs or Decision Trees), it would require training the network ten times from scratch. Given the long training time Convolutional Neural Networks (CNNs) need on image data, that cost is prohibitive for this project.

**Realistic Mirroring:**
This stratified approach is intended to mirror real-world deployment (the "Business Case"). In the field, an automated weeding robot will encounter flowers in their natural frequencies (i.e., more weeds than decorative flowers). Assuming the dataset's class frequencies are a reasonable proxy for those natural frequencies, keeping the same distribution in our Test set means our metrics reflect how the model is likely to perform in the agricultural environment, rather than being an artifact of a skewed random data split.

## 2. Modeling

## 3. Exceptional Work

## 4. Citation