<a href="https://colab.research.google.com/github/MichaelTay/w281-summer-2023-project/blob/mcliston_modeling/logistic_regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fruit and Vegetable Classification
## \# Class activation heatmap for image classification
Taken from: https://www.kaggle.com/code/databeru/fruit-and-vegetable-classification

## \# Grad-CAM class activation visualization

The dataset contains 3861 images of 36 different fruits and vegetables.

![fruit vegetable](https://i.imgur.com/KUAcIQD.jpeg)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MichaelTay/w281-summer-2023-project/blob/main/feature_detection.ipynb)

<h1>Table of contents</h1>


<ul>
<li><a href="#1"><strong>1. Loading and preprocessing</strong></a>
</ul>
    
<ul>
<li><a href="#2"><strong>2. Load the Images with a generator and Data Augmentation</strong></a>
</ul>

<ul>
<li><a href="#3"><strong>3. Train the model</strong></a>
</ul>

<ul>
<li><a href="#4"><strong>4. Visualize the result</strong></a>
</ul>

<ul>
<li><a href="#5"><strong>5. Class activation heatmap for image classification</strong></a>
</ul>

# Context

Image classification of fruits and vegetables has a wide range of applications in nutrition, cooking, farming, and produce wholesale. Identifying the type (fruit or vegetable) and class (which particular fruit or vegetable) is the foundation on which other produce-related techniques can be built, such as quality evaluation, insect infestation detection, ripeness evaluation, sorting, and recipe generation. A well-performing baseline fruit and vegetable classifier therefore opens up many computer vision applications within the produce industry.

# Content
This dataset contains three folders:

- train (100 images each)
- test (10 images each)
- validation (10 images each)

Each of the above folders contains subfolders for the different fruits and vegetables, and each subfolder holds the images of the respective food item.

This dataset contains images of the following food items:

- **fruits**: banana, apple, pear, grapes, orange, kiwi, watermelon, pomegranate, pineapple, mango
- **vegetables**: cucumber, carrot, capsicum, onion, potato, lemon, tomato, radish, beetroot, cabbage, lettuce, spinach, soy bean, cauliflower, bell pepper, chilli pepper, turnip, corn, sweetcorn, sweet potato, paprika, jalapeño, ginger, garlic, peas, eggplant

To balance the class distribution between fruits and vegetables, we chose a subset of 10 of the 26 vegetable classes listed above. The following classes were analyzed as part of this project:

Fruits = ['banana', 'apple', 'pear', 'grapes', 'orange', 'kiwi', 'watermelon', 'pomegranate', 'pineapple', 'mango']
Vegetables = ['bell pepper', 'cauliflower', 'chilli pepper', 'peas', 'corn', 'spinach', 'turnip', 'garlic', 'ginger', 'cabbage']
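
The class selection above amounts to a simple label filter. The sketch below is illustrative only: `select_classes` and the sample rows are hypothetical and are not part of the original notebook.

```python
# Classes analyzed in this project (copied from the lists above)
FRUITS = ['banana', 'apple', 'pear', 'grapes', 'orange', 'kiwi',
          'watermelon', 'pomegranate', 'pineapple', 'mango']
VEGETABLES = ['bell pepper', 'cauliflower', 'chilli pepper', 'peas', 'corn',
              'spinach', 'turnip', 'garlic', 'ginger', 'cabbage']
SELECTED = set(FRUITS) | set(VEGETABLES)


def select_classes(rows, selected=SELECTED):
    """Keep only (filepath, label) pairs whose label is in the selected set.

    Hypothetical helper for illustration; not part of the notebook.
    """
    return [(path, label) for path, label in rows if label in selected]


# Made-up example rows: 'carrot' is not among the selected classes
rows = [('train/apple/Image_1.jpg', 'apple'),
        ('train/carrot/Image_2.jpg', 'carrot'),
        ('train/garlic/Image_3.jpg', 'garlic')]
kept = select_classes(rows)
print(len(SELECTED), [label for _, label in kept])
```

With 10 fruit and 10 vegetable classes, the selected set has 20 labels, which matches the 20 rows of the classification report further down.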



In [1]:
#importing required libraries
from skimage.io import imread
from skimage.transform import resize
from skimage.feature import hog
from skimage import exposure
import matplotlib.pyplot as plt
# Load the Drive helper and mount
#from google.colab import drive
import xarray as x

# 1. Loading and preprocessing<a class="anchor" id="1"></a>

In [2]:
import os

# Load the Drive helper and mount
from google.colab import drive

# This will prompt for authorization.
mountdir = '/content/drive'
drive.mount(mountdir, force_remount=True)

localdir = mountdir + '/MyDrive'
# Replace your folder here
w281_directory = '/Berkeley/w281/Fruit-and-Vegetable-Classification/'
inputdir = localdir + w281_directory
# Uncomment below if using local folder
# inputdir = "/Users/mcliston/Library/CloudStorage/GoogleDrive-michael.c.liston@gmail.com/My Drive/Berkeley/w281/Fruit-and-Vegetable-Classification/"

Mounted at /content/drive


In [3]:
import numpy as np
import pandas as pd
from pathlib import Path
import os.path
import matplotlib.pyplot as plt
import cv2
#import tensorflow as tf

# Create a list with the filepaths for training and testing
train_dir = Path(inputdir, './input/train')
train_filepaths = list(train_dir.glob(r'**/*.jpg'))

test_dir = Path(inputdir, './input/test')
test_filepaths = list(test_dir.glob(r'**/*.jpg'))

val_dir = Path(inputdir, './input/validation')
val_filepaths = list(val_dir.glob(r'**/*.jpg'))

def proc_img(filepath):
    """ Create a DataFrame with the filepath and the labels of the pictures
    """

    labels = [str(filepath[i]).split("/")[-2] \
              for i in range(len(filepath))]

    filepath = pd.Series(filepath, name='Filepath').astype(str)
    labels = pd.Series(labels, name='Label')

    # Concatenate filepaths and labels
    df = pd.concat([filepath, labels], axis=1)

    # Shuffle the DataFrame and reset index
    df = df.sample(frac=1).reset_index(drop = True)

    return df

train_df = proc_img(train_filepaths)
test_df = proc_img(test_filepaths)
val_df = proc_img(val_filepaths)
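
The `split("/")` call in `proc_img` relies on forward-slash separators, which holds on Colab but not on every platform. As a minimal sketch of a portable alternative, the label can be read from the parent folder name with `pathlib`. The helper name `label_from_path` and the example file names are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath


def label_from_path(filepath):
    """Return the name of the folder that directly contains the image."""
    return filepath.parent.name


# Made-up example paths in POSIX and Windows notation
posix_path = PurePosixPath('input/train/bell pepper/Image_1.jpg')
windows_path = PureWindowsPath(r'input\train\bell pepper\Image_1.jpg')

print(label_from_path(posix_path))
print(label_from_path(windows_path))
```

The same function works on the `Path` objects returned by `glob`, since they expose the same `parent` and `name` attributes.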

### Loading Training/Validation/Test sets

In [9]:
modeling_datasets = inputdir + 'modeling/'

In [18]:
!ls drive/MyDrive/Berkeley/w281/Fruit-and-Vegetable-Classification/modeling/train

train_hog_features.csv	      train_saturation_features.csv
train_hue_features.csv	      train_sobel_x_features.csv
train_laplacian_features.csv  train_sobel_y_features.csv
train_luminance_features.csv


In [19]:
train_hog = pd.read_csv(modeling_datasets+'train/train_hog_features.csv')
validation_hog = pd.read_csv(modeling_datasets+'validation/validation_hog_features.csv')
test_hog = pd.read_csv(modeling_datasets+'test/test_hog_features.csv')

In [52]:
test_hog = test_hog.drop('Unnamed: 0', axis=1)

In [31]:
full_train = pd.concat([train_hog, validation_hog], axis=0)
full_train = full_train.reset_index().drop(['Unnamed: 0', 'index'], axis=1)
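
The `Unnamed: 0` column dropped above is typically the DataFrame index that `to_csv` wrote out. A minimal sketch, assuming that is the case for these feature files (the tiny CSV below is made up), shows how `index_col=0` avoids the extra column at load time and how `ignore_index=True` replaces the `reset_index` step:

```python
import io

import pandas as pd

# Made-up CSV with an unnamed first column, as produced by DataFrame.to_csv()
csv_text = ",f0,f1,label\n0,0.1,0.2,apple\n1,0.3,0.4,banana\n"

with_extra = pd.read_csv(io.StringIO(csv_text))
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(with_extra.columns))  # includes 'Unnamed: 0'
print(list(clean.columns))       # feature columns and label only

# Stack two frames and renumber the rows in one step
combined = pd.concat([clean, clean], axis=0, ignore_index=True)
print(combined.shape)
```

Keeping `label` as the last column matters here, because the modeling cells below select features with `iloc[:, 0:-1]`.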

## Logistic Regression

In [58]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import (roc_curve,
                             auc, RocCurveDisplay,
                             classification_report,
                             confusion_matrix,accuracy_score)
from sklearn.model_selection import StratifiedKFold

from sklearn.metrics import f1_score, make_scorer
from hyperopt import tpe, hp, fmin, STATUS_OK,Trials
from hyperopt.pyll.base import scope
from tqdm import tqdm

def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn

def confusion_matrix_plot(conf_matrix, save_name='file.png'):
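    """ Plot a confusion matrix with matplotlib
    # param conf_matrix: 2-D NumPy array of counts (rows: true labels, columns: predicted labels)
    # param save_name: output file name (only used if the savefig line is uncommented)
    # return: None
    """

    # 'normal' is not a font family name; use a generic family instead
    font = {'family' : 'sans-serif',
        'size'   : 14}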
    fig, ax = plt.subplots(figsize=(2.5, 2.5))
    ax.matshow(conf_matrix, cmap=plt.cm.Blues, alpha=0.3)
    for i in range(conf_matrix.shape[0]):
        for j in range(conf_matrix.shape[1]):
            ax.text(x=j, y=i, s=conf_matrix[i, j], va='center', ha='center', **font)

    plt.xlabel('Predicted label')
    plt.ylabel('True label')

    plt.tight_layout()
    # plt.savefig(save_name, facecolor='white')

##### Bayesian Parameter Search

In [45]:
space = {
            'fit_intercept' : hp.choice('fit_intercept', [True, False]),
            'tol' : hp.uniform('tol', 0.00001, 0.0001),
            'C' : hp.uniform('C', 0.05, 3),
            'solver' : hp.choice('solver', ['newton-cg', 'lbfgs', 'liblinear']),
            'max_iter' : hp.choice('max_iter', range(100,1000)),
            # 'scale': hp.choice('scale', [0, 1]),
            'warm_start' : hp.choice('warm_start', [True, False]),
            'multi_class' : 'auto',
            'class_weight' : 'balanced'
                }

In [43]:
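def optimize_lr(params):
    """Hyperopt objective: 1 - mean 10-fold cross-validated accuracy."""

    skf = StratifiedKFold(n_splits=10)
    clf = LogisticRegression(**params, n_jobs=-1)
    # Features are all columns except the last (assumes 'label' is the last column)
    mean_accuracy = cross_val_score(clf, full_train.iloc[:, 0:-1], full_train['label'],
                         scoring=make_scorer(accuracy_score),
                        cv=skf).mean()
    # hyperopt minimizes the loss, so convert accuracy into an error rate
    loss = 1 - mean_accuracy
    return {"loss":loss, "status":STATUS_OK}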

In [48]:
RANDOM_SEED = 1234
trials = Trials()

best = fmin(
    fn=optimize_lr,
    space=space,
    algo=tpe.suggest,
    max_evals=20,
    trials=trials,
    rstate=np.random.default_rng(RANDOM_SEED)
)

print("Best: {}".format(best))

100%|██████████| 20/20 [18:29<00:00, 55.47s/trial, best loss: 0.788961038961039]
Best: {'C': 2.35920464457673, 'fit_intercept': 0, 'max_iter': 596, 'solver': 1, 'tol': 3.1488465609992565e-05, 'warm_start': 1}
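
For parameters defined with `hp.choice`, `fmin` reports the index of the chosen option rather than the option itself, which is why `solver` appears as `1` above. hyperopt provides `space_eval(space, best)` to decode such a result. The sketch below does the same lookup by hand in plain Python so that it runs without hyperopt installed; `CHOICES` and `decode_best` are hypothetical names introduced for illustration.

```python
# Options in the same order as the hp.choice calls in the search space
CHOICES = {
    'fit_intercept': [True, False],
    'solver': ['newton-cg', 'lbfgs', 'liblinear'],
    'max_iter': list(range(100, 1000)),
    'warm_start': [True, False],
}

# Result printed by fmin above
best = {'C': 2.35920464457673, 'fit_intercept': 0, 'max_iter': 596,
        'solver': 1, 'tol': 3.1488465609992565e-05, 'warm_start': 1}


def decode_best(best, choices=CHOICES):
    """Map hp.choice indices back to their values; pass other values through."""
    return {name: choices[name][value] if name in choices else value
            for name, value in best.items()}


decoded = decode_best(best)
print(decoded)
```

Under this reading, the index 596 reported for `max_iter` corresponds to 696 iterations, because the options start at 100.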


In [53]:
params = {'C': 2.35920464457673,
          'fit_intercept': True,
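          # fmin reports hp.choice results as indices, so 596 is an index into
          # range(100, 1000), i.e. 696; 596 is kept because it produced the results below
          'max_iter': 596,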
          'solver': 'lbfgs',
          'tol': 3.1488465609992565e-05,
          'warm_start': False}

hog_model = LogisticRegression(**params, n_jobs=-1)
hog_model.fit(full_train.iloc[:,0:-1], full_train['label'])

y_pred = hog_model.predict(test_hog.iloc[:,0:-1])


In [55]:
accuracy_score(test_hog['label'], y_pred)

0.47593582887700536

In [56]:
print(classification_report(test_hog['label'], y_pred))

               precision    recall  f1-score   support

        apple       0.00      0.00      0.00         9
       banana       1.00      0.44      0.62         9
  bell pepper       0.17      0.11      0.13         9
      cabbage       0.43      0.60      0.50        10
  cauliflower       1.00      0.22      0.36         9
chilli pepper       1.00      0.71      0.83         7
         corn       0.71      0.50      0.59        10
       garlic       0.54      0.70      0.61        10
       ginger       0.00      0.00      0.00        10
       grapes       0.35      0.75      0.48         8
         kiwi       0.38      0.80      0.52        10
        mango       0.50      0.40      0.44        10
       orange       0.00      0.00      0.00         7
         pear       1.00      0.60      0.75        10
         peas       0.47      0.78      0.58         9
    pineapple       0.26      0.80      0.39        10
  pomegranate       0.40      0.40      0.40        10
      spi
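
To make the report columns concrete, the sketch below recomputes precision, recall, and F1 for one class from raw counts. The counts for `banana` are inferred from the report itself (support 9 and recall 0.44 imply 4 true positives and 5 false negatives; precision 1.00 implies 0 false positives). The helper name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from counts, returning 0.0 on empty denominators."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# banana: 4 true positives, 0 false positives, 5 false negatives (inferred counts)
p, r, f1 = precision_recall_f1(tp=4, fp=0, fn=5)
print(f"{p:.2f} {r:.2f} {f1:.2f}")
```

The rounded values agree with the `banana` row of the report. The zero guard covers the case where a class has no correct predictions, which is how rows with 0.00 in every column arise.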

In [60]:
conf_matrix = confusion_matrix(test_hog['label'], y_pred)
conf_matrix

array([[ 0,  0,  2,  3,  0,  0,  0,  0,  0,  2,  0,  0,  0,  0,  1,  0,
         0,  0,  1,  0],
       [ 0,  4,  0,  0,  0,  0,  0,  0,  0,  0,  1,  0,  0,  0,  1,  2,
         0,  0,  1,  0],
       [ 0,  0,  1,  2,  0,  0,  0,  0,  0,  3,  2,  0,  0,  0,  0,  0,
         0,  0,  1,  0],
       [ 0,  0,  0,  6,  0,  0,  0,  0,  0,  0,  0,  3,  0,  0,  0,  0,
         0,  0,  0,  1],
       [ 0,  0,  0,  0,  2,  0,  0,  0,  0,  0,  1,  0,  0,  0,  0,  3,
         1,  0,  1,  1],
       [ 0,  0,  0,  1,  0,  5,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  5,  0,  0,  0,  2,  0,  0,  0,  0,  1,
         0,  0,  2,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  7,  0,  1,  0,  0,  0,  0,  2,  0,
         0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  1,  0,  1,  2,  0,  0,  0,  1,  2,
         1,  0,  0,  2],
       [ 0,  0,  0,  0,  0,  0,  0,  1,  0,  6,  0,  0,  0,  0,  0,  1,
         0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0