<a href="https://colab.research.google.com/github/umair594/VirtualInternship-Rhombix_Technologies/blob/main/Leaf_Disease_Detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Leaf Disease Detection Using Deep Learning**

Our "Leaf Disease Detection Using Deep Learning" project uses advanced CNN models to spot plant diseases directly from leaf images. This tool helps farmers catch issues early, reducing crop losses and boosting yields. With image enhancements and data augmentation, we’ve achieved high accuracy, making it a reliable solution for healthier crops and a great example of deep learning’s potential in agriculture.

# **Project Overview**

This project addresses the problem of diagnosing and categorizing leaf diseases using deep learning methods.

It is aimed at farmers who want to protect their crops and at gardeners who want to check the health of their plants, and it makes early detection simpler for both. Using convolutional neural network (CNN) models such as VGG16, VGG19, and EfficientNet-B4, the project classifies diseases across different plant species from leaf images, with the goal of identifying diseases correctly and reducing crop losses.

# **Prerequisites**

Before starting this project, you should have the following:

Python programming proficiency (basic to intermediate level).

Understanding of the core principles of Machine Learning and Deep Learning.

Experience with TensorFlow and Keras for building and developing models.

Competence in Google Colab to execute the code and retrieve the datasets.

Basic knowledge of image processing and convolutional neural networks (CNNs).



# **Approach**

This project uses transfer learning to detect leaf diseases from images. We used three pre-trained models: VGG16, VGG19, and EfficientNet-B4. The models are trained on thousands of images of diseased and healthy leaves, from which they learn to distinguish between the different categories.


The ultimate objective is to make disease diagnosis easy and alleviate the burden of plant diseases through the use of artificial intelligence.

# **Workflow and Methodology details**

The broad overview of this project is as follows:

Data Collection: First, we gathered datasets of images of healthy and infected leaf samples.

Data Preprocessing: The images were normalized and then divided into training and validation sets.

Model Building: We applied transfer learning using the pre-trained VGG16, VGG19, and EfficientNet-B4 models.

Training the Model: The models were then trained on the training set to diagnose leaf diseases.

Evaluation: We tested the models and assessed their performance.

# **The methodology includes:**

CNNs for Feature Extraction: Convolutional layers extract the important features from the images.

Transfer Learning: Pre-trained models (VGG16, VGG19, EfficientNet-B4) reduce the time needed to train the target model.

Data Augmentation: Augmentation methods applied to the dataset improve model performance.
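To make the idea of augmentation concrete, the following is a minimal sketch of two common transforms (a horizontal flip and a 90-degree rotation) applied to a tiny image stored as nested lists. The helper names `flip_horizontal`, `rotate_90_clockwise`, and `augment` are hypothetical and are not part of this notebook; in practice a library such as Keras or OpenCV would perform these operations on real image arrays.

```python
def flip_horizontal(image):
    """Mirror every row of the image left-to-right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    # Reversing the row order and then transposing gives a clockwise rotation.
    return [list(column) for column in zip(*image[::-1])]


def augment(image):
    """Return the original image plus two transformed copies."""
    return [image, flip_horizontal(image), rotate_90_clockwise(image)]


# A 2x3 grayscale "image" used only for illustration.
sample = [[1, 2, 3],
          [4, 5, 6]]

augmented = augment(sample)
for variant in augmented:
    print(variant)
```

Each source image yields several training samples, which is why augmentation can help when some classes have few images.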

# **Data Collection**

For this research, the dataset was collected from the Kaggle platform and includes pictures of healthy as well as infected leaves. While inspecting the images, it became clear that some classes had far fewer images than the rest, resulting in an imbalanced dataset.

The dataset comprises 8,320 training images and 2,080 validation images, divided into 26 classes of diseased and healthy plant leaves. The images are grouped by crop, such as apple, corn, and grape.
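One common way to compensate for class imbalance, besides augmentation, is to weight each class inversely to its frequency. The sketch below is an illustration under assumptions: the class names and counts are made up, the helper `balanced_class_weights` is hypothetical, and this notebook does not itself use class weights.

```python
def balanced_class_weights(class_counts):
    """Weight each class by total / (num_classes * count).

    Rare classes receive weights above 1, frequent classes below 1.
    """
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    return {
        name: total / (num_classes * count)
        for name, count in class_counts.items()
    }


# Hypothetical counts for illustration only; not the real dataset statistics.
counts = {"Apple__healthy": 400, "Corn__rust": 100, "Grape__black_rot": 300}

weights = balanced_class_weights(counts)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Keras `Model.fit` accepts a `class_weight` dictionary keyed by class index, so weights like these would need to be re-keyed by the integer label of each class before use.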

# **Data preparation**

After labeling the dataset, the images are resized to 128x128 pixels so that every input has the same shape. OpenCV and TensorFlow are used to process the images by:

Resizing the images and normalizing the pixel values to speed up model training.

Labeling each image according to its class (disease or healthy).

Applying augmentation techniques such as flipping, rotating, and zooming to balance the dataset.

# **Data Preparation Workflow**

Load images and corresponding labels.

Resize images to 128x128.

Normalize the pixels.

Divide the available data into training and validation data.
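The normalization and splitting steps can be sketched in plain Python as follows. This is an illustration only: the dataset used here already ships in separate training and validation folders (2,080 of 10,400 images, or 20%, are held out), and the helpers `normalize` and `train_validation_split` are hypothetical names, not functions from this notebook.

```python
import random


def normalize(pixels):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return [value / 255.0 for value in pixels]


def train_validation_split(samples, validation_fraction=0.2, seed=42):
    """Shuffle the samples and hold out a fraction for validation."""
    shuffled = samples[:]  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Ten placeholder "images" of three pixels each, paired with an integer label.
raw_samples = [([0, 128, 255], label) for label in range(10)]
samples = [(normalize(pixels), label) for pixels, label in raw_samples]

train, validation = train_validation_split(samples)
print(len(train), len(validation))
```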

# **Code Explanation**

Here's what is happening under the hood. Let's go through it step by step:

**STEP 1:**
Mounting Google Drive
Mount your Google Drive to access and save datasets, models, and other resources.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


**Package installation**

Installs the TensorFlow packages for building and training machine learning and deep learning models.

In [None]:
pip install tensorflow



**Importing Libraries**

Libraries like os, PIL, OpenCV, Matplotlib, and NumPy are imported for interacting with the file system, managing image input/output, processing, and visualization. Keras and TensorFlow are well-known libraries for constructing neural networks. Various layers, models, and utilities are employed to define, compile, and train deep learning models.

In [None]:
# Standard library
import os
import random

# Image handling, numerics, and plotting
import cv2
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from PIL import Image, ImageOps
from tqdm import tqdm

# Evaluation metrics
from sklearn.metrics import classification_report, confusion_matrix

# TensorFlow / Keras (imported from tensorflow.keras only, to avoid mixing APIs)
import tensorflow as tf
from tensorflow.keras.applications import VGG16, VGG19
from tensorflow.keras.layers import BatchNormalization, Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model, Sequential

**Data Loading and Preprocessing**

This section specifies the location of the dataset folders. The dataset is divided into two parts: training and validation. All images are resized to a standard dimension of 128x128 pixels to ensure uniformity throughout the dataset.

In [None]:
dataset='/content/drive/MyDrive/Rhombix_Technologies_Internship/Leaf Disease Detection_Task_03'
train_folder = os.path.join(dataset,"training")
test_folder = os.path.join(dataset,"validation")

# **Extract categories**

In [None]:
img_size = 128

# Sort the folder names so the class-index mapping is the same on every run
categories = sorted(os.listdir(train_folder))
print(categories)


# **Data Processing Function**

In [None]:
# Function to process data
def process_data(folder, categories, img_size):
  data = []
  class_counts = {category: 0 for category in categories}
  for c in categories:
    path = os.path.join(folder, c)
    class_num = categories.index(c)  # integer label for this class
    for img in tqdm(os.listdir(path), desc=f"Processing {c}"):
      try:
        img_array = cv2.imread(os.path.join(path, img))
        img_resized = cv2.resize(img_array, (img_size, img_size))
        data.append([img_resized, class_num])
        class_counts[c] += 1
      except Exception:
        # Skip files that cannot be read or resized
        pass
    print(f"Class '{c}' has {class_counts[c]} images")
  # Return only after every category has been processed
  return data, class_counts

# **Process Training Data**

In [None]:
#Process Training data
training_data, train_class_counts = process_data(train_folder, categories, img_size)
print(f"Total training data: {len(training_data)}")


# **Plot Training Data Distribution**

In [None]:
# Plot class distribution
def plot_class_distribution(class_counts, title):
  classes = list(class_counts.keys())
  counts = list(class_counts.values())

  plt.figure(figsize=(10, 6))
  plt.bar(classes, counts, color='blue')
  plt.xlabel('Classes')
  plt.ylabel('Number of Images')
  plt.title(title)
  plt.xticks(rotation=90)
  plt.show()

# Call the function at module level, outside its own body
plot_class_distribution(class_counts=train_class_counts, title='Training Data Class Distribution')

# **Process Validation Data**

In [None]:
#Process Validation Data
validation_data, val_class_counts = process_data(test_folder, categories, img_size)
print(f"Total validation data: {len(validation_data)}")

In [None]:
# Plot class distribution for the validation data
def plot_class_distribution(class_counts, title):
  classes = list(class_counts.keys())
  counts = list(class_counts.values())

  plt.figure(figsize=(10, 5))
  plt.bar(classes, counts, color='blue')
  plt.xlabel('Classes')
  plt.ylabel('Number of Images')
  plt.title(title)
  plt.xticks(rotation=90)
  plt.show()

# Use the validation counts, not the training counts
plot_class_distribution(class_counts=val_class_counts, title='Validation Data Class Distribution')

# **Prepare Training Array**

In [None]:
X_train = []
Y_train = []

for img, label in training_data:
  X_train.append(img)
  Y_train.append(label)

X_train = np.array(X_train).astype('float32').reshape(-1, img_size, img_size, 3)
Y_train = np.array(Y_train)

print(f"X_train= {X_train.shape} Y_train= {Y_train.shape}")

# **Prepare Validation Array and Normalize**

In [None]:
X_test = []
Y_test = []

for img, label in validation_data:
  X_test.append(img)
  Y_test.append(label)

X_test = np.array(X_test).astype('float32').reshape(-1, img_size, img_size, 3)
Y_test = np.array(Y_test)

print(f"X_test= {X_test.shape} Y_test= {Y_test.shape}")

# Scale pixel values from [0, 255] to [0, 1]
X_train, X_test = X_train / 255.0, X_test / 255.0

# **Show Sample Images**

In [None]:
import matplotlib.pyplot as plt
import random

# Function to show one image from each class
def show_sample_images(train_folder, categories):
  plt.figure(figsize=(20, 20))
  for i, category in enumerate(categories):
    category_path = os.path.join(train_folder, category)
    images = os.listdir(category_path)
    random_image = random.choice(images)
    img_path = os.path.join(category_path, random_image)
    img = plt.imread(img_path)

    ax = plt.subplot(6, 5, i + 1)  # a 6x5 grid holds up to 30 classes
    plt.imshow(img)
    plt.title(category)
    plt.axis("off")

  plt.tight_layout()
  plt.show()

# Call the function to show sample images
show_sample_images(train_folder, categories)

# **Building a VGG16 Model**

In [None]:
input_shape = (img_size, img_size, 3)
num_classes = 26


# **Build and Compile Custom VGG16 Model**

In [None]:
from tensorflow.keras.applications import VGG16

vgg16_model = VGG16(weights='imagenet', include_top=False, input_shape=input_shape)

vgg16_custom_model= Sequential()
vgg16_custom_model.add(vgg16_model)
vgg16_custom_model.add(GlobalAveragePooling2D())
vgg16_custom_model.add(Dense(512, activation='relu'))
vgg16_custom_model.add(BatchNormalization())
vgg16_custom_model.add(Dropout(0.5))
vgg16_custom_model.add(Dense(512, activation='relu'))
vgg16_custom_model.add(BatchNormalization())
vgg16_custom_model.add(Dropout(0.5))
vgg16_custom_model.add(Dense(num_classes, activation='softmax'))

# Compile the model (labels are integer class indices, so use the sparse loss)
vgg16_custom_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Print the model summary
vgg16_custom_model.summary()

# **Training the Customized VGG16 Model**

In [None]:
vgg16_pretrained = vgg16_custom_model.fit(
    x=X_train,
    y=Y_train,
    batch_size=32,
    epochs=10,
    validation_data=(X_test, Y_test)
)
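The labels built earlier are integer class indices rather than one-hot vectors, which matters when choosing a cross-entropy loss. Keras offers a categorical form that expects one-hot targets and a sparse form that expects integer indices. The following sketch, using hypothetical helper functions and made-up probabilities, shows that the two compute the same value for the same sample.

```python
import math


def categorical_crossentropy(one_hot_target, probabilities):
    """Cross-entropy for a one-hot target vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities))


def sparse_categorical_crossentropy(class_index, probabilities):
    """Cross-entropy for an integer class index."""
    return -math.log(probabilities[class_index])


# Made-up softmax output for a 3-class problem; the true class is index 1.
probabilities = [0.1, 0.7, 0.2]
dense_loss = categorical_crossentropy([0, 1, 0], probabilities)
sparse_loss = sparse_categorical_crossentropy(1, probabilities)

print(f"categorical: {dense_loss:.4f}  sparse: {sparse_loss:.4f}")
```

The two losses differ only in the label format they accept, so the choice follows from how the labels are stored.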

# **Plotting Training & Validation Performance of VGG16**

In [None]:
def plot_history(history, title):
  plt.figure(figsize=(10, 5))

  # Left panel: accuracy per epoch
  plt.subplot(1, 2, 1)
  plt.plot(history.history['accuracy'], label='train accuracy')
  plt.plot(history.history['val_accuracy'], label='val accuracy')
  plt.xlabel('Epochs')
  plt.ylabel('Accuracy')
  plt.legend()
  plt.title(f'{title} - Accuracy curves')

  # Right panel: loss per epoch
  plt.subplot(1, 2, 2)
  plt.plot(history.history['loss'], label='train loss')
  plt.plot(history.history['val_loss'], label='val loss')
  plt.xlabel('Epochs')
  plt.ylabel('Loss')
  plt.legend()
  plt.title(f'{title} - Loss curves')

  plt.tight_layout()
  plt.show()

plot_history(history=vgg16_pretrained, title='VGG16')




# **Evaluating the VGG16 Model Performance**

In [None]:
valid_loss, valid_accuracy = vgg16_custom_model.evaluate(X_test, Y_test)
train_loss, train_accuracy = vgg16_custom_model.evaluate(X_train, Y_train)
print(f'\nValidation Accuracy: {valid_accuracy}')
print(f'\nValidation Loss: {valid_loss}')
print(f'\nTraining Accuracy: {train_accuracy}')
print(f'\nTraining Loss: {train_loss}')

# **Testing the Final Accuracy of the VGG16 Model**

In [None]:
loss, accuracy= vgg16_custom_model.evaluate(X_test, Y_test)
print(f"Accuracy: {accuracy * 100:.2f}%")

# **Visualizing Model Predictions with a Confusion Matrix**

In [None]:
def plot_confusion_matrix(model, X_test, Y_test, categories, title):
    Y_pred = model.predict(X_test)
    # Convert the softmax probabilities to predicted class indices
    Y_pred_classes = np.argmax(Y_pred, axis=1)

    cm = confusion_matrix(Y_test, Y_pred_classes)
    plt.figure(figsize=(10, 10))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=categories, yticklabels=categories)
    plt.xlabel('Predicted')
    plt.ylabel('True')
    plt.title(title)
    plt.show()

plot_confusion_matrix(model=vgg16_custom_model, X_test=X_test, Y_test=Y_test, categories=categories, title='VGG16')
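For readers unfamiliar with how a confusion matrix is built, the sketch below counts true/predicted label pairs by hand. It uses made-up labels and a hypothetical helper `confusion_matrix_counts`; the notebook itself relies on `sklearn.metrics.confusion_matrix`, which follows the same convention of rows for true classes and columns for predicted classes.

```python
def confusion_matrix_counts(y_true, y_pred, num_classes):
    """Count how often each true class (row) is predicted as each class (column)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


# Made-up labels for a 3-class problem.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix_counts(y_true, y_pred, num_classes=3)
for row in cm:
    print(row)
```

Entries on the diagonal are correct predictions; every off-diagonal entry shows which class was mistaken for which.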

# **Generating the Classification Report for VGG16 Model**

In [None]:
Y_pred = vgg16_custom_model.predict(X_test)
Y_test_classes = np.argmax(Y_pred, axis=1)

#Print Classification report
print(classification_report(Y_test, Y_test_classes, target_names=categories))
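The classification report lists precision, recall, and F1 score for each class. As a rough illustration of what those numbers mean, the sketch below computes them for a single class from made-up labels. The helper `per_class_metrics` is hypothetical and is not how scikit-learn implements the report internally.

```python
def per_class_metrics(y_true, y_pred, class_index):
    """Return (precision, recall, f1) for one class."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == class_index and p == class_index)
    false_pos = sum(1 for t, p in pairs if t != class_index and p == class_index)
    false_neg = sum(1 for t, p in pairs if t == class_index and p != class_index)

    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels for a 3-class problem.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

precision, recall, f1 = per_class_metrics(y_true, y_pred, class_index=1)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```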

# **Building and Compiling a Customized VGG19 Model**

In [None]:
from tensorflow.keras.applications import VGG19
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Dropout
from tensorflow.keras.models import Model
import tensorflow as tf

# Load VGG19 without its classification head
base_model = VGG19(weights='imagenet', include_top=False, input_shape=input_shape)

# Freeze the layers in the base model
for layer in base_model.layers:
  layer.trainable = False

# Add custom layers on top of the base model
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(num_classes, activation='softmax')(x)

# Create the modified VGG19 model
vgg19_custom_model = Model(inputs=base_model.input, outputs=predictions)

# Compile the model (labels are integer class indices, so use the sparse loss)
vgg19_custom_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Display the model summary
vgg19_custom_model.summary()
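With the VGG19 base frozen, only the layers added on top are trained. The short calculation below estimates how many parameters that is, assuming the standard VGG19 layout (five 2x2 max-pooling stages and 512 channels in the last convolutional block) and the head sizes used above. The helper names are hypothetical.

```python
def dense_params(inputs, units):
    """A dense layer has one weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


def feature_map_side(input_side, num_poolings=5):
    """Each 2x2 max-pooling stage halves the spatial size."""
    return input_side // (2 ** num_poolings)


side = feature_map_side(128)   # spatial size of the last VGG19 feature map
pooled_features = 512          # global average pooling leaves one value per channel

# (inputs, units) for Dense(1024), Dense(512), and the 26-class output layer.
head_layers = [(pooled_features, 1024), (1024, 512), (512, 26)]
trainable = sum(dense_params(inputs, units) for inputs, units in head_layers)

print(f"feature map: {side}x{side}x{pooled_features}")
print(f"trainable head parameters: {trainable:,}")
```

Pooling and dropout layers contribute no parameters, so this total should match the trainable count reported by `summary()` if the assumptions hold.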

# **Training the Customized VGG19 Model with Early Stopping**

In [None]:
history = vgg19_custom_model.fit(
    X_train, Y_train,
    validation_data=(X_test, Y_test),
    epochs=10,
    batch_size=32,
    callbacks=[tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)]
)
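Early stopping halts training once the monitored metric (validation loss by default) has not improved for `patience` consecutive epochs, and `restore_best_weights=True` rolls the model back to its best epoch. The sketch below mimics that rule on a made-up list of validation losses. It is a simplified illustration with a hypothetical helper, not the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-indexed."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch  # stop early

    return len(val_losses) - 1, best_epoch  # ran through every epoch


# Made-up validation losses for ten epochs.
losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.63, 0.64, 0.7, 0.8]

stop_epoch, best_epoch = early_stopping_epoch(losses, patience=5)
print(f"stopped after epoch {stop_epoch}, best weights from epoch {best_epoch}")
```

With only 10 epochs and a patience of 5, the callback can fire only if the best validation loss occurs in the first half of training.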

# **Saving the Trained VGG19 Model**

In [None]:
#save the Model
vgg19_custom_model.save('/content/drive/MyDrive/new_projects/p5/modified_vgg19_custom_model.h5')


# **Plotting Training and Validation Curves for VGG19**

In [None]:
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='val accuracy')
plt.title('Accuracy')
plt.legend()

plt.subplot(1, 2, 2)  # second panel, so the loss plot does not overwrite the accuracy plot
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='val loss')
plt.title('Loss')
plt.legend()

plt.show()

# **Evaluating the Modified VGG19 Model**

In [None]:
valid_loss, valid_acc = vgg19_custom_model.evaluate(X_test, Y_test)
train_loss, train_acc = vgg19_custom_model.evaluate(X_train, Y_train)
print(f'\nValidation Accuracy: {valid_acc}')
print(f'\nValidation Loss: {valid_loss}')
print(f'\nTraining Accuracy: {train_acc}')
print(f'\nTraining Loss: {train_loss}')

# **Testing the Final Accuracy of the Modified VGG19 Model**

In [None]:
test_loss, test_acc = vgg19_custom_model.evaluate(X_test, Y_test)
print(f'Test Accuracy: {test_acc * 100:.2f}%')

# **Visualizing Predictions with a Confusion Matrix for Modified VGG19**

In [None]:
def plot_confusion_matrix(model, X_test, Y_test, categories, title):
  Y_pred = model.predict(X_test)
  Y_pred_classes = np.argmax(Y_pred, axis=1)

  cm = confusion_matrix(Y_test, Y_pred_classes)
  plt.figure(figsize=(10, 10))
  sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=categories, yticklabels=categories)
  plt.xlabel('Predicted')
  plt.ylabel('True')
  plt.title(title)
  plt.show()

plot_confusion_matrix(model=vgg19_custom_model, X_test=X_test, Y_test=Y_test, categories=categories, title='Modified VGG19')

# **Generating the Classification Report for Modified VGG19**

In [None]:
Y_pred = vgg19_custom_model.predict(X_test)
Y_pred_classes = np.argmax(Y_pred, axis=1)

# Print the classification report
print(classification_report(Y_test, Y_pred_classes, target_names=categories))

# **Install the EfficientNet Library**

In [None]:
pip install -q efficientnet

# **Building and Compiling a Customized EfficientNetB4 Model**

In [None]:
import efficientnet.tfkeras as efn

# Load EfficientNetB4 without its classification head
enet = efn.EfficientNetB4(
    input_shape=input_shape,
    weights='imagenet',
    include_top=False
)

# Add a custom classification head
x = enet.output
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.5)(x)
x = tf.keras.layers.Dense(1024, activation='relu')(x)
x = tf.keras.layers.Dropout(0.5)(x)
predictions = tf.keras.layers.Dense(num_classes, activation='softmax')(x)

e_model_b4 = tf.keras.models.Model(inputs=enet.input, outputs=predictions)
e_model_b4.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-4),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

e_model_b4.summary()

# **Final Part of this Project**

# **Train the EfficientNetB4 Model**

In [None]:
efficient_b4 = e_model_b4.fit(x=X_train, y=Y_train, epochs=10, validation_data=(X_test, Y_test), batch_size=64)


# **Visualize Model Performance - Accuracy & Loss Curves**

In [None]:
def plot_history(history, title):
  plt.figure(figsize=(12, 4))

  plt.subplot(1, 2, 1)
  plt.plot(history.history['accuracy'], label='train accuracy')
  plt.plot(history.history['val_accuracy'], label='val accuracy')
  plt.xlabel('Epochs')
  plt.ylabel('Accuracy')
  plt.legend()
  plt.title(f'{title} - Accuracy Curves')

  plt.subplot(1, 2, 2)
  plt.plot(history.history['loss'], label='train loss')
  plt.plot(history.history['val_loss'], label='val loss')
  plt.xlabel('Epochs')
  plt.ylabel('Loss')
  plt.legend()
  plt.title(f'{title} - Loss Curves')

  plt.tight_layout()
  plt.show()

plot_history(history=efficient_b4, title='EfficientNetB4')

In [None]:
valid_loss, valid_acc = e_model_b4.evaluate(X_test, Y_test)
train_loss, train_acc = e_model_b4.evaluate(X_train, Y_train)
print(f'\nValidation Accuracy: {valid_acc}')
print(f'\nValidation Loss: {valid_loss}')
print(f'\nTraining Accuracy: {train_acc}')
print(f'\nTraining Loss: {train_loss}')

# **Final Model Evaluation on Test Data**

In [None]:
loss, accuracy = e_model_b4.evaluate(X_test, Y_test)
print(f'Test Accuracy: {accuracy * 100:.2f}%')

# **Saving the Trained EfficientNetB4 Model**

In [None]:
#Save the model
e_model_b4.save('/content/drive/MyDrive/new_projects/p5/e_model_b4.h5')

# **Plot EfficientNetB4 Confusion Matrix**

In [None]:
def plot_confusion_matrix(model, X_test, Y_test, categories, title):
  Y_pred = model.predict(X_test)
  Y_pred_classes = np.argmax(Y_pred, axis=1)

  cm = confusion_matrix(Y_test, Y_pred_classes)
  plt.figure(figsize=(10, 10))
  sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=categories, yticklabels=categories)
  plt.xlabel('Predicted')
  plt.ylabel('True')
  plt.title(title)
  plt.show()

plot_confusion_matrix(model=e_model_b4, X_test=X_test, Y_test=Y_test, categories=categories, title='EfficientNetB4')

In [None]:
Y_pred = e_model_b4.predict(X_test)
Y_pred_classes = np.argmax(Y_pred, axis=1)

print('\nEfficientNetB4 Classification Report:')
print(classification_report(Y_test, Y_pred_classes, target_names=categories))

# **Load Custom VGG19 Model with FixedDropout**

In [None]:
%pip install keras-applications
import tensorflow as tf
from tensorflow.keras.layers import Dropout

# Dropout variant whose noise shape may contain None (unknown) dimensions
class FixedDropout(Dropout):
  def _get_noise_shape(self, inputs):
    if self.noise_shape is None:
      return self.noise_shape

    # Fill each unknown dimension with the actual size of the input
    symbolic_shape = tf.keras.backend.shape(inputs)
    noise_shape = [symbolic_shape[axis] if shape is None else shape
                   for axis, shape in enumerate(self.noise_shape)]
    return tuple(noise_shape)

with tf.keras.utils.custom_object_scope({'FixedDropout': FixedDropout}):
  loaded_model = tf.keras.models.load_model('/content/drive/MyDrive/new_projects/p5/modified_vgg19_custom_model.h5')

# **Predict and Visualize a Single Leaf Image**

In [None]:
img_array = cv2.imread("/content/drive/MyDrive/new_projects/p5/Leaf_Disease_Detection/Validation/Apple__healthy/04125537__801d-4e15-b66c-224b09b4e1a7__RS_HL 7457.JPG")# Replace with your path
img_resized = cv2.resize(img_array, (img_size, img_size))
img_normalized = img_resized / 255.0
img_array = np.expand_dims(img_normalized, axis=0)

prediction = loaded_model.predict(img_array)
predicted_class = np.argmax(prediction)
predicted_class = categories[predicted_class]

plt.imshow(cv2.cvtColor(img_resized, cv2.COLOR_BGR2RGB))  # OpenCV loads images as BGR; convert to RGB for display
plt.title(f'Predicted Class: {predicted_class}')
plt.axis('off')
plt.show()
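The step from the model's output to a class name is a simple arg-max lookup. The sketch below shows it on made-up probabilities. The helper `decode_prediction` and the third class name are hypothetical, and the lookup is only meaningful if the list of class names is in the same order as the one used during training.

```python
def decode_prediction(probabilities, class_names):
    """Return the most probable class name and its probability."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_names[best_index], probabilities[best_index]


# Illustrative class list; "Apple__scab" is a made-up entry.
class_names = ["Apple__healthy", "Apple__scab", "Raspberry__healthy"]
probabilities = [0.05, 0.15, 0.80]  # made-up softmax output for one image

label, confidence = decode_prediction(probabilities, class_names)
print(f"Predicted class: {label} ({confidence:.0%})")
```

Reporting the probability alongside the label makes low-confidence predictions easier to spot.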

# **Predict and Display a Raspberry Leaf Image**

In [None]:
img_array = cv2.imread("/content/drive/MyDrive/new_projects/p5/Leaf_Disease_Detection/Validation/Raspberry__healthy/05dfc382__396a-4301-b948-7d1098feba11__Mary_HL 6358.JPG")# Replace with your path
img_resized = cv2.resize(img_array, (img_size, img_size))
img_normalized = img_resized / 255.0
img_array = np.expand_dims(img_normalized, axis=0)

prediction = loaded_model.predict(img_array)
predicted_class = np.argmax(prediction)
predicted_class = categories[predicted_class]

plt.imshow(cv2.cvtColor(img_resized, cv2.COLOR_BGR2RGB))  # OpenCV loads images as BGR; convert to RGB for display
plt.title(f'Predicted Class: {predicted_class}')
plt.axis('off')
plt.show()

This completes the project.