In [14]:
# Clean the current working directory
# (%cd persists across cells; "! cd" only changes the directory of a throwaway subshell)
%cd /content
! rm -r *

rm: cannot remove '*': No such file or directory


## Introduction
In this notebook, we develop a machine learning model that classifies birds that are commonly spotted in the Netherlands. The model must meet the following requirements:
*   it classifies images of commonly spotted birds by species
*   it classifies images that do not show a bird as "no bird"
*   it is developed using transfer learning
*   it achieves an accuracy of 95 percent or higher on the test set when deciding whether an image shows a bird
*   it achieves an accuracy of ?? percent or higher on the test set when classifying the bird species
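The model predicts one of 21 classes, but the fourth requirement asks a yes/no question: does the image show a bird or not? Below is a minimal sketch of how both accuracies could be computed from the predicted class names. It assumes the no-bird class is named `NoBird`, as in the data folders built below. It also assumes that species accuracy is measured only on images that really show a bird, which is our reading of the last requirement. The function names are hypothetical.

```python
def bird_detection_accuracy(true_labels, predicted_labels, no_bird="NoBird"):
    """Accuracy on the binary question "does the image show a bird?".

    Every species label counts as "bird" and only `no_bird` counts as
    "no bird", so mistaking one species for another is not an error here.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum((t != no_bird) == (p != no_bird)
                  for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def species_accuracy(true_labels, predicted_labels, no_bird="NoBird"):
    """Exact-match accuracy, computed only over images that really show a bird."""
    pairs = [(t, p) for t, p in zip(true_labels, predicted_labels) if t != no_bird]
    return sum(t == p for t, p in pairs) / len(pairs)


true_labels = ["Robin", "Wren", "NoBird", "Magpie"]
predicted_labels = ["Robin", "Dunnock", "NoBird", "NoBird"]
print(bird_detection_accuracy(true_labels, predicted_labels))  # 0.75
print(round(species_accuracy(true_labels, predicted_labels), 3))  # 0.333
```

In this example, predicting "Dunnock" for a wren still counts as a correct bird detection. Predicting "NoBird" for a magpie counts as an error for both metrics.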

We use the following structure to develop our model:
1. Select a bird image dataset
2. Select a "no bird" image dataset
3. Load and split the data
4. Select an open machine learning model for image classification
5. Load the model and freeze its hidden layers
6. Apply transfer learning to the model using our dataset
7. Test the accuracy of the trained model
8. If the requirements are met, save the model for use in our project

# 1. Bird image dataset

We have selected an open dataset of bird images, labelled by species. The bird dataset can be found [here](https://www.kaggle.com/datasets/davemahony/20-uk-garden-birds/). It contains images of 20 bird species that are commonly found in the UK. This is relevant to us because the birds that are common in the UK overlap with the birds that are commonly spotted in the Netherlands.

# 2. No bird image dataset

We have also selected open datasets of images without birds: trees, houses, and grass fields. The tree dataset can be found [here](https://www.kaggle.com/datasets/bryanb/forests-trees-and-leaves), the house dataset [here](https://www.kaggle.com/datasets/balraj98/facades-dataset) and the grass field dataset [here](https://www.kaggle.com/datasets/usharengaraju/grassclover-dataset). This is relevant to us because we want the model to be able to classify an image as "No bird" as well.

# 3. Load and split the data

Now that we have relevant image data, we load it into our workspace and split it into a training, validation, and test set. We also add the "No bird" class, built from the datasets discussed in part 2.

We want every set to have the same class distribution after splitting, so we split each class separately. With an 80-10-10 split, for example, 80 percent of the images of each class go to the training set, 10 percent to the validation set, and 10 percent to the test set.
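The sketch below makes the per-class 80-10-10 split concrete. It computes how many images of a class with `n` images end up in each set after two successive calls to scikit-learn's `train_test_split`, as in `split_data` below. It assumes scikit-learn's rounding rule for a fractional `test_size`: the held-out part is rounded up and the remainder stays in the training part. The helper name `split_sizes` is hypothetical.

```python
import math


def split_sizes(n, ratio=(0.8, 0.1, 0.1)):
    """Return (train, val, test) image counts for one class of n images.

    Mirrors two successive train_test_split calls: the first holds out
    val+test from train, the second divides the held-out part into val
    and test. A fractional test_size is rounded up (ceil).
    """
    held_out_frac = ratio[1] + ratio[2]
    n_held_out = math.ceil(held_out_frac * n)
    n_train = n - n_held_out
    n_test = math.ceil(ratio[2] / held_out_frac * n_held_out)
    n_val = n_held_out - n_test
    return n_train, n_val, n_test


print(split_sizes(150))  # (120, 15, 15)
print(split_sizes(92))   # (73, 9, 10)
```

With the per-class counts printed further down, these sizes add up to 2630 training and 328 validation images. These are the same totals the Keras generators report.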

In [None]:
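# Install the Kaggle CLI and upload your kaggle.json API token
!pip install -q kaggle

from google.colab import files
files.upload()

# Put the token where the Kaggle CLI expects it and restrict its permissions
# (-p: no error if the folder already exists when the cell is re-run)
! mkdir -p ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json

import kaggle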

## Loading the bird dataset

In [None]:
! kaggle datasets download -d davemahony/20-uk-garden-birds

! rm -r bird_data

! mkdir bird_data

! unzip -q 20-uk-garden-birds.zip -d bird_data

! ls bird_data -all

Downloading 20-uk-garden-birds.zip to /content
100% 378M/378M [00:11<00:00, 35.0MB/s]
rm: cannot remove 'bird_data': No such file or directory
total 884
drwxr-xr-x  4 root root   4096 Oct 16 14:39 .
drwxr-xr-x  1 root root   4096 Oct 16 14:39 ..
-rw-r--r--  1 root root 865711 Jul 23 10:42 birds.csv
-rw-r--r--  1 root root  15095 Jul 23 10:42 stats.xlsx
drwxr-xr-x 22 root root   4096 Oct 16 14:39 withBackground
drwxr-xr-x 22 root root   4096 Oct 16 14:39 withoutBackground


## Loading the leaf dataset

In [None]:
! kaggle datasets download -d alexo98/leaf-detection

! rm -r leaf_data

! mkdir leaf_data

! unzip -q leaf-detection.zip -d leaf_data

! ls leaf_data -all

Downloading leaf-detection.zip to /content
100% 98.3M/98.3M [00:03<00:00, 30.8MB/s]
rm: cannot remove 'leaf_data': No such file or directory
total 288
drwxr-xr-x 4 root root   4096 Oct 16 14:39 .
drwxr-xr-x 1 root root   4096 Oct 16 14:39 ..
drwxr-xr-x 3 root root   4096 Oct 16 14:39 test
drwxr-xr-x 2 root root  36864 Oct 16 14:40 train
-rw-r--r-- 1 root root 240282 Jun 22  2020 train.csv


## Loading the house dataset

In [None]:
! kaggle datasets download -d balraj98/facades-dataset

! rm -r house_data

! mkdir house_data

! unzip -q facades-dataset.zip -d house_data

! ls house_data -all

Downloading facades-dataset.zip to /content
100% 33.5M/33.5M [00:01<00:00, 21.2MB/s]
rm: cannot remove 'house_data': No such file or directory
total 92
drwxr-xr-x 6 root root  4096 Oct 16 14:40 .
drwxr-xr-x 1 root root  4096 Oct 16 14:40 ..
-rw-r--r-- 1 root root 49004 Oct 17  2020 metadata.csv
drwxr-xr-x 2 root root  4096 Oct 16 14:40 testA
drwxr-xr-x 2 root root  4096 Oct 16 14:40 testB
drwxr-xr-x 2 root root 12288 Oct 16 14:40 trainA
drwxr-xr-x 2 root root 12288 Oct 16 14:40 trainB


## Loading the grass dataset

In [None]:
! kaggle datasets download -d usharengaraju/grassclover-dataset

! rm -r grass_data

! mkdir grass_data

! unzip -q grassclover-dataset.zip -d grass_data

! ls grass_data -all

Downloading grassclover-dataset.zip to /content
100% 2.04G/2.04G [00:59<00:00, 36.8MB/s]
rm: cannot remove 'grass_data': No such file or directory
total 16
drwxr-xr-x 3 root root 4096 Oct 16 14:41 .
drwxr-xr-x 1 root root 4096 Oct 16 14:41 ..
drwxr-xr-x 4 root root 4096 Oct 16 14:41 biomass_data


In [None]:
! rm -r data

! mkdir data

rm: cannot remove 'data': No such file or directory


In [None]:
import os
import shutil

In [None]:
# Source directory: one sub-folder per bird species
src_dir = "./bird_data/withBackground/"

# Destination directory
dest_dir = "./data/"

# Copy each species folder into the destination directory, removing underscores
# from the folder name and parentheses from the file names
for species in os.listdir(src_dir):
    species_src = os.path.join(src_dir, species)
    if not os.path.isdir(species_src):
        continue
    species_dest = os.path.join(dest_dir, species.replace("_", ""))
    shutil.rmtree(species_dest, ignore_errors=True)  # start from an empty folder
    os.mkdir(species_dest)
    for file_name in os.listdir(species_src):
        new_name = file_name.replace("(", "").replace(")", "")
        shutil.copy(os.path.join(species_src, file_name),
                    os.path.join(species_dest, new_name))

## Copying and renaming the leaf, house, and grass data

In [None]:
try:
    shutil.rmtree("./data/NoBird")
except OSError:
    pass
os.mkdir("./data/NoBird")

In [None]:
# Source directory
src_dir = "./leaf_data/train/"

# Destination directory
dest_dir = "./data/"

# Copy the first 150 leaf images (sorted, so the selection is reproducible)
# into NoBird, renamed 1.jpg ... 150.jpg
for i, filename in enumerate(sorted(os.listdir(src_dir))[:150], start=1):
    src_path = os.path.join(src_dir, filename)
    dest_path = os.path.join(dest_dir, "NoBird", f"{i}.jpg")
    shutil.copy(src_path, dest_path)

In [None]:
# Source directory
src_dir = "./house_data/trainA/"

# Destination directory
dest_dir = "./data/"

# Copy the first 150 house images (sorted, so the selection is reproducible)
# into NoBird, renamed 151.jpg ... 300.jpg
for i, filename in enumerate(sorted(os.listdir(src_dir))[:150], start=151):
    src_path = os.path.join(src_dir, filename)
    dest_path = os.path.join(dest_dir, "NoBird", f"{i}.jpg")
    shutil.copy(src_path, dest_path)

In [None]:
# Source directory
src_dir = "./grass_data/biomass_data/test/images/"

# Destination directory
dest_dir = "./data/"

# Copy the first 150 grass images (sorted, so the selection is reproducible)
# into NoBird, renamed 301.jpg ... 450.jpg
for i, filename in enumerate(sorted(os.listdir(src_dir))[:150], start=301):
    src_path = os.path.join(src_dir, filename)
    dest_path = os.path.join(dest_dir, "NoBird", f"{i}.jpg")
    shutil.copy(src_path, dest_path)

In [None]:
# Directory with one sub-folder per class
data_dir = "./data/"

# Count the files in each class folder
directory_counts = {}
for class_name in os.listdir(data_dir):
    class_path = os.path.join(data_dir, class_name)
    if os.path.isdir(class_path):
        directory_counts[class_name] = len(os.listdir(class_path))

# Print the directory counts
for class_name, count in directory_counts.items():
    print(f"Directory '{class_name}' contains {count} files.")

# Total number of class directories
print(f"Total unique directories: {len(directory_counts)}")

Directory 'Robin' contains 150 files.
Directory 'Wren' contains 150 files.
Directory 'Magpie' contains 92 files.
Directory 'Chaffinch' contains 150 files.
Directory 'Goldfinch' contains 150 files.
Directory 'HouseSparrow' contains 150 files.
Directory 'CoalTit' contains 144 files.
Directory 'LongTailedTit' contains 150 files.
Directory 'Blackbird' contains 150 files.
Directory 'CarrionCrow' contains 131 files.
Directory 'FeralPigeon' contains 93 files.
Directory 'GreatTit' contains 150 files.
Directory 'Bluetit' contains 150 files.
Directory 'Dunnock' contains 150 files.
Directory 'WoodPigeon' contains 150 files.
Directory 'NoBird' contains 450 files.
Directory 'Starling' contains 150 files.
Directory 'CollaredDove' contains 150 files.
Directory 'Jackdaw' contains 131 files.
Directory 'SongThrush' contains 150 files.
Directory 'Greenfinch' contains 150 files.
Total unique directories: 21


## Splitting the data into train, validation, and test sets

In [None]:
try:
    shutil.rmtree("./split_data")
except OSError:
    pass
os.mkdir("./split_data")

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
def split_data(ratio = (0.8, 0.1, 0.1)):
    src_dir = "./data/"
    dest_dir = "./split_data/"

    classes = os.listdir(src_dir)

    for class_name in classes:
        class_folder = os.path.join(src_dir, class_name)
        images = os.listdir(class_folder)

        train_images, temp_images = train_test_split(images, test_size=(ratio[1] + ratio[2]))
        val_images, test_images = train_test_split(temp_images, test_size=ratio[2] / (ratio[1] + ratio[2]))

        for dataset, dataset_name in zip([train_images, val_images, test_images], ["train", "val", "test"]):
            dataset_folder = os.path.join(dest_dir, dataset_name, class_name)
            os.makedirs(dataset_folder, exist_ok=True)

            for image in dataset:
                source_path = os.path.join(class_folder, image)
                destination_path = os.path.join(dataset_folder, image)
                shutil.copy(source_path, destination_path)

    print("Data split completed.")

split_data()

Data split completed.


## Augmenting our training set

We want to develop a powerful image classification model from only a few training examples. That is why we augment the training images using Keras' `ImageDataGenerator` class. Each image is altered with a number of random transformations, so that our model never sees exactly the same picture twice. This helps prevent overfitting and helps the model generalize better. The validation images are only rescaled, so that validation scores reflect unaltered images.

Read more [here](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html).
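To make "random transformations" concrete, here is a simplified NumPy sketch of two of the transformations configured below. The first is a random horizontal flip. The second is a random shift whose uncovered border is filled with the nearest edge pixel, which is what `fill_mode='nearest'` does. This is not Keras' own implementation, and the function name `random_flip_and_shift` is hypothetical.

```python
import numpy as np


def random_flip_and_shift(image, rng, max_shift=0.2):
    """Randomly flip an (H, W, C) image left-right and shift it.

    max_shift is a fraction of the height/width, like height_shift_range
    and width_shift_range. Pixels that move in from outside the image
    copy the nearest edge pixel.
    """
    if rng.random() < 0.5:
        image = image[:, ::-1]  # horizontal flip
    h, w = image.shape[:2]
    dy = int(rng.integers(-int(max_shift * h), int(max_shift * h) + 1))
    dx = int(rng.integers(-int(max_shift * w), int(max_shift * w) + 1))
    # Pad with edge values, then crop a window offset by (dy, dx)
    padded = np.pad(image, ((abs(dy), abs(dy)), (abs(dx), abs(dx)), (0, 0)), mode="edge")
    top, left = abs(dy) - dy, abs(dx) - dx
    return padded[top:top + h, left:left + w]


rng = np.random.default_rng(seed=0)
image = rng.random((8, 8, 3))
augmented = random_flip_and_shift(image, rng)
print(augmented.shape)  # (8, 8, 3)
```

Each call draws new random parameters, so the same source image usually yields a different training example every epoch.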

In [None]:
try:
    shutil.rmtree("./generated_data")
except OSError:
    pass
os.mkdir("./generated_data")
os.mkdir("./generated_data/train")
os.mkdir("./generated_data/val")

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [None]:
IMAGE_SHAPE = (1024, 1024)

In [None]:
src_dir = "./split_data/train"
dest_dir = "./generated_data/train"

train_datagen = ImageDataGenerator(
    rescale = 1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

train_generator = train_datagen.flow_from_directory(
    src_dir,
    target_size=IMAGE_SHAPE,
    batch_size=5,
    class_mode='categorical',
    save_to_dir=dest_dir,
)

Found 2630 images belonging to 21 classes.


In [None]:
src_dir = "./split_data/val"
dest_dir = "./generated_data/val"

val_datagen = ImageDataGenerator(rescale = 1./255)

val_generator = val_datagen.flow_from_directory(
    src_dir,
    target_size=IMAGE_SHAPE,
    batch_size=5,
    class_mode='categorical',
    save_to_dir=dest_dir,
)

Found 328 images belonging to 21 classes.


# 4. Select an open machine learning model for image classification

Why we use a BiT model for transfer learning:

"From architecture perspective BiT is nothing but a 4x times Scaled version of ResNet152V2. The main idea here is of Transfer Learning this model is pre-trained on a Large Dataset, so it can be trained on sub-datasets or basically other small datasets and as the model is pre-trained on a Very large Dataset it is expected that it will perform amazingly well on the small dataset."

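Source: [utkarshsaxenadn/bitbirdspecies on Kaggle](https://www.kaggle.com/datasets/utkarshsaxenadn/bitbirdspecies/data). The variant loaded below, `m-r50x1`, is the smaller BiT-M model: a ResNet-50 of standard width (×1), pre-trained on ImageNet-21k.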

In [None]:
import numpy as np
import time

import PIL.Image as Image
import matplotlib.pylab as plt

import tensorflow as tf
import tensorflow_hub as hub

import datetime

%load_ext tensorboard

In [None]:
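BiT = "https://tfhub.dev/google/bit/m-r50x1/1"

# This BiT module is a feature extractor: it outputs one feature vector per image
classifier = tf.keras.Sequential([
    hub.KerasLayer(BiT, input_shape=IMAGE_SHAPE+(3,))
])

# New, randomly initialised head with one unit per ImageNet label (1001);
# its predictions are not meaningful until this layer has been trained
classifier.add(tf.keras.layers.Dense(1001, activation='softmax'))

classifier.summary()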

In [None]:
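# Download the example image from the TensorFlow Hub transfer-learning tutorial
grace_hopper = tf.keras.utils.get_file('image.jpg', 'https://storage.googleapis.com/download.tensorflow.org/example_images/grace_hopper.jpg')
grace_hopper = np.array(Image.open(grace_hopper).resize(IMAGE_SHAPE)) / 255.0
grace_hopper.shape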

In [None]:
result = classifier.predict(grace_hopper[np.newaxis, ...])
result.shape

In [None]:
predicted_class = tf.math.argmax(result[0], axis=-1)
predicted_class

In [None]:
labels_path = tf.keras.utils.get_file('ImageNetLabels.txt','https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt')
imagenet_labels = np.array(open(labels_path).read().splitlines())

In [None]:
plt.imshow(grace_hopper)
plt.axis('off')
predicted_class_name = imagenet_labels[predicted_class]
_ = plt.title("Prediction: " + predicted_class_name.title())

We can save the model, including its weights and biases, using `model.save()` or `tf.keras.models.save_model()`, as described in the [TensorFlow serialization and saving guide](https://www.tensorflow.org/guide/keras/serialization_and_saving).
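One way to check the saving step described above is a save-and-reload round trip. This sketch uses a tiny stand-in model instead of the notebook's `model_0`, along with a hypothetical file name, `bird_classifier.keras`. It assumes a recent TensorFlow 2 release that supports the single-file `.keras` format.

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model (hypothetical); in the notebook this would be model_0
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# Save the architecture and the weights (and biases) to one file
model.save("bird_classifier.keras")

# Load the model back and check that it makes the same predictions
restored = tf.keras.models.load_model("bird_classifier.keras")
x = np.random.rand(2, 4).astype("float32")
np.testing.assert_allclose(model.predict(x, verbose=0),
                           restored.predict(x, verbose=0), rtol=1e-6)
print("Reloaded model gives the same predictions")
```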

Creating and fitting the model (adapted from https://www.kaggle.com/code/emreiekyurt/bird-species-classification-with-deep-learning)

In [None]:
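# 1. Create a base model with tf.keras.applications (ImageNet weights, no classification head)
base_model = tf.keras.applications.InceptionV3(include_top=False)

# 2. Freeze the base model
base_model.trainable = False

# 3. Create inputs into the model; they must match the image size the generators produce
inputs = tf.keras.layers.Input(shape=IMAGE_SHAPE + (3,), name="input-layer")

# 4. Rescaling: the generators already scale pixels to [0, 1],
#    while InceptionV3 expects inputs in [-1, 1]
x = tf.keras.layers.Rescaling(scale=2.0, offset=-1.0)(inputs)

# 5. Pass the inputs through the base model
x = base_model(x)
print(f"Shape after passing inputs through base model: {x.shape}")

# 6. Average pool the outputs of the base model
x = tf.keras.layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(x)
print(f"Shape after GlobalAveragePooling2D: {x.shape}")

# 7. Create the output activation layer: one unit per class in our data
#    (the source notebook used 450 for its own dataset)
outputs = tf.keras.layers.Dense(train_generator.num_classes, activation="softmax", name="output-layer")(x)

# 8. Combine the inputs with outputs into a model
model_0 = tf.keras.Model(inputs, outputs)

# 9. Compile the model
model_0.compile(loss="categorical_crossentropy",
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                metrics=["accuracy"])

# 10. The source notebook expects train_data/val_data/test_data. We use our own
#     training and validation generators and create a generator for the test split
test_data = ImageDataGenerator(rescale=1./255).flow_from_directory(
    "./split_data/test",
    target_size=IMAGE_SHAPE,
    batch_size=5,
    class_mode='categorical',
    shuffle=False,
)

history = model_0.fit(train_generator,
                      epochs=10,
                      steps_per_epoch=len(train_generator),
                      validation_data=val_generator,
                      validation_steps=int(0.25 * len(val_generator)))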

In [None]:
model_0.summary()

In [None]:
model_0.evaluate(test_data)

In [None]:
def plot_loss_curves(history):

  loss = history.history["loss"]
  val_loss = history.history["val_loss"]

  accuracy = history.history["accuracy"]
  val_accuracy = history.history["val_accuracy"]

  epochs = range(len(history.history["loss"]))

  #plot loss
  plt.plot(epochs, loss, label = "training_loss")
  plt.plot(epochs, val_loss, label = "val_loss")
  plt.title("loss")
  plt.xlabel("epochs")
  plt.legend()

  #plot accuracy
  plt.figure()
  plt.plot(epochs, accuracy, label = "training_accuracy")
  plt.plot(epochs, val_accuracy, label = "val_accuracy")
  plt.title("accuracy")
  plt.xlabel("epochs")
  plt.legend()

plot_loss_curves(history)

Unfreeze the top layers of the base model (from the aforementioned notebook)

In [None]:
# To begin fine-tuning, make the whole base model trainable again...
base_model.trainable = True

# ...then re-freeze every layer except the last 10
for layer in base_model.layers[:-10]:
  layer.trainable = False

# Recompile (the model must be recompiled after changing which layers are trainable)
model_0.compile(loss = "categorical_crossentropy",
                optimizer = tf.keras.optimizers.Adam(learning_rate = 0.001), # when fine-tuning you typically want to lower lr by 10x
                metrics = ["accuracy"])

# Check which layers of the base model are trainable
for layer_number, layer in enumerate(base_model.layers):
  print(layer_number, layer.name, layer.trainable)

In [None]:
# Now we have unfrozen some of the layers on the top
print(len(model_0.trainable_variables))

Fine-tuning and Refitting (from the aforementioned notebook)

In [None]:
initial_epochs = 10
fine_tune_epochs = initial_epochs + 1

# Refit the model. Keras runs epoch indices initial_epoch .. epochs-1, so starting
# at history.epoch[-1] (= 9) with epochs=11 trains two more epochs (indices 9 and 10)
history_2 = model_0.fit(train_generator,
                        epochs=fine_tune_epochs,
                        validation_data=val_generator,
                        validation_steps=int(0.25 * len(val_generator)),
                        initial_epoch=history.epoch[-1])

In [None]:
model_0.evaluate(test_data)

In [None]:
plot_loss_curves(history_2)
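`history_2` only holds the fine-tuning epochs, so its curves start again at epoch 0. The sketch below joins both training phases into a single history before plotting. `combine_histories` is a hypothetical helper. It takes the plain `history.history` dictionaries, and `types.SimpleNamespace` wraps the result so that `plot_loss_curves` can read its `.history` attribute.

```python
from types import SimpleNamespace


def combine_histories(first, second):
    """Concatenate the per-epoch metric lists of two Keras History.history dicts."""
    return {key: list(first[key]) + list(second.get(key, [])) for key in first}


# Toy values; in the notebook: combine_histories(history.history, history_2.history)
first = {"loss": [1.0, 0.8], "val_loss": [1.1, 0.9]}
second = {"loss": [0.6], "val_loss": [0.7]}
combined = combine_histories(first, second)
print(combined["loss"])  # [1.0, 0.8, 0.6]

# Wrapped like this, the result can be passed to plot_loss_curves:
# plot_loss_curves(SimpleNamespace(history=combine_histories(history.history, history_2.history)))
wrapped = SimpleNamespace(history=combined)
```

Because the fine-tuning run restarts at epoch index 9, the combined curves contain one point per epoch actually trained (10 + 2).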