<div align="center" style="border-radius: 10px; overflow: hidden; box-shadow: 0px 0px 10px rgba(0, 0, 0, 0.1);">
  <img src="https://storage.googleapis.com/kaggle-datasets-images/8782/12270/c3af536d14e386a2bfd356d1cd84b67e/dataset-cover.jpg?t=2018-01-06-14-10-54" alt="Flower Dataset" style="border-radius: 10px;">
</div>

<div align="center" style="background-color: #f5f5f5; padding: 20px; border-radius: 10px; box-shadow: 0px 0px 10px rgba(0, 0, 0, 0.1); color: #555;">

# Flowers Recognition
### A Convolutional Neural Network with 6.5M Learnable Petals
###### ITHS AI22 Deep Learning Course | December 2023
</div>

[This dataset](https://www.kaggle.com/datasets/alxmamaev/flowers-recognition) is described as containing 4242 images of flowers (the copy loaded below has 4317 files).

### Content

The pictures are divided into five classes: chamomile, tulip, rose, sunflower, dandelion.
For each class there are about 800 photos. Photos are not high resolution, about 320x240 pixels. Photos are not reduced to a single size.

The data collection is based on data scraped from Flickr, Google Images, and Yandex Images.

### Description

The goal: build a convolutional neural network that classifies these flowers, and make it perform well.

In [1]:
from tensorflow.data import Dataset



### Resources & Documentation

[api_docs/tf/keras/utils/image_dataset_from_directory](https://www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory)

[tensorflow.org/tutorials/images/classification](https://www.tensorflow.org/tutorials/images/classification)


# Creating a dataset (Train|Val|Test Split)

We start by creating the dataset with the `image_dataset_from_directory` API.<br>
Wrapping it in a function lets us call it with different parameters later on.

We use the `take` and `skip` methods to move some of the validation batches into a separate test set.

This results in:

xx% training:
- xx% training is used to adjust the weights in the network.
- xx% in-training validation used only to check metrics of the model after each epoch.

xx% testing (never seen by the training process at all)

**#TODO update values when satisfied with inference**

I've decided to stick to just a few batches of test data (maybe just one?), as my intention is to test the model on new data (e.g., flower images from search engines).

I've also decided to run with a larger batch size (128) because there is high variance in the dataset.
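To see what `take` and `skip` do to a batched dataset, here is a minimal sketch on a toy dataset. The toy data (24 integers in batches of 2) and the `toy_*` names are made up for illustration; only the "first sixth of the batches" logic mirrors the function below.

```python
import tensorflow as tf

# Toy stand-in for the validation split: 24 integers in batches of 2 -> 12 batches
toy_val_ds = tf.data.Dataset.range(24).batch(2)

n_test_batches = len(toy_val_ds) // 6              # 12 // 6 = 2
toy_test_ds = toy_val_ds.take(n_test_batches)      # the first 2 batches
toy_new_val_ds = toy_val_ds.skip(n_test_batches)   # the remaining 10 batches

print(len(toy_test_ds), len(toy_new_val_ds))
```

Note that `take` and `skip` split by position, so the two parts only stay disjoint if the dataset yields its batches in the same order on every pass.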

In [2]:
from tensorflow.keras.utils import image_dataset_from_directory as get_data  # same function as in the docs linked above
from tensorflow import one_hot

def create_dataset(data_dir='Data/flowers', batch_size=128, img_size=(224,224), val_size=0.2, shuffle=True, random_seed=42):

    print(f"Creating dataset with batch size: {batch_size}")

    train_ds = get_data(
      data_dir,
      validation_split=val_size,
      subset="training",
      seed=random_seed,
      image_size=img_size,
      batch_size=batch_size,
      shuffle=shuffle)

    val_ds = get_data(
      data_dir,
      validation_split=val_size,
      subset="validation",
      seed=random_seed,
      image_size=img_size,
      batch_size=batch_size,
      shuffle=shuffle)

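    # Extracting test data from validation split:
    # the first sixth of the validation batches (rounded down) becomes the test set
    val_batches = len(val_ds)
    test_ds = val_ds.take((val_batches) // 6)
    new_val_ds = val_ds.skip((val_batches) // 6)

    # Share of test batches relative to (all validation batches + test batches);
    # val_ds still contains the test batches at this point
    percent_test = len(test_ds) / (len(val_ds) + len(test_ds)) * 100

    print(f"Moving {len(val_ds) - len(new_val_ds)} batch(es) from validation to test.")
    # The file count assumes every test batch is full
    print(f"Using {len(test_ds)*batch_size} files ({percent_test:.2f}%) for test.\n")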

    class_names = train_ds.class_names

    # One-hot encoding labels
    try:
        num_classes = len(train_ds.class_names)
        train_ds = train_ds.map(lambda x, y: (x, one_hot(y, depth=num_classes)))
        new_val_ds = new_val_ds.map(lambda x, y: (x, one_hot(y, depth=num_classes)))
        test_ds = test_ds.map(lambda x, y: (x, one_hot(y, depth=num_classes)))
        print('Labels successfully encoded.')
    except Exception as e:
        print(f'Error during one-hot encoding: {e}')
        
    return train_ds, new_val_ds, test_ds, class_names


In [3]:
train_ds, val_ds, test_ds, class_names = create_dataset(shuffle=True, random_seed=3)

Creating dataset with batch size: 128
Found 4317 files belonging to 5 classes.
Using 3454 files for training.
Found 4317 files belonging to 5 classes.
Using 863 files for validation.
Moving 1 batch(es) from validation to test.
Using 128 files (12.50%) for test.

Labels successfully encoded.
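
The file counts printed above can be reproduced with plain arithmetic. This is a rough sketch, not part of the pipeline: `split_sizes` is a hypothetical helper, it assumes the validation count is `int(val_size * n_files)` (which matches the 863 files reported above), and it assumes every batch moved to test is full.

```python
import math

def split_sizes(n_files=4317, val_size=0.2, batch_size=128, test_divisor=6):
    """Estimate how many files end up in train, validation and test."""
    n_val = int(val_size * n_files)                  # files in the validation split
    n_train = n_files - n_val
    val_batches = math.ceil(n_val / batch_size)      # last batch may be partial
    test_batches = val_batches // test_divisor       # batches moved to test
    n_test = min(test_batches * batch_size, n_val)   # assumes full test batches
    return {"train": n_train, "val": n_val - n_test, "test": n_test}

sizes = split_sizes()
for name, count in sizes.items():
    print(f"{name}: {count} files ({count / 4317:.1%} of the dataset)")
```

Calling `split_sizes(batch_size=16)` gives the corresponding estimate for a smaller batch size.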


When using `image_dataset_from_directory`, labels are automatically integer encoded in the dataset. Each integer represents one type of flower.<br>
For multi-class classification tasks (like identifying different types of flowers), it is common to one-hot encode the labels. The `categorical_crossentropy` loss used later expects labels in this form.
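
As a small illustration of what one-hot encoding does, here is a NumPy sketch. The integer labels are hypothetical, and `np.eye` is used here as a stand-in for `tf.one_hot`.

```python
import numpy as np

num_classes = 5
labels = np.array([0, 3, 1, 4])   # hypothetical integer labels, one per image

# Row i of the identity matrix is the one-hot vector for class i
one_hot_labels = np.eye(num_classes, dtype=np.float32)[labels]

print(one_hot_labels)
print(one_hot_labels.argmax(axis=1))   # recovers the integer labels
```

Each row contains a single 1 at the index of its class, which is why the label shape becomes `(batch_size, 5)`.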


In [4]:
for images, labels in train_ds.take(1):
    print(labels.shape)

(128, 5)


# Preprocessing (Normalization)

In [5]:
for image_batch, labels_batch in train_ds:
  print(image_batch.shape)
  print(labels_batch.shape)
  break

(128, 224, 224, 3)
(128, 5)


The `image_batch` is a tensor of shape (128, 224, 224, 3): a batch of 128 images of shape 224x224x3 (the last dimension refers to the RGB color channels). The `labels_batch` is a tensor of shape (128, 5), holding the one-hot labels for those 128 images.

For one image:

In [6]:
for image_batch, _ in train_ds.take(1):
  print(image_batch[0].shape)  # first image of the first batch

(224, 224, 3)


And a snippet of an image tensor (each row holds the RGB values of one pixel):

```
tf.Tensor(
[[[ 65.03571   132.03572   184.03572  ]
  [ 66.        133.        185.       ]
  [ 66.17857   133.17857   185.35715  ]
  ...
```

In [7]:
import numpy as np
print(type(train_ds))

for images, labels in train_ds.take(1):
    print(np.min(images), np.max(images))

<class 'tensorflow.python.data.ops.map_op._MapDataset'>
0.0 255.0


Default RGB values range between 0 and 255. We want to normalize these values.

In [8]:
def normalize_image(img, label):
    return img / 255, label

train_ds_normalized = train_ds.map(normalize_image)
val_ds_normalized = val_ds.map(normalize_image)
test_ds_normalized = test_ds.map(normalize_image)

for images, labels in train_ds_normalized.take(1):
    print(np.min(images), np.max(images))

0.0 1.0


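When we call `map` on a `tf.data.Dataset` (or a subclass such as `PrefetchDataset`), the given function (such as `normalize_image`) is applied to each element of the dataset. Since the dataset is already batched, each element here is a whole batch of images and labels.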

After normalization:

```
tf.Tensor(
[[[0.14509805 0.14509805 0.11372549]
  [0.14509805 0.14509805 0.11372549]
  [0.14621848 0.14621848 0.11484594]
  ...
```

Dividing each pixel value by 255 brings the values into the range [0, 1]. This improves consistency and helps the model learn faster (faster convergence).
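
The same division can be checked on a single pixel with NumPy. The raw values 37 and 29 are an assumption, picked because they map to the normalized values shown in the snippet above.

```python
import numpy as np

# One hypothetical RGB pixel with raw values in [0, 255]
raw_pixel = np.array([[37.0, 37.0, 29.0]], dtype=np.float32)

normalized = raw_pixel / 255   # same operation as in normalize_image
print(normalized)
print(normalized.min() >= 0.0, normalized.max() <= 1.0)
```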

# Building the model

Here we use a modified version of the base model described in the [Keras Image classification Tutorial](https://www.tensorflow.org/tutorials/images/classification).<br>
To run a grid search later on, we wrap the model building in a function.

In [9]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def get_model(img_size=[224,224],
              num_classes=5,
              num_filters=[16, 32, 64],
              pooling_sizes=[2, 2, 2],
              activations=['relu', 'relu', 'relu'],
              dense_units=[128],
              dense_activations=['relu'],
              optimizer='adam',
              loss='categorical_crossentropy'):

    model = Sequential()

    for filters, pooling_size, activation in zip(num_filters, pooling_sizes, activations):
        model.add(Conv2D(filters, 3, padding='same', activation=activation, input_shape=(img_size[0], img_size[1], 3)))
        model.add(MaxPooling2D(pool_size=pooling_size))

    model.add(Flatten())

    for units, activation in zip(dense_units, dense_activations):
        model.add(Dense(units, activation=activation))

    model.add(Dense(num_classes, activation='softmax'))

    model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

    return model

model = get_model()
model.summary()

# Note: the output layer already applies softmax, so the loss keeps from_logits=False (the default)


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 224, 224, 16)      448       
                                                                 
 max_pooling2d (MaxPooling2  (None, 112, 112, 16)      0         
 D)                                                              
                                                                 
 conv2d_1 (Conv2D)           (None, 112, 112, 32)      4640      
                                                                 
 max_pooling2d_1 (MaxPoolin  (None, 56, 56, 32)        0         
 g2D)                                                            
                                                                 
 conv2d_2 (Conv2D)           (None, 56, 56, 64)        18496     
                                                                 
 max_pooling2d_2 (MaxPoolin  (None, 28, 28, 64)        0

The model has about 6.4 million parameters, nearly all of them in the first dense layer after `Flatten`.
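
The parameter count can be checked by hand. The sketch below uses the standard formulas for a 3x3 convolution and a dense layer (weights plus one bias per output); the helper names are hypothetical, and the layer sizes are the defaults of `get_model()`.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights (k * k * in_channels per filter) plus one bias per filter."""
    return (kernel_size * kernel_size * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input plus one bias, for every unit."""
    return (inputs + 1) * units

# Three conv blocks: 3 -> 16 -> 32 -> 64 channels, all with 3x3 kernels
conv_total = (conv2d_params(3, 3, 16)
              + conv2d_params(3, 16, 32)
              + conv2d_params(3, 32, 64))

# After three 2x2 poolings, 224x224 becomes 28x28 with 64 channels
flat_features = 28 * 28 * 64

dense_total = dense_params(flat_features, 128) + dense_params(128, 5)
total = conv_total + dense_total

print(f"conv: {conv_total:,}  dense: {dense_total:,}  total: {total:,}")
```

The per-layer conv values (448, 4640, 18496) match the summary above, and the total comes out at roughly 6.45 million.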

# Training the model

Now we'll call our `get_model()` to compare Adam and SGD.

We use `AUTOTUNE` to let `tf.data` pick the prefetch buffer size, which helps performance on the lil 2.2 GHz Quad-Core Intel Core i7 that's doing all the hard work.

In [10]:
from tensorflow.data import AUTOTUNE # For performance

train_ds_normalized = train_ds_normalized.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds_normalized = val_ds_normalized.cache().prefetch(buffer_size=AUTOTUNE)

In [11]:
from tqdm.keras import TqdmCallback

def train(model, epochs=10):
    tqdm_callback = TqdmCallback()

    history = model.fit(
      train_ds_normalized,
      validation_data=val_ds_normalized,
      epochs=epochs,
      callbacks=[tqdm_callback]
    )

    return history

In [12]:
adam20 = train(get_model(), epochs=20) # Same as model above (Adam)
sgd20 = train(get_model(optimizer='sgd'), epochs=20) # Stochastic Gradient Descent



In [19]:
import plotly.graph_objects as go

def plot_training(*histories, title='', names=None):
    # Fall back to generic labels when no names are given
    if names is None:
        names = [f'Run {i + 1}' for i in range(len(histories))]

    fig = go.Figure()
    for name, history in zip(names, histories):
        acc = history.history['accuracy']
        val_acc = history.history['val_accuracy']
        epochs_range = list(range(1, len(acc) + 1))

        # One line for training accuracy and one for validation accuracy per run
        fig.add_trace(go.Scatter(x=epochs_range, y=acc, mode='lines', name=f'{name} Training Accuracy'))
        fig.add_trace(go.Scatter(x=epochs_range, y=val_acc, mode='lines', name=f'{name} Validation Accuracy'))

    fig.update_layout(title=title,
                      xaxis_title='Epochs',
                      yaxis_title='Accuracy')

    fig.show()

plot_training(adam20, sgd20, title='Train & Val Accuracy Comparison (Batch Size=128)', names=['Adam Optimizer', 'SGD Optimizer'])

Adam overfits heavily. SGD seems very stable.

We see some improvement in training accuracy, but the validation accuracy struggles. We should try to add some regularization.
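
One way to put a number on the overfitting is the per-epoch gap between training and validation accuracy. This is a minimal sketch: `generalization_gap` is a hypothetical helper, and the example values are made up for illustration, not results from the runs above.

```python
def generalization_gap(history_dict):
    """Per-epoch difference between training and validation accuracy."""
    return [round(train_acc - val_acc, 4)
            for train_acc, val_acc in zip(history_dict['accuracy'],
                                          history_dict['val_accuracy'])]

# Made-up numbers for illustration only, NOT taken from the training runs
example_history = {
    'accuracy':     [0.50, 0.70, 0.90],
    'val_accuracy': [0.48, 0.60, 0.62],
}

gaps = generalization_gap(example_history)
print(gaps)
```

With the real runs this could be called as `generalization_gap(adam20.history)`. A gap that keeps growing from epoch to epoch is the pattern described above as overfitting.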

In [22]:
from tensorflow.keras.layers import BatchNormalization, Dropout

def get_model2(img_size=[224,224],
              num_classes=5,
              num_filters=[16, 32, 64],
              pooling_sizes=[2, 2, 2],
              activations=['relu', 'relu', 'relu'],
              dense_units=[128],
              dense_activations=['relu'],
              optimizer='adam',
              loss='categorical_crossentropy'):

    model = Sequential()

    for filters, pooling_size, activation in zip(num_filters, pooling_sizes, activations):
        model.add(Conv2D(filters, 3, padding='same', activation=activation, input_shape=(img_size[0], img_size[1], 3)))
        model.add(BatchNormalization()) # Add batch normalization
        model.add(MaxPooling2D(pool_size=pooling_size))

    model.add(Flatten())

    for units, activation in zip(dense_units, dense_activations):
        model.add(Dense(units, activation=activation))
    
    model.add(Dropout(0.5)) # Add dropout

    model.add(Dense(num_classes, activation='softmax'))

    model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

    return model

model = get_model2()
model.summary()

# Note: the output layer already applies softmax, so the loss keeps from_logits=False (the default)


Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_13 (Conv2D)          (None, 224, 224, 16)      448       
                                                                 
 batch_normalization (Batch  (None, 224, 224, 16)      64        
 Normalization)                                                  
                                                                 
 max_pooling2d_12 (MaxPooli  (None, 112, 112, 16)      0         
 ng2D)                                                           
                                                                 
 conv2d_14 (Conv2D)          (None, 112, 112, 32)      4640      
                                                                 
 batch_normalization_1 (Bat  (None, 112, 112, 32)      128       
 chNormalization)                                                
                                                      

In [23]:
adam10_regularized = train(get_model2(), epochs=10)
sgd10_regularized = train(get_model2(optimizer='sgd'), epochs=10)



In [24]:
plot_training(adam10_regularized, sgd10_regularized, title='Train & Val Accuracy Comparison (Batch Size=128) w/ Dropout and BatchNorm', names=['Adam Optimizer', 'SGD Optimizer'])

That did not show any improvement: accuracy is worse and the model still overfits. Let's try a much smaller batch size.

If this doesn't work, reduce the regularization and run again on whichever batch size gave the best results.

In [25]:
small_train_ds, small_val_ds, small_test_ds, class_names = create_dataset(shuffle=True, random_seed=3, batch_size=16)

Creating dataset with batch size: 16
Found 4317 files belonging to 5 classes.
Using 3454 files for training.
Found 4317 files belonging to 5 classes.
Using 863 files for validation.
Moving 9 batch(es) from validation to test.
Using 144 files (14.29%) for test.

Labels successfully encoded.


In [26]:
small_train_ds_normalized = small_train_ds.map(normalize_image)
small_val_ds_normalized = small_val_ds.map(normalize_image)
small_test_ds_normalized = small_test_ds.map(normalize_image)

In [30]:
def train2(model, epochs=10):
    tqdm_callback = TqdmCallback()

    history = model.fit(
      small_train_ds_normalized,
      validation_data=small_val_ds_normalized,
      epochs=epochs,
      callbacks=[tqdm_callback]
    )

    return history # todo update train function instead of copy (just make train_ds input arg)
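
Following the todo above, here is a sketch of a single training function that takes the datasets as arguments instead of reading globals. The name `train_on` and the tiny synthetic data and model are assumptions, used only so the example runs on its own.

```python
import numpy as np
import tensorflow as tf

def train_on(model, train_ds, val_ds, epochs=10, callbacks=None):
    """Fit `model` on the given datasets and return the Keras History object."""
    return model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=epochs,
        callbacks=callbacks or [],
        verbose=0,
    )

# Tiny synthetic stand-in data (random pixels and labels), only to show the call
rng = np.random.default_rng(0)
images = rng.random((32, 8, 8, 3)).astype("float32")
labels = tf.one_hot(rng.integers(0, 5, size=32), depth=5)
toy_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(16)

toy_model = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(5, activation="softmax"),
])
toy_model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])

history = train_on(toy_model, toy_ds, toy_ds, epochs=2)
print(sorted(history.history))
```

In this notebook the call would look something like `train_on(get_model2(), small_train_ds_normalized, small_val_ds_normalized, callbacks=[TqdmCallback()])`.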

In [31]:
adam10_regularized = train2(get_model2(), epochs=10)
sgd10_regularized = train2(get_model2(optimizer='sgd'), epochs=10)



In [32]:
plot_training(adam10_regularized, sgd10_regularized, title='Train & Val Accuracy Comparison (Batch Size=16) w/ Dropout and BatchNorm', names=['Adam Optimizer', 'SGD Optimizer'])

This time Adam underfits, while SGD's validation accuracy *kind of* converges early on, after which the model overfits. Interesting.
A validation accuracy of 0.6 after 10 epochs is still... ok? Possible next step: regularize SGD harder and run it again for around 30 epochs on the same batch size.