# One-shot Siamese Neural Network
In this assignment, we were tasked with creating Convolutional Neural Networks (CNNs).
A step-by-step CNN tutorial can be found [here (DeepLearning.ai)](https://www.youtube.com/playlist?list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF).

The assignment is to be carried out using Python V3.6 (or higher) and TensorFlow 2.0 (or higher).

In this assignment, we were tasked with creating a One-shot Siamese Neural Network, using TensorFlow 2.0, based on the work presented by Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov.
As specified, we used the “Labeled Faces in the Wild” dataset with over 5,700 different people. Some people have a single image, while others have dozens. We used, as requested, the improved dataset where the faces were aligned automatically using specialized software.

# Table of Contents
1. [Authors](#Authors)
2. [Purposes of The Assignment](#Purposes-of-The-Assignment)
3. [Instructions](#Instructions)
4. [Dataset Analysis](#Dataset-Analysis)
5. [Code Design](#Code-Design)
6. [Architecture](#Architecture)
7. [Initialization](#Initialization)
8. [Stopping Criteria](#Stopping-criteria)
9. [Network Hyper-Parameters Tuning](#Network-Hyper-Parameters-Tuning)
10. [Full Experimental Setup](#Full-Experimental-Setup)
11. [Experimental Results](#Experimental-Results)

## Authors
* **Tomer Shahar** - [Tomer Shahar](https://github.com/Tomer-Shahar)
* **Nevo Itzhak** - [Nevo Itzhak](https://github.com/nevoit)

## Purposes of The Assignment 
Enabling students to experiment with building a convolutional neural net and using it on a real-world dataset and problem.
In addition to practical knowledge in the “how to” of building the network, an additional goal is the integration of useful logging tools for gaining better insight into the training process. Finally, the students are expected to read, understand and (loosely) implement a scientific paper.

In this assignment, you will use convolutional neural networks (CNNs) to carry out the task of facial recognition. As shown in class, CNNs are the current state-of-the-art approach for analyzing image-based datasets. More specifically, you will implement a one-shot classification solution. Wikipedia defines one-shot learning as follows: 
“… an object categorization problem, found mostly in computer vision. Whereas most machine learning based object categorization algorithms require training on hundreds or thousands of samples/images and very large datasets, one-shot learning aims to learn information about object categories from one, or only a few, training samples/images.”

Your work will be based on the paper [Siamese Neural Networks for One-shot Image Recognition](https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf).
Your goal, like that of the paper, is to successfully execute a one-shot learning task for previously unseen objects. Given two facial images of previously unseen persons, your architecture will have to successfully determine whether they are the same person. While we encourage you to use the architecture described in this paper as a starting point, you are more than welcome to explore other possibilities.

## Instructions
-	Use the following dataset - [Labeled Faces in the Wild](http://vis-www.cs.umass.edu/lfw/index.html)

(a) Download the dataset. Note: there are several versions of this dataset, use the version [found here](https://talhassner.github.io/home/projects/lfwa/index.html) (it’s called LFW-a, and is also used in the DeepFace paper).

(b)	Use the following train and test sets to train your model: [Train](http://vis-www.cs.umass.edu/lfw/pairsDevTrain.txt) \ [Test](http://vis-www.cs.umass.edu/lfw/pairsDevTest.txt). [Remember - you will use your test set to perform one-shot learning. This division is set up so that no subject from the test set is included in the train set]. Please note that it is often recommended to use a validation set when training your model. Make your own decision whether to use one and what percentage of (training) samples to allocate.

(c) In your report, include an analysis of the dataset (size, number of examples – in total and per class – for the train and test sets, etc). Also provide the full experimental setup you used – batch sizes, the various parameters of your architecture, stopping criteria and any other relevant information. A good rule of thumb: if asked to recreate your work, a person should be able to do so based on the information you provide in your report.

- Implement a Siamese network architecture while using the above-mentioned paper as a reference.

(a) Provide a complete description of your architecture: number of layers, dimensions, filters etc. Make sure to mention parameters such as learning rates, optimization and regularization methods, and the use (if exists) of batchnorm.

(b) Explain the reasoning behind the choices made in answer to the previous section. If your choices were the result of trial and error, please state the fact and describe the changes made throughout your experiments. Choosing a certain parameter combination because it appeared in a previously published paper is a perfectly valid reason.

- In addition to the details requested above, your report needs to include an analysis of your architecture’s performance. Please include the following information:

(a) Convergence times, final loss and accuracy on the test set and holdout set

(b) Graphs describing the loss on the training set throughout the training process

(c) Performance when experimenting with the various parameters

(d) Please include examples of accurate classifications and misclassifications, and try to determine why your model was not successful.

(e) Any other information you consider relevant or found useful while training the model

Please note that the report needs to reflect your decision-making process throughout the assignment. Please include all relevant information.

- Please note that your work will not be evaluated solely on performance, but also on additional elements such as code correctness and documentation, a complete and clear documentation of your experimental process, analysis of your results and breadth and originality (where applicable).

![Figure 1 - Siamese network for facial recognition](https://github.com/nevoit/Siamese-Neural-Networks-for-One-shot-Image-Recognition/blob/master/figures/figure%201%20explanation.png?raw=true "Figure 1 - Siamese network for facial recognition")
Figure 1 - Siamese network for facial recognition

## Dataset Analysis
- Size: 5,749 people
- Number of examples:
    - Total: 13,233 images. Some people have a single image and some have dozens.
    - Class 1 (identical): 1,100 pairs (in the training file) and 500 pairs (in the testing file)
    - Class 0 (non-identical): 1,100 pairs (in the training file) and 500 pairs (in the testing file)
    - Validation set: 20 percent of the training set, i.e. 440 pairs (leaving 1,760 pairs for the actual training data)
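
The class counts above follow from the layout of the pairs files. The sketch below assumes the layout that `DataLoader.load` checks for: a line with three fields is an identical pair and a line with four fields is a non-identical pair. The helper name `count_pairs` and the sample rows are hypothetical and only illustrate the counting; they are not part of our scripts or of the dataset.

```python
from collections import Counter


def count_pairs(lines):
    """Count identical (1) and non-identical (0) pairs in pairs-file lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) == 3:      # name, image A, image B -> same person
            counts[1] += 1
        elif len(fields) == 4:    # name A, image A, name B, image B -> different people
            counts[0] += 1
        # any other line (e.g. a header holding the pair count) is ignored
    return counts


# Hypothetical sample rows in the assumed layout, not real dataset entries.
sample = [
    "2",
    "Person_A 1 2",
    "Person_B 3 7",
    "Person_A 1 Person_C 4",
    "Person_B 2 Person_D 1",
]
counts = count_pairs(sample)
print(counts[1], counts[0])

# Validation split arithmetic for the 2,200 training pairs reported above.
total_train_pairs = 1100 + 1100
val_pairs = total_train_pairs * 20 // 100     # 20 percent held out for validation
train_pairs = total_train_pairs - val_pairs
print(train_pairs, val_pairs)
```

Integer arithmetic is used for the split so that the result does not depend on floating-point rounding.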

## Code Design
Our code consists of three scripts:
1. Data_loader.py - contains the DataLoader class that loads image data, manipulates it, and writes it into a specified path in a certain format.
2. Siamese_network.py - contains the SiameseNetwork class that is our implementation of the network described in the paper. It includes many functions including one that builds the CNN used in the network and a function for finding the optimal hyperparameters.
3. Experiments.py - The script that is actually running. It calls the train and predict methods from SiameseNetwork.

## Architecture
We mostly followed the architecture specified in the paper: the network consists of two Convolutional Neural Networks that are joined towards the end, creating a Siamese network. However, our network is slightly smaller.
Number of layers: Each CNN, before being conjoined, has 5 layers (4 convolutional layers and 1 fully connected layer). Then there is a distance layer, combining both CNNs, with a single output.
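
To make the joining step concrete, here is a minimal, library-free sketch of what the distance layer and the single output compute at prediction time: the component-wise absolute difference of the two embeddings, followed by one sigmoid unit. The function name `siamese_score`, the 3-component embeddings, and the weights are hypothetical toy values chosen for illustration; the real embeddings have 4096 components and the weights are learned.

```python
import math


def siamese_score(emb_left, emb_right, weights, bias):
    """Sigmoid of a weighted sum of component-wise L1 distances."""
    distances = [abs(l - r) for l, r in zip(emb_left, emb_right)]
    logit = sum(w * d for w, d in zip(weights, distances)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy values (hypothetical): negative weights mean larger distances lower the score.
weights = [-2.0, -2.0, -2.0]
bias = 1.0
same = siamese_score([0.2, 0.8, 0.5], [0.2, 0.8, 0.5], weights, bias)
different = siamese_score([0.2, 0.8, 0.5], [0.9, 0.1, 0.0], weights, bias)
print(round(same, 3), round(different, 3))
```

With these toy weights, identical embeddings score above 0.5 and distant embeddings score below it, which is the behaviour the trained output unit is expected to approximate.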
Dimensions: For the CNN part:

| Layer | Input Size | Filters | Kernel | Maxpooling | Activation Function|
|---| --- | --- | --- |  ---  | --- |
| 1 | 105x105. Reshaped from 250x250 to adhere to the paper. | 64 | 10x10 | Yes, Stride of 2 | ReLU |
| 2 | 64 filters of 10x10 | 128 | 7x7 | Yes, Stride of 2 | ReLU |
| 3 | 128 filters of 7x7 | 128 | 4x4 | Yes, Stride of 2 | ReLU |
| 4 | 128 filters of 4x4 | 256 | 4x4 | No | ReLU |
| 5 | 4096x1 Fully connected feature layer with drop out rate of 0.4 (Fraction of the input units to drop) | - | - |No | Sigmoid|
- There are two identical CNNs as described in the table.
- All convolutional layers (that is, every layer except the last, fully connected one) use a fixed stride of 1 (as in the paper), ‘valid’ padding (no zero padding, so the kernel only traverses positions inside the image), an L2 kernel regularizer with a regularization factor of 2e-4, and batch normalization.
- For the last layer (the fully connected layer), we used L2 as a kernel regularizer with a regularization factor of 2e-3.
- Please note that any parameter not mentioned here uses the default TensorFlow 2.0.0 Keras value.
- After the last layer, we add a layer that connects both sides, thus creating the Siamese network. The activation function of this layer is a sigmoid, which is handy since we have 2 classes (same person vs. not the same person).
- Total params: 38,954,049
- Trainable params: 38,952,897
- Non-trainable params: 1,152
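
The parameter totals can be cross-checked with plain arithmetic. The sketch below is an illustration, not part of our scripts, and rests on a few stated assumptions: 2x2 max pooling with stride 2, a bias term on every convolutional layer (including the first), and four batch-normalization parameters per channel, two of which (the moving mean and variance) are non-trainable. The helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel, pool):
    """Spatial size after a 'valid' stride-1 convolution and optional 2x2 max pooling."""
    size = size - kernel + 1
    return size // 2 if pool else size


# (filters, kernel size, followed by 2x2 max pooling?) for the four conv layers
conv_layers = [(64, 10, True), (128, 7, True), (128, 4, True), (256, 4, False)]

size, channels = 105, 1               # 105x105 grayscale input
trainable = non_trainable = 0
for filters, kernel, pool in conv_layers:
    trainable += kernel * kernel * channels * filters + filters  # conv weights + biases
    trainable += 2 * filters          # batch norm gamma and beta
    non_trainable += 2 * filters      # batch norm moving mean and variance
    size, channels = conv_out(size, kernel, pool), filters

flat = size * size * channels         # flattened feature map fed to the dense layer
trainable += flat * 4096 + 4096       # fully connected feature layer
trainable += 4096 + 1                 # final sigmoid unit on the L1 distance

print(flat, trainable, non_trainable, trainable + non_trainable)
```

Under these assumptions the arithmetic reproduces the three totals listed above, and it also shows that the fully connected layer accounts for almost all of the parameters.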

## Initialization
- Weight initialization for all edges was done as described in the paper: A normal distribution with a mean of 0 and a standard deviation of 0.01.
- Bias initialization was also done as in the paper, with a mean of 0.5 and a standard deviation of 0.01. However, the first layer has no bias. The paper doesn’t mention whether they did this, but we found in this paper that occasionally, having no bias for the initial layer might be beneficial. This occurs when the layer is sufficiently large and the data is distributed fairly uniformly, which probably occurs in our case because the training set is predefined. Indeed, in our experiments adding a bias usually reduced the accuracy. Our final model doesn’t have a bias for the first layer. (In the code, this corresponds to `Conv1` being created without the custom `bias_initializer`; Keras then falls back to its default zero-initialized bias.)
- Note: the authors used a slightly different bias initialization for the fully connected layers.  Since there are so many edges, they sampled from a larger distribution. In our experiments, the same bias sampling as the rest of the network worked well so we used the same distribution.
- These are fairly typical methods of initializing weights and seemed to work well for the authors so we saw no reason to not imitate them (excluding the fully connected layer).
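
As a library-free illustration of the two distributions described above, the sketch below draws samples with Python's standard `random` module and checks their empirical mean and standard deviation. It is not how the network is initialized (the model uses the Keras backend for that); the helper name `sample_normal`, the sample size, and the seed are assumptions made for the example.

```python
import random
import statistics


def sample_normal(n, mean, stddev, seed=0):
    """Draw n values from a normal distribution with a fixed seed."""
    rng = random.Random(seed)
    return [rng.gauss(mean, stddev) for _ in range(n)]


weights = sample_normal(10_000, mean=0.0, stddev=0.01)   # weight distribution
biases = sample_normal(10_000, mean=0.5, stddev=0.01)    # bias distribution

print(statistics.mean(weights), statistics.stdev(weights))
print(statistics.mean(biases), statistics.stdev(biases))
```

The small standard deviation keeps the initial weights close to zero, while the biases start tightly clustered around 0.5.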

## Stopping Criteria
We used the Keras EarlyStopping callback, monitoring the validation loss, with a minimum delta of 0.1 (the minimum change in the monitored quantity that qualifies as an improvement, i.e. an absolute change of less than min_delta counts as no improvement) and a patience of 5 (the number of epochs with no improvement after which training is stopped). The direction is automatically inferred from the name of the monitored quantity (‘auto’).
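
The interaction between the minimum delta and the patience can be illustrated with a simplified re-implementation of the rule described above. This is a sketch for intuition, not the Keras source; the function name `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, min_delta=0.1, patience=5):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:   # only a drop larger than min_delta counts
            best = loss
            wait = 0
        else:
            wait += 1                 # another epoch without a real improvement
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses: one large drop, then only tiny improvements.
losses = [3.0, 2.5, 2.45, 2.44, 2.43, 2.42, 2.41, 2.40]
print(early_stopping_epoch(losses))
```

Note that the loss keeps decreasing in this example, yet training still stops, because no single drop exceeds the minimum delta of 0.1 relative to the best value seen so far.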

## Network Hyper-Parameters Tuning
NOTE: Here we explain the reasons behind the choices of the parameters.
After implementing our Siamese Network, we had to optimize the different parameters used. Many of them, such as layer size, were chosen based on the work in the paper and we decided against trying to find a better combination since the search space would be massive and we don’t know enough to narrow it down.
- Learning Rate: We tried many different values, ranging from 0.1 to 0.00001. After running numerous experiments, we found 0.00005 to work the best.
- Optimizer: The paper didn’t specify so we used the robust and popular ADAM optimizer.
- Epochs: We tried epochs of 5, 10, 20 and 50. We found 10 to work the best.
- Batch size: We tried multiples of 16 such as 16, 32 and 64. Our final model has a batch size of 32.

## Full Experimental Setup
Validation Set: Empirically, we learned that using a validation set is better than not using one when data is scarce. We used the fairly standard 80/20 ratio between training and validation, which worked well.
- Batch sizes - 32
- Epochs - 10
- Stopping criteria - 5 epochs without improvement.
- Learning rate: 0.00005
- Min delta for improvement: 0.1

##  Experimental Results
After implementing our Siamese Network, we ran it with many different settings as described above and chose the optimal settings. These are the results:
a.	Convergence times, final loss and accuracy on the test set and holdout set:
- Final Loss on Holdout set - 3.106
- Accuracy on Holdout set - 0.734 (73.4%)
- Final Loss on Testing set - 3.111
- Accuracy on Testing set - 0.731 (73.2%)
- Convergence time - 30 seconds
- Prediction time - less than 1 second

b. Graphs describing the loss on the training set throughout the training process:

![loss-epoch](https://github.com/nevoit/Siamese-Neural-Networks-for-One-shot-Image-Recognition/blob/master/figures/loss-epoch.PNG?raw=true "loss-epoch")

Fig.2: Reduction in the loss function for each epoch. The validation set predicted the loss on the test set well.

![loss-acc](https://github.com/nevoit/Siamese-Neural-Networks-for-One-shot-Image-Recognition/blob/master/figures/acc-epoch.PNG?raw=true "loss-acc")

Fig.3: Accuracy on the training and validation sets for each epoch. For the validation set the accuracy plateaued after 2 epochs, but as you can see in Fig.2 the loss continued to decrease, explaining the increase in accuracy for the test set for 2 more epochs.

c. Performance when experimenting with various parameters:
Starting from the best parameters (seed 0, learning rate 0.00005, batch size 32, 10 epochs, patience of 5, and minimum delta of 0.1), we changed some of them to test their effect.

We tested the following learning rates: 0.000005, 0.00001, 0.00003, 0.00005, 0.00007, 0.0001 and 0.001.

![lr-acc](https://github.com/nevoit/Siamese-Neural-Networks-for-One-shot-Image-Recognition/blob/master/figures/acc_lr.PNG?raw=true "lr-acc")

Fig.4: Here we can see that learning rates around 0.00005 gave similar results, but when the rate was too large or too small the results dropped drastically.

We tested the following epochs: 1, 2, 3, 5, 10, 15, 20, 30 epochs.

![ep-acc](https://github.com/nevoit/Siamese-Neural-Networks-for-One-shot-Image-Recognition/blob/master/figures/acc_epoch.PNG?raw=true "ep-acc")

Fig.5: The number of epochs didn’t change the accuracy much past 2 epochs.

We tested the following batch sizes: 8, 16, 32, 48, 64.

![bs-acc](https://github.com/nevoit/Siamese-Neural-Networks-for-One-shot-Image-Recognition/blob/master/figures/acc_bs.PNG?raw=true "bs-acc")

Fig.6: Curiously, the test-set accuracy is roughly bell-shaped around a batch size of 32, while the validation-set accuracy varies wildly.
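
The three sweeps in this section can be laid out as a list of configurations. The sketch below assumes each sweep changes a single setting while the others stay at the best values listed above; the function name `one_at_a_time` and the dictionary keys are hypothetical and do not correspond to names in our scripts.

```python
baseline = {"learning_rate": 0.00005, "batch_size": 32, "epochs": 10}

# Values taken from the three sweeps described in this section.
sweeps = {
    "learning_rate": [0.000005, 0.00001, 0.00003, 0.00005, 0.00007, 0.0001, 0.001],
    "epochs": [1, 2, 3, 5, 10, 15, 20, 30],
    "batch_size": [8, 16, 32, 48, 64],
}


def one_at_a_time(baseline, sweeps):
    """Yield one config per swept value, keeping the other settings at baseline."""
    for name, values in sweeps.items():
        for value in values:
            yield {**baseline, name: value}


configs = list(one_at_a_time(baseline, sweeps))
print(len(configs))
```

Varying one setting at a time keeps the number of runs at the sum of the sweep lengths rather than their product, at the cost of not observing interactions between settings.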

d. Please include examples of accurate classifications and misclassifications and try to determine why your model was not successful.

Best correct classification:
Same person (prob: 0.9855): Unsurprising, as the images really are very similar.

![same](https://github.com/nevoit/Siamese-Neural-Networks-for-One-shot-Image-Recognition/blob/master/figures/gordon_campbell.PNG?raw=true "same")

Different people (prob: 0.0000379): It is quite clear that it’s two different people. Nothing too interesting here - The colors and facial expressions are very different.

![babe](https://github.com/nevoit/Siamese-Neural-Networks-for-One-shot-Image-Recognition/blob/master/figures/babe_ruth_joshua_perper.PNG?raw=true "babe")

Worst Misclassification:
Same person (prob:0.0587): Even though both are the same person, the images are radically different - In the left image, Candice is wearing sunglasses, has bright hair and is looking the other way. On the right, she has dark hair, no sunglasses and has her teeth showing. We theorize that most people would classify this wrong as well.

![Candice_Bergen](https://github.com/nevoit/Siamese-Neural-Networks-for-One-shot-Image-Recognition/blob/master/figures/candice_bergen.PNG?raw=true "Candice_Bergen")

Different people (prob: 0.9464): This is quite surprising since it’s quite apparent that this is not the same person, but the network had such high confidence that they are. Perhaps the coat resembles the hair lapping around her head.

![lisa](https://github.com/nevoit/Siamese-Neural-Networks-for-One-shot-Image-Recognition/blob/master/figures/lisa_murkowski_svetlana_belousova.PNG?raw=true "lisa")

e. Any other information you consider relevant or found useful while training the model
- We used K.clear_session() to make sure each combination in the experiment starts in a new session (K is imported as tensorflow.keras.backend).
- We initialized the seeds using these lines:

`os.environ['PYTHONHASHSEED'] = str(self.seed)`

`random.seed(self.seed)`

`np.random.seed(self.seed)`

`tf.random.set_seed(self.seed)`


### Pip install

In [None]:
!pip install tensorflow
!pip install pillow
!pip install tqdm
!pip install keras
!pip install scikit-learn



In [None]:
import os
import pickle

import tqdm
from PIL import Image, ImageOps
import numpy as np


class DataLoader(object):
    """
    Class for loading data from image files
    """
    def __init__(self, width, height, cells, data_path, output_path):
        """
        Proper width and height for each image.
        """
        self.width = width
        self.height = height
        self.cells = cells
        self.data_path = data_path
        self.output_path = output_path

    def _open_image(self, path):
        """
        Opens the image at the given path with PIL, converts it to grayscale and resizes it to
        (self.width, self.height), e.g. 105x105 as in the paper (the LFW images are 250x250).

        Returns the image as a float64 numpy array.
        """
        with Image.open(path) as image:  # close the file handle once the pixels are read
            image = ImageOps.grayscale(image)
            image = image.resize((self.width, self.height))
            return np.asarray(image, dtype='float64')

    def convert_image_to_array(self, stone, image_num, data_path, predict=False):
        """
        Given a class (stone) name, an image number and the data path, returns a numpy array which represents the image.
        predict - whether this function is called for prediction. When it is False (training/testing), the image is
        reshaped to (width, height, cells) so that it matches the network input.
        """
        # Images are expected at <data_path>/stone_samples/<stone>/<image_num>.png
        image_path = os.path.join(data_path, 'stone_samples', stone, f'{image_num}.png')
        image_data = self._open_image(image_path)
        if not predict:
            image_data = image_data.reshape(self.width, self.height, self.cells)
        return image_data

    def load(self, set_name):
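        """
        Reads the pairs listed in <data_path>/<set_name>.txt, loads the corresponding images and
        pickles them into output_path as [[x_first, x_second], y, names].
        set_name - name of the pairs file without its extension (e.g. the train or test set)
        """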
        file_path = os.path.join(self.data_path, f'{set_name}.txt')
        print(file_path)
        print('Loading dataset...')
        x_first = []
        x_second = []
        y = []
        names = []
        with open(file_path, 'r') as file:
            lines = file.readlines()
        for line in tqdm.tqdm(lines):
            line = line.split()
            if len(line) == 4:  # Class 0 - non-identical
                names.append(line)
                first_stone_name, first_image_num, second_stone_name, second_image_num = line
                first_image = self.convert_image_to_array(stone=first_stone_name, image_num=first_image_num, data_path=self.data_path)
                second_image = self.convert_image_to_array(stone=second_stone_name,
                                                           image_num=second_image_num,
                                                           data_path=self.data_path)
                x_first.append(first_image)
                x_second.append(second_image)
                y.append(0)
            elif len(line) == 3:  # Class 1 - identical
                names.append(line)
                stone_name, first_image_num, second_image_num = line[0], line[1], line[2]
                first_image = self.convert_image_to_array(stone=stone_name,
                                                          image_num=first_image_num,
                                                          data_path=self.data_path)
                second_image = self.convert_image_to_array(stone=stone_name,
                                                           image_num=second_image_num,
                                                           data_path=self.data_path)
                x_first.append(first_image)
                x_second.append(second_image)
                y.append(1)
            elif len(line) == 1:
                print(f'line with a single value: {line}')
        print('Done loading dataset')
        with open(self.output_path, 'wb') as f:
            pickle.dump([[x_first, x_second], y, names], f)


print("Loaded data loader")


Loaded data loader


In [None]:
import os
import pickle
import random

import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras import Input, Sequential, Model
from tensorflow.keras import backend as K
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Lambda, BatchNormalization, Activation, \
    Dropout
from tensorflow.keras.regularizers import l2


class SiameseNetwork(object):
    def __init__(self, seed, width, height, cells, loss, metrics, optimizer, dropout_rate):
        """
        seed - the seed used to initialize the weights
        width, height, cells - used for defining the tensors used for the input images
        loss, metrics, optimizer - settings used for compiling the siamese model (e.g., 'accuracy' and 'adam')
        dropout_rate - fraction of the distance-layer outputs dropped before the final sigmoid unit
        """
        K.clear_session()
        self.load_file = None
        self.seed = seed
        self.initialize_seed()
        self.optimizer = optimizer

        # Define the matrices for the input images
        input_shape = (width, height, cells)
        left_input = Input(input_shape)
        right_input = Input(input_shape)

        # Get the CNN architecture as presented in the paper (read the readme for more information)
        model = self._get_architecture(input_shape)
        encoded_l = model(left_input)
        encoded_r = model(right_input)

        # Add a layer to combine the two CNNs
        L1_layer = Lambda(lambda tensors: K.abs(tensors[0] - tensors[1]))
        L1_siamese_dist = L1_layer([encoded_l, encoded_r])
        L1_siamese_dist = Dropout(dropout_rate)(L1_siamese_dist)

        # An output layer with Sigmoid activation function
        prediction = Dense(1, activation='sigmoid', bias_initializer=self.initialize_bias)(L1_siamese_dist)

        siamese_net = Model(inputs=[left_input, right_input], outputs=prediction)
        self.siamese_net = siamese_net
        self.siamese_net.compile(loss=loss, optimizer=optimizer, metrics=metrics)

    def initialize_seed(self):
        """
        Seeds every random number generator used by the environment (Python hashing, random, NumPy, TensorFlow)
        """
        os.environ['PYTHONHASHSEED'] = str(self.seed)
        random.seed(self.seed)
        np.random.seed(self.seed)
        tf.random.set_seed(self.seed)

    def initialize_weights(self, shape, dtype=None):
        """
        Called when initializing the weights of the siamese model, uses the random_normal function of keras to return a
        tensor with a normal distribution of weights.
        """
        return K.random_normal(shape, mean=0.0, stddev=0.01, dtype=dtype, seed=self.seed)

    def initialize_bias(self, shape, dtype=None):
        """
        Called when initializing the biases of the siamese model, uses the random_normal function of keras to return a
        tensor of biases drawn from a normal distribution (mean 0.5, standard deviation 0.01).
        """
        return K.random_normal(shape, mean=0.5, stddev=0.01, dtype=dtype, seed=self.seed)

    def _get_architecture(self, input_shape):
        """
        Returns a Convolutional Neural Network based on the input shape given of the images. This is the CNN network
        that is used inside the siamese model. Uses parameters from the siamese one shot paper.
        """
        model = Sequential()
        model.add(
            Conv2D(filters=64,
                   kernel_size=(10, 10),
                   input_shape=input_shape,
                   kernel_initializer=self.initialize_weights,
                   kernel_regularizer=l2(2e-4),
                   name='Conv1'
                   ))
        model.add(BatchNormalization())
        model.add(Activation("relu"))
        model.add(MaxPooling2D())

        model.add(
            Conv2D(filters=128,
                   kernel_size=(7, 7),
                   kernel_initializer=self.initialize_weights,
                   bias_initializer=self.initialize_bias,
                   kernel_regularizer=l2(2e-4),
                   name='Conv2'
                   ))
        model.add(BatchNormalization())
        model.add(Activation("relu"))
        model.add(MaxPooling2D())

        model.add(
            Conv2D(filters=128,
                   kernel_size=(4, 4),
                   kernel_initializer=self.initialize_weights,
                   bias_initializer=self.initialize_bias,
                   kernel_regularizer=l2(2e-4),
                   name='Conv3'
                   ))
        model.add(BatchNormalization())
        model.add(Activation("relu"))
        model.add(MaxPooling2D())

        model.add(
            Conv2D(filters=256,
                   kernel_size=(4, 4),
                   kernel_initializer=self.initialize_weights,
                   bias_initializer=self.initialize_bias,
                   kernel_regularizer=l2(2e-4),
                   name='Conv4'
                   ))
        model.add(BatchNormalization())
        model.add(Activation("relu"))

        model.add(Flatten())
        model.add(
            Dense(4096,
                  activation='sigmoid',
                  kernel_initializer=self.initialize_weights,
                  kernel_regularizer=l2(2e-3),
                  bias_initializer=self.initialize_bias))
        return model

    def _load_weights(self, weights_file):
        """
        A function that attempts to load pre-existing weight files for the siamese model. If it succeeds then returns
        True and updates the weights, otherwise False.
        :return True if the weights file already exists
        """
        # self.siamese_net.summary()
        self.load_file = weights_file
        if os.path.exists(weights_file):  # if the file already exists, load it and return True
            print('Loading pre-existed weights file')
            self.siamese_net.load_weights(weights_file)
            return True
        return False

    def fit(self, weights_file, train_path, validation_size, batch_size, epochs, early_stopping, patience, min_delta):
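        """
        Function for fitting the model. Splits the training data into training and validation sets. If a weights
        file already exists it is loaded instead of training; otherwise the model is trained with the given
        parameters and the weights are saved. In both cases the model is then evaluated on the validation set.
        """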
        with open(train_path, 'rb') as f:
            x_train, y_train, names = pickle.load(f)
        """
        X_train[0]:  |----------x_train_0---------------------------|-------x_val_0--------|
        X_train[1]:  |----------x_train_1---------------------------|-------x_val_1--------|
        y_train:     |----------y_train_0 = y_train_1---------------|----y_val_0=y_val_1---|
        """
        x_train_0, x_val_0, y_train_0, y_val_0 = train_test_split(x_train[0], y_train,
                                                                  test_size=validation_size,
                                                                  random_state=self.seed)
        x_train_1, x_val_1, y_train_1, y_val_1 = train_test_split(x_train[1], y_train,
                                                                  test_size=validation_size,
                                                                  random_state=self.seed)
        x_train_0 = np.array(x_train_0, dtype='float64')
        x_val_0 = np.array(x_val_0, dtype='float64')
        x_train_1 = np.array(x_train_1, dtype='float64')
        x_val_1 = np.array(x_val_1, dtype='float64')
        x_train = [x_train_0, x_train_1]
        x_val = [x_val_0, x_val_1]
        # Both splits use the same seed, so their labels must match; fail if either pair differs
        if y_train_0 != y_train_1 or y_val_0 != y_val_1:
            raise ValueError("y train lists or y validation lists are not equal")
        y_train_both = np.array(y_train_0, dtype='float64')
        y_val_both = np.array(y_val_0, dtype='float64')
        if not self._load_weights(weights_file=weights_file):
            print('No such pre-existed weights file')
            print('Beginning to fit the model')
            callback = []
            if early_stopping:
                """
                We used the EarlyStopping function monitoring on the validation loss with a minimum delta of 0.1
                (Minimum change in the monitored quantity to qualify as an improvement, i.e.
                an absolute change of less than min_delta, will count as no improvement.) and patience 5 
                (Number of epochs with no improvement after which training will be stopped.).
                The direction is automatically inferred from the name of the monitored quantity (‘auto’).
                """
                es = EarlyStopping(monitor='val_loss', min_delta=min_delta, patience=patience, mode='auto', verbose=1)
                callback.append(es)
            self.siamese_net.fit(x_train, y_train_both, batch_size=batch_size, epochs=epochs,
                                 validation_data=(x_val, y_val_both), callbacks=callback, verbose=1)
            self.siamese_net.save_weights(self.load_file)
        # evaluate on the validation set
        loss, accuracy = self.siamese_net.evaluate(x_val, y_val_both, batch_size=batch_size)
        print(f'Loss on Validation set: {loss}')
        print(f'Accuracy on Validation set: {accuracy}')

    def evaluate(self, test_file, batch_size, analyze=False):
        """
        Function for evaluating the final model after training.
        test_file - file path to the pickled test file.
        batch_size - the batch size used for evaluation.
        analyze - if True, also print the best and worst classified pairs (see _analyze).

        Returns the loss and accuracy results.
        """
        with open(test_file, 'rb') as f:
            x_test, y_test, names = pickle.load(f)
        print(f'Available Metrics: {self.siamese_net.metrics_names}')
        y_test = np.array(y_test, dtype='float64')
        x_test[0] = np.array(x_test[0], dtype='float64')
        x_test[1] = np.array(x_test[1], dtype='float64')
        # evaluate on the test set
        loss, accuracy = self.siamese_net.evaluate(x_test, y_test, batch_size=batch_size)
        if analyze:
            self._analyze(x_test, y_test, names)
        return loss, accuracy

    def _analyze(self, x_test, y_test, names):
        """
        Function used for evaluating our network in the methods proposed in the assignment.
        We will find:
        - The person who has 2 images that are the most dissimilar to each other
        - The person with the two images that are the most similar to each other
        - Two people with the most dissimilar images, and
        - The two people with the most similar images.
        """
        best_class_0_prob = 1  # correct classification for different people, y=0, prediction->0
        best_class_0_name = None
        worst_class_0_prob = 0  # misclassification for different people, y=0, prediction->1
        worst_class_0_name = None
        best_class_1_prob = 0  # correct classification for same people, y=1, prediction->1
        best_class_1_name = None
        worst_class_1_prob = 1  # misclassification for same people, y=1, prediction->0
        worst_class_1_name = None
        prob = self.siamese_net.predict(x_test)
        for pair_index in range(len(names)):
            name = names[pair_index]
            y_pair = y_test[pair_index]
            pair_prob = prob[pair_index][0]
            if y_pair == 0:  # different people (actual)
                if pair_prob < best_class_0_prob:  # correct classification for different people, y=0, prediction->0
                    best_class_0_prob = pair_prob
                    best_class_0_name = name
                if pair_prob > worst_class_0_prob:  # misclassification for different people, y=0, prediction->1
                    worst_class_0_prob = pair_prob
                    worst_class_0_name = name
            else:  # the same person (actual)
                if pair_prob > best_class_1_prob:  # correct classification for same people, y=1, prediction->1
                    best_class_1_prob = pair_prob
                    best_class_1_name = name
                if pair_prob < worst_class_1_prob:  # misclassification for same people, y=1, prediction->0
                    worst_class_1_prob = pair_prob
                    worst_class_1_name = name

        print(f'correct classification for different people, y=0, prediction->0, name: {best_class_0_name} | prob: {best_class_0_prob}')
        print(f'misclassification for different people, y=0, prediction->1, name: {worst_class_0_name} | prob: {worst_class_0_prob}')
        print(f'correct classification for same people, y=1, prediction->1, name: {best_class_1_name} | prob: {best_class_1_prob}')
        print(f'misclassification for same people, y=1, prediction->0, name: {worst_class_1_name} | prob: {worst_class_1_prob}')


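    def predict_stone_class(self, image, legend_patterns):
        """
        Function for predicting the stone class of an image.
        image - the image to predict the stone class of.
        legend_patterns - ASSUMPTION: a dict mapping each stone class name to its legend image
                          (same shape as image); the original code left this name undefined.

        Returns the predicted stone class.
        """
        # Prepare the litho image for the network (add a batch dimension)
        image = np.array(image, dtype='float64')
        image = image.reshape(1, image.shape[0], image.shape[1], image.shape[2])
        best_class, best_prob = None, -1.0
        # Compare the image with each of the legend images
        for stone_class, pattern in legend_patterns.items():
            pattern = np.array(pattern, dtype='float64')
            pattern = pattern.reshape(1, pattern.shape[0], pattern.shape[1], pattern.shape[2])
            # The siamese network takes the two images of a pair as its two inputs
            prob = self.siamese_net.predict([image, pattern])[0][0]
            # Keep the legend image with the highest probability
            if prob > best_prob:
                best_class, best_prob = stone_class, prob
        return best_class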
    

print("Loaded Siamese Network")


Loaded Siamese Network
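
The best/worst pair search in `_analyze` above can also be written with NumPy indexing. The following is only a minimal sketch under our own assumptions: `extreme_pairs` is a hypothetical helper that is not part of the notebook, and the values passed to it are illustrative, not experimental results. For label 0 the lowest probability is the most confident correct decision and the highest is the strongest misclassification. For label 1 it is the other way around.

```python
import numpy as np


def extreme_pairs(probs, labels, names):
    """Return the lowest- and highest-scoring pair for each label (0 and 1)."""
    probs = np.asarray(probs, dtype='float64').ravel()
    labels = np.asarray(labels).ravel()
    result = {}
    for label in (0, 1):
        idx = np.flatnonzero(labels == label)  # indices of the pairs with this label
        if idx.size == 0:
            continue  # no pairs of this kind in the set
        lowest = idx[np.argmin(probs[idx])]
        highest = idx[np.argmax(probs[idx])]
        result[label] = {
            'lowest': (names[lowest], float(probs[lowest])),
            'highest': (names[highest], float(probs[highest])),
        }
    return result


# Illustrative values only, not results from the experiment
demo = extreme_pairs(
    probs=[0.2, 0.7, 0.9, 0.4],
    labels=[0, 0, 1, 1],
    names=['pair_a', 'pair_b', 'pair_c', 'pair_d'],
)
print(demo[0]['lowest'])   # most confident "different" pair
print(demo[1]['highest'])  # most confident "same" pair
```

Unlike the loop in `_analyze`, this version skips a label that has no pairs instead of reporting `None` for it.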


In [None]:
import os
import random
import time

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.optimizers import Adam

path_separator = os.path.sep
# Environment settings
IS_KAGGLE = False
print("IS_KAGGLE: ", IS_KAGGLE)
LOAD_DATA = True
IS_EXPERIMENT = False
# NOTE: as written, the 'test' split is used for training and the 'train' split for testing
train_name = 'test'
test_name = 'train'
WIDTH = HEIGHT = 124
CEELS = 1
loss_type = "binary_crossentropy"
validation_size = 0.2
early_stopping = True

if IS_KAGGLE:
    # the Kaggle input folder we used
    data_path = os.path.join('..', 'input', 'faces2')
    output_path = './'
else:
    output_path = './data/'
    data_path = './data/'
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'


def run_combination(l, bs, ep, pat, md, seed, train_path, test_path):
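    """
    Runs a single experiment with the given hyper-parameters.
    l - learning rate, bs - batch size, ep - number of epochs,
    pat - early stopping patience, md - early stopping min_delta, seed - random seed.
    :return: loss - loss on the testing set, accuracy - accuracy on the testing set
    """
    # file type of the saved weights
    model_save_type = 'h5'
    # seed every random number generator before building the model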
    initialize_seed(seed)
    parameters_name = f'seed_{seed}_lr_{l}_bs_{bs}_ep_{ep}_val_{validation_size}_' \
                      f'es_{early_stopping}_pa_{pat}_md_{md}'
    print(f'Running combination with {parameters_name}')
    # A path for the weights
    load_weights_folder = os.path.join(output_path, 'weights')
    os.makedirs(load_weights_folder, exist_ok=True)  # also creates output_path if it is missing
    load_weights_path = os.path.join(load_weights_folder, f'weights_{parameters_name}.{model_save_type}')

    siamese = SiameseNetwork(seed=seed, width=WIDTH, height=HEIGHT, cells=CEELS, loss=loss_type, metrics=['accuracy'],
                             optimizer=Adam(learning_rate=l), dropout_rate=0.4)
    siamese.fit(weights_file=load_weights_path, train_path=train_path, validation_size=validation_size,
                batch_size=bs, epochs=ep, early_stopping=early_stopping, patience=pat,
                min_delta=md)
    loss, accuracy = siamese.evaluate(test_file=test_path, batch_size=bs, analyze=True)
    print(f'Loss on Testing set: {loss}')
    print(f'Accuracy on Testing set: {accuracy}')
    # predict_pairs(model)
    return loss, accuracy


def run():
    """
    The main function that runs the training and experiments. Uses the global variables above.
    """
    # file types
    data_set_save_type = 'pickle'
    train_path = os.path.join(output_path, f'{train_name}.{data_set_save_type}')  # A path for the train file
    test_path = os.path.join(output_path, f'{test_name}.{data_set_save_type}')  # A path for the test file
    if LOAD_DATA:  # If the training data already exists
        loader = DataLoader(width=WIDTH, height=HEIGHT, cells=CEELS, data_path=data_path, output_path=train_path)
        loader.load(set_name=train_name)
        loader = DataLoader(width=WIDTH, height=HEIGHT, cells=CEELS, data_path=data_path, output_path=test_path)
        loader.load(set_name=test_name)

    result_path = os.path.join(output_path, 'results.csv')  # A path for the results file
    results = {'lr': [], 'batch_size': [], 'epochs': [], 'patience': [], 'min_delta': [], 'seed': [], 'loss': [],
               'accuracy': []}
    for l in lr:
        for bs in batch_size:
            for ep in epochs:
                for pat in patience:
                    for md in min_delta:
                        for seed in seeds:
                            loss, accuracy = run_combination(l=l, bs=bs, ep=ep, pat=pat, md=md, seed=seed,
                                                             train_path=train_path, test_path=test_path)
                            results['lr'].append(l)
                            results['batch_size'].append(bs)
                            results['epochs'].append(ep)
                            results['patience'].append(pat)
                            results['min_delta'].append(md)
                            results['seed'].append(seed)
                            results['loss'].append(loss)
                            results['accuracy'].append(accuracy)
    df_results = pd.DataFrame.from_dict(results)
    df_results.to_csv(result_path)


def initialize_seed(seed):
    """
    Initialize all relevant environments with the seed.
    """
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)


if __name__ == '__main__':
    if IS_EXPERIMENT:
        # Experiments settings
        seeds = [0]
        lr = [0.00005]
        batch_size = [32]
        epochs = [10]
        patience = [5]
        min_delta = [0.1]
    else:
        # Final settings
        seeds = [0]
        lr = [0.00005]
        batch_size = [32]
        epochs = [10]
        patience = [5]
        min_delta = [0.1]

    print(os.name)
    start_time = time.time()
    print('Starting the experiments')
    run()
    print(f'Total Running Time: {time.time() - start_time}')


IS_KAGGLE:  False
posix
Starting the experiments
./data/test.txt
Loading dataset...


100%|██████████| 14/14 [00:00<00:00, 563.61it/s]


['15_3-4_SANDSTONE', '1', '2']
(124, 124)
(124, 124)
(124, 124)
(124, 124)
(124, 124)
(124, 124)
['15_3-4_CLAY', '1', '2']
(124, 124)
(124, 124)
['15_3-4_COAL_LIGNITE', '1', '5']
(124, 124)
(124, 124)
['15_3-4_LIMESTONE', '0', '1']
(124, 124)
(124, 124)
['15_3-4_MARL', '1', '5']
(124, 124)
(124, 124)
(124, 124)
(124, 124)
(124, 124)
(124, 124)
['15_3-4_SAND', '1', '7']
(124, 124)
(124, 124)
['15_3-4_SAND_Coarse', '6', '7']
(124, 124)
(124, 124)
(124, 124)
(124, 124)
(124, 124)
(124, 124)
['15_3-4_TUFF', '1', '477']
(124, 124)
(124, 124)
Done loading dataset
./data/train.txt
Loading dataset...


  0%|          | 0/14 [00:00<?, ?it/s]

['15_3-4_SANDSTONE', '1', '2']
(124, 124)
(124, 124)


100%|██████████| 14/14 [00:00<00:00, 584.86it/s]

(124, 124)
(124, 124)
(124, 124)
(124, 124)
['15_3-4_CLAY', '1', '2']
(124, 124)
(124, 124)
['15_3-4_COAL_LIGNITE', '1', '5']
(124, 124)
(124, 124)
['15_3-4_LIMESTONE', '0', '1']
(124, 124)
(124, 124)
['15_3-4_MARL', '1', '5']
(124, 124)
(124, 124)
(124, 124)
(124, 124)
(124, 124)
(124, 124)
['15_3-4_SAND', '1', '7']
(124, 124)
(124, 124)
['15_3-4_SAND_Coarse', '6', '7']
(124, 124)
(124, 124)
(124, 124)
(124, 124)
(124, 124)
(124, 124)
['15_3-4_TUFF', '1', '477']
(124, 124)
(124, 124)
Done loading dataset
Running combination with seed_0_lr_5e-05_bs_32_ep_10_val_0.2_es_True_pa_5_md_0.1
No such pre-existed weights file
Beginning to fit the model
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Loss on Validation set: 13.769503593444824
Accuracy on Validation set: 0.6666666865348816
Available Metrics: ['loss', 'accuracy']
correct classification for different people, y=0, prediction->0, name: ['15_3-4_SILTSTONE', '478', '15_3-4_SAND', '7'] | prob: 0.5458135604858398
misclassification for different people, y=0, prediction->1, name: ['15_3-4_ROCK_SALT', '474', '15_3-4_LIMESTONE', '1'] | prob: 0.608163595199585
correct classification for same people, y=1, prediction->1, name: ['15_3-4_TUFF', '1', '477'] | prob: 0.6439122557640076
misclassification for same people, y=1, prediction->0, name: ['15_3-4_COAL_LIGNITE', '1', '5'] | prob: 0.5811665654182434
Loss on Testing set: 13.734379768371582
Accuracy on Testing set: 0.5714285969734192
Total Running Time: 28.366968631744385
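
The four probabilities printed above are the lowest and highest scores for each label, and all of them lie between roughly 0.55 and 0.64. Because even the lowest score of each label is above 0.5, a cut-off of 0.5 (the default threshold of Keras' binary accuracy) would mark every test pair as a match. In that case the reported accuracy would simply equal the share of matching pairs in the test set. Below is a minimal sketch for checking this kind of behaviour. `thresholded_accuracy` is a hypothetical helper that is not part of the notebook, and it is applied here only to the four reported values, not to the full test set.

```python
import numpy as np


def thresholded_accuracy(probs, labels, threshold=0.5):
    """Fraction of pairs whose thresholded probability matches the label."""
    probs = np.asarray(probs, dtype='float64').ravel()
    labels = np.asarray(labels).ravel()
    predictions = (probs > threshold).astype(labels.dtype)
    return float(np.mean(predictions == labels))


# The four extreme probabilities printed above (two per label)
reported_probs = [0.5458135604858398, 0.608163595199585,   # y=0
                  0.6439122557640076, 0.5811665654182434]  # y=1
reported_labels = [0, 0, 1, 1]

# Every one of them is above 0.5, so all four are predicted as "same"
print(thresholded_accuracy(reported_probs, reported_labels))  # 0.5
```

Passing a different `threshold` shows how sensitive the accuracy is to the cut-off when the scores are this close together.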
