# Assignment 2

***
## Question 2: Triplet networks & one-shot learning (10pt)

In practice 4b.4, we trained a Siamese network for a one-shot learning task on the Omniglot dataset. In this assignment, we work on the same dataset and the same task but extend the approach to triplet networks. We also compare model performance under different triplet selection methods. The assignment consists of the following 4 tasks.

### Import packages and mount data
Before anything else, we need to import the packages and mount the data.
*HINT: you could use the dataset in practice 4b.4 directly*

In [0]:
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Conv2D, Lambda, Dense, Flatten, MaxPooling2D, Dropout,Concatenate, BatchNormalization
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import binary_crossentropy
import numpy as np
import os
import pickle
import matplotlib.pyplot as plt
from sklearn.utils import shuffle

from typing import Tuple, Dict, List

In [5]:
PATH = os.path.join("drive", "My Drive", "Università", "Deep Learning", "Practical 4","omniglot")

with open(os.path.join(PATH, "omniglot_train.p"), "rb") as f:
    (X_train, c_train) = pickle.load(f)

with open(os.path.join(PATH, "omniglot_test.p"), "rb") as f:
    (X_test, c_test) = pickle.load(f)

print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
print("")
print("training alphabets")
print([key for key in c_train.keys()])
print("test alphabets:")
print([key for key in c_test.keys()])

X_train shape: (964, 20, 105, 105)
X_test shape: (659, 20, 105, 105)

training alphabets
['Braille', 'Anglo-Saxon_Futhorc', 'Tifinagh', 'Grantha', 'Burmese_(Myanmar)', 'Mkhedruli_(Georgian)', 'Latin', 'Ojibwe_(Canadian_Aboriginal_Syllabics)', 'Balinese', 'Malay_(Jawi_-_Arabic)', 'Early_Aramaic', 'Korean', 'Japanese_(hiragana)', 'Armenian', 'Cyrillic', 'Hebrew', 'Syriac_(Estrangelo)', 'Japanese_(katakana)', 'Blackfoot_(Canadian_Aboriginal_Syllabics)', 'N_Ko', 'Alphabet_of_the_Magi', 'Inuktitut_(Canadian_Aboriginal_Syllabics)', 'Greek', 'Bengali', 'Tagalog', 'Futurama', 'Arcadian', 'Gujarati', 'Asomtavruli_(Georgian)', 'Sanskrit']
test alphabets:
['ULOG', 'Atemayar_Qelisayer', 'Ge_ez', 'Gurmukhi', 'Tengwar', 'Keble', 'Malayalam', 'Oriya', 'Kannada', 'Mongolian', 'Angelic', 'Atlantean', 'Syriac_(Serto)', 'Aurek-Besh', 'Avesta', 'Glagolitic', 'Sylheti', 'Tibetan', 'Manipuri', 'Old_Church_Slavonic_(Cyrillic)']


### Task 2.1: Build the triplet network (3pt)

We will define a triplet network for use with the Omniglot dataset. Each branch of the triplet network is a "convnet" model that transforms data into an embedding space.

*HINT: you may need "Concatenate" from keras.layers to merge the output layers*
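
To illustrate why concatenation is a convenient way to merge the three branches, here is a minimal NumPy sketch (not the Keras model itself). It assumes toy embeddings of size 4 instead of the 4096 used in this notebook, and shows that concatenating along the feature axis and then splitting into three equal parts, as the loss in Task 2.2 does with `tf.split`, recovers each branch's embedding unchanged.

```python
import numpy as np

# Toy stand-ins for the three branch outputs: batch of 2, embedding size 4.
# The size 4 is an assumption for readability; the notebook uses 4096.
batch_size, embedding_dim = 2, 4
anchor_emb = np.full((batch_size, embedding_dim), 1.0)
pos_emb = np.full((batch_size, embedding_dim), 2.0)
neg_emb = np.full((batch_size, embedding_dim), 3.0)

# Analogue of Concatenate() on the feature axis: shape (batch, 3 * embedding_dim)
merged = np.concatenate([anchor_emb, pos_emb, neg_emb], axis=1)

# Analogue of tf.split(network_output, num_or_size_splits=3, axis=1)
a, p, n = np.split(merged, 3, axis=1)

print(merged.shape)  # (2, 12)
```

The order of the list passed to the merge step matters: the loss assumes the first third is the anchor, the second the positive, and the last the negative.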

In [155]:
# define a convnet model that transforms data into an embedding space.
# === COMPLETE CODE BELOW ===

arch_convnet = "practical" # available choices: ["hoffer", "schroff", "ours", "practical"]

if arch_convnet == "hoffer":
    # Hoffer: filter size {5,3,3,2}, and feature map dimensions {3,64,128,256,128}
    convnet = Sequential([
        Conv2D(3, (5, 5), strides=3, activation="relu", padding="same", input_shape=(105, 105, 1)),
        MaxPooling2D(),
        Conv2D(64, (3, 3), activation="relu", padding="same"),
        MaxPooling2D(),
        Conv2D(128, (3, 3), activation="relu"),
        MaxPooling2D(),
        Conv2D(256, (2, 2), activation="relu"),
        MaxPooling2D(),
        Flatten(),
        Dense(4096, activation="sigmoid")
    ])
elif arch_convnet == "schroff":
    convnet = Sequential([
        Conv2D(64, (1, 1), strides=1, activation="relu", input_shape=(105, 105, 1)),
        Conv2D(64, (3, 3), strides=1, activation="relu"),
        BatchNormalization(),
        MaxPooling2D(),
        
        Conv2D(192, (1, 1), strides=1, activation="relu"),
        Conv2D(192, (3, 3), strides=1, activation="relu"),
        MaxPooling2D(),
        
        Conv2D(384, (1, 1), strides=1, activation="relu"),
        Conv2D(384, (3, 3), strides=1, activation="relu"),
        
        Conv2D(256, (1, 1), strides=1, activation="relu"),
        Conv2D(256, (3, 3), strides=1, activation="relu"),
        
        Conv2D(256, (1, 1), strides=1, activation="relu"),
        Conv2D(256, (3, 3), strides=1, activation="relu"),
        
        MaxPooling2D(),
        Flatten(),
        Dense(4096, activation="sigmoid")
    ])
elif arch_convnet == "ours": # 68.4%, 66.4%
    convnet = Sequential([
        Conv2D(64, 10, activation="relu", input_shape=(105, 105, 1), kernel_regularizer=l2(2e-4)),
        MaxPooling2D(),
        BatchNormalization(),
        Dropout(0.25),
        Conv2D(128, 6, activation="relu", kernel_regularizer=l2(2e-4)),
        MaxPooling2D(),
        BatchNormalization(),
        Dropout(0.25),
        Conv2D(256, 4, activation="relu", kernel_regularizer=l2(2e-4)),
        MaxPooling2D(),
        BatchNormalization(),
        Dropout(0.25),
        Conv2D(256, 4, activation="relu", kernel_regularizer=l2(2e-4)),
        BatchNormalization(),
        Dropout(0.25),
        Flatten(),
        Dense(4096, activation="relu", kernel_regularizer=l2(2e-4))
    ])
else: # also arch_convnet == "practical" #72.4%, 77.4%
    convnet = Sequential([
        Conv2D(64, (10,10), activation='relu', input_shape=(105, 105, 1), kernel_regularizer=l2(2e-4)),
        MaxPooling2D(),
        BatchNormalization(),
        Dropout(0.25),
        Conv2D(128, (7,7), activation='relu', kernel_regularizer=l2(2e-4)),
        MaxPooling2D(),
        BatchNormalization(),
        Dropout(0.25),
        Conv2D(128, (4,4), activation='relu', kernel_regularizer=l2(2e-4)),
        MaxPooling2D(),
        BatchNormalization(),
        Dropout(0.25),
        Conv2D(256, (4,4), activation='relu', kernel_regularizer=l2(2e-4)),
        Flatten(),
        BatchNormalization(),
        Dropout(0.25),
        Dense(4096, activation="sigmoid", kernel_regularizer=l2(1e-3))
    ])  

convnet.summary()

Model: "sequential_39"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_220 (Conv2D)          (None, 96, 96, 64)        6464      
_________________________________________________________________
max_pooling2d_125 (MaxPoolin (None, 48, 48, 64)        0         
_________________________________________________________________
batch_normalization_115 (Bat (None, 48, 48, 64)        256       
_________________________________________________________________
dropout_133 (Dropout)        (None, 48, 48, 64)        0         
_________________________________________________________________
conv2d_221 (Conv2D)          (None, 42, 42, 128)       401536    
_________________________________________________________________
max_pooling2d_126 (MaxPoolin (None, 21, 21, 128)       0         
_________________________________________________________________
batch_normalization_116 (Bat (None, 21, 21, 128)     

In [156]:
# define a Triplet network

# The anchor, positive and negative images are stacked together as the input of the triplet network, then split again to get each one's neural codes.
generated = Input(shape=(3, 105, 105, 1), name='input')

anchor  = Lambda(lambda x: x[:,0])(generated)
pos     = Lambda(lambda x: x[:,1])(generated)
neg     = Lambda(lambda x: x[:,2])(generated)
                    

anchor_embedding    = convnet(anchor)
pos_embedding       = convnet(pos)
neg_embedding       = convnet(neg)  

# merge the anchor, positive, negative embedding together, 
# let the merged layer be the output of triplet network

# === COMPLETE CODE BELOW ===
merged_output = Concatenate()([anchor_embedding, pos_embedding, neg_embedding])

triplet_net = Model(inputs=generated, outputs=merged_output)
triplet_net.summary()

Model: "model_16"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input (InputLayer)              [(None, 3, 105, 105, 0                                            
__________________________________________________________________________________________________
lambda_48 (Lambda)              (None, 105, 105, 1)  0           input[0][0]                      
__________________________________________________________________________________________________
lambda_49 (Lambda)              (None, 105, 105, 1)  0           input[0][0]                      
__________________________________________________________________________________________________
lambda_50 (Lambda)              (None, 105, 105, 1)  0           input[0][0]                      
___________________________________________________________________________________________

### Task 2.2: Define triplet loss (2pt)

You can find the formula of the triplet loss function in our lecture notes. When training the model, make sure the network achieves a loss smaller than the margin and does not collapse all representations to zero vectors.

*HINT: If you have trouble achieving this goal, it might help to tune the learning rate; you can also experiment with the margin value to get better performance*
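
For reference, a common form of the triplet loss is the one with squared Euclidean distances used by Schroff et al. It is written below under the assumption that the lecture notes use the same variant; the symbols $f$ (embedding network), $a, p, n$ (anchor, positive, negative images) and $\alpha$ (margin) are notation chosen here.

```latex
\mathcal{L}(a, p, n) = \max\left( \lVert f(a) - f(p) \rVert_2^2 - \lVert f(a) - f(n) \rVert_2^2 + \alpha,\; 0 \right)
```

This form also explains the requirement above: if the network maps every image to the same vector, both distances are zero and the loss equals exactly $\alpha$. A mean loss below the margin therefore indicates that the embeddings have not collapsed.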

In [0]:
# Notice that the ground truth variable is not used for loss calculation. 
# It is used as a function argument to by-pass some Keras functionality.
# This is because the network structure already implies the ground truth for the anchor image with the "positive" image.
import tensorflow as tf
def triplet_loss(ground_truth, network_output):

    anchor, positive, negative = tf.split(network_output, num_or_size_splits=3, axis=1)        
    
    
    # === COMPLETE CODE BELOW ===
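    margin = 0.2 # as specified in Schroff
    # Squared Euclidean distances. Summing squares directly avoids the
    # undefined gradient of tf.linalg.norm when two embeddings coincide.
    anchor_negative = tf.reduce_sum(tf.square(anchor - negative), axis=1)
    anchor_positive = tf.reduce_sum(tf.square(anchor - positive), axis=1)
    loss = tf.maximum(
        anchor_positive - anchor_negative + margin,
        0.0
    )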

    return tf.reduce_mean(loss)

triplet_net.compile(
    optimizer="adam",
    loss=triplet_loss
)
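
As a sanity check on the loss, the following NumPy sketch recomputes it on hand-picked toy embeddings. The function name `triplet_loss_np` and the 2-dimensional embeddings are assumptions made for illustration; the arithmetic mirrors the squared-distance loss with margin 0.2 used in this notebook.

```python
import numpy as np

def triplet_loss_np(anchor, positive, negative, margin=0.2):
    """Mean triplet loss over a batch, using squared Euclidean distances."""
    d_ap = np.sum((anchor - positive) ** 2, axis=1)
    d_an = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(d_ap - d_an + margin, 0.0)))

# Triplet 1: positive is close, negative is far, so the hinge is inactive (loss 0).
# Triplet 2: all three embeddings coincide (collapsed), so the loss equals the margin.
anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[0.1, 0.0], [0.0, 0.0]])
negative = np.array([[1.0, 0.0], [0.0, 0.0]])

loss = triplet_loss_np(anchor, positive, negative)
collapsed_loss = triplet_loss_np(anchor[1:], positive[1:], negative[1:])
print(loss, collapsed_loss)
```

The batch mean is 0.1 (the average of 0 and 0.2), and the fully collapsed triplet alone gives 0.2, which is why a training loss stuck at the margin is a warning sign.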

### Task 2.3: Select triplets for training (3pt)

#### Different selection methods

We have two options for the triplet selection method, and we will compare model performance under both after building the model.

(1) Random triplet selection, which consists of the following steps:
* Pick one random class for the anchor
* Pick two different random pictures from this class as the anchor and positive images
* Pick another class for the negative, different from anchor_class
* Pick one random picture from the negative class.

(2) Hard triplet selection. For ease of implementation, for a picked (anchor, positive) pair we choose the hardest negative to form a hard triplet. That is, after picking the anchor and positive images, we choose the image from the negative class that is nearest to the anchor image, i.e. the one for which "- d(a,n)" attains its maximum value. The whole process consists of the following steps:
* Pick one random class for the anchor
* Pick two different random pictures from this class as the anchor and positive images
* Pick another class for the negative, different from anchor_class
* Pick the hardest picture from the negative class.

*HINT: when picking the hardest negative, you may need model.predict to get the embeddings of the images, then calculate the distances*
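
The hardest-negative step can be sketched on precomputed embeddings as follows. This is a minimal illustration, assuming the embeddings are already available as NumPy arrays; `hardest_negative_index` is a hypothetical helper name, and the 2-dimensional vectors are toy values.

```python
import numpy as np

def hardest_negative_index(anchor_embedding, negative_embeddings):
    """Return the index of the negative closest to the anchor.

    anchor_embedding: shape (d,)
    negative_embeddings: shape (n_candidates, d)
    """
    # Squared distances give the same argmin as Euclidean distances.
    squared_distances = np.sum((negative_embeddings - anchor_embedding) ** 2, axis=1)
    return int(np.argmin(squared_distances))

anchor_embedding = np.array([0.0, 0.0])
negative_embeddings = np.array([
    [3.0, 4.0],  # squared distance 25
    [1.0, 0.0],  # squared distance 1, the hardest negative
    [0.0, 2.0],  # squared distance 4
])

idx = hardest_negative_index(anchor_embedding, negative_embeddings)
print(idx)  # 1
```

Minimising d(a,n) is the same as maximising "- d(a,n)", so the closest negative is the one that makes the triplet loss largest for a fixed anchor and positive.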

In [0]:
# Notice that the returned 1 * np.zeros(batch_size) is there to by-pass some Keras functionality; it corresponds to ground_truth in triplet_loss
# We use the variable hard_selection to control which method is used. If hard_selection == False, triplets are selected randomly; if hard_selection == True, hard triplets are selected.

# === COMPLETE CODE BELOW === 
def get_batch(
    X: np.ndarray,
    batch_size: int = 64,
    hard_selection: bool = False,
    convnet: Model = None,
) -> Tuple[np.ndarray, np.ndarray]:
    
    while True:

        n_classes, n_examples, w, h = X.shape
        # initialize result
        triplets = np.zeros((batch_size, 3, w, h, 1))
        for i in range(batch_size):
            #Pick one random class for anchor
            anchor_class = np.random.randint(0, n_classes)

            #Pick two different random pics for this class => idx_A and idx_P
            [idx_A,idx_P] = np.random.choice(n_examples,size=2,replace=False)

            #Pick another class for negative, different from anchor_class
            # === COMPLETE CODE BELOW === 
            negative_class = anchor_class
            while negative_class == anchor_class:
                negative_class = np.random.randint(0, n_classes)

            if not hard_selection:
                #Pick a random pic from this negative class => N

                # === COMPLETE CODE BELOW ===   
                idx_N = np.random.randint(0, n_examples)

            else:
                #Pick a hardest pic from this negative class => N
                # === COMPLETE CODE BELOW ===   
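                # Embed all candidates of the negative class and the anchor in inference mode.
                # Use X (the array passed in) rather than the global X_train, and add the channel axis explicitly.
                negative_embeddings = convnet(X[negative_class].reshape(n_examples, w, h, 1), training=False)
                anchor_embedding = convnet(X[anchor_class][[idx_A]].reshape(1, w, h, 1), training=False)
                
                anchor_negative_distance = tf.reduce_sum((anchor_embedding - negative_embeddings) ** 2, axis=1)
                
                # The hardest negative is the candidate closest to the anchor
                idx_N = int(tf.argmin(anchor_negative_distance))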

            triplets[i][0] = X[anchor_class][idx_A].reshape(w, h, 1)
            triplets[i][1] = X[anchor_class][idx_P].reshape(w, h, 1)
            triplets[i][2] = X[negative_class][idx_N].reshape(w, h, 1)

        yield triplets, 1 * np.zeros(batch_size)

### Task 2.4: One-shot learning with different selection method (2pt)

The function "make_oneshot_task" randomly sets up a one-shot task from a given test set (if a language is specified, using only classes/characters from that language), i.e. it generates N pairs of images, where the first image is always the test image and the second image is one of the N reference images. The pair of images from the same class has target 1; all other targets are 0.

The function "test_oneshot" generates a number (k) of such one-shot tasks and evaluates the performance of a given model on these tasks; it reports the percentage of correctly classified test images.

In "test_oneshot", you can use embeddings extracted from the triplet network with L2-distance to evaluate one-shot learning, i.e. for a given one-shot task, obtain embeddings for the test image as well as the support set. Then pick the image from the support set that is closest (in L2-distance) to the test image as your one-shot prediction.

*HINT: you can re-use some code from practice 4b.4*
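
The prediction rule described above can be sketched on toy embeddings as follows. This is only an illustration under assumed values: `predict_oneshot` is a hypothetical helper, and the 2-dimensional embeddings and the 3-way task stand in for the 4096-dimensional embeddings and 20-way tasks of the assignment.

```python
import numpy as np

def predict_oneshot(test_embedding, support_embeddings):
    """Return the index of the support image closest to the test image in L2 distance."""
    distances = np.linalg.norm(support_embeddings - test_embedding, axis=1)
    return int(np.argmin(distances))

# A toy 3-way task: the second support image belongs to the test image's class.
test_embedding = np.array([1.0, 1.0])
support_embeddings = np.array([
    [5.0, 5.0],
    [1.0, 2.0],   # distance 1.0, the nearest support image
    [-1.0, -1.0],
])
targets = np.array([0, 1, 0])

prediction = predict_oneshot(test_embedding, support_embeddings)
is_correct = prediction == int(np.argmax(targets))
print(prediction, is_correct)
```

Repeating this over k tasks and averaging `is_correct` gives the accuracy that "test_oneshot" reports as a percentage.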

In [0]:
def make_oneshot_task(N, X, c, language=None):
    """Create pairs of (test image, support set image) with ground truth, for testing N-way one-shot learning."""
    n_classes, n_examples, w, h = X.shape
    indices = np.random.randint(0, n_examples, size=(N,))
    if language is not None:
        low, high = c[language]
        if N > high - low:
            raise ValueError("This language ({}) has less than {} letters".format(language, N))
        categories = np.random.choice(range(low,high), size=(N,), replace=False)
    else:  # if no language specified just pick a bunch of random letters
        categories = np.random.choice(range(n_classes), size=(N,), replace=False)            
    true_category = categories[0]
    ex1, ex2 = np.random.choice(n_examples, replace=False, size=(2,))
    test_image = np.asarray([X[true_category, ex1, :, :]]*N).reshape(N, w, h, 1)
    support_set = X[categories, indices, :, :]
    support_set[0, :, :] = X[true_category, ex2]
    support_set = support_set.reshape(N, w, h, 1)
    targets = np.zeros((N,))
    targets[0] = 1
    targets, test_image, support_set = shuffle(targets, test_image, support_set)
    pairs = [test_image, support_set]
    return pairs, targets

In [0]:
def test_oneshot(
    model: Model,
    X: np.ndarray,
    N: int,
    k: int,
    c: Dict[str, List[int]],
    verbose: bool = True
):
    # === COMPLETE CODE BELOW ===       
    if verbose:
        print(f"Evaluating model on {k} random {N}-way one-shot learning tasks...")
    
    n_correct = 0
    for i in range(k):
        inputs, targets = make_oneshot_task(N, X, c)
        test_embedding = model.predict(inputs[0][[0]]) # All first images in inputs are the same, so we can calculate just one
        support_embeddings = model.predict(inputs[1])
        
        # Squared Euclidean distance between embeddings (same argmin as the L2 distance)
        squared_l2_distances = np.sum((test_embedding - support_embeddings) ** 2, axis=1)
        if np.argmin(squared_l2_distances) == np.argmax(targets):
            n_correct += 1
    
    percent_correct = 100 * n_correct / k
    
    if verbose:
        print(f"Average accuracy of {percent_correct}% for {N}-way one-shot learning.")

    return percent_correct

We will train our model with each triplet selection method (random and hard) and evaluate it by one-shot learning accuracy.

* You need to explicitly state the accuracy under each triplet selection method
* When evaluating the model with the test_oneshot function, evaluate on the 20-way one-shot task and set the number (k) of evaluation one-shot tasks to 250, then calculate the average accuracy

*HINT: After training the model with the random selection method and before training it with hard triplet selection, we should re-build the model (re-run the cells in Task 2.1) to re-initialize it and avoid re-using the model trained with random selection*

#### Evaluate one-shot learning with random triplet selection

In [161]:
# hard_selection == False, select triplets randomly
# Train our model and evaluate the model by one-shot learning accuracy.
loops = 10
best_acc = 0
for i in range(loops):
    print("=== Training loop {} ===".format(i+1))
    # === ADD CODE HERE ===
    triplet_net.fit(
        get_batch(X_train, batch_size=64, hard_selection=False), steps_per_epoch=100, epochs=1
    )
    avg_accuracy = test_oneshot(convnet, X_test, 20, 250, c_test)
    best_acc = avg_accuracy if avg_accuracy > best_acc else best_acc
print(f"Best accuracy: {best_acc}%")

=== Training loop 1 ===
Evaluating model on 250 random 20-way one-shot learning tasks...
Average accuracy of 45.2% for 20-way one-shot learning.
=== Training loop 2 ===
Evaluating model on 250 random 20-way one-shot learning tasks...
Average accuracy of 66.4% for 20-way one-shot learning.
=== Training loop 3 ===
Evaluating model on 250 random 20-way one-shot learning tasks...
Average accuracy of 68.4% for 20-way one-shot learning.
=== Training loop 4 ===
Evaluating model on 250 random 20-way one-shot learning tasks...
Average accuracy of 66.0% for 20-way one-shot learning.
=== Training loop 5 ===
Evaluating model on 250 random 20-way one-shot learning tasks...
Average accuracy of 64.0% for 20-way one-shot learning.
=== Training loop 6 ===
Evaluating model on 250 random 20-way one-shot learning tasks...
Average accuracy of 70.8% for 20-way one-shot learning.
=== Training loop 7 ===
Evaluating model on 250 random 20-way one-shot learning tasks...
Average accuracy of 66.8% for 20-way one-

#### Evaluate one-shot learning with hard triplet selection

In [153]:
# hard_selection == True, select hard triplets
# Train our model and evaluate the model by one-shot learning accuracy.
loops = 10
best_acc = 0
for i in range(loops):
    print("=== Training loop {} ===".format(i+1))
    # === ADD CODE HERE ===
    triplet_net.fit(
        get_batch(X_train, batch_size=64, hard_selection=True, convnet=convnet), steps_per_epoch=100, epochs=1
    )
    avg_accuracy = test_oneshot(convnet, X_test, 20, 250, c_test)
    best_acc = avg_accuracy if avg_accuracy > best_acc else best_acc

print(f"Best accuracy: {best_acc}%")

=== Training loop 1 ===
Evaluating model on 250 random 20-way one-shot learning tasks...
Average accuracy of 58.4% for 20-way one-shot learning.
=== Training loop 2 ===
Evaluating model on 250 random 20-way one-shot learning tasks...
Average accuracy of 69.2% for 20-way one-shot learning.
=== Training loop 3 ===
Evaluating model on 250 random 20-way one-shot learning tasks...
Average accuracy of 65.2% for 20-way one-shot learning.
=== Training loop 4 ===
Evaluating model on 250 random 20-way one-shot learning tasks...
Average accuracy of 68.4% for 20-way one-shot learning.
=== Training loop 5 ===
Evaluating model on 250 random 20-way one-shot learning tasks...
Average accuracy of 76.4% for 20-way one-shot learning.
=== Training loop 6 ===
Evaluating model on 250 random 20-way one-shot learning tasks...
Average accuracy of 74.4% for 20-way one-shot learning.
=== Training loop 7 ===
Evaluating model on 250 random 20-way one-shot learning tasks...
Average accuracy of 75.2% for 20-way one-