## [2022-09-06]
    1. just look inside.

Copyright 2021 The TensorFlow Similarity Authors.

In [None]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Tensorflow Similarity Supervised Learning Visualization - Intermediate Tutorial

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/similarity/blob/master/examples/supervised/supervised_visualization.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/similarity/blob/master/examples/supervised/supervised_visualization.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

This intermediate tutorial demonstrates TensorFlow Similarity's advanced training utilities and visualization capabilities. You will train a similarity model that learns to group images of cats and dogs, using a subset of the [Oxford-IIIT pet dataset](https://www.tensorflow.org/datasets/catalog/oxford_iiit_pet). If you are not yet familiar with TensorFlow Similarity, you might want to check out the [Hello World tutorial](https://github.com/tensorflow/similarity/blob/master/examples/supervised_hello_world.ipynb) first to get familiar with the package.

In this notebook you will learn how to use the:

* `TFDatasetMultiShotMemorySampler()` to directly integrate with the Tensorflow dataset catalog. 
* `EfficientNetSim()` model architecture, which leverages the [EfficientNet](https://keras.io/api/applications/) backbone, data augmentation, and ImageNet pre-trained weights to provide an efficient pre-trained model that we will fine-tune for similarity purposes.
* `EvalCallback()` to track matching classification performance during training.
* `SplitValidationLoss()` callback to separately visualize how performance on the seen and unseen classes evolves during training.
* `projector()` to interactively explore the embedding space of the test examples. This gives a sense of how well the model clusters similar-looking images and which classes are entangled or confused.
* `CircleLoss()`, an efficient loss whose performance is sensitive to its hyperparameters.

In [2]:
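# Output = 1 means automatic mixed precision is enabled
# Output = 0 means automatic mixed precision is disabled
# Note: each `!` line runs in its own subshell, so `export` does not carry over
# to the kernel or to later `!` lines; use %env or os.environ to set variables.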
! export TF_ENABLE_AUTO_MIXED_PRECISION=1
! export TF_ENABLE_AUTO_MIXED_PRECISION_GRAPH_REWRITE=1
! echo $TF_ENABLE_AUTO_MIXED_PRECISION

1


In [3]:
# TF_ENABLE_AUTO_MIXED_PRECISION has no effect; the message is only a warning. To turn the warning off, add this line at the top, before running the Python code.
%env TF_ENABLE_AUTO_MIXED_PRECISION=0

env: TF_ENABLE_AUTO_MIXED_PRECISION=0


## Imports

In [4]:
import os
import random
from time import time

import numpy as np
from IPython.display import Markdown, display
from matplotlib import pyplot as plt

# INFO messages are not printed.
# This must be run before loading other modules.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"

In [5]:
import tensorflow as tf

In [6]:
# # install TF similarity if needed
# try:
#     import tensorflow_similarity as tfsim  # main package
# except ModuleNotFoundError:
#     !pip install tensorflow_similarity
#     import tensorflow_similarity as tfsim

In [7]:
import tensorflow_similarity as tfsim

Your CPU supports instructions that this binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2
For maximum performance, you can install NMSLIB from sources 
pip install --no-binary :all: nmslib


The same message is still shown even after installing NMSLIB with `sudo pip install --no-binary :all: nmslib`.

In [8]:
tfsim.utils.tf_cap_memory()  # Avoid GPU memory blow up

# Data preparation

In this first step, we are going to load the [Oxford-IIIT pet dataset](https://www.tensorflow.org/datasets/catalog/oxford_iiit_pet) directly from the 
TensorFlow dataset catalog. This dataset has 37 classes representing different breeds of cats and dogs with roughly 200 images for each class. 

However, the dataset images are not all the same size, so we need to resize them as part of data loading. In its default configuration, `EfficientNetSim()` expects 224x224 images. Because our augmentation strategy uses a random crop layer, it is important to load images that are slightly larger than the EfficientNet backbone's input size. Hence, the function below resizes (with padding) the images to 300x300. This works well, but you can experiment with other sizes as long as they are above 224.
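The geometry of the padded resize can be worked out by hand. The pure-Python sketch below mirrors our reading of how `tf.image.resize_with_pad` computes the resized size and the zero padding: scale by the limiting ratio, floor the resized size, and put the floor of half the leftover space on the top/left. It uses exact fractions, whereas TensorFlow computes in float32, so results may differ by a pixel in edge cases. `resize_with_pad_geometry` is a hypothetical helper, not a TensorFlow API.

```python
import math
from fractions import Fraction


def resize_with_pad_geometry(height, width, target_height, target_width):
    """Sketch of an aspect-ratio-preserving resize followed by zero padding.

    Returns (resized_h, resized_w, pad_top, pad_bottom, pad_left, pad_right).
    """
    # Scale by the larger ratio so the whole image fits inside the target.
    ratio = max(Fraction(width, target_width), Fraction(height, target_height))
    exact_h = Fraction(height) / ratio
    exact_w = Fraction(width) / ratio
    resized_h, resized_w = math.floor(exact_h), math.floor(exact_w)
    # Half of the leftover space goes on top/left; any odd pixel ends up bottom/right.
    pad_top = max(0, math.floor((target_height - exact_h) / 2))
    pad_left = max(0, math.floor((target_width - exact_w) / 2))
    pad_bottom = target_height - resized_h - pad_top
    pad_right = target_width - resized_w - pad_left
    return resized_h, resized_w, pad_top, pad_bottom, pad_left, pad_right


# A 500x375 (width x height) landscape photo padded to 300x300:
print(resize_with_pad_geometry(375, 500, 300, 300))  # (225, 300, 37, 38, 0, 0)
```

Because the limiting side is scaled to exactly 300 pixels, padding only appears along the other side, so a later 224x224 random crop can include part of the black padding band.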

In [9]:
IMG_SIZE = 300  # @param {type:"integer"}

# Preprocessing function that resizes images so they all have the same shape.
# IMG_SIZE is slightly larger than the EfficientNet-B0 input (224) to allow random crops.
def resize(img, label):
    with tf.device("/cpu:0"):
        img = tf.cast(img, dtype="int32")  # **TODO** change to uint8 to fit the test images in Kaggle.
        # Keep the aspect ratio and zero-pad to IMG_SIZE x IMG_SIZE (returns float32).
        img = tf.image.resize_with_pad(img, IMG_SIZE, IMG_SIZE)
        return img, label

## TFDatasetMultiShotMemorySampler

The following cell loads data directly from the TensorFlow catalog using TensorFlow Similarity's `TFDatasetMultiShotMemorySampler()`.

Using a sampler is required to ensure that each batch contains at least N examples of every class included in that batch. Otherwise, the contrastive loss does not work properly because it can't compute positive distances.
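The batching constraint can be illustrated with a small pure-Python sketch: pick `classes_per_batch` distinct classes, then draw `examples_per_class` examples from each, so every class in the batch has positives to compare against. This is a hypothetical helper for illustration, not how `TFDatasetMultiShotMemorySampler()` is implemented internally; it assumes every class has at least `examples_per_class` examples.

```python
import random
from collections import Counter


def sample_multishot_batch(examples_by_class, classes_per_batch, examples_per_class, rng=None):
    """Return a shuffled list of (example, class_id) pairs for one batch.

    examples_by_class maps each class id to a list of example ids.
    """
    rng = rng or random.Random()
    # Choose distinct classes for this batch.
    classes = rng.sample(sorted(examples_by_class), k=classes_per_batch)
    batch = []
    for cls in classes:
        # Draw without replacement so no example repeats within the batch.
        for example in rng.sample(examples_by_class[cls], k=examples_per_class):
            batch.append((example, cls))
    rng.shuffle(batch)
    return batch


# 37 classes with 20 examples each; 16 classes x 4 examples per batch.
data = {c: [f"img_{c}_{i}" for i in range(20)] for c in range(37)}
batch = sample_multishot_batch(data, classes_per_batch=16, examples_per_class=4, rng=random.Random(0))
print(len(batch))  # 64
print(set(Counter(cls for _, cls in batch).values()))  # {4}
```

The 16 x 4 = 64 batch size matches the "initial batch size is 64" message that the sampler logs for the training split.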

Because similarity models can match data from unseen classes, you can test the model's ability to generalize by training on only a subset of the classes. Feel free to experiment with the ratio of seen and unseen classes by changing the sampler parameters below.

Although the model can be trained on a subset and still classify new, untrained classes, the more classes it sees during training, the better it performs.

In [10]:
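# Number of classes (a random subset) used for training
training_classes = 16  # @param {type:"slider", min:1, max:37}
# Number of examples of the same class in each batch
examples_per_class_per_batch = 4  # @param {type:"integer"}
# Randomly select the training subset of classes
train_cls = random.sample(range(37), k=training_classes)
# Number of classes covered in each batch; by default all training classes
classes_per_batch = max(16, training_classes)
# Matches the EfficientNet-B0 input size
target_img_size = 224  # Size of B0 image input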

print(f"Class IDs seen during training {train_cls}")

def img_augmentation(img_batch, y, *args):
    # Random crop from the IMG_SIZE x IMG_SIZE padded image down to target_img_size x target_img_size.
    img_batch = tf.keras.layers.RandomCrop(target_img_size, target_img_size)(img_batch)
    # random horizontal flip
    img_batch = tf.image.random_flip_left_right(img_batch)
    return img_batch, y


# use the train split for training
train_ds = tfsim.samplers.TFDatasetMultiShotMemorySampler(
    "oxford_iiit_pet",
    splits="train",
    examples_per_class_per_batch=examples_per_class_per_batch,
    classes_per_batch=classes_per_batch,
    preprocess_fn=resize,
    class_list=train_cls,
    augmenter=img_augmentation,
)  # We filter train data to only keep the train classes.

# use the test split for indexing and querying
test_ds = tfsim.samplers.TFDatasetMultiShotMemorySampler(
    "oxford_iiit_pet",
    splits="test",
    total_examples_per_class=20,
    classes_per_batch=classes_per_batch,
    preprocess_fn=resize,
)

Class IDs seen during training [16, 6, 24, 15, 20, 27, 29, 2, 19, 9, 31, 8, 18, 35, 1, 34]



Corrupt JPEG data: 240 extraneous bytes before marker 0xd9
Corrupt JPEG data: premature end of data segment




The initial batch size is 64 (16 classes * 4 examples per class) with 0 augmenters




The initial batch size is 32 (16 classes * 2 examples per class) with 0 augmenters



    Class IDs seen during training [12, 19, 10, 35, 3, 24, 20, 9, 17, 2, 4, 8, 16, 18, 7, 25]
    2022-09-06 10:30:44.170037: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "Not found: Could not locate the credentials file.". Retrieving token from GCE failed with "Failed precondition: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata".
    Downloading and preparing dataset 773.52 MiB (download: 773.52 MiB, generated: 774.69 MiB, total: 1.51 GiB) to     ~/tensorflow_datasets/oxford_iiit_pet/3.2.0...
    It takes a while to connect and download. The data is stored under the lab's current directory, tfsim_embedding/~/tensorflow_datasets/oxford_iiit_pet/3.2.0, rather than under the real home directory.

## class mapping

The following dictionaries map the class ids to the breed's name and the species type (Cat or Dog). These are used later on by the interactive projector.

In [11]:
breeds = {
    0: "Abyssinian",
    1: "American bulldog",
    2: "American pit bull terrier",
    3: "Basset hound",
    4: "Beagle",
    5: "Bengal",
    6: "Birman",
    7: "Bombay",
    8: "Boxer",
    9: "British shorthair",
    10: "Chihuahua",
    11: "Egyptian mau",
    12: "English cocker spaniel",
    13: "English setter",
    14: "German shorthaired",
    15: "Great pyrenees",
    16: "Havanese",
    17: "Japanese chin",
    18: "Keeshond",
    19: "Leonberger",
    20: "Maine coon",
    21: "Miniature pinscher",
    22: "Newfoundland",
    23: "Persian",
    24: "Pomeranian",
    25: "Pug",
    26: "Ragdoll",
    27: "Russian blue",
    28: "Saint bernard",
    29: "Samoyed",
    30: "Scottish terrier",
    31: "Shiba inu",
    32: "Siamese",
    33: "Sphynx",
    34: "Staffordshire bull terrier",
    35: "Wheaten terrier",
    36: "Yorkshire terrier",
}
species = {
    0: "Cat",
    1: "Dog",
    2: "Dog",
    3: "Dog",
    4: "Dog",
    5: "Cat",
    6: "Cat",
    7: "Cat",
    8: "Dog",
    9: "Cat",
    10: "Dog",
    11: "Cat",
    12: "Dog",
    13: "Dog",
    14: "Dog",
    15: "Dog",
    16: "Dog",
    17: "Dog",
    18: "Dog",
    19: "Dog",
    20: "Cat",
    21: "Dog",
    22: "Dog",
    23: "Cat",
    24: "Dog",
    25: "Dog",
    26: "Cat",
    27: "Cat",
    28: "Dog",
    29: "Dog",
    30: "Dog",
    31: "Dog",
    32: "Cat",
    33: "Cat",
    34: "Dog",
    35: "Dog",
    36: "Dog",
}
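A hand-written species table is easy to get wrong. One way to sanity-check it, assuming the dataset's split of 12 cat breeds and 25 dog breeds, is to derive the species from a set of cat breed names and compare the result with the `species` dict. `BREED_NAMES`, `CAT_BREEDS`, and `build_species_map` are illustrative names; the breed list repeats the `breeds` dict in id order.

```python
from collections import Counter

# Breed names in label-id order (0..36), copied from the `breeds` dict.
BREED_NAMES = [
    "Abyssinian", "American bulldog", "American pit bull terrier", "Basset hound",
    "Beagle", "Bengal", "Birman", "Bombay", "Boxer", "British shorthair",
    "Chihuahua", "Egyptian mau", "English cocker spaniel", "English setter",
    "German shorthaired", "Great pyrenees", "Havanese", "Japanese chin",
    "Keeshond", "Leonberger", "Maine coon", "Miniature pinscher", "Newfoundland",
    "Persian", "Pomeranian", "Pug", "Ragdoll", "Russian blue", "Saint bernard",
    "Samoyed", "Scottish terrier", "Shiba inu", "Siamese", "Sphynx",
    "Staffordshire bull terrier", "Wheaten terrier", "Yorkshire terrier",
]
# The 12 cat breeds in Oxford-IIIT Pet; every other breed is a dog.
CAT_BREEDS = {
    "Abyssinian", "Bengal", "Birman", "Bombay", "British shorthair", "Egyptian mau",
    "Maine coon", "Persian", "Ragdoll", "Russian blue", "Siamese", "Sphynx",
}


def build_species_map(breed_names, cat_breeds):
    """Map each label id to "Cat" or "Dog" based on its breed name."""
    return {i: ("Cat" if name in cat_breeds else "Dog") for i, name in enumerate(breed_names)}


derived = build_species_map(BREED_NAMES, CAT_BREEDS)
print(Counter(derived.values()))  # Counter({'Dog': 25, 'Cat': 12})
# In the notebook, compare against the hand-written table, e.g.:
# mismatches = [i for i in derived if derived[i] != species[i]]
```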

# Model Setup

# Callbacks

Most metrics used to evaluate similarity models cannot be computed without indexing embeddings and performing query matching classification. TensorFlow Similarity provides callbacks that make it easy to compute these performance metrics during training.

These callbacks work by taking two disjoint sets of examples:
1. A set of target examples and labels to be indexed. These are the examples returned by the search.
2. A set of query examples and labels that will be used as search queries.

All metrics are then computed by analyzing how many correct matches the search returns. While this process is fast, it is still too slow to run at every training step, so the evaluation is only computed in `on_epoch_end()`.
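The index-then-query evaluation can be sketched with a brute-force search: embed the targets and queries, find each query's nearest target, and count how often the labels agree. This pure-Python sketch reports top-1 match accuracy, a simpler stand-in for the F1 and binary-accuracy metrics that `EvalCallback()` computes; it uses cosine distance, which the training log reports as the automatically selected distance. The helper names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def top1_match_accuracy(query_embs, query_labels, target_embs, target_labels):
    """Fraction of queries whose nearest indexed target shares their label."""
    correct = 0
    for emb, label in zip(query_embs, query_labels):
        # Brute-force search over every indexed target.
        distances = [cosine_distance(emb, target) for target in target_embs]
        nearest = min(range(len(distances)), key=distances.__getitem__)
        correct += int(target_labels[nearest] == label)
    return correct / len(query_labels)


targets = [(1.0, 0.0), (0.0, 1.0)]
target_labels = [0, 1]
queries = [(0.9, 0.1), (0.2, 0.8), (0.7, 0.3)]
query_labels = [0, 1, 1]  # the last query lands closer to class 0, so it is a miss
print(top1_match_accuracy(queries, query_labels, targets, target_labels))  # 0.6666666666666666
```

Scanning every target for every query costs num_queries x num_targets distance computations, which is one reason this kind of evaluation runs once per epoch rather than at every step.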

Additionally, `EvalCallback()` and `SplitValidationLoss()` are TensorBoard-aware: just add a `tb_logdir` path as illustrated below to have your metrics logged. **WARNING** TensorBoard logging requires a `TensorBoard()` callback that uses the same directory.

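### Notes on the embedding layer name
    * Renaming the sim model's metric_embedding_3 layer to "embedding_norm": training works, but the model cannot be saved.
    * Rename the layer to "embedding_norm" only after training the sim model, then save? The original metric_embedding name gets in the way; saving fails!
    
    * Appending a top layer keras.layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=-1), name='embedding_norm') when building the sim model: training aborts because reset_index fails.
    
    * Renaming the sim model's metric_embedding_3 layer to "embedding_norm": training works, and the model can be saved with saved_model.save, then loaded with saved_model.load and used for inference. OK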

In [12]:
num_targets = 200  # @param {type:"integer"}
num_queries = 300  # @param {type:"integer"}
k = 3  # @param {type:"integer"}
log_dir = "logs/%d/" % (time())

# Setup EvalCallback by splitting the test data into targets and queries.
queries_x, queries_y = test_ds.get_slice(0, num_queries)
targets_x, targets_y = test_ds.get_slice(num_queries, num_targets)
tsc = tfsim.callbacks.EvalCallback(
    queries_x,
    queries_y,
    targets_x,
    targets_y,
    metrics=["f1"],
    k=k,
    # tb_logdir=log_dir  # uncomment if you want to track in tensorboard
)

# Setup an EvalCallback for a known and unknown class split.
val_loss = tfsim.callbacks.EvalCallback(
    queries_x,
    queries_y,
    targets_x,
    targets_y,
    metrics=["binary_accuracy"],
    known_classes=tf.constant(train_cls),
    k=k,
    # tb_logdir=log_dir  # uncomment if you want to track in tensorboard
)

# Adding the Tensorboard callback to track metrics in tensorboard.
# tbc = tf.keras.callbacks.TensorBoard(log_dir=log_dir) # uncomment if you want to track in tensorboard

callbacks = [
    val_loss,
    tsc,
    # tbc # uncomment if you want to track in tensorboard
]

## Model training

We are now going to fine-tune an `EfficientNetSim()` model using the `CircleLoss()` function. Because we are fine-tuning the model, we don't need many epochs, especially since the dataset is very small. The small dataset size also means the model will not generalize very well, and the callback metric values will not look impressive. This is why visual inspection is important: in practice, the matches look good and the model is good enough.

To improve performance, you can experiment with:
* changing the `embedding_size`.
* setting the `EfficientNetSim()` `trainable` parameter to 'partial' or 'full' to unfreeze some or all of the backbone layers.
* changing the loss function to `MultiSimilarityLoss()`, `TripletLoss()`, or any other supported loss.
* tweaking the learning rate.
* tweaking the gamma parameter in the `CircleLoss()`.
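To build intuition for the `gamma` knob, here is a pure-Python sketch of the circle loss formulation from Sun et al. (2020) for a single anchor, written on cosine similarities to its positives and negatives. It is not TF Similarity's `CircleLoss()` code, and the library's version may differ in details such as working on distances or its default margin; the `margin=0.25` value here is an illustrative assumption.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def circle_loss(pos_sims, neg_sims, gamma=256.0, margin=0.25):
    """Circle loss for one anchor given its positive and negative cosine similarities."""
    opt_p, opt_n = 1.0 + margin, -margin        # optimum targets for s_p and s_n
    delta_p, delta_n = 1.0 - margin, margin     # decision margins
    # Each similarity gets its own weight: pairs far from their optimum are weighted more.
    pos_logits = [-gamma * max(opt_p - s, 0.0) * (s - delta_p) for s in pos_sims]
    neg_logits = [gamma * max(s - opt_n, 0.0) * (s - delta_n) for s in neg_sims]
    # log(1 + sum_neg exp(.) * sum_pos exp(.)), evaluated in log space.
    return _softplus(_logsumexp(neg_logits) + _logsumexp(pos_logits))


print(circle_loss([0.95], [0.05]) < 1e-6)   # True: well-separated pair
print(round(circle_loss([0.3], [0.7]), 2))  # 218.88: positive and negative are swapped
```

The example shows how `gamma` scales the penalty: well-separated pairs contribute almost nothing, while a badly ordered pair is penalized roughly in proportion to `gamma`.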

Additionally, you can also experiment with replacing the `EfficientNetSim()` with the architecture of your choice. 

In [13]:
train_ds.example_shape

(300, 300, 3)

In [18]:
embedding_size = 64 #128  # @param {type:"integer"}

# building model
model = tfsim.architectures.EfficientNetSim(  # EfficientNet
# base_model = tfsim.architectures.EfficientNetSim(  # EfficientNet
    train_ds.example_shape,  # Must match the size of the EfficientNet version you use
    embedding_size,  # Size of the output embedding
    # variant="B0",  # Which variant of EfficientNet to use. Defaults to "B0".
    # trainable=,  # "full": entire backbone trainable; "partial": only the last 3 blocks; "frozen": not trainable.
    # l2_norm=,
    pooling="gem",  # GeM pooling (GeneralizedMeanPooling2D)
    gem_p=3.0,  # GeM power: higher values increase the contrast between activations in the feature map.
)


# inputs = base_model.input
# output = base_model.output
# output = tf.keras.layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=-1), name='embedding_norm')(output)

# model = tf.keras.Model(inputs=[inputs], outputs=[output])

In [33]:
# Set the last layer's name (via the private `_name` attribute)

for layer in model.layers:
    print(layer._name)
    #layer._name = layer.name + str("_2")

model.layers[-1]._name = "embedding_norm"



input_4
efficientnetb0
gem_pool
embedding_norm


In [34]:
model.summary()

Model: "similarity_model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_4 (InputLayer)         [(None, 300, 300, 3)]     0         
_________________________________________________________________
efficientnetb0 (Functional)  (None, None, None, 1280)  4049571   
_________________________________________________________________
gem_pool (GeneralizedMeanPoo (None, 1280)              0         
_________________________________________________________________
embedding_norm (MetricEmbedd (None, 64)                81984     
Total params: 4,131,555
Trainable params: 81,984
Non-trainable params: 4,049,571
_________________________________________________________________


In [35]:
epochs = 1  # @param {type:"integer"}
LR = 0.0001  # @param {type:"number"}
gamma = 256  # @param {type:"integer"} # Loss hyper-parameter. 256 works well here.
steps_per_epoch = 100
val_steps = 50


# init similarity loss
loss = tfsim.losses.CircleLoss(gamma=gamma)

# compiling and training
model.compile(optimizer=tf.keras.optimizers.Adam(LR), loss=loss)
history = model.fit(
    train_ds,
    epochs=epochs,
    steps_per_epoch=steps_per_epoch,
    validation_data=test_ds,
    validation_steps=val_steps,
    callbacks=callbacks,
)

Distance metric automatically set to cosine use the distance arg to override.
binary_accuracy_known_classes: 0.7857 - binary_accuracy_unknown_classes: 0.7414
f1: 0.8636


In [36]:
model.layers

[<keras.engine.input_layer.InputLayer at 0x7f42547b5670>,
 <keras.engine.functional.Functional at 0x7f4254362fd0>,
 <tensorflow_similarity.layers.GeneralizedMeanPooling2D at 0x7f4284267ee0>,
 <tensorflow_similarity.layers.MetricEmbedding at 0x7f4254443fa0>]

In [37]:
for layer in model.layers:
    print(layer._name)

input_4
efficientnetb0
gem_pool
embedding_norm


## Save Model
    Let's borrow an approach from existing examples.
    
    [todo] output = tf.keras.layers.Dense(64, name='embedding')(output)
            output = tf.keras.layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=-1), name='embedding_norm')(output)
            Should we add this l2_normalize layer as the top layer?

In [38]:
# Set the last layer's name (via the private `_name` attribute)

for layer in model.layers:
    print(layer._name)
    #layer._name = layer.name + str("_2")

model.layers[-1]._name = "embedding_norm"

for layer in model.layers:
    print(layer._name)

input_4
efficientnetb0
gem_pool
embedding_norm
input_4
efficientnetb0
gem_pool
embedding_norm


In [44]:
model_save_dir = 'tfsim-efnet-b0/'

save_options = tf.saved_model.SaveOptions(experimental_io_device='/job:localhost',
    #namespace_whitelist=None, save_debug_info=False, function_aliases=None,
    #experimental_io_device=None, experimental_variable_policy=None,
    #experimental_custom_gradients=True
)


# embedding_norm_model = tf.keras.Sequential([
#     model,
#     tf.keras.layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=-1), name='embedding_norm'),
# ])
# embedding_norm_model.save(
#     model_save_dir, 
#     options=save_options, 
#     include_optimizer=False
# )


tf.saved_model.save(model, model_save_dir, options=save_options)

# model.save(
#     model_save_dir, 
#     options=save_options, 
#     include_optimizer=False
# )


INFO:tensorflow:Assets written to: tfsim-efnet-b0/assets




In [45]:
#Reload Model and Check
load_locally = tf.saved_model.LoadOptions(experimental_io_device='/job:localhost')

loaded_model = tf.saved_model.load(
    model_save_dir, 
    #options=load_locally,
)

# loaded_model = tf.keras.models.load_model(
#     model_save_dir, 
#     options=load_locally,
#     compile=False
# ) 

# loaded_model.summary()

embedding_fn = loaded_model.signatures["serving_default"]




In [46]:
from PIL import Image

image_path = tf.keras.utils.get_file(
    "african_elephant.jpg", "https://i.imgur.com/Bvro0YD.png"
)
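# The saved signature expects the model's input shape (IMG_SIZE x IMG_SIZE x 3); resize other
# images with tf.image.resize_with_pad(image_tensor, IMG_SIZE, IMG_SIZE) first.
image_tensor = tf.convert_to_tensor(
    np.array(Image.open(image_path).convert("RGB")).astype(np.float32)
)
expanded_tensor = tf.expand_dims(image_tensor, axis=0)  # add a batch dimension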
embedding = embedding_fn(expanded_tensor)

print(expanded_tensor.dtype, embedding.keys())
embedding['metric_embedding_1']

<dtype: 'float32'> dict_keys(['metric_embedding_1'])


<tf.Tensor: shape=(1, 64), dtype=float32, numpy=
array([[ 0.11287818, -0.10832503, -0.13796909, -0.03698735,  0.19754942,
        -0.05470496, -0.04839032,  0.04517763,  0.0206751 , -0.00838856,
        -0.14765662, -0.1113561 , -0.14909455,  0.06483404,  0.1168419 ,
         0.2262413 , -0.05476293, -0.08007982,  0.10172811, -0.26772088,
        -0.05191431,  0.06654051,  0.22994415, -0.02983509, -0.15514494,
         0.03623642,  0.04111739, -0.1330535 ,  0.08591234,  0.02032786,
        -0.12149359,  0.02411855,  0.1198781 ,  0.1280244 ,  0.18334332,
        -0.21422732, -0.05180394,  0.12627457,  0.03409532,  0.05326991,
        -0.07292303, -0.06666929,  0.12273896, -0.05778563,  0.10370904,
        -0.00138885,  0.16439037,  0.04329295, -0.01539031, -0.03171251,
         0.0512972 , -0.02723633,  0.01199827, -0.04614533, -0.21564917,
        -0.04500039,  0.3992623 , -0.10061475,  0.0315835 ,  0.00495048,
        -0.16167009,  0.06921558, -0.06107598, -0.31805715]],
      dtype=f

# if add tf.math.l2_normalize [TODO]
    The SavedModel has problems.
    Next, try saving the best model as an .h5 file in Keras model format, then load the best model (.h5) back and add an input layer and an l2_normalize layer after the 64-D output. Only after the model is assembled, save it as a SavedModel directory, then zip it into submission.zip.
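Before adding a `tf.math.l2_normalize` layer, it may be worth checking whether the embedding is already unit-length, because normalizing a unit vector again changes nothing. The pure-Python sketch below follows the documented formula of `tf.math.l2_normalize` (`x / sqrt(max(sum(x**2), epsilon))`). As a rough hand check, the sum of squares of the 64 values printed above comes out at about 1.0, which suggests, though does not prove, that the `MetricEmbedding` output is already L2-normalized.

```python
import math


def l2_normalize(vec, epsilon=1e-12):
    """Scale vec to unit L2 norm; epsilon guards against division by zero."""
    sq_sum = sum(x * x for x in vec)
    inv_norm = 1.0 / math.sqrt(max(sq_sum, epsilon))
    return [x * inv_norm for x in vec]


def l2_norm(vec):
    """Euclidean length of vec."""
    return math.sqrt(sum(x * x for x in vec))


unit = l2_normalize([3.0, 4.0])
print([round(x, 6) for x in unit])  # [0.6, 0.8]
print(round(l2_norm(l2_normalize(unit)), 6))  # 1.0: normalizing again is a no-op
```

In the notebook, the same check could be run on `embedding['metric_embedding_1']`, for example with `tf.norm(..., axis=-1)`; a value close to 1 would mean an extra normalization layer is redundant.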
    

In [47]:
# if add tf.math.l2_normalize [TODO]

# load_options = tf.saved_model.LoadOptions(
#     allow_partial_checkpoint=False, experimental_io_device=None,
#     experimental_skip_checkpoint=False
# )


# awesome_model = tf.keras.Sequential([
#     tf.saved_model.load(model_save_dir),  # options=load_options),
#     tf.keras.layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=-1), name='embedding_norm'),
# ])

# awesome_model = tf.saved_model.load(model_save_dir)
# awesome_model.summary()  # Skipped: tf.saved_model.load returns a generic object, not a Keras model, so summary() is unavailable.

In [48]:
# embedding_norm_model = tf.keras.Sequential([
#     model,
#     tf.keras.layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=-1), name='embedding_norm'),
# ])

In [49]:
from zipfile import ZipFile

dirName = model_save_dir

with ZipFile('submission.zip', 'w') as zipObj:
    # Iterate over all the files in the directory
    for folderName, subfolders, filenames in os.walk(dirName):
        print(folderName, subfolders, filenames)
        for filename in filenames:
            # Create the complete path of the file in the directory
            filePath = os.path.join(folderName, filename)
            # Keep the path relative to the model directory so that saved_model.pb
            # stays at the archive root and the variables/ subfolder is preserved.
            zipObj.write(filePath, arcname=os.path.relpath(filePath, dirName))

            
## https://www.kaggle.com/code/motono0223/guie-clip-tensorflow-train-example            
# save_locally = tf.saved_model.SaveOptions(
#     experimental_io_device='/job:localhost'
# )
# emb_model.save('./embedding_norm_model', options=save_locally)

# from zipfile import ZipFile

# with ZipFile('submission.zip','w') as zip:           
#     zip.write(
#         './embedding_norm_model/saved_model.pb', 
#         arcname='saved_model.pb'
#     ) 
#     zip.write(
#         './embedding_norm_model/variables/variables.data-00000-of-00001', 
#         arcname='variables/variables.data-00000-of-00001'
#     ) 
#     zip.write(
#         './embedding_norm_model/variables/variables.index', 
#         arcname='variables/variables.index'
#     )

tfsim-efnet-b0/ ['assets', 'variables'] ['saved_model.pb']
tfsim-efnet-b0/assets [] []
tfsim-efnet-b0/variables [] ['variables.index', 'variables.data-00000-of-00001']
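Kaggle-style submissions, like the commented reference code above, expect `saved_model.pb` at the archive root and the checkpoint files under `variables/`. Passing only a file's basename as the archive name flattens that layout. The stdlib-only sketch below zips a directory with paths relative to the model directory and returns the member names so the layout can be checked; it runs on placeholder files rather than a real SavedModel, and `zip_saved_model` is a hypothetical helper.

```python
import os
import tempfile
import zipfile


def zip_saved_model(model_dir, zip_path):
    """Zip every file under model_dir, keeping paths relative to model_dir.

    Returns the sorted archive member names so the layout can be inspected.
    """
    with zipfile.ZipFile(zip_path, "w") as zf:
        for folder, _subfolders, filenames in os.walk(model_dir):
            for filename in filenames:
                file_path = os.path.join(folder, filename)
                # e.g. "variables/variables.index" rather than just "variables.index"
                zf.write(file_path, arcname=os.path.relpath(file_path, model_dir))
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(zf.namelist())


# Placeholder files that mimic the SavedModel layout (not a real model).
with tempfile.TemporaryDirectory() as tmp:
    model_dir = os.path.join(tmp, "tfsim-efnet-b0")
    os.makedirs(os.path.join(model_dir, "variables"))
    os.makedirs(os.path.join(model_dir, "assets"))
    for rel_path in ("saved_model.pb",
                     os.path.join("variables", "variables.index"),
                     os.path.join("variables", "variables.data-00000-of-00001")):
        with open(os.path.join(model_dir, rel_path), "wb") as f:
            f.write(b"placeholder")
    names = zip_saved_model(model_dir, os.path.join(tmp, "submission.zip"))

print(names)
# ['saved_model.pb', 'variables/variables.data-00000-of-00001', 'variables/variables.index']
```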


# Metric Plotting

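The following plots show that the model has a hard time generalizing, as the val loss remains mostly flat while the loss decreases. With `epochs = 1`, as configured above, each curve has a single point, so increase `epochs` to see this trend.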

In [None]:
plt.plot(history.history["loss"])
plt.plot(history.history["val_loss"])
plt.legend(["loss", "val_loss"])
plt.title(f"Loss: {loss.name} - LR: {LR}")
plt.show()

Digging deeper, thanks to the known/unknown class split computed by the `val_loss` callback (an `EvalCallback()` with `known_classes`), we can contrast the binary accuracy for seen and unseen classes. This reveals that the model's peak performance happened at epoch 2. As expected, there is a gap between the seen and unseen binary accuracy; however, this gap is fairly small, which indicates that the model generalizes fairly well, even though it was trained on only 16 of the 37 classes (about 43%).

In [None]:
plt.plot(history.history["binary_accuracy_known_classes"])
plt.plot(history.history["binary_accuracy_unknown_classes"])
plt.legend(["binary_accuracy_known", "binary_accuracy_unknown"])
plt.title(f"Known | Unknown binary_accuracy: {loss.name} - LR: {LR}")
plt.show()

# Indexing
We are going to index 360 examples from the test split, which covers all 37 classes,
and use the next 360 examples as unseen queries.

In [None]:
# What gets indexed
index_size = 360
query_size = 360
index_x, index_y = test_ds.get_slice(0, index_size)
index_data = tf.cast(index_x, dtype="int32")  # cast so it can be displayed

# What will be used as never-seen-before queries to test performance
test_x, test_y = test_ds.get_slice(index_size, query_size)
test_y = [int(c) for c in test_y]
test_data = tf.cast(test_x, dtype="int32")  # cast so it can be displayed

In [None]:
model.reset_index()
model.index(index_x, index_y, data=index_data)

## Visual Inspection of the Results

As mentioned earlier, it may be difficult to get a sense of the model quality from the metrics alone. A complementary approach is to manually inspect a set of query results to get a sense of the match quality. While imperfect, the model still returns meaningfully similar results: it finds images of similar-looking animals regardless of their pose or the image illumination.
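Conceptually, `model.lookup(batch, k=...)` returns the k indexed examples closest to each query. The brute-force sketch below shows that retrieval for a single query with cosine distance. TF Similarity delegates the search to an indexing backend (the NMSLIB notice printed at import time hints at this), so this illustrates the idea rather than the library's implementation; `lookup_top_k` is a hypothetical helper.

```python
import heapq
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def lookup_top_k(query, index_embs, index_labels, k=5):
    """Return the k nearest (distance, label) pairs for one query, closest first."""
    scored = (
        (cosine_distance(query, emb), label)
        for emb, label in zip(index_embs, index_labels)
    )
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])


index_embs = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
index_labels = ["Pug", "Sphynx", "Bengal"]
neighbors = lookup_top_k((1.0, 0.1), index_embs, index_labels, k=2)
print([label for _, label in neighbors])  # ['Pug', 'Bengal']
```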

In [None]:
num_examples = 5
num_neighbors = 5
idxs = random.sample(range(len(test_y)), num_examples)
batch = tf.gather(test_x, idxs)
nns = model.lookup(batch, k=num_neighbors)
for bid, nn in zip(idxs, nns):
    # view results close by
    if test_y[bid] in train_cls:
        display(Markdown("**Known Class**"))
    else:
        display(Markdown("**Unknown Class**"))
    tfsim.visualization.viz_neigbors_imgs(test_data[bid], test_y[bid], nn, class_mapping=breeds, cmap="Greys")

## Visualize clusters

One of the best ways to quickly get a sense of how well the model is doing, and to understand its shortcomings, is to project the embeddings into a 2D space. This allows us to inspect clusters of images and understand which classes are entangled.

In [None]:
num_examples_to_clusters = 720  # @param {type:"integer"}
thumb_size = 96  # @param {type:"integer"}
plot_size = 800
vx, vy = test_ds.get_slice(0, num_examples_to_clusters)
tfsim.visualization.projector(
    model.predict(vx), labels=vy, images=vx, class_mapping=breeds, image_size=thumb_size, plot_size=plot_size
)

Thank you for following this tutorial to the end. If you are interested in learning about TensorFlow Similarity's advanced features, you can check out our other notebooks.