# Monitoring models (Tensorboard)

Monitoring is a crucial part of training a model. Continuous monitoring lets us ensure that training is functioning as intended. It can also point to changes that would improve model performance and execution time. Here, we will see how to use TensorBoard to continuously monitor a model, profile it, and visualize various data types such as images and text.

<table align="left">
    <td>
        <a target="_blank" href="https://colab.research.google.com/github/thushv89/manning_tf2_in_action/blob/master/Ch14-Tensorboard/14.1_Tensorboard.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
    </td>
</table>



# Important checks before running this code

## Setting the `TF_GPU_THREAD_MODE` variable

We will change this variable later in the code, and the change is persistent. Therefore, if you run this notebook multiple times, later runs will start with the variable set to a value other than the default. To avoid that,
* Stop the Jupyter notebook server
* Set the environment variable `TF_GPU_THREAD_MODE=global`, which is the default value. To do that, follow the instructions available in [this section](#set_environment) **with `global` as the value instead of `gpu_private`** to undo the changes.
* Restart the Jupyter notebook server

## Installing Model profiling with CUDA
To make sure all the features of TensorBoard work, install the `libcupti` library. It stands for **Lib**rary - **CU**DA **P**rofiling **T**ools **I**nterface. It is a GPU profiling toolkit by NVIDIA, which is required by the TensorBoard profiling dashboard.

### Linux Installation - `libcupti`
On Linux, you can install it with `sudo apt-get install libcupti-dev`.
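As a quick sanity check after installing, the following sketch asks the system's library lookup whether a CUPTI library is visible. It only uses Python's standard `ctypes.util.find_library`; the helper name `find_cupti` is made up for this example, and a negative result is not conclusive because the library may live in a directory that is only added to the search path when TensorFlow runs.

```python
import ctypes.util


def find_cupti():
    """Return the name of a CUPTI shared library if the system lookup finds one, else None."""
    # find_library searches the standard library locations (e.g. the ldconfig cache on Linux)
    return ctypes.util.find_library("cupti")


found = find_cupti()
if found is None:
    print("libcupti was not found on the default library path")
else:
    print("Found CUPTI library: {}".format(found))
```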

### Windows Installation - `libcupti`

Unlike the Linux installation, the Windows installation requires more work.

* Make sure you have installed the required CUDA installation (e.g. CUDA 11 [>= TensorFlow 2.4.0])
* Next, open the NVIDIA Control Panel to make several changes (these were suggested in the following [GitHub issue](https://github.com/tensorflow/tensorflow/issues/35860#issuecomment-603728531)),
  * Make sure you have set the Developer Mode by clicking Desktop > Set Developer Mode
  * Make sure you have enabled GPU profiling for all users and not just the administrator.
* For other errors you might face, refer to the following page on the official [NVIDIA website](https://developer.nvidia.com/nvidia-development-tools-solutions-err-nvgpuctrperm-cupti)
* To install `libcupti` (motivated by this [Stack Overflow question](https://stackoverflow.com/questions/54028188/how-to-install-cuda-profiling-tools-interface-on-windows-10/54029753)),
  * Copy `libcupti_<version>.dll`, `nvperf_host.dll` and `nvperf_target.dll` from the `extras\CUPTI\lib64` to the `bin` folder. Make sure the `libcupti` file has the name, `libcupti_110.dll`.
  * Copy all files in the `extras\CUPTI\lib64` to `lib\x64`
  * Copy all files in the `extras\CUPTI\include` to `include`.

# Importing necessary libraries

In [1]:
import random
import tensorflow as tf
import numpy as np
import tensorflow_datasets as tfds
import shutil
import os
from datetime import datetime

%load_ext tensorboard

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
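    except (RuntimeError, ValueError) as e:
        # Memory growth must be set before the GPUs have been initialized
        print("Couldn't set memory_growth: {}".format(e))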
    
def fix_random_seed(seed):
    """ Setting the random seed of various libraries """
    try:
        np.random.seed(seed)
    except NameError:
        print("Warning: Numpy is not imported. Setting the seed for Numpy failed.")
    try:
        tf.random.set_seed(seed)
    except NameError:
        print("Warning: TensorFlow is not imported. Setting the seed for TensorFlow failed.")
    try:
        random.seed(seed)
    except NameError:
        print("Warning: random module is not imported. Setting the seed for random failed.")

# Fixing the random seed
random_seed=4321
fix_random_seed(random_seed)

log_datetimestamp_format = "%Y%m%d%H%M%S"
print("TensorFlow version: {}".format(tf.__version__))

TensorFlow version: 2.9.3


In [2]:
if os.path.exists('logs'):
    shutil.rmtree('logs')

# Visualizing Image Data on the TensorBoard

First we're going to visualize some image data on the TensorBoard. This is done by logging some sample images to a specific directory, which is monitored by the TensorBoard for any incoming data.
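TensorBoard treats each subdirectory of the log directory that contains event files as a separate run, which is why the cells in this notebook build directory names from a prefix and a timestamp. The sketch below wraps that naming pattern in a small helper; `make_run_dir` is a hypothetical name used only for illustration, and the timestamp format matches `log_datetimestamp_format` defined earlier.

```python
import os
from datetime import datetime


def make_run_dir(root, prefix, now=None, fmt="%Y%m%d%H%M%S"):
    """Build a log directory path of the form <root>/<prefix>_<timestamp> (hypothetical helper)."""
    now = now or datetime.now()
    return os.path.join(root, "{}_{}".format(prefix, now.strftime(fmt)))


# A fixed datetime is used here so that the result is reproducible
print(make_run_dir("logs", "data", now=datetime(2024, 1, 31, 12, 0, 0)))
```

Because every call at a different second yields a new directory, repeated runs show up side by side in TensorBoard instead of overwriting each other.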

## Importing the Fashion-MNIST dataset

In [3]:
# Construct a tf.data.Dataset
fashion_ds = tfds.load('fashion_mnist')

print(fashion_ds)

{'test': <PrefetchDataset element_spec={'image': TensorSpec(shape=(28, 28, 1), dtype=tf.uint8, name=None), 'label': TensorSpec(shape=(), dtype=tf.int64, name=None)}>, 'train': <PrefetchDataset element_spec={'image': TensorSpec(shape=(28, 28, 1), dtype=tf.uint8, name=None), 'label': TensorSpec(shape=(), dtype=tf.int64, name=None)}>}


## Create training/validation/testing data

As we have done before, let's separate the data to training, validation and testing subsets.

In [4]:
# Section 14.1

# Code listing 14.1
def get_train_valid_test_datasets(fashion_ds, batch_size, flatten_images=False):
    
    # Get the training dataset, shuffle it, and output a tuple of (image, label) 
    train_ds = fashion_ds["train"].shuffle(batch_size*20).map(lambda xy: (xy["image"], tf.reshape(xy["label"], [-1])))
    # Get the testing dataset, and output a tuple of (image, label)
    test_ds = fashion_ds["test"].map(lambda xy: (xy["image"], tf.reshape(xy["label"], [-1])))
    
    if flatten_images:
        # Flatten the images to a 1D vector for fully-connected networks
        train_ds = train_ds.map(lambda x,y: (tf.reshape(x, [-1]), y))
        test_ds = test_ds.map(lambda x,y: (tf.reshape(x, [-1]), y))
    
    # Make the validation dataset the first 10000 data
    valid_ds = train_ds.take(10000).batch(batch_size)
    # Make training dataset the rest
    train_ds = train_ds.skip(10000).batch(batch_size).prefetch(tf.data.experimental.AUTOTUNE)
    
    return train_ds, valid_ds, test_ds

## Using `tf.summary` to visualize images on TensorBoard

When logging data to be shown on TensorBoard, it is written through the `tf.summary` API. Since we're working with images here, we'll use the `tf.summary.image` function.

In [5]:
# Section 14.1

# Defining the ID to Label map
id2label_map = {
    0: "T-shirt/top",
    1: "Trouser",
    2:"Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle boot"
}

print("Writing to the tensorboard")

log_datetimestamp = datetime.strftime(datetime.now(), log_datetimestamp_format)
image_logdir = "./logs/data_{}/train".format(log_datetimestamp)

# Define a summary writer
image_writer = tf.summary.create_file_writer(image_logdir)

# Write an image with its category
with image_writer.as_default():
    for data in fashion_ds["train"].batch(1).take(10):
        tf.summary.image(id2label_map[int(data["label"].numpy())], data["image"], max_outputs=20, step=0)

# Write a batch of images at once
with image_writer.as_default():
    for data in fashion_ds["train"].batch(20).take(1):
        tf.summary.image("A training data batch", data["image"], max_outputs=20, step=0)

print('\tDone')

Writing to the tensorboard
	Done


# Spinning up the TensorBoard
 
Here we're using the `%tensorboard` magic command in the Jupyter notebook. This shows TensorBoard inline, as if you had opened it in a browser tab. If you call the same command multiple times with the same `logdir`, it will reuse the same TensorBoard instance. If the directories are different, a new TensorBoard is spun up.

There are times you have to restart the TensorBoard to get a fresh view of the logged data. For that,

On Linux,
* Open a command line terminal and execute `ps -ef|grep tensorboard`. This will give the process ID of TensorBoard
* Execute `kill -9 <TensorBoard process ID>` to kill the process.

On Windows,
* Execute the following two lines in the Jupyter notebook
* `!taskkill /IM "tensorboard.exe" /F`
* `!rmdir /s /q C:\Users\<user name>\AppData\Local\Temp\.tensorboard-info`

**Note**: On Windows, killing the process is not enough to restart TensorBoard. You also have to delete the `C:\Users\<user name>\AppData\Local\Temp\.tensorboard-info` directory.
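The directory clean-up can also be done from Python in a platform-independent way. This is a minimal sketch, assuming that the `.tensorboard-info` directory sits directly under the system temporary directory (as in the Windows path above); `clear_tensorboard_info` is a hypothetical helper name, and the TensorBoard process itself still has to be stopped separately.

```python
import os
import shutil
import tempfile


def clear_tensorboard_info(temp_root=None):
    """Delete the .tensorboard-info directory under temp_root; return True if it existed."""
    # Assumption: TensorBoard keeps its bookkeeping files under the system temp directory
    temp_root = temp_root or tempfile.gettempdir()
    info_dir = os.path.join(temp_root, ".tensorboard-info")
    if os.path.isdir(info_dir):
        shutil.rmtree(info_dir)
        return True
    return False
```

Calling `clear_tensorboard_info()` with no arguments targets the real temporary directory, so it is best run only after the TensorBoard process has been killed.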

In [6]:
%tensorboard --logdir ./logs --port 6006

Reusing TensorBoard on port 6006 (pid 26792), started 0:19:27 ago. (Use '!kill 26792' to kill it.)

---
# Open [Tensorboard](http://localhost:6006) in the browser
---

# Tracking models on TensorBoard

Here we will compare two models:
* a fully-connected model and
* a convolutional neural network.

To compare them we will use the Fashion-MNIST dataset.

## Monitoring the performance of the fully-connected network

Here we analyse the training and validation performance of the fully-connected network. We will track loss and accuracy of the model.

### Fully-connected network

Here we define a fully connected network with 3 layers. 

In [7]:
# Section 14.2

from tensorflow.keras import layers, models


dense_model = models.Sequential([
    layers.Dense(512, activation='relu', input_shape=(784,)),
    layers.Dense(256, activation='relu'),
    layers.Dense(10, activation='softmax')
])

dense_model.compile(loss="sparse_categorical_crossentropy", optimizer='adam', metrics=['accuracy'])


## Training the model

In [8]:
# Section 14.2

log_datetimestamp = datetime.strftime(datetime.now(), log_datetimestamp_format)
dense_log_dir = os.path.join("logs","dense_{}".format(log_datetimestamp))

batch_size = 64
tr_ds, v_ds, ts_ds = get_train_valid_test_datasets(fashion_ds, batch_size=batch_size, flatten_images=True)

# Defining the tensorboard callback, it will log information to the defined log_dir directory
tb_callback = tf.keras.callbacks.TensorBoard(log_dir=dense_log_dir, profile_batch=0)

# Train the model
dense_model.fit(tr_ds, validation_data=v_ds, epochs=10, callbacks=[tb_callback])


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x11b0b97b640>

To view the results of the fully connected model,

---
## Open [Tensorboard](http://localhost:6006) in the browser
---

## Monitoring the performance of the CNN

Now let's define a CNN model, train it and visualize model performance on the TensorBoard

In [9]:
# Section 14.2

import tensorflow.keras.backend as K
K.clear_session()

conv_model = models.Sequential([
    layers.Conv2D(filters=32, kernel_size=(5,5), strides=(2,2), padding='same', activation='relu', input_shape=(28,28,1)),
    layers.Conv2D(filters=16, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu'),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')
])

conv_model.compile(loss="sparse_categorical_crossentropy", optimizer='adam', metrics=['accuracy'])
conv_model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 14, 14, 32)        832       
                                                                 
 conv2d_1 (Conv2D)           (None, 14, 14, 16)        4624      
                                                                 
 flatten (Flatten)           (None, 3136)              0         
                                                                 
 dense (Dense)               (None, 10)                31370     
                                                                 
Total params: 36,826
Trainable params: 36,826
Non-trainable params: 0
_________________________________________________________________


## Training the model

In [10]:
log_datetimestamp = datetime.strftime(datetime.now(), log_datetimestamp_format)
conv_log_dir = os.path.join("logs","conv_{}".format(log_datetimestamp))

In [11]:
batch_size = 64
tr_ds, v_ds, ts_ds = get_train_valid_test_datasets(fashion_ds, batch_size=batch_size, flatten_images=False)

# This TensorBoard callback does the following
# 1. Log loss and accuracy
# 2. Plot weight histograms every two epochs
tb_callback = tf.keras.callbacks.TensorBoard(log_dir=conv_log_dir, histogram_freq=2, profile_batch=0)

conv_model.fit(tr_ds, validation_data=v_ds, epochs=10, callbacks=[tb_callback])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x11b0b93c9a0>

To view the result comparison between the fully connected model and the CNN,

---
## Open [Tensorboard](http://localhost:6006) in the browser
---

# Logging custom metrics to the TensorBoard

Sometimes we need to log custom metrics to TensorBoard to visualize and understand them. Here we train two models, one with and one without batch normalization. Then, to observe the effect of batch normalization on the weight parameters, we will analyze the mean and standard deviation of the absolute weights of the second layer.
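To make the two logged quantities concrete, here is a small stand-alone sketch of the same statistics on a handful of made-up weights. It uses only the standard library; `abs_weight_stats` is a hypothetical helper, and `statistics.pstdev` is chosen because it is the population standard deviation, which is what `np.std` computes by default. Taking absolute values first matters because positive and negative weights would otherwise largely cancel out in the mean.

```python
import statistics


def abs_weight_stats(weights):
    """Return (mean, population std) of the absolute values of a flat list of weights."""
    abs_w = [abs(w) for w in weights]
    return statistics.fmean(abs_w), statistics.pstdev(abs_w)


# Made-up weights: the plain mean would be 0.0, the mean of absolute values is 1.0
mean_w, std_w = abs_weight_stats([-0.5, 0.5, -1.5, 1.5])
print(mean_w, std_w)  # 1.0 0.5
```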

In [12]:
# Section 14.3

from tensorflow.keras import layers, models
import tensorflow.keras.backend as K

K.clear_session()

dense_model = models.Sequential([
    layers.Dense(512, activation='relu', input_shape=(784,)),    
    layers.Dense(256, activation='relu', name='log_layer'),    
    layers.Dense(10, activation='softmax')
])

dense_model.compile(loss="sparse_categorical_crossentropy", optimizer='adam', metrics=['accuracy'])

dense_model_bn = models.Sequential([
    layers.Dense(512, activation='relu', input_shape=(784,)),
    layers.BatchNormalization(),
    layers.Dense(256, activation='relu', name='log_layer_bn'),
    layers.BatchNormalization(),
    layers.Dense(10, activation='softmax')
])

dense_model_bn.compile(loss="sparse_categorical_crossentropy", optimizer='adam', metrics=['accuracy'])

## Training the model

In [13]:
log_datetimestamp = datetime.strftime(datetime.now(), log_datetimestamp_format)
exp_log_dir = os.path.join("logs","weights_exp_{}".format(log_datetimestamp))

In [14]:
# Section 14.3

# Code listing 14.2
def train_model(model, dataset, log_dir, log_layer_name, epochs):    
    
    # Define the writer
    writer = tf.summary.create_file_writer(log_dir)
    
    step = 0
    # Open the writer
    with writer.as_default():        
        # For every epoch
        for e in range(epochs):
            print("Training epoch {}".format(e+1))
            # For every iteration in the epoch (use the dataset passed in, not the global tr_ds)
            for batch in dataset:
                # Train with one batch
                model.train_on_batch(*batch)
                # Get the weights of the layer [0] - weights / [1] - bias
                w = model.get_layer(log_layer_name).get_weights()[0]
                
                # Log mean and std of absolute weights
                tf.summary.scalar("mean_weights", np.mean(np.abs(w)), step=step)
                tf.summary.scalar("std_weights", np.std(np.abs(w)), step=step)
                
                # Flush to the disk from the buffer
                writer.flush()
                
                step += 1
            print('\tDone')
    
    print("Training completed\n")
    
batch_size = 64
tr_ds, _, _ = get_train_valid_test_datasets(fashion_ds, batch_size=batch_size, flatten_images=True)
train_model(dense_model, tr_ds, exp_log_dir + '/standard', "log_layer", 5)

tr_ds, _, _ = get_train_valid_test_datasets(fashion_ds, batch_size=batch_size, flatten_images=True)
train_model(dense_model_bn, tr_ds, exp_log_dir + '/bn', "log_layer_bn", 5)

Training epoch 1
	Done
Training epoch 2
	Done
Training epoch 3
	Done
Training epoch 4
	Done
Training epoch 5
	Done
Training completed

Training epoch 1
	Done
Training epoch 2
	Done
Training epoch 3
	Done
Training epoch 4
	Done
Training epoch 5
	Done
Training completed



Don't forget that you can look at the results in the [TensorBoard](http://localhost:6006)


# Profiling models to detect performance bottlenecks

Here we will profile a convolutional neural network to understand performance bottlenecks and computationally intensive parts of the pipeline. To make the bottlenecks easier to see, we will use a slightly more complex CNN than the one above.

## Download the data

Here we will use a dataset containing images of flowers from [this link](https://www.robots.ox.ac.uk/~vgg/data/flowers/17/17flowers.tgz).

In [5]:
# Section 14.4

# Downloading the data

import os
import requests
import tarfile

import shutil

# Retrieve the data
if not os.path.exists(os.path.join('data', '17flowers.tgz')):
    
    url="https://www.robots.ox.ac.uk/~vgg/data/flowers/17/17flowers.tgz"

    # Get the file from web
    r = requests.get(url)

    if not os.path.exists('data'):
        os.makedirs('data')

    # Write to a file
    with open(os.path.join('data', '17flowers.tgz'), 'wb') as f:
        f.write(r.content)

else:
    print("The tar file already exists.")

if not os.path.exists(os.path.join('data', '17flowers')):
    # Extract the archive (the context manager closes the tar file afterwards)
    with tarfile.open(os.path.join("data","17flowers.tgz")) as tarf:
        tarf.extractall(os.path.join('data', '17flowers'))
else:
    print("The extracted data already exists")

The tar file already exists.
The extracted data already exists


## Define the CNN model

In [6]:
# Code listing 14.3
def get_cnn_model():
    
    conv_model = models.Sequential([
        layers.Conv2D(filters=64, kernel_size=(5,5), strides=(1,1), padding='same', activation='relu', input_shape=(64,64,3)),
        layers.BatchNormalization(),
        layers.MaxPooling2D(pool_size=(3,3), strides=(2,2)),
        layers.Conv2D(filters=128, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu'),
        layers.BatchNormalization(),
        layers.Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu'),
        layers.BatchNormalization(),
        layers.Conv2D(filters=512, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu'),
        layers.BatchNormalization(),
        layers.AveragePooling2D(pool_size=(2,2), strides=(2,2)),
        layers.Flatten(),        
        layers.Dense(512),
        layers.LeakyReLU(),
        layers.LayerNormalization(),                
        layers.Dense(256),
        layers.LeakyReLU(),
        layers.LayerNormalization(),                
        layers.Dense(17),
        layers.Activation('softmax', dtype='float32')
    ])
    return conv_model

In [4]:
# .get avoids a KeyError when the variable has not been set
print(os.environ.get("TF_GPU_THREAD_MODE", "not set (default: global)"))

gpu_private


In [18]:
# Section 14.4

import os
from tensorflow.keras import layers, models
import tensorflow.keras.backend as K
K.clear_session()

profile_log_dir = os.path.join("logs","profile")
    
conv_model = get_cnn_model()

conv_model.compile(loss="sparse_categorical_crossentropy", optimizer='adam', metrics=['accuracy'])
#conv_model.summary()

def get_flower_datasets(image_dir, batch_size, flatten_images=False):

    # List the image files in a deterministic order; shuffling happens after decoding
    dataset = tf.data.Dataset.list_files(os.path.join(image_dir,'*.jpg'), shuffle=False)

    def get_image_and_label(file_path):

        tokens = tf.strings.split(file_path, os.path.sep)        
        label = (tf.strings.to_number(tf.strings.split(tf.strings.split(tokens[-1],'.')[0], '_')[-1])-1)//80

        # load the raw data from the file as a string
        img = tf.io.read_file(file_path)
        img = tf.image.decode_jpeg(img, channels=3)

        return tf.image.resize(img, [64, 64]), label

    dataset = dataset.map(get_image_and_label).shuffle(400)

    # Make the validation dataset the first 250 examples
    valid_ds = dataset.take(250).batch(batch_size)
    # Make training dataset the rest
    train_ds = dataset.skip(250).batch(batch_size)

    return train_ds, valid_ds

batch_size = 32
tr_ds, v_ds = get_flower_datasets(
    os.path.join('data', '17flowers','jpg'), batch_size=batch_size, flatten_images=False
)
    
# This TensorBoard callback does the following
# 1. Log loss and accuracy
# 2. Profile the model memory/time for batches 10 to 20
tb_callback = tf.keras.callbacks.TensorBoard(log_dir=profile_log_dir, profile_batch=[10, 20])

conv_model.fit(tr_ds, validation_data=v_ds, epochs=2, callbacks=[tb_callback])


Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x11b0aee18b0>
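The label computation inside `get_image_and_label` is compact, so here is the same arithmetic in plain Python. This is a sketch under the assumption that the archive's files are named `image_0001.jpg` to `image_1360.jpg` and are numbered class by class with 80 images per class (17 classes in total); `label_from_path` is a hypothetical helper written for this illustration.

```python
import os


def label_from_path(file_path, images_per_class=80):
    """Map e.g. 'image_0081.jpg' to class 1, assuming files are numbered class by class."""
    stem = os.path.basename(file_path).split(".")[0]   # 'image_0081'
    image_number = int(stem.split("_")[-1])            # 81
    return (image_number - 1) // images_per_class      # 0-based class ID


for name in ["image_0001.jpg", "image_0080.jpg", "image_0081.jpg", "image_1360.jpg"]:
    print(name, label_from_path(name))
```

Subtracting 1 before the integer division makes image 80 fall into class 0 and image 81 into class 1, so the last file lands in class 16.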

## Improving the CNN backed up by TensorBoard profiler findings

We are going to introduce the following changes
* Optimize the `tf.data` pipeline by incorporating prefetching and parallelized map functions
* Use mixed precision training
* Use private threads for the GPU to launch kernels

GPUs with a CUDA compute capability of at least 7.0 benefit from mixed precision computations. If your GPU does not meet this, you will see a warning similar to the one below.

```
WARNING:tensorflow:Mixed precision compatibility check (mixed_float16): WARNING
Your GPU may run slowly with dtype policy mixed_float16 because it does not have compute capability of at least 7.0. Your GPU:
  GeForce GTX 960M, compute capability 5.0
See https://developer.nvidia.com/cuda-gpus for a list of GPUs and their compute capabilities.
If you will use compatible GPU(s) not attached to this host, e.g. by running a multi-worker model, you can ignore this warning. This message will only be logged once
```

If you have the capability, you will see something like,

```
INFO:tensorflow:Mixed precision compatibility check (mixed_float16): OK
Your GPU will likely run quickly with dtype policy mixed_float16 as it has compute capability of at least 7.0. Your GPU: GeForce RTX 2070, compute capability 7.5
```
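The threshold mentioned in both messages can also be checked before training. The sketch below separates the comparison (plain Python) from the device query; the helper names are hypothetical, and the query relies on `tf.config.experimental.get_device_details`, which is assumed to report a `compute_capability` tuple for NVIDIA GPUs in the TensorFlow version you are running.

```python
def supports_fast_mixed_precision(compute_capability, minimum=(7, 0)):
    """True if a (major, minor) compute capability meets the 7.0 threshold from the log message."""
    return tuple(compute_capability) >= tuple(minimum)


def gpu_compute_capabilities():
    """Return {device name: (major, minor)} for the visible GPUs (requires TensorFlow)."""
    import tensorflow as tf  # imported lazily so the helper above works without TensorFlow

    capabilities = {}
    for gpu in tf.config.list_physical_devices("GPU"):
        details = tf.config.experimental.get_device_details(gpu)
        # The key may be missing for devices that do not report a compute capability
        if "compute_capability" in details:
            capabilities[gpu.name] = details["compute_capability"]
    return capabilities


print(supports_fast_mixed_precision((5, 0)))  # False (the GTX 960M in the warning above)
print(supports_fast_mixed_precision((7, 5)))  # True (the RTX 2070 in the message above)
```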

<a id="set_environment"></a>

## Setting Environment Variables

To set environment variables you can do the following.

### Linux

Set the environment variable by,
* Opening a terminal 
* Run `export TF_GPU_THREAD_MODE=gpu_private`
* Verify the environment variable is set by calling `echo $TF_GPU_THREAD_MODE`
* Start the Jupyter notebook server from the same shell (a variable set with `export` is only visible to that shell and the processes it launches)

### Windows

Set the environment variable by,
* From the start menu select `Edit the system environment variables`
* Click the button called `environment variables`
* Add a new environment variable `TF_GPU_THREAD_MODE=gpu_private` in the opened dialog
* Open a new command prompt and start the jupyter notebook server

### Conda environment

To set environment variables in a conda environment,
* Activate the conda environment with `conda activate manning.tf2`
* Run `conda env config vars set TF_GPU_THREAD_MODE=gpu_private`
* Deactivate and reactivate the environment, for the variable to take effect
* Start the jupyter notebook server
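If you only want the setting for a single session, a possible alternative is to set the variable from Python. This is a sketch under the assumption that TensorFlow reads `TF_GPU_THREAD_MODE` when it first initializes the GPU, so the assignment would have to run before that point (in practice, at the top of the notebook before `import tensorflow`). `set_gpu_thread_mode` is a hypothetical helper, and the change is limited to the current process.

```python
import os


def set_gpu_thread_mode(mode, environ=None):
    """Set TF_GPU_THREAD_MODE in the given environment mapping; return the previous value."""
    environ = os.environ if environ is None else environ
    previous = environ.get("TF_GPU_THREAD_MODE")
    environ["TF_GPU_THREAD_MODE"] = mode
    return previous


# Demonstrated on a plain dict so that nothing outside this snippet changes
fake_env = {}
print(set_gpu_thread_mode("gpu_private", fake_env))  # None
print(fake_env["TF_GPU_THREAD_MODE"])                # gpu_private
```

Passing no `environ` argument applies the change to `os.environ`, and the returned previous value makes it easy to restore the old setting afterwards.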

In [7]:
opt_profile_log_dir = os.path.join("logs","optimized_profile")

In [8]:
# Section 14.4

from tensorflow.keras import mixed_precision
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_global_policy(policy)

from tensorflow.keras import layers, models
import tensorflow.keras.backend as K
K.clear_session()

conv_model = get_cnn_model()

conv_model.compile(loss="sparse_categorical_crossentropy", optimizer='adam', metrics=['accuracy'])

# Code listing 14.4
def get_flower_datasets(image_dir, batch_size, flatten_images=False):

    # List the image files in a deterministic order; shuffling happens after decoding
    dataset = tf.data.Dataset.list_files(os.path.join(image_dir,'*.jpg'), shuffle=False)

    def get_image_and_label(file_path):

        tokens = tf.strings.split(file_path, os.path.sep)        
        label = (tf.strings.to_number(tf.strings.split(tf.strings.split(tokens[-1],'.')[0], '_')[-1])-1)//80

        # load the raw data from the file as a string
        img = tf.io.read_file(file_path)
        img = tf.image.decode_jpeg(img, channels=3)

        return tf.image.resize(img, [64, 64]), label

    dataset = dataset.map(
        get_image_and_label,
        num_parallel_calls=tf.data.AUTOTUNE
    ).shuffle(400)

    # Make the validation dataset the first 250 examples
    valid_ds = dataset.take(250).batch(batch_size)
    # Make training dataset the rest
    train_ds = dataset.skip(250).batch(batch_size).prefetch(tf.data.experimental.AUTOTUNE)

    return train_ds, valid_ds

batch_size = 32
tr_ds, v_ds = get_flower_datasets(os.path.join('data', '17flowers','jpg'), batch_size=batch_size, flatten_images=False)

# This TensorBoard callback does the following
# 1. Log loss and accuracy
# 2. Profile the model memory/time for batches 10 to 20
tb_callback = tf.keras.callbacks.TensorBoard(log_dir=opt_profile_log_dir, profile_batch=[10, 20])

conv_model.fit(tr_ds, validation_data=v_ds, epochs=2, callbacks=[tb_callback])

# Resetting to float32
policy = mixed_precision.Policy('float32')
mixed_precision.set_global_policy(policy)


Epoch 1/2
Epoch 2/2


## Checking the data types when using mixed precision training

Here we can see how data types automatically change between inputs, variables and outputs when you use mixed precision training.

In [9]:
# Section 14.4

print("Input to the layers have the data type: {}".format(conv_model.get_layer("conv2d_1").input.dtype))
print("Variables in the layers have the data type: {}".format(conv_model.get_layer("conv2d_1").trainable_variables[0].dtype))
print("Output of the layers have the data type: {}".format(conv_model.get_layer("conv2d_1").output.dtype))

Input to the layers have the data type: <dtype: 'float16'>
Variables in the layers have the data type: <dtype: 'float32'>
Output of the layers have the data type: <dtype: 'float16'>


# Visualizing word vectors on TensorBoard

Here, we are going to visualize word vectors on TensorBoard. TensorBoard has a dedicated section to display high-dimensional vectors like word vectors. It internally provides dimensionality reduction mechanisms that map word vectors to a 2D or 3D space, so that we can analyse the data visually.

## Download GloVe word vectors

GloVe word vectors are a freely available set of word vectors produced as a part of [this paper](https://nlp.stanford.edu/pubs/glove.pdf). You can find more information on [this website](https://nlp.stanford.edu/projects/glove/) as well.

In [10]:
# Section 14.5

import os
import requests
import zipfile

if not os.path.exists(os.path.join('data','glove.6B.zip')):
    
    print("Downloading")
    url = "http://nlp.stanford.edu/data/glove.6B.zip"
    # Get the file from web
    r = requests.get(url)

    if not os.path.exists('data'):
        os.mkdir('data')
    
    # Write to a file
    with open(os.path.join('data','glove.6B.zip'), 'wb') as f:
        f.write(r.content)
    print("\tDone")
    
else:
    print("The zip file already exists.")
    
if not os.path.exists(os.path.join('data', 'glove.6B.50d.txt')):
    print("Extracting data")
    with zipfile.ZipFile(os.path.join('data','glove.6B.zip'), 'r') as zip_ref:
        zip_ref.extractall('data')
    print("\tDone")
else:
    print("The extracted data already exists")

Downloading
	Done
Extracting data
	Done


## Getting the most common words in the IMDB movie review dataset

We will use the IMDB movie review dataset for this exercise. It contains movie reviews written by critics for various movies. We will analyse the word vectors of the most common words appearing in this text corpus.

In [11]:
import numpy as np
import pandas as pd

review_ds = tfds.load('imdb_reviews')
train_review_ds = review_ds["train"]

corpus = []
for data in train_review_ds:      
    txt = str(np.char.decode(data["text"].numpy(), encoding='utf-8')).lower()
    corpus.append(str(txt))

Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to ~\tensorflow_datasets\imdb_reviews\plain_text\1.0.0...


Dataset imdb_reviews downloaded and prepared to ~\tensorflow_datasets\imdb_reviews\plain_text\1.0.0. Subsequent calls will reuse this data.


We will use the most common 5000 words as our sample

In [12]:
from collections import Counter

corpus = " ".join(corpus)

cnt = Counter(corpus.split())
print(cnt.most_common(100))

most_common_words = [w for w,_ in cnt.most_common(5000)]

[('the', 322198), ('a', 159953), ('and', 158572), ('of', 144462), ('to', 133967), ('is', 104171), ('in', 90527), ('i', 70480), ('this', 69714), ('that', 66292), ('it', 65505), ('/><br', 50935), ('was', 47024), ('as', 45102), ('for', 42843), ('with', 42729), ('but', 39764), ('on', 31619), ('movie', 30887), ('his', 29059), ('are', 28743), ('not', 28597), ('film', 27777), ('you', 27564), ('have', 27344), ('he', 26177), ('be', 25691), ('at', 22731), ('one', 22480), ('by', 21976), ('an', 21240), ('they', 20624), ('from', 19934), ('all', 19740), ('who', 19407), ('like', 18779), ('so', 18099), ('just', 17309), ('or', 16769), ('has', 16570), ('her', 16540), ('about', 16486), ("it's", 15970), ('some', 15280), ('if', 15189), ('out', 14510), ('what', 14055), ('very', 13633), ('when', 13609), ('more', 13170), ('there', 13094), ('she', 12234), ('would', 12027), ('even', 12010), ('good', 11926), ('my', 11766), ('only', 11566), ('their', 11317), ('no', 11273), ('really', 11065), ('had', 11042), ('whi
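The `/><br` entry in the output above is residue of `<br />` tags in the raw reviews, which a plain whitespace split turns into a token of its own. A minimal sketch of one way to avoid this, assuming that replacing the tags with spaces before counting is acceptable for the analysis; `count_words` and the two sample reviews are made up for illustration.

```python
import re
from collections import Counter


def count_words(reviews):
    """Count lower-cased, whitespace-separated tokens after replacing <br /> tags with spaces."""
    counter = Counter()
    for text in reviews:
        cleaned = re.sub(r"<br\s*/?>", " ", text.lower())
        counter.update(cleaned.split())
    return counter


sample = ["A great film<br /><br />A great cast", "a bad film"]
print(count_words(sample).most_common(1))  # [('a', 3)]
```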

## Read GloVe vectors 

Here we read the GloVe vectors and only keep the vectors corresponding to the most common words we identified above.

In [13]:
df = pd.read_csv(os.path.join('data', 'glove.6B.50d.txt'), header=None, index_col=0, sep=None, error_bad_lines=False, encoding='utf-8')
df.head()

Skipping line 9: field larger than field limit (131072)


| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| the | 0.418 | 0.24968 | -0.41242 | 0.1217 | 0.34527 | -0.044457 | -0.49688 | -0.17862 | -0.00066 | -0.6566 | ... | -0.29871 | -0.15749 | -0.34758 | -0.045637 | -0.44251 | 0.18785 | 0.002785 | -0.18411 | -0.11514 | -0.78581 |
| `,` | 0.013441 | 0.23682 | -0.16899 | 0.40951 | 0.63812 | 0.47709 | -0.42852 | -0.55641 | -0.364 | -0.23938 | ... | -0.080262 | 0.63003 | 0.32111 | -0.46765 | 0.22786 | 0.36034 | -0.37818 | -0.56657 | 0.044691 | 0.30392 |
| `.` | 0.15164 | 0.30177 | -0.16763 | 0.17684 | 0.31719 | 0.33973 | -0.43478 | -0.31086 | -0.44999 | -0.29486 | ... | -6.4e-05 | 0.068987 | 0.087939 | -0.10285 | -0.13931 | 0.22314 | -0.080803 | -0.35652 | 0.016413 | 0.10216 |
| of | 0.70853 | 0.57088 | -0.4716 | 0.18048 | 0.54449 | 0.72603 | 0.18157 | -0.52393 | 0.10381 | -0.17566 | ... | -0.34727 | 0.28483 | 0.075693 | -0.062178 | -0.38988 | 0.22902 | -0.21617 | -0.22562 | -0.093918 | -0.80375 |
| to | 0.68047 | -0.039263 | 0.30186 | -0.17792 | 0.42962 | 0.032246 | -0.41376 | 0.13228 | -0.29847 | -0.085253 | ... | -0.094375 | 0.018324 | 0.21048 | -0.03088 | -0.19722 | 0.082279 | -0.09434 | -0.073297 | -0.064699 | -0.26044 |


In [14]:
print("Full size of Glove: {}".format(df.shape[0]))
df_common = df.loc[df.index.isin(most_common_words)]
print("Size after only considering the most common words: {}".format(df_common.shape))

Full size of Glove: 399694
Size after only considering the most common words: (3595, 50)


## Writing the word vectors in order to be projected on TensorBoard

In [15]:
# Section 14.5

# Code listing 14.5
from tensorboard.plugins import projector

log_dir=os.path.join('logs', 'embeddings')

# Save the word vectors we want to analyse as a variable. The rows are in the
# same order as df_common.index, which is written to the metadata file below.
weights = tf.Variable(df_common.values)
print(weights.shape)
# Create a checkpoint from embedding, the filename and key are
# name of the tensor.
checkpoint = tf.train.Checkpoint(embedding=weights)
checkpoint.save(os.path.join(log_dir, "embedding.ckpt"))

with open(os.path.join(log_dir, 'metadata.tsv'), 'w') as f:
    for w in df_common.index:
        f.write(w+'\n')
        
# Set up config
config = projector.ProjectorConfig()
embedding = config.embeddings.add()
# The name of the tensor will be suffixed by `/.ATTRIBUTES/VARIABLE_VALUE`
#embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
embedding.metadata_path = 'metadata.tsv'
projector.visualize_embeddings(log_dir, config)


(3595, 50)
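The metadata file written above has a single column, so it needs no header row. If you want to attach more than one attribute to each vector, the projector expects tab-separated columns with a header row, and the rows must stay in the same order as the rows of the embedding. The sketch below illustrates both layouts with a hypothetical `write_metadata` helper; the two word counts are taken from the `most_common` output earlier in this notebook.

```python
import os
import tempfile


def write_metadata(path, labels, columns=None):
    """Write projector metadata: one label per line, or tab-separated rows with a header."""
    with open(path, "w", encoding="utf-8") as f:
        if columns is not None:
            # Multi-column metadata starts with a header row
            f.write("\t".join(columns) + "\n")
        for row in labels:
            fields = [row] if isinstance(row, str) else list(row)
            f.write("\t".join(str(x) for x in fields) + "\n")


# Written to a temporary directory so the notebook's own logs stay untouched
out_path = os.path.join(tempfile.mkdtemp(), "metadata.tsv")
write_metadata(out_path, [("the", 322198), ("movie", 30887)], columns=["word", "count"])
with open(out_path, encoding="utf-8") as f:
    print(f.read())
```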


# Highlighting word vectors

There is a section in the word vectors panel where you can search for specific vectors. You can use regex patterns like the one below to search there and highlight specific vectors.

`(?:fred|larry|mrs\.|mr\.|michelle|sea|denzel|beach|comedy|theater|idiotic|sadistic|marvelous|loving|gorg|bus|truck|lugosi)`
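To get a feel for what this pattern highlights, you can try it on a few words in Python first. This is only an approximation, assuming that the search box matches the pattern anywhere in a label the way `re.search` does; the word list is made up for illustration.

```python
import re

pattern = re.compile(
    r"(?:fred|larry|mrs\.|mr\.|michelle|sea|denzel|beach|comedy|theater|"
    r"idiotic|sadistic|marvelous|loving|gorg|bus|truck|lugosi)"
)

words = ["mr.", "beach", "gorgeous", "business", "season", "the", "movie"]
matches = [w for w in words if pattern.search(w)]
print(matches)  # ['mr.', 'beach', 'gorgeous', 'business', 'season']
```

Short alternatives such as `sea` and `bus` also match longer words that contain them, which is worth keeping in mind when reading the highlighted points.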

# Separate TensorBoard for word vectors

We also need a separate TensorBoard service (we will use a different port). When visualizing word vectors, TensorBoard expects to find the data in a very specific folder structure. Since the previous TensorBoard was started on a log directory with a different structure, we'll view the word vectors in a separate TensorBoard.

In [16]:
%tensorboard --logdir logs/embeddings/ --port 6007

---
# Open [Tensorboard for Word Vectors](http://localhost:6007) in the browser
---