# Trigger word detection using GRU

This notebook is based on the programming assignment "Trigger Word Detection" of deeplearning.ai, course Sequence models, week Sequence models and Attention mechanism. The figures are also taken from that assignment.

We will implement a network architecture using GRUs for trigger word detection. Trigger word detection is the technology that allows devices like Amazon Alexa, Google Home, Apple Siri to wake up upon hearing a certain word.  


## Learning objectives

- Application of Gated Rrecurrent Units (GRU) in TensorFlow
- Understand GRU architecture difference between Keras and TesnorFlow
- Apply Batch Normalization
- Use Adam optimizer with decay rate
- Convert hd5 file format to TensorFlow ckpt file
- Load Keras pre-trained weights to TensorFlow model

In [1]:
import numpy as np
import random
import sys
import io
import os
import glob
import IPython
%matplotlib inline

import tensorflow as tf
print(tf.__version__)

1.14.0


## Training set


In [2]:
# Load preprocessed training examples
X = np.load("./XY_train/X.npy")
Y = np.load("./XY_train/Y.npy")

print(X.shape)
print(Y.shape)

(26, 5511, 101)
(26, 1375, 1)


## Development set


In [3]:
# Load preprocessed dev set examples
X_dev = np.load("./XY_dev/X_dev.npy")
Y_dev = np.load("./XY_dev/Y_dev.npy")

print(X_dev.shape)
print(Y_dev.shape)

(25, 5511, 101)
(25, 1375, 1)


## Convert kernel and recurrent_kernel from keras to gate_kernel and candidate_kernel in TensorFlow

We split the kernel and recurrent_kernel from keras into the z (update gate), r (reset gate) and h (hidden layer). From the keras documentation self.kernel has size (inputs, units\*3) and self.recurrent_kernel has size (units, units\*3). 

From self.kernel we get 3 kernels with size (input, units) i.e. self.kernel_z=self.kernel(input,:units), self.kernel_r=self.kernel(input,units:units\*2) and self.kernel_h=self.kernel(input,units\*2:)

From self.recurrent_kernel we get 3 kernels with size (units, units) i.e. self.recurrent_kernel_z=self.recurrent_kernel(units,:units), self.recurrent_kernel_r=self.recurrent_kernel(units,units:units\*2) and self.recurrent_kernel_h=self.recurrent_kernel(units,units\*2:)

From the TensorFlow documentation gate_kernel has size (input+units, 2\*units), gate_bias has size 2\*units, candidate_kernel has size (input+units,units) and candidate_bias has size (units)

From the TensorFlow documentation

gate_inputs = math_ops.matmul(array_ops.concat([inputs, state], 1), self._gate_kernel)

gate_inputs = nn_ops.bias_add(gate_inputs, self._gate_bias)

value = math_ops.sigmoid(gate_inputs)

r, u = array_ops.split(value=value, num_or_size_splits=2, axis=1)

Since gate_inputs = math_ops.matmul(array_ops.concat([inputs, state], 1), self._gate_kernel) that means self._gate_kernel)  should be concatenation of kernel and recurrent_kernel with this order. 

Since r, u = array_ops.split(value=value, num_or_size_splits=2, axis=1) that means that self._gate_kernel includes gates r and z with r first and z second (opposite from keras)

Similarly

candidate = math_ops.matmul(array_ops.concat([inputs, r_state], 1), self._candidate_kernel)
candidate = nn_ops.bias_add(candidate, self._candidate_bias)

where self._candidate_kernel corresponds to candidate hidden state

In [4]:
# Convert kernel and recurrent_kernel from keras to gate_kernel and candidate_kernel in TensorFlow

def convert_kernel(kernel):
    kernel_z, kernel_r, kernel_h = np.hsplit(kernel, 3)
    return kernel_z, kernel_r, kernel_h

def gate_and_candidate_kernel(kernel,recurrent_kernel):
    kernel_z, kernel_r, kernel_h=convert_kernel(kernel[0])
    recurrent_kernel_z, recurrent_kernel_r, recurrent_kernel_h=convert_kernel(recurrent_kernel[0])

    r_concat=np.concatenate([kernel_r,recurrent_kernel_r],axis=0)
    z_concat=np.concatenate([kernel_z,recurrent_kernel_z],axis=0)
    gate_kernel=np.concatenate([r_concat,z_concat],axis=1)

    candidate_kernel=np.concatenate([kernel_h,recurrent_kernel_h],axis=0)

    return gate_kernel, candidate_kernel

def convert_bias(bias):
    bias = bias.reshape(3, -1) 
    dim2 = bias.shape[1]
    bias = bias[[1, 0, 2], :].reshape(-1)
    bias_1 = bias[:dim2*2]
    bias_2 = bias[dim2*2:]
    return bias_1, bias_2

In [5]:
def get_batches(x, y, batch_size):
    n_batches = len(x)//batch_size
    x, y = x[:n_batches*batch_size], y[:n_batches*batch_size]
    for ii in range(0, len(x), batch_size):
        yield x[ii:ii+batch_size], y[ii:ii+batch_size]

In [6]:
def build_inputs(Tx, n_freq, Ty):
    
    inputs_ = tf.placeholder(tf.float32,[None,Tx,n_freq],name='inputs_')
    targets_ = tf.placeholder(tf.float32,[None,Ty,1], name='targets_')
    training = tf.placeholder_with_default(False, shape=(), name='training')
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    
    return inputs_, targets_, training, keep_prob

In [8]:
def build_gru(gru_size, gru_layers, batch_size, keep_prob):
    
    # Your basic GRU cell
    gru = tf.contrib.rnn.GRUCell(gru_size) 
    
    # Add dropout to the cell
    drop = tf.contrib.rnn.DropoutWrapper(gru, output_keep_prob=keep_prob)
    
    # Stack up multiple GRU layers, for deep learning
    cell = tf.contrib.rnn.MultiRNNCell([drop for _ in range(gru_layers)])
    
    # Getting an initial state of all zeros
    initial_state = cell.zero_state(batch_size, tf.float32)

    return cell, initial_state

In [9]:
def build_loss(logits, targets):
        
    loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=targets))

    return loss

In [10]:
def build_optimizer(loss, learning_rate, decay_rate):
    
    global_step = tf.Variable(0, trainable=False)
    decay_steps = 1.0
    learning_rate = tf.train.inverse_time_decay(learning_rate, global_step, decay_steps, decay_rate)
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step)
    
    return optimizer, learning_rate

## Network architecture
The model will use 1-D convolutional layers, GRU layers, and dense layers.
<img src="images/model_.png" style="width:600px;height:600px;">

One key step of this model is the 1D convolutional step (near the bottom of Figure). It inputs the 5511 step spectrogram, and outputs a 1375 step output, which is then further processed by multiple layers to get the final $T_y = 1375$ step output. This layer plays a role similar to the 2D convolutions, of extracting low-level features and then possibly generating an output of a smaller dimension. 

Computationally, the 1-D conv layer also helps speed up the model because now the GRU  has to process only 1375 timesteps rather than 5511 timesteps. The two GRU layers read the sequence of inputs from left to right, then ultimately uses a dense+sigmoid layer to make a prediction for $y^{\langle t \rangle}$. Because $y$ is binary valued (0 or 1), we use a sigmoid output at the last layer to estimate the chance of the output being 1, corresponding to the user having just said "activate."

Note that we use a uni-directional RNN rather than a bi-directional RNN. This is really important for trigger word detection, since we want to be able to detect the trigger word almost immediately after it is said. If we used a bi-directional RNN, we would have to wait for the whole 10 sec of audio to be recorded before we could tell if "activate" was said in the first second of the audio clip.  

In [11]:
class trigger_model:
    
    def __init__(self, batch_size, Tx, n_freq, Ty, gru_size, gru_layers, learning_rate, decay_rate):
    
        tf.reset_default_graph()
        
        # Build the input placeholder tensors
        self.inputs_, self.targets_, self.training, self.keep_prob = build_inputs(Tx, n_freq, Ty)
        
        # CONV layer
        self.conv1d_output=tf.layers.conv1d(self.inputs_,filters=256,kernel_size=15,strides=4)
        self.batchNorm_ = tf.layers.batch_normalization(self.conv1d_output, training=self.training)
        self.batchNorm_act = tf.nn.relu(self.batchNorm_)
        self.dropout_ = tf.nn.dropout(self.batchNorm_act,rate=1-self.keep_prob)

         # First GRU Layer
        with tf.variable_scope('GRU_1'):
            gru_cell_1, self.initial_state_gru_cell_1 = build_gru(gru_size, gru_layers, batch_size, self.keep_prob)
            self.outputs_gru_cell_1, state_1 = tf.nn.dynamic_rnn(gru_cell_1, self.dropout_, initial_state=self.initial_state_gru_cell_1)
            self.final_state_gru_cell_1 = state_1
            self.batchNorm_1 = tf.layers.batch_normalization(self.outputs_gru_cell_1, training=self.training)
        
         # Second GRU Layer
        with tf.variable_scope('GRU_2'):
            gru_cell_2, self.initial_state_gru_cell_2 = build_gru(gru_size, gru_layers, batch_size, self.keep_prob)
            self.outputs_gru_cell_2, state_2 = tf.nn.dynamic_rnn(gru_cell_2, self.batchNorm_1, initial_state=self.initial_state_gru_cell_2)
            self.final_state_gru_cell_2 = state_2
            self.batchNorm_2 = tf.layers.batch_normalization(self.outputs_gru_cell_2, training=self.training)
            self.dropout_2 = tf.nn.dropout(self.batchNorm_2,rate=0.8)
        
         # Dense layer with sigmoid activation
            self.logits = tf.contrib.layers.fully_connected(self.dropout_2, 1, activation_fn=None, scope='logits')
            self.predictions = tf.nn.sigmoid(self.logits) 
        
#         correct_pred = tf.equal(tf.round(self.predictions), self.targets_)
#         self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
    
        self.loss = build_loss(self.logits, self.targets_)
        self.optimizer, self.learning_rate = build_optimizer(self.loss, learning_rate, decay_rate)
        
        self.acc, self.acc_op = tf.metrics.accuracy(labels=self.targets_, 
                                  predictions=tf.round(self.predictions))

In [12]:
Tx = 5511 # The number of time steps input to the model from the spectrogram
n_freq = 101 # Number of frequencies input to the model at each time step of the spectrogram
Ty = 1375 # The number of time steps in the output of our model
batch_size = 5
gru_size = 128 
gru_layers = 1
learning_rate = 0.0001
decay_rate = 0.01

## Load the weights (trained in keras) from the hd5 file and create a checkpoint
### run the next cell 1 time

In [14]:
model_pretrained=tf.keras.models.load_model('./models/tr_model.h5')
sess = tf.keras.backend.get_session()
saver = tf.train.Saver()
save_path = saver.save(sess, "checkpoints_pretrained/model_pretrained.ckpt")

W0815 21:38:31.580236 4541654464 nn_ops.py:4224] Large dropout rate: 0.8 (>0.5). In TensorFlow 2.x, dropout() uses dropout rate instead of keep_prob. Please ensure that this is intended.
W0815 21:38:31.840733 4541654464 nn_ops.py:4224] Large dropout rate: 0.8 (>0.5). In TensorFlow 2.x, dropout() uses dropout rate instead of keep_prob. Please ensure that this is intended.
W0815 21:38:32.149090 4541654464 nn_ops.py:4224] Large dropout rate: 0.8 (>0.5). In TensorFlow 2.x, dropout() uses dropout rate instead of keep_prob. Please ensure that this is intended.
W0815 21:38:32.278696 4541654464 nn_ops.py:4224] Large dropout rate: 0.8 (>0.5). In TensorFlow 2.x, dropout() uses dropout rate instead of keep_prob. Please ensure that this is intended.
W0815 21:38:33.092138 4541654464 deprecation.py:323] From /Users/vasileios-mariosgkortsas/anaconda3/envs/dog-project/lib/python3.5/site-packages/tensorflow/python/ops/math_grad.py:1250: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.

## pretrained weights from hd5 file

In [15]:
tf.reset_default_graph()
with tf.Session() as sess:
    #sess.run(tf.global_variables_initializer())
    saver = tf.train.import_meta_graph('checkpoints_pretrained/model_pretrained.ckpt.meta')
    saver.restore(sess,tf.train.latest_checkpoint('checkpoints_pretrained'))    
    for variable in tf.trainable_variables():
        print(variable.name, variable.shape)

W0815 21:38:46.105550 4541654464 deprecation.py:323] From /Users/vasileios-mariosgkortsas/anaconda3/envs/dog-project/lib/python3.5/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.


conv1d_3/kernel:0 (15, 101, 256)
conv1d_3/bias:0 (256,)
batch_normalization_7/gamma:0 (256,)
batch_normalization_7/beta:0 (256,)
gru_5/kernel:0 (256, 384)
gru_5/recurrent_kernel:0 (128, 384)
gru_5/bias:0 (384,)
batch_normalization_8/gamma:0 (128,)
batch_normalization_8/beta:0 (128,)
gru_6/kernel:0 (128, 384)
gru_6/recurrent_kernel:0 (128, 384)
gru_6/bias:0 (384,)
batch_normalization_9/gamma:0 (128,)
batch_normalization_9/beta:0 (128,)
time_distributed_3/kernel:0 (128, 1)
time_distributed_3/bias:0 (1,)


## Assign the pretrained weight values to variables that we will use to load the trainable weights of the TensorFlow model

In [16]:
# assign the pretrained weight values to variables that we will use to load the trainable weights of my model
tf.reset_default_graph()
with tf.Session() as sess:
    #sess.run(tf.global_variables_initializer())
    saver = tf.train.import_meta_graph('checkpoints_pretrained/model_pretrained.ckpt.meta')
    saver.restore(sess,tf.train.latest_checkpoint('checkpoints_pretrained'))
    
    var = [v for v in tf.trainable_variables() if v.name == 'conv1d_3/kernel:0']
    pretrained_var_1=np.array(sess.run(var))
    var = [v for v in tf.trainable_variables() if v.name == 'conv1d_3/bias:0']
    pretrained_var_2=np.array(sess.run(var))
    var = [v for v in tf.trainable_variables() if v.name == 'batch_normalization_7/gamma:0']
    pretrained_var_3=np.array(sess.run(var))
    var = [v for v in tf.trainable_variables() if v.name == 'batch_normalization_7/beta:0']
    pretrained_var_4=np.array(sess.run(var))
    
    var = [v for v in tf.trainable_variables() if v.name == 'gru_5/kernel:0']
    pretrained_gru1_kernel=np.array(sess.run(var))
    var = [v for v in tf.trainable_variables() if v.name == 'gru_5/recurrent_kernel:0']
    pretrained_gru1_recurrent_kernel=np.array(sess.run(var))
    var = [v for v in tf.trainable_variables() if v.name == 'gru_5/bias:0']
    pretrained_gru1_bias=np.array(sess.run(var))
    
    var = [v for v in tf.trainable_variables() if v.name == 'batch_normalization_8/gamma:0']
    pretrained_var_8=np.array(sess.run(var))
    var = [v for v in tf.trainable_variables() if v.name == 'batch_normalization_8/beta:0']
    pretrained_var_9=np.array(sess.run(var))
    
    var = [v for v in tf.trainable_variables() if v.name == 'gru_6/kernel:0']
    pretrained_gru2_kernel=np.array(sess.run(var))
    var = [v for v in tf.trainable_variables() if v.name == 'gru_6/recurrent_kernel:0']
    pretrained_gru2_recurrent_kernel=np.array(sess.run(var))
    var = [v for v in tf.trainable_variables() if v.name == 'gru_6/bias:0']
    pretrained_gru2_bias=np.array(sess.run(var))
    
    var = [v for v in tf.trainable_variables() if v.name == 'batch_normalization_9/gamma:0']
    pretrained_var_13=np.array(sess.run(var))
    var = [v for v in tf.trainable_variables() if v.name == 'batch_normalization_9/beta:0']
    pretrained_var_14=np.array(sess.run(var))
    
    var = [v for v in tf.trainable_variables() if v.name == 'time_distributed_3/kernel:0']
    pretrained_var_15=np.array(sess.run(var))
    var = [v for v in tf.trainable_variables() if v.name == 'time_distributed_3/bias:0']
    pretrained_var_16=np.array(sess.run(var))

## Trainable parameters of the TesnorFlow model

In [18]:
# trainable parameters of my model
tf.reset_default_graph()
model = trigger_model(batch_size=batch_size, Tx=Tx, n_freq=n_freq, Ty=Ty, gru_size=gru_size, gru_layers=gru_layers,
                     learning_rate=learning_rate, decay_rate=decay_rate)
with tf.Session() as sess:
    #sess.run(tf.local_variables_initializer())
    sess.run(tf.global_variables_initializer())
    
    for variable in tf.trainable_variables():
        print(variable.name, variable.shape)
        #print(sess.run(variable))

conv1d/kernel:0 (15, 101, 256)
conv1d/bias:0 (256,)
batch_normalization/gamma:0 (256,)
batch_normalization/beta:0 (256,)
GRU_1/rnn/multi_rnn_cell/cell_0/gru_cell/gates/kernel:0 (384, 256)
GRU_1/rnn/multi_rnn_cell/cell_0/gru_cell/gates/bias:0 (256,)
GRU_1/rnn/multi_rnn_cell/cell_0/gru_cell/candidate/kernel:0 (384, 128)
GRU_1/rnn/multi_rnn_cell/cell_0/gru_cell/candidate/bias:0 (128,)
GRU_1/batch_normalization/gamma:0 (128,)
GRU_1/batch_normalization/beta:0 (128,)
GRU_2/rnn/multi_rnn_cell/cell_0/gru_cell/gates/kernel:0 (256, 256)
GRU_2/rnn/multi_rnn_cell/cell_0/gru_cell/gates/bias:0 (256,)
GRU_2/rnn/multi_rnn_cell/cell_0/gru_cell/candidate/kernel:0 (256, 128)
GRU_2/rnn/multi_rnn_cell/cell_0/gru_cell/candidate/bias:0 (128,)
GRU_2/batch_normalization/gamma:0 (128,)
GRU_2/batch_normalization/beta:0 (128,)
GRU_2/logits/weights:0 (128, 1)
GRU_2/logits/biases:0 (1,)


In [19]:
tf.reset_default_graph()
model = trigger_model(batch_size=batch_size, Tx=Tx, n_freq=n_freq, Ty=Ty, gru_size=gru_size, gru_layers=gru_layers,
                     learning_rate=learning_rate, decay_rate=decay_rate)
extra_graphkeys_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)

epochs = 1
saver = tf.train.Saver() 

train_acc = []

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    sess.run(tf.global_variables_initializer())
    
    # assign to my trainable weights the pretrained weight values
    for variable in tf.trainable_variables():

        if (variable.name=='conv1d/kernel:0'):
            sess.run(tf.assign(variable, pretrained_var_1[0]))
        elif (variable.name=='conv1d/bias:0'):
            sess.run(tf.assign(variable, pretrained_var_2[0]))
        elif (variable.name=='batch_normalization/gamma:0'):
            sess.run(tf.assign(variable, pretrained_var_3[0]))
        elif (variable.name=='batch_normalization/beta:0'):
            sess.run(tf.assign(variable, pretrained_var_4[0]))
            
        elif (variable.name=='GRU_1/rnn/multi_rnn_cell/cell_0/gru_cell/gates/kernel:0'):
            gate_kernel, _=gate_and_candidate_kernel(pretrained_gru1_kernel,
                                                                    pretrained_gru1_recurrent_kernel)
            sess.run(tf.assign(variable, gate_kernel))
        elif (variable.name=='GRU_1/rnn/multi_rnn_cell/cell_0/gru_cell/candidate/kernel:0'):
            _, candidate_kernel=gate_and_candidate_kernel(pretrained_gru1_kernel,
                                                                    pretrained_gru1_recurrent_kernel)
            sess.run(tf.assign(variable, candidate_kernel))
        elif (variable.name=='GRU_1/rnn/multi_rnn_cell/cell_0/gru_cell/gates/bias:0'):
            bias_1,_=convert_bias(pretrained_gru1_bias[0])
            sess.run(tf.assign(variable, bias_1))
        elif (variable.name=='GRU_1/rnn/multi_rnn_cell/cell_0/gru_cell/candidate/bias:0'):
            _,bias_2=convert_bias(pretrained_gru1_bias[0])
            sess.run(tf.assign(variable, bias_2))
            
        elif (variable.name=='GRU_1/batch_normalization/gamma:0'):
            sess.run(tf.assign(variable, pretrained_var_8[0]))
        elif (variable.name=='GRU_1/batch_normalization/beta:0'):
            sess.run(tf.assign(variable, pretrained_var_9[0]))
            
        elif (variable.name=='GRU_2/rnn/multi_rnn_cell/cell_0/gru_cell/gates/kernel:0'):
            gate_kernel, _=gate_and_candidate_kernel(pretrained_gru2_kernel,
                                                                    pretrained_gru2_recurrent_kernel)
            sess.run(tf.assign(variable, gate_kernel))
        elif (variable.name=='GRU_2/rnn/multi_rnn_cell/cell_0/gru_cell/candidate/kernel:0'):
            _, candidate_kernel=gate_and_candidate_kernel(pretrained_gru2_kernel,
                                                                    pretrained_gru2_recurrent_kernel)
            sess.run(tf.assign(variable, candidate_kernel))
        elif (variable.name=='GRU_2/rnn/multi_rnn_cell/cell_0/gru_cell/gates/bias:0'):
            bias_1,_=convert_bias(pretrained_gru2_bias[0])
            sess.run(tf.assign(variable, bias_1))
        elif (variable.name=='GRU_2/rnn/multi_rnn_cell/cell_0/gru_cell/candidate/bias:0'):
            _,bias_2=convert_bias(pretrained_gru2_bias[0])
            sess.run(tf.assign(variable, bias_2))
            
        elif (variable.name=='GRU_2/batch_normalization/gamma:0'):
            sess.run(tf.assign(variable, pretrained_var_13[0]))
        elif (variable.name=='GRU_2/batch_normalization/beta:0'):
            sess.run(tf.assign(variable, pretrained_var_14[0]))
            
        elif (variable.name=='GRU_2/logits/weights:0'):
            sess.run(tf.assign(variable, pretrained_var_15[0]))
        elif (variable.name=='GRU_2/logits/biases:0'):
            sess.run(tf.assign(variable, pretrained_var_16[0]))
            
            
    for e in range(epochs):
        iteration = 1
        state_gru_cell_1 = sess.run(model.initial_state_gru_cell_1) 
        state_gru_cell_2 = sess.run(model.initial_state_gru_cell_2)
        
        for (x, y) in get_batches(X, Y, batch_size):
            feed = {model.inputs_: x,
                     model.targets_: y,
                     model.keep_prob: 0.2,
                     model.training: True,
                     model.initial_state_gru_cell_1: state_gru_cell_1,
                     model.initial_state_gru_cell_2: state_gru_cell_2}
      
            batch_loss, batch_acc, batch_acc_op,state_gru_cell_2,state_gru_cell_1,_,_=sess.run([model.loss,
                                                model.acc, model.acc_op,
                                                model.final_state_gru_cell_2,
                                                model.final_state_gru_cell_1,
                                                model.optimizer, 
                                                 extra_graphkeys_update_ops],feed_dict=feed)
            
            train_acc.append(batch_acc_op)
            
            if iteration%1==0:
                print("Epoch: {}/{}".format(e+1, epochs),
                      "Iteration: {}".format(iteration),
                      "Train loss: {:.3f}".format(batch_loss),
                      "Train accuracy: {:.4f}".format(batch_acc_op)
                     )
                
            iteration +=1
            
    print("Train accuracy (mean): {:.4f}".format(np.mean(train_acc)))
    saver.save(sess, "checkpoints/trigger_word_detection.ckpt")

Epoch: 1/1 Iteration: 1 Train loss: 0.075 Train accuracy: 0.9811
Epoch: 1/1 Iteration: 2 Train loss: 0.065 Train accuracy: 0.9799
Epoch: 1/1 Iteration: 3 Train loss: 0.066 Train accuracy: 0.9792
Epoch: 1/1 Iteration: 4 Train loss: 0.090 Train accuracy: 0.9755
Epoch: 1/1 Iteration: 5 Train loss: 0.254 Train accuracy: 0.9672
Train accuracy (mean): 0.9766


In [20]:
# to confirm that variables are not trained during testing phase, get tha velaue of trainable variable

tf.reset_default_graph()
model = trigger_model(batch_size=batch_size, Tx=Tx, n_freq=n_freq, Ty=Ty, gru_size=gru_size, gru_layers=gru_layers,
                     learning_rate=learning_rate, decay_rate=decay_rate)

with tf.Session() as sess:
    saver = tf.train.Saver()
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))

    for variable in tf.trainable_variables():
        if (variable.name=='GRU_1/rnn/multi_rnn_cell/cell_0/gru_cell/gates/kernel:0'):
            train_var=sess.run(variable)

## Testing

In [21]:
tf.reset_default_graph()
model = trigger_model(batch_size=batch_size, Tx=Tx, n_freq=n_freq, Ty=Ty, gru_size=gru_size, gru_layers=gru_layers,
                     learning_rate=learning_rate, decay_rate=decay_rate)


test_acc=[]

with tf.Session() as sess:
    saver = tf.train.Saver()
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    sess.run(tf.local_variables_initializer())
    
    state_gru_cell_1 = sess.run(model.initial_state_gru_cell_1) 
    state_gru_cell_2 = sess.run(model.initial_state_gru_cell_2)
    
    iteration=1   
    for (x, y) in get_batches(X_dev, Y_dev, batch_size):
        feed = {model.inputs_: x,
                     model.targets_: y,
                     model.keep_prob: 1,
                     model.training: False,
                     model.initial_state_gru_cell_1: state_gru_cell_1,
                     model.initial_state_gru_cell_2: state_gru_cell_2}
      
        batch_loss, batch_acc, batch_acc_op,state_gru_cell_2,state_gru_cell_1=sess.run([model.loss,
                                                model.acc, model.acc_op,
                                                model.final_state_gru_cell_2,
                                                model.final_state_gru_cell_1],feed_dict=feed)
        
        if iteration%1==0:
            print("Iteration: {}".format(iteration),
                      "Test loss: {:.3f}".format(batch_loss),
                      "Test accuracy: {:.4f}".format(batch_acc_op)
                     )
        iteration +=1

        test_acc.append(batch_acc_op)
    print("Test accuracy (mean): {:.4f}".format(np.mean(test_acc)))
    
    # to confirm that variables are not trained during testing phase
    for variable in tf.trainable_variables():
        if (variable.name=='GRU_1/rnn/multi_rnn_cell/cell_0/gru_cell/gates/kernel:0'):
            test_var=sess.run(variable)

Iteration: 1 Test loss: 0.699 Test accuracy: 0.9271
Iteration: 2 Test loss: 0.643 Test accuracy: 0.9306
Iteration: 3 Test loss: 0.649 Test accuracy: 0.9318
Iteration: 4 Test loss: 0.702 Test accuracy: 0.9305
Iteration: 5 Test loss: 0.634 Test accuracy: 0.9313
Test accuracy (mean): 0.9303


In [22]:
# to confirm that variables are not trained during testing phase
np.array_equal(train_var,test_var)

True

## Number of trainable parameters

In [23]:
model = trigger_model(batch_size=batch_size, Tx=Tx, n_freq=n_freq, Ty=Ty, gru_size=gru_size, gru_layers=gru_layers,
                     learning_rate=learning_rate, decay_rate=decay_rate)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    total_parameters = 0
    for variable in tf.trainable_variables():
        print(variable)
        # shape is an array of tf.Dimension
        shape = variable.get_shape()
        print('shape of weight matrix: ',shape)
        #print(len(shape))
        variable_parameters = 1
        for dim in shape:
            #print(dim)
            variable_parameters *= dim.value
        print('number of trainable parameters: ',variable_parameters)
        print('------------------------------')
        total_parameters += variable_parameters
    print('total number of trainable parameters: ',total_parameters)

<tf.Variable 'conv1d/kernel:0' shape=(15, 101, 256) dtype=float32_ref>
shape of weight matrix:  (15, 101, 256)
number of trainable parameters:  387840
------------------------------
<tf.Variable 'conv1d/bias:0' shape=(256,) dtype=float32_ref>
shape of weight matrix:  (256,)
number of trainable parameters:  256
------------------------------
<tf.Variable 'batch_normalization/gamma:0' shape=(256,) dtype=float32_ref>
shape of weight matrix:  (256,)
number of trainable parameters:  256
------------------------------
<tf.Variable 'batch_normalization/beta:0' shape=(256,) dtype=float32_ref>
shape of weight matrix:  (256,)
number of trainable parameters:  256
------------------------------
<tf.Variable 'GRU_1/rnn/multi_rnn_cell/cell_0/gru_cell/gates/kernel:0' shape=(384, 256) dtype=float32_ref>
shape of weight matrix:  (384, 256)
number of trainable parameters:  98304
------------------------------
<tf.Variable 'GRU_1/rnn/multi_rnn_cell/cell_0/gru_cell/gates/bias:0' shape=(256,) dtype=float32