Copyright (C) 2019 Software Platform Lab, Seoul National University

Licensed under the Apache License, Version 2.0 (the "License"); 

you may not use this file except in compliance with the License. 

You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 

Unless required by applicable law or agreed to in writing, software 

distributed under the License is distributed on an "AS IS" BASIS, 


WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 


See the License for the specific language governing permissions and


limitations under the License.

**Colab 101**

Colab is a free Jupyter notebook environment offered by Google Research. 

>[3-4. RNN](#scrollTo=Ki_RHIwPJvyn)

>>[Recurrent neural network](#scrollTo=s97fwEOoiXk3)

>>[LSTM](#scrollTo=8p7tgEPijeSd)

>>[Prepare PTB dataset](#scrollTo=V5N1npMVQqJz)

>>[Define input preprocessing functions](#scrollTo=xrE4WkZNUn9o)

>>>[Quiz 1](#scrollTo=cnNhG69dgWsA)

>>>[Quiz 2](#scrollTo=piRAhFIGGRc_)

>>[Build RNN model](#scrollTo=a4UKKwwsZa4I)

>>>[Setting hyperparameters](#scrollTo=5zhtEo46Hatf)

>>>[Word embeddings](#scrollTo=M8UKfIQ8FdFy)

>>>[Quiz 3](#scrollTo=sGJ_-LgwIige)

>>>[Define RNN graph](#scrollTo=SWyh4MHCtgJ4)

>>>[Quiz 4](#scrollTo=HJbfZ_i8xUf7)

>>>[Define loss](#scrollTo=9Vyd8rYE7JTv)

>>>[Define optimizer(train_op)](#scrollTo=Lpb3sfKO7MA7)

>>[Run RNN model](#scrollTo=MA_-1EhV8hSr)

>>[Visualize using TensorBoard](#scrollTo=C7FK5eg_4WCy)

>>>[Preparation: Setting Up TensorBoard](#scrollTo=2_oD0_sK5hm6)

>>>[Visualize the graph](#scrollTo=trtXkJsH6i6i)

>>[Visualize the perplexity](#scrollTo=Dy-9kJge87po)



# 3-4. RNN

## Recurrent neural network


*   Learns from sequential data
*   Ex. predicting the next word after a partial sentence, or understanding the current scene in a video based on previous scenes

![RNN cell](https://drive.google.com/uc?id=1hvEtbzjuT8hxBtNNTrBRMxthCcfFN5FG) 
Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs
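
To make the recurrence concrete, here is a minimal sketch of a vanilla RNN cell in plain Python. It assumes scalar inputs and a scalar hidden state, and the weights `w_x`, `w_h`, and `b` are arbitrary illustrative values, not anything used later in this notebook.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """One step of a scalar vanilla RNN: h_t = tanh(w_x * x_t + w_h * h_prev + b)."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(xs, w_x=0.5, w_h=0.8, b=0.0, h0=0.0):
    """Feed a sequence through the cell, carrying the hidden state forward."""
    h = h0
    hidden_states = []
    for x in xs:
        h = rnn_step(x, h, w_x, w_h, b)
        hidden_states.append(h)
    return hidden_states


# Only the first input is non-zero, yet later states are still non-zero:
# the hidden state carries information from earlier steps.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

The same cell (same weights) is applied at every time step; only the hidden state changes as the sequence is consumed.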



## LSTM


* Vanishing gradient problem: during backpropagation the gradient is computed with the chain rule, so it is a product of many factors across time steps; when those factors are small, the final gradient becomes almost zero
* Long short-term memory (LSTM): mitigates the vanishing gradient problem and handles long-term dependencies

![LSTM](https://drive.google.com/uc?id=1W9JKubIgJoyvzQy4U8KGNQWcVziu_6fT)
Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs
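
The vanishing gradient effect can be illustrated numerically. The sketch below assumes a toy scalar recurrence `h_t = tanh(w * h_{t-1})` with no inputs; the weight `w=0.5` and the starting state are arbitrary choices for illustration only.

```python
import math


def gradient_through_time(w, h0, num_steps):
    """Return d h_T / d h_0 for the scalar recurrence h_t = tanh(w * h_{t-1})."""
    h = h0
    grad = 1.0
    for _ in range(num_steps):
        h = math.tanh(w * h)
        # Chain rule factor for this step: d h_t / d h_{t-1} = w * (1 - h_t ** 2)
        grad *= w * (1.0 - h * h)
    return grad


for steps in (1, 10, 50):
    print(steps, gradient_through_time(w=0.5, h0=0.5, num_steps=steps))
```

Each step multiplies the gradient by a factor of at most `w`, so with `w=0.5` the gradient shrinks at least geometrically with the number of steps.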


Let's build a simple LSTM model for language modeling. The code comes from the [TF-RNN tutorial](https://github.com/tensorflow/models/tree/master/tutorials/rnn/ptb).

##  Prepare PTB dataset

PTB (Penn Treebank) is a dataset widely used in natural language processing (NLP). It annotates sentences with syntactic or semantic labels in a tree structure, where each leaf node corresponds to a word.

In [1]:
#@title Click to download PTB
!rm -rf data*
!rm -rf simple-examples*
!wget http://www.fit.vutbr.cz/%7Eimikolov/rnnlm/simple-examples.tgz
!tar -xzvf simple-examples.tgz
!mv simple-examples/data ./
!ls

--2019-05-03 07:20:26--  http://www.fit.vutbr.cz/%7Eimikolov/rnnlm/simple-examples.tgz
Resolving www.fit.vutbr.cz (www.fit.vutbr.cz)... 147.229.9.23, 2001:67c:1220:809::93e5:917
Connecting to www.fit.vutbr.cz (www.fit.vutbr.cz)|147.229.9.23|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 34869662 (33M) [application/x-gtar]
Saving to: ‘simple-examples.tgz’


2019-05-03 07:20:35 (3.64 MB/s) - ‘simple-examples.tgz’ saved [34869662/34869662]

./
./simple-examples/
./simple-examples/data/
./simple-examples/data/ptb.test.txt
./simple-examples/data/ptb.train.txt
./simple-examples/data/ptb.valid.txt
./simple-examples/data/README
./simple-examples/data/ptb.char.train.txt
./simple-examples/data/ptb.char.test.txt
./simple-examples/data/ptb.char.valid.txt
./simple-examples/models/
./simple-examples/models/swb.ngram.model
./simple-examples/models/swb.rnn.model
./simple-examples/models/README
./simple-examples/rnnlm-0.2b/
./simple-examples/rnnlm-0.2b/CHANGE.log
./simple-exam

## Define input preprocessing functions


*   Mapping a word to an identifier
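
As a rough illustration of the word-to-identifier mapping, here is a stdlib-only sketch that works on an in-memory string instead of the PTB files. The function name `build_vocab` and the toy text are hypothetical; unlike the PTB files, the toy text has no spaces around line breaks, so the sketch pads `<eos>` with spaces.

```python
import collections


def build_vocab(text):
    """Map each word to an integer id, most frequent word first."""
    # Mark sentence ends explicitly, then split on whitespace.
    words = text.replace("\n", " <eos> ").split()
    counter = collections.Counter(words)
    # Sort by descending count, then alphabetically to break ties.
    count_pairs = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: idx for idx, (word, _) in enumerate(count_pairs)}


toy_text = "the cat sat\nthe dog sat\n"
word_to_id = build_vocab(toy_text)
ids = [word_to_id[w] for w in "the cat sat".split()]
print(word_to_id)
print(ids)
```

Frequent words get small ids, so an id also hints at how common a word is in the training text.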



In [0]:
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================


"""Utilities for parsing PTB text files."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import collections
import os
import sys

import tensorflow as tf

Py3 = sys.version_info[0] == 3

def _read_words(filename):
  with tf.gfile.GFile(filename, "r") as f:
    if Py3:
      return f.read().replace("\n", "<eos>").split()
    else:
      return f.read().decode("utf-8").replace("\n", "<eos>").split()


def _build_vocab(filename):
  data = _read_words(filename)

  counter = collections.Counter(data)
  count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))

  words, _ = list(zip(*count_pairs))
  word_to_id = dict(zip(words, range(len(words))))

  return word_to_id


def _file_to_word_ids(filename, word_to_id):
  data = _read_words(filename)
  return [word_to_id[word] for word in data if word in word_to_id]


def ptb_raw_data(data_path=None):
  """Load PTB raw data from data directory "data_path".

  Reads PTB text files, converts strings to integer ids,
  and performs mini-batching of the inputs.

  The PTB dataset comes from Tomas Mikolov's webpage:

  http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz

  Args:
    data_path: string path to the directory where simple-examples.tgz has
      been extracted.

  Returns:
    tuple (train_data, valid_data, test_data, vocabulary)
    where each of the data objects can be passed to PTBIterator.
  """

  train_path = os.path.join(data_path, "ptb.train.txt")
  valid_path = os.path.join(data_path, "ptb.valid.txt")
  test_path = os.path.join(data_path, "ptb.test.txt")

  word_to_id = _build_vocab(train_path)
  train_data = _file_to_word_ids(train_path, word_to_id)
  valid_data = _file_to_word_ids(valid_path, word_to_id)
  test_data = _file_to_word_ids(test_path, word_to_id)
  vocabulary = len(word_to_id)
  return train_data, valid_data, test_data, vocabulary


def ptb_producer(raw_data, batch_size, num_steps, name=None):
  """Iterate on the raw PTB data.

  This chunks up raw_data into batches of examples and returns Tensors that
  are drawn from these batches.

  Args:
    raw_data: one of the raw data outputs from ptb_raw_data.
    batch_size: int, the batch size.
    num_steps: int, the number of unrolls.
    name: the name of this operation (optional).

  Returns:
    A pair of Tensors, each shaped [batch_size, num_steps]. The second element
    of the tuple is the same data time-shifted to the right by one.

  Raises:
    tf.errors.InvalidArgumentError: if batch_size or num_steps are too high.
  """
  with tf.name_scope(name, "PTBProducer", [raw_data, batch_size, num_steps]):
    raw_data = tf.convert_to_tensor(raw_data, name="raw_data", dtype=tf.int32)

    data_len = tf.size(raw_data)
    batch_len = data_len // batch_size
    data = tf.reshape(raw_data[0 : batch_size * batch_len],
                      [batch_size, batch_len])

    epoch_size = (batch_len - 1) // num_steps
    assertion = tf.assert_positive(
        epoch_size,
        message="epoch_size == 0, decrease batch_size or num_steps")
    with tf.control_dependencies([assertion]):
      epoch_size = tf.identity(epoch_size, name="epoch_size")

    i = tf.train.range_input_producer(epoch_size, shuffle=False).dequeue()
    x = tf.strided_slice(data, [0, i * num_steps],
                         [batch_size, (i + 1) * num_steps])
    x.set_shape([batch_size, num_steps])
    y = tf.strided_slice(data, [0, i * num_steps + 1],
                         [batch_size, (i + 1) * num_steps + 1])
    y.set_shape([batch_size, num_steps])
    return x, y
  
class PTBInput(object):
    """The input data."""

    def __init__(self, config, data, name=None):
      self.batch_size = batch_size = config.batch_size
      self.num_steps = num_steps = config.num_steps
      self.epoch_size = ((len(data) // batch_size) - 1) // num_steps
      self.input_data, self.targets = ptb_producer(
          data, batch_size, num_steps, name=name)

### Quiz 1
Find the identifier of the word "market" using the defined functions above. (data_path='data')

In [3]:
data_path='data'
train_path = os.path.join(data_path, "ptb.train.txt")
word_to_id = _build_vocab(train_path)
print(word_to_id['market'])

47


### Quiz 2
Guess how many words x or y of ptb_producer contains when 
**batch_size=2 and num_steps=5.**
Then check it yourself.
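
To reason about the slicing without running TensorFlow, the batching logic can be sketched in plain Python. This is a simplified, single-pass version under different settings (batch_size=3, num_steps=4) on made-up ids; the helper name `make_batches` is hypothetical, and the real `ptb_producer` reads indices from a queue rather than a Python loop.

```python
def make_batches(raw_data, batch_size, num_steps):
    """Yield (x, y) pairs shaped [batch_size][num_steps]; y is x shifted by one word."""
    batch_len = len(raw_data) // batch_size
    # Row r holds the r-th contiguous slice of the data.
    data = [raw_data[r * batch_len:(r + 1) * batch_len] for r in range(batch_size)]
    epoch_size = (batch_len - 1) // num_steps
    for i in range(epoch_size):
        x = [row[i * num_steps:(i + 1) * num_steps] for row in data]
        y = [row[i * num_steps + 1:(i + 1) * num_steps + 1] for row in data]
        yield x, y


toy_ids = list(range(27))
batches = list(make_batches(toy_ids, batch_size=3, num_steps=4))
first_x, first_y = batches[0]
print(first_x)
print(first_y)
```

Each target in `y` is the word that follows the corresponding input in `x`, which is exactly what a language model is trained to predict.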


In [4]:
data_path='data'
train_data, _, _, _ = ptb_raw_data(data_path)

with tf.Graph().as_default():
    # Fill below
    x,y = ptb_producer(train_data,2,5)
    with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      tf.train.start_queue_runners(sess=sess)
      print(sess.run(x))
      print(sess.run(y))
    

Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.range(limit).shuffle(limit).repeat(num_epochs)`. If `shuffle=False`, omit the `.shuffle(...)`.
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.from_tensor_slices(input_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs)`. If `shuffle=False`, omit the `.shuffle(...)`.
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.from_tensors(tensor).repeat(num_epochs)`.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
[[9970 9971 9972 9974 9975]
 [1969    0   98   89 2254]]
[[9980 9981 9982 9983 9984]


In [5]:
print(x)
print(y)

Tensor("PTBProducer/StridedSlice:0", shape=(2, 5), dtype=int32)
Tensor("PTBProducer/StridedSlice_1:0", shape=(2, 5), dtype=int32)





## Build RNN model


### Setting hyperparameters

In [0]:
class MediumConfig(object):
  """Medium config."""
  init_scale = 0.05
  learning_rate = 1.0
  max_grad_norm = 5
  num_layers = 2
  num_steps = 35
  hidden_size = 650
  max_epoch = 6
  max_max_epoch = 39
  keep_prob = 0.5
  lr_decay = 0.8
  batch_size = 20
  vocab_size = 10000
   
config = MediumConfig()

### Word embeddings 
Convert word ids to vector representations. 
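
Conceptually, an embedding lookup selects one row of the embedding matrix per word id. The sketch below shows this with plain Python lists; the sizes (vocab_size=10, embedding_size=3), the random initial values, and the helper names are illustrative assumptions, not the notebook's settings.

```python
import random


def make_embedding(vocab_size, embedding_size, seed=0):
    """Create a [vocab_size][embedding_size] table of small random values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(embedding_size)]
            for _ in range(vocab_size)]


def embedding_lookup(embedding, word_ids):
    """Return one embedding row per word id."""
    return [embedding[i] for i in word_ids]


embedding = make_embedding(vocab_size=10, embedding_size=3)
vectors = embedding_lookup(embedding, [1, 7])
print(len(vectors), len(vectors[0]))
```

The result has one vector per input id, so a batch of ids shaped [batch_size, num_steps] becomes a tensor shaped [batch_size, num_steps, embedding_size].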

In [7]:
def inputs(input):
    size = config.hidden_size
    vocab_size = config.vocab_size
    with tf.device("/cpu:0"):
      embedding = tf.get_variable(
          "embedding", [vocab_size, size])
      inputs = tf.nn.embedding_lookup(embedding, input.input_data)

      inputs = tf.nn.dropout(inputs, config.keep_prob)
      return inputs
  
with tf.Graph().as_default():
    data_path = 'data'
    raw_data = ptb_raw_data(data_path)
    train_data, valid_data, test_data, _ = raw_data
    train_input = PTBInput(config=config, data=train_data, name="TrainInput")
    print(inputs(train_input))

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
Tensor("dropout/mul:0", shape=(20, 35, 650), dtype=float32, device=/device:CPU:0)


### Quiz 3
Guess the shape of the tensor returned by the embedding lookup for wordIDs=[3, 9, 20] (vocab_size=100, embedding_size=4).

Then check it yourself. 
What happens with wordIDs=[3,9,200] if the other conditions are the same?

In [11]:
vocab_size = 100
embedding_size = 4
with tf.Graph().as_default():
  wordIDs=tf.placeholder(dtype=tf.int32)
  embedding=tf.get_variable("embedding",[vocab_size,embedding_size])
  lookup = tf.nn.embedding_lookup(embedding,wordIDs)
  sess = tf.Session() 
  sess.run(tf.global_variables_initializer())
  print(sess.run(lookup,feed_dict={wordIDs:[3,9,20]}))
  print(sess.run(lookup,feed_dict={wordIDs:[3,9,20]}))
  # Fill below
  

[[ 0.00571907  0.16578987  0.05818084  0.05671334]
 [ 0.02451965  0.15589058 -0.0014375  -0.22457291]
 [-0.05039348 -0.12316238 -0.08555493  0.19126329]]
[[ 0.00571907  0.16578987  0.05818084  0.05671334]
 [ 0.02451965  0.15589058 -0.0014375  -0.22457291]
 [-0.05039348 -0.12316238 -0.08555493  0.19126329]]


### Define RNN graph

In [12]:
def build_rnn_graph_lstm(inputs, config, is_training=True):
    """Build the inference graph using canonical LSTM cells."""
    # Slightly better results can be obtained with forget gate biases
    # initialized to 1 but the hyperparameters of the model would need to be
    # different than reported in the paper.
    def make_cell():
      cell = tf.contrib.rnn.BasicLSTMCell(
          config.hidden_size, forget_bias=0.0, state_is_tuple=True,
          reuse=not is_training)
      if is_training and config.keep_prob < 1:
        cell = tf.contrib.rnn.DropoutWrapper(
            cell, output_keep_prob=config.keep_prob)
      return cell

    # Stacking multiple LSTMs
    cell = tf.contrib.rnn.MultiRNNCell(
        [make_cell() for _ in range(config.num_layers)], state_is_tuple=True)

    initial_state = cell.zero_state(config.batch_size, tf.float32)
    state = initial_state
    # Simplified version of tf.nn.static_rnn().
    # This builds an unrolled LSTM for tutorial purposes only.
    # In general, use tf.nn.static_rnn() or tf.nn.static_state_saving_rnn().
    #
    # The alternative version of the code below is:
    #
    # inputs = tf.unstack(inputs, num=self.num_steps, axis=1)
    # outputs, state = tf.nn.static_rnn(cell, inputs,
    #                                   initial_state=self._initial_state)
    outputs = []
    with tf.variable_scope("RNN"):
      for time_step in range(config.num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)
    output = tf.reshape(tf.concat(outputs, 1), [-1, config.hidden_size])
    return output, state, initial_state
  
  
with tf.Graph().as_default():
    data_path = 'data'
    raw_data = ptb_raw_data(data_path)
    train_data, valid_data, test_data, _ = raw_data
    train_input = PTBInput(config=config, data=train_data, name="TrainInput")
    inputs_ = inputs(train_input)
    
    output, final_state, initial_state = build_rnn_graph_lstm(inputs_, config)
    print(output)

Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
Instructions for updating:
This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.
Tensor("Reshape:0", shape=(700, 650), dtype=float32)


### Quiz 4
Define a single-layer GRU RNN graph unrolled over 10 time steps (num_steps=10). 

**HINT: use `tf.nn.rnn_cell.GRUCell(config.hidden_size)`**
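
For intuition about what a GRU cell computes, here is a scalar sketch of one common GRU formulation in plain Python. The weights in `params` are hypothetical values chosen for illustration, and this is not a substitute for the TensorFlow graph the quiz asks for.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h_prev, p):
    """One scalar GRU step; p is a dict of illustrative weights."""
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev + p["b_r"])  # reset gate
    u = sigmoid(p["w_u"] * x + p["u_u"] * h_prev + p["b_u"])  # update gate
    # Candidate state: the reset gate decides how much of the old state to use.
    c = math.tanh(p["w_c"] * x + p["u_c"] * (r * h_prev) + p["b_c"])
    # Blend the old state and the candidate according to the update gate.
    return u * h_prev + (1.0 - u) * c


params = {"w_r": 0.5, "u_r": 0.5, "b_r": 0.0,
          "w_u": 0.5, "u_u": 0.5, "b_u": 0.0,
          "w_c": 0.5, "u_c": 0.5, "b_c": 0.0}

h = 0.0
for x in [1.0, -1.0, 0.5, 0.0, 2.0, -2.0, 1.5, 0.3, -0.7, 1.0]:  # 10 time steps
    h = gru_step(x, h, params)
print(h)
```

Unlike an LSTM, a GRU keeps a single state vector per layer rather than a separate cell state and hidden state.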

In [13]:
def build_rnn_graph_GRU(inputs, config, is_training=True):
    # Fill this function
    def make_cell():
      cell = tf.nn.rnn_cell.GRUCell(config.hidden_size,reuse=not is_training)
      if is_training and config.keep_prob < 1:
        cell = tf.contrib.rnn.DropoutWrapper(
            cell, output_keep_prob=config.keep_prob)
      return cell

    cell = make_cell()
    
    initial_state = cell.zero_state(config.batch_size, tf.float32)
    state = initial_state
    
    outputs = []
    with tf.variable_scope("RNN"):
      for time_step in range(config.num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)
    output = tf.reshape(tf.concat(outputs, 1), [-1, config.hidden_size])
    return output, state, initial_state

num_steps = 10
new_config = MediumConfig()
new_config.num_steps = num_steps
with tf.Graph().as_default():
    data_path = 'data'
    raw_data = ptb_raw_data(data_path)
    train_data, valid_data, test_data, _ = raw_data
    train_input = PTBInput(config=new_config, data=train_data, name="TrainInput")
    
    output, final_state, initial_state  = build_rnn_graph_GRU(inputs(train_input), new_config)
    print(output)
          

Instructions for updating:
This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.
Tensor("Reshape:0", shape=(200, 650), dtype=float32)


### Define loss 
`tf.contrib.seq2seq.sequence_loss`: Weighted cross-entropy loss for a sequence of logits.
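
The reduction used in this notebook (average over the batch, sum over time steps, all weights equal to 1) can be sketched in plain Python as follows. The helper names and the toy logits are assumptions for illustration; the sketch does not call TensorFlow.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy of one softmax distribution against an integer target."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[target]


def sequence_cost(logits, targets):
    """logits: [batch][time][vocab], targets: [batch][time].

    Averages the per-word loss over the batch and sums it over time steps.
    """
    batch_size = len(logits)
    num_steps = len(logits[0])
    cost = 0.0
    for t in range(num_steps):
        step_loss = sum(softmax_cross_entropy(logits[b][t], targets[b][t])
                        for b in range(batch_size))
        cost += step_loss / batch_size
    return cost


# Toy case: batch of 2, 3 time steps, vocabulary of 4, all logits equal.
toy_logits = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]
toy_targets = [[0, 1, 2], [3, 2, 1]]
cost = sequence_cost(toy_logits, toy_targets)
print(cost)
```

With equal logits every word has probability 1/4, so the cost equals `num_steps * ln(4)` regardless of the targets.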

In [0]:
def loss(config, input_, output):
  softmax_w = tf.get_variable(
        "softmax_w", [config.hidden_size, config.vocab_size], tf.float32)
  softmax_b = tf.get_variable("softmax_b", [config.vocab_size], tf.float32)
  logits = tf.nn.xw_plus_b(output, softmax_w, softmax_b)
  # Reshape logits to be a 3-D tensor for sequence loss
  logits = tf.reshape(logits, [config.batch_size, config.num_steps, config.vocab_size])

  # Use the contrib sequence loss and average over the batches
  loss = tf.contrib.seq2seq.sequence_loss(
        logits,
        input_.targets,
        tf.ones([config.batch_size, config.num_steps], tf.float32),
        average_across_timesteps=False,
        average_across_batch=True)
   
  cost = tf.reduce_sum(loss)
  return cost
    

### Define optimizer(train_op)

In [0]:
def optimizer(cost):
  tvars = tf.trainable_variables()
  # Clip gradients by global norm so the update cannot swing wildly in any single iteration; this keeps one bad iteration from making the whole training run fail.
  grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars),
                                    config.max_grad_norm)
  optimizer = tf.train.GradientDescentOptimizer(1.0)
  train_op = optimizer.apply_gradients(
        zip(grads, tvars),
        global_step=tf.train.get_or_create_global_step())
  return train_op
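
The clipping step in the optimizer above rescales all gradients together when their combined norm exceeds a threshold. A minimal stdlib sketch of that idea, with gradients represented as nested lists and made-up values, might look like this:

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """grads: list of lists of floats. Returns (clipped_grads, global_norm)."""
    # The global norm treats all gradients as one long vector.
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if global_norm <= clip_norm:
        return [list(grad) for grad in grads], global_norm
    scale = clip_norm / global_norm
    return [[g * scale for g in grad] for grad in grads], global_norm


grads = [[3.0, 4.0], [12.0]]
clipped, norm = clip_by_global_norm(grads, clip_norm=5.0)
clipped_norm = math.sqrt(sum(g * g for grad in clipped for g in grad))
print(norm, clipped_norm)
```

Because every gradient is scaled by the same factor, the update direction is preserved and only its length is limited.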

## Run RNN model


In [16]:
import numpy as np
import time

def define_graph(config):
  data_path = 'data'
  raw_data = ptb_raw_data(data_path)
  train_data, valid_data, test_data, _ = raw_data
  train_input = PTBInput(config=config, data=train_data, name="TrainInput")
  initializer = tf.random_uniform_initializer(-config.init_scale,
                                              config.init_scale)
  with tf.variable_scope("Model", reuse=None, initializer=initializer):
      output, final_state, initial_state = build_rnn_graph_lstm(inputs(train_input), config)
    
      cost = loss(config, train_input, output)
      train_op = optimizer(cost)
  return cost, initial_state, final_state, train_op
      
def run(sess, cost, initial_state, final_state, train_op):
  init = tf.global_variables_initializer()
  tf.train.start_queue_runners(sess=sess)
  sess.run(init)
    
  state = sess.run(initial_state)
    
  costs = 0.0
  iters = 0
  start_time = time.time()
  for step in range(1500):
      feed_dict = {}
      for i, (c, h) in enumerate(initial_state):
        feed_dict[c] = state[i].c
        feed_dict[h] = state[i].h
      
      cost_, state, _ = sess.run((cost, final_state, train_op), feed_dict=feed_dict)
      costs += cost_
      iters += config.num_steps
      
      if step % 150 == 0:
         print("step:%d perplexity: %.3f speed: %.0f wps" %
            (step, np.exp(costs / iters),
             iters * config.batch_size /
             (time.time() - start_time)))


config =  MediumConfig()
with tf.Graph().as_default():
    cost, initial_state, final_state, train_op = define_graph(config)
   
    sess = tf.Session()
    run(sess, cost, initial_state, final_state, train_op)

Instructions for updating:
Use tf.cast instead.
step:0 perplexity: 9999.885 speed: 1025 wps
step:150 perplexity: 1144.049 speed: 10589 wps
step:300 perplexity: 808.168 speed: 10963 wps
step:450 perplexity: 649.722 speed: 11069 wps
step:600 perplexity: 556.796 speed: 11093 wps
step:750 perplexity: 490.870 speed: 11078 wps
step:900 perplexity: 441.449 speed: 11079 wps
step:1050 perplexity: 407.333 speed: 11058 wps
step:1200 perplexity: 376.908 speed: 11031 wps
step:1350 perplexity: 354.444 speed: 11017 wps
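
The perplexity printed above is the exponential of the average per-word cost. The sketch below reproduces that formula in plain Python and checks one reference point: a model that spreads probability evenly over a 10000-word vocabulary would have a perplexity of about 10000, which is close to the first value logged by the untrained model. The helper name `perplexity` is hypothetical.

```python
import math


def perplexity(total_cost, total_steps):
    """exp of the average per-word cross-entropy (natural log)."""
    return math.exp(total_cost / total_steps)


vocab_size = 10000
num_steps = 35
# A uniform guess has a per-word loss of ln(vocab_size).
uniform_cost = num_steps * math.log(vocab_size)
uniform_perplexity = perplexity(uniform_cost, num_steps)
print(uniform_perplexity)
```

Lower perplexity means the model is, on average, less "surprised" by the next word.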


## Visualize using TensorBoard

### Preparation: Setting Up TensorBoard
This setup is needed only on Colab (you don't need to know the details). If you work from the command line instead of Colab, you can simply run **'tensorboard --logdir=<logdir> --port=<port>'**

In [17]:
#download and unzip ngrok
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip

--2019-05-03 07:32:44--  https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
Resolving bin.equinox.io (bin.equinox.io)... 52.73.94.166, 52.2.175.150, 3.209.102.29, ...
Connecting to bin.equinox.io (bin.equinox.io)|52.73.94.166|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14991793 (14M) [application/octet-stream]
Saving to: ‘ngrok-stable-linux-amd64.zip.1’


2019-05-03 07:32:44 (65.6 MB/s) - ‘ngrok-stable-linux-amd64.zip.1’ saved [14991793/14991793]

Archive:  ngrok-stable-linux-amd64.zip
replace ngrok? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: ngrok                   


In [0]:
#run tensorboard
LOG_DIR = './log'
get_ipython().system_raw(
    'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'
    .format(LOG_DIR)
)
#run ngrok
get_ipython().system_raw('./ngrok http 6006 &')

### Visualize the graph

In [0]:
config =  MediumConfig()
with tf.Graph().as_default():
    define_graph(config)
    
    writer = tf.summary.FileWriter('./log', tf.get_default_graph())

In [20]:
#get the URL for tensorboard
! curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

https://9c803642.ngrok.io


## Visualize the perplexity

In [0]:
def run_with_summary(sess, cost, initial_state, final_state, train_op):
  writer = tf.summary.FileWriter('./log', tf.get_default_graph())
  init = tf.global_variables_initializer()
  tf.train.start_queue_runners(sess=sess)
  sess.run(init)
    
  state = sess.run(initial_state)
    
  costs = 0.0
  iters = 0
  start_time = time.time()
  for step in range(1500):
      feed_dict = {}
      for i, (c, h) in enumerate(initial_state):
        feed_dict[c] = state[i].c
        feed_dict[h] = state[i].h
      
      cost_, state, _ = sess.run((cost, final_state, train_op), feed_dict=feed_dict)
      costs += cost_
      iters += config.num_steps
      
      #add summary
      perplexity_summ = tf.Summary()
      perplexity_summ.value.add(
        tag='perplexity', simple_value=np.exp(costs/iters))
      
      writer.add_summary(perplexity_summ, step)
      if step % 150 == 0:
         print("step:%d perplexity: %.3f speed: %.0f wps" %
            (step, np.exp(costs / iters),
             iters * config.batch_size /
             (time.time() - start_time)))
          
        


In [22]:

config =  MediumConfig()
with tf.Graph().as_default():
    cost, initial_state, final_state, train_op = define_graph(config)
   
    sess = tf.Session()
    run_with_summary(sess, cost, initial_state, final_state, train_op)
    
    

step:0 perplexity: 9990.830 speed: 1701 wps
step:150 perplexity: 1203.123 speed: 10548 wps
step:300 perplexity: 835.977 speed: 10756 wps
step:450 perplexity: 665.731 speed: 10817 wps
step:600 perplexity: 568.895 speed: 10850 wps
step:750 perplexity: 500.004 speed: 10856 wps
step:900 perplexity: 448.743 speed: 10861 wps
step:1050 perplexity: 413.395 speed: 10842 wps
step:1200 perplexity: 382.214 speed: 10838 wps
step:1350 perplexity: 359.225 speed: 10845 wps
