# BitTiger DS501 Week 13

# Sequence to Sequence Models in TensorFlow

Feed a sequence of words/characters to an RNN, which predicts the probability of the next word from the words before it and generates one word at a time.


## Preprocessing

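The input is the Complete Tang Poems (全唐詩), which needs preprocessing first: strip the poem titles, whitespace, characters that fail to render and show up as small boxes, punctuation, and special symbols.

Then apply a random permutation to the input.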

In [1]:
import re
import numpy as np

input_file = '/Users/ytshen/Desktop/BitTiger_DS501/Week_13/poetry.txt'
poems = []
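with open(input_file, encoding='utf-8') as f:
    for line in f.readlines(): # read the whole file into one big list, one element per line
        line = line.strip() # strip whitespace from both ends

        # Remove the title: each line has the form "title:content"
        parts = line.split(':')
        if len(parts) != 2:
            continue
        poem = parts[1] # keep only the poem body

        # Skip poems that contain special symbols
        if re.search(r'[(（《_□]', poem):
            continue

        # Skip poems that are too long or too short
        # (drops Song ci, Yuan qu, and other texts with too many or too few characters)
        if len(poem) < 5 or len(poem) > 40:
            continue

        # Remove punctuation
        poem = re.sub('[，。]', '', poem)

        poems.append(poem)


# Shuffle the order of the poems
poems = np.random.permutation(poems)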

Check how many poems actually end up in the input.

In [2]:
print('Number of poems = ', len(poems))

Number of poems =  11103


### Train test split

In [3]:
train = poems[:-10]
test = poems[-10:] # hold out only the last 10 poems as the test set

print(len(train))
print(len(test))

11093
10


### Map words to IDs

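#### The first step in any NLP pipeline is converting words to IDs

Number every word: build one dictionary that maps words to IDs and another that maps IDs back to words.

The word-to-ID mapping is essentially determined by how frequently each word occurs.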
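A minimal sketch of what a frequency-based mapping looks like, using only the standard library. `build_char_index` and the two-line toy corpus are made up for illustration; Keras's `Tokenizer` may order equally frequent characters differently.

```python
from collections import Counter

def build_char_index(texts):
    """Map each character to an ID, most frequent first (IDs start at 1)."""
    counts = Counter(ch for text in texts for ch in text)
    # Sort by descending count; ties are broken by code point so the result is deterministic
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {ch: i + 1 for i, (ch, _) in enumerate(ordered)}

sample = ["春眠不觉晓", "春风不相识"]  # hypothetical toy corpus
char_index = build_char_index(sample)
print(char_index)
```

The two characters that occur twice get the smallest IDs; everything else follows.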

In [4]:
import time
from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(num_words=None, lower=False, char_level=True)
tokenizer.fit_on_texts(poems) # build the words-to-IDs mapping
word_index = tokenizer.word_index # get the words-to-IDs dictionary

print("Number of unique chars: {}".format(len(word_index)))

Number of unique chars: 4761


In [5]:
print(word_index)

{'不': 1, '人': 2, '一': 3, '山': 4, '风': 5, '无': 6, '花': 7, '来': 8, '日': 9, '春': 10, '上': 11, '何': 12, '水': 13, '时': 14, '月': 15, '中': 16, '云': 17, '有': 18, '年': 19, '天': 20, '君': 21, '是': 22, '知': 23, '夜': 24, '秋': 25, '相': 26, '得': 27, '归': 28, '江': 29, '处': 30, '长': 31, '为': 32, '见': 33, '今': 34, '如': 35, '去': 36, '明': 37, '里': 38, '白': 39, '心': 40, '自': 41, '下': 42, '在': 43, '家': 44, '行': 45, '生': 46, '声': 47, '尽': 48, '青': 49, '此': 50, '向': 51, '前': 52, '三': 53, '南': 54, '门': 55, '千': 56, '空': 57, '万': 58, '事': 59, '未': 60, '落': 61, '头': 62, '看': 63, '玉': 64, '西': 65, '清': 66, '多': 67, '雨': 68, '客': 69, '高': 70, '寒': 71, '飞': 72, '满': 73, '到': 74, '欲': 75, '莫': 76, '回': 77, '流': 78, '更': 79, '城': 80, '东': 81, '别': 82, '金': 83, '树': 84, '开': 85, '独': 86, '朝': 87, '入': 88, '愁': 89, '子': 90, '似': 91, '红': 92, '应': 93, '深': 94, '将': 95, '草': 96, '阳': 97, '还': 98, '谁': 99, '酒': 100, '闲': 101, '烟': 102, '路': 103, '道': 104, '新': 105, '马': 106, '出': 107, '与': 108, '身': 109, '望': 110, '闻': 11

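The IDs in `word_index` start at 1, but we want them to start at 0. Since `<PAD>` has to be added as ID 0 anyway, this works out.

Just make sure the number of words equals the number of indices, so that the mapping is one-to-one.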
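A small sketch of that one-to-one check on a toy mapping. `toy_index` and `toy_reverse` are hypothetical stand-ins for `word_index` and its reverse map.

```python
toy_index = {"不": 1, "人": 2, "一": 3}  # toy mapping, IDs start at 1
toy_index["<PAD>"] = 0                   # reserve ID 0 for padding

# One-to-one: no two words may share an ID
assert len(set(toy_index.values())) == len(toy_index)

# With a one-to-one mapping, inverting the dictionary loses nothing
toy_reverse = {i: w for w, i in toy_index.items()}
assert all(toy_reverse[i] == w for w, i in toy_index.items())
```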

In [6]:
word_index['<PAD>'] = 0

#### A map from IDs back to words is also needed

In [7]:
# Map IDs to words
reverse_word_index = {v:k for k, v in word_index.items()}

In [8]:
print(reverse_word_index)

{1: '不', 2: '人', 3: '一', 4: '山', 5: '风', 6: '无', 7: '花', 8: '来', 9: '日', 10: '春', 11: '上', 12: '何', 13: '水', 14: '时', 15: '月', 16: '中', 17: '云', 18: '有', 19: '年', 20: '天', 21: '君', 22: '是', 23: '知', 24: '夜', 25: '秋', 26: '相', 27: '得', 28: '归', 29: '江', 30: '处', 31: '长', 32: '为', 33: '见', 34: '今', 35: '如', 36: '去', 37: '明', 38: '里', 39: '白', 40: '心', 41: '自', 42: '下', 43: '在', 44: '家', 45: '行', 46: '生', 47: '声', 48: '尽', 49: '青', 50: '此', 51: '向', 52: '前', 53: '三', 54: '南', 55: '门', 56: '千', 57: '空', 58: '万', 59: '事', 60: '未', 61: '落', 62: '头', 63: '看', 64: '玉', 65: '西', 66: '清', 67: '多', 68: '雨', 69: '客', 70: '高', 71: '寒', 72: '飞', 73: '满', 74: '到', 75: '欲', 76: '莫', 77: '回', 78: '流', 79: '更', 80: '城', 81: '东', 82: '别', 83: '金', 84: '树', 85: '开', 86: '独', 87: '朝', 88: '入', 89: '愁', 90: '子', 91: '似', 92: '红', 93: '应', 94: '深', 95: '将', 96: '草', 97: '阳', 98: '还', 99: '谁', 100: '酒', 101: '闲', 102: '烟', 103: '路', 104: '道', 105: '新', 106: '马', 107: '出', 108: '与', 109: '身', 110: '望', 111: '闻

Spot-check a random sample.

In [9]:
random_num = np.random.randint(len(word_index), size=20)
print(random_num)

for k, v in word_index.items():
    if v in random_num:
        print('{} {}'.format(k, v))

[3146 3158 3873 2996 1885 1057 3286 3824  941 4721 1746 3407  144 4330
 4080  297 3932 2031 2953 2423]
可 144
杨 297
祥 941
滩 1057
眇 1746
循 1885
予 2031
甫 2423
绩 2953
钻 2996
癖 3146
薤 3158
涴 3286
巘 3407
濠 3824
龌 3873
驳 3932
滚 4080
瀺 4330
毙 4721


### Prepare input data

Once fitted, the tokenizer holds the words-to-IDs mapping. Apply it to train and test to turn each poem into a sequence of IDs.

In [10]:
X_train = tokenizer.texts_to_sequences(train)
X_test = tokenizer.texts_to_sequences(test)

# Check the result
print(X_train[0])
print(''.join([reverse_word_index[i] for i in X_train[0]]))

[133, 61, 73, 188, 220, 392, 55, 753, 743, 94, 395, 19, 909, 349, 141, 34, 9, 708, 52, 40]
叶落满庭阴朱门试院深昔年辛苦地今日负前心


Both `X_train` and `X_test` are lists of lists, so flatten each into a single list:
* Concatenate the data into one long vector
* Then reshape that vector into a matrix

In [11]:
X_train = [i for ids in X_train for i in ids]
X_test = [i for ids in X_test for i in ids]

# Check the result
print(X_train[:20])
print(''.join([reverse_word_index[i] for i in X_train[:20]]))

print(X_test[:20])
print(''.join([reverse_word_index[i] for i in X_test[:20]]))

[133, 61, 73, 188, 220, 392, 55, 753, 743, 94, 395, 19, 909, 349, 141, 34, 9, 708, 52, 40]
叶落满庭阴朱门试院深昔年辛苦地今日负前心
[70, 354, 12, 545, 643, 42, 1331, 1, 1310, 576, 1323, 2087, 441, 638, 420, 1139, 97, 760, 1708, 309]
高才何必贵下位不妨贤孟简虽持节襄阳属浩然


## Define an input object

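Once the data is concatenated into one long vector:
1. Reshape it into a matrix
2. Cut it into batches
   * so the batch size and batch length have to be computed
   * as well as the epoch size

The target is the input shifted by one position.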
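The reshape-and-shift steps above can be sketched in plain NumPy. This is only an illustration of the indexing under the same formulas as the class below; `make_batches` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def make_batches(ids, batch_size, num_steps):
    """Yield (x, y) pairs shaped [batch_size, num_steps]; y is x shifted by one."""
    ids = np.asarray(ids)
    batch_len = len(ids) // batch_size            # tokens per row
    # Drop the tail that does not fill a whole row, then reshape to a matrix
    data = ids[:batch_size * batch_len].reshape(batch_size, batch_len)
    epoch_size = (batch_len - 1) // num_steps     # -1 leaves room for the shifted target
    for i in range(epoch_size):
        x = data[:, i * num_steps:(i + 1) * num_steps]
        y = data[:, i * num_steps + 1:(i + 1) * num_steps + 1]
        yield x, y

batches = list(make_batches(range(20), batch_size=2, num_steps=3))
for x, y in batches:
    print(x.tolist(), y.tolist())
```

Each row of `y` holds the same tokens as the matching row of `x`, moved one position ahead, which is exactly the next-token target.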

In [12]:
import tensorflow as tf

class PoemInput(object):
    def __init__(self, data, config, name=None):
        self.batch_size = batch_size = config.batch_size
        self.num_steps = num_steps = config.num_steps
        self.epoch_size = ((len(data) // batch_size) - 1) // num_steps # note: these are all integer divisions
        self.sources, self.targets = self.input_producer(data, batch_size, num_steps, name=name)
        
    def input_producer(self, raw_data, batch_size, num_steps, name=None):
        '''
        Reshape the poem data to form input and output.
        
        Args:
        raw_data: a list of words
        batch_size: int, the batch size
        num_steps: int, the sequence length
        
        Returns:
        A pair of Tensors, each shaped [batch_size, num_steps].
        The second element of the tuple is the same data time-shifted to the right by one.
        '''
        raw_data = tf.convert_to_tensor(raw_data, name='raw_data', dtype=tf.int32) # op: convert the input to a tensor
        # Get size of the 1-d tensor
        data_len = tf.size(raw_data)
        # Length of each row once the data is split into batch_size rows
        batch_len = data_len // batch_size
        # Crop data that does not fit in batch
        data = tf.reshape(raw_data[0: batch_size * batch_len], [batch_size, batch_len])
        # Calculate how many batches in an epoch
        epoch_size = (batch_len - 1) // num_steps
        
        # Make sure there is at least one batch
        assertion = tf.assert_positive(epoch_size, message='epoch_size=0, decrease batch_size or num_steps') # there must be at least one batch
        with tf.control_dependencies([assertion]):
            epoch_size = tf.cast(tf.identity(epoch_size, name='epoch_size'), tf.int64)
            
        # Start generating slices
        # range_input_producer returns a sequence of IDs
        i = tf.train.range_input_producer(epoch_size, shuffle=False).dequeue() # equivalent to `for i in range(epoch_size)`, but written with TensorFlow ops because this is building the graph
        x = data[:, i * num_steps : (i + 1) * num_steps]
        y = data[:, i * num_steps + 1 : (i + 1) * num_steps + 1] # the target is the input shifted by one position
        
        return x, y

## Define hyperparameters

In [13]:
class Hparam(object):
    learning_rate = 1.0
    max_grad_norm = 5
    num_layers = 1
    num_steps = 35 # characters per training sequence; affects training speed. 35 = 5 * 7 (five- and seven-character quatrains)
    vocab_size = len(word_index)
    embedding_size = 128
    hidden_size = 128 # size of the LSTM memory
    warmup_epochs = 3 # the learning rate stays fixed for the first 3 epochs and only starts changing in the 4th
    num_epochs_to_train = 5
    keep_prob = 0.6 # keep 60% of the activations, drop 40% (Keras dropout takes the fraction to drop, TensorFlow takes the fraction to keep)
    lr_decay = 0.9 # each new learning rate is 0.9 times the previous one
    batch_size = 100 # a larger batch size is usually better, as long as it fits in GPU memory
    
config = Hparam()

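A quick sketch of how `warmup_epochs` and `lr_decay` interact, assuming the decay formula used in the training loop later in this notebook (`lr_decay ** max(i + 1 - warmup_epochs, 0)`). The function name `learning_rate_for_epoch` is hypothetical.

```python
def learning_rate_for_epoch(epoch, base_lr=1.0, lr_decay=0.9, warmup_epochs=3):
    """epoch is 0-based. The rate is held at base_lr during warmup, then decays geometrically."""
    exponent = max(epoch + 1 - warmup_epochs, 0)
    return base_lr * lr_decay ** exponent

schedule = [learning_rate_for_epoch(e) for e in range(5)]
print(["%.3f" % lr for lr in schedule])
```

With the defaults above, the first three epochs keep the base rate and the fourth is the first to be scaled by `lr_decay`.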
## Build model

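The following parts of the model structure need to be defined:
* Input
* Size of layers (already set by the hyperparameters above)
* Connection between layers
* Variables in layers
* Output (must be accessible from outside)
* Loss
* Operations that apply the gradients (optimizer)
* Placeholder for feeding special values (the channel through which the model communicates with other code)
* Properties that can be read from outside

`MultiRNNCell` builds an RNN more than one layer deep, so the layers do not have to be stacked by hand.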

In [14]:
from tensorflow.contrib.cudnn_rnn import CudnnLSTM
from tensorflow.contrib.rnn import BasicLSTMCell, MultiRNNCell
from tensorflow.nn import embedding_lookup, dropout

In [15]:
class MySeq2SeqModel(object):
    def __init__(self, is_training, config, input_):
        self._is_training = is_training
        self._input = input_
        self._cell = None

        # Hyperparameters
        self.batch_size = input_.batch_size
        self.num_steps = input_.num_steps
        rnn_size = config.hidden_size
        vocab_size = config.vocab_size
        embedding_size = config.embedding_size

        # Embeddings can only exist on CPU
        with tf.device('/cpu:0'): # the embedding layer only supports CPU ops, so wrap it in `with`
            embedding_weights = tf.get_variable('embedding', [vocab_size, embedding_size])
            embed_inputs = tf.nn.embedding_lookup(embedding_weights, input_.sources)
            # embed_inputs holds the embeddings looked up for the word IDs

        # Apply dropout when needed
        if is_training and config.keep_prob < 1.:
            embed_inputs = tf.nn.dropout(embed_inputs, config.keep_prob)

        # Build RNN using CudnnLSTM
        # The second return value is the state, which is not needed here, so it is bound to _
        output, _ = self._build_rnn(embed_inputs, config, is_training)
        
        # Build RNN using basic LSTM
#         output, _ = self._build_rnn_old_lstm(embed_inputs, config, is_training)
        
        # Remember RNN output is [batch_size * time, rnn_size]
        # Dense layer for projecting onto vocabulary size
        softmax_w = tf.get_variable('softmax_w', [rnn_size, vocab_size]) # projects rnn_size onto vocab_size to score every word in the vocabulary
        softmax_b = tf.get_variable('softmax_b', [vocab_size])
        logits = tf.nn.xw_plus_b(output, softmax_w, softmax_b) # the arguments are X, W, b; logits are the unnormalized scores for each word
        
        # Reshape logits to be a 3-D tensor for sequence loss
        # The loss expects a 3-D tensor, so reshape logits into the shape it needs
        logits = tf.reshape(logits, [self.batch_size, self.num_steps, vocab_size])
        self._logits = logits # expose logits to outside code
        
        # Use the contrib sequences loss and average over the batches
        loss = tf.contrib.seq2seq.sequence_loss(logits, # model output
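                                                input_.targets, # ground truth
                                                tf.ones([self.batch_size, self.num_steps]), # weights of ones with the shape of a batch, so every position counts equally
                                                average_across_timesteps=False, # the usual settings for a sequence loss:
                                                average_across_batch=True)      # one loss per time step, averaged over the batch
        # Update the cost
        self._cost = tf.reduce_sum(loss) # store the loss so outside code can read it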
        
        if not is_training:
            return
        
        # A variable to store learning rate
        self._lr = tf.Variable(0.0, trainable=False) # trainable=False means no gradient is computed for it
        
        # Calculate gradients
        # Get a list of trainable variables
        tvars = tf.trainable_variables()
        # Get gradient and clip by norm
        grads, _ = tf.clip_by_global_norm(tf.gradients(self._cost, tvars), # compute the gradients
                                          config.max_grad_norm) # rescale them so their global norm is at most max_grad_norm
        
        # Define an optimizer
        # Note that the optimizer reads the value of learning rate from variable
        optimizer = tf.train.GradientDescentOptimizer(self._lr) # applies the gradients to all the variables
        # Define an operation that actually applies the gradients
        self._train_op = optimizer.apply_gradients(zip(grads, tvars),
                                                   global_step=tf.train.get_or_create_global_step())
        # A placeholder for feeding new learning rates
        self._new_lr = tf.placeholder(tf.float32,
                                      shape=[], # a scalar, not a vector
                                      name='new_learning_rate')
        self._lr_update_op = tf.assign(self._lr, self._new_lr) # assign _new_lr to _lr
        
    def _build_rnn(self, inputs, config, is_training):
        # RNN requires time-major
        inputs = tf.transpose(inputs, [1, 0, 2]) # swap the batch and time axes; the numbers refer to the original axis order 0, 1, 2
        self._cell = CudnnLSTM(num_layers=config.num_layers,
                               num_units=config.hidden_size) # width of the LSTM memory
        self._cell.build(inputs.get_shape()) # create the layer's weights for this input shape
        outputs, state = self._cell(inputs) # Feed/teacher forcing
        
        # Transpose from time-major to batch-major
        outputs = tf.transpose(outputs, [1, 0, 2])
        # Reshape from [batch, time, rnn_size] to [batch * time, rnn_size]
        # For computing softmax later
        outputs = tf.reshape(outputs, [-1, config.hidden_size]) # -1 lets TensorFlow infer the first dimension
        return outputs, state # state is the data held in the RNN cell's memory
        
    def _build_rnn_old_lstm(self, inputs, config, is_training):
        def make_cell():
            cell = BasicLSTMCell(config.hidden_size,
                                 forget_bias=0.0,
                                 state_is_tuple=True,
                                 reuse=not is_training)
            if is_training and config.keep_prob < 1:
                cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=config.keep_prob)
            return cell
        
        cell = tf.contrib.rnn.MultiRNNCell([make_cell() for _ in range(config.num_layers)],
                                           state_is_tuple=True)
        self._initial_state = cell.zero_state(config.batch_size, tf.float32)
        state = self._initial_state
        outputs = []
        inputs = tf.unstack(inputs, num=self.num_steps, axis=1)
        outputs, state = tf.nn.static_rnn(cell, inputs, initial_state=self._initial_state)
        output = tf.reshape(tf.concat(outputs, 1), [-1, config.hidden_size])
        return output, state
    
    def assign_lr(self, session, lr_value):
        session.run(self._lr_update_op, feed_dict={self._new_lr: lr_value})
        
    @property
    def input(self):
        return self._input
    
    @property
    def cost(self):
        return self._cost
    
    @property
    def lr(self):
        return self._lr
    
    @property
    def train_op(self):
        return self._train_op
    
    @property
    def logits(self):
        return self._logits

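For intuition about the gradient clipping step in the model above, here is a NumPy sketch of rescaling by global norm. It follows the documented behaviour of `tf.clip_by_global_norm` (scale every gradient by `clip_norm / max(global_norm, clip_norm)`), but it is an illustrative re-implementation, not the TensorFlow code itself.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together so their joint L2 norm is at most max_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # If the norm is already small enough the scale is 1 and nothing changes
    scale = max_norm / max(global_norm, max_norm)
    return [g * scale for g in grads], global_norm

grads = [np.array([3.0, 4.0]), np.array([12.0])]   # toy gradients
clipped, norm = clip_by_global_norm(grads, max_norm=5.0)
clipped_norm = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(norm, clipped_norm)
```

Because every gradient is multiplied by the same factor, the update direction is preserved and only its length shrinks.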
## Define training operation for an epoch

In [16]:
def run_epoch(session, model, do_op=None, verbose=False):
    start_time = time.time()
    costs = 0.0
    iters = 0
    feed_to_model_dict = {'cost' : model.cost} # fetches: running this returns the model's current loss

    # If an operation is provided, add it to the fetches as well
    if do_op is not None:
        feed_to_model_dict['do_op'] = do_op
        
    for step in range(model.input.epoch_size):
        # Use the session to run, feed the dictionary
        s_out = session.run(feed_to_model_dict)
        # The returned dictionary will contain the information we need
        cost = s_out['cost']
        # Accumulate cost
        costs += cost
        # Accumulate number of training steps
        iters += model.input.num_steps
        # Print loss periodically
        if verbose and (step + 1) % max(model.input.epoch_size // 5, 1) == 0: # max(..., 1) avoids a modulo by zero when epoch_size < 5
            print('%.0f%% ppl: %.3f, speed: %.0f char/sec' %
                  ((step + 1) * 100.0 / model.input.epoch_size,
                   np.exp(costs / iters),
                   iters * model.input.batch_size / (time.time() - start_time)))
            
    return np.exp(costs / iters)

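`run_epoch` returns `np.exp(costs / iters)`, that is, the exponential of the average per-step loss, which is the perplexity. A small NumPy sketch of the same quantity computed directly from token probabilities; the `perplexity` helper and its inputs are made up for illustration.

```python
import numpy as np

def perplexity(token_probs):
    """token_probs: probability the model assigned to each correct token."""
    nll = -np.log(np.asarray(token_probs, dtype=float))  # per-token negative log-likelihood
    return float(np.exp(nll.mean()))                     # exp of the average loss

# A model that spreads its probability evenly over 4 choices at every step
uniform = perplexity([0.25, 0.25, 0.25, 0.25])
print(uniform)
```

A perplexity of k can be read as the model being as uncertain as a uniform choice among k words, so lower is better.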
## Define a generator model

The generator model produces new Tang poems. By the time it is used, the training model has already been trained.

In [17]:
class MyGeneratorModel(object):
    def __init__(self, config):
        self._input = tf.placeholder(tf.int32, shape=[1], name='_input')
        self.batch_size = 1
        self.num_steps = config.num_steps
        rnn_size = config.hidden_size
        vocab_size = config.vocab_size
        embedding_size = config.embedding_size
        
        # Embeddings can only exist on CPU
        with tf.device('/cpu:0'):
            embedding_weights = tf.get_variable('embedding', [vocab_size, embedding_size])
            embed_inputs = tf.nn.embedding_lookup(embedding_weights, self._input)
            embed_inputs = tf.expand_dims(embed_inputs, 0)
            
        # Build RNN using CudnnLSTM
        self._cell = CudnnLSTM(num_layers=config.num_layers,
                               num_units=config.hidden_size)
        
        # Build final projection layer
        softmax_w = tf.get_variable('softmax_w', [rnn_size, vocab_size])
        softmax_b = tf.get_variable('softmax_b', [vocab_size])
        
        # Collect a sequence of output word IDs
        self._output_word_ids = []
        
        # Decode first word
        outputs, state = self._cell(embed_inputs)
        outputs = tf.reshape(outputs, [-1, config.hidden_size])
        logits = tf.nn.xw_plus_b(outputs, softmax_w, softmax_b) # scores over the vocabulary; the argmax below gives the word ID
        
        # Get input for next step
        next_input = tf.argmax(logits, axis=-1) # ID of the next word, predicted from the previous word
        next_input = tf.squeeze(next_input)
        self._output_word_ids.append(next_input)
        
        # Convert next input to word embeddings
        next_input = tf.nn.embedding_lookup(embedding_weights, next_input)
        next_input = tf.reshape(next_input, [1, 1, embedding_size])
        
        # Feed back to LSTM
        # Essentially the same as the steps above
        for _ in range(self.num_steps - 1): # one prediction was already made above, so subtract 1
            outputs, state = self._cell(next_input, state)
            outputs = tf.reshape(outputs, [-1, config.hidden_size])
            logits = tf.nn.xw_plus_b(outputs, softmax_w, softmax_b)
            next_input = tf.argmax(logits, axis=-1)
            next_input = tf.squeeze(next_input)
            self._output_word_ids.append(next_input)
            
            next_input = tf.nn.embedding_lookup(embedding_weights, next_input)
            next_input = tf.reshape(next_input, [1, 1, embedding_size])
            
    @property
    def output_word_ids(self):
        return self._output_word_ids

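The generator above is a greedy decoder: take the argmax of the logits, feed that word back in, and repeat. A framework-free sketch of the same loop follows; `toy_logits` and `toy_step` are hypothetical stand-ins for the LSTM plus projection layer.

```python
import numpy as np

def greedy_decode(step_fn, seed_id, num_steps):
    """Repeatedly pick the highest-scoring next ID and feed it back in."""
    ids, state, current = [], None, seed_id
    for _ in range(num_steps):
        logits, state = step_fn(current, state)
        current = int(np.argmax(logits))
        ids.append(current)
    return ids

# Hypothetical model: a fixed table of next-token scores for a 3-word vocabulary
toy_logits = np.array([
    [0.1, 2.0, 0.3],   # after ID 0, ID 1 scores highest
    [0.2, 0.1, 1.5],   # after ID 1, ID 2 scores highest
    [3.0, 0.4, 0.2],   # after ID 2, ID 0 scores highest
])

def toy_step(token_id, state):
    return toy_logits[token_id], state  # stateless: the state is passed through unchanged

generated = greedy_decode(toy_step, seed_id=0, num_steps=4)
print(generated)
```

Greedy decoding is deterministic, so the same seed always yields the same poem; sampling from the softmax instead of taking the argmax would be one way to get variety.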
## Define a call to generator

A decoder is needed to convert IDs back to words.

In [18]:
def decode_text(text, max_len_newline=5):
    words = [reverse_word_index.get(i, '<UNK>') for i in text]
    fixed_width_string = []
    
    for w_pos in range(len(words)):
        fixed_width_string.append(words[w_pos])
        if (w_pos + 1) % max_len_newline == 0:
            fixed_width_string.append('\n')
    
    return ''.join(fixed_width_string)

In [19]:
def run_generator(session, model, seed_word, config):
    feed_to_model_dict = {model._input : [seed_word]} # input fed to the model
    fetch_model_dict = {'output_word_ids' : model.output_word_ids} # output fetched from the model

    # An example of sending and receiving data from the model
    vals = session.run(fetches=fetch_model_dict, feed_dict=feed_to_model_dict)
    output_word_ids = vals['output_word_ids']
    
    # Decode to readable words
    print(decode_text([seed_word] + output_word_ids, (config.num_steps + 1) // 4))
    
    return

## Main training controller

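The controller:
* builds the model used for training
* builds the model used for testing
* prepares the input data
* defines what gets written to the logs during training (viewable in TensorBoard)
* creates the session
* changes the learning rate
* collects the results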

In [20]:
def main(_):
    with tf.Graph().as_default():
        initializer = tf.random_uniform_initializer(-0.1, 0.1)
        
        with tf.name_scope('Train'):
            # Create input producer
            train_input = PoemInput(X_train, config, name='TrainInput')
            
            # Create the model instance
            with tf.variable_scope('Model', reuse=None, initializer=initializer):
                m = MySeq2SeqModel(is_training=True, config=config, input_=train_input)
            
            # Add information to logs
            # TensorBoard can read these logs
            tf.summary.scalar('Training_Loss', m.cost)
            tf.summary.scalar('Learning_Rate', m.lr)
            
        with tf.name_scope('Test'):
            eval_config = Hparam()
            eval_config.batch_size = 1 # the test set only needs a batch size of 1
            eval_config.num_steps = 20
            
            # Create another input for test data
            # Note that eval config was set locally
            test_input = PoemInput(X_test, eval_config, name='TestInput')
            
            #Create another model but reuse the variables in the training model
            with tf.variable_scope('Model', reuse=True): # reuse=True shares the training model's variables instead of creating new ones
                mtest = MySeq2SeqModel(is_training=False, config=eval_config, input_=test_input)

        with tf.name_scope('Gen'):
            generator_config = Hparam()
            generator_config.batch_size = 1
            generator_config.num_steps = 19 # a five-character quatrain has only 20 characters (the seed plus 19 generated)
            
            # Create generator model
            with tf.variable_scope('Model', reuse=True):
                mgenerate = MyGeneratorModel(config=generator_config)
                
        # Hardware settings
        config_proto = tf.ConfigProto(allow_soft_placement=True)
        
        # Create a MonitoredTrainingSession that controls the training process
        # Also automatically logs and reports
        # Note the `checkpoint_dir` setting
        with tf.train.MonitoredTrainingSession(checkpoint_dir='logs',
                                               config=config_proto,
                                               log_step_count_steps=-1) as session:
            for i in range(config.num_epochs_to_train):
                # Calculate learning rate decay
                lr_decay = config.lr_decay ** max(i + 1 - config.warmup_epochs, 0.0)
                
                # Set learning rate
                m.assign_lr(session, config.learning_rate * lr_decay)
                
                # Print new learning rate
                print('Epoch: %d, Learning rate: %.3f' % (i + 1, session.run(m.lr)))
                
                # Train one epoch and report loss
                train_perplexity = run_epoch(session, m, do_op=m.train_op, verbose=True)
                print('Epoch: %d Train Perplexity: %.3f' % (i + 1, train_perplexity))
                
            # End of training
            # Evaluate test set performance
            test_perplexity = run_epoch(session, mtest)
            print('Test perplexity: %.3f' % test_perplexity)
            
            # Set a seed word and generate new poem
            seed_word = '天'
            run_generator(session, mgenerate, seed_word=word_index[seed_word], config=generator_config)

## Start training

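Calling `main()` a few more times gives better results, presumably because `MonitoredTrainingSession` restores the latest checkpoint from `logs`, so each call continues training where the last one stopped.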

In [None]:
main(1)

## Clear previous output

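If the model has been modified, re-running the session directly will fail, because the checkpoint saved in `logs` no longer matches the new graph.

Clear the logs before re-running.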

In [None]:
!rm -R logs

## Summary

1. Preprocessing for language modeling data
    * Create a dictionary that maps words to unique IDs
    * Convert words to ID
    * Reshape sequences to unified lengths
    * Create a helper to produce data
2. Building a model using tensorflow
    * Hyperparameters
    * Training operation
    * Testing operation
    * Control function
3. Training and evaluation
    * Observe loss
    * Evaluate on test set

## Appendix

Upload the data to Google Drive

In [None]:
from google.colab import drive
drive.mount('/gdrive')

In [None]:
!cp /gdrive/My\ Drive/Colab\ Notebooks/poetry.txt ./

***

# Attention

Add attention to the seq2seq model to translate between languages.


## CWMT corpus

Download the Chinese-English translation dataset: [CWMT](http://nlp.nju.edu.cn/cwmt-wmt/)

In [None]:
import time
import numpy as np

import tensorflow as tf
from tensorflow.contrib.cudnn_rnn import CudnnLSTM
from tensorflow.contrib.rnn import BasicLSTMCell, MultiRNNCell
from tensorflow.nn import embedding_lookup, dropout
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

In [None]:
c_sents = [ss.strip() for ss in open('Book14_cn.txt').readlines()]

In [None]:
c_sents[0]

In [None]:
e_sents = [ss.strip() for ss in open('Book14_en.txt').readlines()]

In [None]:
e_sents[0]

In [None]:
c_tokenizer = Tokenizer(num_words=None, lower=False, char_level=True)

# Create word to ID dictionary
c_tokenizer.fit_on_texts(c_sents)

# Get dictionary
c_word_index = c_tokenizer.word_index

# Fix word to ID
c_word_index = {c: i + 1 for c, i in c_word_index.items()}
c_word_index['<PAD>'] = 0
c_word_index['<UNK>'] = 1
c_tokenizer.word_index = c_word_index
c_reverse_word_index = dict([(v, k) for (k, v) in c_word_index.items()])

In [None]:
# Sort word index by ID
for (w, i) in sorted(c_word_index.items(), key=lambda w: w[1]):
    # print some words to check if there are errors!
    if i > 10 and i < len(c_word_index) - 5:
        continue
    print('{} {}'.format(w, i))

In [None]:
e_vocab_size = 20000
e_tokenizer = Tokenizer(num_words=e_vocab_size, lower=True, oov_token='<UNK>')

# Create word to ID dictionary
e_tokenizer.fit_on_texts(e_sents)

# Get dictionary
e_word_index = e_tokenizer.word_index

# Fix word to ID
e_word_index = {e: i + 1 for e, i in e_word_index.items() if i < e_vocab_size - 1}
e_word_index['<PAD>'] = 0
e_word_index['<UNK>'] = 1

e_tokenizer.word_index = e_word_index

e_reverse_word_index = dict([(v, k) for (k, v) in e_word_index.items()])

# sort word index by ID
for (w, i) in sorted(e_word_index.items(), key=lambda w: w[1]):
    # print some words to check if there are errors!
    if i > 10 and i < len(e_word_index) - 5:
        continue
    print('{} {}'.format(w, i))

In [None]:
c_sents = c_tokenizer.texts_to_sequences(c_sents)
e_sents = e_tokenizer.texts_to_sequences(e_sents)

c_sents = pad_sequences(c_sents, value=c_word_index['<PAD>'], padding='post', truncating='post', maxlen=10)
e_sents = pad_sequences(e_sents, value=e_word_index['<PAD>'], padding='post', truncating='post', maxlen=10)

train_data = (c_sents[:-5], e_sents[:-5])
test_data = (c_sents[-5:], e_sents[-5:])

In [None]:
train_data[0][0], train_data[0][1]

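To make the `padding='post', truncating='post', maxlen=10` settings above concrete, here is a NumPy sketch of post-padding and post-truncation. `pad_post` is a hypothetical helper that mimics those two options only; it is not the Keras implementation.

```python
import numpy as np

def pad_post(sequences, maxlen, value=0):
    """Cut each sequence to maxlen from the end, then fill the tail with `value`."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        trimmed = seq[:maxlen]              # truncating='post' drops the tail
        out[row, :len(trimmed)] = trimmed   # padding='post' leaves the fill value at the end
    return out

padded = pad_post([[5, 6, 7], [1, 2, 3, 4, 5, 6]], maxlen=4)
print(padded.tolist())
```

Every row ends up the same length, which is what lets the sentences be stacked into a single matrix and sliced into batches.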
## Change hyperparameters

In [None]:
class Hparam(object):
    source_vocab_size = len(c_word_index)
    target_vocab_size = len(e_word_index)

## Prepare input for translation

In [None]:
class TranslationInput(object):
    def __init__(self, data, config, name=None):
        self.batch_size = batch_size = config.batch_size
        self.num_steps = config.num_steps
        self.sources, self.targets = self.input_producer(data, batch_size, name=name)
        
    def input_producer(self, raw_data, batch_size, name=None):
        source_data = tf.convert_to_tensor(raw_data[0], name='source_data', dtype=tf.int32)
        target_data = tf.convert_to_tensor(raw_data[1], name='target_data', dtype=tf.int32)
        
        num_batches = len(raw_data[0]) // self.batch_size
        i = tf.train.range_input_producer(num_batches, shuffle=False).dequeue()
        x = source_data[i * self.batch_size : (i + 1) * self.batch_size, :]
        y = target_data[i * self.batch_size : (i + 1) * self.batch_size, :]
        
        return x, y