# 聊天机器人教程

**作者** [Matthew Inkawhich](https://github.com/MatthewInkawhich)



在本教程中，我们将探索一个有趣的序列到序列(Seq2Seq)模型的用例。我们将使用[Cornell Movie-Dialogs Corpus](https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html)的电影脚本训练一个简单的聊天机器人。

对话模型是人工智能研究的热门话题。
聊天机器人在包括客户服务应用和在线帮助台在内的各种场景都用应用。
这些机器人通常使由基于索引的模型驱动的，它们响应特定的问题，输出预定义的响应。在像公司IT帮助台这样高度受限的领域中，这些模型可能是住够了，不过对于一半用例来说，他们是不够健壮的。
教一台机器与人类在多个领域内进行有意义的对话是一个远未解决的研究性问题。
在最近的深度学习模型热潮中，像Google[Neural Conversational Model](https://arxiv.org/abs/1506.05869)这样的强大的生成模型的出现，标志着多领域生成对话模型的一大进步。
this tutorial, we will implement this kind of model in PyTorch.
在这个教程中，我们将用`PyTorch`实现这种模型。

![bot](https://pytorch.org/tutorials/_images/bot.png)

```console
> hello?
Bot: hello .
> where am I?
Bot: you re in a hospital .
> who are you?
Bot: i m a lawyer .
> how are you doing?
Bot: i m fine .
> are you my friend?
Bot: no .
> you're under arrest
Bot: i m trying to help you !
> i'm just kidding
Bot: i m sorry .
> where are you from?
Bot: san francisco .
> it's time for me to leave
Bot: i know .
> goodbye
Bot: goodbye .
```

**教程要点：**

- 加载和预处理[Cornell Movie-Dialogs Corpus](https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html)数据集
- 使用[Luong attention mechanism(s)](https://arxiv.org/abs/1508.04025)实现一个序列到序列模型
- 使用小批次联合训练编码器和解码器模型
- 实现贪婪搜索解码模块
- 与训练后的聊天机器人互动

**致谢**

本教程借用以下来源的代码：

1. Yuan-Kuei Wu 的 pytorch-chatbot 实现:
   https://github.com/ywk991112/pytorch-chatbot

1. Sean Robertson 的 practical-pytorch seq2seq-translation 例子:
   https://github.com/spro/practical-pytorch/tree/master/seq2seq-translation

1. FloydHub 的 Cornell Movie Corpus 预处理代码:
   https://github.com/floydhub/textutil-preprocess-cornell-movie-corpus




## 准备工作

首先，从<https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html>下载ZIP文件解压缩到当前目录的`data`子目录中。

此后，我们导入一些必要的包。




In [1]:
from __future__ import absolute_import, division, print_function, unicode_literals

import torch
from torch.jit import script, trace
import torch.nn as nn
from torch import optim
import torch.nn.functional as F
import csv
import random
import re
import os
import unicodedata
import codecs
from io import open
import itertools
import math


USE_CUDA = torch.cuda.is_available()
device = torch.device("cuda" if USE_CUDA else "cpu")
print(device)

cuda


## 数据加载与预处理

下一步，将我们之前下载的数据重新格式化，并把数据加载到我们所要使用的结构中。

[Cornell Movie-Dialogs Corpus](https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html)
是一个丰富的电影角色对话数据集：

- 来自10,292对电影角色的220,579个对话
- 来自617部电影的9,035个角色
- 总共304,713句话

这个数据集庞大且多样，其语言形式、时间段、情感都有很大变化。我们希望这种多样性使我们的模型能够适应多种形式的输入和询问。

首先，我们看几行数据文件，了解其原始格式。




In [2]:
corpus_name = "cornell movie-dialogs corpus"
corpus = os.path.join("data", corpus_name)

def printLines(file, n=10):
    with open(file, encoding='iso-8859-1') as datafile:
        for i, line in enumerate(datafile):
            if i < n:
                print(line)

printLines(os.path.join(corpus, "movie_lines.txt"))

L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!

L1044 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ They do to!

L985 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ I hope so.

L984 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ She okay?

L925 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Let's go.

L924 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ Wow

L872 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Okay -- you're gonna need to learn how to lie.

L871 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ No

L870 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ I'm kidding.  You know how sometimes you just become this "persona"?  And you don't know how to quit?

L869 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Like my fear of wearing pastels?



### 创建格式化数据文件

为方便起见，我们将创建一个格式良好的数据文件，其中每一行包含一个以制表符分隔的*询问语句*和一个*响应语句*对。

以下函数用于分析 *movie_lines.txt* 数据文件的原始行数据

- `loadLines` 将每一行分割成具有多个字段(lineID, characterID, movieID, character, text)的字典
- `loadConversations` 将来自 `loadLines` 的，包含多个字段的行，根据 *movie_conversations.txt* 组成对话。
- `extractSentencePairs` 从对话中提取句子对




In [3]:
# Splits each line of the file into a dictionary of fields
def loadLines(fileName, fields):
    lines = {}
    with open(fileName, encoding='iso-8859-1') as f:
        for line in f:
            values = line.split(" +++$+++ ")
            # Extract fields
            lineObj = {}
            for i, field in enumerate(fields):
                lineObj[field] = values[i]
            lines[lineObj['lineID']] = lineObj
    return lines


# Groups fields of lines from `loadLines` into conversations based on *movie_conversations.txt*
def loadConversations(fileName, lines, fields):
    conversations = []
    with open(fileName, encoding='iso-8859-1') as f:
        for line in f:
            values = line.split(" +++$+++ ")
            # Extract fields
            convObj = {}
            for i, field in enumerate(fields):
                convObj[field] = values[i]
            # Convert string to list (convObj["utteranceIDs"] == "['L598485', 'L598486', ...]")
            lineIds = eval(convObj["utteranceIDs"])
            # Reassemble lines
            convObj["lines"] = []
            for lineId in lineIds:
                convObj["lines"].append(lines[lineId])
            conversations.append(convObj)
    return conversations


# Extracts pairs of sentences from conversations
def extractSentencePairs(conversations):
    qa_pairs = []
    for conversation in conversations:
        # Iterate over all the lines of the conversation
        for i in range(len(conversation["lines"]) - 1):  # We ignore the last line (no answer for it)
            inputLine = conversation["lines"][i]["text"].strip()
            targetLine = conversation["lines"][i+1]["text"].strip()
            # Filter wrong samples (if one of the lists is empty)
            if inputLine and targetLine:
                qa_pairs.append([inputLine, targetLine])
    return qa_pairs

现在，调用这些函数，并创建文件，文件名是 *formatted_movie_lines.txt* 。




In [4]:
# Define path to new file
datafile = os.path.join(corpus, "formatted_movie_lines.txt")

delimiter = '\t'
# Unescape the delimiter
delimiter = str(codecs.decode(delimiter, "unicode_escape"))

# Initialize lines dict, conversations list, and field ids
lines = {}
conversations = []
MOVIE_LINES_FIELDS = ["lineID", "characterID", "movieID", "character", "text"]
MOVIE_CONVERSATIONS_FIELDS = ["character1ID", "character2ID", "movieID", "utteranceIDs"]

# Load lines and process conversations
print("\nProcessing corpus...")
lines = loadLines(os.path.join(corpus, "movie_lines.txt"), MOVIE_LINES_FIELDS)
print("\nLoading conversations...")
conversations = loadConversations(os.path.join(corpus, "movie_conversations.txt"),
                                  lines, MOVIE_CONVERSATIONS_FIELDS)

# Write new csv file
print("\nWriting newly formatted file...")
with open(datafile, 'w', encoding='utf-8') as outputfile:
    writer = csv.writer(outputfile, delimiter=delimiter, lineterminator='\n')
    for pair in extractSentencePairs(conversations):
        writer.writerow(pair)

# Print a sample of lines
print("\nSample lines from file:")
printLines(datafile)


Processing corpus...

Loading conversations...

Writing newly formatted file...

Sample lines from file:
Can we make this quick?  Roxanne Korrine and Andrew Barrett are having an incredibly horrendous public break- up on the quad.  Again.	Well, I thought we'd start with pronunciation, if that's okay with you.

Well, I thought we'd start with pronunciation, if that's okay with you.	Not the hacking and gagging and spitting part.  Please.

Not the hacking and gagging and spitting part.  Please.	Okay... then how 'bout we try out some French cuisine.  Saturday?  Night?

You're asking me out.  That's so cute. What's your name again?	Forget it.

No, no, it's my fault -- we didn't have a proper introduction ---	Cameron.

Cameron.	The thing is, Cameron -- I'm at the mercy of a particularly hideous breed of loser.  My sister.  I can't date until she does.

The thing is, Cameron -- I'm at the mercy of a particularly hideous breed of loser.  My sister.  I can't date until she does.	Seems like she

### 数据加载和修建减

我们的下一个任务是创建词汇表并将询问/响应句子对加载到内存中。

注意我们处理的是**词**序列，它们没有隐式映射到离散数值空间。因此，我们必须创建一个将我们在数据集中遇到的每个唯一词映射到索引值的映射(Mapping)。

为此，我们定义一个`Voc`类，它保存从词到索引值的映射，以及从索引值到词的反向映射，唯一词的数量，总词数。
这个类提供了像词表添加词的方法(`addWord`)，添加句子中所有词的方法(`addSentence`)，以及削减不常见词的方法(`trim`)。
More on trimming later.




In [5]:
# Default word tokens
PAD_token = 0  # Used for padding short sentences
SOS_token = 1  # Start-of-sentence token
EOS_token = 2  # End-of-sentence token

class Voc:
    def __init__(self, name):
        self.name = name
        self.trimmed = False
        self.word2index = {}
        self.word2count = {}
        self.index2word = {PAD_token: "PAD", SOS_token: "SOS", EOS_token: "EOS"}
        self.num_words = 3  # Count SOS, EOS, PAD

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.num_words
            self.word2count[word] = 1
            self.index2word[self.num_words] = word
            self.num_words += 1
        else:
            self.word2count[word] += 1

    # Remove words below a certain count threshold
    def trim(self, min_count):
        if self.trimmed:
            return
        self.trimmed = True

        keep_words = []

        for k, v in self.word2count.items():
            if v >= min_count:
                keep_words.append(k)

        print('keep_words {} / {} = {:.4f}'.format(
            len(keep_words), len(self.word2index), len(keep_words) / len(self.word2index)
        ))

        # Reinitialize dictionaries
        self.word2index = {}
        self.word2count = {}
        self.index2word = {PAD_token: "PAD", SOS_token: "SOS", EOS_token: "EOS"}
        self.num_words = 3 # Count default tokens

        for word in keep_words:
            self.addWord(word)

现在，可以开始装配词表和询问/回复句子对。在准备好使用这些数据之前，我们还得执行一些预处理。

- 首先，我们必须使用`unicodeToAscii`将字符串从Unicode转为ASCII。
- 其次，我们应该把所有字母转为小写，并且裁剪掉所有的除基本标点符号之外的所有非字母字符(`normalizeString`)。
- 最后，为了训练更快的收敛，我们将过滤掉(`filterPairs`)长度大于`MAX_LENGTH`阈值的句子。




In [6]:
MAX_LENGTH = 10  # Maximum sentence length to consider

# Turn a Unicode string to plain ASCII, thanks to
# https://stackoverflow.com/a/518232/2809427
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )

# Lowercase, trim, and remove non-letter characters
def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
    s = re.sub(r"\s+", r" ", s).strip()
    return s

# Read query/response pairs and return a voc object
def readVocs(datafile, corpus_name):
    print("Reading lines...")
    # Read the file and split into lines
    lines = open(datafile, encoding='utf-8').\
        read().strip().split('\n')
    # Split every line into pairs and normalize
    pairs = [[normalizeString(s) for s in l.split('\t')] for l in lines]
    voc = Voc(corpus_name)
    return voc, pairs

# Returns True if both sentences in a pair 'p' are under the MAX_LENGTH threshold
def filterPair(p):
    # Input sequences need to preserve the last word for EOS token
    return all(len(s.split(' ')) < MAX_LENGTH for s in p)

# Filter pairs using filterPair condition
def filterPairs(pairs):
    return [pair for pair in pairs if filterPair(pair)]

# Using the functions defined above, return a populated voc object and pairs list
def loadPrepareData(corpus, corpus_name, datafile, save_dir):
    print("Start preparing training data ...")
    voc, pairs = readVocs(datafile, corpus_name)
    print("Read {!s} sentence pairs".format(len(pairs)))
    pairs = filterPairs(pairs)
    print("Trimmed to {!s} sentence pairs".format(len(pairs)))
    print("Counting words...")
    for pair in pairs:
        voc.addSentence(pair[0])
        voc.addSentence(pair[1])
    print("Counted words:", voc.num_words)
    return voc, pairs


# Load/Assemble voc and pairs
save_dir = os.path.join("data", "save")
voc, pairs = loadPrepareData(corpus, corpus_name, datafile, save_dir)
# Print some pairs to validate
print("\npairs:")
for pair in pairs[:10]:
    print(pair)

Start preparing training data ...
Reading lines...
Read 221282 sentence pairs
Trimmed to 64217 sentence pairs
Counting words...
Counted words: 17996

pairs:
['there .', 'where ?']
['you have my word . as a gentleman', 'you re sweet .']
['hi .', 'looks like things worked out tonight huh ?']
['you know chastity ?', 'i believe we share an art instructor']
['have fun tonight ?', 'tons']
['well no . . .', 'then that s all you had to say .']
['then that s all you had to say .', 'but']
['but', 'you always been this selfish ?']
['do you listen to this crap ?', 'what crap ?']
['what good stuff ?', 'the real you .']


另一种有利于在训练期间实现更快收敛的策略是修剪掉我们词汇表中很少使用的单词。减小特征空间也会缓和模型逼近的难度。我们将通过两个步骤完成此操作：

1. 用`voc.trim`修剪掉数量小于`MIN_COUNT`阈值的词。

1. 过滤掉含有被修剪词的句子对。




In [7]:
MIN_COUNT = 3    # Minimum word count threshold for trimming

def trimRareWords(voc, pairs, MIN_COUNT):
    # Trim words used under the MIN_COUNT from the voc
    voc.trim(MIN_COUNT)
    # Filter out pairs with trimmed words
    keep_pairs = []
    for pair in pairs:
        input_sentence = pair[0]
        output_sentence = pair[1]
        keep_input = True
        keep_output = True
        # Check input sentence
        for word in input_sentence.split(' '):
            if word not in voc.word2index:
                keep_input = False
                break
        # Check output sentence
        for word in output_sentence.split(' '):
            if word not in voc.word2index:
                keep_output = False
                break

        # Only keep pairs that do not contain trimmed word(s) in their input or output sentence
        if keep_input and keep_output:
            keep_pairs.append(pair)

    print("Trimmed from {} pairs to {}, {:.4f} of total".format(len(pairs), len(keep_pairs), len(keep_pairs) / len(pairs)))
    return keep_pairs


# Trim voc and pairs
pairs = trimRareWords(voc, pairs, MIN_COUNT)

keep_words 7820 / 17993 = 0.4346
Trimmed from 64217 pairs to 53120, 0.8272 of total


## 为模型准备数据

尽管我们已经投入很大精力来准备和整理数据，把它们放到了漂亮的词表对象和句子对列表，但我们的模型最终需要的的输入却是Tensor的张量数字。
可以在[Seq2Seq翻译教程](../intermediate/seq2seq_translation_tutorial)中找到为模型准备预处理数据的方法。
在那个教程中，我们使用的批次值为1，这意味着我们所要做的就是将句子对中的单词转换为词汇表中对应的索引值，并将其提供给模型。

另外，如果你对加速培训和/或想要利用GPU并行化功能感兴趣，则需要使用小批量培训。

使用小批量也意味着我们必须注意批量中句子长度的变化。
为了容纳同一批次中不同大小的句子，我们要让批量输入张量的形状 *(max_length，batch_size)* 中的短于 *max_length* 的句子在 *EOS_token* 之后用零填充。

如果我们只是简单地通过将单词转换为其索引值(`indexesFromSentence`)和零填充的方法将英语句子转换为张量，张量的形状将是 *(batch_size，max_length)* ，并且在第一维上的索引将在所有时间步骤中返回完整序列。
但是，我们需要能够沿着时间、跨批次、在所有序列上进行索引。
因此，我们将输入批处理的形状转置为 *(max_length，batch_size)* ，以便跨第一维的索引返回批中所有句子的时间步长。
我们在`zeroPadding`函数中隐式处理这个转置。

![batches](https://pytorch.org/tutorials/_images/seq2seq_batches.png)

函数`inputVar`处理句子到张量的转换过程，最终创建一个形状正确的零填充张量。
它还返回批次中每个序列的长度(`lengths`)的张量，它稍后会被传给编码器。

函数`outputVar`的执行过程与`inputVar`类似，但是不返回长度（`lenghts`）张量，而是返回二进制的掩码张量和目标句子的最大长度。
二进制掩码与输出目标张量具有同样的形状，但是其中的每个 *PAD_token* 元素都为0，其它所有元素都为1。

`batch2TrainData`简单的使用一堆句子对并使用上述函数返回输入和目标张量。




In [8]:
def indexesFromSentence(voc, sentence):
    return [voc.word2index[word] for word in sentence.split(' ')] + [EOS_token]


def zeroPadding(l, fillvalue=PAD_token):
    return list(itertools.zip_longest(*l, fillvalue=fillvalue))

def binaryMatrix(l, value=PAD_token):
    m = []
    for i, seq in enumerate(l):
        m.append([])
        for token in seq:
            if token == PAD_token:
                m[i].append(0)
            else:
                m[i].append(1)
    return m

# Returns padded input sequence tensor and lengths
def inputVar(l, voc):
    indexes_batch = [indexesFromSentence(voc, sentence) for sentence in l]
    lengths = torch.tensor([len(indexes) for indexes in indexes_batch])
    padList = zeroPadding(indexes_batch)
    padVar = torch.LongTensor(padList)
    return padVar, lengths

# Returns padded target sequence tensor, padding mask, and max target length
def outputVar(l, voc):
    indexes_batch = [indexesFromSentence(voc, sentence) for sentence in l]
    max_target_len = max([len(indexes) for indexes in indexes_batch])
    padList = zeroPadding(indexes_batch)
    mask = binaryMatrix(padList)
    mask = torch.ByteTensor(mask)
    padVar = torch.LongTensor(padList)
    return padVar, mask, max_target_len

# Returns all items for a given batch of pairs
def batch2TrainData(voc, pair_batch):
    pair_batch.sort(key=lambda x: len(x[0].split(" ")), reverse=True)
    input_batch, output_batch = [], []
    for pair in pair_batch:
        input_batch.append(pair[0])
        output_batch.append(pair[1])
    inp, lengths = inputVar(input_batch, voc)
    output, mask, max_target_len = outputVar(output_batch, voc)
    return inp, lengths, output, mask, max_target_len


# Example for validation
small_batch_size = 5
batches = batch2TrainData(voc, [random.choice(pairs) for _ in range(small_batch_size)])
input_variable, lengths, target_variable, mask, max_target_len = batches

print("input_variable:", input_variable)
print("lengths:", lengths)
print("target_variable:", target_variable)
print("mask:", mask)
print("max_target_len:", max_target_len)

input_variable: tensor([[ 101,  218,  197,  147,  278],
        [  37,  208,  117, 1125,   50],
        [ 659,  180,    7,    7,    6],
        [ 660,  211,   18, 2623,    2],
        [2270,   18,  386,    4,    0],
        [ 177,    4,    6,    2,    0],
        [4256,    2,    2,    0,    0],
        [   4,    0,    0,    0,    0],
        [   2,    0,    0,    0,    0]])
lengths: tensor([9, 7, 7, 6, 4])
target_variable: tensor([[ 124, 2069,  680,  147, 1644],
        [   7, 1826,   56,   92,   12],
        [  89,   98,  827,    7, 1581],
        [  12, 2807,   25, 1247,    4],
        [  79, 2240,   47,    6,    2],
        [2008,    4,   66,    2,    0],
        [  98,    2,    2,    0,    0],
        [4102,    0,    0,    0,    0],
        [   4,    0,    0,    0,    0],
        [   2,    0,    0,    0,    0]])
mask: tensor([[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 0],
        [1, 1, 1

## 定义模型

### Seq2Seq 模型

我们这个聊天机器人的大脑是序列到序列（seq2seq）模型。seq2seq模型的目标是将可变长度序列作为输入，并使用固定大小的模型将可变长度序列作为输出返回。

[Sutskever 等人](https://arxiv.org/abs/1409.3215) 发现通过使用两个独立的递归神经网络(RNN)，我们可以完成这项任务。

第一个RNN扮演**编码器**的角色，它将可变长度输入序列编码成固定长度的上下文向量。
理论上，这个上下文向量（RNN的最终隐藏层）将包含关于输入给机器人的询问句的语义信息。

第二个RNN是**解码器**，它采用输入词和上下文向量，并返回序列中后续词的猜测值，以及用于下次迭代的隐藏层。

![model](https://jeddy92.github.io/images/ts_intro/seq2seq_ts.png)

图片来源：<https://jeddy92.github.io/JEddy92.github.io/ts_seq2seq_intro/>

#### 编码器

编码器RNN一次迭代输入句子的一个标记(token)，每个时间步骤输出一个“输出”向量和一个“隐藏状态”向量。
然后将隐藏状态向量传递给下一个时间步骤，同时记录输出向量。
编码器将其在序列中的每个点看到的上下文转换为高维空间中的一组点，解码器将使用这组点来为给定任务生成有意义的输出。

我们这个编码器的的核心是多层门控单元(GRU)，由[Cho 等人](https://arxiv.org/pdf/1406.1078v3.pdf)于2014年发明。
我们将使用GRU的一种变种——双向GRU，它使用两种独立的RNN：一个以正常的顺序接收输入序列，另一个以反方向接收输入序列。
在同一时间步骤中对每个网络的输出求和。
使用双向GRU讲给我们带来对过去和未来上下文进行编码的优势。

双向RNN:

![rnn_bidir](https://colah.github.io/posts/2015-09-NN-Types-FP/img/RNN-bidirectional.png)

图片来源：<https://colah.github.io/posts/2015-09-NN-Types-FP/>

注意，一个`embedding`层用于在任意大小的特征空间中编码我们的单词索引。
对于我们的模型，这个层会将每个词映射到大小为*hidden_size*的特征空间。
训练后，这些值应该编码了近义词之间的语义相似度。

Finally, if passing a padded batch of sequences to an RNN module, we
must pack and unpack padding around the RNN pass using
最有，如果要填充后的一批序列传入RNN模块，我们必须围绕RNN进行打包和解包，这些方法分别是：
- `nn.utils.rnn.pack_padded_sequence`
- `nn.utils.rnn.pad_packed_sequence`

**计算图：**

1. 将词的索引值转为嵌入
1. 为RNN模块打包填充后的序列批次
1. 通过GRU前向传递
1. 解包填充
1. 双向GRU输出求和
1. 返回输出和最终隐藏状态

**输入：**

- `input_seq`: 输入句子批次；形状=*(max_length,batch_size)*
- `input_lengths`: 由批次中每个句子的长度的所构成的列表；形状=*(batch_size)*
- `hidden`: 隐藏状态；形状=*(n_layers x num_directions, batch_size, hidden_size)*

**输出：**

- `outputs`: 从GRN最终隐藏层的输出特征；形状=*(max_length, batch_size, hidden_size)*
- `hidden`: 从GRU更新的隐藏状态；形状=*(n_layers x num_directions, batch_size, hidden_size)*





In [9]:
class EncoderRNN(nn.Module):
    def __init__(self, hidden_size, embedding, n_layers=1, dropout=0):
        super(EncoderRNN, self).__init__()
        self.n_layers = n_layers
        self.hidden_size = hidden_size
        self.embedding = embedding

        # Initialize GRU; the input_size and hidden_size params are both set to 'hidden_size'
        #   because our input size is a word embedding with number of features == hidden_size
        self.gru = nn.GRU(hidden_size, hidden_size, n_layers,
                          dropout=(0 if n_layers == 1 else dropout), bidirectional=True)

    def forward(self, input_seq, input_lengths, hidden=None):
        # Convert word indexes to embeddings
        embedded = self.embedding(input_seq)
        # Pack padded batch of sequences for RNN module
        packed = nn.utils.rnn.pack_padded_sequence(embedded, input_lengths)
        # Forward pass through GRU
        outputs, hidden = self.gru(packed, hidden)
        # Unpack padding
        outputs, _ = nn.utils.rnn.pad_packed_sequence(outputs)
        # Sum bidirectional GRU outputs
        outputs = outputs[:, :, :self.hidden_size] + outputs[:, : ,self.hidden_size:]
        # Return output and final hidden state
        return outputs, hidden

#### 解码器

解码器RNN以一个接一个标记(token-by-token)的形式生成回复句子。
它使用编码器的上下文向量和内置隐藏状态生成序列中的后续词。
它持续的生成词，直到输出*EOS_token*——表示句子的结束。
寻常的Seq2Seq解码器常常遇到的一个问题就是，如果我们依赖于上下文向量来编码整个输入的语义，那么我们很可能丢失信息。

在处理长输入序列的时候尤其如此，这极大的限制了我们这个解码器的能力。
为了解决这个问题，[Bahdanau 等人](https://arxiv.org/abs/1409.0473)创建了“注意力机制”，允许解码器只关注输入序列的某些部分，而不是在每一步都使用整个固定的上下文。

在上层，用解码器的当前隐藏状态和编码器的输出计算注意力。
输出注意力权重和输入序列具有相同的形状，这让我们可以将它和编码器输出相乘，得到编码器输出中的要被加以注意力的部分的加权和。

[Sean Robertson](https://github.com/spro) 的图示很好的描述了这点：

![attn2](https://pytorch.org/tutorials/_images/attn2.png)

[Luong 等人](https://arxiv.org/abs/1508.04025)创建了“全局注意力”来改进[Bahdanau 等人](https://arxiv.org/abs/1409.0473)的基础工作。
“全局注意力”最关键的不同之处在于：它会考虑所有的编码器隐藏状态，而不是[Bahdanau 等人](https://arxiv.org/abs/1409.0473)的只考虑当前时间步骤中的编码器隐藏状态的“局部注意力”方式。
另一个不同之处在于，使用“全局注意力”，我们仅仅使用当前时间步骤的编码器的隐藏状态来计算注意力的权重或能量值。
[Bahdanau 等人](https://arxiv.org/abs/1409.0473)的注意力计算需要了解上一个时间步骤中编码器的状态。
Also, Luong et al. provides various methods to calculate the
attention energies between the encoder output and decoder output which
are called “score functions”:
此外，[Luong 等人](https://arxiv.org/abs/1508.04025)提供了用于计算编码器输出和解码器输出之间的注意力的的多种方法，他们被成为“得分函数”（score functions）：

![scores](https://pytorch.org/tutorials/_images/scores.png)

其中，
- $h_t$ 为当前目标解码器状的态
- $\bar{h}_s$ 为所有编码器的状态

总体而言，全局注意力机制可以通过下图来总结。
注意我们将在被称作`Attn`的分离的`nn.Module`中实现“注意力层”。
这个模块的输出是一个 softmax 标准化权重张量，其形状是 *(batch_size, 1, max_length)* 。

![global_attn](https://pytorch.org/tutorials/_images/global_attn.png)



In [10]:
# Luong attention layer
class Attn(nn.Module):
    def __init__(self, method, hidden_size):
        super(Attn, self).__init__()
        self.method = method
        if self.method not in ['dot', 'general', 'concat']:
            raise ValueError(self.method, "is not an appropriate attention method.")
        self.hidden_size = hidden_size
        if self.method == 'general':
            self.attn = nn.Linear(self.hidden_size, hidden_size)
        elif self.method == 'concat':
            self.attn = nn.Linear(self.hidden_size * 2, hidden_size)
            self.v = nn.Parameter(torch.FloatTensor(hidden_size))

    def dot_score(self, hidden, encoder_output):
        return torch.sum(hidden * encoder_output, dim=2)

    def general_score(self, hidden, encoder_output):
        energy = self.attn(encoder_output)
        return torch.sum(hidden * energy, dim=2)

    def concat_score(self, hidden, encoder_output):
        energy = self.attn(torch.cat((hidden.expand(encoder_output.size(0), -1, -1), encoder_output), 2)).tanh()
        return torch.sum(self.v * energy, dim=2)

    def forward(self, hidden, encoder_outputs):
        # Calculate the attention weights (energies) based on the given method
        if self.method == 'general':
            attn_energies = self.general_score(hidden, encoder_outputs)
        elif self.method == 'concat':
            attn_energies = self.concat_score(hidden, encoder_outputs)
        elif self.method == 'dot':
            attn_energies = self.dot_score(hidden, encoder_outputs)

        # Transpose max_length and batch_size dimensions
        attn_energies = attn_energies.t()

        # Return the softmax normalized probability scores (with added dimension)
        return F.softmax(attn_energies, dim=1).unsqueeze(1)

现在，我们已经定义了注意力子模块，可以实现实际的解码器模块了。
对于解码器，我们将手动的在每个时间步骤中提供批数据。
这意味着我们的嵌入词张量和GRU输出的形状都是 *(1, batch_size, hidden_size)* 。


**计算图：**

1. 获得当前输入词的强如嵌入。
1. 单向GRU前向。
1. 有第二步的GRU输出计算注意力权重。
1. 注意力权重与编码器输出相乘，得到“加权和”(weighted sum)上下文向量。
1. 使用 Luong 等人的方法将加权上下文向量和GRU输出相加。
1. 使用 Luong 等人的方法(不用 softmax)预测后续词。
1. 返回输出和最终隐藏层。

**输入：**

- `input_step`: 输入序列批的一个时间步骤 (一个词)；形状=*(1, batch_size)*
- `last_hidden`:  GRU的最终隐藏层；形状=(n_layers x num_directions, batch_size, hidden_size)*
- `encoder_outputs`: 编码器的模型输出; 性转=*(max_length, batch_size, hidden_size)*

**输出：**

- `输出`: softmax 正规化张量，给出了被解码序列中每个词是正确的后续词的概率; 形状=*(batch_size, voc.num_words)*
- `hidden`: GRU的最终隐藏状态; 形状=*(n_layers x num_directions, batch_size, hidden_size)*




In [11]:
class LuongAttnDecoderRNN(nn.Module):
    def __init__(self, attn_model, embedding, hidden_size, output_size, n_layers=1, dropout=0.1):
        super(LuongAttnDecoderRNN, self).__init__()

        # Keep for reference
        self.attn_model = attn_model
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.n_layers = n_layers
        self.dropout = dropout

        # Define layers
        self.embedding = embedding
        self.embedding_dropout = nn.Dropout(dropout)
        self.gru = nn.GRU(hidden_size, hidden_size, n_layers, dropout=(0 if n_layers == 1 else dropout))
        self.concat = nn.Linear(hidden_size * 2, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)

        self.attn = Attn(attn_model, hidden_size)

    def forward(self, input_step, last_hidden, encoder_outputs):
        # Note: we run this one step (word) at a time
        # Get embedding of current input word
        embedded = self.embedding(input_step)
        embedded = self.embedding_dropout(embedded)
        # Forward through unidirectional GRU
        rnn_output, hidden = self.gru(embedded, last_hidden)
        # Calculate attention weights from the current GRU output
        attn_weights = self.attn(rnn_output, encoder_outputs)
        # Multiply attention weights to encoder outputs to get new "weighted sum" context vector
        context = attn_weights.bmm(encoder_outputs.transpose(0, 1))
        # Concatenate weighted context vector and GRU output using Luong eq. 5
        rnn_output = rnn_output.squeeze(0)
        context = context.squeeze(1)
        concat_input = torch.cat((rnn_output, context), 1)
        concat_output = torch.tanh(self.concat(concat_input))
        # Predict next word using Luong eq. 6
        output = self.out(concat_output)
        output = F.softmax(output, dim=1)
        # Return output and final hidden state
        return output, hidden

## 定义训练过程

### 损失掩码

由于我们处理的是批量填充序列，因此在计算损失时我们不能简单地仅考虑张量的全部元素。
而是通过定义`maskNLLLoss`损失函数，基于解码器输出张量，目标张量和描述目标张量填充的二进制掩码张量，来计算损失。
该损失函数计算了对应于掩码向量中*1*的元素的负对数相似度。

In [12]:
def maskNLLLoss(inp, target, mask):
    nTotal = mask.sum()
    crossEntropy = -torch.log(torch.gather(inp, 1, target.view(-1, 1)).squeeze(1))
    loss = crossEntropy.masked_select(mask).mean()
    loss = loss.to(device)
    return loss, nTotal.item()

### 单次训练迭代

函数 ``train`` 包含一个单次训练迭代（单个输入批次）算法。

我们用一些巧妙的技巧来促进收敛：

- 第一个技巧是使用**教师强制**。
  这意味着某些情况下，通过设置`teacher_forcing_ratio`，我们使用当前目标词，而不是使用解码器的当前猜测结果，作为解码器的后续输入。
  这项技术是训练解码器的轮子，有助于更有效的训练。
  教师强制会导致模型在推理期间不稳定，这是因为解码器在训练中没有足够的机会真正的制作它自己的输出序列。
  所有，我们必须注意地如何设置`teacher_forcing_ratio`，不要被快速收敛欺骗。

- 我们要实现的第二个技巧是**梯度剪切**。
  这是一种用于对抗“梯度爆炸”的常用技术。
  本质上，通过剪切或者最大阈值，我们防止梯度爆炸性增长和溢出(NaN)，或者在成本函数中从悬崖跌落。

![grad_clip](https://pytorch.org/tutorials/_images/grad_clip.png)

图像来源: [Goodfellow 等人 *Deep Learning*. 2016.](https://www.deeplearningbook.org/)

**运算顺序：**

1. 通过编码器向前传递整个输入批次。
1. 将解码器输入初始化为`SOS_token`和编码器最终隐藏层的隐藏状态。
1. 在每个时间步骤中，通过解码器向前传递输入批的序列。
1. 如果用到了教师强制：把下一个解码器输入作为当前目标；其它：用下一个解码器输入作为当前解码器输出。
1. 计算和累积损失。
1. 进行反向传播
1. 剪切梯度。
1. 更新编码器和解码器模型的参数。


!!! note "注意":
  只需将 PyTorch RNN 模块(`RNN`, `LSTM`, `GRU`)的整个输入序列（或批次的序列）传入，它们就可以被用于任何其它类似的非递归层。
  我们在`encoder`中这样使用`GUR`层。
  实际情况是，在底层，每一个时间步骤上都有一个交互过程迭代计算隐藏状态。
  
  另外，也可以每个时间步骤运行这些模型。
  在这种情况下，我们在训练过程里手动循环遍历序列，就像对`decoder`模型做的那样。
  只要维护好这些模块正确的概念模型，实现序列模型就可以非常简单。





In [None]:
def train(input_variable, lengths, target_variable, mask, max_target_len, encoder, decoder, embedding,
          encoder_optimizer, decoder_optimizer, batch_size, clip, max_length=MAX_LENGTH):

    # Zero gradients
    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()

    # Set device options
    input_variable = input_variable.to(device)
    lengths = lengths.to(device)
    target_variable = target_variable.to(device)
    mask = mask.to(device)

    # Initialize variables
    loss = 0
    print_losses = []
    n_totals = 0

    # Forward pass through encoder
    encoder_outputs, encoder_hidden = encoder(input_variable, lengths)

    # Create initial decoder input (start with SOS tokens for each sentence)
    decoder_input = torch.LongTensor([[SOS_token for _ in range(batch_size)]])
    decoder_input = decoder_input.to(device)

    # Set initial decoder hidden state to the encoder's final hidden state
    decoder_hidden = encoder_hidden[:decoder.n_layers]

    # Determine if we are using teacher forcing this iteration
    use_teacher_forcing = True if random.random() < teacher_forcing_ratio else False

    # Forward batch of sequences through decoder one time step at a time
    if use_teacher_forcing:
        for t in range(max_target_len):
            decoder_output, decoder_hidden = decoder(
                decoder_input, decoder_hidden, encoder_outputs
            )
            # Teacher forcing: next input is current target
            decoder_input = target_variable[t].view(1, -1)
            # Calculate and accumulate loss
            mask_loss, nTotal = maskNLLLoss(decoder_output, target_variable[t], mask[t])
            loss += mask_loss
            print_losses.append(mask_loss.item() * nTotal)
            n_totals += nTotal
    else:
        for t in range(max_target_len):
            decoder_output, decoder_hidden = decoder(
                decoder_input, decoder_hidden, encoder_outputs
            )
            # No teacher forcing: next input is decoder's own current output
            _, topi = decoder_output.topk(1)
            decoder_input = torch.LongTensor([[topi[i][0] for i in range(batch_size)]])
            decoder_input = decoder_input.to(device)
            # Calculate and accumulate loss
            mask_loss, nTotal = maskNLLLoss(decoder_output, target_variable[t], mask[t])
            loss += mask_loss
            print_losses.append(mask_loss.item() * nTotal)
            n_totals += nTotal

    # Perform backpropatation
    loss.backward()

    # Clip gradients: gradients are modified in place
    _ = nn.utils.clip_grad_norm_(encoder.parameters(), clip)
    _ = nn.utils.clip_grad_norm_(decoder.parameters(), clip)

    # Adjust model weights
    encoder_optimizer.step()
    decoder_optimizer.step()

    return sum(print_losses) / n_totals

Training iterations
~~~~~~~~~~~~~~~~~~~

It is finally time to tie the full training procedure together with the
data. The ``trainIters`` function is responsible for running
``n_iterations`` of training given the passed models, optimizers, data,
etc. This function is quite self explanatory, as we have done the heavy
lifting with the ``train`` function.

One thing to note is that when we save our model, we save a tarball
containing the encoder and decoder state_dicts (parameters), the
optimizers’ state_dicts, the loss, the iteration, etc. Saving the model
in this way will give us the ultimate flexibility with the checkpoint.
After loading a checkpoint, we will be able to use the model parameters
to run inference, or we can continue training right where we left off.




In [None]:
def trainIters(model_name, voc, pairs, encoder, decoder, encoder_optimizer, decoder_optimizer, embedding, encoder_n_layers, decoder_n_layers, save_dir, n_iteration, batch_size, print_every, save_every, clip, corpus_name, loadFilename):

    # Load batches for each iteration
    training_batches = [batch2TrainData(voc, [random.choice(pairs) for _ in range(batch_size)])
                      for _ in range(n_iteration)]

    # Initializations
    print('Initializing ...')
    start_iteration = 1
    print_loss = 0
    if loadFilename:
        start_iteration = checkpoint['iteration'] + 1

    # Training loop
    print("Training...")
    for iteration in range(start_iteration, n_iteration + 1):
        training_batch = training_batches[iteration - 1]
        # Extract fields from batch
        input_variable, lengths, target_variable, mask, max_target_len = training_batch

        # Run a training iteration with batch
        loss = train(input_variable, lengths, target_variable, mask, max_target_len, encoder,
                     decoder, embedding, encoder_optimizer, decoder_optimizer, batch_size, clip)
        print_loss += loss

        # Print progress
        if iteration % print_every == 0:
            print_loss_avg = print_loss / print_every
            print("Iteration: {}; Percent complete: {:.1f}%; Average loss: {:.4f}".format(iteration, iteration / n_iteration * 100, print_loss_avg))
            print_loss = 0

        # Save checkpoint
        if (iteration % save_every == 0):
            directory = os.path.join(save_dir, model_name, corpus_name, '{}-{}_{}'.format(encoder_n_layers, decoder_n_layers, hidden_size))
            if not os.path.exists(directory):
                os.makedirs(directory)
            torch.save({
                'iteration': iteration,
                'en': encoder.state_dict(),
                'de': decoder.state_dict(),
                'en_opt': encoder_optimizer.state_dict(),
                'de_opt': decoder_optimizer.state_dict(),
                'loss': loss,
                'voc_dict': voc.__dict__,
                'embedding': embedding.state_dict()
            }, os.path.join(directory, '{}_{}.tar'.format(iteration, 'checkpoint')))

Define Evaluation
-----------------

After training a model, we want to be able to talk to the bot ourselves.
First, we must define how we want the model to decode the encoded input.

Greedy decoding
~~~~~~~~~~~~~~~

Greedy decoding is the decoding method that we use during training when
we are **NOT** using teacher forcing. In other words, for each time
step, we simply choose the word from ``decoder_output`` with the highest
softmax value. This decoding method is optimal on a single time-step
level.

To facilite the greedy decoding operation, we define a
``GreedySearchDecoder`` class. When run, an object of this class takes
an input sequence (``input_seq``) of shape *(input_seq length, 1)*, a
scalar input length (``input_length``) tensor, and a ``max_length`` to
bound the response sentence length. The input sentence is evaluated
using the following computational graph:

**Computation Graph:**

   1) Forward input through encoder model.
   2) Prepare encoder's final hidden layer to be first hidden input to the decoder.
   3) Initialize decoder's first input as SOS_token.
   4) Initialize tensors to append decoded words to.
   5) Iteratively decode one word token at a time:
       a) Forward pass through decoder.
       b) Obtain most likely word token and its softmax score.
       c) Record token and score.
       d) Prepare current token to be next decoder input.
   6) Return collections of word tokens and scores.




In [None]:
class GreedySearchDecoder(nn.Module):
    def __init__(self, encoder, decoder):
        super(GreedySearchDecoder, self).__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, input_seq, input_length, max_length):
        # Forward input through encoder model
        encoder_outputs, encoder_hidden = self.encoder(input_seq, input_length)
        # Prepare encoder's final hidden layer to be first hidden input to the decoder
        decoder_hidden = encoder_hidden[:decoder.n_layers]
        # Initialize decoder input with SOS_token
        decoder_input = torch.ones(1, 1, device=device, dtype=torch.long) * SOS_token
        # Initialize tensors to append decoded words to
        all_tokens = torch.zeros([0], device=device, dtype=torch.long)
        all_scores = torch.zeros([0], device=device)
        # Iteratively decode one word token at a time
        for _ in range(max_length):
            # Forward pass through decoder
            decoder_output, decoder_hidden = self.decoder(decoder_input, decoder_hidden, encoder_outputs)
            # Obtain most likely word token and its softmax score
            decoder_scores, decoder_input = torch.max(decoder_output, dim=1)
            # Record token and score
            all_tokens = torch.cat((all_tokens, decoder_input), dim=0)
            all_scores = torch.cat((all_scores, decoder_scores), dim=0)
            # Prepare current token to be next decoder input (add a dimension)
            decoder_input = torch.unsqueeze(decoder_input, 0)
        # Return collections of word tokens and scores
        return all_tokens, all_scores

Evaluate my text
~~~~~~~~~~~~~~~~

Now that we have our decoding method defined, we can write functions for
evaluating a string input sentence. The ``evaluate`` function manages
the low-level process of handling the input sentence. We first format
the sentence as an input batch of word indexes with *batch_size==1*. We
do this by converting the words of the sentence to their corresponding
indexes, and transposing the dimensions to prepare the tensor for our
models. We also create a ``lengths`` tensor which contains the length of
our input sentence. In this case, ``lengths`` is scalar because we are
only evaluating one sentence at a time (batch_size==1). Next, we obtain
the decoded response sentence tensor using our ``GreedySearchDecoder``
object (``searcher``). Finally, we convert the response’s indexes to
words and return the list of decoded words.

``evaluateInput`` acts as the user interface for our chatbot. When
called, an input text field will spawn in which we can enter our query
sentence. After typing our input sentence and pressing *Enter*, our text
is normalized in the same way as our training data, and is ultimately
fed to the ``evaluate`` function to obtain a decoded output sentence. We
loop this process, so we can keep chatting with our bot until we enter
either “q” or “quit”.

Finally, if a sentence is entered that contains a word that is not in
the vocabulary, we handle this gracefully by printing an error message
and prompting the user to enter another sentence.




In [None]:
def evaluate(encoder, decoder, searcher, voc, sentence, max_length=MAX_LENGTH):
    ### Format input sentence as a batch
    # words -> indexes
    indexes_batch = [indexesFromSentence(voc, sentence)]
    # Create lengths tensor
    lengths = torch.tensor([len(indexes) for indexes in indexes_batch])
    # Transpose dimensions of batch to match models' expectations
    input_batch = torch.LongTensor(indexes_batch).transpose(0, 1)
    # Use appropriate device
    input_batch = input_batch.to(device)
    lengths = lengths.to(device)
    # Decode sentence with searcher
    tokens, scores = searcher(input_batch, lengths, max_length)
    # indexes -> words
    decoded_words = [voc.index2word[token.item()] for token in tokens]
    return decoded_words


def evaluateInput(encoder, decoder, searcher, voc):
    input_sentence = ''
    while(1):
        try:
            # Get input sentence
            input_sentence = input('> ')
            # Check if it is quit case
            if input_sentence == 'q' or input_sentence == 'quit': break
            # Normalize sentence
            input_sentence = normalizeString(input_sentence)
            # Evaluate sentence
            output_words = evaluate(encoder, decoder, searcher, voc, input_sentence)
            # Format and print response sentence
            output_words[:] = [x for x in output_words if not (x == 'EOS' or x == 'PAD')]
            print('Bot:', ' '.join(output_words))

        except KeyError:
            print("Error: Encountered unknown word.")

Run Model
---------

Finally, it is time to run our model!

Regardless of whether we want to train or test the chatbot model, we
must initialize the individual encoder and decoder models. In the
following block, we set our desired configurations, choose to start from
scratch or set a checkpoint to load from, and build and initialize the
models. Feel free to play with different model configurations to
optimize performance.




In [None]:
# Configure models
model_name = 'cb_model'
attn_model = 'dot'
#attn_model = 'general'
#attn_model = 'concat'
hidden_size = 500
encoder_n_layers = 2
decoder_n_layers = 2
dropout = 0.1
batch_size = 64

# Set checkpoint to load from; set to None if starting from scratch
loadFilename = None
checkpoint_iter = 4000
#loadFilename = os.path.join(save_dir, model_name, corpus_name,
#                            '{}-{}_{}'.format(encoder_n_layers, decoder_n_layers, hidden_size),
#                            '{}_checkpoint.tar'.format(checkpoint_iter))


# Load model if a loadFilename is provided
if loadFilename:
    # If loading on same machine the model was trained on
    checkpoint = torch.load(loadFilename)
    # If loading a model trained on GPU to CPU
    #checkpoint = torch.load(loadFilename, map_location=torch.device('cpu'))
    encoder_sd = checkpoint['en']
    decoder_sd = checkpoint['de']
    encoder_optimizer_sd = checkpoint['en_opt']
    decoder_optimizer_sd = checkpoint['de_opt']
    embedding_sd = checkpoint['embedding']
    voc.__dict__ = checkpoint['voc_dict']


print('Building encoder and decoder ...')
# Initialize word embeddings
embedding = nn.Embedding(voc.num_words, hidden_size)
if loadFilename:
    embedding.load_state_dict(embedding_sd)
# Initialize encoder & decoder models
encoder = EncoderRNN(hidden_size, embedding, encoder_n_layers, dropout)
decoder = LuongAttnDecoderRNN(attn_model, embedding, hidden_size, voc.num_words, decoder_n_layers, dropout)
if loadFilename:
    encoder.load_state_dict(encoder_sd)
    decoder.load_state_dict(decoder_sd)
# Use appropriate device
encoder = encoder.to(device)
decoder = decoder.to(device)
print('Models built and ready to go!')

Run Training
~~~~~~~~~~~~

Run the following block if you want to train the model.

First we set training parameters, then we initialize our optimizers, and
finally we call the ``trainIters`` function to run our training
iterations.




In [None]:
# Configure training/optimization
clip = 50.0
teacher_forcing_ratio = 1.0
learning_rate = 0.0001
decoder_learning_ratio = 5.0
n_iteration = 4000
print_every = 1
save_every = 500

# Ensure dropout layers are in train mode
encoder.train()
decoder.train()

# Initialize optimizers
print('Building optimizers ...')
encoder_optimizer = optim.Adam(encoder.parameters(), lr=learning_rate)
decoder_optimizer = optim.Adam(decoder.parameters(), lr=learning_rate * decoder_learning_ratio)
if loadFilename:
    encoder_optimizer.load_state_dict(encoder_optimizer_sd)
    decoder_optimizer.load_state_dict(decoder_optimizer_sd)

# Run training iterations
print("Starting Training!")
trainIters(model_name, voc, pairs, encoder, decoder, encoder_optimizer, decoder_optimizer,
           embedding, encoder_n_layers, decoder_n_layers, save_dir, n_iteration, batch_size,
           print_every, save_every, clip, corpus_name, loadFilename)

Run Evaluation
~~~~~~~~~~~~~~

To chat with your model, run the following block.




In [None]:
# Set dropout layers to eval mode
encoder.eval()
decoder.eval()

# Initialize search module
searcher = GreedySearchDecoder(encoder, decoder)

# Begin chatting (uncomment and run the following line to begin)
# evaluateInput(encoder, decoder, searcher, voc)

Conclusion
----------

That’s all for this one, folks. Congratulations, you now know the
fundamentals to building a generative chatbot model! If you’re
interested, you can try tailoring the chatbot’s behavior by tweaking the
model and training parameters and customizing the data that you train
the model on.

Check out the other tutorials for more cool deep learning applications
in PyTorch!


