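This tutorial walks through converting a sequence-to-sequence (seq2seq) model to Torch Script using PyTorch's Hybrid Frontend. The model we convert is the chatbot model from the Chatbot tutorial. You can treat this tutorial as "Part 2" of the Chatbot tutorial and deploy your own pretrained model, or you can start with this document and use a pretrained model that we host. In the latter case, refer to the original Chatbot tutorial for details on data preprocessing, model theory and definition, and model training.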

### What is the Hybrid Frontend?

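During the research and development phase of a deep learning project, it is advantageous to work with an eager, imperative interface like PyTorch's. It lets users write familiar, idiomatic Python code with Python data structures, control flow operations, print statements, and debugging utilities. Although the eager interface is a useful tool for research and experimentation, a graph-based model representation is very beneficial when the model is deployed in a production environment. A deferred graph representation allows optimizations such as out-of-order execution and makes it possible to target highly optimized hardware architectures. A graph-based representation also enables framework-agnostic model export. PyTorch provides mechanisms for incrementally converting eager-mode code into Torch Script, a statically analyzable and optimizable subset of Python that Torch uses to represent deep learning programs independently of the Python runtime.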

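The API for converting eager-mode PyTorch programs into Torch Script is found in the `torch.jit` module. This module has two core modalities for converting an eager-mode model to a Torch Script graph representation: tracing and scripting. The `torch.jit.trace` function takes a module or function and a set of example inputs. It runs the example inputs through the function or module while recording the computational steps it encounters, and outputs a graph-based function that performs the traced operations. Tracing works well for straightforward modules and functions that do not involve data-dependent control flow, such as standard convolutional neural networks. However, if a function with data-dependent if statements and loops is traced, only the operations along the execution path taken by the example input are recorded. In other words, the control flow itself is not captured.

To convert modules and functions that contain data-dependent control flow, a scripting mechanism is provided. Scripting explicitly converts the module or function code to Torch Script, including all possible control flow paths. To use script mode, be sure to inherit from the `torch.jit.ScriptModule` base class (instead of `torch.nn.Module`) and add a `torch.jit.script` decorator to your Python function or a `torch.jit.script_method` decorator to your module's methods. One caveat of scripting is that it supports only a restricted subset of Python. For full details on the supported features, see the Torch Script language reference. For maximum flexibility, the two modes can be composed to represent a whole program, and these techniques can be applied incrementally.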
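
The difference between the two modes can be seen on a small function with a data-dependent branch. The following is a minimal sketch, not part of the chatbot model: the function name `shift_by_sign` and its inputs are hypothetical, and it calls `torch.jit.script` as a plain function rather than as a decorator.

```python
import torch


def shift_by_sign(x):
    # Data-dependent branch: which path runs depends on the values in x.
    if x.sum() > 0:
        return x + 1
    return x - 1


# Tracing records only the branch taken by the example input (the "+ 1" path).
# PyTorch emits a TracerWarning here, which is expected for this demonstration.
traced = torch.jit.trace(shift_by_sign, torch.ones(3))

# Scripting compiles the source code, so both branches are preserved.
scripted = torch.jit.script(shift_by_sign)

negative = -torch.ones(3)
print(shift_by_sign(negative))  # eager: takes the "- 1" branch
print(traced(negative))         # traced: still applies "+ 1"
print(scripted(negative))       # scripted: matches the eager result
```

For an input that takes the other branch, the traced function silently gives a different answer than the eager function, while the scripted function agrees with it.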

### Prepare Environment

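First, we import the required modules and set some constants. If you plan to use your own model, make sure the `MAX_LENGTH` constant is set correctly. As a reminder, it defines the maximum sentence length allowed during training and the maximum length of the output that the model can produce.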

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import re
import os
import unicodedata
import numpy as np

device = torch.device("cpu")


MAX_LENGTH = 10  # Maximum sentence length

# Default word tokens
PAD_token = 0  # Used for padding short sentences
SOS_token = 1  # Start-of-sentence token
EOS_token = 2  # End-of-sentence token

### Model Overview

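We use a seq2seq model here. This type of model applies when the input is a variable-length sequence, the output is also a variable-length sequence, and the two need not map one-to-one. A seq2seq model consists of two RNNs: an encoder and a decoder.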

#### Encoder

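The encoder RNN iterates over the input one token (for example, a word) per time step, producing an output vector and a hidden state vector at each step. The hidden state vector is passed on to the next time step, while the output vector is recorded. The encoder transforms the context it saw at each point in the sequence into a set of points in a high-dimensional space, which the decoder uses to generate a meaningful output for the given task.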

#### Decoder

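The decoder RNN generates the response sentence token by token. It uses the encoder's context vectors and its own internal hidden states to generate the next word in the sequence. It keeps generating words until it outputs an EOS token, which marks the end of the sentence. We use an attention mechanism in the decoder to help it "pay attention" to certain parts of the input when generating output. For our model, we implement the "Global Attention" module of Luong et al. and use it as a submodule of the decoder model.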

### Data Handling

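Although our models conceptually deal with sequences of tokens, in reality they deal with numbers, like all machine learning models. Here, every word in the vocabulary is mapped to an integer index before training begins. We use the `Voc` class to hold the mapping from words to indexes. We will load it before running the model.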

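Also, in order to run evaluations, we need a tool for processing our string inputs. The `normalizeString` function converts all characters in a string to lowercase and removes every character other than letters and the punctuation marks `.`, `!`, and `?`. The `indexesFromSentence` function takes a sentence of words and returns the corresponding sequence of word indexes, followed by `EOS_token`.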

In [2]:
class Voc:
    def __init__(self, name):
        self.name = name
        self.trimmed = False
        self.word2index = {}
        self.word2count = {}
        self.index2word = {PAD_token: "PAD", SOS_token: "SOS", EOS_token: "EOS"}
        self.num_words = 3  # Count SOS, EOS, PAD

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.num_words
            self.word2count[word] = 1
            self.index2word[self.num_words] = word
            self.num_words += 1
        else:
            self.word2count[word] += 1

    # Remove words below a certain count threshold
    def trim(self, min_count):
        if self.trimmed:
            return
        self.trimmed = True
        keep_words = []
        for k, v in self.word2count.items():
            if v >= min_count:
                keep_words.append(k)

        print('keep_words {} / {} = {:.4f}'.format(
            len(keep_words), len(self.word2index), len(keep_words) / len(self.word2index)
        ))
        # Reinitialize dictionaries
        self.word2index = {}
        self.word2count = {}
        self.index2word = {PAD_token: "PAD", SOS_token: "SOS", EOS_token: "EOS"}
        self.num_words = 3 # Count default tokens
        for word in keep_words:
            self.addWord(word)


# Lowercase and remove non-letter characters
def normalizeString(s):
    s = s.lower()
    s = re.sub(r"([.!?])", r" \1", s)
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
    return s


# Takes string sentence, returns sentence of word indexes
def indexesFromSentence(voc, sentence):
    return [voc.word2index[word] for word in sentence.split(' ')] + [EOS_token]
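
To make the preprocessing concrete, here is a small self-contained sketch of what these two helpers do to a sentence. It repeats `normalizeString` from above so that it runs on its own; the `word2index` dictionary is a hypothetical toy vocabulary standing in for `voc.word2index`.

```python
import re

EOS_token = 2  # same value as the constant defined earlier


def normalizeString(s):
    s = s.lower()
    s = re.sub(r"([.!?])", r" \1", s)       # put a space before . ! ?
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)   # collapse everything else to a space
    return s


# Hypothetical toy vocabulary (the real one is loaded from the checkpoint)
word2index = {"where": 3, "are": 4, "you": 5, "?": 6}

normalized = normalizeString("Where are you?")
indexes = [word2index[word] for word in normalized.split(' ')] + [EOS_token]

print(normalized)                      # where are you ?
print(indexes)                         # [3, 4, 5, 6, 2]
print(normalizeString("what's up?"))   # what s up ?
```

A word that is missing from the vocabulary raises a `KeyError` in the list comprehension, which is the error that `evaluateInput` catches later on.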

### Define Encoder

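We implement the encoder's RNN with the `torch.nn.GRU` module. We feed it a batch of sentences (as word-embedding vectors), and it internally iterates over the sentences one token per time step, computing the hidden states. We initialize this module to be bidirectional, meaning that we have two independent GRUs: one that iterates over the sequences in chronological order and another that iterates in reverse order. We ultimately return the sum of these two GRUs' outputs.

Since our model was trained with batching, the `forward` function of `EncoderRNN` expects a padded input batch. To batch variable-length sentences, we allow at most `MAX_LENGTH` tokens per sentence, and every sentence in the batch with fewer than `MAX_LENGTH` tokens is padded at the end with our dedicated `PAD_token`. To use padded batches with a PyTorch RNN module, we must wrap the forward pass call with the `torch.nn.utils.rnn.pack_padded_sequence` and `torch.nn.utils.rnn.pad_packed_sequence` data transformations. Note that the `forward` function also takes an `input_lengths` list, which contains the length of each sentence in the batch. This input is used by the `torch.nn.utils.rnn.pack_padded_sequence` function when packing the padded batch.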
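
As an illustration of the padding step, the sketch below pads a small batch of index sequences and lays it out as (max_length, batch_size), the shape the encoder receives. The helper name `zero_padding` and the example indexes are assumptions made for this sketch; this tutorial itself only evaluates one sentence at a time.

```python
import itertools

PAD_token = 0  # same value as the constant defined earlier


def zero_padding(index_batch, fillvalue=PAD_token):
    # zip_longest transposes the batch to (max_length, batch_size) and
    # fills the sentences that run out early with the padding value.
    return [list(step) for step in
            itertools.zip_longest(*index_batch, fillvalue=fillvalue)]


# Three sentences of word indexes, sorted by decreasing length
# (the default expectation of pack_padded_sequence), each ending in EOS (2).
batch = [[5, 6, 7, 2], [9, 4, 2], [8, 2]]
lengths = [len(sentence) for sentence in batch]
padded = zero_padding(batch)

print(lengths)  # [4, 3, 2]
for step in padded:
    print(step)
```

Each printed row is one time step across the whole batch, so the padding shows up as zeros at the bottom of the shorter columns.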

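#### Hybrid Frontend Notes

The encoder's `forward` function contains no data-dependent control flow, so we will use tracing to convert it to Torch Script. When tracing a module, the module definition can be left as is.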

In [3]:
class EncoderRNN(nn.Module):
    def __init__(self, hidden_size, embedding, n_layers=1, dropout=0):
        super(EncoderRNN, self).__init__()
        self.n_layers = n_layers
        self.hidden_size = hidden_size
        self.embedding = embedding

        # Initialize GRU; the input_size and hidden_size params are both set to 'hidden_size'
        #   because our input size is a word embedding with number of features == hidden_size
        self.gru = nn.GRU(hidden_size, hidden_size, n_layers,
                          dropout=(0 if n_layers == 1 else dropout), bidirectional=True)

    def forward(self, input_seq, input_lengths, hidden=None):
        # Convert word indexes to embeddings
        embedded = self.embedding(input_seq)
        # Pack padded batch of sequences for RNN module
        packed = torch.nn.utils.rnn.pack_padded_sequence(embedded, input_lengths)
        # Forward pass through GRU
        outputs, hidden = self.gru(packed, hidden)
        # Unpack padding
        outputs, _ = torch.nn.utils.rnn.pad_packed_sequence(outputs)
        # Sum bidirectional GRU outputs
        outputs = outputs[:, :, :self.hidden_size] + outputs[:, :, self.hidden_size:]
        # Return output and final hidden state
        return outputs, hidden
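
The pack, run, unpack, and sum sequence in `forward` can be tried in isolation on a tiny batch. This is a sketch under assumed toy sizes (a 10-word vocabulary, `hidden_size = 4`, a single GRU layer), not the configuration of the pretrained model.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
hidden_size = 4
embedding = nn.Embedding(10, hidden_size)
gru = nn.GRU(hidden_size, hidden_size, bidirectional=True)

# Shape (max_length=3, batch_size=2); the second sentence is padded with 0.
input_seq = torch.tensor([[5, 8], [6, 2], [2, 0]])
input_lengths = torch.tensor([3, 2])  # must be on the CPU, longest first

packed = pack_padded_sequence(embedding(input_seq), input_lengths)
outputs, hidden = gru(packed)
outputs, _ = pad_packed_sequence(outputs)

# Forward and backward features are concatenated on the last dimension.
print(outputs.shape)  # torch.Size([3, 2, 8])
summed = outputs[:, :, :hidden_size] + outputs[:, :, hidden_size:]
print(summed.shape)   # torch.Size([3, 2, 4])
print(hidden.shape)   # torch.Size([2, 2, 4]): one entry per direction
```

Because the batch was packed, the GRU never processes the padded position, and `pad_packed_sequence` fills that position of the output with zeros.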

### Define Decoder's Attention Module

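Next, we define our attention module (`Attn`). Note that this module will be used as a submodule in our decoder model. Luong et al. consider various "score functions", which take the current decoder RNN output and the entire encoder output and return attention "energies". The attention energies contain one value per encoder output position. They are normalized with softmax and multiplied with the encoder outputs, producing a weighted sum in which the largest weights mark the most important parts of the query sentence at a particular decoding time step.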
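
For reference, the three score functions that the `dot`, `general`, and `concat` methods of `Attn` correspond to can be written as follows. The notation is an assumption chosen for this note: $h_t$ is the current decoder output, $\bar{h}_s$ is the encoder output at source position $s$, and $W_a$ and $v_a$ are learned parameters. The bias terms that `torch.nn.Linear` adds in the code are omitted.

```latex
\mathrm{score}(h_t, \bar{h}_s) =
\begin{cases}
h_t^\top \bar{h}_s & \text{(dot)} \\
h_t^\top W_a \bar{h}_s & \text{(general)} \\
v_a^\top \tanh\left( W_a [h_t ; \bar{h}_s] \right) & \text{(concat)}
\end{cases}
\qquad
a_t(s) = \frac{\exp\left( \mathrm{score}(h_t, \bar{h}_s) \right)}
              {\sum_{s'} \exp\left( \mathrm{score}(h_t, \bar{h}_{s'}) \right)}
```

The second expression is the softmax over source positions that produces the attention weights returned by `forward`.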

In [4]:
# Luong attention layer
class Attn(torch.nn.Module):
    def __init__(self, method, hidden_size):
        super(Attn, self).__init__()
        self.method = method
        if self.method not in ['dot', 'general', 'concat']:
            raise ValueError(self.method, "is not an appropriate attention method.")
        self.hidden_size = hidden_size
        if self.method == 'general':
            self.attn = torch.nn.Linear(self.hidden_size, hidden_size)
        elif self.method == 'concat':
            self.attn = torch.nn.Linear(self.hidden_size * 2, hidden_size)
            self.v = torch.nn.Parameter(torch.FloatTensor(hidden_size))

    def dot_score(self, hidden, encoder_output):
        return torch.sum(hidden * encoder_output, dim=2)

    def general_score(self, hidden, encoder_output):
        energy = self.attn(encoder_output)
        return torch.sum(hidden * energy, dim=2)

    def concat_score(self, hidden, encoder_output):
        energy = self.attn(torch.cat((hidden.expand(encoder_output.size(0), -1, -1), encoder_output), 2)).tanh()
        return torch.sum(self.v * energy, dim=2)

    def forward(self, hidden, encoder_outputs):
        # Calculate the attention weights (energies) based on the given method
        if self.method == 'general':
            attn_energies = self.general_score(hidden, encoder_outputs)
        elif self.method == 'concat':
            attn_energies = self.concat_score(hidden, encoder_outputs)
        elif self.method == 'dot':
            attn_energies = self.dot_score(hidden, encoder_outputs)

        # Transpose max_length and batch_size dimensions
        attn_energies = attn_energies.t()

        # Return the softmax normalized probability scores (with added dimension)
        return F.softmax(attn_energies, dim=1).unsqueeze(1)

### Define Decoder

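Similarly to the `EncoderRNN`, we use the `torch.nn.GRU` module for our decoder's RNN. This time, however, we use a unidirectional GRU. It is important to note that, unlike the encoder, we feed the decoder RNN one word at a time. We start by getting the embedding of the current word and applying dropout. Next, we forward the embedding and the last hidden state to the GRU and obtain the current GRU output and hidden state. We then use our `Attn` module as a layer to obtain the attention weights, which we multiply by the encoder's outputs to obtain our attended encoder output. We use this attended encoder output as our context vector, a weighted sum indicating which parts of the encoder's output to pay attention to. From here, we use a linear layer and softmax normalization to select the next word in the output sequence.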

#### Hybrid Frontend Notes

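Similarly to the `EncoderRNN`, this module does not contain any data-dependent control flow. Therefore, we can once again use tracing to convert this model to Torch Script after it is initialized and its parameters are loaded.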

In [5]:
class LuongAttnDecoderRNN(nn.Module):
    def __init__(self, attn_model, embedding, hidden_size, output_size, n_layers=1, dropout=0.1):
        super(LuongAttnDecoderRNN, self).__init__()

        # Keep for reference
        self.attn_model = attn_model
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.n_layers = n_layers
        self.dropout = dropout

        # Define layers
        self.embedding = embedding
        self.embedding_dropout = nn.Dropout(dropout)
        self.gru = nn.GRU(hidden_size, hidden_size, n_layers, dropout=(0 if n_layers == 1 else dropout))
        self.concat = nn.Linear(hidden_size * 2, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)

        self.attn = Attn(attn_model, hidden_size)

    def forward(self, input_step, last_hidden, encoder_outputs):
        # Note: we run this one step (word) at a time
        # Get embedding of current input word
        embedded = self.embedding(input_step)
        embedded = self.embedding_dropout(embedded)
        # Forward through unidirectional GRU
        rnn_output, hidden = self.gru(embedded, last_hidden)
        # Calculate attention weights from the current GRU output
        attn_weights = self.attn(rnn_output, encoder_outputs)
        # Multiply attention weights to encoder outputs to get new "weighted sum" context vector
        context = attn_weights.bmm(encoder_outputs.transpose(0, 1))
        # Concatenate weighted context vector and GRU output using Luong eq. 5
        rnn_output = rnn_output.squeeze(0)
        context = context.squeeze(1)
        concat_input = torch.cat((rnn_output, context), 1)
        concat_output = torch.tanh(self.concat(concat_input))
        # Predict next word using Luong eq. 6
        output = self.out(concat_output)
        output = F.softmax(output, dim=1)
        # Return output and final hidden state
        return output, hidden
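
The attention step inside `forward` is mostly a matter of tensor shapes, so it can help to run it on random data. The sketch below uses the `dot` score with assumed toy sizes (`max_len = 5`, `batch = 2`, `hidden = 4`) and checks that the `bmm` call is the same as an explicit weighted sum over encoder positions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
max_len, batch, hidden = 5, 2, 4
encoder_outputs = torch.randn(max_len, batch, hidden)
rnn_output = torch.randn(1, batch, hidden)  # one decoder step

# Dot score: one energy per encoder position and batch element.
energies = torch.sum(rnn_output * encoder_outputs, dim=2)      # (max_len, batch)
attn_weights = F.softmax(energies.t(), dim=1).unsqueeze(1)     # (batch, 1, max_len)

# Batched matrix product: weights times encoder outputs gives the context.
context = attn_weights.bmm(encoder_outputs.transpose(0, 1))    # (batch, 1, hidden)

# The same context written as an explicit weighted sum over positions.
weights = attn_weights.squeeze(1).t().unsqueeze(2)             # (max_len, batch, 1)
manual = (weights * encoder_outputs).sum(dim=0)                # (batch, hidden)

print(context.shape)                                           # torch.Size([2, 1, 4])
print(torch.allclose(context.squeeze(1), manual, atol=1e-6))   # True
```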

### Define Evaluation


#### Greedy Search Decoder

#### Hybrid Frontend Notes:

In [6]:
class GreedySearchDecoder(torch.jit.ScriptModule):
    def __init__(self, encoder, decoder, decoder_n_layers):
        super(GreedySearchDecoder, self).__init__()
        self.encoder = encoder
        self.decoder = decoder
        self._device = device
        self._SOS_token = SOS_token
        self._decoder_n_layers = decoder_n_layers

    __constants__ = ['_device', '_SOS_token', '_decoder_n_layers']

    @torch.jit.script_method
    def forward(self, input_seq : torch.Tensor, input_length : torch.Tensor, max_length : int):
        # Forward input through encoder model
        encoder_outputs, encoder_hidden = self.encoder(input_seq, input_length)
        # Prepare encoder's final hidden layer to be first hidden input to the decoder
        decoder_hidden = encoder_hidden[:self._decoder_n_layers]
        # Initialize decoder input with SOS_token
        decoder_input = torch.ones(1, 1, device=self._device, dtype=torch.long) * self._SOS_token
        # Initialize tensors to append decoded words to
        all_tokens = torch.zeros([0], device=self._device, dtype=torch.long)
        all_scores = torch.zeros([0], device=self._device)
        # Iteratively decode one word token at a time
        for _ in range(max_length):
            # Forward pass through decoder
            decoder_output, decoder_hidden = self.decoder(decoder_input, decoder_hidden, encoder_outputs)
            # Obtain most likely word token and its softmax score
            decoder_scores, decoder_input = torch.max(decoder_output, dim=1)
            # Record token and score
            all_tokens = torch.cat((all_tokens, decoder_input), dim=0)
            all_scores = torch.cat((all_scores, decoder_scores), dim=0)
            # Prepare current token to be next decoder input (add a dimension)
            decoder_input = torch.unsqueeze(decoder_input, 0)
        # Return collections of word tokens and scores
        return all_tokens, all_scores
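
The loop in `forward` is plain greedy search: at every step it keeps the single most likely token and feeds that token back in as the next input. The pure-Python sketch below shows the same loop without any tensors. The lookup table `TOY_TABLE` and the function `toy_decoder_step` are hypothetical stand-ins for the decoder, over a made-up five-word vocabulary.

```python
SOS_token = 1
EOS_token = 2

# Hypothetical decoder: maps the previous token to a probability
# distribution over a five-word vocabulary (indexes 0 to 4).
TOY_TABLE = {
    1: [0.0, 0.0, 0.1, 0.7, 0.2],    # after SOS, token 3 is most likely
    3: [0.0, 0.0, 0.2, 0.2, 0.6],    # after token 3, token 4
    4: [0.0, 0.0, 0.9, 0.05, 0.05],  # after token 4, EOS
    2: [0.0, 0.0, 1.0, 0.0, 0.0],    # after EOS, EOS again
}


def toy_decoder_step(prev_token):
    return TOY_TABLE[prev_token]


def greedy_decode(max_length):
    tokens, scores = [], []
    decoder_input = SOS_token
    for _ in range(max_length):
        probs = toy_decoder_step(decoder_input)
        score = max(probs)                  # highest probability at this step
        decoder_input = probs.index(score)  # its token becomes the next input
        tokens.append(decoder_input)
        scores.append(score)
    return tokens, scores


tokens, scores = greedy_decode(max_length=5)
print(tokens)  # [3, 4, 2, 2, 2]
print(scores)  # [0.7, 0.6, 0.9, 1.0, 1.0]
```

As in `GreedySearchDecoder`, the loop always runs for `max_length` steps and does not stop at the first EOS. The extra EOS tokens are filtered out afterwards, when the response is printed.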

#### Evaluating an Input

In [7]:
def evaluate(encoder, decoder, searcher, voc, sentence, max_length=MAX_LENGTH):
    ### Format input sentence as a batch
    # words -> indexes
    indexes_batch = [indexesFromSentence(voc, sentence)]
    # Create lengths tensor
    lengths = torch.tensor([len(indexes) for indexes in indexes_batch])
    # Transpose dimensions of batch to match models' expectations
    input_batch = torch.LongTensor(indexes_batch).transpose(0, 1)
    # Use appropriate device
    input_batch = input_batch.to(device)
    lengths = lengths.to(device)
    # Decode sentence with searcher
    tokens, scores = searcher(input_batch, lengths, max_length)
    # indexes -> words
    decoded_words = [voc.index2word[token.item()] for token in tokens]
    return decoded_words


# Evaluate inputs from user input (stdin)
def evaluateInput(encoder, decoder, searcher, voc):
    input_sentence = ''
    while True:
        try:
            # Get input sentence
            input_sentence = input('> ')
            # Check if it is quit case
            if input_sentence == 'q' or input_sentence == 'quit':
                break
            # Normalize sentence
            input_sentence = normalizeString(input_sentence)
            # Evaluate sentence
            output_words = evaluate(encoder, decoder, searcher, voc, input_sentence)
            # Format and print response sentence
            output_words[:] = [x for x in output_words if not (x == 'EOS' or x == 'PAD')]
            print('Bot:', ' '.join(output_words))

        except KeyError:
            print("Error: Encountered unknown word.")

# Normalize input sentence and call evaluate()
def evaluateExample(sentence, encoder, decoder, searcher, voc):
    print("> " + sentence)
    # Normalize sentence
    input_sentence = normalizeString(sentence)
    # Evaluate sentence
    output_words = evaluate(encoder, decoder, searcher, voc, input_sentence)
    output_words[:] = [x for x in output_words if not (x == 'EOS' or x == 'PAD')]
    print('Bot:', ' '.join(output_words))

### Load Pretrained Parameters

In [8]:
save_dir = os.path.join("data", "save")
corpus_name = "cornell movie-dialogs corpus"

# Configure models
model_name = 'cb_model'
attn_model = 'dot'
#attn_model = 'general'
#attn_model = 'concat'
hidden_size = 500
encoder_n_layers = 2
decoder_n_layers = 2
dropout = 0.1
batch_size = 64

# If you're loading your own model
# Set checkpoint to load from
checkpoint_iter = 4000
# loadFilename = os.path.join(save_dir, model_name, corpus_name,
#                             '{}-{}_{}'.format(encoder_n_layers, decoder_n_layers, hidden_size),
#                             '{}_checkpoint.tar'.format(checkpoint_iter))

# If you're loading the hosted model

loadFilename = './data/4000_checkpoint.tar'


# Load model
# Force CPU device options (to match tensors in this tutorial)
checkpoint = torch.load(loadFilename, map_location=torch.device('cpu'))
encoder_sd = checkpoint['en']
decoder_sd = checkpoint['de']
encoder_optimizer_sd = checkpoint['en_opt']
decoder_optimizer_sd = checkpoint['de_opt']
embedding_sd = checkpoint['embedding']
voc = Voc(corpus_name)
voc.__dict__ = checkpoint['voc_dict']


print('Building encoder and decoder ...')
# Initialize word embeddings
embedding = nn.Embedding(voc.num_words, hidden_size)
embedding.load_state_dict(embedding_sd)
# Initialize encoder & decoder models
encoder = EncoderRNN(hidden_size, embedding, encoder_n_layers, dropout)
decoder = LuongAttnDecoderRNN(attn_model, embedding, hidden_size, voc.num_words, decoder_n_layers, dropout)
# Load trained model params
encoder.load_state_dict(encoder_sd)
decoder.load_state_dict(decoder_sd)
# Use appropriate device
encoder = encoder.to(device)
decoder = decoder.to(device)
# Set dropout layers to eval mode
encoder.eval()
decoder.eval()
print('Models built and ready to go!')

Building encoder and decoder ...
Models built and ready to go!


### Convert Model to Torch Script

#### Encoder

#### Decoder

#### GreedySearchDecoder

In [9]:
### Convert encoder model
# Create artificial inputs
test_seq = torch.LongTensor(MAX_LENGTH, 1).random_(0, voc.num_words)
test_seq_length = torch.LongTensor([test_seq.size()[0]])
# Trace the model
traced_encoder = torch.jit.trace(encoder, (test_seq, test_seq_length))

### Convert decoder model
# Create and generate artificial inputs
test_encoder_outputs, test_encoder_hidden = traced_encoder(test_seq, test_seq_length)
test_decoder_hidden = test_encoder_hidden[:decoder.n_layers]
test_decoder_input = torch.LongTensor(1, 1).random_(0, voc.num_words)
# Trace the model
traced_decoder = torch.jit.trace(decoder, (test_decoder_input, test_decoder_hidden, test_encoder_outputs))

### Initialize searcher module
scripted_searcher = GreedySearchDecoder(traced_encoder, traced_decoder, decoder.n_layers)
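
The pattern used here, traced submodules called from a scripted loop, can be reproduced on a toy module. The sketch below is an illustration under assumptions: the `Repeater` class and its sizes are hypothetical, and it uses `torch.jit.script` on an ordinary `nn.Module` instance (available in PyTorch 1.2 and later) instead of the `torch.jit.ScriptModule` base class used in this tutorial.

```python
import torch
import torch.nn as nn


class Repeater(nn.Module):
    """Applies a submodule n times; the loop count is only known at run time."""

    def __init__(self, step):
        super().__init__()
        self.step = step

    def forward(self, x: torch.Tensor, n: int) -> torch.Tensor:
        for _ in range(n):
            x = self.step(x)
        return x


torch.manual_seed(0)
step = nn.Linear(3, 3).eval()

# The loop-free submodule is traced ...
traced_step = torch.jit.trace(step, torch.randn(1, 3))
# ... and the wrapper with the loop is scripted, so the loop is preserved.
scripted = torch.jit.script(Repeater(traced_step))

x = torch.randn(1, 3)
expected = step(step(x))
print(torch.allclose(scripted(x, 2), expected))  # True
```

Comparing the converted module against the eager modules on a fresh input, as in the last line, is a cheap sanity check that can be applied to the traced encoder and decoder as well.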

### Print Graphs

In [10]:
print('scripted_searcher graph:\n', scripted_searcher.graph)

scripted_searcher graph:
 graph(%input_seq : Tensor
      %input_length : Tensor
      %max_length : int
      %3 : Tensor
      %4 : Tensor
      %5 : Tensor
      %6 : Tensor
      %7 : Tensor
      %8 : Tensor
      %9 : Tensor
      %10 : Tensor
      %11 : Tensor
      %12 : Tensor
      %13 : Tensor
      %14 : Tensor
      %15 : Tensor
      %16 : Tensor
      %17 : Tensor
      %18 : Tensor
      %19 : Tensor
      %118 : Tensor
      %119 : Tensor
      %120 : Tensor
      %121 : Tensor
      %122 : Tensor
      %123 : Tensor
      %124 : Tensor
      %125 : Tensor
      %126 : Tensor
      %127 : Tensor
      %128 : Tensor
      %129 : Tensor
      %130 : Tensor) {
  %58 : int = prim::Constant[value=9223372036854775807](), scope: EncoderRNN
  %53 : float = prim::Constant[value=0](), scope: EncoderRNN
  %43 : float = prim::Constant[value=0.1](), scope: EncoderRNN/GRU[gru]
  %42 : int = prim::Constant[value=2](), scope: EncoderRNN/GRU[gru]
  %41 : bool = prim::Constant[value=1]

### Run Evaluation

In [11]:
# Evaluate examples
sentences = ["hello", "what's up?", "who are you?", "where am I?", "where are you from?"]
for s in sentences:
    evaluateExample(s, traced_encoder, traced_decoder, scripted_searcher, voc)

# Evaluate your input
#evaluateInput(traced_encoder, traced_decoder, scripted_searcher, voc)

> hello
Bot: hello .
> what's up?
Bot: i m going to get my car .
> who are you?
Bot: i m the owner .
> where am I?
Bot: in the house .
> where are you from?
Bot: south america .


### Save Model

In [12]:
scripted_searcher.save("scripted_chatbot.pth")
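
A saved Torch Script module can be loaded back with `torch.jit.load`, without the Python class definitions that created it. The following round trip is a minimal sketch on a toy module; the file name `toy_model.pth` and the `nn.Linear` model are placeholders, not the chatbot itself.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2).eval()
scripted = torch.jit.script(model)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.pth")  # hypothetical file name
    scripted.save(path)
    # map_location keeps the loaded module on the CPU
    loaded = torch.jit.load(path, map_location="cpu")

x = torch.randn(3, 4)
print(torch.allclose(loaded(x), model(x)))  # True
```

The same call, pointed at `scripted_chatbot.pth`, should return a module that accepts the `(input_seq, input_length, max_length)` arguments of `GreedySearchDecoder.forward`.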