# Note

This notebook demonstrates training a model to explain slang words from their context, in the following steps:


1.  Load a .tsv file that contains the target word, its explanation, and an example sentence using the word. While loading, build the input and output languages.
2.  Set up the encoders, combiner, and decoder.
3.  Train the model.
4.  Evaluate the model.

We developed this notebook to run in Google Colab, which offers a free GPU for up to 12 hours. Make sure the GPU is enabled under Edit > Notebook settings. Some setup must also be run at the beginning, including installing PyTorch and mounting Google Drive.


# Colab Setup

## Install PyTorch and Download NLTK resource file

To run this notebook in Google Colab, PyTorch must be installed in the instance running the notebook.

In [0]:
!pip3 install torch torchvision -U

Requirement already up-to-date: torch in /usr/local/lib/python3.6/dist-packages (1.0.0)
Requirement already up-to-date: torchvision in /usr/local/lib/python3.6/dist-packages (0.2.1)


In [0]:
from __future__ import unicode_literals, print_function, division
from io import open
import unicodedata
import string
import re
import random

import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F

import nltk
nltk.download('averaged_perceptron_tagger')

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


Verify that CUDA is enabled:

In [0]:
device

device(type='cuda')

## Mount Google Drive for data

- The data file must be placed in Google Drive so it can be loaded in the Colab environment.
- The Google Drive space is also used to store and reload the model.
- Set DRIVE_PATH to the directory that contains the data.


In [0]:
import os

MOUNT_PATH = '/content/drive/'
DRIVE_PATH = "My Drive/cs410-explain"
LOCAL_PATH = os.path.join(MOUNT_PATH, DRIVE_PATH)

In [0]:
from google.colab import drive
drive.mount(MOUNT_PATH)

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


In [0]:
!ls "$LOCAL_PATH"


# Data Loading

## Language Definition
A language (`Lang`) holds the word, character, and POS tag index tables. These tables map each input word, character, or tag to an index, which is then embedded as a dense vector.

In [0]:
SOS_token = 0
EOS_token = 1


class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS
        self.char2index = {}
        self.char2count = {}
        self.index2char = {0: "SOS", 1: "EOS"}
        self.n_chars = 2
        self.tag2index = {}
        self.n_tags = 2

    def addSentence(self, sentence):
        sentence = sentence.strip()
        for word in sentence.split(' '):
            self.addWord(word)
        for char in sentence:
            if char not in self.char2index:
                self.char2index[char] = self.n_chars
                self.char2count[char] = 1
                self.index2char[self.n_chars] = char
                self.n_chars += 1
            else:
                self.char2count[char] += 1
        for word, tag in nltk.pos_tag(sentence.split(' ')):
            if tag not in self.tag2index:
                self.tag2index[tag] = self.n_tags
                self.n_tags += 1

    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.n_words
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1

    def containsAllWords(self, sentence):
        for word in sentence.split(' '):
            if word not in self.word2index:
                return False
        return True

## Preprocessing

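While the data is loaded, the text is preprocessed as follows:
*   Normalize each string: convert it to ASCII, lowercase it, strip surrounding whitespace, and replace every run of non-letter characters with a single space.
*   Filter examples by word length and sentence length. Keep only examples whose slang word is a single token that appears in its example sentence.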



In [0]:
# Turn a Unicode string to plain ASCII, thanks to
# http://stackoverflow.com/a/518232/2809427
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )

# Lowercase, trim, and remove non-letter characters
def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"[^a-z]+", r" ", s)
    s = re.sub(r"\s+", r" ", s)
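    # NOTE: str.strip() returns a new string, so this line has no effect and a
    # normalized string can keep a trailing space. Changing it to s = s.strip()
    # would likely change which examples pass filterExample(), and therefore the
    # vocabulary sizes that the trained checkpoints later in this notebook use.
    s.strip()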
    return s
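
To see what the normalization produces, the sketch below re-implements the same steps as a standalone function. The name `normalize` is hypothetical and not part of the notebook. The final `s.strip()` in `normalizeString` has no effect, so punctuation at the end of a string leaves a trailing space. This matches the `'part donkey part unicorn '` example printed by `prepareData()`.

```python
import re
import unicodedata

def unicode_to_ascii(s):
    # Decompose accented characters (NFD) and drop the combining marks (category 'Mn').
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

def normalize(s):
    # Same steps as normalizeString(): ASCII, lowercase, outer strip,
    # then every run of non-letters becomes a single space.
    s = unicode_to_ascii(s.lower().strip())
    s = re.sub(r"[^a-z]+", " ", s)
    s = re.sub(r"\s+", " ", s)
    return s  # no final strip, matching the notebook's behavior

for raw in ["Café au lait", "Part donkey, part unicorn!", "It's LIT 100%"]:
    print(repr(normalize(raw)))
```

Digits are treated like punctuation, so `"It's LIT 100%"` becomes `'it s lit '`.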

Place this dataset, https://drive.google.com/open?id=1Woa5zRbgpyNd3AA7ZMJXeSuvD4WaRaha, in the Drive directory given by `LOCAL_PATH`. The code reads it as `test.tsv`.

In [0]:
def readLangs():
    print("Reading lines...")

    # Read the file and split into lines (the file is closed automatically)
    with open(os.path.join(LOCAL_PATH, 'test.tsv'), encoding='utf-8') as f:
        lines = f.read().strip().split('\n')

    # Split every line into [word, explanation, example sentence] and normalize each field
    examples = [[normalizeString(s) for s in l.split('\t')[:3]] for l in lines]
    input_lang = Lang('slang')
    output_lang = Lang('explain')

    return input_lang, output_lang, examples

In [0]:
MAX_SENTENCE_LENGTH = 25
MAX_WORD_LENGTH = 10

def filterExample(p):
    return len(p) == 3 and \
        p[0].strip() and p[1].strip() and p[2].strip() and \
        len(p[2].split(' ')) < MAX_SENTENCE_LENGTH and \
        len(p[1].split(' ')) < MAX_SENTENCE_LENGTH and \
        len(p[0]) < MAX_WORD_LENGTH and \
        len(p[0].split(' ')) == 1 and \
        p[0] in p[2].split(' ')


def filterExamples(examples):
    return [example for example in examples if filterExample(example)]
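
The sketch below checks the filtering rules on a few toy triples. It is a standalone mirror of `filterExample()` that uses the same constants. `keep_example` is a hypothetical name, and the triples are made up except for the `decker` example printed during loading.

```python
MAX_SENTENCE_LENGTH = 25
MAX_WORD_LENGTH = 10

def keep_example(p):
    """Mirror of filterExample(): True if [word, explanation, sentence] is usable."""
    if len(p) != 3:
        return False                                              # malformed line
    word, explanation, sentence = p
    return bool(
        word.strip() and explanation.strip() and sentence.strip()  # no empty fields
        and len(sentence.split(' ')) < MAX_SENTENCE_LENGTH         # short example sentence
        and len(explanation.split(' ')) < MAX_SENTENCE_LENGTH      # short explanation
        and len(word) < MAX_WORD_LENGTH                            # short word (characters)
        and len(word.split(' ')) == 1                              # single token
        and word in sentence.split(' ')                            # word occurs in its example
    )

print(keep_example(['decker', 'part donkey part unicorn ', 'unicorn decker']))  # True
print(keep_example(['cool cat', 'a cool person', 'he is a cool cat']))          # False
```

Token counts use `split(' ')`, so a trailing space adds an empty token. An explanation of 24 words is kept, but the same explanation with a trailing space counts as 25 tokens and is rejected.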

## Loading
The following code loads the data, prints a brief summary, and shows one random example.

In [0]:
MAX_WORDS_IN_LANG = 10000

def prepareData():
    input_lang, output_lang, examples = readLangs()
    print("Read %s sentence examples" % len(examples))
    examples = filterExamples(examples)
    print("Trimmed to %s sentence pairs" % len(examples))
    print("Counting words...")
    for example in examples:
        input_lang.addSentence(example[2])
        output_lang.addSentence(example[1])
    print("Counted words:")
    print(input_lang.name, input_lang.n_words)
    print(output_lang.name, output_lang.n_words)
    return input_lang, output_lang, examples
  
input_lang, output_lang, examples = prepareData()
print(random.choice(examples))

Reading lines...
Read 75670 sentence examples
Trimmed to 11045 sentence pairs
Counting words...
Counted words:
slang 19525
explain 16048
['decker', 'part donkey part unicorn ', 'unicorn decker']


## Splitting Training/Test examples

In [0]:
import random
random.shuffle(examples)
split_point = int(0.8 * len(examples))
train_examples = examples[:split_point]
test_examples = examples[split_point:]  # the remaining 20%; no example is skipped
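
The notebook makes its split reproducible by pickling it to Drive. A different approach, which the notebook does not use, is to shuffle with a dedicated, seeded `random.Random` instance so that the same split can be regenerated. The sketch below shows this. The function name `split_examples` and the seed value `0` are hypothetical choices.

```python
import random

def split_examples(examples, train_fraction=0.8, seed=0):
    """Shuffle a copy with a fixed seed and split it into (train, test)."""
    shuffled = list(examples)                  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)      # private RNG: same seed -> same order
    split_point = int(train_fraction * len(shuffled))
    return shuffled[:split_point], shuffled[split_point:]  # every item lands in one split

train, test = split_examples([str(i) for i in range(11)])
print(len(train), len(test))  # 8 3
```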

## Saving Training/Test examples 
To make the results reproducible, the training/test examples are saved to Drive together with the input and output languages.

In [0]:
import pickle

with open(os.path.join(LOCAL_PATH, 'saved_train_examples'), 'wb') as f:
  pickle.dump((input_lang, output_lang, train_examples), f)
with open(os.path.join(LOCAL_PATH, 'saved_test_examples'), 'wb') as f:
  pickle.dump((input_lang, output_lang, test_examples), f)

## Loading Training/Test examples 

In [0]:
import pickle

with open(os.path.join(LOCAL_PATH, 'saved_train_examples'), 'rb') as f:
  train_examples = pickle.load(f)[2]
with open(os.path.join(LOCAL_PATH, 'saved_test_examples'), 'rb') as f:
  obj = pickle.load(f)
  test_examples = obj[2]
  input_lang = obj[0]
  output_lang = obj[1]

# Model Setup

## EncoderRNN
This encoder encodes a sequence of words or characters, one token per call. Each call returns the GRU output for that step and the updated hidden state.

In [0]:
class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size

        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)
        output = embedded
        output, hidden = self.gru(output, hidden)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)

## TagEncoderRNN
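This encoder encodes a sequence of (word, POS tag) pairs. At each step, the word embedding is concatenated with a one-hot vector of the word's tag before being fed to the GRU.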

In [0]:
class TagEncoderRNN(nn.Module):
    def __init__(self, input_size, tag_size, hidden_size):
        super(TagEncoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.tag_size = tag_size
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size + tag_size, hidden_size)

    def forward(self, input, tagTensor, hidden):
        embedded = self.embedding(input).view(1, 1, -1)
        tag_one_hot = torch.zeros(1, self.tag_size, device=device).scatter_(1, tagTensor.unsqueeze(1), 1.).view(1, 1, -1)
        concat = torch.cat((embedded, tag_one_hot), 2)
        output, hidden = self.gru(concat, hidden)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)
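
`TagEncoderRNN.forward()` builds a one-hot vector for the tag index with `scatter_` and appends it to the word embedding. As a result, the GRU input has `hidden_size + tag_size` features. The pure-Python sketch below shows the same construction. The toy sizes, values, and function names are illustrative only.

```python
def one_hot(index, size):
    # Same values as torch.zeros(1, size).scatter_(1, [[index]], 1.) for one row.
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def tag_encoder_input(word_embedding, tag_index, tag_size):
    # Concatenate along the feature axis, like torch.cat((embedded, tag_one_hot), 2).
    return word_embedding + one_hot(tag_index, tag_size)

embedding = [0.1, -0.2, 0.3, 0.4]           # toy word embedding (hidden_size = 4)
x = tag_encoder_input(embedding, tag_index=2, tag_size=5)
print(x)  # [0.1, -0.2, 0.3, 0.4, 0.0, 0.0, 1.0, 0.0, 0.0]
```

In PyTorch 1.1 and later, `F.one_hot(tagTensor, self.tag_size).float()` should produce the same one-hot row as the `scatter_` call.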

## AttnDecoderRNN
The decoder generates the explanation one word at a time. Its initial hidden state combines the final hidden states of the character-level and word-level encoders. At each step, it applies attention separately over all outputs of both encoders to predict the next word.

In [0]:
class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.05, 
                 context_max_length=MAX_SENTENCE_LENGTH,
                 character_max_length=MAX_WORD_LENGTH):
        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout_p = dropout_p

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        self.attn = nn.Linear(self.hidden_size * 2, context_max_length)
        self.char_attn = nn.Linear(self.hidden_size * 2, character_max_length)
        self.attn_combine = nn.Linear(self.hidden_size * 3, self.hidden_size)
        self.dropout = nn.Dropout(self.dropout_p)
        self.gru = nn.GRU(self.hidden_size, self.hidden_size)
        self.out = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input, hidden, encoder_outputs, char_encoder_outputs):
        embedded = self.embedding(input).view(1, 1, -1)
        embedded = self.dropout(embedded)

        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)
        attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                                 encoder_outputs.unsqueeze(0))
        
        char_attn_weights = F.softmax(
            self.char_attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)
        char_attn_applied = torch.bmm(char_attn_weights.unsqueeze(0),
                                      char_encoder_outputs.unsqueeze(0))

        output = torch.cat((embedded[0], attn_applied[0], char_attn_applied[0]), 1)
        output = self.attn_combine(output).unsqueeze(0)

        output = F.relu(output)
        output, hidden = self.gru(output, hidden)

        output = F.log_softmax(self.out(output[0]), dim=1)
        return output, hidden, attn_weights

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)
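
Each attention branch in `AttnDecoderRNN` works in three steps:

*   A linear layer maps to a fixed number of scores: `MAX_SENTENCE_LENGTH` for words and `MAX_WORD_LENGTH` for characters.
*   A softmax turns the scores into weights.
*   A batched matrix product takes the weighted sum over the zero-padded encoder outputs.

The inputs always fit these buffers. `filterExample` keeps fewer than 25 words and fewer than 10 characters, so even with the appended EOS token a sequence fills at most 25 and 10 slots. The pure-Python sketch below shows the softmax-and-sum step with made-up scores and vectors.

```python
import math

def softmax(scores):
    m = max(scores)                              # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(scores, encoder_outputs):
    """Weighted sum of encoder output rows, like bmm(attn_weights, encoder_outputs)."""
    weights = softmax(scores)
    dim = len(encoder_outputs[0])
    return [sum(w * row[j] for w, row in zip(weights, encoder_outputs)) for j in range(dim)]

# Toy setting: 4 attention slots but only 2 real encoder steps; the rest stay zero,
# like the torch.zeros(MAX_SENTENCE_LENGTH, hidden_size) buffer filled in train().
outputs = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
context = attend([0.0, 0.0, 0.0, 0.0], outputs)
print(context)  # [0.25, 0.25]
```

The padded slots still receive attention weight (half of it here), but they contribute zero vectors, which scales down the context vector.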

## Combiner
The combiner merges the final hidden states of the word-level and character-level encoders into the decoder's initial hidden state, using a linear layer followed by a ReLU.
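
The combiner can be written as an equation. Here $H$ is `hidden_size`, $h^{w}$ and $h^{c}$ are the final hidden states of the word-level and character-level encoders (flattened to length $H$), and $W$, $b$ are the weight and bias of `self.linear`:

$$
h_0 = \mathrm{ReLU}\left(W \begin{bmatrix} h^{w} \\ h^{c} \end{bmatrix} + b\right),
\qquad W \in \mathbb{R}^{H \times 2H},\quad b \in \mathbb{R}^{H}
$$

The result $h_0$ is reshaped to $(1, 1, H)$ and used as the decoder's initial hidden state.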

In [0]:
class Combiner(nn.Module):
    def __init__(self, hidden_size):
        super(Combiner, self).__init__()
        self.hidden_size = hidden_size
        self.linear = nn.Linear(hidden_size * 2, hidden_size)

    def forward(self, word_hidden, char_hidden):
        output = self.linear(torch.cat((word_hidden.view(-1), char_hidden.view(-1)), 0)).view(1, 1, -1)
        return F.relu(output)

# Model Training

## Setup

The following helpers convert an example's word, character, and POS tag sequences into index tensors for training. Each tensor ends with `EOS_token`.

In [0]:
def indexesFromSentence(lang, sentence):
    sentence = sentence.strip()
    return [lang.word2index[word] for word in sentence.split(' ')]
  
def tagIndexesFromSentence(lang, sentence):
    sentence = sentence.strip()
    return [lang.tag2index[tag] for word, tag in nltk.pos_tag(sentence.split(' '))]


def tensorFromSentence(lang, sentence):
    indexes = indexesFromSentence(lang, sentence)
    indexes.append(EOS_token)
    return torch.tensor(indexes, dtype=torch.long, device=device).view(-1, 1)
  
def tagTensorFromSentence(lang, sentence):
    indexes = tagIndexesFromSentence(lang, sentence)
    indexes.append(EOS_token)
    return torch.tensor(indexes, dtype=torch.long, device=device).view(-1, 1)

def charIndexesFromSentence(lang, sentence):
    sentence = sentence.strip()
    return [lang.char2index[char] for char in sentence]

def charTensorFromSentence(lang, sentence):
    indexes = charIndexesFromSentence(lang, sentence)
    indexes.append(EOS_token)
    return torch.tensor(indexes, dtype=torch.long, device=device).view(-1, 1) 

def tensorsFromExample(example):
    input_tensor = tensorFromSentence(input_lang, example[2])
    input_tag_tensor = tagTensorFromSentence(input_lang, example[2])
    char_input_tensor = charTensorFromSentence(input_lang, example[0])
    target_tensor = tensorFromSentence(output_lang, example[1])
    target_tag_tensor = tagTensorFromSentence(output_lang, example[1])
    return (input_tensor, char_input_tensor, target_tensor, input_tag_tensor)
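
The pure-Python sketch below mimics this conversion on one toy example. It returns flat lists instead of `(-1, 1)` tensors. `build_index` and `to_indexes` are hypothetical names, and `build_index` only imitates how `Lang` assigns ids starting at 2.

In the notebook, the character index of `input_lang` is built from the example sentences (`example[2]`). `charTensorFromSentence` then looks up the characters of the slang word (`example[0]`). The `filterExample` check that the word appears in its sentence is what guarantees these lookups succeed.

```python
SOS_token, EOS_token = 0, 1

def build_index(tokens):
    # Imitates Lang: ids 0 and 1 are reserved for SOS/EOS, new tokens get 2, 3, ...
    index = {}
    for tok in tokens:
        if tok not in index:
            index[tok] = len(index) + 2
    return index

def to_indexes(index, tokens):
    # Like tensorFromSentence()/charTensorFromSentence(): look up, then append EOS.
    return [index[tok] for tok in tokens] + [EOS_token]

word, sentence = "decker", "unicorn decker"
word_index = build_index(sentence.split(' '))  # built from the example sentence
char_index = build_index(sentence)             # its characters, including ' '
print(to_indexes(word_index, sentence.split(' ')))  # [2, 3, 1]
print(to_indexes(char_index, word))                 # [9, 10, 5, 11, 10, 7, 1]
```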

## Train()

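This function takes a single training example, computes the loss, backpropagates the gradients, and updates all four components. With probability `teacher_forcing_ratio`, the decoder is fed the ground-truth previous word (teacher forcing). Otherwise, it is fed its own previous prediction and stops early once it predicts EOS.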

In [0]:
teacher_forcing_ratio = 0.5

def train(input_tensor, char_input_tensor, target_tensor, tag_tensor, encoder, char_encoder, decoder, combiner,
          encoder_optimizer, char_encoder_optimizer, decoder_optimizer, combiner_optimizer, 
          criterion):
    encoder_hidden = encoder.initHidden()
    char_encoder_hidden = char_encoder.initHidden()

    encoder_optimizer.zero_grad()
    char_encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()
    combiner_optimizer.zero_grad()

    input_length = input_tensor.size(0)
    char_input_length = char_input_tensor.size(0)
    target_length = target_tensor.size(0)

    loss = 0
    
    encoder_outputs = torch.zeros(MAX_SENTENCE_LENGTH, encoder.hidden_size, device=device)
    char_encoder_outputs = torch.zeros(MAX_WORD_LENGTH, char_encoder.hidden_size, device=device)
    
    for ei in range(input_length):
        encoder_output, encoder_hidden = encoder(
            input_tensor[ei], tag_tensor[ei], encoder_hidden)
        encoder_outputs[ei] = encoder_output[0, 0]

    for ei in range(char_input_length):
        char_encoder_output, char_encoder_hidden = char_encoder(
            char_input_tensor[ei], char_encoder_hidden)
        char_encoder_outputs[ei] = char_encoder_output[0, 0]

    decoder_input = torch.tensor([[SOS_token]], device=device)
    decoder_hidden = combiner(encoder_hidden, char_encoder_hidden)

    use_teacher_forcing = True if random.random() < teacher_forcing_ratio else False

    if use_teacher_forcing:
        # Teacher forcing: Feed the target as the next input
        for di in range(target_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs, char_encoder_outputs)
            loss += criterion(decoder_output, target_tensor[di])
            decoder_input = target_tensor[di]  # Teacher forcing

    else:
        # Without teacher forcing: use its own predictions as the next input
        for di in range(target_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs, char_encoder_outputs)
            topv, topi = decoder_output.topk(1)
            decoder_input = topi.squeeze().detach()  # detach from history as input

            loss += criterion(decoder_output, target_tensor[di])
            if decoder_input.item() == EOS_token:
                break

    loss.backward()
    encoder_optimizer.step()
    char_encoder_optimizer.step()
    decoder_optimizer.step()
    combiner_optimizer.step()

    return loss.item() / target_length
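
The pure-Python sketch below contrasts the two decoding modes in `train()`. It records which token ids the decoder is fed for one example. `decode_inputs` is a hypothetical helper, and `predict` stands in for the decoder, mapping an input id to the id it would predict.

```python
EOS_token = 1

def decode_inputs(target, predict, use_teacher_forcing, sos_token=0):
    """Return the sequence of decoder inputs fed during one training example."""
    inputs = []
    decoder_input = sos_token
    for gold in target:
        inputs.append(decoder_input)
        predicted = predict(decoder_input)
        if use_teacher_forcing:
            decoder_input = gold           # feed the ground-truth token
        else:
            decoder_input = predicted      # feed the model's own prediction
            if predicted == EOS_token:     # free-running decoding stops at EOS
                break
    return inputs

step = lambda prev: {0: 5, 5: 9}.get(prev, EOS_token)   # toy "model"
print(decode_inputs([5, 6, 7, 1], step, True))   # [0, 5, 6, 7]
print(decode_inputs([5, 6, 7, 1], step, False))  # [0, 5, 9]
```

In the free-running case, only three decoder steps contribute to the loss. `train()` still divides the summed loss by the full `target_length` (4 here).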

## writeModel()
This function saves a checkpoint of the models' and optimizers' current state, along with the iteration count and loss, to Drive.


In [0]:
def writeModel(name, iter, encoder, char_encoder, decoder, combiner, encoder_optimizer, 
               char_encoder_optimizer, decoder_optimizer, combiner_optimizer, loss):
  path = os.path.join(LOCAL_PATH, 'model-%s' % name)
  torch.save({
      'epoch': iter,
      'encoder_state_dict': encoder.state_dict(),
      'encoder_optimizer_state_dict': encoder_optimizer.state_dict(),
      'char_encoder_state_dict': char_encoder.state_dict(),
      'char_encoder_optimizer_state_dict': char_encoder_optimizer.state_dict(),
      'decoder_state_dict': decoder.state_dict(),
      'decoder_optimizer_state_dict': decoder_optimizer.state_dict(),
      'combiner_state_dict': combiner.state_dict(),
      'combiner_optimizer_state_dict': combiner_optimizer.state_dict(),
      'loss': loss,
      }, path)

## loadModel()
This function loads a checkpoint from Drive into the given models and optimizers, and returns the saved iteration count and loss.

In [0]:
def loadModel(name, encoder, char_encoder, decoder, combiner, encoder_optimizer, char_encoder_optimizer,
         decoder_optimizer, combiner_optimizer):
  path = os.path.join(LOCAL_PATH, 'model-%s' % name)
  checkpoint = torch.load(path, map_location=device)  # load onto the current device (GPU or CPU)
  encoder.load_state_dict(checkpoint['encoder_state_dict'])
  encoder_optimizer.load_state_dict(checkpoint['encoder_optimizer_state_dict'])
  char_encoder.load_state_dict(checkpoint['char_encoder_state_dict'])
  char_encoder_optimizer.load_state_dict(checkpoint['char_encoder_optimizer_state_dict'])
  decoder.load_state_dict(checkpoint['decoder_state_dict'])
  decoder_optimizer.load_state_dict(checkpoint['decoder_optimizer_state_dict'])
  combiner.load_state_dict(checkpoint['combiner_state_dict'])
  combiner_optimizer.load_state_dict(checkpoint['combiner_optimizer_state_dict'])
  iter = checkpoint["epoch"]
  loss = checkpoint["loss"]
  return (iter, loss)
  

## trainIters()

This function takes the (word) `encoder`, `char_encoder`, `decoder`, and `combiner` and trains them for `n_iters` iterations by calling `train()`.

Every `print_every` iterations, it prints progress and saves a checkpoint of the models being trained to Drive under the given `name`.

Setting `load=True` and passing the model `name` loads that checkpoint and trains `n_iters` more iterations from the loaded state.





In [0]:
import time
import math
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np


def showPlot(points):
    fig, ax = plt.subplots()
    # this locator puts ticks at regular intervals
    loc = ticker.MultipleLocator(base=0.2)
    ax.yaxis.set_major_locator(loc)
    ax.plot(points)
    plt.show()  # display the loss curve in the notebook

def asMinutes(s):
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)


def timeSince(since, percent):
    now = time.time()
    s = now - since
    es = s / (percent)
    rs = es - s
    return '%s (- %s)' % (asMinutes(s), asMinutes(rs))
  
def trainIters(encoder, char_encoder, decoder, combiner, n_iters, print_every=1000, plot_every=100, learning_rate=0.0001, 
               load=False, name=None):
    start = time.time()
    plot_losses = []
    print_loss_total = 0  # Reset every print_every
    plot_loss_total = 0  # Reset every plot_every

    encoder_optimizer = optim.Adam(encoder.parameters(), lr=learning_rate)
    char_encoder_optimizer = optim.Adam(char_encoder.parameters(), lr=learning_rate)
    decoder_optimizer = optim.Adam(decoder.parameters(), lr=learning_rate)
    combiner_optimizer = optim.Adam(combiner.parameters(), lr=learning_rate)
    criterion = nn.NLLLoss()
    
    start_iter = 0
    start_loss = 0
    
    if load:
      start_iter, start_loss = loadModel(name, encoder, char_encoder, decoder, combiner, 
                                         encoder_optimizer, char_encoder_optimizer,
                                         decoder_optimizer, combiner_optimizer)
      print('loaded model: %s start_iter: %d loss: %s' % (name, start_iter, start_loss))

    for iter in range(1, n_iters + 1):
        training_pair = tensorsFromExample(train_examples[(iter - 1) % len(train_examples)])
        input_tensor = training_pair[0]
        char_input_tensor = training_pair[1]
        target_tensor = training_pair[2]
        tag_tensor = training_pair[3]

        loss = train(input_tensor, char_input_tensor, target_tensor, tag_tensor, encoder, char_encoder, decoder, combiner,
                     encoder_optimizer, char_encoder_optimizer, decoder_optimizer, combiner_optimizer, criterion)
        print_loss_total += loss
        plot_loss_total += loss

        if iter % print_every == 0:
            print_loss_avg = print_loss_total / print_every
            print_loss_total = 0
            print('%s (%d %d%%) %.4f' % (timeSince(start, iter / n_iters),
                                         iter + start_iter, iter / n_iters * 100, print_loss_avg))
            writeModel(name, iter + start_iter, encoder, char_encoder, decoder, combiner, encoder_optimizer, 
               char_encoder_optimizer, decoder_optimizer, combiner_optimizer, print_loss_avg)

        if iter % plot_every == 0:
            plot_loss_avg = plot_loss_total / plot_every
            plot_losses.append(plot_loss_avg)
            plot_loss_total = 0
    showPlot(plot_losses)

## Train models

For this demo, the model is trained for only 1,000 iterations; a good-quality model needs many more. The trained models loaded later in this notebook were trained for about 300,000 iterations.

In [0]:
hidden_size = 512
encoder1 = TagEncoderRNN(input_lang.n_words, input_lang.n_tags, hidden_size).to(device)
encoder2 = EncoderRNN(input_lang.n_chars, hidden_size).to(device)
attn_decoder1 = AttnDecoderRNN(hidden_size, output_lang.n_words).to(device)
combiner = Combiner(hidden_size).to(device)
trainIters(encoder1, encoder2, attn_decoder1, combiner, 1000, print_every=100, load=False, name="medium_tag_rerun2")

0m 11s (- 1m 43s) (100 10%) 6.6759
0m 23s (- 1m 35s) (200 20%) 4.9694
0m 36s (- 1m 25s) (300 30%) 4.9850
0m 49s (- 1m 13s) (400 40%) 5.2649
1m 2s (- 1m 2s) (500 50%) 5.3848
1m 15s (- 0m 50s) (600 60%) 5.1692
1m 30s (- 0m 38s) (700 70%) 6.1309
1m 44s (- 0m 26s) (800 80%) 5.7058
1m 58s (- 0m 13s) (900 90%) 5.6033
2m 11s (- 0m 0s) (1000 100%) 5.5083
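
Each log line above has the form `elapsed (- estimated remaining) (iteration percent%) average loss`. The iteration count includes any loaded `start_iter`, and the loss is averaged over the last `print_every` iterations. `timeSince()` assumes the rate so far stays constant: the estimated total is elapsed / fraction done, and the remaining time is total minus elapsed.

The sketch below reproduces that arithmetic with a fixed elapsed time instead of the clock. The function names are hypothetical.

```python
import math

def as_minutes(s):
    # Same formatting as asMinutes(): whole minutes, then truncated seconds.
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)

def progress(elapsed, fraction_done):
    # Like timeSince(), but takes elapsed seconds instead of reading time.time().
    estimated_total = elapsed / fraction_done
    remaining = estimated_total - elapsed
    return '%s (- %s)' % (as_minutes(elapsed), as_minutes(remaining))

print(progress(62.0, 500 / 1000))  # 1m 2s (- 1m 2s)
```

This matches the form of the 50% line in the log above.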


# Evaluation

## Setup evaluation

In [0]:
def evaluate(encoder, char_encoder, decoder, combiner, sentence, target_words,
             context_max_length=MAX_SENTENCE_LENGTH, character_max_length=MAX_WORD_LENGTH):
    with torch.no_grad():
        input_tensor = tensorFromSentence(input_lang, sentence)
        input_tag_tensor = tagTensorFromSentence(input_lang, sentence)
        input_length = input_tensor.size()[0]
        char_input_tensor = charTensorFromSentence(input_lang, target_words)
        char_input_length = char_input_tensor.size()[0]
        
        encoder_hidden = encoder.initHidden()
        encoder_outputs = torch.zeros(context_max_length, encoder.hidden_size, device=device)
        for ei in range(input_length):
            encoder_output, encoder_hidden = encoder(input_tensor[ei], input_tag_tensor[ei],
                                                     encoder_hidden)
            encoder_outputs[ei] += encoder_output[0, 0]
        
        char_encoder_hidden = char_encoder.initHidden()
        char_encoder_outputs = torch.zeros(character_max_length, char_encoder.hidden_size, device=device)
        for ei in range(char_input_length):
            char_encoder_output, char_encoder_hidden = char_encoder(char_input_tensor[ei],
                                                                    char_encoder_hidden)
            char_encoder_outputs[ei] += char_encoder_output[0, 0]

        decoder_input = torch.tensor([[SOS_token]], device=device)  # SOS
        decoder_hidden = combiner(encoder_hidden, char_encoder_hidden)

        decoded_words = []

        for di in range(context_max_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs, char_encoder_outputs)
            topv, topi = decoder_output.data.topk(1)
            if topi.item() == EOS_token:
                decoded_words.append('<EOS>')
                break
            else:
                decoded_words.append(output_lang.index2word[topi.item()])

            decoder_input = topi.squeeze().detach()

        return decoded_words

In [0]:
def evaluateRandomly(encoder, char_encoder, decoder, combiner, n=10):
    for i in range(n):
        example = test_examples[i]
        print('>', example[0])
        print('=', example[1])
        output_words = evaluate(encoder, char_encoder, decoder, combiner, example[2], example[0])
        output_sentence = ' '.join(output_words)
        print('<', output_sentence)
        print('')

## Qualitative Evaluation

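Compare the predicted and reference explanations for a sample of test examples. The examples were shuffled before the training/test split, so the first `n` test examples form a random sample. Each entry shows the slang word (`>`), the reference explanation (`=`), and the model's prediction (`<`).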

In [0]:
evaluateRandomly(encoder1, encoder2, attn_decoder1, combiner)

> uscg
= united states coast guard 
< the of a of a <EOS>

> slunky
= a combination of a monkey and a slut therefore creating a slunky 
< a who of a <EOS>

> pelham
= a neat town in deep southern westchester you can actually walk to pelham from the train like i did today cool place 
< the of a of a <EOS>

> droideka
= a little snot from orange county with a big nose and no tits 
< a who of a <EOS>

> chezed
= when your mind can t think straight usually from being extremely high originated from ge crew
< the of a of <EOS>

> chelfie
=  chinese selfie to take a selfie with nothing significant around or for no particular reason 
< a who of a <EOS>

> brej
= abbreviation of bredjrin a close friend 
< the of a of <EOS>

> porgie
= a weak heart person who is afraid to fight or has fear of being beaten in a fight 
< a who of a <EOS>

> babam
= a word of excitement is the response to everything
< the of a of <EOS>

> burbsies
= heavy
< the of a of <EOS>



## BLEU

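The model is evaluated quantitatively with the mean sentence-level BLEU score over the test set. With `weights=[1.0]`, `sentence_bleu` uses unigrams only (BLEU-1): clipped unigram precision multiplied by a brevity penalty.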

In [0]:
import nltk

entries = []
count = 0.0
total_score = 0.0
for example in test_examples: 
  reference = example[1]
  target = example[0]
  explain = evaluate(encoder1, encoder2, attn_decoder1, combiner, example[2], example[0])
  explain = [x for x in explain if x not in ('', '<EOS>')]  # drop the EOS marker and empty tokens
  score = nltk.translate.bleu_score.sentence_bleu([reference.split()], explain, weights=[1.0])
  total_score += score
  count += 1.0
  entries.append((score, (target, ' '.join(explain), reference)))
print(total_score / count)

0.061706882237824355


In [0]:
sorted(entries, reverse=True)[:10]

[(0.6, ('raydz', 'the of a of a', 'a protector of a group ')),
 (0.6, ('drullet', 'the of a of a', 'the mullet of a dragon ')),
 (0.5, ('yolology', 'the of a of', 'the study of yolo')),
 (0.5, ('shuffle', 'the of a of', ' the mixing of cards ')),
 (0.5, ('paregoric', 'the of a of', 'a tincture of opium')),
 (0.5, ('keswick', 'the of a of', 'the armpit of ontario ')),
 (0.5, ('jbone', 'a who of a', 'a joint of marijuana')),
 (0.5, ('gebs', 'the of a of', 'the feeling of pooing ')),
 (0.5, ('encrypt', 'the of a of', 'the act of encrypting ')),
 (0.5, ('crapatola', 'the of a of', 'a lot of crap'))]
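
Degenerate outputs such as `the of a of a` still score as high as 0.6. To see why, the sketch below recomputes unigram BLEU by hand for a single reference. It is a simplified re-implementation: the name `bleu1` is hypothetical, and no smoothing function is assumed. Under those assumptions, it should agree with `sentence_bleu(..., weights=[1.0])` for one reference.

```python
import math
from collections import Counter

def bleu1(reference, hypothesis):
    """Unigram BLEU for one reference: clipped unigram precision x brevity penalty."""
    if not hypothesis:
        return 0.0
    ref_counts = Counter(reference)
    # Each hypothesis word is credited at most as often as it occurs in the reference.
    clipped = sum(min(n, ref_counts[w]) for w, n in Counter(hypothesis).items())
    precision = clipped / len(hypothesis)
    r, c = len(reference), len(hypothesis)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * precision

print(bleu1('a protector of a group '.split(), 'the of a of a'.split()))  # 0.6
```

For the `raydz` entry, "of" is credited once and "a" twice, so 3 of 5 words count. Function words such as "the", "of", and "a" occur in many reference explanations, so a model that outputs only them earns substantial unigram credit.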

# Trained Model

*   training_example: https://drive.google.com/open?id=15M1JtB8IyaT-fedyeKRPBlF71z242Az7
*   test_example: https://drive.google.com/open?id=1liTQoeaxVqxyAGWnmDYfrnwU7QiuNK3U

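These are the training/test examples used to train and evaluate the models below. Place them in a `training/` subdirectory of `LOCAL_PATH`; the code reads `training/train_examples` and `training/test_examples`. This ensures evaluation uses the same test set.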



In [0]:
import pickle

input_lang, output_lang, examples = prepareData()
with open(os.path.join(LOCAL_PATH, 'training/train_examples'), 'rb') as f:
  train_examples = pickle.load(f)[2]
with open(os.path.join(LOCAL_PATH, 'training/test_examples'), 'rb') as f:
  obj = pickle.load(f)
  test_examples = obj[2]

Reading lines...
Read 75670 sentence examples
Trimmed to 11045 sentence pairs
Counting words...
Counted words:
slang 19525
explain 16048


Place the trained model that uses POS tags in the Drive data directory (`LOCAL_PATH`). `loadModel()` reads it from the file `model-trained_tag_model`.
Trained model: https://drive.google.com/open?id=19PrgCX6-8ocGD_l4ltVVU-6YsuWz9vwA

In [0]:
hidden_size = 512
trained_tag_encoder = TagEncoderRNN(input_lang.n_words, input_lang.n_tags, hidden_size).to(device)
trained_tag_character_encoder = EncoderRNN(input_lang.n_chars, hidden_size).to(device)
trained_tag_attn_decoder = AttnDecoderRNN(hidden_size, output_lang.n_words).to(device)
trained_tag_combiner = Combiner(hidden_size).to(device)
trainIters(trained_tag_encoder, trained_tag_character_encoder, 
           trained_tag_attn_decoder, trained_tag_combiner, 0, 
           print_every=100, load=True, name="trained_tag_model")

loaded model: trained_tag_model start_iter: 301000 loss: 0.6147021469266243


In [0]:
entries = []
count = 0.0
total_score = 0.0
for example in test_examples: 
  reference = example[1]
  target = example[0]
  explain = evaluate(trained_tag_encoder, trained_tag_character_encoder, 
                     trained_tag_attn_decoder, trained_tag_combiner, example[2], example[0])
  explain = [x for x in explain if x not in ('', '<EOS>')]  # drop the EOS marker and empty tokens
  score = nltk.translate.bleu_score.sentence_bleu([reference.split()], explain, weights=[1.0])
  total_score += score
  count += 1.0
  entries.append((score, (target, ' '.join(explain), reference)))
print(total_score / count)

0.6554927882965749


Place the trained model without POS tags in the same directory. It is read from the file `model-medium-rerun`. Trained model: https://drive.google.com/open?id=1cJY-4KDsh04EBEh6HeX8G4o75sKcL5Bt

In [0]:
hidden_size = 512
trained_nontag_encoder = EncoderRNN(input_lang.n_words, hidden_size).to(device)
trained_nontag_character_encoder = EncoderRNN(input_lang.n_chars, hidden_size).to(device)
trained_nontag_attn_decoder = AttnDecoderRNN(hidden_size, output_lang.n_words).to(device)
trained_nontag_combiner = Combiner(hidden_size).to(device)
trainIters(trained_nontag_encoder, trained_nontag_character_encoder, 
           trained_nontag_attn_decoder, trained_nontag_combiner, 0, 
           print_every=100, load=True, name="medium-rerun")

loaded model: medium-rerun start_iter: 300000 loss: 0.4249174690253783


In [0]:
def evaluate_nontag(encoder, char_encoder, decoder, combiner, sentence, target_words,
             context_max_length=MAX_SENTENCE_LENGTH, character_max_length=MAX_WORD_LENGTH):
    with torch.no_grad():
        input_tensor = tensorFromSentence(input_lang, sentence)
        input_length = input_tensor.size()[0]
        char_input_tensor = charTensorFromSentence(input_lang, target_words)
        char_input_length = char_input_tensor.size()[0]
        
        encoder_hidden = encoder.initHidden()
        encoder_outputs = torch.zeros(context_max_length, encoder.hidden_size, device=device)
        for ei in range(input_length):
            encoder_output, encoder_hidden = encoder(input_tensor[ei],
                                                     encoder_hidden)
            encoder_outputs[ei] += encoder_output[0, 0]
        
        char_encoder_hidden = char_encoder.initHidden()
        char_encoder_outputs = torch.zeros(character_max_length, char_encoder.hidden_size, device=device)
        for ei in range(char_input_length):
            char_encoder_output, char_encoder_hidden = char_encoder(char_input_tensor[ei],
                                                                    char_encoder_hidden)
            char_encoder_outputs[ei] += char_encoder_output[0, 0]

        decoder_input = torch.tensor([[SOS_token]], device=device)  # SOS
        decoder_hidden = combiner(encoder_hidden, char_encoder_hidden)

        decoded_words = []

        for di in range(context_max_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs, char_encoder_outputs)
            topv, topi = decoder_output.data.topk(1)
            if topi.item() == EOS_token:
                decoded_words.append('<EOS>')
                break
            else:
                decoded_words.append(output_lang.index2word[topi.item()])

            decoder_input = topi.squeeze().detach()

        return decoded_words

In [0]:
entries = []
count = 0.0
total_score = 0.0
for example in test_examples: 
  reference = example[1]
  target = example[0]
  explain = evaluate_nontag(trained_nontag_encoder, trained_nontag_character_encoder, 
                     trained_nontag_attn_decoder, trained_nontag_combiner, example[2], example[0])
  explain = [x for x in explain if x not in ('', '<EOS>')]  # drop the EOS marker and empty tokens
  score = nltk.translate.bleu_score.sentence_bleu([reference.split()], explain, weights=[1.0])
  total_score += score
  count += 1.0
  entries.append((score, (target, ' '.join(explain), reference)))
print(total_score / count)

0.4350372324751226


As the evaluation of the two models shows, the POS tag feature improves prediction quality. Mean unigram BLEU on the test set is about 0.655 with POS tags versus about 0.435 without them.