# Practice: Specific Neural Architectures for NLP

**_Yuriy Guts_**

_UCU NLP Summer School, 2018_

Based on a tutorial by Sean Robertson (<https://github.com/spro/practical-pytorch>)

## Task

We will build and train a basic character-level RNN to classify
words. A character-level RNN reads a word as a sequence of characters,
outputting a prediction and a "hidden state" at each step and feeding
the hidden state into the next step. We take the final prediction
to be the output, i.e. which class the word belongs to.

Specifically, we'll train on a few thousand surnames from 18 languages
of origin, and predict which language a name is from based on the
spelling:

```
> Hinton
(-0.47) Scottish
(-1.52) English
(-3.57) Irish

> Schmidhuber
(-0.19) German
(-2.48) Czech
(-2.68) Dutch
```
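
Assuming the scores in parentheses are the log-probabilities produced by the network's LogSoftmax output layer, exponentiating them recovers ordinary probabilities. The snippet below is a small illustration using the "Hinton" scores from the example; the variable names are made up for this sketch.

```python
import math

# Top-3 scores copied from the "Hinton" example above.
log_scores = {"Scottish": -0.47, "English": -1.52, "Irish": -3.57}

# exp() turns a log-probability back into a probability in [0, 1].
probabilities = {label: math.exp(score) for label, score in log_scores.items()}

for label, p in probabilities.items():
    print(f"{label}: {p:.3f}")
```

The three values do not sum to 1 because the remaining 15 classes share the rest of the probability mass.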

## Imports

In [None]:
import glob
import math
import os
import random
import string
import time
import unicodedata

import torch.nn as nn

In [None]:
%matplotlib inline

## Prepare Dataset

The `../data/part2/names` directory contains 18 text files named like `[Country].txt`. Each file contains names, one per line, mostly romanized (but we still need to convert from Unicode to ASCII).

We'll end up with a dictionary mapping each country to its list of names: `{country: [names ...]}`.
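
To make the target structure concrete, here is a minimal sketch that builds the same kind of dictionary from in-memory strings instead of real files. The file contents and the names `fake_files` and `names_by_country` are hypothetical and only stand in for the dataset.

```python
import os

# Hypothetical file contents, standing in for the real data files.
fake_files = {
    "../data/part2/names/Polish.txt": "Kowalski\nNowak\n",
    "../data/part2/names/Irish.txt": "Murphy\nKelly\n",
}

names_by_country = {}
for path, text in fake_files.items():
    # "../data/part2/names/Polish.txt" -> "Polish"
    country = os.path.splitext(os.path.basename(path))[0]
    names_by_country[country] = text.strip().split("\n")

print(names_by_country)
```

The class label comes entirely from the file name, so a misnamed file would silently create a new class.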

In [None]:
data_files = glob.glob('../data/part2/names/*.txt')

In [None]:
unique_characters = string.ascii_letters + " .,;'"

# Turn a Unicode string to plain ASCII, thanks to http://stackoverflow.com/a/518232/2809427
def unicode_to_ascii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in unique_characters
    )

test_name = 'Ślusàrski'
print(test_name, '->', unicode_to_ascii(test_name))
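
It may help to see what the normalization step does to individual characters. The sketch below restates the same filtering logic under the illustrative name `to_ascii` so it runs on its own, then shows one caveat: characters without a canonical decomposition are dropped rather than transliterated.

```python
import string
import unicodedata

allowed = string.ascii_letters + " .,;'"  # same alphabet as unique_characters

def to_ascii(s):
    # Same logic as unicode_to_ascii, restated so this snippet is self-contained.
    return "".join(
        c for c in unicodedata.normalize("NFD", s)
        if unicodedata.category(c) != "Mn" and c in allowed
    )

# NFD splits "Ś" into a base letter plus a combining accent (category "Mn").
print([unicodedata.name(c) for c in unicodedata.normalize("NFD", "Ś")])
# ['LATIN CAPITAL LETTER S', 'COMBINING ACUTE ACCENT']

print(to_ascii("Ślusàrski"))  # Slusarski

# Caveat: "ł" has no decomposition, so it is removed instead of becoming "l".
print(to_ascii("Wałęsa"))  # Waesa
```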

In [None]:
# Build the examples_by_class dictionary: a list of names per country
examples_by_class = {}
unique_classes = []

def read_country_names(filename):
    with open(filename, encoding='utf-8') as f:
        lines = f.read().strip().split('\n')
        return [unicode_to_ascii(line) for line in lines]

for filename in data_files:
    country = os.path.splitext(os.path.basename(filename))[0]
    unique_classes.append(country)
    names = read_country_names(filename)
    examples_by_class[country] = names

In [None]:
unique_characters

In [None]:
unique_classes

Now we have `examples_by_class`, a dictionary mapping each class
(country) to a list of words (names).

In [None]:
print(examples_by_class['Polish'][:5])

## Encode Dataset

Now that we have all the names organized, we need to turn them into Tensors to make any use of them.

To represent a single letter, we use a "one-hot vector" of size `<1 x len(unique_characters)>`. A one-hot vector is filled with zeros except for a 1 at the index of the current letter, e.g. `"b" = <0 1 0 0 0 ...>`.

To make a word, we stack these vectors into a 3D tensor of size `<word_length x 1 x len(unique_characters)>`.

That extra 1 dimension is because PyTorch assumes everything is in batches - we're just using a batch size of 1 here.
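
The layout described above can be illustrated without any framework, using nested lists in place of tensors. This is only a sketch of the shapes involved, not a solution to the exercise; `alphabet`, `one_hot`, and `encode_word` are names invented for the illustration. In the notebook, the same layout would be filled into a tensor created with `torch.zeros(...)`.

```python
import string

alphabet = string.ascii_letters + " .,;'"  # same alphabet as unique_characters

def one_hot(ch):
    # A row of zeros with a single 1.0 at the character's index.
    row = [0.0] * len(alphabet)
    row[alphabet.index(ch)] = 1.0
    return row

def encode_word(word):
    # Shape <word_length x 1 x alphabet_size>: each character is a batch of size 1.
    return [[one_hot(ch)] for ch in word]

encoded = encode_word("Bishop")
print(len(encoded), len(encoded[0]), len(encoded[0][0]))  # 6 1 57
print(one_hot("b")[:5])  # [0.0, 1.0, 0.0, 0.0, 0.0]
```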

In [None]:
import torch

In [None]:
character_to_index = dict(zip(unique_characters, range(len(unique_characters))))
class_to_index = dict(zip(unique_classes, range(len(unique_classes))))
index_to_class = {v: k for k, v in class_to_index.items()}

def character_to_tensor(char):
    # TODO: Your code here.
    # Return a torch tensor for the given character.
    raise NotImplementedError

def word_to_tensor(word):
    # TODO: Your code here.
    # Return a torch tensor for the given word.
    raise NotImplementedError

def class_to_tensor(cls):
    # TODO: Your code here.
    # Return a torch tensor representing the supervised label for the given class.
    raise NotImplementedError

In [None]:
print(character_to_tensor('B'))
print(character_to_tensor('B').size())

In [None]:
print(word_to_tensor('Bishop').size())

## Create Neural Network

In [None]:
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        # TODO: Your code here.
        # Add two linear layers (one for R and one for O function).
        # Add a LogSoftmax layer for the output.

    def forward(self, input, hidden):
        # TODO: Implement forward prop and return the output vector and the hidden state (2 return values).
        raise NotImplementedError

    def init_hidden(self):
        # TODO: Reset the initial state of the RNN (return a tensor with all zeros).
        raise NotImplementedError

In [None]:
n_hidden = 128
rnn = RNN(len(unique_characters), n_hidden, len(unique_classes))

loss_func = nn.NLLLoss()
learning_rate = 0.005

To run a step of this network, we pass an input (in our case, the
tensor for the current character) and the previous hidden state (which we
initialize to zeros at first). We get back the output (the log-probability
of each class) and the next hidden state (which we keep for the next
step).
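
A single step can be sketched in plain Python to show how the data flows. This is an illustration under stated assumptions, not the exercise solution: it assumes both linear layers read the input concatenated with the previous hidden state, applies no nonlinearity to the hidden state, and uses small random weights and a toy hidden size. All names (`linear`, `log_softmax`, `random_layer`, `rnn_step`, the `toy_*` variables) are invented for the sketch.

```python
import math
import random

def linear(weights, bias, x):
    # Dense layer: one dot product per output unit, plus a bias.
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def log_softmax(scores):
    # Subtract the max first for numerical stability.
    m = max(scores)
    log_total = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_total for s in scores]

def random_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def rnn_step(x, state, r_layer, o_layer):
    # Assumption: both layers see the input concatenated with the previous state.
    combined = x + state
    next_state = linear(*r_layer, combined)           # R: next hidden state
    output = log_softmax(linear(*o_layer, combined))  # O: log-probability per class
    return output, next_state

rng = random.Random(0)
toy_input_size, toy_hidden_size, toy_n_classes = 57, 8, 18
r_layer = random_layer(toy_input_size + toy_hidden_size, toy_hidden_size, rng)
o_layer = random_layer(toy_input_size + toy_hidden_size, toy_n_classes, rng)

toy_x = [0.0] * toy_input_size
toy_x[1] = 1.0                         # one-hot vector for a single character
toy_state = [0.0] * toy_hidden_size    # zero initial hidden state
toy_output, toy_state = rnn_step(toy_x, toy_state, r_layer, o_layer)
print(len(toy_output), len(toy_state))  # 18 8
```

Calling `rnn_step` once per character, passing each returned state into the next call, gives the recurrence used throughout this notebook.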

In [None]:
input = character_to_tensor('B')
hidden = torch.zeros(1, n_hidden)
output, next_hidden = rnn(input, hidden)

print(output.size())
print(next_hidden.size())

For the sake of efficiency, we don't want to create a new tensor at
every step, so we will use `word_to_tensor` instead of
`character_to_tensor` and take slices. This could be further optimized by
pre-computing batches of tensors.

In [None]:
input = word_to_tensor('Bishop')
hidden = torch.zeros(1, n_hidden)

output, next_hidden = rnn(input[0], hidden)
print(output.size())

As you can see, the output is a `<1 x len(unique_classes)>` tensor, where
each item is the log-probability of that class (higher is more likely).
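
Turning such an output into a prediction means picking the index with the highest score. The sketch below does this on a plain list with hypothetical scores and labels; on a tensor, `output.topk(k)` returns the same kind of information as values and indices.

```python
# Hypothetical scores for a 4-class problem (labels are illustrative).
labels = ["Czech", "German", "Dutch", "Polish"]
log_probs = [-2.48, -0.19, -2.68, -4.10]

# The index of the largest score is the most likely class.
best = max(range(len(log_probs)), key=lambda i: log_probs[i])
print(labels[best], best)  # German 1

# Top-k: sort the indices by score, highest first, and keep the first k.
top2 = sorted(range(len(log_probs)), key=lambda i: log_probs[i], reverse=True)[:2]
print([labels[i] for i in top2])  # ['German', 'Czech']
```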

In [None]:
def get_random_training_sample():
    cls = random.choice(unique_classes)
    word = random.choice(examples_by_class[cls])
    word_tensor = word_to_tensor(word)
    class_tensor = class_to_tensor(cls)
    return word, cls, word_tensor, class_tensor

Each loop of training will:

-  Create input and target tensors
-  Create a zeroed initial hidden state
-  Read each letter in and keep hidden state for next letter
-  Compare final output to target
-  Back-propagate
-  Return the output and loss
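
For the "compare final output to target" step, the negative log-likelihood loss on a single example is the negated log-probability that the network assigned to the correct class. A minimal sketch with hypothetical scores (the helper name `nll_loss` is invented here):

```python
def nll_loss(log_probs, target_index):
    # Negative log-likelihood of the correct class.
    return -log_probs[target_index]

example_scores = [-0.47, -1.52, -3.57]  # hypothetical log-probabilities

print(nll_loss(example_scores, 0))  # 0.47
print(nll_loss(example_scores, 2))  # 3.57
```

The loss is small when the correct class already has a high probability and grows as the network becomes more confidently wrong.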

In [None]:
def train_iter(word_tensor, class_tensor):
    hidden = rnn.init_hidden()
    rnn.zero_grad()
    
    for i in range(word_tensor.size()[0]):
        output, hidden = rnn(word_tensor[i], hidden)

    loss = loss_func(output, class_tensor)
    loss.backward()

    # Manual SGD step: p <- p - learning_rate * grad
    for p in rnn.parameters():
        p.data.add_(p.grad.data, alpha=-learning_rate)

    return output, loss.item()
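
The parameter update at the end of `train_iter` is plain stochastic gradient descent. As a framework-free sketch, the same rule can be applied to a one-parameter problem whose gradient is known in closed form; `sgd_step` and the toy objective are assumptions made for the illustration.

```python
def sgd_step(params, grads, lr):
    # Move each parameter a small step against its gradient.
    return [p - lr * g for p, g in zip(params, grads)]

# Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = [0.0]
for _ in range(1000):
    grad = [2 * (w[0] - 3)]
    w = sgd_step(w, grad, lr=0.005)

print(w[0])  # approaches the minimiser w = 3
```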

In [None]:
def nn_output_to_class_label(output):
    # TODO: Your code here.
    # Given the output vector of the RNN, return the class string and the class number (2 return values).
    raise NotImplementedError

Now we just have to run that with a bunch of examples. Since
`train_iter` returns both the output and the loss, we can print its
guesses and also keep track of the loss for plotting. Since there are
thousands of examples, we print only every `print_every` examples and
record the average loss every `plot_every` examples.
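
The averaging scheme amounts to accumulating the loss and emitting one mean per fixed-size window. A small standalone sketch with made-up loss values (the function name `windowed_means` is hypothetical):

```python
def windowed_means(values, window):
    # Average each consecutive block of `window` values;
    # a trailing partial block is dropped.
    means = []
    running = 0.0
    for step, value in enumerate(values, start=1):
        running += value
        if step % window == 0:
            means.append(running / window)
            running = 0.0
    return means

print(windowed_means([4.0, 2.0, 3.0, 1.0, 5.0], window=2))  # [3.0, 2.0]
```

Averaging over a window smooths out the noise of single-example losses, which makes the overall trend easier to read in a plot.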




In [None]:
n_iters = 100000
print_every = 5000
plot_every = 1000

# Keep track of losses for plotting
current_loss = 0
all_losses = []

def time_since(since):
    now = time.time()
    s = now - since
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)

start = time.time()

for iter in range(1, n_iters + 1):
    word, cls, word_tensor, class_tensor = get_random_training_sample()
    output, loss = train_iter(word_tensor, class_tensor)
    current_loss += loss

    # Print iter number, loss, name and guess
    if iter % print_every == 0:
        guess, guess_i = nn_output_to_class_label(output)
        correct = '✓' if guess == cls else '✗ (%s)' % cls
        print('%d %d%% (%s) %.4f %s / %s %s' % (iter, iter / n_iters * 100, time_since(start), loss, word, guess, correct))

    # Add current loss avg to list of losses
    if iter % plot_every == 0:
        all_losses.append(current_loss / plot_every)
        current_loss = 0

## Diagnose the Results

Plotting the historical loss from `all_losses` shows the network
learning:

In [None]:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

plt.figure()
plt.plot(all_losses)
plt.show()

In [None]:
# Just return an output given a word tensor.
def nn_output(word_tensor):
    # TODO: Your code here.
    # Reset the hidden state of the neural network and return the output vector for the given word tensor.
    raise NotImplementedError

In [None]:
def predict_top_k_classes(input_word, k=3):
    print()
    print('Predicting:', input_word)
    
    with torch.no_grad():
        output = nn_output(word_to_tensor(input_word))

        # Get the top k classes
        topv, topi = output.topk(k, 1, True)
        predictions = []

        for i in range(k):
            value = topv[0][i].item()
            category_index = topi[0][i].item()
            print('(%.2f) %s' % (value, index_to_class[category_index]))
            predictions.append([value, index_to_class[category_index]])

    return predictions

## Evaluating the Results

To see how well the network performs on different classes, we will
create a confusion matrix that shows, for every actual language (rows),
which language the network guesses (columns). To compute it, we run a
batch of samples through the network with `nn_output()`, which is the
same as `train_iter()` minus the backprop.
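
The counting and row normalization can be sketched with plain lists. The (actual, guess) pairs below are hypothetical, and the helper names are invented; unlike a bare division, this version guards against a class that was never sampled, which would otherwise divide by zero.

```python
def confusion_matrix(pairs, n_classes):
    # counts[actual][guess] = number of times `actual` was predicted as `guess`.
    counts = [[0] * n_classes for _ in range(n_classes)]
    for actual, guess in pairs:
        counts[actual][guess] += 1
    return counts

def normalize_rows(counts):
    # Divide each row by its sum so that every non-empty row adds up to 1.
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized

# Hypothetical (actual, guess) index pairs for a 3-class problem.
pairs = [(0, 0), (0, 1), (1, 1), (1, 1), (0, 0), (0, 0)]
matrix = normalize_rows(confusion_matrix(pairs, 3))

for row in matrix:
    print(row)
# [0.75, 0.25, 0.0]
# [0.0, 1.0, 0.0]
# [0.0, 0.0, 0.0]
```

Values on the diagonal are per-class accuracies; bright off-diagonal cells point to pairs of languages the network confuses.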




In [None]:
# Keep track of correct guesses in a confusion matrix
confusion = torch.zeros(len(unique_classes), len(unique_classes))
n_confusion = 10000

# Go through a bunch of examples and record which are correctly guessed
for i in range(n_confusion):
    word, cls, word_tensor, class_tensor = get_random_training_sample()
    output = nn_output(word_tensor)
    guess, guess_idx = nn_output_to_class_label(output)
    class_idx = class_to_index[cls]
    confusion[class_idx][guess_idx] += 1

# Normalize by dividing every row by its sum
for i in range(len(unique_classes)):
    confusion[i] = confusion[i] / confusion[i].sum()

# Set up plot
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(confusion.numpy())
fig.colorbar(cax)

# Set up axes
ax.set_xticklabels([''] + unique_classes, rotation=90)
ax.set_yticklabels([''] + unique_classes)

# Force label at every tick
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
ax.yaxis.set_major_locator(ticker.MultipleLocator(1))

plt.show()

## Try your own input!

In [None]:
predict_top_k_classes('Shevchenko')