In [1]:
# Please do not change this cell because some hidden tests might depend on it.
import os

# Otter grader does not handle ! commands well, so we define and use our
# own function to execute shell commands.
def shell(commands, warn=True):
    """Executes the string `commands` as a sequence of shell commands.
     
       Prints the result to stdout and returns the exit status. 
       Provides a printed warning on non-zero exit status unless `warn` 
       flag is unset.
    """
    file = os.popen(commands)
    print (file.read().rstrip('\n'))
    exit_status = file.close()
    if warn and exit_status != None:
        print(f"Completed with errors. Exit status: {exit_status}\n")
    return exit_status

shell("""
ls requirements.txt >/dev/null 2>&1
if [ ! $? = 0 ]; then
 rm -rf .tmp
 git clone https://github.com/cs236299-2022-spring/project2.git .tmp
 mv .tmp/requirements.txt ./
 rm -rf .tmp
fi
pip install -q -r requirements.txt
""")




In [2]:
# Initialize Otter
import otter
grader = otter.Notebook()

$$
\renewcommand{\vect}[1]{\mathbf{#1}}
\renewcommand{\cnt}[1]{\sharp(#1)}
\renewcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}}
\renewcommand{\softmax}{\operatorname{softmax}}
\renewcommand{\Prob}{\Pr}
\renewcommand{\given}{\,|\,}
$$

# 236299 - Introduction to Natural Language Processing
## Project 2: Sequence labeling – The slot filling task

# Introduction

The second segment of the project involves a sequence labeling task, in which the goal is to label the tokens in a text. Many NLP tasks have this general form. Most famously is the task of _part-of-speech labeling_ as you explored in lab 2-4, where the tokens in a text are to be labeled with their part of speech (noun, verb, preposition, etc.). In this project segment, however, you'll use sequence labeling to implement a system for filling the slots in a template that is intended to describe the meaning of an ATIS query. For instance, the sentence 

    What's the earliest arriving flight between Boston and Washington DC?
    
might be associated with the following slot-filled template: 

    flight_id
        fromloc.cityname: boston
        toloc.cityname: washington
        toloc.state: dc
        flight_mod: earliest arriving
    
You may wonder how this task is a sequence labeling task. We label each word in the source sentence with a tag taken from a set of tags that correspond to the slot-labels. For each slot-label, say `flight_mod`, there are two tags: `B-flight_mod` and `I-flight_mod`. These are used to mark the beginning (B) or interior (I) of a phrase that fills the given slot. In addition, there is a tag for other (O) words that are not used to fill any slot. (This technique is thus known as IOB encoding.) Thus the sample sentence would be labeled as follows:

| Token   | Label    |
| :------ | :----- | 
| `BOS` | `O` |
| `what's` | `O` |
| `the` | `O` |
| `earliest` | `B-flight_mod` |
| `arriving` | `I-flight_mod` |
| `flight` | `O` |
| `between` | `O` |
| `boston` | `B-fromloc.city_name` |
| `and` | `O` |
| `washington` | `B-toloc.city_name` |
| `dc` | `B-toloc.state_code` |
| `EOS` | `O` |

> See below for information about the `BOS` and `EOS` tokens. 

The template itself is associated with the question type for the sentence, perhaps as recovered from the sentence in the last project segment.

In this segment, you'll implement three methods for sequence labeling: a hidden Markov model (HMM) and two recurrent neural networks, a simple RNN and a long short-term memory network (LSTM). By the end of this homework, you should have grasped the pros and cons of the statistical and neural approaches.

## Goals

1. Implement an HMM-based approach to sequence labeling.
2. Implement an RNN-based approach to sequence labeling.
3. Implement an LSTM-based approach to sequence labeling.
4. Compare the performances of HMM and RNN/LSTM with different amounts of training data. Discuss the pros and cons of the HMM approach and the neural approach.

## Setup

In [3]:
import copy
import math
import matplotlib.pyplot as plt
import random

import wget
import torch
import torch.nn as nn
import torchtext.legacy as tt

from tqdm.auto import tqdm

In [4]:
# Set random seeds
seed = 1234
random.seed(seed)
torch.manual_seed(seed)

# GPU check, sets runtime type to "GPU" where available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cpu


## Loading data

We download the ATIS dataset, already presplit into training, validation (dev), and test sets.

In [5]:
# Prepare to download needed data
def download_if_needed(filename, source='./', dest='./'):
    os.makedirs(data_path, exist_ok=True) # ensure destination
    os.path.exists(f"./{dest}{filename}") or wget.download(source + filename, out=dest)

source_path = "https://raw.githubusercontent.com/nlp-course/data/master/ATIS/"
data_path = "data/"

# Download files
for filename in ["atis.train.txt",
                 "atis.dev.txt",
                 "atis.test.txt"
                ]:
    download_if_needed(filename, source_path, data_path)

## Data preprocessing

We again use `torchtext` to load data and convert words to indices in the vocabulary. We use one field `TEXT` for processing the question, and another field `TAG` for processing the sequence labels.

We treat words occurring fewer than three times in the training data as _unknown words_. They'll be replaced by the unknown word type `<unk>`.

In [6]:
MIN_FREQ = 3

TEXT = tt.data.Field(init_token="<bos>", batch_first=False) # batches are of size max_len x bsz
TAG = tt.data.Field(init_token="<bos>", batch_first=False)  # ditto
fields = (('text', TEXT), ('tag', TAG))

train, val, test = tt.datasets.SequenceTaggingDataset.splits(
  fields=fields, 
  path='./data/', 
  train='atis.train.txt',
  validation='atis.dev.txt',
  test='atis.test.txt'
)

TEXT.build_vocab(train.text, min_freq=MIN_FREQ)
TAG.build_vocab(train.tag)

We can get some sense of the datasets by looking at the size and some elements of the text and tag vocabularies.

In [7]:
print(f"Size of English vocabulary: {len(TEXT.vocab)}")
print(f"Most common English words: {TEXT.vocab.freqs.most_common(10)}\n")

print(f"Number of tags: {len(TAG.vocab)}")
print(f"Most common tags: {TAG.vocab.freqs.most_common(10)}")

Size of English vocabulary: 518
Most common English words: [('BOS', 4274), ('EOS', 4274), ('to', 3682), ('from', 3203), ('flights', 2075), ('the', 1745), ('on', 1343), ('flight', 1035), ('me', 1005), ('what', 985)]

Number of tags: 104
Most common tags: [('O', 38967), ('B-toloc.city_name', 3751), ('B-fromloc.city_name', 3726), ('I-toloc.city_name', 1039), ('B-depart_date.day_name', 835), ('I-fromloc.city_name', 636), ('B-airline_name', 610), ('B-depart_time.period_of_day', 555), ('I-airline_name', 374), ('B-depart_date.day_number', 351)]


## Special tokens and tags

You'll have already noticed the `BOS` and `EOS`, special tokens that the dataset developers used to indicate the beginning and end of the sentence; we'll leave them in the data.

We've also passed in `init_token="<bos>"` for both torchtext fields. Torchtext will prepend these to the sequence of words and tags. This relieves us from estimating the initial distribution of tags and tokens in HMMs, since we always start with a token `<bos>` whose tag is also `<bos>`. We'll be able to refer to these tags as exemplified here:

In [8]:
print(f"""
Initial tag string: {TAG.init_token}
Initial tag id:     {TAG.vocab.stoi[TAG.init_token]}
""")


Initial tag string: <bos>
Initial tag id:     2



Finally, since `torchtext` will be providing the sentences in the training corpus in "batches", `torchtext` will force the sentences within a batch to be the same length by padding them with a special token. Again, we can access that token as shown here:

In [9]:
print(f"""
Pad tag string: {TAG.pad_token}
Pad tag id:     {TAG.vocab.stoi[TAG.pad_token]}
""")


Pad tag string: <pad>
Pad tag id:     1



Now, we can iterate over the dataset using `torchtext`'s iterator. We'll use a non-trivial batch size to gain the benefit of training on multiple sentences at a shot. You'll need to be careful about the shapes of the various tensors that are being manipulated.

In [10]:
BATCH_SIZE = 20

train_iter, val_iter, test_iter = tt.data.BucketIterator.splits(
    (train, val, test), 
    batch_size=BATCH_SIZE, 
    repeat=False, 
    device=device)

Each batch will be a tensor of size `max_length x batch_size`. Let's examine a batch.

In [11]:
# Get the first batch
batch = next(iter(train_iter))

# What's its shape? Should be max_length x batch_size.
print(f'Shape of batch text tensor: {batch.text.shape}\n')

# Extract the first sentence in the batch, both text and tags
first_sentence = batch.text[:, 0]
first_tags = batch.tag[:, 0]

# Print out the first sentence, as token ids and as text
print("First sentence in batch")
print(f"{first_sentence}")
print(f"{' '.join([TEXT.vocab.itos[i] for i in first_sentence])}\n")

print("First tags in batch")
print(f"{first_tags}")
print(f"{[TAG.vocab.itos[i] for i in first_tags]}")

Shape of batch text tensor: torch.Size([22, 20])

First sentence in batch
tensor([ 2,  3, 21, 45, 88, 44,  7, 39, 28, 20, 54, 18, 22,  4,  1,  1,  1,  1,
         1,  1,  1,  1])
<bos> BOS i need information for flights leaving baltimore and arriving in atlanta EOS <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad>

First tags in batch
tensor([2, 3, 3, 3, 3, 3, 3, 3, 5, 3, 3, 3, 4, 3, 1, 1, 1, 1, 1, 1, 1, 1])
['<bos>', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-fromloc.city_name', 'O', 'O', 'O', 'B-toloc.city_name', 'O', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>']


The goal of this project is to predict the sequence of tags `batch.tag` given a sequence of words `batch.text`.

# Majority class labeling

As usual, we can get a sense of the difficulty of the task by looking at a simple baseline, tagging every token with the majority tag. Here's a table of tag frequencies for the most frequent tags:

In [12]:
def count_tags(iterator):
  tag_counts = torch.zeros(len(TAG.vocab.itos), device=device)

  for batch in iterator:
    tags = batch.tag.view(-1)
    tag_counts.scatter_add_(0, tags, torch.ones(tags.shape).to(device))

  ## Alternative untensorized implementation for reference
  # for batch in iterator:                # for each batch
  #   for sent_id in range(len(batch)):   # ... each sentence in the batch
  #     for tag in batch.tag[:, sent_id]: # ... each tag in the sentence
  #       tag_counts[tag] += 1            # bump the tag count

  # Ignore paddings
  tag_counts[TAG.vocab.stoi[TAG.pad_token]] = 0
  return tag_counts

tag_counts = count_tags(train_iter)

for tag_id in range(len(TAG.vocab.itos)):
  print(f'{tag_id:3}  {TAG.vocab.itos[tag_id]:30}{tag_counts[tag_id].item():3.0f}')

  0  <unk>                           0
  1  <pad>                           0
  2  <bos>                         4274
  3  O                             38967
  4  B-toloc.city_name             3751
  5  B-fromloc.city_name           3726
  6  I-toloc.city_name             1039
  7  B-depart_date.day_name        835
  8  I-fromloc.city_name           636
  9  B-airline_name                610
 10  B-depart_time.period_of_day   555
 11  I-airline_name                374
 12  B-depart_date.day_number      351
 13  B-depart_date.month_name      340
 14  B-depart_time.time            321
 15  B-round_trip                  311
 16  I-round_trip                  303
 17  B-depart_time.time_relative   290
 18  B-cost_relative               281
 19  B-flight_mod                  264
 20  I-depart_time.time            258
 21  B-stoploc.city_name           202
 22  B-city_name                   191
 23  B-arrive_time.time            182
 24  B-class_type                  181
 25  B-arrive_time.

It looks like the `'O'` (other) tag is, unsurprisingly, the most frequent tag (except for the padding tag). The proportion of tokens labeled with that tag (ignoring the padding tag) gives us a good baseline accuracy for this sequence labeling task. To verify that intuition, we can calculate the accuracy of the majority tag on the test set:

In [13]:
tag_counts_test = count_tags(test_iter)
majority_baseline_accuracy = (
  tag_counts_test[TAG.vocab.stoi['O']] 
  / tag_counts_test.sum()
)
print(f'Baseline accuracy: {majority_baseline_accuracy:.3f}')

Baseline accuracy: 0.634


# HMM for sequence labeling

Having established the baseline to beat, we turn to implementing an HMM model.

## Notation

First, let's start with some notation. We use $\mathcal{V} = \langle \mathcal{V}_1, \mathcal{V}_2, \ldots \mathcal{V}_V \rangle$ to denote the vocabulary of word types and $Q = \langle{Q_1, Q_2, \ldots, Q_N} \rangle$ to denote the possible tags, which is the state space of the HMM. Thus $V$ is the number of word types in the vocabulary and $N$ is the number of states (tags).

We use $\vect{w} = w_1 \cdots w_T \in \mathcal{V}^T$ to denote the string of words at "time steps" $t$ (where $t$ varies from $1$ to $T$). Similarly, $\vect{q} = q_1 \cdots q_T \in Q^T$ denotes the corresponding sequence of states (tags).

## Training an HMM by counting

Recall that an HMM is defined via a transition matrix $A$, which stores the probability of moving from one state $Q_i$ to another $Q_j$, that is, 

$$A_{ij}=\Prob(q_{t+1}=Q_j  \given  q_t=Q_i)$$

and an emission matrix $B$, which stores the probability of generating word $\mathcal{V}_j$ given state $Q_i$, that is, 

$$B_{ij}= \Prob(w_t=\mathcal{V}_j  \given q_t= Q_i)$$

> As is typical in notating probabilities, we'll use abbreviations
>
\begin{align}
\Prob(q_{t+1} \given  q_t) &\equiv \Prob(q_{t+1}=Q_j  \given  q_t=Q_i) \\
\Prob(w_t  \given q_t) &\equiv \Prob(w_t=\mathcal{V}_j  \given q_t= Q_i)
\end{align}
>
> where the $i$ and $j$ are clear from context.

In our case, since the labels are observed in the training data, we can directly use counting to determine (maximum likelihood) estimates of $A$ and $B$.

### Goal 1(a): Find the transition matrix

The matrix $A$ contains the transition probabilities: $A_{ij}$ is the probability of moving from state $Q_i$ to state $Q_j$ in the training data, so that $\sum^{N}_{j = 1 } A_{ij} = 1$ for all $i$. 

We find these probabilities by counting the number of times state $Q_j$ appears right after state $Q_i$, as a proportion of all of the transitions from $Q_i$.

$$
A_{ij} = \frac{\cnt{Q_i, Q_j} + \delta}{\sum_k \left (\cnt{Q_i, Q_k}+\delta \right)}
$$

(In the above formula, we also used add-$\delta$ smoothing.)

Using the above definition, implement the method `train_A` in the `HMM` class below, which calculates and returns the $A$ matrix as a tensor of size $N \times N$.

> You'll want to go ahead and implement this part now, and test it below, before moving on to the next goal.

> Remember that the training data is being delivered to you batched.

### Goal 1(b): Find the emission matrix $B$

Similar to the transition matrix, the emission matrix contains the emission probabilities such that $B_{ij}$ is probability of word $w_t=\mathcal{V}_j$ conditioned on state $q_t=Q_i$.

We can find this by counting as well.
$$
B_{ij} = \frac{\cnt{Q_i, \mathcal{V}_j} + \delta}{\sum_k \left (\cnt{Q_i, \mathcal{V}_k} + \delta \right)}
       = \frac{\cnt{Q_i, \mathcal{V}_j} + \delta}{\cnt{Q_i} + \delta V}
$$

Using the above definitions, implement the `train_B` method in the `HMM` class below, which calculates and returns the $B$ matrix as a tensor of size $N \times V$.

> You'll want to go ahead and implement this part now, and test it below, before moving on to the next goal.

## Sequence labeling with a trained HMM

Now that you're able to train an HMM by estimating the transition matrix $A$ and the emission matrix $B$, you can apply it to the task of labeling a sequence of words $\vect{w} = w_1 \cdots w_T$. Our goal is to find the most probable sequence of tags $\vect{\hat q} \in Q^T$ given a sequence of words $\vect{w} \in \mathcal{V}^T$.

\begin{align*}
\vect{\hat q} &= \operatorname*{argmax}\limits_{\vect{q} \in Q^T}(\Prob(\vect{q} \given \vect{w})) \\
& = \operatorname*{argmax}_{\vect{q} \in Q^T}(\Prob(\vect{q},\vect{w})) \\
& = \operatorname*{argmax}_{\vect{q} \in Q^T}\left(\Pi^{T}_{t = 1} \Prob(w_{t} \given q_{t})\Prob(q_{t} \given q_{t-1})\right)
\end{align*}

where $\Prob(w_{t}=\mathcal{V}_j \given q_{t}=Q_i) = B_{ij}$, $\Prob(q_{t}=Q_j \given q_{t-1}=Q_{i})=A_{ij}$, and $q_0$ is the predefined initial tag `TAG.vocab.stoi[TAG.init_token]`.

### Goal 1(c): Viterbi algorithm

Implement the `predict` method, which should use the Viterbi algorithm to find the most likely sequence of tags for a sequence of `words`.

> Warning: It may take up to 30 minutes to tag the entire test set depending on your implementation. (A fully tensorized implementation can be much faster though.) We highly recommend that you begin by experimenting with your code using a _very small subset_ of the dataset, say two or three sentences, ramping up from there.

> Hint: Consider how to use vectorized computations where possible for speed.

## Evaluation

We've provided you with the `evaluate` function, which takes a dataset iterator and uses `predict` on each sentence in each batch, comparing against the gold tags, to determine the accuracy of the model on the test set.

In [14]:
class HMMTagger():
  def __init__ (self, text, tag):
    self.text = text
    self.tag = tag
    self.V = len(text.vocab.itos)    # vocabulary size
    self.N = len(tag.vocab.itos)     # state space size
    self.initial_state_id = tag.vocab.stoi[tag.init_token]
    self.pad_state_id = tag.vocab.stoi[tag.pad_token]
    self.pad_word_id = text.vocab.stoi[text.pad_token]
  
  def train_A(self, iterator, delta):
    """Returns A for training dataset `iterator` using add-`delta` smoothing."""
    # Create A table
    A = torch.zeros(self.N, self.N, device=device)
    sum_rows = torch.zeros(self.N, device=device)
    for batch in iterator:
      for sent_id in range(len(batch)):
        tags = batch.tag[:, sent_id]
        for i in range(len(tags)-1):
            A[tags[i]][tags[i+1]] +=1
            if(i!= len(tags)-2):
              sum_rows[tags[i]]+=1
    A+=delta
    sum_rows = sum_rows.reshape(-1, 1)
    sum_rows+=self.V*delta
    A = A/sum_rows        
    return A



  def train_B(self, iterator, delta):
    """Returns B for training dataset `iterator` using add-`delta` smoothing."""
    B = torch.zeros(self.N, self.V, device=device)
    tags_sum = torch.zeros(self.N, device=device)
    for batch in iterator:
      for sent_id in range(len(batch)):
        text = batch.text[:, sent_id]
        tags = batch.tag[:, sent_id]
        for i in range(len(tags)):
            B[tags[i]][text[i]] +=1
            tags_sum[tags[i]]+=1
    
    B+=delta
    tags_sum+=self.V*delta
    tags_sum = tags_sum.reshape(-1, 1)
    B = B/tags_sum
    return B

  def train_all(self, iterator, delta=0.01):
    """Stores A and B (actually, their logs) for training dataset `iterator`."""
    self.log_A = self.train_A(iterator, delta).log()
    self.log_B = self.train_B(iterator, delta).log()
    
  def predict(self, words):
    """Returns the most likely sequence of tags for a sequence of `words`.
    Arguments:
      words: a tensor of size (seq_len,)
    Returns:
      a list of tag ids
    """
    vit_mat = torch.zeros(self.N, len(words), device = device)
    prev_mat = torch.zeros(self.N, len(words), device = device)
    vit_mat[2,0] = 1
    vit_mat[:,0] =torch.log(vit_mat[:,0]) 
    backpointers = torch.zeros(len(words), device = device)

    for word in range(1, len(words)):
        for tag in range(self.N):
            all_results = vit_mat[:,word-1]+self.log_A[:,tag]
            vit_mat[tag][word] = torch.max(all_results).item()+self.log_B[tag][words[word]]
            prev_mat[tag][word] = torch.argmax(all_results)


    backpointers = torch.zeros(len(words), device=device)
    backpointers[len(words)-1] = (torch.argmax(vit_mat[:,len(words)-1])).long()
    for i in range(len(words)-2, -1, -1):
      backpointers[i] = prev_mat[backpointers[i+1].long()][i+1]

    return list(backpointers) 




  def evaluate(self, iterator):
    """Returns the model's token accuracy on a given dataset `iterator`."""
    correct = 0
    total = 0
    for batch in tqdm(iterator, leave=False):
      for sent_id in range(len(batch)):
        words = batch.text[:, sent_id]
        words = words[words.ne(self.pad_word_id)] # remove paddings
        tags_gold = batch.tag[:, sent_id]
        tags_pred = self.predict(words)
        for tag_gold, tag_pred in zip(tags_gold, tags_pred):
          if tag_gold == self.pad_state_id:  # stop once we hit padding
            break
          else:
            total += 1
            if tag_pred == tag_gold:
              correct += 1
    return correct/total

Putting everything together, you should now be able to train and evaluate the HMM. A correct implementation can be expected to reach above **90% test set accuracy** after running the following cell.

In [15]:
# Instantiate and train classifier
hmm_tagger = HMMTagger(TEXT, TAG)
hmm_tagger.train_all(train_iter)

# Evaluate model performance
print(f'Training accuracy: {hmm_tagger.evaluate(train_iter):.3f}\n'
      f'Test accuracy:     {hmm_tagger.evaluate(test_iter):.3f}')

  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Training accuracy: 0.915
Test accuracy:     0.907


# RNN for Sequence Labeling

HMMs work quite well for this sequence labeling task. Now let's take an alternative (and more trendy) approach: RNN/LSTM-based sequence labeling. Similar to the HMM part of this project, you will also need to train a model on the training data, and then use the trained model to decode and evaluate some testing data.

<img src="https://github.com/nlp-course/data/raw/master/Resources/rnn-unfolded-figure.png" width=600 align=right />

After unfolding an RNN, the cell at time $t$ generates the observed output $\vect{y}_t$ based on the input $\vect{x}_t$ and the hidden state of the previous cell $\vect{h}_{t-1}$, according to the following equations.

\begin{align*}
\vect{h}_t &=  \sigma(\vect{U} \vect{x}_t + \vect{V} \vect{h}_{t-1}) \\
\vect{\hat y}_t &= \softmax(\vect{W} \vect{h}_t)
\end{align*}

The parameters here are the elements of the matrices $\vect{U}$, $\vect{V}$, and $\vect{W}$. Similar to the last project segment, we will perform the forward computation, calculate the loss, and then perform the backward computation to compute the gradients with respect to these model parameters. Finally, we will adjust the parameters opposite the direction of the gradients to minimize the loss, repeating until convergence.

You've seen these kinds of neural network models before, for language modeling in lab 2-3 and sequence labeling in lab 2-5. The code there should be very helpful in implementing an `RNNTagger` class below. Consequently, we've provided very little guidance on the implementation. We do recommend you follow the steps below however.

## Goal 2(a): RNN training

Implement the forward pass of the RNN tagger and the loss function. A reasonable way to proceed is to implement the following methods:

1. `forward(self, text_batch)`: Performs the RNN forward computation over a whole `text_batch` (`batch.text` in the above data loading example). The `text_batch` will be of shape `max_length x batch_size`. You might run it through the following layers: an embedding layer, which maps each token index to an embedding of size `embedding_size` (so that the size of the mapped batch becomes `max_length x batch_size x embedding_size`); then an RNN, which maps each token embedding to a vector of `hidden_size` (the size of all outputs is `max_length x batch_size x hidden_size`); then a linear layer, which maps each RNN output element to a vector of size $N$ (which is commonly referred to as "logits", recall that $N=|Q|$, the size of the tag set).

This function is expected to return `logits`, which provides a logit for each tag of each word of each sentence in the batch (structured as a tensor of size `max_length x batch_size x N`). 

> You might find the following functions useful: 
>
> * [`nn.Embedding`](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html)
> * [`nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)
> * [`nn.RNN`](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html)

2. `compute_loss(self, logits, tags)`: Computes the loss for a batch by comparing `logits` of a batch returned by `forward` to `tags`, which stores the true tag ids for the batch. Thus `logits` is a tensor of size `max_length x batch_size x N`, and `tags` is a tensor of size `max_length x batch_size`. Note that the criterion functions in `torch` expect outputs of a certain shape, so you might need to perform some shape conversions.

> You might find [`nn.CrossEntropyLoss`](https://pytorch.org/docs/master/generated/torch.nn.CrossEntropyLoss.html) from the last project segment useful. Note that if you use `nn.CrossEntropyLoss` then you should not use a softmax layer at the end since that's already absorbed into the loss function. Alternatively, you can use [`nn.LogSoftmax`](https://pytorch.org/docs/stable/generated/torch.nn.LogSoftmax.html) as the final sublayer in the forward pass, but then you need to use [`nn.NLLLoss`](https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html), which does not contain its own softmax. We recommend the former, since working in log space is usually more numerically stable.

> Be careful about the shapes/dimensions of tensors. You might find [`torch.Tensor.view`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view) useful for reshaping tensors.

3. `train_all(self, train_iter, val_iter, epochs=10, learning_rate=0.001)`: Trains the model on training data generated by the iterator `train_iter` and validation data `val_iter`.The `epochs` and `learning_rate` variables are the number of epochs (number of times to run through the training data) to run for and the learning rate for the optimizer, respectively. You can use the validation data to determine which model was the best one as the epocks go by. Notice that our code below assumes that during training the best model is stored so that `rnn_tagger.load_state_dict(rnn_tagger.best_model)` restores the parameters of the best model.

## Goal 2(b) RNN decoding

Implement a method to predict the tag sequence associated with a sequence of words:

1. `predict(self, text_batch)`: Returns the batched predicted tag sequences associated with a batch of sentences.
2. `def evaluate(self, iterator)`: Returns the accuracy of the trained tagger on a dataset provided by `iterator`.

In [16]:
class RNNTagger(nn.Module):
  def __init__(self, text, tag, embedding_size, hidden_size):
    super().__init__()
    self.text = text
    self.tag = tag
    self.N = len(tag.vocab.itos)   # tag vocab size
    self.V = len(text.vocab.itos)  # text vocab size
    self.embedding_size = embedding_size
    self.hidden_size = hidden_size

    # Create essential modules
    self.word_embeddings = nn.Embedding(self.V, embedding_size) # Lookup layer
    self.rnn = nn.RNN(input_size=embedding_size, 
                      hidden_size=hidden_size)
    self.hidden2output = nn.Linear(hidden_size, self.N)

    # Create loss function
    pad_id = self.tag.vocab.stoi[self.tag.pad_token]
    self.loss_function = nn.CrossEntropyLoss(reduction='sum', ignore_index=pad_id)

    # Initialize parameters
    self.init_parameters()

  def init_parameters(self, init_low=-0.15, init_high=0.15):
    for p in self.parameters():
      p.data.uniform_(init_low, init_high)

  def compute_loss(self, logits, tags):
    return self.loss_function(logits.view(-1, self.N), tags.view(-1))

  def train_all(self, train_iter, val_iter, epochs=10, learning_rate=0.001):
    self.train()
    optim = torch.optim.Adam(self.parameters(), lr=learning_rate)
    best_validation_accuracy = -float('inf')
    best_model = None
    for epoch in range(epochs): 
      total = 0
      running_loss = 0.0
      for batch in tqdm(train_iter):
        self.zero_grad()

        words = batch.text 
        tags = batch.tag 
        
        logits = self.forward(words)
        loss = self.compute_loss(logits, tags)
        
        (loss/words.size(1)).backward()

        optim.step()

        total += 1
        running_loss += loss.item()
        
      validation_accuracy = self.evaluate(val_iter)
      if validation_accuracy > best_validation_accuracy:
        best_validation_accuracy = validation_accuracy
        self.best_model = copy.deepcopy(self.state_dict())
      epoch_loss = running_loss / total
      print (f'Epoch: {epoch} Loss: {epoch_loss:.4f} '
             f'Validation accuracy: {validation_accuracy:.4f}')


  def forward(self, text_batch):
    """Performs forward, returns logits.
    
    Arguments: 
      text_batch: a tensor containing word ids of size (seq_len, 1) 
    Returns:
      logits: a tensor of size (seq_len, 1, self.N)
    """
    word_embeddings = self.word_embeddings(text_batch)

    hidden = None
    
    output, hidden = self.rnn(word_embeddings, hidden)
    logits = self.hidden2output(output)
    return logits

  def predict(self, text_batch):
    tag_batch = self.forward(text_batch)
    tag_batch = tag_batch.argmax(dim = 2)
    return tag_batch
  def evaluate(self, iterator):
    correct = 0
    total = 0
    true_positive_comma = 0
    predicted_positive_comma = 0
    total_positive_comma = 0
    pad_id = TAG.vocab.stoi[TAG.pad_token]
    for batch in tqdm(iterator):
      words = batch.text
      tags = batch.tag
      tags_pred = self.predict(words)
      mask = tags.ne(pad_id)
      cor = (tags == tags_pred)[mask]
      correct += cor.float().sum().item()
      total += mask.float().sum().item()
    return correct/total


Now train your tagger on the training and validation set.
Run the cell below to train an RNN, and evaluate it. A proper implementation should reach about **95%+ accuracy**.

In [17]:
# Instantiate and train classifier
rnn_tagger = RNNTagger(TEXT, TAG, embedding_size=36, hidden_size=36).to(device)
rnn_tagger.train_all(train_iter, val_iter, epochs=10, learning_rate=0.001)
rnn_tagger.load_state_dict(rnn_tagger.best_model)

# Evaluate model performance
print(f'Training accuracy: {rnn_tagger.evaluate(train_iter):.3f}\n'
      f'Test accuracy:     {rnn_tagger.evaluate(test_iter):.3f}')

  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 0 Loss: 595.9720 Validation accuracy: 0.7079


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 1 Loss: 252.0207 Validation accuracy: 0.8632


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 2 Loss: 149.0425 Validation accuracy: 0.9102


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 3 Loss: 104.2413 Validation accuracy: 0.9335


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 4 Loss: 80.2405 Validation accuracy: 0.9409


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 5 Loss: 65.2468 Validation accuracy: 0.9479


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 6 Loss: 54.9352 Validation accuracy: 0.9540


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 7 Loss: 47.2794 Validation accuracy: 0.9591


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 8 Loss: 41.3691 Validation accuracy: 0.9627


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 9 Loss: 36.8522 Validation accuracy: 0.9649


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Training accuracy: 0.971
Test accuracy:     0.962


# LSTM for slot filling

Did your RNN perform better than HMM? How much better was it? Was that expected? 

RNNs tend to exhibit the [vanishing gradient problem](https://en.wikipedia.org/wiki/Vanishing_gradient_problem). To remedy this, the Long-Short Term Memory (LSTM) model was introduced. In PyTorch, we can simply use [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html). 

In this section, you'll implement an LSTM model for slot filling. If you've got the RNN model well implemented, this should be extremely straightforward. Just copy and paste your solution, change the call to `nn.RNN` to a call to `nn.LSTM`, and make any other minor adjustments that are necessary. In particular, LSTMs have _two_ recurrent parts, `h` and `c`. You'll thus need to initialize both of these when performing forward computations.

In [18]:
class LSTMTagger(nn.Module):
  def __init__(self, text, tag, embedding_size, hidden_size):
    super().__init__()
    self.text = text
    self.tag = tag
    self.N = len(tag.vocab.itos)   
    self.V = len(text.vocab.itos)
    self.embedding_size = embedding_size
    self.hidden_size = hidden_size

    
    self.word_embeddings = nn.Embedding(self.V, embedding_size) 
    self.rnn = nn.LSTM(input_size=embedding_size, 
                      hidden_size=hidden_size)
    self.hidden2output = nn.Linear(hidden_size, self.N)

    pad_id = self.tag.vocab.stoi[self.tag.pad_token]
    self.loss_function = nn.CrossEntropyLoss(reduction='sum', ignore_index=pad_id)

    self.init_parameters()

  def init_parameters(self, init_low=-0.15, init_high=0.15):
    for p in self.parameters():
      p.data.uniform_(init_low, init_high)

  def compute_loss(self, logits, tags):
    return self.loss_function(logits.view(-1, self.N), tags.view(-1))

  def train_all(self, train_iter, val_iter, epochs=10, learning_rate=0.001):
    self.train()
    optim = torch.optim.Adam(self.parameters(), lr=learning_rate)
    best_validation_accuracy = -float('inf')
    best_model = None
    for epoch in range(epochs): 
      total = 0
      running_loss = 0.0
      for batch in tqdm(train_iter):
        self.zero_grad()

        words = batch.text 
        tags = batch.tag 
        
        logits = self.forward(words)
        loss = self.compute_loss(logits, tags)
        (loss/words.size(1)).backward()
        optim.step()
        total += 1
        running_loss += loss.item()        
      validation_accuracy = self.evaluate(val_iter)
      if validation_accuracy > best_validation_accuracy:
        best_validation_accuracy = validation_accuracy
        self.best_model = copy.deepcopy(self.state_dict())
      epoch_loss = running_loss / total
      print (f'Epoch: {epoch} Loss: {epoch_loss:.4f} '
             f'Validation accuracy: {validation_accuracy:.4f}')


  def forward(self, text_batch):
    word_embeddings = self.word_embeddings(text_batch)
    hidden = torch.zeros(1,text_batch.shape[1], self.hidden_size, device = device)
    c= torch.zeros(1,text_batch.shape[1], self.hidden_size, device = device)
    output, (hidden,c) = self.rnn(word_embeddings, (hidden, c))
    logits = self.hidden2output(output)
    return logits

  def predict(self, text_batch):
    tag_batch = self.forward(text_batch)
    tag_batch = tag_batch.argmax(dim = 2)
    return tag_batch
  def evaluate(self, iterator):
    correct = 0
    total = 0
    true_positive_comma = 0
    predicted_positive_comma = 0
    total_positive_comma = 0
    pad_id = TAG.vocab.stoi[TAG.pad_token]
    for batch in tqdm(iterator):
      words = batch.text
      tags = batch.tag
      tags_pred = self.predict(words)
      mask = tags.ne(pad_id)
      cor = (tags == tags_pred)[mask]
      correct += cor.float().sum().item()
      total += mask.float().sum().item()
    return correct/total

Run the cell below to train an LSTM, and evaluate it. A proper implementation should reach about **95%+ accuracy**.

In [19]:
# Instantiate and train classifier
lstm_tagger = LSTMTagger(TEXT, TAG, embedding_size=36, hidden_size=36).to(device)
lstm_tagger.train_all(train_iter, val_iter, epochs=10, learning_rate=0.001)
lstm_tagger.load_state_dict(lstm_tagger.best_model)

# Evaluate model performance
print(f'Training accuracy: {lstm_tagger.evaluate(train_iter):.3f}\n'
      f'Test accuracy:     {lstm_tagger.evaluate(test_iter):.3f}')

  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 0 Loss: 660.1381 Validation accuracy: 0.7080


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 1 Loss: 293.2779 Validation accuracy: 0.8046


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 2 Loss: 218.0379 Validation accuracy: 0.8426


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 3 Loss: 171.3357 Validation accuracy: 0.8715


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 4 Loss: 143.2106 Validation accuracy: 0.8941


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 5 Loss: 120.4966 Validation accuracy: 0.9120


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 6 Loss: 100.7963 Validation accuracy: 0.9259


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 7 Loss: 85.5218 Validation accuracy: 0.9355


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 8 Loss: 73.9996 Validation accuracy: 0.9450


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 9 Loss: 64.6548 Validation accuracy: 0.9516


  0%|          | 0/214 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Training accuracy: 0.956
Test accuracy:     0.947


# Goal 4: Compare HMM to RNN/LSTM with different amounts of training data

Vary the amount of training data and compare the performance of HMM to RNN or LSTM (Since RNN is similar to LSTM, picking one of them is enough.) Discuss the pros and cons of HMM and RNN/LSTM based on your experiments.

> This part is more open-ended. We're looking for thoughtful experiments and analysis of the results, not any particular result or conclusion.

The code below shows how to subsample the training set with downsample ratio `ratio`. To speedup evaluation we only use 50 test samples.

In [20]:
ratio = 0.1
test_size = 50

# Set random seeds to make sure subsampling is the same for HMM and RNN
random.seed(seed)
torch.manual_seed(seed)

train, val, test = tt.datasets.SequenceTaggingDataset.splits(
            fields=fields, 
            path='./data/', 
            train='atis.train.txt', 
            validation='atis.dev.txt',
            test='atis.test.txt')

# Subsample
random.shuffle(train.examples)
train.examples = train.examples[:int(math.floor(len(train.examples)*ratio))]
random.shuffle(test.examples)
test.examples = test.examples[:test_size]

# Rebuild vocabulary
TEXT.build_vocab(train.text, min_freq=MIN_FREQ)
TAG.build_vocab(train.tag)

In [21]:
BATCH_SIZE = 20

train_iter, val_iter, test_iter = tt.data.BucketIterator.splits(
    (train, val, test), 
    batch_size=BATCH_SIZE, 
    repeat=False, 
    device=device)

# Instantiate and train classifier
lstm_tagger = LSTMTagger(TEXT, TAG, embedding_size=36, hidden_size=36).to(device)
lstm_tagger.train_all(train_iter, test_iter, epochs=100, learning_rate=0.002)
lstm_tagger.load_state_dict(lstm_tagger.best_model)

# Evaluate model performance
print(f'Training accuracy: {lstm_tagger.evaluate(train_iter):.3f}\n'
      f'Test accuracy:     {lstm_tagger.evaluate(test_iter):.3f}')

  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 0 Loss: 1164.8005 Validation accuracy: 0.6316


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 1 Loss: 729.5562 Validation accuracy: 0.6316


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 2 Loss: 506.7814 Validation accuracy: 0.6316


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 3 Loss: 472.3360 Validation accuracy: 0.6316


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 4 Loss: 432.2247 Validation accuracy: 0.6316


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 5 Loss: 371.4174 Validation accuracy: 0.6991


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 6 Loss: 318.6070 Validation accuracy: 0.7409


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 7 Loss: 286.3848 Validation accuracy: 0.7652


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 8 Loss: 268.1429 Validation accuracy: 0.7773


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 9 Loss: 254.6668 Validation accuracy: 0.7949


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 10 Loss: 242.9254 Validation accuracy: 0.7962


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 11 Loss: 232.7315 Validation accuracy: 0.8205


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 12 Loss: 223.7985 Validation accuracy: 0.8286


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 13 Loss: 215.1780 Validation accuracy: 0.8286


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 14 Loss: 206.7925 Validation accuracy: 0.8300


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 15 Loss: 197.7423 Validation accuracy: 0.8340


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 16 Loss: 188.5821 Validation accuracy: 0.8475


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 17 Loss: 179.7433 Validation accuracy: 0.8502


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 18 Loss: 171.5978 Validation accuracy: 0.8516


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 19 Loss: 164.0235 Validation accuracy: 0.8543


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 20 Loss: 157.3221 Validation accuracy: 0.8596


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 21 Loss: 151.1860 Validation accuracy: 0.8610


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 22 Loss: 145.3470 Validation accuracy: 0.8637


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 23 Loss: 139.8874 Validation accuracy: 0.8677


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 24 Loss: 134.8102 Validation accuracy: 0.8704


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 25 Loss: 130.0609 Validation accuracy: 0.8758


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 26 Loss: 125.3967 Validation accuracy: 0.8812


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 27 Loss: 120.9450 Validation accuracy: 0.8826


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 28 Loss: 116.6900 Validation accuracy: 0.8880


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 29 Loss: 112.6832 Validation accuracy: 0.8961


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 30 Loss: 108.8954 Validation accuracy: 0.8988


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 31 Loss: 105.3147 Validation accuracy: 0.9001


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 32 Loss: 101.7199 Validation accuracy: 0.9042


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 33 Loss: 98.4640 Validation accuracy: 0.9096


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 34 Loss: 95.2042 Validation accuracy: 0.9082


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 35 Loss: 92.3466 Validation accuracy: 0.9109


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 36 Loss: 89.4929 Validation accuracy: 0.9123


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 37 Loss: 86.7048 Validation accuracy: 0.9109


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 38 Loss: 84.1865 Validation accuracy: 0.9123


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 39 Loss: 81.4566 Validation accuracy: 0.9136


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 40 Loss: 79.1149 Validation accuracy: 0.9123


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 41 Loss: 76.6893 Validation accuracy: 0.9136


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 42 Loss: 74.0494 Validation accuracy: 0.9123


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 43 Loss: 71.7315 Validation accuracy: 0.9163


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 44 Loss: 69.5203 Validation accuracy: 0.9177


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 45 Loss: 67.5094 Validation accuracy: 0.9177


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 46 Loss: 65.4131 Validation accuracy: 0.9177


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 47 Loss: 63.6653 Validation accuracy: 0.9190


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 48 Loss: 61.6023 Validation accuracy: 0.9190


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 49 Loss: 59.7663 Validation accuracy: 0.9177


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 50 Loss: 58.1398 Validation accuracy: 0.9190


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 51 Loss: 56.6294 Validation accuracy: 0.9204


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 52 Loss: 54.9457 Validation accuracy: 0.9204


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 53 Loss: 53.4618 Validation accuracy: 0.9204


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 54 Loss: 52.1739 Validation accuracy: 0.9204


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 55 Loss: 50.8187 Validation accuracy: 0.9244


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 56 Loss: 49.3869 Validation accuracy: 0.9258


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 57 Loss: 48.3183 Validation accuracy: 0.9258


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 58 Loss: 47.0839 Validation accuracy: 0.9298


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 59 Loss: 45.8452 Validation accuracy: 0.9258


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 60 Loss: 44.8226 Validation accuracy: 0.9271


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 61 Loss: 43.6517 Validation accuracy: 0.9285


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 62 Loss: 42.6159 Validation accuracy: 0.9285


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 63 Loss: 41.7087 Validation accuracy: 0.9285


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 64 Loss: 40.6981 Validation accuracy: 0.9298


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 65 Loss: 39.7243 Validation accuracy: 0.9325


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 66 Loss: 38.7607 Validation accuracy: 0.9339


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 67 Loss: 37.7556 Validation accuracy: 0.9325


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 68 Loss: 36.9387 Validation accuracy: 0.9325


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 69 Loss: 35.9307 Validation accuracy: 0.9339


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 70 Loss: 35.6094 Validation accuracy: 0.9366


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 71 Loss: 34.4298 Validation accuracy: 0.9352


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 72 Loss: 33.4261 Validation accuracy: 0.9393


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 73 Loss: 32.6874 Validation accuracy: 0.9379


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 74 Loss: 31.8913 Validation accuracy: 0.9393


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 75 Loss: 31.1680 Validation accuracy: 0.9393


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 76 Loss: 30.4244 Validation accuracy: 0.9406


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 77 Loss: 29.8241 Validation accuracy: 0.9406


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 78 Loss: 29.0185 Validation accuracy: 0.9379


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 79 Loss: 28.3789 Validation accuracy: 0.9406


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 80 Loss: 27.6265 Validation accuracy: 0.9393


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 81 Loss: 27.1242 Validation accuracy: 0.9406


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 82 Loss: 26.4179 Validation accuracy: 0.9433


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 83 Loss: 25.7785 Validation accuracy: 0.9420


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 84 Loss: 25.3069 Validation accuracy: 0.9433


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 85 Loss: 24.6430 Validation accuracy: 0.9447


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 86 Loss: 24.1062 Validation accuracy: 0.9447


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 87 Loss: 23.5442 Validation accuracy: 0.9474


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 88 Loss: 22.9910 Validation accuracy: 0.9460


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 89 Loss: 22.4965 Validation accuracy: 0.9460


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 90 Loss: 21.9725 Validation accuracy: 0.9460


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 91 Loss: 21.6667 Validation accuracy: 0.9447


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 92 Loss: 20.9367 Validation accuracy: 0.9460


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 93 Loss: 20.5621 Validation accuracy: 0.9433


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 94 Loss: 20.1831 Validation accuracy: 0.9460


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 95 Loss: 19.6391 Validation accuracy: 0.9447


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 96 Loss: 19.2707 Validation accuracy: 0.9447


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 97 Loss: 18.8481 Validation accuracy: 0.9420


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 98 Loss: 18.3798 Validation accuracy: 0.9420


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch: 99 Loss: 17.9149 Validation accuracy: 0.9420


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Training accuracy: 0.982
Test accuracy:     0.947


In [22]:
# Instantiate and train classifier
rnn_tagger = RNNTagger(TEXT, TAG, embedding_size=36, hidden_size=36).to(device)
rnn_tagger.train_all(train_iter, val_iter, epochs=80, learning_rate=0.0015)
rnn_tagger.load_state_dict(rnn_tagger.best_model)

# Evaluate model performance
print(f'Training accuracy: {rnn_tagger.evaluate(train_iter):.3f}\n'
      f'Test accuracy:     {rnn_tagger.evaluate(test_iter):.3f}')

  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 0 Loss: 1152.8002 Validation accuracy: 0.6392


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 1 Loss: 669.7629 Validation accuracy: 0.6392


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 2 Loss: 485.2954 Validation accuracy: 0.7080


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 3 Loss: 452.5795 Validation accuracy: 0.7080


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 4 Loss: 417.0802 Validation accuracy: 0.7080


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 5 Loss: 367.6282 Validation accuracy: 0.7117


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 6 Loss: 318.0256 Validation accuracy: 0.7613


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 7 Loss: 276.7628 Validation accuracy: 0.8175


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 8 Loss: 252.0122 Validation accuracy: 0.8264


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 9 Loss: 233.0329 Validation accuracy: 0.8289


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 10 Loss: 216.3448 Validation accuracy: 0.8331


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 11 Loss: 201.1860 Validation accuracy: 0.8412


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 12 Loss: 187.1882 Validation accuracy: 0.8510


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 13 Loss: 174.5900 Validation accuracy: 0.8637


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 14 Loss: 163.9683 Validation accuracy: 0.8668


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 15 Loss: 154.0600 Validation accuracy: 0.8763


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 16 Loss: 144.9821 Validation accuracy: 0.8821


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 17 Loss: 136.3436 Validation accuracy: 0.8861


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 18 Loss: 128.5245 Validation accuracy: 0.8924


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 19 Loss: 121.5193 Validation accuracy: 0.8988


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 20 Loss: 115.1537 Validation accuracy: 0.9058


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 21 Loss: 109.2503 Validation accuracy: 0.9078


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 22 Loss: 103.7760 Validation accuracy: 0.9149


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 23 Loss: 98.7082 Validation accuracy: 0.9165


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 24 Loss: 94.1448 Validation accuracy: 0.9208


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 25 Loss: 90.0012 Validation accuracy: 0.9229


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 26 Loss: 86.3028 Validation accuracy: 0.9237


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 27 Loss: 82.6139 Validation accuracy: 0.9255


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 28 Loss: 79.4081 Validation accuracy: 0.9255


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 29 Loss: 76.1397 Validation accuracy: 0.9272


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 30 Loss: 73.1251 Validation accuracy: 0.9282


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 31 Loss: 70.5936 Validation accuracy: 0.9284


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 32 Loss: 67.9586 Validation accuracy: 0.9299


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 33 Loss: 66.0022 Validation accuracy: 0.9308


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 34 Loss: 63.5215 Validation accuracy: 0.9321


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 35 Loss: 61.3880 Validation accuracy: 0.9332


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 36 Loss: 59.4960 Validation accuracy: 0.9325


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 37 Loss: 57.4869 Validation accuracy: 0.9337


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 38 Loss: 56.1244 Validation accuracy: 0.9343


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 39 Loss: 54.8657 Validation accuracy: 0.9341


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 40 Loss: 52.7875 Validation accuracy: 0.9341


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 41 Loss: 51.1209 Validation accuracy: 0.9356


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 42 Loss: 49.4725 Validation accuracy: 0.9357


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 43 Loss: 48.1085 Validation accuracy: 0.9362


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 44 Loss: 46.8881 Validation accuracy: 0.9372


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 45 Loss: 45.4710 Validation accuracy: 0.9377


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 46 Loss: 44.1752 Validation accuracy: 0.9371


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 47 Loss: 43.4787 Validation accuracy: 0.9400


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 48 Loss: 41.9208 Validation accuracy: 0.9396


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 49 Loss: 40.8281 Validation accuracy: 0.9404


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 50 Loss: 39.5801 Validation accuracy: 0.9413


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 51 Loss: 38.6107 Validation accuracy: 0.9428


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 52 Loss: 37.5272 Validation accuracy: 0.9413


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 53 Loss: 36.4110 Validation accuracy: 0.9432


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 54 Loss: 35.5190 Validation accuracy: 0.9431


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 55 Loss: 34.5852 Validation accuracy: 0.9439


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 56 Loss: 33.6875 Validation accuracy: 0.9436


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 57 Loss: 32.8358 Validation accuracy: 0.9447


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 58 Loss: 32.0085 Validation accuracy: 0.9444


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 59 Loss: 31.3591 Validation accuracy: 0.9453


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 60 Loss: 30.1043 Validation accuracy: 0.9444


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 61 Loss: 29.3062 Validation accuracy: 0.9449


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 62 Loss: 28.7053 Validation accuracy: 0.9448


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 63 Loss: 27.9665 Validation accuracy: 0.9443


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 64 Loss: 27.1021 Validation accuracy: 0.9445


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 65 Loss: 26.5790 Validation accuracy: 0.9434


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 66 Loss: 25.5676 Validation accuracy: 0.9457


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 67 Loss: 24.9866 Validation accuracy: 0.9451


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 68 Loss: 24.1541 Validation accuracy: 0.9457


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 69 Loss: 23.4566 Validation accuracy: 0.9455


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 70 Loss: 22.9045 Validation accuracy: 0.9455


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 71 Loss: 22.1878 Validation accuracy: 0.9445


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 72 Loss: 21.6879 Validation accuracy: 0.9438


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 73 Loss: 21.0574 Validation accuracy: 0.9451


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 74 Loss: 20.3370 Validation accuracy: 0.9448


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 75 Loss: 19.7363 Validation accuracy: 0.9455


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 76 Loss: 19.0956 Validation accuracy: 0.9455


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 77 Loss: 18.6547 Validation accuracy: 0.9455


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 78 Loss: 18.1163 Validation accuracy: 0.9454


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/29 [00:00<?, ?it/s]

Epoch: 79 Loss: 17.6014 Validation accuracy: 0.9466


  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Training accuracy: 0.989
Test accuracy:     0.934


In [23]:
# Instantiate and train classifier
hmm_tagger = HMMTagger(TEXT, TAG)
hmm_tagger.train_all(train_iter)

# Evaluate model performance
print(f'Training accuracy: {hmm_tagger.evaluate(train_iter):.3f}\n'
      f'Test accuracy:     {hmm_tagger.evaluate(test_iter):.3f}')

  0%|          | 0/22 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

Training accuracy: 0.913
Test accuracy:     0.873


_Type your answer here, replacing this text._

<!-- BEGIN QUESTION -->

# Debrief

**Question:** We're interested in any thoughts you have about this project segment so that we can improve it for later years, and to inform later segments for this year. Please list any issues that arose or comments you have to improve the project segment. Useful things to comment on include the following: 

* Was the project segment clear or unclear? Which portions?
* Were the readings appropriate background for the project segment? 
* Are there additions or changes you think would make the project segment better?

<!--
BEGIN QUESTION
name: open_response_debrief
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



# Instructions for submission of the project segment

This project segment should be submitted to Gradescope at <https://rebrand.ly/project2-submit-code> and <https://rebrand.ly/project2-submit-pdf>, which will be made available some time before the due date.

Project segment notebooks are manually graded, not autograded using otter as labs are. (Otter is used within project segment notebooks to synchronize distribution and solution code however.) **We will not run your notebook before grading it.** Instead, we ask that you submit the already freshly run notebook. The best method is to "restart kernel and run all cells", allowing time for all cells to be run to completion. You should submit your code to Gradescope at the code submission assignment at <https://rebrand.ly/project2-submit-code>.

We also request that you **submit a PDF of the freshly run notebook**. The simplest method is to use "Export notebook to PDF", which will render the notebook to PDF via LaTeX. If that doesn't work, the method that seems to be most reliable is to export the notebook as HTML (if you are using Jupyter Notebook, you can do so using `File -> Print Preview`), open the HTML in a browser, and print it to a file. Then make sure to add the file to your git commit. Please name the file the same name as this notebook, but with a `.pdf` extension. (Conveniently, the methods just described will use that name by default.) You can then perform a git commit and push and submit the commit to Gradescope at <https://rebrand.ly/project2-submit-pdf>.