# NLP Homework 4 Programming Assignment

In this assignment, we will train and evaluate a neural model to tag the parts of speech in a sentence.
We will also implement several improvements to the model and test their effect on its performance.

We will be using English text from the Wall Street Journal, marked with POS tags such as `NNP` (proper noun) and `DT` (determiner).

## Building a POS Tagger


### Setup

In [167]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import random

random.seed(1)

### Preparing Data

The relevant data is present in the files `train.txt` and `test.txt`.

`train.txt`: This file contains the training data: sequences of words and their respective tags. We split it 80%/20% into training and development sets, used to train the model and to tune the hyperparameters, respectively. See `load_tag_data` for details on how to read the training data.

`test.txt`: This file contains the test data. Use it only for the final evaluation of the trained models. See `load_txt_data` for details on how to read the test data.
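
To make the expected layout concrete, here is a minimal sketch of the format that `load_tag_data` appears to assume: one token per line with three whitespace-separated fields, and a blank line after each sentence. The sample lines, the helper name `parse_tagged`, and the meaning of the third column (a chunk label) are assumptions for illustration, not taken from `train.txt`.

```python
import io

# Hypothetical sample in the assumed layout: "word tag extra" per line,
# with a blank line closing each sentence.
SAMPLE = "The DT B-NP\nmarket NN I-NP\nfell VBD B-VP\n. . O\n\nIt PRP B-NP\nrose VBD B-VP\n\n"

def parse_tagged(stream):
    """Parse (sentences, tags) from a stream in the assumed three-column layout."""
    sentences, tag_seqs = [], []
    words, tags = [], []
    for line in stream:
        fields = line.split()
        if not fields:               # blank line: sentence boundary
            if words:
                sentences.append(words)
                tag_seqs.append(tags)
            words, tags = [], []
        else:
            words.append(fields[0])  # first column: the word
            tags.append(fields[1])   # second column: the POS tag
    if words:                        # keep a final sentence with no trailing blank line
        sentences.append(words)
        tag_seqs.append(tags)
    return sentences, tag_seqs

sample_sentences, sample_tags = parse_tagged(io.StringIO(SAMPLE))
print(sample_sentences)
print(sample_tags)
```

Unlike `load_tag_data`, this sketch also keeps a final sentence when the input does not end with a blank line, and it skips empty sentences produced by consecutive blank lines.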

In [186]:
def load_tag_data(tag_file):
    all_sentences = []
    all_tags = []
    sent = []
    tags = []
    with open(tag_file, 'r') as f:
        for line in f:
            if line.strip() == "":
                all_sentences.append(sent)
                all_tags.append(tags)
                sent = []
                tags = []
            else:
                word, tag, _ = line.strip().split()
                sent.append(word)
                tags.append(tag)
    return all_sentences, all_tags

def load_txt_data(txt_file):
    all_sentences = []
    sent = []
    with open(txt_file, 'r') as f:
        for line in f:
            if(line.strip() == ""):
                all_sentences.append(sent)
                sent = []
            else:
                word = line.strip()
                sent.append(word)
    return all_sentences

train_sentences, train_tags = load_tag_data('train.txt')
test_sentences = load_txt_data('test.txt')

unique_tags = set([tag for tag_seq in train_tags for tag in tag_seq])

# Create train-val split from train data
train_val_data = list(zip(train_sentences, train_tags))
random.shuffle(train_val_data)
split = int(0.8 * len(train_val_data))
training_data = train_val_data[:split]
val_data = train_val_data[split:]

print("Train Data: ", len(training_data))
print("Val Data: ", len(val_data))
print("Test Data: ", len(test_sentences))
print("Total tags: ", len(unique_tags))
print(unique_tags)

Train Data:  7148
Val Data:  1788
Test Data:  2012
Total tags:  44
{'VBD', 'PDT', 'PRP$', 'VB', ',', 'NNPS', 'WP$', ')', 'RP', 'POS', 'JJS', 'CD', ':', 'CC', '$', 'MD', 'NNP', 'DT', 'FW', 'WDT', 'VBG', 'RBR', 'WRB', 'IN', 'VBN', 'EX', 'VBZ', '#', "''", 'NNS', '``', '(', 'VBP', 'RBS', 'SYM', 'NN', 'RB', 'UH', 'JJ', 'WP', 'TO', '.', 'PRP', 'JJR'}


### Word-to-Index and Tag-to-Index mapping
In order to work with text in Tensor format, we need to map each word to an index.

In [169]:
word_to_idx = {}
for sent in train_sentences:
    for word in sent:
        if word not in word_to_idx:
            word_to_idx[word] = len(word_to_idx)

for sent in test_sentences:
    for word in sent:
        if word not in word_to_idx:
            word_to_idx[word] = len(word_to_idx)
            
tag_to_idx = {}
for tag in unique_tags:
    if tag not in tag_to_idx:
        tag_to_idx[tag] = len(tag_to_idx)

idx_to_tag = {}
for tag in tag_to_idx:
    idx_to_tag[tag_to_idx[tag]] = tag

print("Total tags", len(tag_to_idx))
print("Vocab size", len(word_to_idx))

Total tags 44
Vocab size 21589


In [170]:
def prepare_sequence(sent, idx_mapping):
    idxs = [idx_mapping[word] for word in sent]
    return torch.tensor(idxs, dtype=torch.long)
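
As a small illustration of the mapping step, the sketch below builds an index for a toy corpus the same way (each new item gets the next free integer) and converts a sentence to indices. The toy sentences and the helper name `build_index` are made up for this example, and a plain list stands in for the `torch.long` tensor that `prepare_sequence` returns.

```python
def build_index(sequences):
    """Assign each distinct item the next free integer, in order of first appearance."""
    index = {}
    for seq in sequences:
        for item in seq:
            if item not in index:
                index[item] = len(index)
    return index

toy_sentences = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
toy_word_to_idx = build_index(toy_sentences)

# Same lookup that prepare_sequence performs, minus the tensor conversion.
toy_ids = [toy_word_to_idx[w] for w in ["the", "cat", "barks"]]
print(toy_word_to_idx)
print(toy_ids)
```

A word that is missing from the mapping raises a `KeyError` in this lookup, which is why the notebook also adds the test-set words to `word_to_idx`.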

### Set up model
We will build and train a basic POS tagger: an LSTM model that tags the parts of speech in a given sentence.


First we need to define some default hyperparameters.

In [171]:
EMBEDDING_DIM = 6
HIDDEN_DIM = 3
LEARNING_RATE = 0.1
LSTM_LAYERS = 1
DROPOUT = 0
EPOCHS = 30
VOCAB_SIZE = len(word_to_idx)
TAGSET_SIZE = len(tag_to_idx)

### Define Model

The model takes as input a sentence as a tensor in the index space. The sentence is then converted to embedding space, where each word maps to its word embedding. The word embeddings are learned as part of the model training process.

These word embeddings act as input to the LSTM, which produces a hidden state for every word. Each hidden state is then passed to a Linear layer that produces a score for every tag; a softmax over these scores (applied inside `nn.CrossEntropyLoss` during training) gives the probability distribution over tags for that word. The model outputs the tag with the highest score for a given word.
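
The pipeline described above can be traced shape by shape. The following is a minimal sketch with toy sizes and untrained layers, meant only to show how the tensors flow; the sizes and the variable names are assumptions for illustration and it is not the graded solution.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy sizes, chosen only for illustration.
vocab_size, embedding_dim, hidden_dim, tagset_size = 10, 6, 3, 4

embedding = nn.Embedding(vocab_size, embedding_dim)
lstm = nn.LSTM(embedding_dim, hidden_dim)      # expects (seq_len, batch, input_size)
to_tags = nn.Linear(hidden_dim, tagset_size)

sentence = torch.tensor([1, 4, 7, 2, 9])       # five word indices
embedded = embedding(sentence).unsqueeze(1)    # (5, 1, 6): add a batch dimension of 1
outputs, _ = lstm(embedded)                    # (5, 1, 3): one hidden state per word
tag_scores = to_tags(outputs.squeeze(1))       # (5, 4): one score per tag per word
predicted = tag_scores.argmax(dim=1)           # (5,): highest-scoring tag index per word

print(embedded.shape, outputs.shape, tag_scores.shape, predicted.shape)
```

Because softmax preserves the ordering of the scores, taking the argmax of the raw scores selects the same tag as taking the argmax of the probabilities.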

In [172]:
class BasicPOSTagger(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size, dropout=0):
        super(BasicPOSTagger, self).__init__()
        #############################################################################
        # TODO: Define and initialize anything needed for the forward pass.
        # You are required to create a model with:
        # an embedding layer: that maps words to the embedding space
        # an LSTM layer: that takes word embeddings as input and outputs hidden states
        # a Linear layer: maps from hidden state space to tag space
        #############################################################################
        self.embedding_dim = embedding_dim
        # No padding_idx: sentences are fed one at a time without padding, and
        # index 1 belongs to a real word whose embedding should still be trained.
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, tagset_size)
        self.dropout = nn.Dropout(dropout)
        #############################################################################
        #                             END OF YOUR CODE                              #
        #############################################################################

    def forward(self, sentence):
        tag_scores = None
        #############################################################################
        # TODO: Implement the forward pass.
        # Given a tokenized index-mapped sentence as the argument, 
        # compute the corresponding scores for tags
        # returns:: tag_scores (Tensor)
        #############################################################################
        # (seq_len, embedding_dim) word embeddings for the sentence
        embedded = self.dropout(self.embedding(sentence))
        # nn.LSTM expects (seq_len, batch, input_size); use a batch of one
        embedded = embedded.reshape(embedded.shape[0], 1, self.embedding_dim)
        outputs, (hidden, cell) = self.lstm(embedded)
        # Drop the batch dimension: (seq_len, 1, hidden_dim) -> (seq_len, hidden_dim)
        tag_scores = self.fc(outputs.squeeze(1))
        #############################################################################
        #                             END OF YOUR CODE                              #
        #############################################################################
        return tag_scores

### Training

We define train and evaluate procedures that allow us to train our model using our created train-val split.

In [173]:
def train(epoch, model, loss_function, optimizer):
    train_loss = 0
    train_examples = 0
    for sentence, tags in training_data:
        #############################################################################
        # TODO: Implement the training loop
        # Hint: you can use the prepare_sequence method for creating index mappings 
        # for sentences. Find the gradient with respect to the loss and update the
        # model parameters using the optimizer.
        #############################################################################       
        # Map the tags and words to index tensors
        tags = prepare_sequence(tags, tag_to_idx)
        sentence = prepare_sequence(sentence, word_to_idx)

        # Forward pass: one row of tag scores per word
        tag_scores = model(sentence)
        tag_scores = tag_scores.view(-1, tag_scores.shape[-1])

        # Backward pass and parameter update
        loss = loss_function(tag_scores, tags)
        model.zero_grad()
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        train_examples += len(sentence)
        #############################################################################
        #                             END OF YOUR CODE                              #
        #############################################################################
    
    avg_train_loss = train_loss / train_examples
    avg_val_loss, val_accuracy = evaluate(model, loss_function, optimizer)
        
    print("Epoch: {}/{}\tAvg Train Loss: {:.4f}\tAvg Val Loss: {:.4f}\t Val Accuracy: {:.0f}".format(epoch, 
                                                                      EPOCHS, 
                                                                      avg_train_loss, 
                                                                      avg_val_loss,
                                                                      val_accuracy))

def evaluate(model, loss_function, optimizer):
  # returns:: avg_val_loss (float)
  # returns:: val_accuracy (float)
    val_loss = 0
    correct = 0
    val_examples = 0
    with torch.no_grad():
        for sentence, tags in val_data:
            #############################################################################
            # TODO: Implement the evaluate loop
            # Find the average validation loss along with the validation accuracy.
            # Hint: To find the accuracy, argmax of tag predictions can be used.
            #############################################################################
            
            tag_scores = model(prepare_sequence(sentence, word_to_idx))
            tag_scores = tag_scores.view(-1, tag_scores.shape[-1])
            tags = prepare_sequence(tags, tag_to_idx)

            loss = loss_function(tag_scores, tags)
            val_loss += loss.item()  # accumulate a Python float, not a tensor

            max_preds = tag_scores.argmax(dim=1)  # index of the highest-scoring tag per word
            correct += int((max_preds == tags).sum())
            val_examples += len(tags)
            
            #############################################################################
            #                             END OF YOUR CODE                              #
            #############################################################################
    val_accuracy = 100. * correct / val_examples
    avg_val_loss = val_loss / val_examples
    return avg_val_loss, val_accuracy
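
One detail is worth keeping in mind when reading the reported losses. With its default `reduction='mean'`, `nn.CrossEntropyLoss` already returns the average loss over the tokens of a sentence. Summing those per-sentence means and dividing by the total token count, as the loops above do, therefore gives a value smaller than the true per-token loss, although its trend across epochs is still meaningful. The sketch below contrasts the two computations on made-up numbers; `per_sentence` is a hypothetical list of (mean loss, sentence length) pairs.

```python
# Hypothetical (mean_loss, num_tokens) pairs, one per sentence.
per_sentence = [(2.0, 10), (1.0, 30)]

total_tokens = sum(n for _, n in per_sentence)

# What the loops above compute: sum of per-sentence means / total tokens.
sum_of_means_over_tokens = sum(loss for loss, _ in per_sentence) / total_tokens

# Token-weighted average: undo each per-sentence mean before summing.
per_token_average = sum(loss * n for loss, n in per_sentence) / total_tokens

print(sum_of_means_over_tokens)  # 0.075
print(per_token_average)         # 1.25
```

In the notebook, the token-weighted version would correspond to accumulating `loss.item() * len(sentence)` instead of `loss.item()`.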

In [174]:
#############################################################################
# TODO: Initialize the model, optimizer and the loss function
#############################################################################
model = BasicPOSTagger(EMBEDDING_DIM, HIDDEN_DIM, VOCAB_SIZE, TAGSET_SIZE)

def init_weights(m):
    for name, param in m.named_parameters():
        nn.init.normal_(param.data, mean = 0, std = 0.1)
        
model.apply(init_weights)

device = torch.device('cpu')
model = model.to(device)
# optimizer = optim.Adam(model.parameters(), lr=0.0001)
optimizer = optim.SGD(model.parameters(), lr = LEARNING_RATE)
loss_function = nn.CrossEntropyLoss()
loss_function = loss_function.to(device)

#############################################################################
#                             END OF YOUR CODE                              #
#############################################################################
for epoch in range(1, EPOCHS + 1): 
    train(epoch, model, loss_function, optimizer)

Epoch: 1/30	Avg Train Loss: 0.1238	Avg Val Loss: 0.1077	 Val Accuracy: 33
Epoch: 2/30	Avg Train Loss: 0.0892	Avg Val Loss: 0.0768	 Val Accuracy: 49
Epoch: 3/30	Avg Train Loss: 0.0658	Avg Val Loss: 0.0605	 Val Accuracy: 60
Epoch: 4/30	Avg Train Loss: 0.0523	Avg Val Loss: 0.0516	 Val Accuracy: 67
Epoch: 5/30	Avg Train Loss: 0.0445	Avg Val Loss: 0.0468	 Val Accuracy: 76
Epoch: 6/30	Avg Train Loss: 0.0392	Avg Val Loss: 0.0429	 Val Accuracy: 79
Epoch: 7/30	Avg Train Loss: 0.0357	Avg Val Loss: 0.0407	 Val Accuracy: 80
Epoch: 8/30	Avg Train Loss: 0.0331	Avg Val Loss: 0.0390	 Val Accuracy: 81
Epoch: 9/30	Avg Train Loss: 0.0306	Avg Val Loss: 0.0372	 Val Accuracy: 81
Epoch: 10/30	Avg Train Loss: 0.0285	Avg Val Loss: 0.0356	 Val Accuracy: 82
Epoch: 11/30	Avg Train Loss: 0.0266	Avg Val Loss: 0.0349	 Val Accuracy: 83
Epoch: 12/30	Avg Train Loss: 0.0252	Avg Val Loss: 0.0335	 Val Accuracy: 84
Epoch: 13/30	Avg Train Loss: 0.0239	Avg Val Loss: 0.0325	 Val Accuracy: 85
Epoch: 14/30	Avg Train Loss: 0.022

**Sanity Check!** After 5 epochs you should get around 53% accuracy, after 10 epochs the accuracy should be around 61%, the accuracy rises to around 70% after 20 epochs, and to around 75% accuracy after 30 epochs.

**Note1** If you run the notebook on a CPU, running all 30 epochs could take a while. Try a small number of epochs first as a sanity check, and then run the model for all 30 epochs. You're free to use a GPU for this assignment.

**Note2** Your accuracy may not exactly match what is reported here, but as long as the trend is increasing, it should be fine.

**Note3** The reported numbers are with the default hyperparameters. If you reach the desired accuracy, you can try different hyperparameter settings to improve the accuracy.

Write a method to save the predictions for the test set.

In [175]:
def test():
    val_loss = 0
    correct = 0
    val_examples = 0
    predicted_tags = []
    with torch.no_grad():
        for sentence in test_sentences:
            #############################################################################
            # TODO: Implement the test loop
            # This method saves the predicted tags for the sentences in the test set.
            # The tags are first added to a list which is then written to a file for
            # submission. An empty string is added after every sequence of tags
            # corresponding to a sentence to add a newline following file formatting
            # convention, as has been done already.
            #############################################################################
            tag_scores = model(prepare_sequence(sentence, word_to_idx))
            tag_scores = tag_scores.view(-1, tag_scores.shape[-1])
            max_preds = tag_scores.argmax(dim=1)  # predicted tag index per word
            for tag_idx in max_preds.tolist():
                predicted_tags.append(idx_to_tag[tag_idx])
            #############################################################################
            #                             END OF YOUR CODE                              #
            #############################################################################
            predicted_tags.append("")

    with open('test_labels.txt', 'w+') as f:
        for item in predicted_tags:
            f.write("%s\n" % item)

In [176]:
test()


### Test accuracy

Evaluate your performance on the test data by submitting test_labels.txt generated by the method above and **report your test accuracy here**.

**Accuracy** is **87.445%**

Imitate the above method to generate predictions for the validation data.
Create lists of the words, the tags predicted by the model, and the ground-truth tags.

Use these lists to carry out error analysis to find the top-10 types of errors made by the model.

In [177]:
#############################################################################
# TODO: Generate predictions from val data
# Create lists of words, tags predicted by the model and ground truth tags.
#############################################################################
def generate_predictions(model, test_sentences):
    # returns:: word_list (str list)
    # returns:: model_tags (str list)
    # returns:: gt_tags (str list)
    # Your code here
    word_list, model_tags, gt_tags = [], [], []
    with torch.no_grad():  # inference only, so skip gradient tracking
        for sentence, tags in test_sentences:
            tag_scores = model(prepare_sequence(sentence, word_to_idx))
            tag_scores = tag_scores.view(-1, tag_scores.shape[-1])

            max_preds = tag_scores.argmax(dim=1)  # index of the highest-scoring tag per word
            model_tags += [idx_to_tag[idx] for idx in max_preds.tolist()]
            word_list += sentence
            gt_tags += tags
    
    return word_list, model_tags, gt_tags

#############################################################################
# TODO: Carry out error analysis
# From those lists collected from the above method, find the 
# top-10 tuples of (model_tag, ground_truth_tag, frequency, example words)
# sorted by frequency
#############################################################################
def error_analysis(word_list, model_tags, gt_tags):
    # returns: errors (list of tuples)
    # Your code here
    
    error_dict = dict()
    for idx, word in enumerate(word_list):
        if model_tags[idx] != gt_tags[idx]:
            tuple_error = (model_tags[idx],gt_tags[idx])
            if tuple_error in error_dict:
                error_dict[tuple_error] += [word]
            else:
                error_dict[tuple_error] = [word]
    
    return sorted(error_dict.items(), key=lambda x: len(x[1]), reverse=True)

word_list, model_tags, gt_tags = generate_predictions(model, val_data)
print("gt_tag\t| model_tag\t| freq.\t| examples")
horizontal_line = "-------------------------------------------------------------------------"
errors = error_analysis(word_list, model_tags, gt_tags)[:10]
print(horizontal_line)
for error in errors:
    tags, examples = error
    model_tag, gt_tag = tags
    num_errors, example_words = len(examples), examples
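    # Sample at most 5 examples; random.sample raises ValueError if asked for more than exist
    print("{}\t| {}\t\t| {}\t| {}\n{}".format(gt_tag, model_tag, num_errors, random.sample(example_words, min(5, len(example_words))), horizontal_line))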

gt_tag	| model_tag	| freq.	| examples
-------------------------------------------------------------------------
VBP	| VB		| 256	| ['say', 'expect', 'say', 'lose', 'rely']
-------------------------------------------------------------------------
VBN	| VBD		| 235	| ['signed', 'failed', 'produced', 'considered', 'measured']
-------------------------------------------------------------------------
NNP	| JJ		| 190	| ['Peoria', 'G.D.', 'Pot', 'Hollister', 'London']
-------------------------------------------------------------------------
NN	| NNP		| 182	| ['Return', 'medical-airlift', 'craft', 'laughingstock', 'minimun']
-------------------------------------------------------------------------
NN	| JJ		| 178	| ['fixed-income', 'pad', 'manuevering', 'symptom', 'ranch']
-------------------------------------------------------------------------
POS	| VBZ		| 169	| ["'s", "'s", "'s", "'s", "'s"]
-------------------------------------------------------------------------
JJ	| NNP		| 150	| ['Cautious'

### Error analysis
**Report your findings here.**  
What kinds of errors did the model make and why do you think it made them?

Answer:
The model is weak at distinguishing closely related verb forms.
VBD | VBN and VBN | VBD confusions occur because many verbs share one form for the past tense and the past participle (for example, 'signed' and 'failed').
Similarly, for VBP | VB, the non-third-person present form and the base form of most verbs are identical (for example, 'say' and 'expect').

JJ | NNP and NN | JJ errors reflect real ambiguity that people also run into: many words can act as either an adjective or a noun, so the model has a hard time telling them apart.

NNP, JJ, and NN also confuse the model because they often occupy similar positions in a sentence.

Overall, I think the model did a decent job of POS tagging. The different noun types are easy to confuse in this model because the LSTM cannot separate them from sentence position alone. For more accurate POS tagging, we need to include information about the word itself.
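
One way to check the ambiguity explanation is to count how many word forms occur with more than one tag in the training data. The sketch below does this on a toy example; the helper name `ambiguous_words` and the toy sentences are hypothetical, and on the real data it could be called as `ambiguous_words(train_sentences, train_tags)`.

```python
from collections import defaultdict

def ambiguous_words(sentences, tag_seqs):
    """Return {word: sorted tags} for words seen with more than one tag."""
    seen = defaultdict(set)
    for words, tags in zip(sentences, tag_seqs):
        for word, tag in zip(words, tags):
            seen[word].add(tag)
    return {w: sorted(t) for w, t in seen.items() if len(t) > 1}

toy_sents = [["They", "signed", "it"], ["It", "was", "signed"], ["They", "say", "so"]]
toy_tags = [["PRP", "VBD", "PRP"], ["PRP", "VBD", "VBN"], ["PRP", "VBP", "RB"]]

print(ambiguous_words(toy_sents, toy_tags))
```

Words that show up in this dictionary with both `VBD` and `VBN`, or both `VB` and `VBP`, are the ones the tagger can only resolve from context.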



## II. Character level PoS Tagger

Use character-level information to augment the word embeddings. Suffixes such as -ing or -ly carry quite a bit of information about a word's POS tag. To incorporate this information, run a character-level LSTM on every word (treated as a tensor of characters, each mapped to character-index space) to create a character-level representation of the word. Take the last hidden state of the character-level LSTM as that representation and concatenate it with the word embedding (as in the `BasicPOSTagger`) to create a new word embedding that captures more information.
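
The description above can be sketched as follows: each word's characters go through the character LSTM separately, the last hidden state becomes that word's character-level representation, and the result is concatenated with the word embedding. This is a minimal illustration with toy sizes and untrained layers; the sizes, the helper name `char_representation`, and the index values are assumptions, and it is one possible reading of the task, not the reference solution.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy sizes and vocabularies, chosen only for illustration.
char_vocab, char_emb_dim, char_hidden_dim = 30, 3, 3
word_vocab, word_emb_dim = 10, 6

char_embedding = nn.Embedding(char_vocab, char_emb_dim)
char_lstm = nn.LSTM(char_emb_dim, char_hidden_dim)
word_embedding = nn.Embedding(word_vocab, word_emb_dim)

def char_representation(char_ids):
    """Run the character LSTM over one word and return its last hidden state."""
    embedded = char_embedding(char_ids).unsqueeze(1)  # (word_len, 1, char_emb_dim)
    _, (h_n, _) = char_lstm(embedded)                 # h_n: (1, 1, char_hidden_dim)
    return h_n.view(-1)                               # (char_hidden_dim,)

# A three-word sentence: word indices plus one character-index tensor per word.
word_ids = torch.tensor([2, 5, 7])
chars_per_word = [torch.tensor([1, 4]), torch.tensor([3, 3, 8, 2]), torch.tensor([9])]

char_reps = torch.stack([char_representation(c) for c in chars_per_word])  # (3, 3)
combined = torch.cat((word_embedding(word_ids), char_reps), dim=1)         # (3, 9)
print(combined.shape)
```

Here `combined` would then be fed to the word-level LSTM, whose input size is `word_emb_dim + char_hidden_dim`. Processing each word at its own length means no padding characters are needed in this variant.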

In [213]:
# Create char to index mapping
char_to_idx = {}
unique_chars = set()
MAX_WORD_LEN = 0

for sent in train_sentences:
    for word in sent:
        for c in word:
            unique_chars.add(c)
        if len(word) > MAX_WORD_LEN:
            MAX_WORD_LEN = len(word)

for c in unique_chars:
    char_to_idx[c] = len(char_to_idx)
char_to_idx[' '] = len(char_to_idx)

# New Hyperparameters
EMBEDDING_DIM = 6
HIDDEN_DIM = 3
LEARNING_RATE = 0.1
LSTM_LAYERS = 1
DROPOUT = 0
EPOCHS = 30
CHAR_EMBEDDING_DIM = 3
CHAR_HIDDEN_DIM = 3
print(char_to_idx)
print('MAX_WORD_LEN', MAX_WORD_LEN)

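for sent in test_sentences:
    for word in sent:
        for c in word:
            unique_chars.add(c)
            # Index characters that only occur in the test set, so that
            # prepare_sequence does not raise a KeyError on them later
            if c not in char_to_idx:
                char_to_idx[c] = len(char_to_idx)
        if len(word) > MAX_WORD_LEN:
            MAX_WORD_LEN = len(word)
print('MAX_WORD_LEN', MAX_WORD_LEN)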

{'o': 0, '&': 1, 't': 2, 'a': 3, ':': 4, '$': 5, 'B': 6, 'M': 7, 'S': 8, 'T': 9, 'W': 10, 'A': 11, '-': 12, 'i': 13, '2': 14, 'q': 15, 'G': 16, "'": 17, '5': 18, 'z': 19, 'x': 20, 'k': 21, 'h': 22, ';': 23, '9': 24, 'f': 25, 'y': 26, 'e': 27, 'F': 28, '#': 29, 'N': 30, 'V': 31, 'Y': 32, '%': 33, '.': 34, 'X': 35, 'n': 36, '7': 37, 'u': 38, ',': 39, 'r': 40, 'Z': 41, '`': 42, '4': 43, 'd': 44, 'J': 45, 'Q': 46, '8': 47, 'g': 48, '!': 49, '1': 50, 'H': 51, 'E': 52, 'O': 53, 'U': 54, 'I': 55, 'l': 56, 'C': 57, '3': 58, '?': 59, 'L': 60, 'R': 61, '0': 62, 'c': 63, '*': 64, 'w': 65, '@': 66, 'j': 67, 'D': 68, '=': 69, 'P': 70, 'K': 71, 'b': 72, 'v': 73, 'm': 74, '6': 75, 's': 76, '\\': 77, 'p': 78, '/': 79, ' ': 80}
MAX_WORD_LEN 43
MAX_WORD_LEN 54


In [179]:
# class CharPOSTagger(nn.Module):
#     def __init__(self, embedding_dim, hidden_dim, char_embedding_dim, 
#                  char_hidden_dim, char_size, vocab_size, tagset_size):
#         super(CharPOSTagger, self).__init__()
#         #############################################################################
#         # TODO: Define and initialize anything needed for the forward pass.
#         # You are required to create a model with:
#         # an embedding layer: that maps words to the embedding space
#         # an char level LSTM: that finds the character level embedding for a word
#         # an LSTM layer: that takes the combined embeddings as input and outputs hidden states
#         # a Linear layer: maps from hidden state space to tag space
#         #############################################################################
#         self.embedding_dim = embedding_dim
#         self.char_embedding_dim = char_embedding_dim
        
#         self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=1)
#         self.lstm = nn.LSTM(char_hidden_dim + embedding_dim, hidden_dim)  
        
#         self.embedding_c = nn.Embedding(MAX_WORD_LEN, char_embedding_dim, padding_idx=1) #char_size + 1, char_embedding_dim
#         self.lstm_c = nn.LSTM(char_embedding_dim, char_hidden_dim) 
        
#         self.fc = nn.Linear(hidden_dim, tagset_size)
#         #############################################################################
#         #                             END OF YOUR CODE                              #
#         #############################################################################

#     def forward(self, sentence, chars):
#         tag_scores = None
#         #############################################################################
#         # TODO: Implement the forward pass.
#         # Given a tokenized index-mapped sentence and a character sequence as the arguments, 
#         # find the corresponding scores for tags
#         # returns:: tag_scores (Tensor)
#         #############################################################################
#         embedded = self.embedding(sentence)
# #         print(sentence)
# #         chars = torch.LongTensor([char.numpy() for char in chars])
#         chars = torch.cat(chars).view(len(chars), 1, -1)
    
#         embedded_c = self.embedding_c(chars)
#         word_char_embeddings = torch.Tensor([])
#         for char in embedded_c:
#             char = char.view(embedded_c.shape[2], 1, -1)
#             outputs_c, _ = self.lstm_c(char)
#             word_char_embeddings = torch.cat((word_char_embeddings, outputs_c[-1]), dim = 0)
            
# #         print('word_char_embeddings', word_char_embeddings.shape, embedded.shape)
#         joint_emb = torch.cat((embedded, word_char_embeddings), dim=1)
#         joint_emb = joint_emb.reshape(joint_emb.shape[0], 1, -1)
        
#         outputs, _ = self.lstm(joint_emb)
        
#         tag_scores = self.fc(outputs.squeeze(0))
#         #############################################################################
#         #                             END OF YOUR CODE                              #
#         #############################################################################
#         return tag_scores

# def train_char(epoch, model, loss_function, optimizer):
#     train_loss = 0
#     train_examples = 0
#     for sentence, tags in training_data:
#         #############################################################################
#         # TODO: Implement the training loop
#         # Hint: you can use the prepare_sequence method for creating index mappings 
#         # for sentences as well as character sequences. Find the gradient with 
#         # respect to the loss and update the model parameters using the optimizer.
#         #############################################################################
#         tags = prepare_sequence(tags, tag_to_idx)
        
#         sentence_chars = []
#         for w in sentence:
#             spaces = ' ' * (MAX_WORD_LEN - len(w))
#             sentence_chars.append(list(spaces + w))
#         chars = [prepare_sequence(sentence_char, char_to_idx) for sentence_char in sentence_chars]
        
#         sentence = prepare_sequence(sentence, word_to_idx)
#         tag_scores = model(sentence, chars)
        
#         tag_scores = tag_scores.view(-1, tag_scores.shape[-1])
        
        
#         loss = loss_function(tag_scores, tags)
#         model.zero_grad()
#         loss.backward()        
#         optimizer.step()
        
#         train_loss += loss.item()
#         train_examples += len(sentence)
#         #############################################################################
#         #                             END OF YOUR CODE                              #
#         #############################################################################
    
#     avg_train_loss = train_loss / train_examples
#     avg_val_loss, val_accuracy = evaluate_char(model, loss_function, optimizer)
        
#     print("Epoch: {}/{}\tAvg Train Loss: {:.4f}\tAvg Val Loss: {:.4f}\t Val Accuracy: {:.0f}".format(epoch, 
#                                                                       EPOCHS, 
#                                                                       avg_train_loss, 
#                                                                       avg_val_loss,
#                                                                       val_accuracy))

# def evaluate_char(model, loss_function, optimizer):
#     # returns:: avg_val_loss (float)
#     # returns:: val_accuracy (float)
#     val_loss = 0
#     correct = 0
#     val_examples = 0
#     with torch.no_grad():
#         for sentence, tags in val_data:
#             #############################################################################
#             # TODO: Implement the evaluate loop
#             # Find the average validation loss along with the validation accuracy.
#             # Hint: To find the accuracy, argmax of tag predictions can be used.
#             #############################################################################
#             sentence_chars = []
#             for w in sentence:
#                 spaces = ' ' * (MAX_WORD_LEN - len(w))
#                 sentence_chars.append(list(spaces + w))
#             chars = [prepare_sequence(sentence_char, char_to_idx) for sentence_char in sentence_chars]
        
#             tag_scores = model(prepare_sequence(sentence, word_to_idx), chars)
#             tag_scores = tag_scores.view(-1, tag_scores.shape[-1])
#             tags = prepare_sequence(tags, tag_to_idx)
            
#             loss = loss_function(tag_scores, tags)
#             val_loss += loss
            
#             max_preds = tag_scores.argmax(dim = 1, keepdim = True).reshape(1,-1) # get the index of the max probability
#             tags = tags.reshape(1, -1)
#             correct += int(torch.sum(max_preds==tags))
#             val_examples += max_preds.shape[1] 
#             #############################################################################
#             #                             END OF YOUR CODE                              #
#             #############################################################################
#     val_accuracy = 100. * correct / val_examples
#     avg_val_loss = val_loss / val_examples
#     return avg_val_loss, val_accuracy

In [219]:
class CharPOSTagger(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, char_embedding_dim, 
                 char_hidden_dim, char_size, vocab_size, tagset_size):
        super(CharPOSTagger, self).__init__()
        #############################################################################
        # TODO: Define and initialize anything needed for the forward pass.
        # You are required to create a model with:
        # an embedding layer: that maps words to the embedding space
        # an char level LSTM: that finds the character level embedding for a word
        # an LSTM layer: that takes the combined embeddings as input and outputs hidden states
        # a Linear layer: maps from hidden state space to tag space
        #############################################################################
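        self.embedding_dim = embedding_dim
        self.char_embedding_dim = char_embedding_dim

        # No padding_idx on either embedding: index 1 is a real word / character,
        # so its embedding should still be trained
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(char_hidden_dim + embedding_dim, hidden_dim)

        self.embedding_c = nn.Embedding(char_size + 1, char_embedding_dim)
        # Each word is padded to MAX_WORD_LEN characters and its character
        # embeddings are flattened into one vector, so the input size is
        # MAX_WORD_LEN * char_embedding_dim (54 * 3 = 162 with the settings above)
        self.lstm_c = nn.LSTM(MAX_WORD_LEN * char_embedding_dim, char_hidden_dim)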
        
        self.fc = nn.Linear(hidden_dim, tagset_size)
        #############################################################################
        #                             END OF YOUR CODE                              #
        #############################################################################

    def forward(self, sentence, chars):
        tag_scores = None
        #############################################################################
        # TODO: Implement the forward pass.
        # Given a tokenized index-mapped sentence and a character sequence as the arguments, 
        # find the corresponding scores for tags
        # returns:: tag_scores (Tensor)
        #############################################################################
        embedded = self.embedding(sentence)  # (num_words, embedding_dim)
        # Stack the padded character indices: (num_words, 1, MAX_WORD_LEN)
        chars = torch.cat(chars).view(len(chars), 1, -1)

        # Embed every character, then flatten each word's character embeddings
        # into one vector: (num_words, 1, MAX_WORD_LEN * char_embedding_dim)
        embedded_c = self.embedding_c(chars)
        embedded_c = embedded_c.view(embedded_c.shape[0], 1, -1)
        # Note: this LSTM steps over the words, not over the characters of a word
        outputs_c, _ = self.lstm_c(embedded_c)  # (num_words, 1, char_hidden_dim)

        # Concatenate word embeddings with the character-level representation
        joint_emb = torch.cat((embedded, outputs_c.reshape(outputs_c.shape[0], -1)), dim=1)
        joint_emb = joint_emb.reshape(joint_emb.shape[0], 1, -1)

        outputs, _ = self.lstm(joint_emb)

        # Drop the batch dimension before projecting to tag space
        tag_scores = self.fc(outputs.squeeze(1))
        #############################################################################
        #                             END OF YOUR CODE                              #
        #############################################################################
        return tag_scores

def train_char(epoch, model, loss_function, optimizer):
    train_loss = 0
    train_examples = 0
    for sentence, tags in training_data:
        #############################################################################
        # TODO: Implement the training loop
        # Hint: you can use the prepare_sequence method for creating index mappings 
        # for sentences as well as character sequences. Find the gradient with 
        # respect to the loss and update the model parameters using the optimizer.
        #############################################################################
        tags = prepare_sequence(tags, tag_to_idx)
        
        sentence_chars = []
        for w in sentence:
            spaces = ' ' * (MAX_WORD_LEN - len(w))
            sentence_chars.append(list(spaces + w))
        chars = [prepare_sequence(sentence_char, char_to_idx) for sentence_char in sentence_chars]
        
        sentence = prepare_sequence(sentence, word_to_idx)
        tag_scores = model(sentence, chars)
        
        tag_scores = tag_scores.view(-1, tag_scores.shape[-1])
        
        
        loss = loss_function(tag_scores, tags)
        model.zero_grad()
        loss.backward()        
        optimizer.step()
        
        train_loss += loss.item()
        train_examples += len(sentence)
        #############################################################################
        #                             END OF YOUR CODE                              #
        #############################################################################
    
    avg_train_loss = train_loss / train_examples
    avg_val_loss, val_accuracy = evaluate_char(model, loss_function, optimizer)
        
    print("Epoch: {}/{}\tAvg Train Loss: {:.4f}\tAvg Val Loss: {:.4f}\t Val Accuracy: {:.0f}".format(epoch, 
                                                                      EPOCHS, 
                                                                      avg_train_loss, 
                                                                      avg_val_loss,
                                                                      val_accuracy))

def evaluate_char(model, loss_function, optimizer):
    # returns:: avg_val_loss (float)
    # returns:: val_accuracy (float)
    val_loss = 0
    correct = 0
    val_examples = 0
    with torch.no_grad():
        for sentence, tags in val_data:
            #############################################################################
            # TODO: Implement the evaluate loop
            # Find the average validation loss along with the validation accuracy.
            # Hint: To find the accuracy, argmax of tag predictions can be used.
            #############################################################################
            sentence_chars = []
            for w in sentence:
                spaces = ' ' * (MAX_WORD_LEN - len(w))
                sentence_chars.append(list(spaces + w))
            chars = [prepare_sequence(sentence_char, char_to_idx) for sentence_char in sentence_chars]
        
            tag_scores = model(prepare_sequence(sentence, word_to_idx), chars)
            tag_scores = tag_scores.view(-1, tag_scores.shape[-1])
            tags = prepare_sequence(tags, tag_to_idx)
            
            loss = loss_function(tag_scores, tags)
            val_loss += loss.item()  # accumulate a Python float, not a tensor
            
            max_preds = tag_scores.argmax(dim = 1, keepdim = True).reshape(1,-1) # get the index of the max probability
            tags = tags.reshape(1, -1)
            correct += int(torch.sum(max_preds==tags))
            val_examples += max_preds.shape[1] 
            #############################################################################
            #                             END OF YOUR CODE                              #
            #############################################################################
    val_accuracy = 100. * correct / val_examples
    avg_val_loss = val_loss / val_examples
    return avg_val_loss, val_accuracy
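
The training and evaluation loops above both left-pad every word with spaces up to `MAX_WORD_LEN` before mapping its characters to indices. A minimal sketch of that step as a reusable helper follows. The names `pad_word_chars` and `sentence_to_char_lists` and the example maximum length are hypothetical, and truncating over-long words is an added assumption (the loops above leave such words unpadded and untruncated).

```python
def pad_word_chars(word, max_word_len, pad_char=' '):
    """Left-pad `word` with `pad_char` and return exactly `max_word_len` characters.

    Words longer than `max_word_len` are truncated on the right; this is an
    assumption of the sketch, not something the notebook's loops do.
    """
    clipped = word[:max_word_len]
    padding = pad_char * (max_word_len - len(clipped))
    return list(padding + clipped)


def sentence_to_char_lists(sentence, max_word_len):
    """Apply `pad_word_chars` to every word in a tokenized sentence."""
    return [pad_word_chars(w, max_word_len) for w in sentence]


# Example with a hypothetical maximum word length of 6
padded = sentence_to_char_lists(["The", "market", "rallied"], 6)
print(padded[0])  # [' ', ' ', ' ', 'T', 'h', 'e']
```

Each returned list can then be passed to `prepare_sequence` together with `char_to_idx`, as in the loops above.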

In [220]:
#############################################################################
# TODO: Initialize the model, optimizer and the loss function
#############################################################################
model = CharPOSTagger(EMBEDDING_DIM, HIDDEN_DIM, CHAR_EMBEDDING_DIM, CHAR_HIDDEN_DIM, len(char_to_idx), VOCAB_SIZE, TAGSET_SIZE)

def init_weights(m):
    for name, param in m.named_parameters():
        nn.init.normal_(param.data, mean = 0, std = 0.1)
        
model.apply(init_weights)

device = torch.device('cpu')
model = model.to(device)
# optimizer = optim.Adam(model.parameters(), lr=0.0001)
optimizer = optim.SGD(model.parameters(), lr = LEARNING_RATE)
loss_function = nn.CrossEntropyLoss()
loss_function = loss_function.to(device)
#############################################################################
#                             END OF YOUR CODE                              #
#############################################################################
for epoch in range(1, EPOCHS + 1): 
    train_char(epoch, model, loss_function, optimizer)

Epoch: 1/30	Avg Train Loss: 0.1233	Avg Val Loss: 0.1011	 Val Accuracy: 34
Epoch: 2/30	Avg Train Loss: 0.0882	Avg Val Loss: 0.0750	 Val Accuracy: 47
Epoch: 3/30	Avg Train Loss: 0.0657	Avg Val Loss: 0.0576	 Val Accuracy: 59
Epoch: 4/30	Avg Train Loss: 0.0496	Avg Val Loss: 0.0424	 Val Accuracy: 77
Epoch: 5/30	Avg Train Loss: 0.0373	Avg Val Loss: 0.0362	 Val Accuracy: 82
Epoch: 6/30	Avg Train Loss: 0.0317	Avg Val Loss: 0.0324	 Val Accuracy: 83
Epoch: 7/30	Avg Train Loss: 0.0281	Avg Val Loss: 0.0298	 Val Accuracy: 85
Epoch: 8/30	Avg Train Loss: 0.0256	Avg Val Loss: 0.0279	 Val Accuracy: 86
Epoch: 9/30	Avg Train Loss: 0.0236	Avg Val Loss: 0.0266	 Val Accuracy: 86
Epoch: 10/30	Avg Train Loss: 0.0217	Avg Val Loss: 0.0251	 Val Accuracy: 88
Epoch: 11/30	Avg Train Loss: 0.0204	Avg Val Loss: 0.0242	 Val Accuracy: 89
Epoch: 12/30	Avg Train Loss: 0.0192	Avg Val Loss: 0.0233	 Val Accuracy: 89
Epoch: 13/30	Avg Train Loss: 0.0181	Avg Val Loss: 0.0223	 Val Accuracy: 90
Epoch: 14/30	Avg Train Loss: 0.017

**Sanity Check!** After 5 epochs you should get around 57% accuracy, after 10 epochs the accuracy should be around 67%, the accuracy rises to around 74% after 20 epochs, and to around 77% accuracy after 30 epochs.


### Test accuracy
Also evaluate your performance on the test data by submitting test_labels.txt and **report your test accuracy here**.


**89.96%**

### Error analysis

In [231]:
#############################################################################
# TODO: Generate predictions from val data
# Create lists of words, tags predicted by the model and ground truth tags.
#############################################################################
def generate_predictions_char(model, test_sentences):
    # returns:: word_list (str list)
    # returns:: model_tags (str list)
    # returns:: gt_tags (str list)
    # Your code here
    word_list, model_tags, gt_tags = [], [], []
    for sentence, tags in test_sentences:
        sentence_chars = []
        for w in sentence:
            spaces = ' ' * (MAX_WORD_LEN - len(w))
            sentence_chars.append(list(spaces + w))
        chars = [prepare_sequence(sentence_char, char_to_idx) for sentence_char in sentence_chars]
        tag_scores = model(prepare_sequence(sentence, word_to_idx), chars)
        tag_scores = tag_scores.view(-1, tag_scores.shape[-1])

        max_preds = tag_scores.argmax(dim = 1, keepdim = True).reshape(1,-1) # get the index of the max probability
        model_tags += [idx_to_tag[idx] for idx in max_preds.tolist()[0]]   
        word_list += sentence
        gt_tags += tags
    
    return word_list, model_tags, gt_tags

#############################################################################
# TODO: Carry out error analysis
# From those lists collected from the above method, find the 
# top-10 tuples of (model_tag, ground_truth_tag, frequency, example words)
# sorted by frequency
#############################################################################
def error_analysis_char(word_list, model_tags, gt_tags):
    # returns: errors (list of tuples)
    # Your code here
    error_dict = dict()
    for idx, word in enumerate(word_list):
        if model_tags[idx] != gt_tags[idx]:
            tuple_error = (model_tags[idx],gt_tags[idx])
            if tuple_error in error_dict:
                error_dict[tuple_error] += [word]
            else:
                error_dict[tuple_error] = [word]
    
    return sorted(error_dict.items(), key=lambda x: len(x[1]), reverse=True)

word_list, model_tags, gt_tags = generate_predictions_char(model, val_data)
print("gt_tag\t| model_tag\t| freq.\t| examples")
horizontal_line = "-------------------------------------------------------------------------"
errors = error_analysis_char(word_list, model_tags, gt_tags)[:10]
print(horizontal_line)
for error in errors:
    tags, examples = error
    model_tag, gt_tag = tags
    num_errors, example_words = len(examples), examples
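    # Sample at most 5 examples so that rare error types do not raise a ValueError
    print("{}\t| {}\t\t| {}\t| {}\n{}".format(gt_tag, model_tag, num_errors, random.sample(example_words, min(5, len(example_words))), horizontal_line))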

gt_tag	| model_tag	| freq.	| examples
-------------------------------------------------------------------------
VBN	| VBD		| 196	| ['built', 'upheld', 'allocated', 'collapsed', 'likened']
-------------------------------------------------------------------------
IN	| VBZ		| 196	| ['that', 'that', 'that', 'that', 'that']
-------------------------------------------------------------------------
NNP	| NN		| 189	| ['Taco', 'Poll', 'Magic', 'Cellular', 'Unity']
-------------------------------------------------------------------------
NN	| JJ		| 186	| ['net', 'bank-teller', 'other', 'crude', 'stable']
-------------------------------------------------------------------------
NNP	| JJ		| 141	| ['Gamble', 'Compensation', 'Independent', 'Jacques-Francois', 'Fruehauf']
-------------------------------------------------------------------------
WDT	| VBZ		| 124	| ['that', 'which', 'that', 'that', 'which']
-------------------------------------------------------------------------
VBD	| VBN		| 120	| ['d


**Report your findings here.**  
What kinds of errors does the character-level model make as compared to the original model, and why do you think it made them? 

Compared with part (a), the model still struggles to distinguish verb forms, although it does better than before.
VBN and VBD are especially easy to confuse: the past participle and the past tense usually share the same surface form (for example, 'built' or 'collapsed'), so looking at the word itself does not separate them, and the LSTM is sometimes not strong enough to pick the correct tag from context.

VB and VBP are confused for the same reason.

NNP, JJ, and NN are also confused with one another, because they often occupy similar positions in a sentence.

These trends all match part (a).

However, the confusion among the different verb tags has decreased.

NNP | JJ errors are also less frequent, because the model now considers characters and words at the same time. A word may look like a JJ in isolation but act as an NNP in context, and with the additional information the model finds it somewhat easier to tag such words correctly.
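
One way to make this comparison concrete is to count each (ground truth, predicted) confusion for both models and look at the difference. The sketch below uses toy tag lists that are not taken from the assignment. The function names `confusion_counts` and `confusion_delta` are hypothetical, and the inputs are assumed to be aligned lists like those returned by `generate_predictions_char`.

```python
from collections import Counter


def confusion_counts(model_tags, gt_tags):
    """Count (gt_tag, model_tag) pairs where the prediction is wrong."""
    return Counter(
        (gt, pred) for pred, gt in zip(model_tags, gt_tags) if pred != gt
    )


def confusion_delta(old_counts, new_counts):
    """Return (pair, new - old) for every pair seen in either model, largest drop first."""
    pairs = set(old_counts) | set(new_counts)
    delta = {p: new_counts[p] - old_counts[p] for p in pairs}
    return sorted(delta.items(), key=lambda kv: (kv[1], kv[0]))


# Toy data, not taken from the assignment
gt = ["VBN", "VBN", "NNP", "NN", "VBD"]
old_preds = ["VBD", "VBD", "JJ", "NN", "VBD"]
new_preds = ["VBD", "VBN", "NNP", "JJ", "VBD"]

old = confusion_counts(old_preds, gt)
new = confusion_counts(new_preds, gt)
for (gt_tag, model_tag), change in confusion_delta(old, new):
    print(f"{gt_tag} -> {model_tag}: {change:+d}")
```

Negative values mark confusions that became rarer in the newer model, and positive values mark confusions that became more common.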






## Modifications

Now implement one of the following three modifications and report the model's performance.
- Change the number of LSTM layers
- Change the number of hidden dimensions
- Change the number of word embedding dimensions
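
To get a feel for how each of these choices changes model size, the sketch below counts the trainable parameters of a unidirectional PyTorch-style LSTM with biases enabled, which stores four gate blocks and two bias vectors per layer. The helper name `lstm_param_count` and the dimensions in the example are hypothetical and are not the values used in this notebook.

```python
def lstm_param_count(input_size, hidden_size, num_layers=1):
    """Parameter count of a unidirectional PyTorch-style LSTM (bias enabled).

    Each layer holds weight_ih (4H x I), weight_hh (4H x H), and two bias
    vectors of length 4H. Layers after the first take the previous layer's
    hidden state (size H) as input.
    """
    total = 0
    layer_input = input_size
    for _ in range(num_layers):
        total += 4 * hidden_size * (layer_input + hidden_size)  # weights
        total += 2 * 4 * hidden_size                            # biases
        layer_input = hidden_size
    return total


# Hypothetical sizes for illustration
print(lstm_param_count(10, 20, num_layers=1))  # 2560
print(lstm_param_count(10, 20, num_layers=2))  # 5920
print(lstm_param_count(10, 40, num_layers=1))  # 8320
```

Widening the word embedding only grows the input-to-hidden weights of the first layer, whereas doubling the hidden size quadruples the hidden-to-hidden weights, and each extra layer adds a further 8H² + 8H parameters for hidden size H.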

In [232]:
class CharPOSTagger_new(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, char_embedding_dim, 
                 char_hidden_dim, char_size, vocab_size, tagset_size):
        super(CharPOSTagger_new, self).__init__()
        #############################################################################
        # TODO: Define and initialize anything needed for the forward pass.
        # You are required to create a model with:
        # an embedding layer: that maps words to the embedding space
        # a char-level LSTM: that finds the character level embedding for a word
        # an LSTM layer: that takes the combined embeddings as input and outputs hidden states
        # a Linear layer: maps from hidden state space to tag space
        #############################################################################
        self.embedding_dim = embedding_dim
        self.char_embedding_dim = char_embedding_dim
        
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=1)
        self.lstm = nn.LSTM(char_hidden_dim + embedding_dim, hidden_dim, 
                            num_layers=LSTM_LAYERS,
                            dropout = DROPOUT if LSTM_LAYERS > 1 else 0)  
        
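        # Character embeddings; each word's MAX_WORD_LEN character embeddings are
        # flattened into one input vector for lstm_c (size 162 in the original run).
        self.embedding_c = nn.Embedding(char_size + 1, char_embedding_dim, padding_idx=1)
        self.lstm_c = nn.LSTM(MAX_WORD_LEN * char_embedding_dim, char_hidden_dim,
                              num_layers=LSTM_LAYERS,
                              dropout=DROPOUT if LSTM_LAYERS > 1 else 0)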
        
        self.fc = nn.Linear(hidden_dim, tagset_size)
        #############################################################################
        #                             END OF YOUR CODE                              #
        #############################################################################

    def forward(self, sentence, chars):
        tag_scores = None
        #############################################################################
        # TODO: Implement the forward pass.
        # Given a tokenized index-mapped sentence and a character sequence as the arguments, 
        # find the corresponding scores for tags
        # returns:: tag_scores (Tensor)
        #############################################################################
        # Word embeddings: (seq_len, embedding_dim)
        embedded = self.embedding(sentence)

        # Stack the padded character indices: (seq_len, 1, MAX_WORD_LEN)
        chars = torch.cat(chars).view(len(chars), 1, -1)

        # Embed the characters and flatten each word's character embeddings
        # into a single vector: (seq_len, 1, MAX_WORD_LEN * char_embedding_dim)
        embedded_c = self.embedding_c(chars)
        embedded_c = embedded_c.view(embedded_c.shape[0], 1, -1)
        # Character-feature LSTM over the words: (seq_len, 1, char_hidden_dim)
        outputs_c, _ = self.lstm_c(embedded_c)

        # Concatenate word and character features, then add a batch dimension
        joint_emb = torch.cat((embedded, outputs_c.reshape(outputs_c.shape[0], -1)), dim=1)
        joint_emb = joint_emb.reshape(joint_emb.shape[0], 1, -1)

        outputs, _ = self.lstm(joint_emb)

        # Drop the batch dimension before the linear layer: (seq_len, tagset_size)
        tag_scores = self.fc(outputs.squeeze(1))
        #############################################################################
        #                             END OF YOUR CODE                              #
        #############################################################################
        return tag_scores

def train_char(epoch, model, loss_function, optimizer):
    train_loss = 0
    train_examples = 0
    for sentence, tags in training_data:
        #############################################################################
        # TODO: Implement the training loop
        # Hint: you can use the prepare_sequence method for creating index mappings 
        # for sentences as well as character sequences. Find the gradient with 
        # respect to the loss and update the model parameters using the optimizer.
        #############################################################################
        tags = prepare_sequence(tags, tag_to_idx)
        
        sentence_chars = []
        for w in sentence:
            spaces = ' ' * (MAX_WORD_LEN - len(w))
            sentence_chars.append(list(spaces + w))
        chars = [prepare_sequence(sentence_char, char_to_idx) for sentence_char in sentence_chars]
        
        sentence = prepare_sequence(sentence, word_to_idx)
        tag_scores = model(sentence, chars)
        
        tag_scores = tag_scores.view(-1, tag_scores.shape[-1])
        
        
        loss = loss_function(tag_scores, tags)
        model.zero_grad()
        loss.backward()        
        optimizer.step()
        
        train_loss += loss.item()
        train_examples += len(sentence)
        #############################################################################
        #                             END OF YOUR CODE                              #
        #############################################################################
    
    avg_train_loss = train_loss / train_examples
    avg_val_loss, val_accuracy = evaluate_char(model, loss_function, optimizer)
        
    print("Epoch: {}/{}\tAvg Train Loss: {:.4f}\tAvg Val Loss: {:.4f}\t Val Accuracy: {:.0f}".format(epoch, 
                                                                      EPOCHS, 
                                                                      avg_train_loss, 
                                                                      avg_val_loss,
                                                                      val_accuracy))

def evaluate_char(model, loss_function, optimizer):
    # returns:: avg_val_loss (float)
    # returns:: val_accuracy (float)
    val_loss = 0
    correct = 0
    val_examples = 0
    with torch.no_grad():
        for sentence, tags in val_data:
            #############################################################################
            # TODO: Implement the evaluate loop
            # Find the average validation loss along with the validation accuracy.
            # Hint: To find the accuracy, argmax of tag predictions can be used.
            #############################################################################
            sentence_chars = []
            for w in sentence:
                spaces = ' ' * (MAX_WORD_LEN - len(w))
                sentence_chars.append(list(spaces + w))
            chars = [prepare_sequence(sentence_char, char_to_idx) for sentence_char in sentence_chars]
        
            tag_scores = model(prepare_sequence(sentence, word_to_idx), chars)
            tag_scores = tag_scores.view(-1, tag_scores.shape[-1])
            tags = prepare_sequence(tags, tag_to_idx)
            
            loss = loss_function(tag_scores, tags)
            val_loss += loss.item()  # accumulate a Python float, not a tensor
            
            max_preds = tag_scores.argmax(dim = 1, keepdim = True).reshape(1,-1) # get the index of the max probability
            tags = tags.reshape(1, -1)
            correct += int(torch.sum(max_preds==tags))
            val_examples += max_preds.shape[1] 
            #############################################################################
            #                             END OF YOUR CODE                              #
            #############################################################################
    val_accuracy = 100. * correct / val_examples
    avg_val_loss = val_loss / val_examples
    return avg_val_loss, val_accuracy

In [239]:
# Change the number of LSTM layers
# Change the number of hidden dimensions
# Change the number of word embedding dimensions
# 89.99%
# EMBEDDING_DIM = 200
# HIDDEN_DIM = 32
# LEARNING_RATE = 0.01
# LSTM_LAYERS = 2
# DROPOUT = 0.1
# EPOCHS = 30

EMBEDDING_DIM = 256
HIDDEN_DIM = 64
LEARNING_RATE = 0.01
LSTM_LAYERS = 2
DROPOUT = 0.1
EPOCHS = 30

In [240]:
#############################################################################
# TODO: Initialize the model, optimizer and the loss function
#############################################################################
# Pass the character vocabulary size, as done for CharPOSTagger above
model_new = CharPOSTagger_new(EMBEDDING_DIM, HIDDEN_DIM, CHAR_EMBEDDING_DIM, CHAR_HIDDEN_DIM, len(char_to_idx), VOCAB_SIZE, TAGSET_SIZE)

def init_weights(m):
    for name, param in m.named_parameters():
        nn.init.normal_(param.data, mean = 0, std = 0.1)
        
model_new.apply(init_weights)

device = torch.device('cpu')
model_new = model_new.to(device)
# optimizer = optim.Adam(model.parameters(), lr=0.0001)
optimizer = optim.SGD(model_new.parameters(), lr = LEARNING_RATE)
loss_function = nn.CrossEntropyLoss()
loss_function = loss_function.to(device)
#############################################################################
#                             END OF YOUR CODE                              #
#############################################################################
for epoch in range(1, EPOCHS + 1): 
    train_char(epoch, model_new, loss_function, optimizer)

Epoch: 1/30	Avg Train Loss: 0.1308	Avg Val Loss: 0.1263	 Val Accuracy: 15
Epoch: 2/30	Avg Train Loss: 0.1249	Avg Val Loss: 0.1196	 Val Accuracy: 23
Epoch: 3/30	Avg Train Loss: 0.1046	Avg Val Loss: 0.0913	 Val Accuracy: 40
Epoch: 4/30	Avg Train Loss: 0.0809	Avg Val Loss: 0.0727	 Val Accuracy: 52
Epoch: 5/30	Avg Train Loss: 0.0660	Avg Val Loss: 0.0609	 Val Accuracy: 59
Epoch: 6/30	Avg Train Loss: 0.0556	Avg Val Loss: 0.0516	 Val Accuracy: 66
Epoch: 7/30	Avg Train Loss: 0.0469	Avg Val Loss: 0.0438	 Val Accuracy: 71
Epoch: 8/30	Avg Train Loss: 0.0396	Avg Val Loss: 0.0376	 Val Accuracy: 76
Epoch: 9/30	Avg Train Loss: 0.0334	Avg Val Loss: 0.0329	 Val Accuracy: 79
Epoch: 10/30	Avg Train Loss: 0.0284	Avg Val Loss: 0.0288	 Val Accuracy: 82
Epoch: 11/30	Avg Train Loss: 0.0245	Avg Val Loss: 0.0260	 Val Accuracy: 84
Epoch: 12/30	Avg Train Loss: 0.0214	Avg Val Loss: 0.0239	 Val Accuracy: 85
Epoch: 13/30	Avg Train Loss: 0.0189	Avg Val Loss: 0.0223	 Val Accuracy: 87
Epoch: 14/30	Avg Train Loss: 0.016

### Modification choice
Which modification did you use and why?

In [242]:
def test_new():
    val_loss = 0
    correct = 0
    val_examples = 0
    predicted_tags = []
    with torch.no_grad():
        for sentence in test_sentences:
            #############################################################################
            # TODO: Implement the test loop
            # This method saves the predicted tags for the sentences in the test set.
            # The tags are first added to a list which is then written to a file for
            # submission. An empty string is added after every sequence of tags
            # corresponding to a sentence to add a newline following file formatting
            # convention, as has been done already.
            ############################################################################        
            sentence_chars = []
            for w in sentence:
                spaces = ' ' * (MAX_WORD_LEN - len(w))
                sentence_chars.append(list(spaces + w))
            chars = [prepare_sequence(sentence_char, char_to_idx) for sentence_char in sentence_chars]
            tag_scores = model_new(prepare_sequence(sentence, word_to_idx), chars)
            tag_scores = tag_scores.view(-1, tag_scores.shape[-1])
            max_preds = tag_scores.argmax(dim = 1, keepdim = True)
            for tag in max_preds:
                tag = int(tag)
                predicted_tags.append(idx_to_tag[tag])           
            #############################################################################
            #                             END OF YOUR CODE                              #
            #############################################################################
            predicted_tags.append("")

    with open('test_labels.txt', 'w+') as f:
        for item in predicted_tags:
            f.write("%s\n" % item)
test_new()
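
The file written above holds one predicted tag per line, with an empty line after each sentence. As a rough sketch, the helpers below read that layout back and compute token-level accuracy against gold tags. The gold tags and the names `read_tag_lines` and `token_accuracy` are hypothetical, since the assignment scores `test_labels.txt` through submission rather than locally.

```python
import io


def read_tag_lines(handle):
    """Group one-tag-per-line text into sentences, splitting on blank lines."""
    sentences, current = [], []
    for line in handle:
        tag = line.strip()
        if tag == "":
            if current:
                sentences.append(current)
                current = []
        else:
            current.append(tag)
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences


def token_accuracy(pred_sentences, gold_sentences):
    """Percentage of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for pred, gold in zip(pred_sentences, gold_sentences):
        if len(pred) != len(gold):
            raise ValueError("sentence lengths differ")
        correct += sum(p == g for p, g in zip(pred, gold))
        total += len(gold)
    return 100.0 * correct / total


# In-memory stand-ins for the prediction file and a gold file
pred_file = io.StringIO("DT\nNN\nVBD\n\nNNP\nVBZ\n\n")
gold_file = io.StringIO("DT\nNN\nVBN\n\nNNP\nVBZ\n\n")
accuracy = token_accuracy(read_tag_lines(pred_file), read_tag_lines(gold_file))
print(f"{accuracy:.2f}%")  # 80.00%
```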

### Test accuracy
Also evaluate your performance on the test data by submitting test_labels.txt and **report your test accuracy here**.

**90.66%**

### Error analysis
**Report your findings here.**  
Compare the top-10 errors made by this modified model with the errors made by the model from part (a). 
What errors does the original model make as compared to the modified model, and why do you think it made them? 

Feel free to reuse the methods defined above for this purpose.

In [243]:
#############################################################################
# TODO: Generate predictions from val data
# Create lists of words, tags predicted by the model and ground truth tags.
#############################################################################
def generate_predictions_char(model, test_sentences):
    # returns:: word_list (str list)
    # returns:: model_tags (str list)
    # returns:: gt_tags (str list)
    # Your code here
    word_list, model_tags, gt_tags = [], [], []
    for sentence, tags in test_sentences:
        sentence_chars = []
        for w in sentence:
            spaces = ' ' * (MAX_WORD_LEN - len(w))
            sentence_chars.append(list(spaces + w))
        chars = [prepare_sequence(sentence_char, char_to_idx) for sentence_char in sentence_chars]
        tag_scores = model(prepare_sequence(sentence, word_to_idx), chars)
        tag_scores = tag_scores.view(-1, tag_scores.shape[-1])

        max_preds = tag_scores.argmax(dim = 1, keepdim = True).reshape(1,-1) # get the index of the max probability
        model_tags += [idx_to_tag[idx] for idx in max_preds.tolist()[0]]   
        word_list += sentence
        gt_tags += tags
    
    return word_list, model_tags, gt_tags

#############################################################################
# TODO: Carry out error analysis
# From those lists collected from the above method, find the 
# top-10 tuples of (model_tag, ground_truth_tag, frequency, example words)
# sorted by frequency
#############################################################################
def error_analysis_char(word_list, model_tags, gt_tags):
    # returns: errors (list of tuples)
    # Your code here
    error_dict = dict()
    for idx, word in enumerate(word_list):
        if model_tags[idx] != gt_tags[idx]:
            tuple_error = (model_tags[idx],gt_tags[idx])
            if tuple_error in error_dict:
                error_dict[tuple_error] += [word]
            else:
                error_dict[tuple_error] = [word]
    
    return sorted(error_dict.items(), key=lambda x: len(x[1]), reverse=True)

word_list, model_tags, gt_tags = generate_predictions_char(model_new, val_data)
print("gt_tag\t| model_tag\t| freq.\t| examples")
horizontal_line = "-------------------------------------------------------------------------"
errors = error_analysis_char(word_list, model_tags, gt_tags)[:10]
print(horizontal_line)
for error in errors:
    tags, examples = error
    model_tag, gt_tag = tags
    num_errors, example_words = len(examples), examples
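    # Sample at most 5 examples so that rare error types do not raise a ValueError
    print("{}\t| {}\t\t| {}\t| {}\n{}".format(gt_tag, model_tag, num_errors, random.sample(example_words, min(5, len(example_words))), horizontal_line))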

gt_tag	| model_tag	| freq.	| examples
-------------------------------------------------------------------------
NN	| JJ		| 218	| ['drinking', 'worth', 'principal', 'contempt', 'DEPOSIT']
-------------------------------------------------------------------------
NNP	| NN		| 170	| ['Wharton', 'Quantum', 'Schumacher', 'Concorde', 'Elvekrog']
-------------------------------------------------------------------------
NNP	| JJ		| 151	| ['WAVE', 'Aid', 'Works', 'Bessemer', 'Allied-Lyons']
-------------------------------------------------------------------------
JJ	| NN		| 142	| ['desperate', 'ever-narrowing', 'thermal', 'self-indulgent', '2-to-1']
-------------------------------------------------------------------------
JJ	| NNP		| 103	| ['fragmented', 'terse', 'London', 'preferred-stock', 'East']
-------------------------------------------------------------------------
NN	| NNP		| 98	| ['veteran', 'Silver', 'introduction', 'mid-1992', 'Power']
--------------------------------------------------

I chose the model that reached 92% validation accuracy, which uses an embedding dimension of 256 and a hidden dimension of 64. Its test accuracy is 90.66%.

Accuracy improved from 88% to 90%.
Errors among the noun tags are noticeably less frequent, because the model considers the structure of the word and the structure of the sentence at the same time.

The same types of errors persist in this model, but almost every kind of error is less frequent.

In part (a), verb forms were a major problem for the model, whereas here only one of the top errors involves a verb.
This may be because the model strikes a good balance between word-level and character-level information when tagging verbs.

NN | JJ errors are more frequent, because the model now considers the word itself in addition to its context and position. For example, "drinking" may have been tagged NN in part (a), when only its position was considered. Now the model has to balance two signals, the word's context and the tag the word would suggest out of context, which leads to more mistakes here.