# LSTM Bot

## Project Overview

In this project, you will build a chatbot that can converse with you at the command line. The chatbot will use a Sequence to Sequence text generation architecture with an LSTM as its memory unit. You will also learn to use pretrained word embeddings to improve the performance of the model. At the conclusion of the project, you will be able to show your chatbot to potential employers.

Additionally, you have the option to use pretrained word embeddings in your model. The starter code below loads GloVe Twitter embeddings (`glove-twitter-100`) through Gensim. You can compare the performance of your model with pretrained embeddings against a model without them.



---



A sequence to sequence model (Seq2Seq) has two components:
- An Encoder consisting of an embedding layer and LSTM unit.
- A Decoder consisting of an embedding layer, LSTM unit, and linear output unit.

The Seq2Seq model feeds an input sequence into the Encoder and passes the Encoder's final hidden state (for an LSTM, the hidden and cell states) to the Decoder, which uses it to output a series of token predictions.
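
The hand-off between the two components can be illustrated without any deep learning framework. The following is a toy sketch, not the model built in this notebook: `toy_encode` and `toy_decode_step` are hypothetical stand-ins for the Encoder and Decoder, and the "hidden state" is just a Python list. It only shows the control flow: encode once, then decode one token at a time, feeding each prediction back in until the end token appears.

```python
SOS, EOS = "soseq", "eoseq"  # boundary tokens, spelled like the notebook's constants

def toy_encode(tokens):
    """Hypothetical stand-in for the Encoder: fold the input into a 'state'."""
    state = []
    for token in tokens:          # an LSTM would update (h, c) at every step
        state = state + [token]
    return state

def toy_decode_step(prev_token, state):
    """Hypothetical stand-in for one Decoder step: returns (prediction, new_state).

    A real decoder would embed prev_token; this toy ignores it.
    """
    if not state:                 # nothing left to emit, so end the sequence
        return EOS, state
    return state[-1], state[:-1]  # emit the most recently encoded token

def toy_seq2seq(tokens, max_len=10):
    state = toy_encode(tokens)    # 1. Encoder reads the whole input
    prev, output = SOS, []        # 2. Decoder is primed with the start token
    for _ in range(max_len):      # 3. Decode step by step, feeding predictions back
        prev, state = toy_decode_step(prev, state)
        output.append(prev)
        if prev == EOS:
            break
    return output

print(toy_seq2seq(["who", "won", "?"]))
```

This toy decoder simply emits the input in reverse followed by `eoseq`. The point is the loop structure, which is the same one a trained decoder follows at inference time.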

## Dependencies

- PyTorch
- Torchtext
- NumPy
- Pandas
- NLTK
- Gzip
- Gensim


Please choose a dataset from the Torchtext website. We recommend looking at the SQuAD dataset first. Here is a link to the website where you can view your options:

- https://pytorch.org/text/stable/datasets.html





In [1]:
!pip install torchdata==0.3.0

Defaulting to user installation because normal site-packages is not writeable


In [2]:
import gensim
import gensim.downloader
import math
import nltk
import numpy as np
import pandas as pd
import gzip
import re
import random
import torch
import torch.nn as nn
import torch.nn.utils.rnn
import torch.utils.data
import requests
from nltk.tokenize import word_tokenize
from gensim.models import KeyedVectors
import torchtext.datasets

In [3]:
# Flags to avoid repeat work

get_embeddings = False
get_input_data = True
augment_embed_data = True
preprocess_input_data = True

In [4]:
# Global constants
embedding_name = 'glove-twitter-100'
validation_index = 87599 # Index where validation set starts
sosToken = 'soseq'
eosToken = 'eoseq'
loader_qty = 1 # Data loading thread quantity
layer_count = 1
hidden_unit_dim = 512
rep_int = 1000 # Samples per status printout
val_int = 5000 # Batches per validation (with printout)

In [5]:
# Pre-trained embeddings

if (get_embeddings) == True:
    
    base_embeddings = gensim.downloader.load(embedding_name)
    base_embeddings.save(embedding_name+'.kv')
    
else:
    
    base_embeddings = KeyedVectors.load(embedding_name+'.kv')

In [6]:
if get_input_data == True:

    train_squad, dev_squad = torchtext.datasets.SQuAD1()
    base_data = []
    for dP in train_squad:
        for dpAns in dP[2]:
            base_data.append((" ".join([sosToken,dP[1],eosToken])," ".join([sosToken,dpAns,eosToken])))
    
    for dP in dev_squad:
        for dpAns in dP[2]:
            base_data.append((" ".join([sosToken,dP[1],eosToken])," ".join([sosToken,dpAns,eosToken])))
    
    qa_df = pd.DataFrame(base_data,columns = ['qTxt','aTxt'])
    qa_df.to_pickle("rawQuestAnsData.pkl")

else:
    qa_df = pd.read_pickle("rawQuestAnsData.pkl")

In [7]:
# Add sequence boundary tokens, make all keywords lowercase, rebuild and save keyedVectors

if augment_embed_data == True:
    
    # Prepare numpy array to hold new embedding matrix with sosToken and eosToken added as one-hot
    aug_words = []
    aug_embed = np.zeros((len(base_embeddings.index_to_key)+2,len(base_embeddings[0])+2))
    for i in range(len(base_embeddings.index_to_key)):
        aug_words.append(base_embeddings.index_to_key[i].lower())
        aug_embed[i,:-2] = base_embeddings[base_embeddings.index_to_key[i]]
    
    # Add sosToken and eosToken
    aug_words.append(sosToken)
    aug_embed[-2,-2:] = np.array([1,0])
    aug_words.append(eosToken)
    aug_embed[-1,-2:] = np.array([0,1])
    
    # Create new KeyedVectors instance with the extra dimensions for sos, eos.
    # Only vector_size is passed (as for pr_kv later); add_vectors() then appends every row.
    aug_kv = KeyedVectors(aug_embed.shape[1])
    aug_kv.add_vectors(aug_words,aug_embed)
    
    # aug_kv.unit_normalize_all()
    
    # Save
    aug_kv.save(embedding_name+'-aug.kv')
    
else:
    aug_kv = KeyedVectors.load(embedding_name+'-aug.kv')
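
The augmentation in the cell above widens every pretrained vector by two zero-valued dimensions and then adds one row per boundary token that is non-zero only in those new dimensions. A minimal sketch of that layout with plain Python lists, assuming a made-up three-word vocabulary with 2-dimensional vectors (the function name and the numbers are illustrative, not real GloVe values):

```python
def augment_with_boundary_tokens(words, vectors, sos="soseq", eos="eoseq"):
    """Return (words, vectors) widened by two dims, plus one-hot sos/eos rows."""
    dim = len(vectors[0])
    # Existing words keep their values; the two new dimensions are zero
    new_vectors = [list(vec) + [0.0, 0.0] for vec in vectors]
    # Boundary tokens are zero in the original dims and one-hot in the new ones
    new_vectors.append([0.0] * dim + [1.0, 0.0])   # sos
    new_vectors.append([0.0] * dim + [0.0, 1.0])   # eos
    return [w.lower() for w in words] + [sos, eos], new_vectors

words, vectors = augment_with_boundary_tokens(
    ["the", "cat", "sat"],
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
)
for word, vec in zip(words, vectors):
    print(word, vec)
```

Because the boundary rows are zero wherever the pretrained rows are non-zero, they are orthogonal to every ordinary word vector, so the two tokens start out clearly distinguishable from the rest of the vocabulary.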

In [8]:
# Preprocessing functions

# Pre-Reqs
nltk.download('punkt')

# Remove tokens from list not present in embedding
def scrubTokens(inTokenList,emb_kv):
    outList = [ token for token in inTokenList if (emb_kv.has_index_for(token)) ]
    return outList

# Tokenization
def prepare_text(sentence,emb_kv):
    tokens = scrubTokens(word_tokenize(sentence),emb_kv)
    return tokens

# Prepend to token list
def token_prepend(inList,preItem):
    return [preItem] + inList

# Append to token list
def token_append(inList,postItem):
    return inList + [postItem]

# Transform list of tokens to their indices in embedding
def tokensToIndices(tokens,emb_kv):
    tokenInds = [emb_kv.get_index(token) for token in tokens]
    return tokenInds
    

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
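
To make the helpers above concrete, here is a small sketch of the same scrub-then-index idea using a plain dictionary in place of a `KeyedVectors` object. The `toy_vocab` mapping, the function names, and the whitespace split are assumptions made for illustration; the notebook itself uses `word_tokenize` and the embedding's own index.

```python
toy_vocab = {"soseq": 0, "eoseq": 1, "who": 2, "won": 3, "the": 4, "game": 5}

def scrub_tokens(tokens, vocab):
    """Drop tokens that have no entry in the vocabulary."""
    return [t for t in tokens if t in vocab]

def tokens_to_indices(tokens, vocab):
    """Map each remaining token to its integer index."""
    return [vocab[t] for t in tokens]

sentence = "soseq who won the championship game eoseq"
tokens = scrub_tokens(sentence.split(), toy_vocab)
indices = tokens_to_indices(tokens, toy_vocab)
print(tokens)
print(indices)
```

Note that out-of-vocabulary words (here `championship`) are dropped, which shortens the sequence instead of mapping the word to an unknown-word token.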


In [9]:
# Tokenize strings
# Remove tokens that are not in the embedding
# Perform training and validation split
# Prepare new KeyedVectors that does not have tokens absent from training data

if preprocess_input_data == True:
    
    proc_qa_df = qa_df.copy()
    proc_qa_df['qTxtClean'] = proc_qa_df['qTxt'].str.lower().apply(prepare_text, emb_kv = aug_kv)
    proc_qa_df['aTxtClean'] = proc_qa_df['aTxt'].str.lower().apply(prepare_text, emb_kv = aug_kv)
    
    # Remove rows whose answer has no in-vocabulary tokens (only the sos and eos tokens remain)
    proc_qa_df['aTxtLen'] = proc_qa_df['aTxtClean'].apply(len)
    proc_qa_df = proc_qa_df[proc_qa_df['aTxtLen']>2]    
    
    print("Cleaned Text")
    
    pr_kv = KeyedVectors(aug_kv[sosToken].size)
    
    keyDict = {}
    for series in [proc_qa_df['qTxtClean'],proc_qa_df['aTxtClean']]:
        for index,tokenList in series.items():
            for token in tokenList:
                if token not in keyDict:
                    keyDict[token] = aug_kv[token]
    
    keyList = []
    valList = []
    for token in keyDict.keys():
        keyList.append(token)
        valList.append(keyDict[token])
    
    pr_kv.add_vectors(keyList,valList)
    
    print("Pruned vocabulary")
    
    # Add dataframe columns with tokens by their numerical indices
    proc_qa_df['qIdxs'] = proc_qa_df['qTxtClean'].apply(tokensToIndices, emb_kv = pr_kv)
    proc_qa_df['aIdxs'] = proc_qa_df['aTxtClean'].apply(tokensToIndices, emb_kv = pr_kv)
    
    print("Converted to indices")
    
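    # Save tokenized dataframes with training and validation split.
    # Split on the original row labels: rows were filtered above, so a positional
    # slice at validation_index would put some validation rows in the training set.
    is_train_row = proc_qa_df.index < validation_index
    train_qa_df = proc_qa_df[is_train_row]
    train_qa_df.to_pickle('tokenizedTrainingData.pkl')
    validation_qa_df = proc_qa_df[~is_train_row]
    validation_qa_df.to_pickle('tokenizedValidationData.pkl')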
    pr_kv.save(embedding_name+'-prn.kv')
    print("Saved preprocessed data")
    
else:
    
    train_qa_df = pd.read_pickle('tokenizedTrainingData.pkl')
    validation_qa_df = pd.read_pickle('tokenizedValidationData.pkl')
    pr_kv = KeyedVectors.load(embedding_name+'-prn.kv') 

Cleaned Text
Pruned vocabulary
Converted to indices
Saved preprocessed data


In [10]:
# Dataset object

class qaWithContextDataset(torch.utils.data.Dataset):
    
    def __init__(self, questionAndAnswer_df):
        self.qa_df = questionAndAnswer_df
        self.length = questionAndAnswer_df['qIdxs'].count()
    
    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        # Return tuple of (question,answer)
        return (torch.tensor(self.qa_df['qIdxs'].iat[idx]),torch.tensor(self.qa_df['aIdxs'].iat[idx]))
    
# Batch collation - Pad sequences as tensors, add sequence lengths as lists with nested tuple
# ( (questionTensor,questionLengthList), (answerTensor,answerLengthList) )
def collate_qa_samples(batch):
    
    # Sort batch tuples by decreasing question length
    batch_sorted = batch.copy()
    batch_sorted.sort(reverse = True, key = lambda qa_p: qa_p[0].size()[0])
    
    questionLengths = [qa_pair[0].size()[0] for qa_pair in batch_sorted]
    answerLengths = [qa_pair[1].size()[0] for qa_pair in batch_sorted]
    
    # Pad sequences with 1's. The true lengths are returned alongside so padded
    # positions can be skipped during forward() computations
    questions = torch.ones([len(questionLengths),max(questionLengths)], dtype=torch.int64)
    answers = torch.ones([len(answerLengths),max(answerLengths)], dtype=torch.int64)
    
    # Copy sequences into output tensor
    for i in range(len(batch_sorted)):
        questions[i,0:questionLengths[i]] = batch_sorted[i][0]
        answers[i,0:answerLengths[i]] = batch_sorted[i][1]
    
    return ( (questions,questionLengths) , (answers,answerLengths) )
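
The padding scheme used by `collate_qa_samples` can be seen on a tiny batch of plain lists. This is a minimal sketch that assumes a non-empty batch; `pad_batch` is a hypothetical helper, not part of the notebook.

```python
PAD = 1  # the notebook pads with index 1

def pad_batch(sequences, pad_value=PAD):
    """Sort by decreasing length, then right-pad every sequence to the longest."""
    ordered = sorted(sequences, key=len, reverse=True)
    lengths = [len(seq) for seq in ordered]
    max_len = lengths[0]
    padded = [seq + [pad_value] * (max_len - len(seq)) for seq in ordered]
    return padded, lengths

padded, lengths = pad_batch([[7, 8], [4, 5, 6, 9], [3]])
print(padded)
print(lengths)
```

Index 1 is also a valid vocabulary index, so the padded values alone cannot be told apart from real tokens. Returning the true lengths alongside the padded batch is what lets later steps distinguish the two.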

In [11]:
# Initialize training and validation datasets
# Truncate to 5000 to try and manage training time
train_dataset = qaWithContextDataset(train_qa_df[0:5000])
validation_dataset = qaWithContextDataset(validation_qa_df[0:5000])

In [12]:
# Dataloader creator so batch_size may be varied
def make_dataloader(d_set,batch_qty):
    return torch.utils.data.DataLoader( \
                       d_set, \
                       batch_qty, \
                       shuffle = True, \
                       num_workers = loader_qty, \
                       collate_fn = collate_qa_samples, \
                       drop_last = True, \
                       persistent_workers = True)

In [13]:
class Encoder(nn.Module):
    
    def __init__(self, hidden_size, layer_qty, pretrained_embed):
        
        super(Encoder, self).__init__()
        
        adjusted_hidden = hidden_size if hidden_size > pretrained_embed.size(dim=1) else pretrained_embed.size(dim=1)
        
        # self.embedding provides a vector representation of the inputs to our model
        self.embedding = nn.Embedding( \
                                     num_embeddings = pretrained_embed.size(dim=0), \
                                     embedding_dim = adjusted_hidden, \
                                     )
        
        # initialize weights for encoder embedding, loading pretrained, expand hidden size if less than pretrained dim
        
        init_weights = torch.randn(pretrained_embed.size(dim=0),adjusted_hidden)
        init_weights[:,:pretrained_embed.size(dim=1)] = pretrained_embed
        self.embedding.weight = torch.nn.Parameter(init_weights)
        
        # self.lstm, accepts the vectorized input and passes a hidden state
        if (layer_qty > 1):
            self.lstm = \
                nn.LSTM(adjusted_hidden,adjusted_hidden,num_layers = layer_qty,batch_first=True,dropout=0.3)
        else:
            self.lstm = \
                nn.LSTM(adjusted_hidden,adjusted_hidden,num_layers = layer_qty,batch_first=True)
    
    def forward(self, i):
        
        '''
        Inputs: i, the src tuple (questions, questionLengths) for batch
        Outputs: o, the encoder outputs
                h, the hidden state
                c, the cell state
        '''
        
        # Shape of i[0] is [batch_size,sequence_length]
        
        max_question_length = max(i[1])-1 
        
        # Get permutation order for sorting questions by decreasing length
        sorted_ques_lengths = []
        for idx in range(len(i[1])):
            sorted_ques_lengths.append((i[1][idx],idx))
        sorted_ques_lengths.sort(reverse = True, key = lambda ansLen : ansLen[0])
        
        # Split up for easier downstream ops
        sorted_dec_lengths = [quesLen[0] for quesLen in sorted_ques_lengths]
        sorted_idx_list = [quesLen[1] for quesLen in sorted_ques_lengths]
        
        # Sort the padded question tensor by decreasing question length
        sorted_idx_tensor = i[0].new_tensor(sorted_idx_list,dtype=torch.int64)
        decreasing_length_questions = torch.index_select(i[0],0,sorted_idx_tensor)
        
        # Shape of embed_rslt is [batch_size,sequence_length,embedding_dim]
        embed_rslt = self.embedding(decreasing_length_questions)
        
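        # Encoder does not require online data substitution.  Can use sequence packing functionality.
        # Pass the sorted lengths so they match the row order of the sorted, embedded questions.
        packed_questions = torch.nn.utils.rnn.pack_padded_sequence(embed_rslt,sorted_dec_lengths,batch_first=True,enforce_sorted=True)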
        
        o,(h,c) = self.lstm(packed_questions)
        
        # Need to undo permutation so questions and answers remain aligned. Output of encoder is unused
        inverted_sort_list = [sorted_idx_list.index(old_idx) for old_idx in range(len(sorted_idx_list))]
        inverted_sort_tensor = i[0].new_tensor(inverted_sort_list,dtype=torch.int64)
        h = torch.index_select(h,1,inverted_sort_tensor)
        c = torch.index_select(c,1,inverted_sort_tensor)
        
        # h.shape == c.shape == [num_layers, batch_size, hidden_size]
        return o, (h, c)
    

class Decoder(nn.Module):
      
    def __init__(self, hidden_size, layer_qty, pretrained_embed):
        
        super(Decoder, self).__init__()
        
        adjusted_hidden = hidden_size if hidden_size > pretrained_embed.size(dim=1) else pretrained_embed.size(dim=1)
        
        # self.embedding provides a vector representation of the inputs to our model
        self.embedding = nn.Embedding( \
                                     num_embeddings = pretrained_embed.size(dim=0), \
                                     embedding_dim = adjusted_hidden, \
                                     )
        
        # initialize weights for decoder embedding, loading pretrained, expand hidden size if less than pretrained dim
        init_weights = torch.randn(pretrained_embed.size(dim=0),adjusted_hidden)
        init_weights[:,:pretrained_embed.size(dim=1)] = pretrained_embed
        self.embedding.weight = torch.nn.Parameter(init_weights)
        
        # Output dimension used to construct outputs
        self.out_dim = pretrained_embed.size()[0]
        
        # self.lstm, accepts the embeddings and outputs a hidden state
        if (layer_qty > 1):
            self.lstm = \
                nn.LSTM(adjusted_hidden,adjusted_hidden,num_layers = layer_qty,batch_first=True,dropout=0.3)
        else:
            self.lstm = \
                nn.LSTM(adjusted_hidden,adjusted_hidden,num_layers = layer_qty,batch_first=True)
        
        # self.output, predicts on the LSTM output with linear layer
        self.output = nn.Linear(adjusted_hidden,pretrained_embed.size()[0])
        self.lsftmx = nn.LogSoftmax(dim=2)
        
    def forward(self, i, enc_state, teach_freq):
        
        '''
        Inputs: i, the target tuple (answers, answerLengths) for batch
        Outputs: o, the prediction
        '''

        are_teaching = random.random() < teach_freq
        max_answer_length = max(i[1])-1 
        
        # Get permutation order for sorting answers by decreasing length
        # Each answer length is reduced by one since the eos token is only used in loss calculation
        sorted_ans_lengths = []
        for idx in range(len(i[1])):
            sorted_ans_lengths.append((i[1][idx]-1,idx))
        sorted_ans_lengths.sort(reverse = True, key = lambda ansLen : ansLen[0])
        
        # Split up for easier downstream ops
        sorted_dec_lengths = [ansLen[0] for ansLen in sorted_ans_lengths]
        sorted_idx_list = [ansLen[1] for ansLen in sorted_ans_lengths]
        
        # Sort the answer padded tensor by decreasing answer length
        sorted_idx_tensor = i[0].new_tensor(sorted_idx_list,dtype=torch.int64)
        decreasing_length_answers = torch.index_select(i[0],0,sorted_idx_tensor)
        
        # Allocate tensor for predictions
        out_preds = i[0].new_ones([len(i[1]),max_answer_length,self.out_dim])
        
        # Encoder states need to be permuted in same manner as answers (decreasing answer length)
        h_enc_decreasing_ans_length = torch.index_select(enc_state[0],1,sorted_idx_tensor)
        c_enc_decreasing_ans_length = torch.index_select(enc_state[1],1,sorted_idx_tensor)
        
        # Teaching: 
        #  -Can pack answer sequence and run full sequence with single lstm call
        # Not Teaching: 
        #  -Step one token at a time, feeding previous prediction tokens as input.
        #  -Don't run sequences through decoder beyond their labeled answer length
        if are_teaching:
               
            embed_rslt = self.embedding(decreasing_length_answers[:,:-1]) # Do not pass eos token
            packed_answers = torch.nn.utils.rnn.pack_padded_sequence( \
                embed_rslt,sorted_dec_lengths,batch_first=True,enforce_sorted=True)
            teach_pred_out,(h_decode,c_decode) = self.lstm( \
                                                            packed_answers, \
                                                            (h_enc_decreasing_ans_length,c_enc_decreasing_ans_length) \
                                                          )
            unpacked_ans, unpacked_len = torch.nn.utils.rnn.pad_packed_sequence(teach_pred_out,batch_first=True)
            
            # Pass through linear layer
            out_preds = self.output(unpacked_ans)
            out_preds = self.lsftmx(out_preds)
            
        else:
            
            prev_h = h_enc_decreasing_ans_length
            prev_c = c_enc_decreasing_ans_length
            
            # feed in sos token on first iteration to prime
            prev_o = torch.unsqueeze(self.embedding(decreasing_length_answers[:,0]),1)
            
            # Accumulator list and alias
            out_pred_list = []
            total_seq = len(sorted_dec_lengths)
            
            for step in range(max_answer_length):
                
                # Ensure prepped for gpu run
                prev_h = prev_h.contiguous()
                prev_c = prev_c.contiguous()
                prev_o = prev_o.contiguous()
                
                # Determine number of sequences that still have remaining predictions to make.
                # A sequence with ans_len decoder steps is active for steps 0 .. ans_len-1.
                seq_left = sum([(1 if step < ans_len else 0) for ans_len in sorted_dec_lengths])
                ltsm_out, (prev_h,prev_c) = \
                    self.lstm( \
                        prev_o[:seq_left,:,:].contiguous(), \
                        (prev_h[:,:seq_left,:].contiguous(),prev_c[:,:seq_left,:].contiguous()))
                net_out = self.output(ltsm_out)
                net_out = self.lsftmx(net_out)
                prev_o = self.embedding(torch.argmax(net_out,dim=2))
                
                # Add tensor with outputs for seq_left sequences padded.
                # Padding uses net_out's dtype so torch.cat joins tensors of the same type.
                out_pred_list.append( \
                     torch.cat([net_out,net_out.new_ones([total_seq-seq_left,1,self.out_dim])], dim = 0) \
                )
        
            # Concatenate all the sequence outputs from every sequence step
            out_preds = torch.cat(out_pred_list,dim=1)
        
        # Reorder predictions to original order for loss computation
        inverted_sort_list = [sorted_idx_list.index(old_idx) for old_idx in range(len(sorted_idx_list))]
        o = torch.index_select(out_preds,0,i[0].new_tensor(inverted_sort_list,dtype=torch.int64))
        
        return o

class Seq2Seq(nn.Module):
    
    def __init__(self, hidden_size, layer_qty, pretrained_kv):
        
        super(Seq2Seq, self).__init__()
        
        # Convert keyedvector's numpy array to tensor
        pretrained_embed = torch.tensor(pretrained_kv.vectors,dtype=torch.float32)
        
        self.seq2seqEncoder = Encoder(hidden_size, layer_qty, pretrained_embed)
        self.seq2seqDecoder = Decoder(hidden_size, layer_qty, pretrained_embed)
    
    def forward(self, src, trg, teacher_forcing_ratio = 0.5):      
        
        o, enc_state = self.seq2seqEncoder.forward(src)
        o = self.seq2seqDecoder.forward(trg, enc_state, teacher_forcing_ratio)
        
        return o
    
    # Pass tensor to embedding
    def embed_tensor(self,inTensor):
        return self.seq2seqEncoder.embedding(inTensor)
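
Both `forward` methods sort a batch by decreasing length and later restore the original order. The sketch below shows that round trip on plain lists and builds the inverse permutation in a single pass, as an alternative to the repeated `list.index` lookups used above. The helper names are hypothetical.

```python
def sort_permutation(lengths):
    """Indices that order `lengths` from longest to shortest (ties keep their order)."""
    return sorted(range(len(lengths)), key=lambda idx: lengths[idx], reverse=True)

def invert_permutation(perm):
    """inverse[original_position] = position of that item after sorting."""
    inverse = [0] * len(perm)
    for new_pos, old_pos in enumerate(perm):
        inverse[old_pos] = new_pos
    return inverse

lengths = [3, 5, 2, 4]
perm = sort_permutation(lengths)                # order that sorts the batch
sorted_items = [lengths[i] for i in perm]       # batch in decreasing length
inverse = invert_permutation(perm)              # order that undoes the sort
restored = [sorted_items[i] for i in inverse]   # back in the original order
print(perm, sorted_items, inverse, restored)
```

Applying `inverse` with `torch.index_select` along the batch dimension plays the same role as `inverted_sort_tensor` in the Encoder.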

In [14]:
# Initialize network object

seqToseqNet = Seq2Seq(hidden_unit_dim,layer_count,pr_kv)

In [15]:
# Function to compute loss for variable length sequence data
# net_output = (batch size, max sequence length, vocab dimension)
# target_seqs = (batch size, max sequence length) indices in vocabulary space
# target_lengths = list of lengths for each target in batch
# loss_criterion = Loss which can be computed on pair of 1d tensors
def computeMaskedLoss(net_output,target_seqs,target_lengths,loss_criterion):
    
    preds = torch.flatten(net_output,start_dim=0,end_dim=1)
    # Skip sos token. Clone so that masking does not modify the caller's target tensor.
    targets = target_seqs[:,1:].clone()
    # Mark every position after each sequence's eos token with -1 so the loss ignores it
    for i in range(len(target_lengths)):
        targets[i,(target_lengths[i]-1):] = -1
    
    targets = torch.flatten(targets,start_dim=0,end_dim=1)
    loss = loss_criterion(preds,targets) 
    
    return loss
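
The effect of the `-1` masking in `computeMaskedLoss` can be checked by hand on a tiny example. The sketch below is a plain-Python version of a mean negative log-likelihood that skips ignored targets, which is how `nn.NLLLoss(ignore_index=-1)` with its default mean reduction treats them. The function name and the probabilities are illustrative assumptions.

```python
import math

def masked_nll(log_probs, targets, ignore_index=-1):
    """Mean negative log-likelihood over targets that are not `ignore_index`.

    log_probs: one list of per-class log-probabilities per position
    targets:   one class index per position, or ignore_index for padding
    """
    total, count = 0.0, 0
    for row, target in zip(log_probs, targets):
        if target == ignore_index:
            continue                 # padded position contributes nothing
        total -= row[target]
        count += 1
    return total / count

# Three positions over a 2-word vocabulary; the last position is padding
log_probs = [
    [math.log(0.9), math.log(0.1)],
    [math.log(0.2), math.log(0.8)],
    [math.log(0.5), math.log(0.5)],
]
targets = [0, 1, -1]
loss = masked_nll(log_probs, targets)
print(loss)
```

Only the first two positions count, so the result equals `-(log 0.9 + log 0.8) / 2`. The padded position changes neither the sum nor the divisor.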

In [16]:
# Training routine
def train_model(net, train_dset, val_dset,ses_lrn = 0.01,ses_tea = 0.5,ses_epochs = 1,ses_batch_size = 16):

    train_loader = make_dataloader(train_dset,ses_batch_size)
    val_loader = make_dataloader(val_dset,ses_batch_size)
    report_interval = rep_int // ses_batch_size
    validation_interval = val_int // ses_batch_size
    
    least_validation_loss = float("inf")
    report_interval_counter = 0
    validation_interval_counter = 0
    val_iter = iter(val_loader)
    s = next(val_iter)
    
    gpu_avail = torch.cuda.is_available()
    
    if (gpu_avail):
        net.cuda()
    
    loss_criterion = nn.NLLLoss(ignore_index=-1) # Batches padded with -1's
    
    optimizer = torch.optim.Adam(net.parameters(),lr=ses_lrn)
    
    for epoch in range(ses_epochs):
        
        net.train()
        train_loss = 0.0
        
        for i, train_data in enumerate(train_loader):
            
            train_inputs, train_labels = train_data
            
            if (gpu_avail):
                train_inputs = (train_inputs[0].cuda(), train_inputs[1]) 
                train_labels = (train_labels[0].cuda(), train_labels[1])
            
            # Zero out the gradients of the optimizer
            optimizer.zero_grad()

            # Get model outputs
            train_outputs = net(train_inputs,train_labels,teacher_forcing_ratio=ses_tea)
            
            # Compute loss
            train_loss = computeMaskedLoss(train_outputs,train_labels[0],train_labels[1],loss_criterion)
            
            # Compute the loss gradient using the backward method and have the optimizer take a step
            train_loss.backward()
            optimizer.step()
            
            # Report status if it is time
            report_interval_counter += 1
            if report_interval_counter >= report_interval:
                report_interval_counter = 0
                print(f"{i+1} batches of epoch {epoch+1} completed.  Last Training Loss: {train_loss: .6f}")
            
            # Perform validation run if it is time
            validation_interval_counter += 1
            if validation_interval_counter >= validation_interval:
                validation_interval_counter = 0
                val_loss = 0.0
                net.eval()
                
                # Get validation batch and evaluate model
                # Need try block to handle iterator terminating. See
                # https://github.com/pytorch/pytorch/issues/1917#issuecomment-433698337
                try:
                    val_inputs, val_labels = next(val_iter)
                except StopIteration:
                    val_iter = iter(val_loader)
                    val_inputs, val_labels = next(val_iter)
                
                if (gpu_avail):
                    val_inputs = (val_inputs[0].cuda(), val_inputs[1]) 
                    val_labels = (val_labels[0].cuda(), val_labels[1])
                
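                # Evaluate validation batch outputs against labels.
                # Gradients are not needed for validation, so disable tracking to save memory.
                with torch.no_grad():
                    val_outputs = net(val_inputs, val_labels,teacher_forcing_ratio=0)
                    
                    # Compute loss; .item() stores a plain float rather than a tensor
                    val_loss = computeMaskedLoss(val_outputs,val_labels[0],val_labels[1],loss_criterion).item()
                
                # Update min val loss
                if val_loss < least_validation_loss:
                    least_validation_loss = val_loss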
                    print("Saving model . . .")
                    torch.save(net,"Min-Validation-Loss-Model-512.pt")
    
                # Report
                print(f"Last Validation Loss: {val_loss: .6f}, Lowest Validation Loss: {least_validation_loss: .6f}")
        
                # Cleanup after validation
                net.train()

In [17]:
# Perform first training run
train_model(seqToseqNet, \
            train_dataset,validation_dataset, \
            ses_lrn = 0.01, ses_tea = 0.5, ses_epochs = 40, ses_batch_size = 128)
torch.save(seqToseqNet,'afterRunOne.pt')

7 batches of epoch 1 completed.  Last Training Loss:  8.109483
14 batches of epoch 1 completed.  Last Training Loss:  6.917394
21 batches of epoch 1 completed.  Last Training Loss:  7.024382
28 batches of epoch 1 completed.  Last Training Loss:  7.071349
35 batches of epoch 1 completed.  Last Training Loss:  6.413694
Saving model . . .
Last Validation Loss:  7.718582, Lowest Validation Loss:  7.718582
3 batches of epoch 2 completed.  Last Training Loss:  6.091068
10 batches of epoch 2 completed.  Last Training Loss:  5.347987
17 batches of epoch 2 completed.  Last Training Loss:  6.402633
24 batches of epoch 2 completed.  Last Training Loss:  5.960339
31 batches of epoch 2 completed.  Last Training Loss:  5.320562
38 batches of epoch 2 completed.  Last Training Loss:  6.176763
Last Validation Loss:  8.157598, Lowest Validation Loss:  7.718582
6 batches of epoch 3 completed.  Last Training Loss:  4.570956
13 batches of epoch 3 completed.  Last Training Loss:  4.742160
20 batches of epoc
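
The spacing of the status lines in the log above follows from the integer divisions at the top of `train_model`. The worked arithmetic below reproduces it, assuming the 5000-sample training subset and the batch size of 128 used in this run; the counters are not reset between epochs, which is why epoch 2 starts reporting at batch 3.

```python
rep_int, val_int = 1000, 5000      # values of the global constants set in In [4]
dataset_size, batch_size = 5000, 128

report_interval = rep_int // batch_size          # batches between status lines
validation_interval = val_int // batch_size      # batches between validation runs
batches_per_epoch = dataset_size // batch_size   # drop_last=True discards the remainder

print(report_interval, validation_interval, batches_per_epoch)

# Batches at which a status line is printed, as (epoch, batch within epoch),
# for the first two epochs
reports = [
    (n // batches_per_epoch + 1, n % batches_per_epoch + 1)
    for n in range(2 * batches_per_epoch)
    if (n + 1) % report_interval == 0
]
print(reports)
```

With these values a status line appears every 7 batches and a validation run every 39 batches, which here coincides with the end of each epoch.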

In [18]:
# Continue training with the learning rate lowered to 0.005
train_model(seqToseqNet, \
            train_dataset,validation_dataset, \
            ses_lrn = 0.005, ses_tea = 0.5, ses_epochs = 40, ses_batch_size = 128)

7 batches of epoch 1 completed.  Last Training Loss:  1.808320
14 batches of epoch 1 completed.  Last Training Loss:  4.769550
21 batches of epoch 1 completed.  Last Training Loss:  1.495469
28 batches of epoch 1 completed.  Last Training Loss:  4.609407
35 batches of epoch 1 completed.  Last Training Loss:  1.767477
Saving model . . .
Last Validation Loss:  9.578876, Lowest Validation Loss:  9.578876
3 batches of epoch 2 completed.  Last Training Loss:  1.358416
10 batches of epoch 2 completed.  Last Training Loss:  1.490441
17 batches of epoch 2 completed.  Last Training Loss:  4.237613
24 batches of epoch 2 completed.  Last Training Loss:  1.578929
31 batches of epoch 2 completed.  Last Training Loss:  1.438875
38 batches of epoch 2 completed.  Last Training Loss:  1.537268
Last Validation Loss:  9.810240, Lowest Validation Loss:  9.578876
6 batches of epoch 3 completed.  Last Training Loss:  4.073042
13 batches of epoch 3 completed.  Last Training Loss:  1.257350
20 batches of epoc

In [19]:
torch.save(seqToseqNet,'afterRunTwo.pt')

# See if the training loss can be brought below one without any teacher forcing
train_model(seqToseqNet, \
            train_dataset,validation_dataset, \
            ses_lrn = 0.005, ses_tea = 0.0, ses_epochs = 40, ses_batch_size = 128)

7 batches of epoch 1 completed.  Last Training Loss:  0.447459
14 batches of epoch 1 completed.  Last Training Loss:  0.679614
21 batches of epoch 1 completed.  Last Training Loss:  0.504116
28 batches of epoch 1 completed.  Last Training Loss:  0.636513
35 batches of epoch 1 completed.  Last Training Loss:  0.665016
Saving model . . .
Last Validation Loss:  11.611863, Lowest Validation Loss:  11.611863
3 batches of epoch 2 completed.  Last Training Loss:  0.431535
10 batches of epoch 2 completed.  Last Training Loss:  0.524961
17 batches of epoch 2 completed.  Last Training Loss:  0.180326
24 batches of epoch 2 completed.  Last Training Loss:  0.552641
31 batches of epoch 2 completed.  Last Training Loss:  0.617662
38 batches of epoch 2 completed.  Last Training Loss:  0.667711
Last Validation Loss:  12.049320, Lowest Validation Loss:  11.611863
6 batches of epoch 3 completed.  Last Training Loss:  0.223446
13 batches of epoch 3 completed.  Last Training Loss:  0.269266
20 batches of 

In [20]:
# Return inference given model, KeyedVectors vocabulary, prompt, and maximum response length
def return_inference(model,emb_kv,prompt,maxLen):
    
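    # Bounded prompt. Lowercase it and add both boundary tokens so the prompt
    # matches the form of the questions used in training (sosToken ... eosToken)
    bounded_prompt = ' '.join([sosToken, prompt.lower(), eosToken])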
    
    # Convert prompt to vocabulary indices
    prompt_model_input = torch.tensor(tokensToIndices(prepare_text(bounded_prompt,emb_kv),emb_kv), dtype=torch.int64)
    prompt_model_input = prompt_model_input.unsqueeze(dim=0)
    prompt_question_lengths = torch.tensor([prompt_model_input.size(dim=1)], dtype=torch.int64)
    
    # Prepare unused target for forward method, except to feed in soseq token at start
    unused_target = emb_kv.get_index(sosToken)*torch.ones(maxLen,dtype=torch.int64).unsqueeze(dim=0)
    unused_lengths = torch.tensor([maxLen],dtype=torch.int64)
        
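    # Get output tensor. Inference needs no gradients, so run in eval mode without tracking them.
    model = model.to("cpu")
    model.eval()
    with torch.no_grad():
        output = model((prompt_model_input,prompt_question_lengths), \
                       (unused_target,unused_lengths), \
                       teacher_forcing_ratio=0)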
    outString = ""
    
    # Convert predicted indices to words, stopping after the eos token
    for i in range(output.size(dim=1)):
        next_word = emb_kv.index_to_key[torch.argmax(output[0][i])]
        outString += " " + next_word
        if next_word == eosToken:
            break
    
    return outString

In [22]:

prompt_one = "What was the size of the notre dame endowment when theodore hesburgh became president?"
prompt_two = "Who won the most music awards?"
prompt_three = "Where was the last war?"

for prompt in [prompt_one, prompt_two, prompt_three]:
    print(return_inference(seqToseqNet,pr_kv,prompt,40))
    

 $ million eoseq
 polish eoseq
 september eoseq
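
The project overview describes a chatbot that converses at the command line. One way to wrap `return_inference` in such a loop is sketched below. The `chat` function and its parameters are hypothetical names introduced for illustration; the reply function is passed in so the loop can be tried without a trained model.

```python
def chat(respond, read=input, write=print, quit_words=("quit", "exit")):
    """Read prompts until the user quits, printing one reply per prompt.

    respond: any function mapping a prompt string to a reply string
    read/write: injectable so the loop can be exercised without a terminal
    Returns the number of prompts answered.
    """
    turns = 0
    while True:
        try:
            prompt = read("> ").strip()
        except EOFError:            # e.g. Ctrl-D or end of piped input
            break
        if prompt.lower() in quit_words:
            break
        if not prompt:
            continue                # ignore empty lines
        write(respond(prompt))
        turns += 1
    return turns

# Usage with the notebook's objects (assumes the cells above have been run):
# chat(lambda p: return_inference(seqToseqNet, pr_kv, p, 40))
```

Typing `quit` or `exit` ends the session. Keeping the model call behind the `respond` argument also makes it easy to swap in a different saved model.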
