# LSTM Bot

## Project Overview

In this project, you will build a chatbot that can converse with you at the command line. The chatbot will use a Sequence to Sequence text generation architecture with an LSTM as it's memory unit. You will also learn to use pretrained word embeddings to improve the performance of the model. At the conclusion of the project, you will be able to show your chatbot to potential employers.

Additionally, you have the option to use pretrained word embeddings in your model. We have loaded Brown Embeddings from Gensim in the starter code below. You can compare the performance of your model with pre-trained embeddings against a model without the embeddings.



---



A sequence to sequence model (Seq2Seq) has two components:
- An Encoder consisting of an embedding layer and LSTM unit.
- A Decoder consisting of an embedding layer, LSTM unit, and linear output unit.

The Seq2Seq model works by accepting an input into the Encoder, passing the hidden state from the Encoder to the Decoder, which the Decoder uses to output a series of token predictions.

## Dependencies

- Pytorch
- Numpy
- Pandas
- NLTK
- Gzip
- Gensim


Please choose a dataset from the Torchtext website. We recommend looking at the Squad dataset first. Here is a link to the website where you can view your options:

- https://pytorch.org/text/stable/datasets.html





# Use the Cornell Movie Dialog dataset

The Utterances dataset has been uploaded to the /data folder.

We do not need the metadata, so extract only utterance text and ID, conversation ID, and the "reply-to" field.

In [1]:
import json
import re
from nltk.tokenize import RegexpTokenizer
import pickle
import pandas as pd

# Expand contractions, tokenize utterance text and trim to MAX_LENGTH

In [2]:
tokenizer = RegexpTokenizer(r'\w+\'s|\w+')

In [3]:
max_length = 24

In [4]:
def clean_token_list(text):
    
    clean_text = text.lower()
    
    clean_text = re.sub('can\'t', 'can not', clean_text)
    clean_text = re.sub('won\'t', 'will not', clean_text)
    clean_text = re.sub('n\'t', ' not', clean_text)
    clean_text = re.sub('\'ll', ' will', clean_text)
    clean_text = re.sub('\'m', ' am', clean_text)
    clean_text = re.sub('he\'s', 'he is', clean_text)
    clean_text = re.sub('she\'s', 'she is', clean_text)
    clean_text = re.sub('it\'s', 'it is', clean_text)
    clean_text = re.sub('how\'s', 'how is', clean_text)
    clean_text = re.sub('that\'s', 'that is', clean_text)
    clean_text = re.sub('what\'s', 'what is', clean_text)
    clean_text = re.sub('here\'s', 'here is', clean_text)
    clean_text = re.sub('there\'s', 'there is', clean_text)
    clean_text = re.sub('let\'s', 'let us', clean_text)
    clean_text = re.sub('\'re', ' are', clean_text)
    clean_text = re.sub('\'ve', ' have', clean_text)
    clean_text = re.sub('\'d', ' would', clean_text)
    
    return tokenizer.tokenize(clean_text)[:max_length]

# Get utterances from file and apply preprocessing

In [5]:
raw_data = []

input_path = './data/utterances.jsonl'

with open(input_path, 'r', encoding='utf-8') as f:
    
    for line in f:
        
        line_data = json.loads(line.rstrip('\n|\r'))
        
        line_data_dict = {}
        
        line_data_dict['id'] = line_data['id']
        line_data_dict['conversation_id'] = line_data['conversation_id']
        line_data_dict['token_list'] = clean_token_list(line_data['text'])
        line_data_dict['reply_to'] = line_data['reply-to']
        
        raw_data.append(line_data_dict)

# Create a Pandas dataframe for ease of processing

In [6]:
lines_df = pd.DataFrame(raw_data)

In [7]:
lines_df.head(10)

Unnamed: 0,id,conversation_id,token_list,reply_to
0,L1045,L1044,"[they, do, not]",L1044
1,L1044,L1044,"[they, do, to]",
2,L985,L984,"[i, hope, so]",L984
3,L984,L984,"[she, okay]",
4,L925,L924,"[let, us, go]",L924
5,L924,L924,[wow],
6,L872,L870,"[okay, you, are, gonna, need, to, learn, how, ...",L871
7,L871,L870,[no],L870
8,L870,L870,"[i, am, kidding, you, know, how, sometimes, yo...",
9,L869,L866,"[like, my, fear, of, wearing, pastels]",L868


# Create a list of exchanges and convert to a dataframe

Each entry will contain a "call" and a "response". For a given entry, we identify its "call" by using the "reply_to" field. This way, we get an appropriate response for each utterance (except the first one in each dialogue, for which the "response_to" field is NONE.

I will limit the size of the list to 1,000 conversations for demonstration purposes, as the full list takes too long to process.

In [8]:
conversation_id_list = list(lines_df.conversation_id.unique())[:1000]

In [9]:
exchange_list = []
index = 0

for conversation_id in conversation_id_list:
    
    index += 1

    temp_df = lines_df.loc[lines_df['conversation_id'] == conversation_id]
    
    utterance_list = temp_df.to_dict('records')
    
    utterance_dict = {}

    for utterance in utterance_list:

        temp_dict = {}
        temp_dict['token_list'] = utterance['token_list']
        temp_dict['reply_to'] = utterance['reply_to']
        utterance_dict[utterance['id']] = temp_dict
        
    for utterance in utterance_dict.keys():
    
        call_id = utterance_dict[utterance]['reply_to']

        if call_id != None:
            
            try:
                
                exchange_list.append({'CONVERSATION':conversation_id, 'EXCHANGE': call_id + '->' + utterance, 'CALL':utterance_dict[call_id]['token_list'], 'RESPONSE':utterance_dict[utterance]['token_list']})
            
            except KeyError:
                
                pass
            
    if index % 500 == 0:
        
        print('Procesed', index, 'conversations')

Procesed 500 conversations
Procesed 1000 conversations


In [10]:
exchange_df = pd.DataFrame.from_dict(exchange_list)

In [11]:
exchange_df.head()

Unnamed: 0,CONVERSATION,EXCHANGE,CALL,RESPONSE
0,L1044,L1044->L1045,"[they, do, to]","[they, do, not]"
1,L984,L984->L985,"[she, okay]","[i, hope, so]"
2,L924,L924->L925,[wow],"[let, us, go]"
3,L870,L871->L872,[no],"[okay, you, are, gonna, need, to, learn, how, ..."
4,L870,L870->L871,"[i, am, kidding, you, know, how, sometimes, yo...",[no]


# Pickle the dataframe, so we do not have to rerun every time

In [12]:
with open("exchange_df.bin", "wb") as f:
    
    pickle.dump(exchange_df, f)

# Now construct the vocabulary

Load exchanges dataframe

In [5]:
with open("exchange_df.bin", "rb") as f:
    
    exchange_df = pickle.load(f)

In [6]:
def add_to_vocab(word_list, vocab):
    
    vocab.extend(word_list)

In [7]:
raw_word_list = []

exchange_df['CALL'].apply(add_to_vocab, vocab=raw_word_list)
exchange_df['RESPONSE'].apply(add_to_vocab, vocab=raw_word_list)

print('RAW WORD LIST LENGTH =', len(raw_word_list))

RAW WORD LIST LENGTH = 47624


In [8]:
from collections import Counter

In [9]:
word_counter = Counter(raw_word_list)

unique_word_list = sorted(word_counter, key=word_counter.get, reverse=True)

# Add tokens for start_of_sentence, end_of_sentence, padding
#
unique_word_list.insert(0, '<sos>')
unique_word_list.insert(1, '<eos>')
unique_word_list.insert(2, '<pad>')

#  This is our vocabulary size
#
vocab_size = len(unique_word_list)

print('VOCAB SIZE =', vocab_size)

VOCAB SIZE = 3778


# Create mappings from words to IDs and back

In [10]:
# Create mappings from words to IDs and from IDs to words
#
word_to_id = {word:id for id, word in enumerate(unique_word_list)}

id_to_word = {value:key for key,value in word_to_id.items()}

# Convert calls and responses to IDs and pad

In [11]:
#  Function to convert a text to IDs and to pad to LENGTH with zeros
#  If TEXT is longer than LENGTH, TEXT will be truncated
#
def text_to_ids(word_list):
    
    #  Initialize to all <pad> tokens
    #
    padded_seq = [2] * max_length
    
    padded_seq[0] = 0  #  index for <sos>
    
    for index, word in enumerate(word_list):
        
        try:
        
            padded_seq[index+1] = word_to_id[word]
        
        except KeyError:
            
            print('Key Error:', word)
            
        except IndexError:
            
            break
            
    eos_index = min(max_length-1, len(word_list)+1)
    
    padded_seq[eos_index] = 1  #  index for <eos>
    
    return padded_seq

In [20]:
# Convert contexts and questions to IDs
#
exchange_df['call_to_ids'] = exchange_df.CALL.apply(text_to_ids)
exchange_df['response_to_ids'] = exchange_df.RESPONSE.apply(text_to_ids)

In [21]:
exchange_df.head()

Unnamed: 0,CONVERSATION,EXCHANGE,CALL,RESPONSE,call_to_ids,response_to_ids
0,L1044,L1044->L1045,"[they, do, to]","[they, do, not]","[0, 42, 11, 6, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2...","[0, 42, 11, 7, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2..."
1,L984,L984->L985,"[she, okay]","[i, hope, so]","[0, 51, 78, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2...","[0, 4, 271, 48, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, ..."
2,L924,L924->L925,[wow],"[let, us, go]","[0, 2533, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","[0, 70, 84, 56, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, ..."
3,L870,L871->L872,[no],"[okay, you, are, gonna, need, to, learn, how, ...","[0, 36, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,...","[0, 78, 3, 12, 88, 102, 6, 507, 44, 6, 535, 1,..."
4,L870,L870->L871,"[i, am, kidding, you, know, how, sometimes, yo...",[no],"[0, 4, 19, 574, 3, 25, 44, 455, 3, 43, 1189, 2...","[0, 36, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,..."


# Define Encoder, Decoder, and Seq2Seq model

I will use a single-layer LSTM for both the Encoder and the Decoder.

In [12]:
import torch
import torch.nn as nn

In [13]:
import random

In [14]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Encoder

In [15]:
class Encoder(nn.Module):
    """
    Input :
        - source batch
    Layer : 
        source batch -> Embedding -> LSTM
    Output :
        - LSTM hidden state
        - LSTM cell state

    Parmeters
    ---------
    vocab_size : int
        Input dimension, should equal to the source vocab size.
    
    embedding_size : int
        Embedding layer's dimension.
        
    hidden_size : int
        LSTM Hidden/Cell state's dimension.
        
    """

    def __init__(self, vocab_size: int, embedding_size: int, hidden_size: int):

        super().__init__()

        self.embedding_size = embedding_size
        self.hidden_size = hidden_size
        self.vocab_size = vocab_size

        self.embedding = nn.Embedding(vocab_size, embedding_size)
        self.lstm = nn.LSTM(embedding_size, hidden_size, 1)

    def forward(self, source_batch: torch.LongTensor):
        """

        Parameters
        ----------
        source_batch : 2d torch.LongTensor
            Batched tokenized source sentence of shape [sent len, batch size].

        Returns
        -------
        hidden, cell : 3d torch.LongTensor
            Hidden and cell state of the LSTM layer. Each state's shape
            [n layers * n directions, batch size, hidden dim]
        """
        embedding = self.embedding(source_batch) # [sent len, batch size, emb dim]
        outputs, (hidden, cell) = self.lstm(embedding)
        # outputs -> [sent len, batch size, hidden dim * n directions]
        return hidden, cell

# Decoder

In [16]:
class Decoder(nn.Module):
    """
    Input :
        - first token in the target batch
        - LSTM hidden state from the encoder
        - LSTM cell state from the encoder
    Layer :
        target batch -> Embedding -- 
                                   |
        encoder hidden state ------|--> LSTM -> Linear
                                   |
        encoder cell state   -------
        
    Output :
        - prediction
        - LSTM hidden state
        - LSTM cell state

    Parmeters
    ---------
    output : int
        Output dimension, should equal to the target vocab size.
    
    embedding_size : int
        Embedding layer's dimension.
        
    hidden_size : int
        LSTM Hidden/Cell state's dimension.
        
    """

    def __init__(self, vocab_size: int, embedding_size: int, hidden_size: int):

        super().__init__()

        self.embedding_size = embedding_size
        self.hidden_size = hidden_size
        self.vocab_size = vocab_size

        self.embedding = nn.Embedding(vocab_size, embedding_size)
        self.lstm = nn.LSTM(embedding_size, hidden_size, 1)

        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, source: torch.LongTensor, hidden: torch.FloatTensor, cell: torch.FloatTensor):
        """

        Parameters
        ----------
        source : 1d torch.LongTensor
            Batched tokenized source sentence of shape [batch size].
            
        hidden, cell : 3d torch.FloatTensor
            Hidden and cell state of the LSTM layer. Each state's shape
            [n layers * n directions, batch size, hidden dim]

        Returns
        -------
        prediction : 2d torch.LongTensor
            For each token in the batch, the predicted target vobulary.
            Shape [batch size, output dim]

        hidden, cell : 3d torch.FloatTensor
            Hidden and cell state of the LSTM layer. Each state's shape
            [n layers * n directions, batch size, hidden dim]
        """
        # [1, batch size, emb dim], the 1 serves as sent len
        
        embedding = self.embedding(source.unsqueeze(0))
        outputs, (hidden, cell) = self.lstm(embedding, (hidden, cell))
        prediction = self.out(outputs.squeeze(0))
        
        return prediction, hidden, cell

# Seq2Seq

In [18]:
class Seq2Seq(nn.Module):

    def __init__(self, encoder: Encoder, decoder: Decoder):

        super().__init__()

        self.encoder = encoder
        self.decoder = decoder

    def forward(self, source_batch: torch.LongTensor, target_batch: torch.LongTensor,
                tf_ratio: float=0.5):

        max_len, batch_size = target_batch.shape
        target_vocab_size = self.decoder.vocab_size

        # tensor to store decoder's output
        outputs = torch.zeros(max_len, batch_size, target_vocab_size).to(device)

        # last hidden & cell state of the encoder is used as the decoder's initial hidden state
        hidden, cell = self.encoder(source_batch)

        target = target_batch[0]
        
        for i in range(1, max_len):
            
            prediction, hidden, cell = self.decoder(target, hidden, cell)
            outputs[i] = prediction

            if random.random() < tf_ratio:
                target = target_batch[i]
            else:
                target = prediction.argmax(1)

        return outputs

# Define hyperparameters and set up the training loop

In [17]:
import torch.optim as optim

In [19]:
tf_ratio = 0.5

learning_rate = 0.01

embedding_size = 300

hidden_size = 1024

encoder = Encoder(vocab_size, embedding_size, hidden_size).to(device)

decoder = Decoder(vocab_size, embedding_size, hidden_size).to(device)

model = Seq2Seq(encoder, decoder).to(device)

#  Ignore the padding token, which has index 2 in our vocab
#
criterion = nn.CrossEntropyLoss(ignore_index=2)   #.to(device)

optimizer = optim.Adam(model.parameters(), lr = learning_rate)

In [30]:
model

Seq2Seq(
  (encoder): Encoder(
    (embedding): Embedding(3778, 300)
    (lstm): LSTM(300, 1024)
  )
  (decoder): Decoder(
    (embedding): Embedding(3778, 300)
    (lstm): LSTM(300, 1024)
    (out): Linear(in_features=1024, out_features=3778, bias=True)
  )
)

In [31]:
def train(model, dataloader, optimizer, criterion):
    
    model.train()

    epoch_loss = 0
    
    for _, data in enumerate(dataloader):
        
        optimizer.zero_grad()
        
        calls, responses = data
        
        calls, responses = calls.type(torch.LongTensor), responses.type(torch.LongTensor)
        
        outputs = model(calls, responses)
        
        # 1. as mentioned in the seq2seq section, we will
        # cut off the first element when performing the evaluation
        # 2. the loss function only works on 2d inputs
        # with 1d targets we need to flatten each of them
        
        outputs_flatten = outputs[1:].view(-1, outputs.shape[-1])
        responses_flatten = responses[1:].view(-1)
        
        loss = criterion(outputs_flatten, responses_flatten)

        loss.backward()
        optimizer.step()

        epoch_loss += loss.item()

    return epoch_loss / len(dataloader)

In [32]:
def evaluate(model, dataloader, criterion):
    
    model.eval()

    epoch_loss = 0
    
    with torch.no_grad():
        
        for _, data in enumerate(dataloader):
            
            calls, responses = data
            
            calls, responses = calls.type(torch.LongTensor), responses.type(torch.LongTensor)
            
            # turn off teacher forcing
            #
            outputs = model(calls, responses, tf_ratio=0) 
            
            outputs_flatten = outputs[1:].view(-1, outputs.shape[-1])
            responses_flatten = responses[1:].view(-1)
            
            loss = criterion(outputs_flatten, responses_flatten)
            
            epoch_loss += loss.item()

    return epoch_loss / len(dataloader)

# Create training, validation, and testing datasets

In [33]:
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

In [34]:
#  Split the dataframe
#

#  Use 20% of the entire dataframe for testing
#
test_df = exchange_df.sample(frac=0.2)

#  Use 80% for training
#
train_df = exchange_df.drop(test_df.index)

#  Use 20% of the training df for validation
#
validation_df = train_df.sample(frac=0.2)

train_df = train_df.drop(validation_df.index)

In [35]:
#  Dataset class
#
class ExchangeDataset(Dataset):
    
    def __init__(self, df):
        
        x = df.iloc[:,4].values.tolist()
        y = df.iloc[:,5].values.tolist()
        
        self.x = torch.tensor(x, dtype=torch.float32)
        self.y = torch.tensor(y, dtype=torch.float32)
        
    def __len__(self):
        
        return len(self.y)
    
    def __getitem__(self, index):
        
        return self.x[index], self.y[index]

In [36]:
train_dataset = ExchangeDataset(train_df)

validation_dataset = ExchangeDataset(validation_df)

test_dataset = ExchangeDataset(test_df)

# Create dataloaders

In [37]:
batch_size = 10

In [38]:
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=False)

validation_dataloader = DataLoader(validation_dataset, batch_size=batch_size, shuffle=False)

test_dataloader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# Run the training loop

In [39]:
import time
import math

In [40]:
def epoch_time(start_time, end_time):
    
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

In [41]:
num_epochs = 5
best_valid_loss = float('inf')

for epoch in range(num_epochs):
    
    start_time = time.time()
    train_loss = train(model, train_dataloader, optimizer, criterion)
    valid_loss = evaluate(model, validation_dataloader, criterion)
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    if valid_loss < best_valid_loss:

        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'best-model.pt')

    print(f'Epoch: {epoch+1:02} | Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train PPL: {math.exp(train_loss):7.3f}')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. PPL: {math.exp(valid_loss):7.3f}')

Epoch: 01 | Time: 6m 27s
	Train Loss: 5.725 | Train PPL: 306.408
	 Val. Loss: 5.505 |  Val. PPL: 245.980
Epoch: 02 | Time: 6m 7s
	Train Loss: 5.372 | Train PPL: 215.237
	 Val. Loss: 5.671 |  Val. PPL: 290.393
Epoch: 03 | Time: 7m 7s
	Train Loss: 5.293 | Train PPL: 198.993
	 Val. Loss: 5.850 |  Val. PPL: 347.141
Epoch: 04 | Time: 8m 0s
	Train Loss: 5.198 | Train PPL: 180.820
	 Val. Loss: 5.908 |  Val. PPL: 367.940
Epoch: 05 | Time: 9m 2s
	Train Loss: 5.103 | Train PPL: 164.594
	 Val. Loss: 6.037 |  Val. PPL: 418.588


# Set up the loop to interact with the model

Adapting code from PyTorch chatbot tutorial: https://pytorch.org/tutorials/beginner/chatbot_tutorial.html

In [93]:
class GreedySearchDecoder(nn.Module):
    
    def __init__(self, encoder, decoder):
        
        super(GreedySearchDecoder, self).__init__()
        
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, input_batch):
        
        #  Forward input through encoder model
        #
        encoder_outputs, encoder_hidden = self.encoder(input_batch)
        encoder_outputs = encoder_outputs.squeeze(0)
        print('encoder_outputs shape is', encoder_outputs.size())
        
        #  Prepare the encoder's final hidden layer to be first hidden input to the decoder
        #
        decoder_hidden = encoder_hidden.squeeze(0) # [:decoder.n_layers]
        print('decoder_hidden shape is', decoder_hidden.size())
        
        #  Initialize decoder input with 0s, index of the SOS token
        #
        decoder_input = torch.ones(1, 1, device=device, dtype=torch.long) * 0
        print('decoder_input shape is', decoder_input.size())
        
        #  Initialize tensors to append decoded words to
        #
        all_tokens = torch.zeros([0], device=device, dtype=torch.long)
        all_scores = torch.zeros([0], device=device)
        
        #  Iteratively decode one word token at a time
        #
        for _ in range(max_length):
            
            #  Forward pass through decoder
            #
            decoder_output, decoder_hidden = self.decoder(decoder_input, decoder_hidden, encoder_outputs)
            
            #  Obtain most likely word token and its softmax score
            #
            decoder_scores, decoder_input = torch.max(decoder_output, dim=1)
            
            #  Record token and score
            #
            all_tokens = torch.cat((all_tokens, decoder_input), dim=0)
            all_scores = torch.cat((all_scores, decoder_scores), dim=0)
            
            #  Prepare current token to be next decoder input (add a dimension)
            #
            decoder_input = torch.unsqueeze(decoder_input, 0)
            
        #  Return collections of word tokens and scores
        #
        return all_tokens, all_scores

In [98]:
def evaluate(encoder, decoder, searcher, sentence, max_length=max_length):
    
    decoded_words = []
    
    try:
        
        ### Format input sentence as a batch
        #  words -> indexes
        #
        indexes_batch = [text_to_ids(sentence)]

        #  Transpose dimensions of batch to match models' expectations
        #
        input_batch = torch.LongTensor(indexes_batch).transpose(0, 1)

        #  Use appropriate device
        #
        input_batch = input_batch.to(device)

        #  Decode sentence with searcher
        #
        tokens, scores = searcher(input_batch)

        #  indexes -> words
        #
        decoded_words = [id_to_word[token.item()] for token in tokens]
    
    except KeyError:
            
            print("Error: Encountered unknown word.")
            
    return decoded_words

In [99]:
def evaluateInput(encoder, decoder, searcher):
    
    input_sentence = ''
    
    while(1):
        
        #  Get input sentence
        #
        input_sentence = input('> ')

        #  Check if it is quit case
        #
        if input_sentence == 'goodbye':

            print('OK, catch you later!')
            break

        #  Preprocess sentence and convert to a list of tokens
        #
        user_tokens = clean_token_list(input_sentence)

        #  Evaluate sentence
        #
        output_words = evaluate(encoder, decoder, searcher, user_tokens)

        #  Format and print response sentence
        #
        output_words[:] = [x for x in output_words if not (x == '<eos>' or x == '<pad>')]

        print('Bot:', ' '.join(output_words))

Load the saved model from file

In [96]:
model.load_state_dict(torch.load('best-model.pt'))

<All keys matched successfully>

# Chat loop; type 'goodbye' to end the conversation

In [100]:
#  Set encoder and decoder to eval mode
#
encoder.eval()
decoder.eval()

#  Initialize search module
#
searcher = GreedySearchDecoder(encoder, decoder)

#  Begin chatting
#
evaluateInput(encoder, decoder, searcher)

> hello
encoder_outputs shape is torch.Size([1, 1024])
decoder_hidden shape is torch.Size([1, 1024])
decoder_input shape is torch.Size([1, 1])


RuntimeError: input must have 3 dimensions, got 5