<a href="https://colab.research.google.com/github/vat99/makemore/blob/main/HW_1_Language_Modeling_Problem.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Project 1: Language Modeling

In this project, you will implement several different types of language models for text.  We'll start with n-gram models, then move on to neural n-gram and LSTM language models.

**Warning: Do not start this project the day before it is due!**
Some parts require 20 minutes or more to run, so debugging and tuning can take a significant amount of time.

Our dataset for this project will be the WikiText2 language modeling dataset.  We provide some of the basic preprocessing, such as tokenization and rare word filtering (using the `<unk>` token).
Therefore, we can assume that all word types in the val/test set appear at least once in the training set.

In [1]:
!pip install datasets

Collecting datasets
  Downloading datasets-3.0.1-py3-none-any.whl.metadata (20 kB)
Collecting pyarrow>=15.0.0 (from datasets)
  Downloading pyarrow-17.0.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Downloading datasets-3.0.1-py3-none-any.whl (471 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m471.6/471.6 kB[0m [31m9.6 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading dill-0.3.8-py3-none-any.whl (116 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m116.3/116.3 kB[0m [31m9.7 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading pyarrow-17.0.0-cp310-cp310-manylinux_2_28_x86_64.whl (39.9 MB)
[2K  

In [2]:
# This block handles some imports and defines some constants.
# You shouldn't need to edit this, but if you want to
# import other standard python packages, that is fine.

# imports
from collections import Counter, defaultdict
import copy
import numpy as np
import math
import tqdm
import random
import pdb
from typing import List, Optional, Tuple, Union

from datasets import load_dataset
import torch
from torch import nn
import torch.nn.functional as F

# Some constants
UNK_TOK = "<unk>"
PAD_TOK = "<pad>"
EOS_TOK = "<eos>"

In [3]:
# This block defines the Vocabulary class we need later.
# You shouldn't need to edit this.

class Vocab:
    def __init__(self, train_text: List[str], min_freq=0):
        """
        We collect counts from train_text.
        train_text: a list of tokens.
        min_freq: if a token appears strictly less than this, it will not be
            added to vocab.
        """
        special_tokens = [UNK_TOK, PAD_TOK, EOS_TOK]

        counter = Counter(train_text)
        # Note that the order is fixed as long as the training text is the same.
        # it's sorted by frequency.
        all_tokens = [
            t for t, c in counter.most_common()
            if c >= min_freq and t not in special_tokens
        ]

        self.all_tokens = special_tokens + all_tokens
        self.str_to_id = {s: i for i, s in enumerate(self.all_tokens)}

        self.unk_tok = UNK_TOK
        self.pad_tok = PAD_TOK
        self.eos_tok = EOS_TOK

    def size(self) -> int:
        return len(self.all_tokens)


    def ids_to_strs(self, indices: List[int]) -> List[str]:
        return [self.all_tokens[ii] for ii in indices]


    def strs_to_ids(self, strings: List[str]) -> List[int]:
        return [self.str_to_id[s] for s in strings]


    def __contains__(self, token: str) -> bool:
        return token in self.str_to_id

In [4]:
# This block downloads and processes the data.
# You shouldn't need to edit this.

wikitext2_dataset = load_dataset("Salesforce/wikitext", "wikitext-2-raw-v1")
print(f"Raw train examples: {wikitext2_dataset['train']['text'][:10]}")

# just use the simplest one for now
tokenizer = lambda x: x.split()

# tokenize datatsets
def preprocess(_dataset: List[str]) -> List[str]:
    """
    Each sentence in _dataset is tokenized into a list of strings.
    _dataset: List[str]. Each string is a sentence.
    """
    ret = []
    for sent in _dataset:
        sent = sent.rstrip('\n')
        # skip empty sentences
        if not sent:
            continue
        # add EOS to the end of sentence
        ret += tokenizer(sent) + [EOS_TOK]
    return ret

tok_train_dataset = preprocess(wikitext2_dataset['train']['text'])
tok_validation_dataset = preprocess(wikitext2_dataset['validation']['text'])
tok_test_dataset = preprocess(wikitext2_dataset['test']['text'])
print(f"Dataset size (#tokens) - Train: {len(tok_train_dataset)}; Validation: {len(tok_validation_dataset)}; Test: {len(tok_test_dataset)}.")

# build vocabulary: use `min_freq` to model UNK in training
### You'll need this vocab throughout this HW.
vocab = Vocab(tok_train_dataset, min_freq=2)
print(f"Vocab size: {vocab.size()}. Examples: {vocab.ids_to_strs(list(range(20)))}")

# handle UNKs properly
def replace_unseen_with_unk(_dataset: List[str]) -> List[str]:
    """
    We replace the unseen tokens in _dataset with vocab.unk_tok.
    """
    new_data = []
    for tok in _dataset:
        if tok in vocab:
            new_data.append(tok)
        else:
            new_data.append(vocab.unk_tok)
    return new_data

### You'll need these three datasets throughout this HW.
tok_train_dataset = replace_unseen_with_unk(tok_train_dataset)
tok_validation_dataset = replace_unseen_with_unk(tok_validation_dataset)
tok_test_dataset = replace_unseen_with_unk(tok_test_dataset)
print(f"Final train examples: {tok_train_dataset[:40]}")
print(f"Final val examples: {tok_validation_dataset[:40]}")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md:   0%|          | 0.00/10.5k [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/733k [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/6.36M [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/657k [00:00<?, ?B/s]

Generating test split:   0%|          | 0/4358 [00:00<?, ? examples/s]

Generating train split:   0%|          | 0/36718 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/3760 [00:00<?, ? examples/s]

Raw train examples: ['', ' = Valkyria Chronicles III = \n', '', ' Senjō no Valkyria 3 : Unrecorded Chronicles ( Japanese : 戦場のヴァルキュリア3 , lit . Valkyria of the Battlefield 3 ) , commonly referred to as Valkyria Chronicles III outside Japan , is a tactical role @-@ playing video game developed by Sega and Media.Vision for the PlayStation Portable . Released in January 2011 in Japan , it is the third game in the Valkyria series . Employing the same fusion of tactical and real @-@ time gameplay as its predecessors , the story runs parallel to the first game and follows the " Nameless " , a penal military unit serving the nation of Gallia during the Second Europan War who perform secret black operations and are pitted against the Imperial unit " Calamaty Raven " . \n', " The game began development in 2010 , carrying over a large portion of the work done on Valkyria Chronicles II . While it retained the standard features of the series , it also underwent multiple adjustments , such as making

We've implemented a unigram model here as a demonstration.

In [None]:
class UnigramModel:
    def __init__(self, train_text: List[str]):
        self.counts = Counter(train_text)
        self.total_count = len(train_text)

    def probability(self, word: str) -> float:
        return self.counts[word] / self.total_count

    def next_word_probabilities(self, text_prefix: List[str]) -> List[str]:
        """
        Return a list of probabilities for each word in the vocabulary.
        In unigram model, `text_prefix` doesn't matter as we are not using any
            context at all.
        """
        return [self.probability(word) for word in vocab.all_tokens]

    def perplexity(self, full_text: List[str]) -> float:
        """Return the perplexity of the model on a text as a float.

        full_text -- a list of string tokens
        """
        log_probabilities = []
        for word in full_text:
            # Note that the base of the log doesn't matter
            # as long as the log and exp use the same base.
            log_probabilities.append(math.log(self.probability(word), 2))
        return 2 ** -np.mean(log_probabilities)

unigram_demonstration_model = UnigramModel(tok_train_dataset)
print('unigram validation perplexity:',
      unigram_demonstration_model.perplexity(tok_test_dataset))

unigram validation perplexity: 1057.2131456213988


In [None]:
def check_validity(model):
    """
    Performs several sanity checks on your model:
      1) That `next_word_probabilities` returns a valid distribution
      2) That perplexity matches a perplexity calculated from `next_word_probabilities`

    Although it is possible to calculate perplexity from `next_word_probabilities`,
      it is still good to have a separate more efficient method that only computes
      the probabilities of observed words.
    """

    log_probabilities = []
    for i in range(10):
        prefix = tok_validation_dataset[:i]
        probs = model.next_word_probabilities(prefix)
        assert min(probs) >= 0, "Negative value in next_word_probabilities"
        assert max(probs) <= 1 + 1e-8, "Value larger than 1 in next_word_probabilities"
        assert abs(sum(probs)-1) < 1e-4, "next_word_probabilities do not sum to 1"

        word_id = vocab.str_to_id[tok_validation_dataset[i]]
        selected_prob = probs[word_id]
        log_probabilities.append(math.log(selected_prob))

    perplexity = math.exp(-np.mean(log_probabilities))
    your_perplexity = model.perplexity(tok_validation_dataset[:10])
    assert abs(perplexity-your_perplexity) < 0.1, "your perplexity does not " + \
    "match the one we calculated from `next_word_probabilities`,\n" + \
    "at least one of `perplexity` or `next_word_probabilities` is incorrect.\n" + \
    f"we calcuated {perplexity} from `next_word_probabilities`,\n" + \
    f"but your perplexity function returned {your_perplexity} (on a small sample)."

In [None]:
check_validity(unigram_demonstration_model)

To generate from a language model, we can sample one word at a time conditioning on the words we have generated so far.

In [None]:
def generate_text(model, n=20, prefix=('<eos>', '<eos>')):
    prefix = list(prefix)
    for _ in range(n):
        probs = model.next_word_probabilities(prefix)
        word = random.choices(vocab.all_tokens, probs)[0]
        prefix.append(word)
    return ' '.join(prefix)

# unigram model does not utilize prefix
print(generate_text(unigram_demonstration_model, prefix=""))

72 000 visible pieces Nixon against a battalions April a by , by Antonio in sheet <unk> nine send coniferous


TODO: Copy the printed output to your report.

In fact there are many strategies to get better-sounding samples, such as only sampling from the top-k words or sharpening the distribution with a temperature.  You can read more about sampling from a language model in this recent paper: https://arxiv.org/pdf/1904.09751.pdf.

You will need to submit some outputs from the models you implement for us to grade.  The following function will be used to generate the required output files.

In [5]:
!wget https://cal-cs288.github.io/sp21/project_files/proj_1/eval_prefixes.txt
!wget https://cal-cs288.github.io/sp21/project_files/proj_1/eval_output_vocab.txt
!wget https://cal-cs288.github.io/sp21/project_files/proj_1/eval_prefixes_short.txt
!wget https://cal-cs288.github.io/sp21/project_files/proj_1/eval_output_vocab_short.txt

def save_truncated_distribution(model, filename, short=True):
    """Generate a file of truncated distributions.

    Probability distributions over the full vocabulary are large,
    so we will truncate the distribution to a smaller vocabulary.

    Please do not edit this function
    """
    vocab_name = 'eval_output_vocab'
    prefixes_name = 'eval_prefixes'

    if short:
      vocab_name += '_short'
      prefixes_name += '_short'

    with open(f'{vocab_name}.txt', 'r') as eval_vocab_file:
        eval_vocab = [w.strip() for w in eval_vocab_file]
    eval_vocab_ids = sorted(list(set([vocab.str_to_id[s] if s in vocab else vocab.str_to_id[vocab.unk_tok]
                      for s in eval_vocab])))

    all_selected_probabilities = []
    with open(f'{prefixes_name}.txt', 'r') as eval_prefixes_file:
        lines = eval_prefixes_file.readlines()
        for line in tqdm.notebook.tqdm(lines, leave=False):
            prefix = line.strip().split(' ')
            probs = model.next_word_probabilities(prefix)
            selected_probs = np.array([probs[i] for i in eval_vocab_ids], dtype=np.float32)
            all_selected_probabilities.append(selected_probs)

    all_selected_probabilities = np.stack(all_selected_probabilities)
    np.save(filename, all_selected_probabilities)
    print('saved', filename)

--2024-09-29 23:04:02--  https://cal-cs288.github.io/sp21/project_files/proj_1/eval_prefixes.txt
Resolving cal-cs288.github.io (cal-cs288.github.io)... 185.199.108.153, 185.199.109.153, 185.199.111.153, ...
Connecting to cal-cs288.github.io (cal-cs288.github.io)|185.199.108.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 519055 (507K) [text/plain]
Saving to: ‘eval_prefixes.txt’


2024-09-29 23:04:02 (10.6 MB/s) - ‘eval_prefixes.txt’ saved [519055/519055]

--2024-09-29 23:04:02--  https://cal-cs288.github.io/sp21/project_files/proj_1/eval_output_vocab.txt
Resolving cal-cs288.github.io (cal-cs288.github.io)... 185.199.108.153, 185.199.109.153, 185.199.111.153, ...
Connecting to cal-cs288.github.io (cal-cs288.github.io)|185.199.108.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12497 (12K) [text/plain]
Saving to: ‘eval_output_vocab.txt’


2024-09-29 23:04:02 (19.1 MB/s) - ‘eval_output_vocab.txt’ saved [12497/12497]

--2024-09-29

In [None]:
save_truncated_distribution(unigram_demonstration_model,
                            'unigram_demonstration_predictions.npy')

  0%|          | 0/1000 [00:00<?, ?it/s]

saved unigram_demonstration_predictions.npy


### N-gram Model

Now it's time to implement an n-gram language model.

Because not every n-gram will have been observed in training, use add-alpha smoothing to make sure no output word has probability 0.

This is an example of bigram model with smoothing:
$$P(w_2|w_1)=\frac{C(w_1,w_2)+\alpha}{C(w_1)+N\alpha}$$

where $N$ is the vocab size and $C$ is the count for the given unigram/bigram.  An alpha value around `3e-3`  should work.  Later, we'll replace this smoothing with model backoff.

One **edge case** you will need to handle is at the beginning of the text where you don't have `n-1` prior words.  You may handle this by using a uniform distribution over the vocabulary.

A properly implemented bi-gram model should get a perplexity about/below **635** on the validation set.

**Note**: Do not change the signature of the `next_word_probabilities` and `perplexity` functions.  We will use these as a common interface for all of the different model types.  Make sure these two functions call `n_gram_probability`, because later we are going to override `n_gram_probability` in a subclass.
Also, we suggest pre-computing and caching the counts $C$ when you initialize `NGramModel` for efficiency.

In [None]:
sequences, sequences_tokenized = [], []
sequence, sequence_tokens = "", []
for i, s in enumerate(tok_train_dataset[:226]):
  token = vocab.str_to_id[s]
  print(i, s, token)
  sequence += s
  sequence_tokens.append(token)
  if s == "<eos>":
    sequences.append(sequence)
    sequences_tokenized.append(sequence_tokens)
    sequence = ""
    sequence_tokens = []
print(sequences)
print(sequences_tokenized)

print(tok_train_dataset[-1])

0 = 11
1 Valkyria 3909
2 Chronicles 4420
3 III 851
4 = 11
5 <eos> 2
6 Senjō 20629
7 no 128
8 Valkyria 3909
9 3 91
10 : 44
11 <unk> 0
12 Chronicles 4420
13 ( 24
14 Japanese 753
15 : 44
16 戦場のヴァルキュリア3 27247
17 , 4
18 lit 7037
19 . 5
20 Valkyria 3909
21 of 6
22 the 3
23 Battlefield 23381
24 3 91
25 ) 23
26 , 4
27 commonly 1905
28 referred 981
29 to 9
30 as 17
31 Valkyria 3909
32 Chronicles 4420
33 III 851
34 outside 690
35 Japan 955
36 , 4
37 is 25
38 a 10
39 tactical 7255
40 role 294
41 @-@ 15
42 playing 592
43 video 280
44 game 78
45 developed 428
46 by 22
47 Sega 14448
48 and 7
49 Media.Vision 27248
50 for 20
51 the 3
52 PlayStation 1739
53 Portable 8242
54 . 5
55 Released 10861
56 in 8
57 January 229
58 2011 339
59 in 8
60 Japan 955
61 , 4
62 it 30
63 is 25
64 the 3
65 third 255
66 game 78
67 in 8
68 the 3
69 Valkyria 3909
70 series 107
71 . 5
72 Employing 33279
73 the 3
74 same 153
75 fusion 4590
76 of 6
77 tactical 7255
78 and 7
79 real 927
80 @-@ 15
81 time 63
82 gameplay 2699
83 a

In [None]:
sample_train_dataset = tok_train_dataset[:226]
n_gram_length = 3

sequences, sequences_tokenized = [], []
sequence, sequence_tokens = "", []
for i, s in enumerate(sample_train_dataset):
  token = vocab.str_to_id[s]
  sequence += s
  sequence_tokens.append(token)
  if s == "<eos>":
    sequences.append(sequence)
    sequences_tokenized.append(sequence_tokens)
    sequence = ""
    sequence_tokens = []

n_gram_token_sequences = []
for n, st in enumerate(sequences_tokenized[:1]):
  print(f"sequence {n}: {list(map(lambda x: vocab.all_tokens[x], st))} {st}")
  for i in range(len(st)-n_gram_length+1):
    n_gram_tokens = st[i:i+n_gram_length]
    n_gram = vocab.ids_to_strs(n_gram_tokens)
    n_gram_token_sequences.append(n_gram_tokens)
    print(n_gram_tokens, n_gram)
    assert len(n_gram_tokens) == n_gram_length
  print("\n")

sequence 0: ['=', 'Valkyria', 'Chronicles', 'III', '=', '<eos>'] [11, 3909, 4420, 851, 11, 2]
[11, 3909, 4420] ['=', 'Valkyria', 'Chronicles']
[3909, 4420, 851] ['Valkyria', 'Chronicles', 'III']
[4420, 851, 11] ['Chronicles', 'III', '=']
[851, 11, 2] ['III', '=', '<eos>']




In [None]:
# getting the context
print(n_gram_token_sequences)
print(n_gram_token_sequences[1])
print(n_gram_token_sequences[1][:n_gram_length-1])

[[11, 3909, 4420], [3909, 4420, 851], [4420, 851, 11], [851, 11, 2]]
[3909, 4420, 851]
[3909, 4420]


In [None]:
from collections import defaultdict

class NGramModel:
    def __init__(self, train_text: List[str], n: int = 2, alpha: float = 3e-3):
        # get counts and perform any other setup
        self.n = n
        self.smoothing = alpha

        self.n_gram_counts = defaultdict(int)
        self.n_minus_1_gram_counts = defaultdict(int)

        # get sequences from train text
        # sequences_tokenized = []
        # sequence_tokens = []
        # for _, s in enumerate(train_text):
        #   token = vocab.str_to_id[s]
        #   sequence_tokens.append(token)
        #   if s == "<eos>":
        #     sequences_tokenized.append(sequence_tokens)
        #     sequence_tokens = []

        # for st_i, st in enumerate(sequences_tokenized):
        #   #print(f"sequence {st_i}: {list(map(lambda x: vocab.all_tokens[x], st))} {st}")
        #   for i in range(len(st)-n+1):
        #     n_gram_tokens = tuple(st[i:i+n])
        #     self.n_gram_counts[n_gram_tokens] += 1
        #     self.n_minus_1_gram_counts[n_gram_tokens[:-1]] += 1
        #     assert len(n_gram_tokens) == n

        tokenized_train_text = list(map(lambda s: vocab.str_to_id[s] if vocab.__contains__(s) else vocab.str_to_id[vocab.unk_tok], train_text))
        for i in range(len(tokenized_train_text)-self.n+1):
          n_gram_tokens = tuple(tokenized_train_text[i:i+self.n])
          self.n_gram_counts[n_gram_tokens] += 1
          self.n_minus_1_gram_counts[n_gram_tokens[:-1]] += 1
          assert len(n_gram_tokens) == self.n
          assert len(n_gram_tokens[:-1]) == self.n-1

    def n_gram_probability(self, n_gram: Tuple[str, ...]):
        """Return the probability of the last word in an n-gram.

        n_gram -- a list of string tokens
        returns the conditional probability of the last token given the rest.
        """

        assert len(n_gram) == self.n, n_gram
        n_gram_count = self.n_gram_counts[n_gram]
        n_minus_1_gram_count = self.n_minus_1_gram_counts[n_gram[:-1]]
        numerator = n_gram_count + self.smoothing
        denominator = n_minus_1_gram_count + (self.smoothing * vocab.size())

        return numerator/denominator

    def next_word_probabilities(self, text_prefix: List[str]):
        """Return a list of probabilities for each word in the vocabulary."""

        # YOUR CODE HERE
        # use your function n_gram_probability
        # vocab.all_tokens contains a list of words to return probabilities for

        #print(text_prefix)
        tokenized_prefix = list(map(lambda s: vocab.str_to_id[s] if vocab.__contains__(s) else vocab.str_to_id[vocab.unk_tok], text_prefix))
        #print(tokenized_prefix)

        if len(tokenized_prefix) <= self.n-1:
          return [1.0 / vocab.size() for _ in range(vocab.size())]

        probabilities = []
        for token in vocab.strs_to_ids(vocab.all_tokens):
          context = []
          if self.n > 1:
            context = tokenized_prefix[-(self.n-1):]
          n_gram_current = tuple(context + [token])
          assert len(n_gram_current) == self.n
          probabilities.append(self.n_gram_probability(n_gram_current))

        return probabilities

    def perplexity(self, full_text: List[str]):
        """ full_text is a list of string tokens
        return perplexity as a float """

        # YOUR CODE HERE
        # use your function n_gram_probability
        # This method should differ a bit from the example unigram model because
        # the first n-1 words of full_text must be handled as a special case.
        token_count = 0
        log_prob_sum = 0
        tokenized_full_text = list(map(lambda s: vocab.str_to_id[s] if vocab.__contains__(s) else vocab.str_to_id[vocab.unk_tok], full_text))
        for i in range(len(tokenized_full_text)):
          if i <= self.n-1:
            log_prob_sum += math.log(1 / vocab.size())
          else:
            n_gram = tuple(tokenized_full_text[i-self.n+1:i+1])
            log_prob_sum += math.log(self.n_gram_probability(n_gram))
          token_count += 1

        average_log_prob = log_prob_sum / token_count
        return math.exp(-average_log_prob)


unigram_model = NGramModel(tok_train_dataset, 1)
check_validity(unigram_model)
print('unigram validation perplexity:', unigram_model.perplexity(tok_validation_dataset)) # this should be the almost the same as our unigram model perplexity above

bigram_model = NGramModel(tok_train_dataset, n=2)
check_validity(bigram_model)
print('bigram validation perplexity:', bigram_model.perplexity(tok_validation_dataset))

trigram_model = NGramModel(tok_train_dataset, n=3)
check_validity(trigram_model)
print('trigram validation perplexity:', trigram_model.perplexity(tok_validation_dataset)) # this won't do very well...

unigram validation perplexity: 1096.29446153107
bigram validation perplexity: 635.6121589105167
trigram validation perplexity: 4287.937643068801


In [None]:
save_truncated_distribution(bigram_model, 'bigram_predictions.npy') # this might take a few minutes

  0%|          | 0/1000 [00:00<?, ?it/s]

saved bigram_predictions.npy


Please download `bigram_predictions.npy` once you finish this section so that you can submit it.

In the block below, please report your bigram validation perplexity.  (We will use this to help us calibrate our scoring on the test set.)

TODO: Report the perplexity in your report.

Bigram validation perplexity: ***fill in here***

We can also generate samples from the model to get an idea of how it is doing.

In [None]:
print(generate_text(bigram_model))

<eos> <eos> = = = = <eos> King <unk> de Janeiro instated crucial Takashi commissions Nexon concise endurance misdemeanour Ansari Dr. Crimint


We now free up some RAM, **it is important to run the cell below, otherwise you will likely run out of RAM in the Colab runtime.**

In [None]:
# Free up some RAM.
del bigram_model
del trigram_model

This basic model works okay for bigrams, but a better strategy (especially for higher-order models) is to use backoff.  Implement backoff with absolute discounting.
$$P\left(w_i|w_{i-n+1}^{i-1}\right)=\frac{max\left\{C(w_{i-n+1}^i)-\delta,0\right\}}{\sum_{w_i} C(w_{i-n+1}^i)} + \alpha(w_{i-n+1}^{i-1}) P(w_i|w_{i-n+2}^{i-1})$$

$$\alpha\left(w_{i-n+1}^{i-1}\right)=\frac{\delta N_{1+}(w_{i-n+1}^{i-1})}{{\sum_{w_i} C(w_{i-n+1}^i)}}$$
where $N_{1+}$ is the number of words that appear after the previous $n-1$ words (the number of times the max will select something other than 0 in the first equation).  If $\sum_{w_i} C(w_{i-n+1}^i)=0$, use the lower order model probability directly (the above equations would have a division by 0).

We found a discount $\delta$ of 0.9 to work well based on validation performance.  A trigram model with this discount value should get a validation perplexity around/below **310**.

In [None]:
from typing import Dict

class DiscountBackoffModel(NGramModel):
    def __init__(self, train_text: List[str],
                 lower_order_model: Union[NGramModel, "DiscountBackoffModel"],
                 n: int = 2,
                 delta: float = 0.9):
        """We only use n>=2"""
        assert n >= 2, n
        super().__init__(train_text, n=n)
        self.lower_order_model = lower_order_model
        self.discount = delta

        self.n_gram_counts = self._build_n_gram_counts(train_text, n)
        self.context_gram_counts = self._build_n_gram_counts(train_text, n-1)

        # Additional counts for Kneser-Ney smoothing
        self.continuation_counts = self._build_continuation_counts()
        self.total_continuations = sum(self.continuation_counts.values())

    def _build_n_gram_counts(self, text: List[str], n: int) -> Dict[Tuple[str, ...], int]:
        counts = defaultdict(int)
        for i in range(len(text)-n+1):
            n_gram = tuple(text[i:i+n])
            counts[n_gram] += 1
        return counts

    def _build_continuation_counts(self) -> Dict[str, int]:
        continuations = defaultdict(set)
        for n_gram in self.n_gram_counts:
            context, word = n_gram[:-1], n_gram[-1]
            continuations[word].add(context)
        return {word: len(contexts) for word, contexts in continuations.items()}

    def n_gram_probability(self, n_gram: Tuple[str, ...]) -> float:
        assert len(n_gram) == self.n

        context = n_gram[:-1]
        word = n_gram[-1]

        n_gram_count = self.n_gram_counts.get(n_gram, 0)
        context_gram_count = self.context_gram_counts.get(context, 0)

        if context_gram_count == 0:
            return self.lower_order_model.n_gram_probability(n_gram[1:])

        discounted_prob = max(n_gram_count - self.discount, 0) / context_gram_count

        # Kneser-Ney probability for lower-order model
        if isinstance(self.lower_order_model, DiscountBackoffModel):
            continuation_count = self.continuation_counts.get(word, 0)
            lower_order_prob = continuation_count / self.total_continuations
        else:
            lower_order_prob = self.lower_order_model.n_gram_probability(n_gram[1:])

        distinct_continuations = len([ngram for ngram in self.n_gram_counts if ngram[:-1] == context])
        alpha = (self.discount * distinct_continuations) / context_gram_count

        return discounted_prob + alpha * lower_order_prob

# class DiscountBackoffModel(NGramModel):
#     def __init__(self, train_text: List[str],
#                  lower_order_model: Union[NGramModel, "DiscountBackoffModel"],
#                  n: int = 2,
#                  delta: float = 0.9):
#         """We only use n>=2"""
#         assert n >= 2, n
#         super().__init__(train_text, n=n)
#         self.lower_order_model = lower_order_model
#         self.discount = delta

#         # YOUR CODE HERE
#         self.n_gram_counts = self._build_n_gram_counts(train_text, n)
#         self.context_gram_counts = self._build_n_gram_counts(train_text, n-1)

#         self.n_gram_counts_updated = defaultdict(int)
#         self.context_gram_counts_updated = defaultdict(int)
#         for i in range(n - 1, len(train_text)):
#             n_gram = tuple(train_text[i - n + 1: i + 1])
#             context = n_gram[:-1]
#             self.n_gram_counts_updated[n_gram] += 1
#             self.context_gram_counts_updated[context] += 1

#     def _build_n_gram_counts(self, text: List[str], n: int) -> Dict[Tuple[str, ...], int]:
#         counts = defaultdict(int)
#         for i in range(len(text)-n+1):
#           n_gram = tuple(text[i:i+n])
#           counts[n_gram] += 1
#         return counts

#     def n_gram_probability(self, n_gram: Tuple[str, ...]) -> float:
#         assert len(n_gram) == self.n

#         # YOUR CODE HERE
#         # back off to the lower_order model with n'=n-1 using its n_gram_probability function
#         context = n_gram[:-1]
#         word = n_gram[-1]

#         n_gram_count = self.n_gram_counts.get(n_gram, 0)
#         context_gram_count = self.context_gram_counts.get(context, 0)
#         if context_gram_count == 0:
#           return self.lower_order_model.n_gram_probability(n_gram[1:])

#         discounted_prob = max(n_gram_count - self.discount, 0) / context_gram_count
#         backoff_prob = self.lower_order_model.n_gram_probability(n_gram[1:])
#         distinct_continuations = len([ngram for ngram in self.n_gram_counts if ngram[:-1] == context])
#         alpha = (self.discount * distinct_continuations) / context_gram_count
#         return discounted_prob + alpha * backoff_prob


In [None]:
bigram_backoff_model = DiscountBackoffModel(tok_train_dataset, unigram_model, 2)

In [None]:
sample_n_gram = list(bigram_backoff_model.n_gram_counts.keys())[1]
print(sample_n_gram)
print(bigram_backoff_model.n_gram_counts[sample_n_gram])#, bigram_backoff_model.n_gram_counts_updated[sample_n_gram])
sample_context_gram = sample_n_gram[:-1]
print(sample_context_gram)
print(bigram_backoff_model.context_gram_counts[sample_context_gram])#, bigram_backoff_model.context_gram_counts_updated[sample_context_gram])
print(len([ngram for ngram in bigram_backoff_model.n_gram_counts if ngram[:-1] == sample_context_gram]))

('Valkyria', 'Chronicles')
36
('Valkyria',)
54
10


In [None]:
bigram_backoff_model = DiscountBackoffModel(tok_train_dataset, unigram_model, 2)
check_validity(bigram_backoff_model)
print('bigram backoff validation perplexity:', bigram_backoff_model.perplexity(tok_validation_dataset))

trigram_backoff_model = DiscountBackoffModel(tok_train_dataset, bigram_backoff_model, 3)
check_validity(trigram_backoff_model)
print('trigram backoff validation perplexity:', trigram_backoff_model.perplexity(tok_validation_dataset))

bigram backoff validation perplexity: 1096.2887206509392
trigram backoff validation perplexity: 1096.2748284683664


In [None]:
save_truncated_distribution(trigram_backoff_model, 'trigram_pbackoff_redictions.npy') # this might take a few minutes

  0%|          | 0/1000 [00:00<?, ?it/s]

saved trigram_pbackoff_redictions.npy


TODO: Report your trigram backoff model perplexity.

Trigram backoff validation perplexity: ***fill in here***

Free up RAM.

In [None]:
# Release models we don't need any more.
del unigram_model
del bigram_backoff_model
del trigram_backoff_model

### Neural N-gram Model

In this section, you will implement a neural version of an n-gram model.  The model will use a simple feedforward neural network that takes the previous `n-1` words and outputs a distribution over the next word.

You will use PyTorch to implement the model.  We've provided a little bit of code to help with the data loading using PyTorch's data loaders (https://pytorch.org/docs/stable/data.html)

A model with the following architecture and hyperparameters should reach a validation perplexity around/below **240**.
* embed the words with dimension 128, then flatten into a single embedding for $n-1$ words (with size $(n-1)*128$)
* run 2 hidden layers with 1024 hidden units, then project down to size 128 before the final layer (ie. 4 layers total).
* use weight tying for the embedding and final linear layer (this made a very large difference in our experiments); you can do this by creating the output layer with `nn.Linear`, then using `F.embedding` with the linear layer's `.weight` to embed the input
* rectified linear activation (ReLU) and dropout 0.1 after first 2 hidden layers. **Note: You will likely find a performance drop if you add a nonlinear activation function after the dimension reduction layer.**
* train for 10 epochs with the Adam optimizer (should take around 15-20 minutes)
* do early stopping based on validation set perplexity.


We encourage you to try other architectures and hyperparameters, and you will likely find some that work better than the ones listed above.  A proper implementation with these should be enough to receive full credit on the assignment, though.

In [None]:
class NeuralNgramDataset(torch.utils.data.Dataset):
    def __init__(self, text_token_ids: List[int], n: int):
        self.text_token_ids = text_token_ids
        self.n = n

    def __len__(self):
        return len(self.text_token_ids)

    def __getitem__(self, i: int):
        if i < self.n - 1:
            prev_token_ids = [vocab.str_to_id[vocab.eos_tok]] * (self.n - i - 1) + \
                              self.text_token_ids[:i]
        else:
            prev_token_ids = self.text_token_ids[i - self.n + 1 : i]

        assert len(prev_token_ids) == self.n - 1, prev_token_ids

        x = torch.tensor(prev_token_ids, dtype=torch.long)
        y = torch.tensor(self.text_token_ids[i], dtype=torch.long)
        return x, y

class NeuralNGramNetwork(nn.Module):
    # a PyTorch Module that holds the neural network for your model

    def __init__(
            self, n: int,
            embed_dim: int = 128,
            hidden_dim: int = 1024,
            dropout_rate: float = 0.1
        ):
        super().__init__()
        self.n = n

        # YOUR CODE HERE
        self.embed_dim = embed_dim
        self.hidden_dim = hidden_dim

        # Embedding layer
        self.embedding = nn.Embedding(vocab.size(), embed_dim)

        # Hidden layers
        self.hidden1 = nn.Linear((n-1) * embed_dim, hidden_dim)
        self.hidden2 = nn.Linear(hidden_dim, hidden_dim)
        self.hidden3 = nn.Linear(hidden_dim, embed_dim)

        # Output layer (tied with embedding)
        self.output = nn.Linear(embed_dim, vocab.size())
        self.output.weight = self.embedding.weight

        # Dropout
        self.dropout = nn.Dropout(dropout_rate)

        # ReLU activation
        self.relu = nn.ReLU()


    def forward(self, x):
        # x is a tensor of inputs with shape (batch, n-1)
        # this function returns a tensor of log probabilities with shape (batch, vocab_size)

        # YOUR CODE HERE
        # x shape: (batch, n-1)
        embedded = self.embedding(x)  # (batch, n-1, embed_dim)
        flattened = embedded.view(embedded.size(0), -1)  # (batch, (n-1)*embed_dim)

        hidden1 = self.relu(self.hidden1(flattened))
        hidden1 = self.dropout(hidden1)

        hidden2 = self.relu(self.hidden2(hidden1))
        hidden2 = self.dropout(hidden2)

        hidden3 = self.hidden3(hidden2)  # No ReLU here as per instructions

        output = self.output(hidden3)

        return F.log_softmax(output, dim=-1)



class NeuralNGramModel:
    # a class that wraps NeuralNGramNetwork to handle training and evaluation
    # it's ok if this doesn't work for unigram modeling
    def __init__(self, n: int, device: str = "cpu", **model_configs):
        self.n = n
        self.device = device

        if "cuda" in self.device:
            assert torch.cuda.is_available(), "no GPU found, in Co4lab go to 'Edit->Notebook settings' and choose a GPU hardware accelerator"

        self.network = NeuralNGramNetwork(n, **model_configs).to(self.device)

    def train(
        self,
        n_epoch: int = 10, lr: float = 0.001, batch_size: int = 128
    ):
        train_dataset = NeuralNgramDataset(vocab.strs_to_ids(tok_train_dataset), self.n)
        train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

        # iterating over train_dataloader with a for loop will return a 2-tuple of batched tensors
        # the first tensor will be previous token ids with size (batch, n-1),
        # and the second will be the current token id with size (batch, )
        # you will need to move these tensors to GPU, e.g. by using the Tensor.to() function.

        # this will take some time to run; use tqdm.notebook.tqdm to get a progress bar

        # YOUR CODE HERE
        val_dataset = NeuralNgramDataset(vocab.strs_to_ids(tok_validation_dataset), self.n)
        val_dataloader = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size)

        optimizer = torch.optim.Adam(self.network.parameters(), lr=lr)
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
        criterion = nn.NLLLoss()

        best_val_perplexity = float('inf')
        best_model = None

        for epoch in tqdm.notebook.tqdm(range(n_epoch)):
            self.network.train()
            for x, y in tqdm.notebook.tqdm(train_dataloader):
                x, y = x.to(self.device), y.to(self.device)
                optimizer.zero_grad()
                output = self.network(x)
                loss = criterion(output, y)
                loss.backward()
                optimizer.step()

            val_perplexity = self.perplexity(tok_validation_dataset)
            print(f"Epoch {epoch+1}/{n_epoch}, Validation Perplexity: {val_perplexity:.2f}")

            # if val_perplexity < best_val_perplexity:
            #     best_val_perplexity = val_perplexity
            #     best_model = copy.deepcopy(self.network.state_dict())
            # else:
            #     print("Early stopping")
            #     break

        #self.network.load_state_dict(best_model)

    def next_word_probabilities(self, text_prefix: List[str]) -> List[float]:
        # YOUR CODE HERE
        # Don't forget self.network.eval().
        # You will need to convert text_prefix from strings to numbers with the `vocab.strs_to_ids` function.
        # If your `perplexity` function below is based on a NeuralNgramDataset DataLoader, you will need to use the same strategy for prefixes with less than n-1 tokens to pass the validity check.
        # The data loader appends extra "<eos>" (end of sentence) tokens to the start of the input so there are always enough to run the network
        self.network.eval()
        with torch.no_grad():
            ids = vocab.strs_to_ids(text_prefix)
            if len(ids) < self.n - 1:
                ids = [vocab.str_to_id[vocab.eos_tok]] * (self.n - 1 - len(ids)) + ids
            else:
                ids = ids[-(self.n-1):]
            x = torch.tensor(ids, dtype=torch.long).unsqueeze(0).to(self.device)
            log_probs = self.network(x)
            return torch.exp(log_probs).squeeze(0).tolist()

    def perplexity(self, text: List[str]) -> float:
        # You may want to use a DataLoader here with a NeuralNgramDataset
        # Don't forget self.network.eval()

        # YOUR CODE HERE
        self.network.eval()
        dataset = NeuralNgramDataset(vocab.strs_to_ids(text), self.n)
        dataloader = torch.utils.data.DataLoader(dataset, batch_size=128)

        total_loss = 0
        total_tokens = 0

        with torch.no_grad():
            for x, y in dataloader:
                x, y = x.to(self.device), y.to(self.device)
                log_probs = self.network(x)
                total_loss += F.nll_loss(log_probs, y, reduction='sum').item()
                total_tokens += y.numel()

        return torch.exp(torch.tensor(total_loss / total_tokens)).item()

In [None]:
# it's probabily better to first debug with cpu so you don't waste the limited GPU time.
# then you use device="cuda" to train on GPU.
neural_trigram_model = NeuralNGramModel(3, device="cuda")
check_validity(neural_trigram_model)
neural_trigram_model.train(lr=5e-4)
print('neural trigram validation perplexity:', neural_trigram_model.perplexity(tok_validation_dataset))

  0%|          | 0/10 [00:00<?, ?it/s]

  0%|          | 0/16217 [00:00<?, ?it/s]

Epoch 1/10, Validation Perplexity: 449.42


  0%|          | 0/16217 [00:00<?, ?it/s]

Epoch 2/10, Validation Perplexity: 378.88


  0%|          | 0/16217 [00:00<?, ?it/s]

Epoch 3/10, Validation Perplexity: 345.40


  0%|          | 0/16217 [00:00<?, ?it/s]

Epoch 4/10, Validation Perplexity: 330.36


  0%|          | 0/16217 [00:00<?, ?it/s]

Epoch 5/10, Validation Perplexity: 319.79


  0%|          | 0/16217 [00:00<?, ?it/s]

Epoch 6/10, Validation Perplexity: 315.43


  0%|          | 0/16217 [00:00<?, ?it/s]

Epoch 7/10, Validation Perplexity: 311.22


  0%|          | 0/16217 [00:00<?, ?it/s]

Epoch 8/10, Validation Perplexity: 305.97


  0%|          | 0/16217 [00:00<?, ?it/s]

Epoch 9/10, Validation Perplexity: 305.55


  0%|          | 0/16217 [00:00<?, ?it/s]

Epoch 10/10, Validation Perplexity: 301.97
neural trigram validation perplexity: 301.97100830078125


In [None]:
save_truncated_distribution(neural_trigram_model, 'neural_trigram_predictions.npy', short=False)

  0%|          | 0/5000 [00:00<?, ?it/s]

saved neural_trigram_predictions.npy


TODO: Fill in your neural trigram perplexity in the report.

<!-- Do not remove this comment, it is used by the autograder: RqYJKsoTS6 -->

Neural trigram validation perplexity: ***fill in here***

Free up RAM.

In [None]:
# Delete model we don't need.
del neural_trigram_model

### LSTM Model

For this stage of the project, you will implement an LSTM language model.

For recurrent language modeling, the data batching strategy is a bit different from what is used in some other tasks.  Sentences are concatenated together so that one sentence starts right after the other, and an unfinished sentence will be continued in the next batch.
To properly deal with this input format, you should **save the last state of the LSTM from a batch to feed in as the first state of the next batch**.  When you save state across different batches, you should call `.detach()` on the state tensors before the next batch to tell PyTorch not to backpropagate gradients through the state into the batch you have already finished (which will cause a runtime error).

We expect your model to reach a validation perplexity around/below **214**.
The following architecture and hyperparameters should be sufficient to get there.
* 3 LSTM layers with 512 units
* dropout of 0.5 after each LSTM layer
* instead of projecting directly from the last LSTM output to the vocabulary size for softmax, project down to a smaller size first (e.g. 512->128->vocab_size). **NOTE: You may find that adding nonlinearities between these layers can hurt performance, try without first.**
* use the same weights for the embedding layer and the pre-softmax layer; dimension 128
* train with Adam (using default learning rates) for at least 20 epochs


In [6]:
# ref: https://github.com/pytorch/text/blob/0.5.0/torchtext/data/iterator.py#L173

class LstmDataIterator:
    def __init__(self, dataset: List[int], batch_size: int = 64, seq_len: int = 32, device: str = "cpu"):
        self.batch_size = batch_size
        self.seq_len = seq_len
        self.device = device

        # pad the dataset so that it is divisible by batch_size
        dataset = dataset + [vocab.str_to_id[vocab.pad_tok]] * (math.ceil(len(dataset) / batch_size) * batch_size - len(dataset))

        self.n_samples = math.ceil(
            (len(dataset) // batch_size - 1) / seq_len
        )

        dataset = torch.tensor(dataset, dtype=torch.long)
        self.dataset = dataset.view(batch_size, -1).t().contiguous()

    def __len__(self):
        return self.n_samples

    def __getitem__(self, i: int):
        start = i * self.seq_len
        end = min(start + self.seq_len, self.dataset.shape[0] - 1)

        inputs = self.dataset[start : end]
        outputs = self.dataset[start + 1 : end + 1]
        assert inputs.shape == outputs.shape, f"{i}: {inputs.shape} {outputs.shape}"
        # (seq_len, batch_size)
        return inputs.to(self.device), outputs.to(self.device)

In [48]:
import torch
from torch import nn
import torch.nn.functional as F

class LSTMNetwork(nn.Module):
    # a PyTorch Module that holds the neural network for your model

    def __init__(self, embed_dim: int = 128, n_layer: int = 3, hidden_dim: int = 512, dropout_rate: float = 0.5, embedding_type: str = "default"):
        super().__init__()

        # YOUR CODE HERE

        self.vocab_size = vocab.size()
        self.embed_dim = embed_dim
        self.hidden_dim = hidden_dim
        self.n_layer = n_layer

        # Embedding layer
        self.embedding = nn.Embedding(self.vocab_size, embed_dim)
        if embedding_type == "kaiming_normal":
          torch.nn.init.kaiming_normal_(self.embedding.weight)
        elif embedding_type == "xavier_normal":
          torch.nn.init.xavier_normal_(self.embedding.weight)

        # LSTM layers
        self.lstm = nn.LSTM(embed_dim, hidden_dim, n_layer, dropout=dropout_rate)

        # Output layers
        self.fc1 = nn.Linear(hidden_dim, embed_dim)
        self.fc2 = nn.Linear(embed_dim, self.vocab_size)

        # Tie weights
        self.fc2.weight = self.embedding.weight

    def forward(self, x: torch.Tensor, state: Optional[Tuple[torch.Tensor, torch.Tensor]]=None):
        """Compute the output of the network.

        Note: In the Pytorch LSTM tutorial, the state variable is named "hidden":
        https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html

        The torch.nn.LSTM documentation is quite helpful:
        https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html

        x - a tensor of int64 inputs with shape (seq_len, batch)
        state - a tuple of two tensors with shape (num_layers, batch, hidden_size)
                representing the hidden state and cell state of the of the LSTM.
        returns a tuple with two elements:
          - a tensor of log probabilities with shape (seq_len, batch, vocab_size)
          - a state tuple returned by applying the LSTM.
        """

        # Note that the nn.LSTM module expects inputs with the sequence
        # dimension before the batch by default.
        # In this case the dimensions are already in the right order,
        # but watch out for this since sometimes people put the batch first

        # YOUR CODE HERE
        # Embed the input
        embedded = self.embedding(x)

        # Pass through LSTM
        output, state = self.lstm(embedded, state)

        # Project down to embed_dim
        output = self.fc1(output)

        # Project to vocab_size
        logits = self.fc2(output)

        # Compute log probabilities
        log_probs = F.log_softmax(logits, dim=-1)

        return log_probs, state


class LSTMModel:
    "A class that wraps LSTMNetwork to handle training and evaluation."

    def __init__(self, device: str = "cpu", model: str = "default", **model_configs):
        self.device = device
        if "cuda" in self.device:
            assert torch.cuda.is_available(), "no GPU found, in Colab go to 'Edit->Notebook settings' and choose a GPU hardware accelerator"

        self.params = model_configs
        self.model_name = model
        if model == "regularized_full":
          self.network = RegularizedLSTM(**model_configs).to(self.device)
        else:
          self.network = LSTMNetwork(**model_configs).to(self.device)

    def train(
        self,
        n_epoch: int = 20, lr: float = 1e-3, batch_size: int = 64, seq_len: int = 32, patience: int = 3,
    ):
        # You can fetch the data by
        #     inputs, targets = train_data_iter[i]
        #train_data_iter = LstmDataIterator(vocab.strs_to_ids(tok_train_dataset), batch_size, seq_len, self.device)

        # The initial state passed into the LSTM should be set to zero.
        # Also note that, we don't need to compute loss on <PAD>. You can achieve
        #   this by `ignore_index=index of <PAD>` when using NLLLoss.

        # YOUR CODE HERE
        train_data_iter = LstmDataIterator(vocab.strs_to_ids(tok_train_dataset), batch_size, seq_len, self.device)

        optimizer = torch.optim.Adam(self.network.parameters(), lr=lr)
        criterion = torch.nn.NLLLoss(ignore_index=vocab.str_to_id[vocab.pad_tok])


        #best_params = {}
        #best_model = None
        #scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.9)
        WARMUP_EPOCHS=5
        WARMUP_STEPS=WARMUP_EPOCHS*len(train_data_iter)
        NORMAL_STEPS=(20)*len(train_data_iter)
        # Warmup scheduler
        #warmup_scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=lr, total_iters=WARMUP_STEPS)
        # Decay scheduler
        #decay_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=NORMAL_STEPS,)

        # Combine schedulers
        # scheduler = torch.optim.lr_scheduler.SequentialLR(optimizer,
        #                  schedulers=[warmup_scheduler, decay_scheduler],
        #                  milestones=[WARMUP_STEPS])
        #scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=lr, steps_per_epoch=len(train_data_iter), epochs=20)
        #scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

        best_perplex = float('inf')
        patience_count = 0
        for epoch in tqdm.notebook.tqdm(range(n_epoch)):
            total_loss = 0
            state = None

            for i in tqdm.notebook.tqdm(range(len(train_data_iter))):
                inputs, targets = train_data_iter[i]

                if state is not None:
                    state = (state[0].detach(), state[1].detach())

                optimizer.zero_grad()
                log_probs, state = self.network(inputs, state)
                loss = criterion(log_probs.view(-1, vocab.size()), targets.view(-1))
                loss.backward()
                optimizer.step()

                total_loss += loss.item()

            #scheduler.step()

            # Print learning rate for this epoch
            #print(f"Epoch {epoch+1}, LR: {scheduler.get_last_lr()[0]}")

            print(f"Epoch {epoch+1}/{n_epoch}, Loss: {total_loss/len(train_data_iter):.4f}")
            val_perplex = self.dataset_perplexity(tok_validation_dataset)
            print('lstm validation perplexity:', val_perplex)
            print(f"lstm validation loss: {math.log(val_perplex)}")

            # early stopping
            if val_perplex < best_perplex:
              best_perplex = val_perplex
              regularizations_names = '-'.join(self.params['regularizations']) if 'regularizations' in self.params else "no_regularizations"
              save_truncated_distribution(self, f"lstm_{self.model_name}_{regularizations_names}_predictions_updated_{self.params}.npy", short=False)
            elif patience_count < patience:
              patience_count += 1
            else:
              break

    def next_word_probabilities(self, text_prefix: List[str]):
        "Return a list of probabilities for each word in the vocabulary."
        """
        For now, we assume it's not gonna be longer than training seq_len, so
          you can all the network only once.
        """

        # (seq, bsz=1)
        ids_prefix = torch.tensor(vocab.strs_to_ids(text_prefix), dtype=torch.long, device=self.device).view(-1, 1)

        # YOUR CODE HERE
        with torch.no_grad():
            log_probs, _ = self.network(ids_prefix)
            probs = torch.exp(log_probs[-1, 0])

        return probs.tolist()


    def dataset_perplexity(self, dataset: List[str], batch_size: int = 64, seq_len: int = 32):
        "Return perplexity as a float."
        # Your code should be very similar to next_word_probabilities, but
        # run in a loop over batches. Use torch.no_grad() for extra speed.

        data_iterator = LstmDataIterator(vocab.strs_to_ids(dataset), batch_size, seq_len, self.device)
        # YOUR CODE HERE
        total_loss = 0
        total_tokens = 0

        with torch.no_grad():
            state = None
            for i in range(len(data_iterator)):
                inputs, targets = data_iterator[i]

                if state is not None:
                    state = (state[0].detach(), state[1].detach())

                log_probs, state = self.network(inputs, state)
                loss = F.nll_loss(log_probs.view(-1, vocab.size()), targets.view(-1),
                                  ignore_index=vocab.str_to_id[vocab.pad_tok], reduction='sum')

                total_loss += loss.item()
                total_tokens += (targets != vocab.str_to_id[vocab.pad_tok]).sum().item()

        perplexity = math.exp(total_loss / total_tokens)
        return perplexity

In [8]:
lstm_model = LSTMModel(device="cuda")
lstm_model.train()

print('lstm validation perplexity:', lstm_model.dataset_perplexity(tok_validation_dataset))

  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 6.9877
lstm validation perplexity: 493.8419265757367
lstm validation loss: 6.202215479292466


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 6.1836
lstm validation perplexity: 365.52338156438117
lstm validation loss: 5.901330248592123


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 5.8788
lstm validation perplexity: 297.75155790637615
lstm validation loss: 5.696259440484949


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 5.6506
lstm validation perplexity: 259.4775855626609
lstm validation loss: 5.558670323394291


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 5.4710
lstm validation perplexity: 234.3162914773356
lstm validation loss: 5.45667187565019


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 5.3241
lstm validation perplexity: 217.31881595111676
lstm validation loss: 5.381365473256538


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 5.2038
lstm validation perplexity: 204.28603969121065
lstm validation loss: 5.319521167091409


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 5.1023
lstm validation perplexity: 194.12349683810675
lstm validation loss: 5.268494538170909


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 5.0138
lstm validation perplexity: 188.08424978631277
lstm validation loss: 5.236889999607719


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 4.9376
lstm validation perplexity: 184.11315061465714
lstm validation loss: 5.215550517598037


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 4.8694
lstm validation perplexity: 181.14199508177705
lstm validation loss: 5.199281226918262


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 4.8068
lstm validation perplexity: 177.87183586983787
lstm validation loss: 5.181063267747725


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 4.7511
lstm validation perplexity: 176.68360046508806
lstm validation loss: 5.174360564973756


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 4.6991
lstm validation perplexity: 173.60788326534592
lstm validation loss: 5.156799211709889


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 15/20, Loss: 4.6508
lstm validation perplexity: 171.9064226444208
lstm validation loss: 5.146950274369258


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 16/20, Loss: 4.6071
lstm validation perplexity: 173.3254974397811
lstm validation loss: 5.155171314807307


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 17/20, Loss: 4.5636
lstm validation perplexity: 171.66829683898717
lstm validation loss: 5.145564108099978


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 18/20, Loss: 4.5250
lstm validation perplexity: 170.9048083473437
lstm validation loss: 5.141106725169541


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 19/20, Loss: 4.4875
lstm validation perplexity: 172.17597430198282
lstm validation loss: 5.148517060203674


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 20/20, Loss: 4.4525
lstm validation perplexity: 171.3723726723221
lstm validation loss: 5.143838806895942
lstm validation perplexity: 171.6078885812351


In [9]:
save_truncated_distribution(lstm_model, 'lstm_predictions.npy', short=False)

  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_predictions.npy


TODO: Report your LSTM perplexity.

LSTM validation perplexity: ***fill in here***

# Experimentation: 1-Page Report

Now it's time for you to experiment.  Try to reach a validation perplexity below 120. You may either modify the LSTM class above, or copy it down to the code cell below and modify it there. Just **be sure to run code cell below to generate results with your improved LSTM**.  

It is okay if the bulk of your improvements are due to hyperparameter tuning (such as changing number or sizes of layers), but implement at least one more substantial change to the model.  Here are some ideas (several of which come from https://arxiv.org/pdf/1708.02182.pdf):
* activation regularization - add a l2 regularization penalty on the activation of the LSTM output (standard l2 regularization is on the weights)
* weight-drop regularization - apply dropout to the weight matrices instead of activations
* learning rate scheduling - decrease the learning rate during training
* embedding dropout - zero out the entire embedding for a random set of words in the embedding matrix
* ensembling - average the predictions of several models trained with different initialization random seeds
* temporal activation regularization - add l2 regularization on the difference between the LSTM output activations at adjacent timesteps

You may notice that most of these suggestions are regularization techniques.  This dataset is considered fairly small, so regularization is one of the best ways to improve performance.

TODO: In the report, submit a write-up describing the extensions and/or modifications that you tried.  Your description should be **1-page maximum** in length.
For full credit, your write-up should include:
1.   A concise and precise description of the extension that you tried.
2.   A motivation for why you believed this approach might improve your model.
3.   A discussion of whether the extension was effective and/or an analysis of the results.  This will generally involve some combination of tables, learning curves, etc.
4.   A bottom-line summary of your results comparing validation perplexities of your improvement to the original LSTM.


Run the cell below in order to train your improved LSTM and evaluate it.  

In [None]:
# testing both initilaization methods against default
lstm_model_init_kaiming = LSTMModel(device="cuda", embedding_type="kaiming_normal")
lstm_model_init_kaiming.train()

print('lstm init validation perplexity kaiming_normal:', lstm_model_init_kaiming.dataset_perplexity(tok_validation_dataset))

lstm_model_init_xavier = LSTMModel(device="cuda", embedding_type="xavier_normal")
lstm_model_init_xavier.train()

print('lstm init validation perplexity xavier_normal:', lstm_model_init_xavier.dataset_perplexity(tok_validation_dataset))
#save_truncated_distribution(lstm_model_init, 'lstm_predictions_embeddings.npy', short=False)

  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 6.5050


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 5.5517


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 5.1614


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 4.9058


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 4.7147


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 4.5621


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 4.4333


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 4.3241


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 4.2279


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 4.1446


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 4.0677


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 3.9978


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 3.9351


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 3.8765


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 15/20, Loss: 3.8239


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 16/20, Loss: 3.7745


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 17/20, Loss: 3.7286


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 18/20, Loss: 3.6866


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 19/20, Loss: 3.6473


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 20/20, Loss: 3.6120
lstm init validation perplexity kaiming_normal: 212.7869612261212


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 7.2749


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 7.0807


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 6.7403


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 5.8913


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 5.4231


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 5.1265


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 4.9165


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 4.7471


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 4.6057


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 4.4825


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 4.3764


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 4.2831


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 4.1990


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 4.1250


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 15/20, Loss: 4.0570


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 16/20, Loss: 3.9937


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 17/20, Loss: 3.9379


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 18/20, Loss: 3.8851


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 19/20, Loss: 3.8369


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 20/20, Loss: 3.7944
lstm init validation perplexity xavier_normal: 207.88362006599726


In [15]:
class WeightDropLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, dropout=0.5, weight_dropout=0.5):
        super(WeightDropLSTM, self).__init__()
        self.weight_dropout = weight_dropout
        # Original LSTM without dropout on weights
        self.lstm = torch.nn.LSTM(input_size, hidden_size, num_layers, dropout=dropout)
        # Store the original weights
        self._setup_weight_drop()

        # Apply weight drop to hidden-to-hidden weights
        for name, param in self.lstm.named_parameters():
            if 'weight_hh' in name:
                self.register_parameter(name + '_raw', nn.Parameter(param.data))
                param.requires_grad = False

    def _setup_weight_drop(self):
        """Modify the LSTM weights to include dropout only on the recurrent weights."""
        # Iterate over LSTM parameters and find recurrent weights
        for name_w in ['weight_hh_l0', 'weight_hh_l0_reverse']:
            if hasattr(self.lstm, name_w):
                # Save original weights
                raw_w = getattr(self.lstm, name_w).data
                # Register original weight as a persistent buffer
                self.register_parameter(f'{name_w}_raw', nn.Parameter(raw_w))

                # Replace the original weight with the dropped version during training
                setattr(self.lstm, name_w, nn.Parameter(self._weight_drop(getattr(self, f'{name_w}_raw'))))

    def _weight_drop(self, weights):
        """Apply dropout to the recurrent weights during training."""
        if self.training:
            return F.dropout(weights, p=self.weight_dropout, training=True)
        else:
            return weights

    def forward(self, x, hidden=None):
        """Forward pass of the LSTM with weight drop on recurrent weights."""
        # Apply weight dropout
        self._setup_weight_drop()
        output, hidden = self.lstm(x, hidden)
        return output, hidden

class VariationalDropout(nn.Module):
    def __init__(self, dropout=0.5):
        super(VariationalDropout, self).__init__()
        self.dropout = dropout

    def forward(self, x):
        if not self.training or self.dropout == 0:
            return x

        mask = x.new_empty(x.size(0), 1, x.size(2)).bernoulli_(1 - self.dropout)
        mask = mask.div_(1 - self.dropout)
        return mask * x

class EmbeddingDropout(nn.Module):
    def __init__(self, embed, dropout=0.1):
        super(EmbeddingDropout, self).__init__()
        self.embed = embed
        self.dropout = dropout

    def forward(self, words):
        if not self.training or self.dropout == 0:
            return self.embed(words)

        mask = self.embed.weight.data.new().resize_((self.embed.weight.size(0), 1))
        mask.bernoulli_(1 - self.dropout)
        mask = mask.expand_as(self.embed.weight) / (1 - self.dropout)
        masked_embed_weight = mask * self.embed.weight

        padding_idx = self.embed.padding_idx
        if padding_idx is not None:
            masked_embed_weight[padding_idx] = 0

        return F.embedding(words, masked_embed_weight)

class RegularizedLSTM(nn.Module):
    def __init__(self, embed_dim: int = 128, n_layer: int = 3, hidden_dim: int = 512, dropout_rate: float = 0.5, embedding_type: str = "default", regularizations: List[str] = []):
        super(RegularizedLSTM, self).__init__()
        print("using regularized lstm")
        if "embedding_dropout" in regularizations:
          print("using embedding dropout")
          self.embedding = EmbeddingDropout(nn.Embedding(vocab.size(), embed_dim), dropout=0.1)
        else:
          self.embedding = nn.Embedding(vocab.size(), embed_dim)
        if "weight_drop_lstm" in regularizations:
          print("using weight drop")
          self.lstm = WeightDropLSTM(embed_dim, hidden_dim, n_layer, dropout=dropout_rate)
        else:
          self.lstm = nn.LSTM(embed_dim, hidden_dim, n_layer, dropout=dropout_rate)

        self.fc1 = torch.nn.Linear(hidden_dim, embed_dim)
        self.fc2 = torch.nn.Linear(embed_dim, vocab.size())

        # Tie weights
        if "embedding_dropout" in regularizations:
          self.fc2.weight = self.embedding.embed.weight
        else:
          self.fc2.weight = self.embedding.weight

    def forward(self, x, state: Optional[Tuple[torch.Tensor, torch.Tensor]]=None):
        emb = self.embedding(x)
        output, state = self.lstm(emb, state)
        #output = self.dropout(output)
        # Project down to embed_dim
        output = self.fc1(output)

        # Project to vocab_size
        logits = self.fc2(output)
        # Compute log probabilities
        log_probs = F.log_softmax(logits, dim=-1)
        return log_probs, state

In [36]:
from itertools import product

def optimize_parameters(param_grid = {
        'model_type': ['default'],
        'lr': [1e-2, 5e-3, 1e-3],
        'embed_dim': [128, 512],
        'n_layer': [3],
        'hidden_dim': [256, 1024], #[256, 512, 1024],
        'dropout_rate': [0.5] #[0.3, 0.5, 0.7]
    }):

    best_perplex = float('inf')
    best_params = {}
    best_model = None

    for params in product(*param_grid.values()):
        current_params = dict(zip(param_grid.keys(), params))
        #embed_dim: int = 128, n_layer: int = 3, hidden_dim: int = 512, dropout_rate: float = 0.5,
        model = LSTMModel(device="cuda", model=current_params["model_type"], embedding_type="default", embed_dim=current_params['embed_dim'], hidden_dim=current_params['hidden_dim'],n_layer=current_params['n_layer'], dropout_rate=current_params['dropout_rate'], regularizations=current_params['regularizations'])

        print(f"{current_params['model_type']} lstm for {model}, default, {current_params}")
        model.train(lr=current_params['lr'])
        val_perplex = model.dataset_perplexity(tok_validation_dataset)
        print(f"{current_params['model_type']} lstm validation perplexity for {model}, default, {current_params} = {val_perplex}")

        if val_perplex < best_perplex:
            best_perplex = val_perplex
            best_params = current_params
            best_model = model

    return best_params, best_model, best_perplex

def optimize_parameters_list(param_list = []):

    best_perplex = float('inf')
    best_params = {}
    best_model = None

    for current_params in param_list:
        #embed_dim: int = 128, n_layer: int = 3, hidden_dim: int = 512, dropout_rate: float = 0.5,
        model = LSTMModel(device="cuda", model=current_params["model_type"], embedding_type="default", embed_dim=current_params['embed_dim'], hidden_dim=current_params['hidden_dim'],n_layer=current_params['n_layer'], dropout_rate=current_params['dropout_rate'], regularizations=current_params['regularizations'])

        print(f"{current_params['model_type']} lstm for {model}, default, {current_params}")
        model.train(lr=current_params['lr'], patience=0)
        val_perplex = model.dataset_perplexity(tok_validation_dataset)
        print(f"{current_params['model_type']} lstm validation perplexity for {model}, default, {current_params} = {val_perplex}")

        if val_perplex < best_perplex:
            best_perplex = val_perplex
            best_params = current_params
            best_model = model

    return best_params, best_model, best_perplex

In [20]:
param_grid = {
        'model_type': ['default', 'regularized_full'],
        'lr': [1e-2, 5e-3, 1e-3],
        'embed_dim': [128, 512],
        'n_layer': [3],
        'hidden_dim': [256, 1024], #[256, 512, 1024],
        'dropout_rate': [0.5] #[0.3, 0.5, 0.7]
    }
for params in product(*param_grid.values()):
  current_params = dict(zip(param_grid.keys(), params))
  print(current_params)

{'model_type': 'default', 'lr': 0.01, 'embed_dim': 128, 'n_layer': 3, 'hidden_dim': 256, 'dropout_rate': 0.5}
{'model_type': 'default', 'lr': 0.01, 'embed_dim': 128, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.5}
{'model_type': 'default', 'lr': 0.01, 'embed_dim': 512, 'n_layer': 3, 'hidden_dim': 256, 'dropout_rate': 0.5}
{'model_type': 'default', 'lr': 0.01, 'embed_dim': 512, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.5}
{'model_type': 'default', 'lr': 0.005, 'embed_dim': 128, 'n_layer': 3, 'hidden_dim': 256, 'dropout_rate': 0.5}
{'model_type': 'default', 'lr': 0.005, 'embed_dim': 128, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.5}
{'model_type': 'default', 'lr': 0.005, 'embed_dim': 512, 'n_layer': 3, 'hidden_dim': 256, 'dropout_rate': 0.5}
{'model_type': 'default', 'lr': 0.005, 'embed_dim': 512, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.5}
{'model_type': 'default', 'lr': 0.001, 'embed_dim': 128, 'n_layer': 3, 'hidden_dim': 256, 'dropout_rate': 0.5}
{

Default model LR tuning with LambdaLR scheduler and Early Stopping


In [None]:
param_grid = {
        'model_type': ['default'],
        'lr': [1e-3],
        'embed_dim': [128],
        'n_layer': [3],
        'hidden_dim': [256], #[256, 512, 1024],
        'dropout_rate': [0.3, 0.5, 0.7] #[0.3, 0.5, 0.7]
    }
best_params, best_model, best_perplex = optimize_parameters(param_grid=param_grid)
#save_truncated_distribution(best_model, 'lstm_predictions_embeddings_best_new_architecture.npy', short=False)

default lstm for <__main__.LSTMModel object at 0x780fa76267d0>, default, {'model_type': 'default', 'lr': 0.001, 'embed_dim': 128, 'n_layer': 3, 'hidden_dim': 256, 'dropout_rate': 0.3}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 6.9956
lstm validation perplexity: 506.7920107710479
lstm validation loss: 6.228100684256131


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 6.2231
lstm validation perplexity: 379.9006282863549
lstm validation loss: 5.939909714012573


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 5.9365
lstm validation perplexity: 316.9639826972657
lstm validation loss: 5.758788148170597


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 5.7244
lstm validation perplexity: 275.7281510864161
lstm validation loss: 5.6194154204936355


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 5.5515
lstm validation perplexity: 248.59295609162078
lstm validation loss: 5.515816844369558


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 5.4096
lstm validation perplexity: 230.6528627368387
lstm validation loss: 5.440913821532437


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 5.2939
lstm validation perplexity: 217.1224782254547
lstm validation loss: 5.380461610103285


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 5.1962
lstm validation perplexity: 207.41142034189124
lstm validation loss: 5.3347043586087155


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 5.1132
lstm validation perplexity: 200.51475427777658
lstm validation loss: 5.3008878314595345


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 5.0401
lstm validation perplexity: 194.25628100395193
lstm validation loss: 5.269178323365683


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 4.9757
lstm validation perplexity: 189.50550907840113
lstm validation loss: 5.2444180957537405


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 4.9199
lstm validation perplexity: 186.05145932819923
lstm validation loss: 5.22602329850387


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 4.8684
lstm validation perplexity: 183.17514695705628
lstm validation loss: 5.210442782322525


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 4.8229
lstm validation perplexity: 180.3828543509583
lstm validation loss: 5.195081560711255


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 15/20, Loss: 4.7799
lstm validation perplexity: 178.8639750129845
lstm validation loss: 5.186625600829555


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 16/20, Loss: 4.7417
lstm validation perplexity: 177.37705547482426
lstm validation loss: 5.178277723713693


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 17/20, Loss: 4.7047
lstm validation perplexity: 177.4091958643034
lstm validation loss: 5.178458905422092


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 18/20, Loss: 4.6701
lstm validation perplexity: 175.81551856761646
lstm validation loss: 5.16943525534615


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 19/20, Loss: 4.6374
lstm validation perplexity: 175.38104159699986
lstm validation loss: 5.166960987420127


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 20/20, Loss: 4.6080
lstm validation perplexity: 175.99257918481584
lstm validation loss: 5.1704418304266
default lstm validation perplexity for <__main__.LSTMModel object at 0x780fa76267d0>, default, {'model_type': 'default', 'lr': 0.001, 'embed_dim': 128, 'n_layer': 3, 'hidden_dim': 256, 'dropout_rate': 0.3} = 175.74593687906366
default lstm for <__main__.LSTMModel object at 0x780fa7626800>, default, {'model_type': 'default', 'lr': 0.001, 'embed_dim': 128, 'n_layer': 3, 'hidden_dim': 256, 'dropout_rate': 0.5}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 7.0365
lstm validation perplexity: 538.6764663870002
lstm validation loss: 6.289115142797573


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 6.3014
lstm validation perplexity: 405.70587968640774
lstm validation loss: 6.005628462802497


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 6.0248
lstm validation perplexity: 336.04540320326595
lstm validation loss: 5.817246279415309


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 5.8216
lstm validation perplexity: 293.69910930172017
lstm validation loss: 5.682555805519418


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 5.6572
lstm validation perplexity: 265.4758200608216
lstm validation loss: 5.581523763317332


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 5.5240
lstm validation perplexity: 246.40704029387348
lstm validation loss: 5.506984803872415


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 5.4151
lstm validation perplexity: 230.60651213898484
lstm validation loss: 5.44071284741447


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 5.3245
lstm validation perplexity: 219.67910668110756
lstm validation loss: 5.392167875556129


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 5.2453
lstm validation perplexity: 211.64036248376797
lstm validation loss: 5.354888430771981


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 5.1785
lstm validation perplexity: 203.64534040698604
lstm validation loss: 5.3163799534348515


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 5.1182
lstm validation perplexity: 198.39936364388532
lstm validation loss: 5.2902819874055345


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 5.0645
lstm validation perplexity: 193.64026787990014
lstm validation loss: 5.266002148461882


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 5.0164
lstm validation perplexity: 190.37264561634214
lstm validation loss: 5.248983444054636


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 4.9718
lstm validation perplexity: 186.20292637540257
lstm validation loss: 5.226837081021307


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 15/20, Loss: 4.9320
lstm validation perplexity: 182.9228149043634
lstm validation loss: 5.209064287389815


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 16/20, Loss: 4.8939
lstm validation perplexity: 180.73989688878078
lstm validation loss: 5.1970589639789075


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 17/20, Loss: 4.8603
lstm validation perplexity: 178.43089832412298
lstm validation loss: 5.184201402033158


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 18/20, Loss: 4.8262
lstm validation perplexity: 176.27435603661706
lstm validation loss: 5.172041622423659


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 19/20, Loss: 4.7960
lstm validation perplexity: 175.74862061555154
lstm validation loss: 5.169054682099812


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 20/20, Loss: 4.7668
lstm validation perplexity: 174.39356341353817
lstm validation loss: 5.161314603761333
default lstm validation perplexity for <__main__.LSTMModel object at 0x780fa7626800>, default, {'model_type': 'default', 'lr': 0.001, 'embed_dim': 128, 'n_layer': 3, 'hidden_dim': 256, 'dropout_rate': 0.5} = 174.1812908587501
default lstm for <__main__.LSTMModel object at 0x780fe9c4e800>, default, {'model_type': 'default', 'lr': 0.001, 'embed_dim': 128, 'n_layer': 3, 'hidden_dim': 256, 'dropout_rate': 0.7}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 7.1216
lstm validation perplexity: 596.9011080815208
lstm validation loss: 6.391751451566784


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 6.4179
lstm validation perplexity: 456.08195497755344
lstm validation loss: 6.1226725191936024


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 6.1626
lstm validation perplexity: 380.6596144626114
lstm validation loss: 5.941905575446146


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 5.9731
lstm validation perplexity: 332.1845592335587
lstm validation loss: 5.805690715766496


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 5.8208
lstm validation perplexity: 303.0556003310266
lstm validation loss: 5.713916288117738


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 5.6944
lstm validation perplexity: 278.92866185996724
lstm validation loss: 5.630956056868345


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 5.5926
lstm validation perplexity: 262.06431502065755
lstm validation loss: 5.568589950814586


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 5.5037
lstm validation perplexity: 248.81361175265346
lstm validation loss: 5.516704068985282


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 5.4289
lstm validation perplexity: 240.70425603338813
lstm validation loss: 5.483569026532945


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 5.3639
lstm validation perplexity: 230.5506597095533
lstm validation loss: 5.440470620108405


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 5.3062
lstm validation perplexity: 222.50725590692878
lstm validation loss: 5.404959711892231


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 5.2550
lstm validation perplexity: 216.8142663984273
lstm validation loss: 5.379041071821265


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 5.2067
lstm validation perplexity: 212.17118871056815
lstm validation loss: 5.357393442741975


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 5.1647
lstm validation perplexity: 210.97152110917696
lstm validation loss: 5.3517231533201


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 15/20, Loss: 5.1260
lstm validation perplexity: 205.54358994900483
lstm validation loss: 5.325658127982926


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 16/20, Loss: 5.0880
lstm validation perplexity: 203.04288168457944
lstm validation loss: 5.313417196559311


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 17/20, Loss: 5.0558
lstm validation perplexity: 199.7732346390963
lstm validation loss: 5.297182896475625


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 18/20, Loss: 5.0227
lstm validation perplexity: 198.1286601054691
lstm validation loss: 5.288916618179364


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 19/20, Loss: 4.9925
lstm validation perplexity: 196.3059531410905
lstm validation loss: 5.279674427572902


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 20/20, Loss: 4.9642
lstm validation perplexity: 192.58446957825652
lstm validation loss: 5.260534860485885
default lstm validation perplexity for <__main__.LSTMModel object at 0x780fe9c4e800>, default, {'model_type': 'default', 'lr': 0.001, 'embed_dim': 128, 'n_layer': 3, 'hidden_dim': 256, 'dropout_rate': 0.7} = 192.80501150811648


In [None]:
param_grid = {
        'model_type': ['default'],
        'lr': [1e-3],
        'embed_dim': [256],
        'n_layer': [3],
        'hidden_dim': [512], #[256, 512, 1024],
        'dropout_rate': [0.1, 0.3, 0.5] #[0.3, 0.5, 0.7]
    }
best_params, best_model, best_perplex = optimize_parameters(param_grid=param_grid)
save_truncated_distribution(best_model, 'lstm_predictions_embeddings_best_new_architecture.npy', short=False)

default lstm for <__main__.LSTMModel object at 0x780fa72124d0>, default, {'model_type': 'default', 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 512, 'dropout_rate': 0.1}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 6.6927
lstm validation perplexity: 390.9193368903453
lstm validation loss: 5.968501239189511


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 5.8317
lstm validation perplexity: 277.4719490679521
lstm validation loss: 5.625719843637301


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 5.4259
lstm validation perplexity: 228.09825495841477
lstm validation loss: 5.4297764789253256


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 5.1234
lstm validation perplexity: 209.4674857429599
lstm validation loss: 5.344568527999918


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 4.8803
lstm validation perplexity: 205.99236941858476
lstm validation loss: 5.327839126446166


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 4.6782
lstm validation perplexity: 208.40745064328655
lstm validation loss: 5.339495060887385


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 4.5036
lstm validation perplexity: 217.96402914343875
lstm validation loss: 5.384330045245263


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 4.3521
lstm validation perplexity: 232.48230251795425
lstm validation loss: 5.448814103927926


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 4.2182
lstm validation perplexity: 253.21020301396476
lstm validation loss: 5.534219985721832


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 4.0979
lstm validation perplexity: 268.75603398410357
lstm validation loss: 5.593804031223703


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 3.9906
lstm validation perplexity: 293.64536688972055
lstm validation loss: 5.682372804182628


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 3.8957
lstm validation perplexity: 302.01839340844214
lstm validation loss: 5.710487920846183


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 3.8080
lstm validation perplexity: 320.4393507898924
lstm validation loss: 5.769693025349022


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 3.7282
lstm validation perplexity: 341.07832780696936
lstm validation loss: 5.832112151220132


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 15/20, Loss: 3.6550
lstm validation perplexity: 364.005479468068
lstm validation loss: 5.897168921007141


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 16/20, Loss: 3.5884
lstm validation perplexity: 382.24682101572023
lstm validation loss: 5.946066528268952


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 17/20, Loss: 3.5253
lstm validation perplexity: 405.31151749627077
lstm validation loss: 6.004655950443111


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 18/20, Loss: 3.4677
lstm validation perplexity: 424.57374594636156
lstm validation loss: 6.05108571491836


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 19/20, Loss: 3.4156
lstm validation perplexity: 444.9879623040881
lstm validation loss: 6.098047230798306


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 20/20, Loss: 3.3645
lstm validation perplexity: 471.00974488565976
lstm validation loss: 6.154878783580858
default lstm validation perplexity for <__main__.LSTMModel object at 0x780fa72124d0>, default, {'model_type': 'default', 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 512, 'dropout_rate': 0.1} = 471.3563138721018
default lstm for <__main__.LSTMModel object at 0x780fa7242920>, default, {'model_type': 'default', 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 512, 'dropout_rate': 0.3}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 6.6609
lstm validation perplexity: 389.4205995820919
lstm validation loss: 5.964659992366049


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 5.8558
lstm validation perplexity: 280.7681687018445
lstm validation loss: 5.637529306402087


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 5.4960
lstm validation perplexity: 232.91923901695424
lstm validation loss: 5.45069177982016


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 5.2391
lstm validation perplexity: 207.82086097022886
lstm validation loss: 5.336676463281037


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 5.0445
lstm validation perplexity: 195.97689921782492
lstm validation loss: 5.277996791150802


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 4.8906
lstm validation perplexity: 189.41825647745958
lstm validation loss: 5.243957567223124


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 4.7636
lstm validation perplexity: 184.32793265930061
lstm validation loss: 5.216716414012544


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 4.6569
lstm validation perplexity: 182.65189162801377
lstm validation loss: 5.207582109760418


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 4.5611
lstm validation perplexity: 180.82444030603025
lstm validation loss: 5.197526617506057


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 4.4796
lstm validation perplexity: 181.87480438991213
lstm validation loss: 5.203318562405795


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 4.4039
lstm validation perplexity: 182.12248286793533
lstm validation loss: 5.204679443516278


  0%|          | 0/1014 [00:00<?, ?it/s]

KeyboardInterrupt: 

In [None]:
param_grid = {
        'model_type': ['default'],
        'lr': [1e-3, 1e-4],
        'embed_dim': [256],
        'n_layer': [3],
        'hidden_dim': [512], #[256, 512, 1024],
        'dropout_rate': [0.2] #[0.3, 0.5, 0.7]
    }
best_params, best_model, best_perplex = optimize_parameters(param_grid=param_grid)

default lstm for <__main__.LSTMModel object at 0x780fa719bac0>, default, {'model_type': 'default', 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 512, 'dropout_rate': 0.2}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 6.7764
lstm validation perplexity: 414.380223226878
lstm validation loss: 6.026783965933089


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 5.9273
lstm validation perplexity: 294.5568209305871
lstm validation loss: 5.6854719248225996


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 5.5429
lstm validation perplexity: 240.8208347665581
lstm validation loss: 5.484053232808693


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 5.2624
lstm validation perplexity: 214.11122608698253
lstm validation loss: 5.366495628069618


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 5.0441
lstm validation perplexity: 202.38943154185654
lstm validation loss: 5.310193720347733


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 4.8675
lstm validation perplexity: 196.60519210032405
lstm validation loss: 5.281197616826825


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 4.7169
lstm validation perplexity: 195.71035283946398
lstm validation loss: 5.2766357745841095


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 4.5891
lstm validation perplexity: 199.14295810041872
lstm validation loss: 5.294022949225523


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 4.4761
lstm validation perplexity: 202.33489922110994
lstm validation loss: 5.309924241510255


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 4.3759
lstm validation perplexity: 209.75689453671842
lstm validation loss: 5.345949215065591


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 4.2864
lstm validation perplexity: 213.5684747532885
lstm validation loss: 5.363957506087221


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 4.2052
lstm validation perplexity: 218.58142830203363
lstm validation loss: 5.387158614686111


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 4.1313
lstm validation perplexity: 228.79135856468366
lstm validation loss: 5.432810490281397


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 4.0642
lstm validation perplexity: 235.58476933302228
lstm validation loss: 5.462070803393297


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 15/20, Loss: 4.0015
lstm validation perplexity: 238.92349287844135
lstm validation loss: 5.476143387204929


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 16/20, Loss: 3.9424
lstm validation perplexity: 251.0274397295147
lstm validation loss: 5.5255622547881424


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 17/20, Loss: 3.8903
lstm validation perplexity: 254.78904285261225
lstm validation loss: 5.540435919840123


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 18/20, Loss: 3.8381
lstm validation perplexity: 256.9931111919367
lstm validation loss: 5.549049279835332


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 19/20, Loss: 3.7906
lstm validation perplexity: 262.82291843651035
lstm validation loss: 5.571480491470521


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 20/20, Loss: 3.7459
lstm validation perplexity: 275.18544523969854
lstm validation loss: 5.61744521672343
default lstm validation perplexity for <__main__.LSTMModel object at 0x780fa719bac0>, default, {'model_type': 'default', 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 512, 'dropout_rate': 0.2} = 276.9878860283502
default lstm for <__main__.LSTMModel object at 0x780fa701ffd0>, default, {'model_type': 'default', 'lr': 0.0001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 512, 'dropout_rate': 0.2}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 8.0777
lstm validation perplexity: 1134.638386723064
lstm validation loss: 7.03406927714499


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 6.9132
lstm validation perplexity: 673.7187351460246
lstm validation loss: 6.512812716910541


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 6.5666
lstm validation perplexity: 547.22051586419
lstm validation loss: 6.3048518580473845


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 6.3870
lstm validation perplexity: 483.16006693485
lstm validation loss: 6.180348000272646


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 6.2637
lstm validation perplexity: 444.6167554354342
lstm validation loss: 6.097212687359415


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 6.1653
lstm validation perplexity: 415.4774936297225
lstm validation loss: 6.029428445872277


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 6.0800
lstm validation perplexity: 392.1300461257217
lstm validation loss: 5.971593535094211


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 6.0028
lstm validation perplexity: 373.02430133498126
lstm validation loss: 5.9216435685537006


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 5.9316
lstm validation perplexity: 356.507123413856
lstm validation loss: 5.876354221180265


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 5.8656
lstm validation perplexity: 343.1904027087063
lstm validation loss: 5.838285403087332


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 5.8025
lstm validation perplexity: 330.445810602888
lstm validation loss: 5.800442683981317


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 5.7431
lstm validation perplexity: 320.583555317065
lstm validation loss: 5.770142945400962


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 5.6865
lstm validation perplexity: 311.80470564726147
lstm validation loss: 5.742377048386836


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 5.6321
lstm validation perplexity: 303.93148432851575
lstm validation loss: 5.716802295505917


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 15/20, Loss: 5.5799
lstm validation perplexity: 297.31458601812136
lstm validation loss: 5.694790790415509


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 16/20, Loss: 5.5303
lstm validation perplexity: 290.0924686431069
lstm validation loss: 5.670199729546329


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 17/20, Loss: 5.4814
lstm validation perplexity: 285.8771127415744
lstm validation loss: 5.655562042754025


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 18/20, Loss: 5.4350
lstm validation perplexity: 279.8319603518558
lstm validation loss: 5.634189281411681


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 19/20, Loss: 5.3895
lstm validation perplexity: 277.32704485385244
lstm validation loss: 5.625197477090201


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 20/20, Loss: 5.3465
lstm validation perplexity: 273.86066155642084
lstm validation loss: 5.612619442574948
default lstm validation perplexity for <__main__.LSTMModel object at 0x780fa701ffd0>, default, {'model_type': 'default', 'lr': 0.0001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 512, 'dropout_rate': 0.2} = 274.0759952079062


In [None]:
param_grid = {
        'model_type': ['default'],
        'lr': [1e-3],
        'embed_dim': [512],
        'n_layer': [3],
        'hidden_dim': [1024], #[256, 512, 1024],
        'dropout_rate': [0.1, 0.2, 0.3] #[0.3, 0.5, 0.7]
    }
best_params, best_model, best_perplex = optimize_parameters(param_grid=param_grid)

default lstm for <__main__.LSTMModel object at 0x780fe1529b70>, default, {'model_type': 'default', 'lr': 0.001, 'embed_dim': 512, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.1}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 6.5779
lstm validation perplexity: 340.1301239897592
lstm validation loss: 5.829328262009439


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 5.6071
lstm validation perplexity: 236.9152764532934
lstm validation loss: 5.4677025939033


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 5.1380
lstm validation perplexity: 204.4034171027345
lstm validation loss: 5.32009557591356


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 4.7680
lstm validation perplexity: 204.95285406938876
lstm validation loss: 5.322779972539595


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 4.4349
lstm validation perplexity: 224.6814514383875
lstm validation loss: 5.414683627668235


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 4.1219
lstm validation perplexity: 267.6219953126826
lstm validation loss: 5.589575519617237


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 3.8314
lstm validation perplexity: 336.66047095796466
lstm validation loss: 5.819074917902702


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 3.5643
lstm validation perplexity: 447.5312369542057
lstm validation loss: 6.103746338528398


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 3.3265
lstm validation perplexity: 592.3236442017983
lstm validation loss: 6.384053181786855


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 3.1158
lstm validation perplexity: 676.2891174280401
lstm validation loss: 6.516620673113323


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 2.9319
lstm validation perplexity: 763.1825824224788
lstm validation loss: 6.6374972981001745


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 2.7693
lstm validation perplexity: 987.9801241099294
lstm validation loss: 6.89566258024787


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 2.6257
lstm validation perplexity: 1194.1793085453878
lstm validation loss: 7.085214457338591


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 2.4998
lstm validation perplexity: 1353.6546548512504
lstm validation loss: 7.210563365438742


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 15/20, Loss: 2.3886
lstm validation perplexity: 1643.80098133014
lstm validation loss: 7.404766510702991


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 16/20, Loss: 2.2897
lstm validation perplexity: 1868.9954191446973
lstm validation loss: 7.53315635648664


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 17/20, Loss: 2.1984
lstm validation perplexity: 1968.3866472695534
lstm validation loss: 7.584969525423577


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 18/20, Loss: 2.1155
lstm validation perplexity: 2111.8289828545776
lstm validation loss: 7.65530966752039


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 19/20, Loss: 2.0395
lstm validation perplexity: 2438.7343411051775
lstm validation loss: 7.799234471047482


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 20/20, Loss: 1.9718
lstm validation perplexity: 2584.611317671245
lstm validation loss: 7.857330414968467
default lstm validation perplexity for <__main__.LSTMModel object at 0x780fe1529b70>, default, {'model_type': 'default', 'lr': 0.001, 'embed_dim': 512, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.1} = 2586.2815268629006
default lstm for <__main__.LSTMModel object at 0x780fa6dbb880>, default, {'model_type': 'default', 'lr': 0.001, 'embed_dim': 512, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.2}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 6.6051
lstm validation perplexity: 346.1469490767806
lstm validation loss: 5.846863393209071


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 5.6540
lstm validation perplexity: 245.92969739354086
lstm validation loss: 5.505045712135547


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 5.2109
lstm validation perplexity: 206.1390711412428
lstm validation loss: 5.328551043637705


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 4.8785
lstm validation perplexity: 198.0416305132868
lstm validation loss: 5.288477263711636


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 4.5998
lstm validation perplexity: 200.52197379309803
lstm validation loss: 5.300923835719578


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 4.3508
lstm validation perplexity: 215.63746696819507
lstm validation loss: 5.373598603937752


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 4.1289
lstm validation perplexity: 250.13782757938478
lstm validation loss: 5.522012076264085


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 3.9298
lstm validation perplexity: 276.3121589872601
lstm validation loss: 5.6215312374296635


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 3.7508
lstm validation perplexity: 317.89200552145854
lstm validation loss: 5.761711719822813


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 3.5915
lstm validation perplexity: 346.3168306953484
lstm validation loss: 5.847354051715518


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 3.4474
lstm validation perplexity: 383.0860487300777
lstm validation loss: 5.9482596342595695


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 3.3204
lstm validation perplexity: 413.5213581048811
lstm validation loss: 6.0247091651367946


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 3.2051
lstm validation perplexity: 456.6548319807722
lstm validation loss: 6.12392781444768


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 3.1000
lstm validation perplexity: 505.6349309083599
lstm validation loss: 6.225814928476955


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 15/20, Loss: 3.0044
lstm validation perplexity: 534.5361013589297
lstm validation loss: 6.281399270483435


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 16/20, Loss: 2.9175
lstm validation perplexity: 593.7100882844899
lstm validation loss: 6.386391133358118


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 17/20, Loss: 2.8366
lstm validation perplexity: 640.0172652156118
lstm validation loss: 6.461495152889241


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 18/20, Loss: 2.7623
lstm validation perplexity: 717.2903288658612
lstm validation loss: 6.575480680350657


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 19/20, Loss: 2.6945
lstm validation perplexity: 732.0527464220639
lstm validation loss: 6.595852569318901


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 20/20, Loss: 2.6299
lstm validation perplexity: 784.4939209494852
lstm validation loss: 6.665038823194033
default lstm validation perplexity for <__main__.LSTMModel object at 0x780fa6dbb880>, default, {'model_type': 'default', 'lr': 0.001, 'embed_dim': 512, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.2} = 786.2927440108327
default lstm for <__main__.LSTMModel object at 0x780fc0771c00>, default, {'model_type': 'default', 'lr': 0.001, 'embed_dim': 512, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.3}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 6.5635
lstm validation perplexity: 343.45486138772696
lstm validation loss: 5.839055695131639


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 5.6418
lstm validation perplexity: 242.62270492169435
lstm validation loss: 5.49150758207996


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 5.2173
lstm validation perplexity: 205.85475994304838
lstm validation loss: 5.327170871305621


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 4.9114
lstm validation perplexity: 193.7268603610054
lstm validation loss: 5.266449230706268


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 4.6690
lstm validation perplexity: 188.9936274253862
lstm validation loss: 5.241713297165202


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 4.4625
lstm validation perplexity: 195.15572538435296
lstm validation loss: 5.273797831573679


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 4.2846
lstm validation perplexity: 205.0275413436308
lstm validation loss: 5.323144318132233


  0%|          | 0/1014 [00:00<?, ?it/s]

KeyboardInterrupt: 

In [13]:
param_grid = {
        'model_type': ['regularized_full'],
        'regularizations': [['embedding_dropout'], ['weight_drop_lstm'], ['embedding_dropout', 'weight_drop_lstm']],
        'lr': [1e-3],
        'embed_dim': [512],
        'n_layer': [3],
        'hidden_dim': [1024], #[256, 512, 1024],
        'dropout_rate': [0.5, 0.7, 0.9] #[0.3, 0.5, 0.7]
    }
best_params, best_model, best_perplex = optimize_parameters(param_grid=param_grid)

NameError: name 'RegularizedLSTM' is not defined

In [None]:
param_grid = {'model_type': ['regularized_full'], 'regularizations': [['embedding_dropout'], ['weight_drop_lstm'], ['embedding_dropout', 'weight_drop_lstm']], 'lr': [1e-3], 'embed_dim': [512], 'n_layer': [3], 'hidden_dim': [2048], 'dropout_rate': [0.7, 0.9]}
best_params, best_model, best_perplex = optimize_parameters(param_grid=param_grid)
save_truncated_distribution(best_model, 'lstm_regularized_improved_predictions.npy')

using regularized lstm
regularized_full lstm for <__main__.LSTMModel object at 0x7d9ff14dba60>, default, {'model_type': 'regularized_full', 'regularizations': ['embedding_dropout'], 'lr': 0.001, 'embed_dim': 512, 'n_layer': 3, 'hidden_dim': 2048, 'dropout_rate': 0.7}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 6.5541
lstm validation perplexity: 358.9448050084952
lstm validation loss: 5.883168630173633


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 5.7411
lstm validation perplexity: 258.17489142376155
lstm validation loss: 5.553637229004504


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 5.3755
lstm validation perplexity: 213.5890810348774
lstm validation loss: 5.364053987028858


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 5.1304
lstm validation perplexity: 191.43383969495142
lstm validation loss: 5.254542264305476


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 4.9436
lstm validation perplexity: 180.51289240389545
lstm validation loss: 5.195802201282579


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 4.7924
lstm validation perplexity: 173.90190027807813
lstm validation loss: 5.158491348719039


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 4.6632
lstm validation perplexity: 170.99251345633562
lstm validation loss: 5.141619774587144


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 4.5461
lstm validation perplexity: 170.88881506555043
lstm validation loss: 5.141013140733148


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 4.4391
lstm validation perplexity: 169.675577822539
lstm validation loss: 5.133888248050586


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 4.3367
lstm validation perplexity: 171.218334413675
lstm validation loss: 5.142939551511359
regularized_full lstm validation perplexity for <__main__.LSTMModel object at 0x7d9ff14dba60>, default, {'model_type': 'regularized_full', 'regularizations': ['embedding_dropout'], 'lr': 0.001, 'embed_dim': 512, 'n_layer': 3, 'hidden_dim': 2048, 'dropout_rate': 0.7} = 170.92991302063945
using regularized lstm
regularized_full lstm for <__main__.LSTMModel object at 0x7d9fa6586530>, default, {'model_type': 'regularized_full', 'regularizations': ['embedding_dropout'], 'lr': 0.001, 'embed_dim': 512, 'n_layer': 3, 'hidden_dim': 2048, 'dropout_rate': 0.9}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 7.0266
lstm validation perplexity: 502.1669246688699
lstm validation loss: 6.218932583680149


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 6.1647
lstm validation perplexity: 368.9613998908333
lstm validation loss: 5.9106920312268825


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 5.8672
lstm validation perplexity: 319.4407994199236
lstm validation loss: 5.766571965318588


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 5.6696
lstm validation perplexity: 283.78338368599407
lstm validation loss: 5.648211213631995


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 5.5228
lstm validation perplexity: 263.74378241444987
lstm validation loss: 5.574978110728334


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 5.4075
lstm validation perplexity: 249.27439755024824
lstm validation loss: 5.518554287904187


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 5.3067
lstm validation perplexity: 242.7021410723618
lstm validation loss: 5.491834934582237


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 5.2185
lstm validation perplexity: 233.75346214238
lstm validation loss: 5.4542669793634095


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 5.1371
lstm validation perplexity: 224.6389552038905
lstm validation loss: 5.414494469846798


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 5.0629
lstm validation perplexity: 224.59335019386864
lstm validation loss: 5.414291434536868


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 4.9926
lstm validation perplexity: 215.71627478604464
lstm validation loss: 5.373964001573603


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 4.9300
lstm validation perplexity: 212.02633206224917
lstm validation loss: 5.356710474799655


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 4.8676
lstm validation perplexity: 210.75663044447887
lstm validation loss: 5.350704057570839


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 4.8109
lstm validation perplexity: 207.25278690454354
lstm validation loss: 5.3339392409915485


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 15/20, Loss: 4.7544
lstm validation perplexity: 206.58982984594573
lstm validation loss: 5.330735329171588


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 16/20, Loss: 4.7020
lstm validation perplexity: 206.88412927454223
lstm validation loss: 5.332158874581422
regularized_full lstm validation perplexity for <__main__.LSTMModel object at 0x7d9fa6586530>, default, {'model_type': 'regularized_full', 'regularizations': ['embedding_dropout'], 'lr': 0.001, 'embed_dim': 512, 'n_layer': 3, 'hidden_dim': 2048, 'dropout_rate': 0.9} = 205.6850061898235
using regularized lstm
regularized_full lstm for <__main__.LSTMModel object at 0x7d9fa645f4c0>, default, {'model_type': 'regularized_full', 'regularizations': ['weight_drop_lstm'], 'lr': 0.001, 'embed_dim': 512, 'n_layer': 3, 'hidden_dim': 2048, 'dropout_rate': 0.7}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 6.7191
lstm validation perplexity: 399.696160067733
lstm validation loss: 5.9907046586351855


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 5.8553
lstm validation perplexity: 281.46993404164033
lstm validation loss: 5.640025635629159


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 5.4799
lstm validation perplexity: 233.1849108583602
lstm validation loss: 5.451831747658884


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 5.2209
lstm validation perplexity: 203.4238717892357
lstm validation loss: 5.315291840490093


  0%|          | 0/1014 [00:00<?, ?it/s]

In [55]:
param_grid = {
    'model_type': ['regularized_full'],
    'regularizations': [['embedding_dropout'], ['weight_drop_lstm'], ['embedding_dropout', 'weight_drop_lstm']],
    'lr': [1e-3],
    'embed_dim': [256],
    'n_layer': [3],
    'hidden_dim': [1024],
    'dropout_rate': [0.3, 0.5, 0.7]
}
param_list = [
    {
        'model_type': 'regularized_full',
        'regularizations': ['embedding_dropout'],
        'lr': 1e-3,
        'embed_dim': 512,
        'n_layer': 3,
        'hidden_dim': 2048,
        'dropout_rate': 0.7,
    },
    {
        'model_type': 'regularized_full',
        'regularizations': ['weight_drop_lstm'],
        'lr': 1e-3,
        'embed_dim': 512,
        'n_layer': 3,
        'hidden_dim': 2048,
        'dropout_rate': 0.7,
    },
    {
        'model_type': 'regularized_full',
        'regularizations': ['embedding_dropout', 'weight_drop_lstm'],
        'lr': 1e-3,
        'embed_dim': 512,
        'n_layer': 3,
        'hidden_dim': 2048,
        'dropout_rate': 0.7,
    }
]
optimize_parameters(param_grid=param_grid)

using regularized lstm
using embedding dropout
regularized_full lstm for <__main__.LSTMModel object at 0x7edfcc8e9090>, default, {'model_type': 'regularized_full', 'regularizations': ['embedding_dropout'], 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.3}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 6.8819
lstm validation perplexity: 470.7972051488268
lstm validation loss: 6.154427438981132


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 6.0752
lstm validation perplexity: 339.0552114399707
lstm validation loss: 5.826162959723921


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 5.7124
lstm validation perplexity: 275.14690499229033
lstm validation loss: 5.617305155004526


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 5.4393
lstm validation perplexity: 245.00767991474925
lstm validation loss: 5.501289556644246


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 5.2171
lstm validation perplexity: 223.0287005394577
lstm validation loss: 5.407300465149393


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 5.0457
lstm validation perplexity: 215.65134335605723
lstm validation loss: 5.373662952409285


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 4.8918
lstm validation perplexity: 209.3321052877582
lstm validation loss: 5.343922011323127


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 4.7638
lstm validation perplexity: 214.71014900076287
lstm validation loss: 5.369288974381222


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 4.6542
lstm validation perplexity: 212.1333579151855
lstm validation loss: 5.357215123674768


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 4.5551
lstm validation perplexity: 217.67417649471733
lstm validation loss: 5.382999341520282


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 4.4620
lstm validation perplexity: 216.59274769969898
lstm validation loss: 5.37801885154881
regularized_full lstm validation perplexity for <__main__.LSTMModel object at 0x7edfcc8e9090>, default, {'model_type': 'regularized_full', 'regularizations': ['embedding_dropout'], 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.3} = 218.5019249778088
using regularized lstm
using embedding dropout
regularized_full lstm for <__main__.LSTMModel object at 0x7edfcc1e0cd0>, default, {'model_type': 'regularized_full', 'regularizations': ['embedding_dropout'], 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.5}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 6.8916
lstm validation perplexity: 498.506640342074
lstm validation loss: 6.21161690995926


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 6.1269
lstm validation perplexity: 356.68071986533596
lstm validation loss: 5.87684103950322


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 5.7850
lstm validation perplexity: 288.5943677761036
lstm validation loss: 5.6650221304740205


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 5.5316
lstm validation perplexity: 251.05771179232121
lstm validation loss: 5.525682840161964


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 5.3389
lstm validation perplexity: 229.03006470004198
lstm validation loss: 5.433853281836615


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 5.1925
lstm validation perplexity: 213.2586320023812
lstm validation loss: 5.36250566387557


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 5.0654
lstm validation perplexity: 204.01897066183903
lstm validation loss: 5.318212982960982


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 4.9577
lstm validation perplexity: 199.91769849528006
lstm validation loss: 5.297905774331981


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 4.8654
lstm validation perplexity: 193.28618444910137
lstm validation loss: 5.264171911591237


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 4.7848
lstm validation perplexity: 196.12710343706735
lstm validation loss: 5.278762935977545


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 4.7069
lstm validation perplexity: 191.1492350819702
lstm validation loss: 5.253054458470644


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 4.6375
lstm validation perplexity: 190.75905895330368
lstm validation loss: 5.251011160273086


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 4.5750
lstm validation perplexity: 191.45760554517034
lstm validation loss: 5.25466640314715


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 4.5114
lstm validation perplexity: 187.84764864283633
lstm validation loss: 5.235631254735549


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 15/20, Loss: 4.4546
lstm validation perplexity: 193.211430630676
lstm validation loss: 5.263785084783304


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 16/20, Loss: 4.4042
lstm validation perplexity: 191.63618339055594
lstm validation loss: 5.255598696300825
regularized_full lstm validation perplexity for <__main__.LSTMModel object at 0x7edfcc1e0cd0>, default, {'model_type': 'regularized_full', 'regularizations': ['embedding_dropout'], 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.5} = 192.14258538814894
using regularized lstm
using embedding dropout
regularized_full lstm for <__main__.LSTMModel object at 0x7edfcc3ebf10>, default, {'model_type': 'regularized_full', 'regularizations': ['embedding_dropout'], 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.7}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 6.9335
lstm validation perplexity: 518.618489021346
lstm validation loss: 6.251168524264939


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 6.2228
lstm validation perplexity: 388.7778117133479
lstm validation loss: 5.963008002319299


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 5.9217
lstm validation perplexity: 325.27658316247334
lstm validation loss: 5.7846758455284855


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 5.7001
lstm validation perplexity: 282.6944838083423
lstm validation loss: 5.644366751981201


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 5.5326
lstm validation perplexity: 255.0647884063674
lstm validation loss: 5.541517585069388


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 5.3903
lstm validation perplexity: 236.40812754223916
lstm validation loss: 5.465559665407273


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 5.2820
lstm validation perplexity: 221.23874396916082
lstm validation loss: 5.399242407878219


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 5.1877
lstm validation perplexity: 214.590409403282
lstm validation loss: 5.368731138626099


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 5.1012
lstm validation perplexity: 210.12565466443283
lstm validation loss: 5.3477057073190855


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 5.0265
lstm validation perplexity: 204.91249282315087
lstm validation loss: 5.322583023728605


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 4.9549
lstm validation perplexity: 201.86104998987412
lstm validation loss: 5.307579589371911


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 4.8889
lstm validation perplexity: 194.10981722529115
lstm validation loss: 5.268424067078781


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 4.8294
lstm validation perplexity: 191.66745964996005
lstm validation loss: 5.2557618894248215


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 4.7778
lstm validation perplexity: 192.04561626226598
lstm validation loss: 5.257732928508308


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 15/20, Loss: 4.7199
lstm validation perplexity: 191.29506979687145
lstm validation loss: 5.253817104012017


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 16/20, Loss: 4.6721
lstm validation perplexity: 193.71232223757147
lstm validation loss: 5.266374183451918


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 17/20, Loss: 4.6226
lstm validation perplexity: 190.4753057402691
lstm validation loss: 5.24952255750423


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 18/20, Loss: 4.5762
lstm validation perplexity: 192.62681522159468
lstm validation loss: 5.260754717199225


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 19/20, Loss: 4.5272
lstm validation perplexity: 194.3678043070976
lstm validation loss: 5.269752262606313
regularized_full lstm validation perplexity for <__main__.LSTMModel object at 0x7edfcc3ebf10>, default, {'model_type': 'regularized_full', 'regularizations': ['embedding_dropout'], 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.7} = 190.44887223588924
using regularized lstm
using weight drop
regularized_full lstm for <__main__.LSTMModel object at 0x7edfcc1e3490>, default, {'model_type': 'regularized_full', 'regularizations': ['weight_drop_lstm'], 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.3}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 7.0076
lstm validation perplexity: 512.976617343104
lstm validation loss: 6.240230263904092


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 6.1583
lstm validation perplexity: 349.4609061586588
lstm validation loss: 5.856391698932938


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 5.7599
lstm validation perplexity: 272.4302572434457
lstm validation loss: 5.6073826446246064


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 5.4574
lstm validation perplexity: 231.97685682857554
lstm validation loss: 5.446637611641201


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 5.2219
lstm validation perplexity: 207.72162040562094
lstm validation loss: 5.336198819858559


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 5.0329
lstm validation perplexity: 194.9338504590905
lstm validation loss: 5.272660272597852


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 4.8775
lstm validation perplexity: 186.63567804629727
lstm validation loss: 5.229158470813558


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 4.7466
lstm validation perplexity: 183.2928624642779
lstm validation loss: 5.2110852149946485


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 4.6340
lstm validation perplexity: 181.37388248707276
lstm validation loss: 5.200560549846006


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 4.5348
lstm validation perplexity: 181.21443772162937
lstm validation loss: 5.199681068810164


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 4.4473
lstm validation perplexity: 182.50249861335982
lstm validation loss: 5.2067638639609335


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 4.3675
lstm validation perplexity: 182.18686715463585
lstm validation loss: 5.205032902945013


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 4.2925
lstm validation perplexity: 183.0488388903492
lstm validation loss: 5.209752996417743


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 4.2232
lstm validation perplexity: 185.87598201808382
lstm validation loss: 5.22507968798403
regularized_full lstm validation perplexity for <__main__.LSTMModel object at 0x7edfcc1e3490>, default, {'model_type': 'regularized_full', 'regularizations': ['weight_drop_lstm'], 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.3} = 186.02901173671123
using regularized lstm
using weight drop
regularized_full lstm for <__main__.LSTMModel object at 0x7edff3cc7910>, default, {'model_type': 'regularized_full', 'regularizations': ['weight_drop_lstm'], 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.5}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 7.0571
lstm validation perplexity: 547.3156393460171
lstm validation loss: 6.30502567319568


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 6.2460
lstm validation perplexity: 379.4589724746219
lstm validation loss: 5.938746481570154


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 5.8736
lstm validation perplexity: 295.68407588830706
lstm validation loss: 5.689291573153721


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 5.5878
lstm validation perplexity: 248.6133422784046
lstm validation loss: 5.515898847300181


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 5.3677
lstm validation perplexity: 221.0497695783979
lstm validation loss: 5.398387877875861


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 5.1957
lstm validation perplexity: 202.72878555650547
lstm validation loss: 5.3118690540143


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 5.0561
lstm validation perplexity: 190.50039548579628
lstm validation loss: 5.249654270605526


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 4.9403
lstm validation perplexity: 181.84956853881613
lstm validation loss: 5.203179798808881


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 4.8423
lstm validation perplexity: 175.72145503226236
lstm validation loss: 5.168900099475427


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 4.7567
lstm validation perplexity: 170.87341851387532
lstm validation loss: 5.140923039778901


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 4.6801
lstm validation perplexity: 167.47374865836906
lstm validation loss: 5.120826614554917


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 4.6107
lstm validation perplexity: 164.48910811703786
lstm validation loss: 5.102844355958831


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 4.5473
lstm validation perplexity: 161.22415564490336
lstm validation loss: 5.082795667760093


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 4.4886
lstm validation perplexity: 159.37405458148845
lstm validation loss: 5.0712539838573765


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 15/20, Loss: 4.4332
lstm validation perplexity: 158.20264287325944
lstm validation loss: 5.063876761093249


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 16/20, Loss: 4.3817
lstm validation perplexity: 156.73297017190626
lstm validation loss: 5.054543530369498


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 17/20, Loss: 4.3336
lstm validation perplexity: 155.35872637863528
lstm validation loss: 5.0457368066376


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 18/20, Loss: 4.2876
lstm validation perplexity: 155.00035355969445
lstm validation loss: 5.043427397946932


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 19/20, Loss: 4.2458
lstm validation perplexity: 154.71199515531623
lstm validation loss: 5.041565292745789


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 20/20, Loss: 4.2043
lstm validation perplexity: 153.72141013459824
lstm validation loss: 5.035141939051642


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['weight_drop_lstm']}.npy
regularized_full lstm validation perplexity for <__main__.LSTMModel object at 0x7edff3cc7910>, default, {'model_type': 'regularized_full', 'regularizations': ['weight_drop_lstm'], 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.5} = 154.09701545185945
using regularized lstm
using weight drop
regularized_full lstm for <__main__.LSTMModel object at 0x7edfcc1e0cd0>, default, {'model_type': 'regularized_full', 'regularizations': ['weight_drop_lstm'], 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.7}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 7.1279
lstm validation perplexity: 599.9460667477093
lstm validation loss: 6.3968397624220925


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 6.3632
lstm validation perplexity: 423.37871303722505
lstm validation loss: 6.048267081190356


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 6.0198
lstm validation perplexity: 331.6335783089097
lstm validation loss: 5.804030679621


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 5.7532
lstm validation perplexity: 277.94868853888687
lstm validation loss: 5.627436523053707


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 5.5458
lstm validation perplexity: 245.7716699672182
lstm validation loss: 5.504402934050093


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 5.3839
lstm validation perplexity: 225.16840939648088
lstm validation loss: 5.416848608434898


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 5.2525
lstm validation perplexity: 209.35804588110136
lstm validation loss: 5.344045924404455


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 5.1436
lstm validation perplexity: 198.61359404117061
lstm validation loss: 5.291361198619835


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 5.0525
lstm validation perplexity: 189.56574917379584
lstm validation loss: 5.244735925684607


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 4.9736
lstm validation perplexity: 182.707336708533
lstm validation loss: 5.2078856197125125


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 4.9022
lstm validation perplexity: 177.9641549041918
lstm validation loss: 5.181582153070283


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 4.8386
lstm validation perplexity: 173.64975353723915
lstm validation loss: 5.157040359881498


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 4.7803
lstm validation perplexity: 170.14705167254735
lstm validation loss: 5.136663072983371


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 4.7266
lstm validation perplexity: 167.21379756482804
lstm validation loss: 5.119273218566646


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 15/20, Loss: 4.6759
lstm validation perplexity: 164.96242612939236
lstm validation loss: 5.105717727540422


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 16/20, Loss: 4.6298
lstm validation perplexity: 162.72903940071942
lstm validation loss: 5.092086482629963


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 17/20, Loss: 4.5852
lstm validation perplexity: 161.75387374552395
lstm validation loss: 5.0860758820619285


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 18/20, Loss: 4.5449
lstm validation perplexity: 160.6654086124539
lstm validation loss: 5.079323995138892


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 19/20, Loss: 4.5055
lstm validation perplexity: 159.05970180728315
lstm validation loss: 5.069279614808781


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 20/20, Loss: 4.4685
lstm validation perplexity: 157.79836683934653
lstm validation loss: 5.061318058797675


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['weight_drop_lstm']}.npy
regularized_full lstm validation perplexity for <__main__.LSTMModel object at 0x7edfcc1e0cd0>, default, {'model_type': 'regularized_full', 'regularizations': ['weight_drop_lstm'], 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.7} = 158.23735144690946
using regularized lstm
using embedding dropout
using weight drop
regularized_full lstm for <__main__.LSTMModel object at 0x7edfcbbf7d00>, default, {'model_type': 'regularized_full', 'regularizations': ['embedding_dropout', 'weight_drop_lstm'], 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.3}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 7.0660
lstm validation perplexity: 559.7744221080758
lstm validation loss: 6.327533884912215


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 6.2922
lstm validation perplexity: 404.56141075768295
lstm validation loss: 6.00280354390645


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 5.9367
lstm validation perplexity: 322.0045239893587
lstm validation loss: 5.774565595102106


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 5.6661
lstm validation perplexity: 272.99609240359007
lstm validation loss: 5.609457481542557


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 5.4530
lstm validation perplexity: 247.84201496162646
lstm validation loss: 5.512791506725997


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 5.2904
lstm validation perplexity: 231.25550776710193
lstm validation loss: 5.443523193480157


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 5.1534
lstm validation perplexity: 220.25834359757587
lstm validation loss: 5.394801146494119


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 5.0391
lstm validation perplexity: 213.70650475666747
lstm validation loss: 5.364603600592164


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 4.9423
lstm validation perplexity: 210.738597734503
lstm validation loss: 5.35061849214195


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 4.8538
lstm validation perplexity: 202.69224529385386
lstm validation loss: 5.311688795665719


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 4.7781
lstm validation perplexity: 198.41296480888093
lstm validation loss: 5.290350539534718


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 4.7092
lstm validation perplexity: 197.75440902812903
lstm validation loss: 5.287025902369523


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 4.6419
lstm validation perplexity: 195.63899823057295
lstm validation loss: 5.27627111518459


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 4.5845
lstm validation perplexity: 197.34974342134268
lstm validation loss: 5.284977501988136


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 15/20, Loss: 4.5319
lstm validation perplexity: 195.2944344468594
lstm validation loss: 5.274508340020656


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 16/20, Loss: 4.4755
lstm validation perplexity: 194.83003868784826
lstm validation loss: 5.272127582030112


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.3, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 17/20, Loss: 4.4289
lstm validation perplexity: 196.03595103834252
lstm validation loss: 5.278298066075466


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 18/20, Loss: 4.3842
lstm validation perplexity: 196.27182363722793
lstm validation loss: 5.279500553726916


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 19/20, Loss: 4.3424
lstm validation perplexity: 199.81856074721048
lstm validation loss: 5.297409758532513
regularized_full lstm validation perplexity for <__main__.LSTMModel object at 0x7edfcbbf7d00>, default, {'model_type': 'regularized_full', 'regularizations': ['embedding_dropout', 'weight_drop_lstm'], 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.3} = 199.22287092737125
using regularized lstm
using embedding dropout
using weight drop
regularized_full lstm for <__main__.LSTMModel object at 0x7edfcb7df700>, default, {'model_type': 'regularized_full', 'regularizations': ['embedding_dropout', 'weight_drop_lstm'], 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.5}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 7.1290
lstm validation perplexity: 615.8161290870714
lstm validation loss: 6.422948427234226


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 6.3945
lstm validation perplexity: 450.47104692700486
lstm validation loss: 6.110293806230581


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 6.0668
lstm validation perplexity: 354.35111307768756
lstm validation loss: 5.870288266432897


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 5.8030
lstm validation perplexity: 302.49406307164276
lstm validation loss: 5.712061651068854


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 5.5967
lstm validation perplexity: 262.2978572111508
lstm validation loss: 5.569480717610145


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 5.4396
lstm validation perplexity: 244.44662842794432
lstm validation loss: 5.49899699644823


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 5.3121
lstm validation perplexity: 231.0009376733103
lstm validation loss: 5.4424217697053745


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 5.2055
lstm validation perplexity: 218.18128374824786
lstm validation loss: 5.385326294047469


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 5.1181
lstm validation perplexity: 207.34485297863014
lstm validation loss: 5.334383363517793


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 5.0362
lstm validation perplexity: 203.38399347202855
lstm validation loss: 5.315095785690074


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 4.9731
lstm validation perplexity: 197.04539156132307
lstm validation loss: 5.283434116213624


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 4.9049
lstm validation perplexity: 190.10709263177185
lstm validation loss: 5.247587558802443


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 4.8438
lstm validation perplexity: 189.21976429297794
lstm validation loss: 5.242909113598416


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 4.7931
lstm validation perplexity: 187.24857542455257
lstm validation loss: 5.23243701448883


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 15/20, Loss: 4.7415
lstm validation perplexity: 183.30287179488303
lstm validation loss: 5.211139821907235


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 16/20, Loss: 4.6965
lstm validation perplexity: 181.53227371027685
lstm validation loss: 5.201433454482958


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 17/20, Loss: 4.6541
lstm validation perplexity: 178.56069986586215
lstm validation loss: 5.184928598684941


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 18/20, Loss: 4.6098
lstm validation perplexity: 178.54793111691885
lstm validation loss: 5.184857086837615


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 19/20, Loss: 4.5700
lstm validation perplexity: 176.50076328659304
lstm validation loss: 5.173325200933933


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.5, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 20/20, Loss: 4.5375
lstm validation perplexity: 177.90545799131368
lstm validation loss: 5.181252274309729
regularized_full lstm validation perplexity for <__main__.LSTMModel object at 0x7edfcb7df700>, default, {'model_type': 'regularized_full', 'regularizations': ['embedding_dropout', 'weight_drop_lstm'], 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.5} = 176.70256869780974
using regularized lstm
using embedding dropout
using weight drop
regularized_full lstm for <__main__.LSTMModel object at 0x7edfcb707eb0>, default, {'model_type': 'regularized_full', 'regularizations': ['embedding_dropout', 'weight_drop_lstm'], 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.7}


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 1/20, Loss: 7.2250
lstm validation perplexity: 673.3751460291525
lstm validation loss: 6.5123025978622335


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 2/20, Loss: 6.5334
lstm validation perplexity: 502.99405882283395
lstm validation loss: 6.2205783585445635


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 3/20, Loss: 6.2483
lstm validation perplexity: 410.7229923861293
lstm validation loss: 6.017919002804629


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 4/20, Loss: 6.0052
lstm validation perplexity: 340.83571781041337
lstm validation loss: 5.831400595245148


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 5/20, Loss: 5.8107
lstm validation perplexity: 307.3125516652677
lstm validation loss: 5.727865313291787


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 6/20, Loss: 5.6580
lstm validation perplexity: 279.8271698356214
lstm validation loss: 5.634172162004613


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 7/20, Loss: 5.5274
lstm validation perplexity: 260.3973792375042
lstm validation loss: 5.562208845911664


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 8/20, Loss: 5.4204
lstm validation perplexity: 243.0069003090488
lstm validation loss: 5.493089839270914


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 9/20, Loss: 5.3326
lstm validation perplexity: 233.73255576235442
lstm validation loss: 5.4541775376293735


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 10/20, Loss: 5.2605
lstm validation perplexity: 224.00776159989184
lstm validation loss: 5.41168070125426


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 11/20, Loss: 5.1873
lstm validation perplexity: 216.98299284968022
lstm validation loss: 5.379818976504453


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 12/20, Loss: 5.1269
lstm validation perplexity: 210.72143333740289
lstm validation loss: 5.350537040066573


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 13/20, Loss: 5.0669
lstm validation perplexity: 205.5673262927519
lstm validation loss: 5.3257736021425055


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 14/20, Loss: 5.0156
lstm validation perplexity: 202.93005872194627
lstm validation loss: 5.31286138135929


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 15/20, Loss: 4.9654
lstm validation perplexity: 197.89806930431
lstm validation loss: 5.2877520966453675


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 16/20, Loss: 4.9243
lstm validation perplexity: 197.00244727409853
lstm validation loss: 5.283216151371987


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 17/20, Loss: 4.8807
lstm validation perplexity: 193.63996798635574
lstm validation loss: 5.266000599745905


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 18/20, Loss: 4.8431
lstm validation perplexity: 191.55468254523032
lstm validation loss: 5.255173316398538


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 19/20, Loss: 4.8090
lstm validation perplexity: 190.62143792380573
lstm validation loss: 5.250289460885738


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy


  0%|          | 0/1014 [00:00<?, ?it/s]

Epoch 20/20, Loss: 4.7708
lstm validation perplexity: 188.17727425362833
lstm validation loss: 5.23738446669121


  0%|          | 0/5000 [00:00<?, ?it/s]

saved lstm_regularized_full_embedding_dropout-weight_drop_lstm_predictions_updated_{'embedding_type': 'default', 'embed_dim': 256, 'hidden_dim': 1024, 'n_layer': 3, 'dropout_rate': 0.7, 'regularizations': ['embedding_dropout', 'weight_drop_lstm']}.npy
regularized_full lstm validation perplexity for <__main__.LSTMModel object at 0x7edfcb707eb0>, default, {'model_type': 'regularized_full', 'regularizations': ['embedding_dropout', 'weight_drop_lstm'], 'lr': 0.001, 'embed_dim': 256, 'n_layer': 3, 'hidden_dim': 1024, 'dropout_rate': 0.7} = 188.97226982517134


({'model_type': 'regularized_full',
  'regularizations': ['weight_drop_lstm'],
  'lr': 0.001,
  'embed_dim': 256,
  'n_layer': 3,
  'hidden_dim': 1024,
  'dropout_rate': 0.5},
 <__main__.LSTMModel at 0x7edff3cc7910>,
 154.09701545185945)

### Submission

Upload a submission with the following files to Gradescope:
* proj_1.ipynb (rename to match this exactly)
* lstm_predictions.npy (this should also include all improvements from your exploration)
* neural_trigram_predictions.npy
* bigram_predictions.npy
* report.pdf

You can upload files individually or as part of a zip file, but if using a zip file be sure you are zipping the files directly and not a folder that contains them.

Be sure to check the output of the autograder after it runs.  It should confirm that no files are missing and that the output files have the correct format.  Note that the test set perplexities shown by the autograder are on a completely different scale from your validation set perplexities due to truncating the distribution and selecting different text.  Don't worry if the values seem much worse.