# Language Modeling

A language model attempts to approximate the underlying statistics of a text corpus, $P(tok_n | tok_1, tok_2, ..., tok_{n-1}; \theta)$, where $\theta$ is a set of learned parameters (weights). For the purposes of this notebook, tokens will be words. Language models can be used for a variety of applications, one of which is text generation. In this assignment we will be looking at Recurrent Neural Networks.

**Tips:**
- Read all the code. We don't ask you to write the training loops, evaluation loops, and generation loops, but it is often instructive to see how the models are trained and evaluated.
- If you have a model that is learning (loss is decreasing), but you want to increase accuracy, try using ``nn.Dropout`` layers just before the final linear layer to force the model to handle missing or unfamiliar data.

In [1]:
# start time - notebook execution
import time
start_nb = time.time()

# Set up

Import packages

In [2]:
import numpy as np
import os
import re
import torch
import torch.nn as nn
import torch.nn.functional as F
import unicodedata

# ignore all warnings
import warnings
warnings.filterwarnings('ignore')

# Initialize the Autograder

In [3]:
# import the autograder tests
import hw3a_tests as ag

We will build a *vocabulary*, which will act as a dictionary of all the words our systems will know about. It will also allow us to map words to tokens, which will be unique indexes in the vocabulary. This will further allow us to transform words into one-hot vectors, where a word is represented as a vector of the same length as the vocabulary wherein all values are zeros except for the *i*th element, where *i* is the token number of the word.
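As a small illustration of the mapping described above, the sketch below builds a toy vocabulary from a made-up sentence and turns one word into a one-hot list. The names `toy_word2index` and `one_hot` are hypothetical, and plain Python lists stand in for tensors.

```python
# Toy vocabulary: indexes 0 and 1 are reserved for SOS and EOS, as in the notebook.
sentence = "the war began and the war ended"

toy_word2index = {}
next_index = 2
for word in sentence.split():
    if word not in toy_word2index:
        toy_word2index[word] = next_index
        next_index += 1

vocab_size = next_index  # unique words plus the two special tokens

def one_hot(token, size):
    """Return a list of zeros with a single 1 at position `token`."""
    vec = [0] * size
    vec[token] = 1
    return vec

token = toy_word2index["began"]
vector = one_hot(token, vocab_size)
print(token, vector)
```

Repeated words such as "the" and "war" keep the index they were given on first sight, so the vector length depends only on the number of unique words.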

In [4]:
SOS_token = 0    # A special token representing the start of a sequence
EOS_token = 1    # A special token representing the end of a sequence

class Vocab:
    def __init__(self, name):
        self.name = name                             # The name of the vocabulary
        self._word2index = {}                        # Map words to token index
        self._word2count = {}                        # Track how many times a word occurs in a corpus
        self._index2word = {0: "SOS", 1: "EOS"}      # Map token indexes back into words
        self._n_words = 2                            # Number of unique words, counting SOS and EOS

    # Get a list of all words
    def get_words(self):
      return list(self._word2count.keys())

    # Get the number of words
    def num_words(self):
      return self._n_words

    # Convert a word into a token index
    def word2index(self, word):
      return self._word2index[word]

    # Convert a token into a word
    def index2word(self, word):
      return self._index2word[word]

    # Get the number of times a word occurs
    def word2count(self, word):
      return self._word2count[word]

    # Add all the words in a sentence to the vocabulary
    def add_sentence(self, sentence):
        for word in sentence.split(' '):
            self.add_word(word)

    # Add a single word to the vocabulary
    def add_word(self, word):
        if word not in self._word2index:
            self._word2index[word] = self._n_words
            self._word2count[word] = 1
            self._index2word[self._n_words] = word
            self._n_words += 1
        else:
            self._word2count[word] += 1

These are some helper functions to *normalize* text, i.e., make the text regular and remove some of the more problematic exceptions found in texts. This normalizer strips accents, makes all words lowercase, trims surrounding whitespace, separates sentence punctuation (`.`, `!`, `?`) from words, and replaces all other non-letter characters with spaces.

In [5]:
# Convert any unicode to ascii
def unicode_to_ascii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )

# Lowercase, trim, and remove non-letter characters
def normalize_string(s):
    s = unicode_to_ascii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
    return s
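
To see what the normalizer does in practice, here is a quick check on a made-up sentence (the sample string is an assumption, not taken from the corpus). The two helpers are repeated so the snippet runs on its own.

```python
import re
import unicodedata

def unicode_to_ascii(s):
    # Decompose accented characters and drop the combining marks
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )

def normalize_string(s):
    s = unicode_to_ascii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)      # put a space before . ! ?
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)  # replace every other run of characters with one space
    return s

sample = "Eärendil's ship was full-shaped!"
normalized = normalize_string(sample)
print(normalized)  # earendil s ship was full shaped !
```

Note that apostrophes and hyphens become spaces, so `'s` turns into a separate `s` token and `full-shaped` becomes two words.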

Download a corpus. This corpus is the ASCII text of the book *The Silmarillion*, by J.R.R. Tolkien. It has a lot of uncommon words and names to illustrate how language models deal with such things.

In [6]:
if os.path.isfile('data.txt'):
  print("data.txt already downloaded")
else:
  print("downloading data.txt")
  !wget -O data.txt https://www.dropbox.com/s/pgvn1n7t4sjxt8r/silmarillion?dl=1

data.txt already downloaded


Let's read in the data and take a look at it.

In [7]:
filename = 'data.txt'
with open(filename, encoding='utf-8') as f:
  text = f.read()
text[:1000]

'The Silmarillon Chapter 1\n\n\nOf the Beginning of Days It is told among the wise that the First War began before Arda was full-shaped, and ere yet there was any thing that grew or walked upon earth; and for long Melkor had the upper hand. But in the midst of the war a spirit of great strength and hardihood came to the aid of the Valar, hearing in the far heaven that there was battle in the Little Kingdom; and Arda was filled with the sound of his laughter. So came Tulkas the Strong, whose anger passes like a mighty wind, scattering cloud and darkness before it; and Melkor fled before his wrath and his laughter, and forsook Arda, and there was peace for a long age. And Tulkas remained and became one of the Valar of the Kingdom of Arda; but Melkor brooded in the outer darkness, and his hate was given to Tulkas for ever after.\n\nIn that time the Valar brought order to the seas and the lands and the mountains, and Yavanna planted at last the seeds that she had long devised. And since, w

Normalize the text and build the vocabulary

In [8]:
normalized_text = normalize_string(text)
VOCAB = Vocab("text")
VOCAB.add_sentence(normalized_text)

Make training and testing data splits.

In [9]:
# Convert every word into a token and build a numpy array of tokens
encoded_text = np.array([VOCAB.word2index(word) for word in normalized_text.split()])
print("The first 100 tokens")
print(encoded_text[:100])
# Split the tokens into training and test data
test_split = 0.1
test_idx = int(len(encoded_text) * (1 - test_split))
TRAIN = encoded_text[:test_idx]
TEST = encoded_text[test_idx:]
# Decrease the size of the training set to make the assignment more tractable
TRAIN = TRAIN[:len(TRAIN)//10]

The first 100 tokens
[ 2  3  4  5  2  6  5  7  8  9 10 11  2 12 13  2 14 15 16 17 18 19 20 21
 22 23 24 25 19 26 27 13 28 29 30 31 32 22 33 34 35 36  2 37 38 39 40 41
  2 42  5  2 15 43 44  5 45 46 22 47 48 49  2 50  5  2 51 52 41  2 53 54
 13 25 19 55 41  2 56 57 22 18 19 58 59  2 60  5 61 62 39 63 48 64  2 65
 66 67 68 69]


# RNN

**Complete the code for an RNN.** An RNN takes a one-hot vector for a single word and a hidden state vector (the first one comes from `init_hidden()`), compresses them to a new hidden state, and then decompresses it. The decompressed word is tested against the **next** word in the training sequence using cross-entropy loss. The RNN also produces the hidden state vector to pass in with the next word in the training sequence. The neural network must guess the next word as well as learn to create a hidden state vector that helps the next iteration make a better word guess.

The neural network's forward function should take two inputs:
- The input word (`x`) represented as a one-hot vector of size `1 x vocab_size`
- The hidden state, a `1 x hidden_size` vector.

A brief note on batching: We will not be using batching in this assignment. But there must always be a batching dimension in our input and output tensors. Thus we will have a batching dimension size of 1 and our tensors will often be of a shape `1 x something`.

The neural network architecture should concatenate the `x` and the `hidden_state` to make one long vector. The neural network gets to learn, through its weights, whether to draw from the one-hot (or certain parts of the one-hot) or the hidden state when trying to predict the next token.

The neural network should have two affine transformations (`nn.Linear` modules). The first should transform a tensor of size `1 x (vocab_size + hidden_size)` into a tensor of size `1 x hidden_size`, followed by a sigmoid activation. This compresses the information from the input and forces the neural network to make compromises about what is important. Think of the sigmoid as a gate that says yes or no to different combinations of inputs. The second affine transform should be from a tensor of `1 x hidden_size` to one of `1 x vocab_size`. The values in the resultant tensor are the raw scores for each possible token in the vocabulary. The forward function should run these scores through a log softmax.

The output of the forward function should be two values: a tensor of log-scale scores for each token, and a new hidden state.
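
Written as equations, the architecture described above amounts to the following. Here $[x_t; h_{t-1}]$ denotes concatenation, and the names $W_h, b_h, W_o, b_o$ are notation introduced here for the weights and biases of the two `nn.Linear` layers, not names from the assignment:

$$
h_t = \sigma\left(W_h\,[x_t;\,h_{t-1}] + b_h\right), \qquad
o_t = \log \operatorname{softmax}\left(W_o\,h_t + b_o\right)
$$

In this notation $x_t$ is the `1 x vocab_size` one-hot input, $h_t$ is the `1 x hidden_size` hidden state, and $o_t$ holds one log-scale score per vocabulary token. The equations are written up to the row-vector convention that `nn.Linear` uses internally.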

## RNN---Model (30 Points)

In [10]:
class MyRNN(nn.Module):
  def __init__(self, input_size, hidden_size, output_size=None):
    super(MyRNN, self).__init__()
    # If output_size is not given, use input_size
    if output_size is None:
      output_size = input_size
    self.input_size = input_size      # the size of the input vocabulary
    self.hidden_size = hidden_size    # the size of the hidden state
    self.output_size = output_size    # the size of the output vocabulary (if different)
    
    # Create 2 affine transformation as instructed.
    # First linear layer (vocab+hidden) to (hidden)
    self.enc_fc = nn.Linear(input_size + hidden_size, hidden_size)
    # Second linear layer (hidden) to output
    self.dec_fc = nn.Linear(hidden_size, output_size)
    # Activation
    self.enc_act = nn.Sigmoid()
    self.dec_act = nn.LogSoftmax(dim = 1)



  def forward(self, x, hidden_state):
    output = None
    hidden = None

    # Concatenate input
    input_state = torch.cat((x,hidden_state), dim = 1)      # (1, vocab + hidden)
    # Collect new hidden state
    hidden = self.enc_act(self.enc_fc(input_state))  # (1, hidden size)
    # Decoder
    output = self.dec_act(self.dec_fc(hidden))       # (1, vocab size)


    return output, hidden

  # Make an initial hidden state with some randomness to the values
  def init_hidden(self):
    return nn.init.kaiming_uniform_(torch.empty(1, self.hidden_size))

Construct the RNN

In [11]:
# it's ok to modify this cell
RNN_HIDDEN_SIZE = 256
RNN_LEARNING_RATE = 0.0005
RNN_NUM_EPOCHS = 4

In [12]:
# Create the network
rnn = MyRNN(VOCAB.num_words(), RNN_HIDDEN_SIZE)

# Create the loss function and optimizer
criterion_rnn = nn.NLLLoss()
optimizer_rnn = torch.optim.Adam(rnn.parameters(), lr=RNN_LEARNING_RATE)

In [13]:
# student check - the following test must return a value of 10 to receive full credit (no partial credit)
ag.unit_test_RNN_structure()

NllLossBackward0
LogSoftmaxBackward0
AddmmBackward0
AccumulateGrad
SigmoidBackward0
AddmmBackward0
AccumulateGrad
TBackward0
AccumulateGrad
TBackward0
AccumulateGrad
NllLossBackward0
LogSoftmaxBackward0
AddmmBackward0
AccumulateGrad
SigmoidBackward0
AddmmBackward0
AccumulateGrad
TBackward0
AccumulateGrad
TBackward0
AccumulateGrad
NllLossBackward0
LogSoftmaxBackward0
AddmmBackward0
AccumulateGrad
SigmoidBackward0
AddmmBackward0
AccumulateGrad
TBackward0
AccumulateGrad
TBackward0
AccumulateGrad
Test A: 10/10


Complete the following functions.

`token2onehot()` takes a token (a number) and converts it to a one-hot tensor of the shape `1 x vocab_size`. All values should be zeros except for one element, which should be a `1`.

`get_rnn_x_y()` should return the `x` and `y` for a recurrent neural network.

- The `x` return value should be a tensor containing a one-hot vector representing the word at position `index` and have a shape of `1 x vocab_size` (the batch size is 1).
- The `y` return value should be the token of the word at position `index+1` and be a vector with a single value in it. That is, it should not be a scalar but a vector of length 1.


In [14]:
def token2onehot(token, vocab_size = VOCAB.num_words()):
  one_hot = None
  
  # print(token.shape)
  one_hot = torch.zeros(1, vocab_size)
  one_hot[0, token] = 1.0

  
  return one_hot

def get_rnn_x_y(data, index, vocab_size = VOCAB.num_words()):
  x = None
  y = None
  
  x = data[index]
  y = data[index + 1]
  x = token2onehot(x, vocab_size)
  y = torch.tensor([y], dtype=torch.long)

  
  return x, y

In [15]:
# student check - the following test must return a value of 10 to receive full credit (no partial credit)
ag.unit_test_token2onehot()

Test B: 10/10


In [16]:
# student check - the following test must return a value of 10 to receive full credit (no partial credit)
ag.unit_test_get_xy()

Test C: 10/10


The following is the training loop. You can see how your `get_rnn_x_y()` is used.

In [17]:
def train_rnn(model, optimizer, criterion, data, num_epochs):
  model.train()
  for epoch in range(num_epochs):
    hidden_state = model.init_hidden()
    losses = []
    for i in range(len(data)-2):
      x, y = get_rnn_x_y(data, i)
      x = x.float()
      output, new_hidden = model(x, hidden_state)
      hidden_state = new_hidden.detach()
      loss = criterion(output, y)
      optimizer.zero_grad()
      loss.backward()
      losses.append(loss.item())
      nn.utils.clip_grad_norm_(model.parameters(), 1)
      optimizer.step()
      if i%100 == 0:
        print('iter', i, 'loss', np.array(losses).mean())
    print('epoch', epoch, 'loss', np.array(losses).mean())

In [18]:
train_rnn(rnn, optimizer_rnn, criterion_rnn, TRAIN, RNN_NUM_EPOCHS)

iter 0 loss 8.599178314208984
iter 100 loss 7.770304787276995
iter 200 loss 7.115117769336226
iter 300 loss 6.9720861824643965
iter 400 loss 6.904005127951986
iter 500 loss 6.865162016388899
iter 600 loss 6.85215837090662
iter 700 loss 6.907519641684398
iter 800 loss 6.863512961159038
iter 900 loss 6.882670731047018
iter 1000 loss 6.838692603649555
iter 1100 loss 6.759765257302682
iter 1200 loss 6.769200443824463
iter 1300 loss 6.773051324392813
iter 1400 loss 6.804413663924038
iter 1500 loss 6.808048483056279
iter 1600 loss 6.824075170861863
iter 1700 loss 6.8038947246552635
iter 1800 loss 6.797307993755415
iter 1900 loss 6.835429075829046
iter 2000 loss 6.8563809046919255
iter 2100 loss 6.859360862379696
iter 2200 loss 6.825062740991896
iter 2300 loss 6.800405086666954
iter 2400 loss 6.797861463325513
iter 2500 loss 6.792690817283088
iter 2600 loss 6.775830049094949
iter 2700 loss 6.763777700542124
iter 2800 loss 6.75262380612403
iter 2900 loss 6.752545065240422
iter 3000 loss 6.76655007

## RNN---Test (20 Points)

Even if loss went down, we can't make any guarantees about what the network will do on unseen sequences. To evaluate, we will measure **perplexity**, how much the network is confused by data. As you adjust hyperparameters and retrain the model you may notice that a model with a lower loss on training data doesn't necessarily produce a model with lower perplexity on test data.
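
One common definition of perplexity is the exponential of the average negative log-likelihood per token. The sketch below applies that definition to made-up per-token losses. Whether `ag.evaluate_rnn()` computes it exactly this way is an assumption, and the `perplexity` helper is hypothetical.

```python
import math

def perplexity(nll_losses):
    """Exponentiate the mean negative log-likelihood (natural log) per token."""
    return math.exp(sum(nll_losses) / len(nll_losses))

# A model that assigns probability 1/4 to every correct token is exactly
# as confused as a uniform guess among 4 options.
uniform_losses = [-math.log(0.25)] * 5
print(perplexity(uniform_losses))
```

Under this definition, lower is better, and a perplexity equal to the vocabulary size would mean the model does no better than guessing uniformly over all words.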

In [19]:
# student check - the following test must return a value < 2000 to receive full credit (no partial credit)
ag.evaluate_rnn(rnn, TEST, criterion_rnn)

Perplexity:  tensor(1427.0468)
Test D: 20/20


## RNN---Generate (10 Points)

Let's use the RNN to generate some text. This is going to take a bit of set up. We need to take an input prompt---the start of the text---and tokenize it. Then we need a hidden state that represents the prompt, so we have to run the input prompt through the RNN to build up the hidden state. Then we can finally let the RNN loose to generate new text by feeding the outputs of the RNN (and the hidden state) back into the RNN as inputs.

In [20]:
# Example input prompt:
input_prompt = "the First War began"
# How long should the continuation be?
num_new_tokens = 10

# Normalize the input
normalized_input = normalize_string(input_prompt)
# Tokenize the input
tokenized_input = [VOCAB.word2index(w) for w in normalized_input.split()]
print("input prompt:", input_prompt)
print("input tokens:", tokenized_input, '\n')

# We need to make a hidden_state that is representative of what is in the input prompt
def prep_hidden_state(tokenized_input, rnn, verbose=False):
  # Get an initial hidden state
  hidden_state = rnn.init_hidden()
  # Run the input prompt through the RNN to build up the hidden state.
  # Discard the outputs (we are not trying to make predictions) until we get to the end
  for token in tokenized_input:
    if verbose:
      print("current token:", token, VOCAB.index2word(token))
    # Get the one-hot for the current token
    x = token2onehot(token)
    x = x.float()
    # Run the current one-hot and hidden state through the RNN
    output, hidden_state = rnn(x, hidden_state)
    # Get the highest predicted token
    next_token = output.argmax().item()
    if verbose:
      print("predicted next token:", next_token, VOCAB.index2word(next_token), '\n')
  return hidden_state

# Get the hidden state that represents the input prompt
print("Prepping hidden state:\n")
hidden_state = prep_hidden_state(tokenized_input, rnn, verbose=True)

# Generate a continuation by sampling from the RNN and then feeding the predicted output
# back into the RNN over and over. The default sampling is argmax.
def generate_rnn(rnn, num_new_tokens, token, hidden_state, fn=lambda d:d.argmax().item(), verbose=False):
  # Keep generating more by feeding the predicted output back into the RNN as input
  # Start with the last token of the input prompt and the newly prepped hidden state
  if verbose:
    print("Generating continuation:\n")
  continuation = []
  for n in range(num_new_tokens):
    if verbose:
      print("current token:", token, VOCAB.index2word(token))
    # Get the one-hot for the current token
    x = token2onehot(token)
    x = x.float()
    # Run the current one-hot through the RNN
    output, hidden_state = rnn(x, hidden_state)
    # Predict the next token
    next_token = fn(output)
    if verbose:
      print("predicted next token:", next_token, VOCAB.index2word(next_token), '\n')
    # Remember the new token
    continuation.append(next_token)
    # Update the current token
    token = next_token
  return continuation

# Generate the continuation. Use the argmax function to sample from the RNN's outputs
token = tokenized_input[-1]
continuation = generate_rnn(rnn, num_new_tokens, token, hidden_state, verbose=True)

# All done
print("Final continuation:")
print(continuation)
continuation_text = [VOCAB.index2word(t) for t in continuation]
print(continuation_text)
print("Final:")
print(input_prompt + ' ' + ' '.join(continuation_text))

input prompt: the First War began
input tokens: [2, 14, 15, 16] 

Prepping hidden state:

current token: 2 the
predicted next token: 5 of 

current token: 14 first
predicted next token: 2 the 

current token: 15 war
predicted next token: 2 the 

current token: 16 began
predicted next token: 2 the 

Generating continuation:

current token: 16 began
predicted next token: 2 the 

current token: 2 the
predicted next token: 2 the 

current token: 2 the
predicted next token: 5 of 

current token: 5 of
predicted next token: 2 the 

current token: 2 the
predicted next token: 2 the 

current token: 2 the
predicted next token: 5 of 

current token: 5 of
predicted next token: 2 the 

current token: 2 the
predicted next token: 2 the 

current token: 2 the
predicted next token: 5 of 

current token: 5 of
predicted next token: 2 the 

Final continuation:
[2, 2, 5, 2, 2, 5, 2, 2, 5, 2]
['the', 'the', 'of', 'the', 'the', 'of', 'the', 'the', 'of', 'the']
Final:
the First War began the the of the the of

Odds are good that you got an output that was highly repetitive. This is in part because we always take the `argmax` of the output log probabilities. Some sequences in the corpus are highly probable, so by always choosing the most likely token we get trapped in a local maximum.

Instead, we need to treat the output of the RNN as a distribution and *sample* from it to probabilistically choose the next token, in proportion to the probability assigned to each token.

**Complete the following function.** `my_sample()` should take a tensor of log probabilities for each token in the vocabulary (the output of the RNN). It should probabilistically sample from this distribution and return the sampled next token as an integer.

**Hints:** Consider using `torch.multinomial`. Remember, your input is in log scale.
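
To make the difference from `argmax` concrete, here is a sketch in plain Python rather than PyTorch: it exponentiates made-up log probabilities and draws tokens in proportion to them with `random.choices` from the standard library. The three-token distribution and the helper name `sample_from_log_probs` are illustrative assumptions.

```python
import math
import random
from collections import Counter

def sample_from_log_probs(log_probs, rng):
    """Draw one index with probability proportional to exp(log_prob)."""
    probs = [math.exp(lp) for lp in log_probs]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

log_probs = [math.log(0.6), math.log(0.3), math.log(0.1)]
rng = random.Random(0)  # fixed seed so the run is repeatable

counts = Counter(sample_from_log_probs(log_probs, rng) for _ in range(10000))
argmax_token = max(range(len(log_probs)), key=lambda i: log_probs[i])

print("argmax always picks:", argmax_token)
print("sampled counts:", dict(counts))
```

Sampling still favors token 0, but the other tokens appear roughly in proportion to their probabilities, which is what breaks the repetition.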

In [21]:
def my_sample(log_probs):
  token = None
  ### BEGIN SOLUTION

  # Given the hint, we need to convert log probability to probability first
  # Then using the probability we sample using torch.multinomial
  # print(log_probs.shape)  #(1, vocab)
  probability = torch.exp(log_probs)
  token = torch.multinomial(probability, num_samples=1)
  token = token.item()

  ### END SOLUTION
  return token

In [22]:
# student check - the following test must return a value > 90 to receive full credit (no partial credit)
ag.unit_test_my_sample()

Score:  96
Test E: 10/10


Run the cell below a few times and see what gets generated with your sampling technique.

In [23]:
# Example input prompt:
input_prompt = "the First War began"
# How long should the continuation be?
num_new_tokens = 10

# Normalize the input
normalized_input = normalize_string(input_prompt)
# Tokenize the input
tokenized_input = [VOCAB.word2index(w) for w in normalized_input.split()]
print("input prompt:", input_prompt)
print("input tokens:", tokenized_input, '\n')
# Get an initial hidden state
hidden_state = prep_hidden_state(tokenized_input, rnn)

# Generate the continuation. Use my_sample
token = tokenized_input[-1]
continuation = generate_rnn(rnn, num_new_tokens, token, hidden_state, fn=my_sample, verbose=True)

# All done
print("Final continuation:")
print(continuation)
continuation_text = [VOCAB.index2word(t) for t in continuation]
print(' '.join(continuation_text))
print("Final:")
print(input_prompt + ' ' + ' '.join(continuation_text))

input prompt: the First War began
input tokens: [2, 14, 15, 16] 

Generating continuation:

current token: 16 began
predicted next token: 32 earth 

current token: 32 earth
predicted next token: 309 under 

current token: 309 under
predicted next token: 103 when 

current token: 103 when
predicted next token: 39 . 

current token: 39 .
predicted next token: 650 what 

current token: 650 what
predicted next token: 760 teleri 

current token: 760 teleri
predicted next token: 10 told 

current token: 10 told
predicted next token: 277 him 

current token: 277 him
predicted next token: 40 but 

current token: 40 but
predicted next token: 258 desired 

Final continuation:
[32, 309, 103, 39, 650, 760, 10, 277, 40, 258]
earth under when . what teleri told him but desired
Final:
the First War began earth under when . what teleri told him but desired


## RNN---Optimize (10 Points)

Well, that is probably crazy and nonsensical. If the distribution has a lot of nearly equal probability candidates, the sampling technique you wrote probably makes some choices that look random. But it probably isn't stuck in a local maximum.

How do we fix this? We introduce something called **temperature**. Temperature is a value greater than 0.0 and up to 1.0 that makes higher probability tokens more probable and less probable tokens less probable.

**Complete the following function.** It should operate exactly like `my_sample()` except that it should divide the log probabilities by the temperature before converting them to probabilities.

Dividing by `temperature=1.0` leaves the probability distribution unchanged. As temperature gets smaller, approaching 0.0, the scaled log probabilities of low probability tokens head toward negative infinity faster than those of high probability tokens. After exponentiating and renormalizing, the distribution concentrates on the most likely tokens.
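
A small numeric sketch of the temperature effect, in plain Python with a made-up three-token distribution: dividing log probabilities by the temperature and renormalizing is the same as raising each probability to the power `1/temperature`. The helper `apply_temperature` is hypothetical.

```python
import math

def apply_temperature(log_probs, temperature):
    """Scale log probabilities by 1/temperature, then renormalize to sum to 1."""
    scaled = [math.exp(lp / temperature) for lp in log_probs]
    total = sum(scaled)
    return [s / total for s in scaled]

log_probs = [math.log(0.6), math.log(0.3), math.log(0.1)]

for t in (1.0, 0.5, 0.1):
    probs = apply_temperature(log_probs, t)
    print(t, [round(p, 3) for p in probs])
```

The sketch renormalizes explicitly for clarity. `torch.multinomial` accepts non-negative weights that need not sum to 1, so a PyTorch version can pass the exponentiated values directly.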

In [24]:
def my_temperature_sample(log_probs, temperature=1.0):
  token = None
  
  # As in my_sample, the input is in log scale and must be converted to probabilities
  # before sampling with torch.multinomial; the temperature is applied in log scale first
  # print(log_probs.shape)  #(1, vocab)

  # divide the log prob with temperature
  log_probs = log_probs / temperature
  # convert to probability
  probability = torch.exp(log_probs)
  token = torch.multinomial(probability, num_samples=1)
  token = token.item()

  
  return token

In [25]:
# student check - the following test must return True to receive full credit (no partial credit)
ag.unit_test_my_temperature_sample()

p-value:  0.0
Test F: 10/10


One more time. This time, play around with the temperature value.

In [26]:
# set the temperature - it's ok to modify this cell
RNN_TEMPERATURE = 0.5

In [27]:
# Example input prompt:
input_prompt = "the First War began"
# How long should the continuation be?
num_new_tokens = 10

# Normalize the input
normalized_input = normalize_string(input_prompt)
# Tokenize the input
tokenized_input = [VOCAB.word2index(w) for w in normalized_input.split()]
print("input prompt:", input_prompt)
print("input tokens:", tokenized_input, '\n')
# Get an initial hidden state
hidden_state = prep_hidden_state(tokenized_input, rnn)

# Generate the continuation. Use my_temperature_sample
token = tokenized_input[-1]
continuation = generate_rnn(rnn, num_new_tokens, token, hidden_state, fn=lambda d:my_temperature_sample(d, RNN_TEMPERATURE), verbose=True)

# All done
print("Final continuation:")
print(continuation)
continuation_text = [VOCAB.index2word(t) for t in continuation]
print(' '.join(continuation_text))
print("Final:")
print(input_prompt + ' ' + ' '.join(continuation_text))

input prompt: the First War began
input tokens: [2, 14, 15, 16] 

Generating continuation:

current token: 16 began
predicted next token: 22 and 

current token: 22 and
predicted next token: 2 the 

current token: 2 the
predicted next token: 105 were 

current token: 105 were
predicted next token: 441 many 

current token: 441 many
predicted next token: 2 the 

current token: 2 the
predicted next token: 2 the 

current token: 2 the
predicted next token: 41 in 

current token: 41 in
predicted next token: 2 the 

current token: 2 the
predicted next token: 139 they 

current token: 139 they
predicted next token: 5 of 

Final continuation:
[22, 2, 105, 441, 2, 2, 41, 2, 139, 5]
and the were many the the in the they of
Final:
the First War began and the were many the the in the they of


# Grading

Please submit this .ipynb file to Canvas for grading.

## Final Grade

In [28]:
# student check
ag.final_grade()

Your projected points for this assignment is 70/70.

NOTE: THIS IS NOT YOUR FINAL GRADE. YOUR FINAL GRADE FOR THIS ASSIGNMENT WILL BE AT LEAST 70 OR MORE, BUT NOT LESS



## Notebook Runtime

In [29]:
# end time - notebook execution
end_nb = time.time()
# print notebook execution time in minutes
print("Notebook execution time in minutes =", (end_nb - start_nb)/60)
# warn student if notebook execution time is greater than 30 minutes
if (end_nb - start_nb)/60 > 30:
  print("WARNING: Notebook execution time is greater than 30 minutes. Your submission may not complete auto-grading on Gradescope. Please optimize your code to reduce the notebook execution time.")

Notebook execution time in minutes = 10.087188673019408
