# T-725 Natural Language Processing: Lab 5
In today's lab, we will be working with neural networks, using GRUs and Transformers for text generation.

To begin with, do the following:
* Select `"File" > "Save a copy in Drive"` to create a local copy of this notebook that you can edit.
* **Select `"Runtime" > "Change runtime type"`, and make sure that you have "Hardware accelerator" set to "GPU"**
* Select `"Runtime" > "Run all"` to run the code in this notebook.

In [1]:
import os
import warnings

# Suppress some warnings from TensorFlow about deprecated functions
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

## Generating text with neural networks
Let's create a neural language model and use it to generate some text. This time, we will use character embeddings rather than word embeddings. They are created in exactly the same way, and are often used together in neural network-based models. One benefit of using character embeddings is that we can generate words that our model has never seen before.

The model takes as input a sequence of characters and predicts which character is most likely to follow. We will generate text by repeatedly predicting and appending the next character to a string. First, however, we need some text to train it on.
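The predict-and-append loop described above can be sketched without a neural network. The example below is only an illustration: it swaps the neural model for a simple bigram count table, and the names `train_bigram`, `generate` and `toy_text` are hypothetical, not part of this notebook.

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    return counts

def generate(counts, start, num_generate, rng):
    """Repeatedly sample a next character and append it to the string."""
    result = start
    for _ in range(num_generate):
        followers = counts.get(result[-1])
        if not followers:  # dead end: this character never had a successor
            break
        chars = list(followers)
        weights = [followers[c] for c in chars]
        result += rng.choices(chars, weights=weights, k=1)[0]
    return result

# Toy training text (made up for illustration)
toy_text = "speak, speak. hear me speak."
toy_model = train_bigram(toy_text)
sample = generate(toy_model, "s", 20, random.Random(0))
print(sample)
```

The GRU model trained later plays the role of `toy_model` here, but it conditions on the whole preceding sequence through its hidden state rather than on the last character alone.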


In [2]:
# Based on the following tutorial:
# https://www.tensorflow.org/tutorials/text/text_generation

import tensorflow as tf
import numpy as np
import os
import time

# Let's download some text by Shakespeare to train our model
url = 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt'
path_to_file = tf.keras.utils.get_file('shakespeare.txt', url)

with open(path_to_file, encoding='utf-8') as f:
  shakespeare = f.read()

print("First 250 characters:")
print(shakespeare[:250])

print ("Length of text: {:,} characters".format(len(shakespeare)))

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt
1115394/1115394 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
First 250 characters:
First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

Length of text: 1,115,394 characters


Now we can create training examples for our model. Each example will be a pair of strings: one input string containing 100 characters, and a target string that is one character ahead. For example, the first pair we create is:

**Input string**:  `'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou'`

**Target string**: `'irst Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '`

However, before we can start training, we need to convert our text into a list of integers, where each integer represents a different character. For example, "First Citizen" becomes:

```
Character:   F   i   r   s   t      C   i   t   i   z   e   n
Integer:   [18, 47, 56, 57, 58, 1, 15, 47, 58, 47, 64, 43, 52]
```
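As a minimal sketch of this encoding, the snippet below builds the mapping from a single line of the play and then forms one input/target pair by shifting the sequence one character. The helper names `encode` and `decode` are hypothetical. Because this toy vocabulary is much smaller than the full text's vocabulary, the integers it prints will differ from those in the table above.

```python
text = "First Citizen:\nBefore we proceed any further, hear me speak."

# Vocabulary: every distinct character, in sorted order
vocab = sorted(set(text))
char_to_index = {char: index for index, char in enumerate(vocab)}
index_to_char = {index: char for char, index in char_to_index.items()}

def encode(s):
    """Map each character to its integer index."""
    return [char_to_index[c] for c in s]

def decode(ids):
    """Map integer indices back to a string."""
    return "".join(index_to_char[i] for i in ids)

encoded = encode("First Citizen")
print(encoded)
print(decode(encoded))

# Input/target pair: the target is the input shifted one character ahead
chunk = encode(text[:11])
input_ids, target_ids = chunk[:-1], chunk[1:]
print(repr(decode(input_ids)), "->", repr(decode(target_ids)))
```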

In [3]:
# Hyper-parameters:

BATCH_SIZE = 64  # Batch size
BUFFER_SIZE = 10000  # Buffer size to shuffle the dataset
SEQUENCE_LENGTH = 100  # Length of input sequence
EMBEDDING_DIMENSION = 65  # Embedding dimension
RNN_UNITS = 1024  # Number of RNN units

In [4]:
def split_input_target(chunk):
  # Create (input_string, output_string) pairs
  input_text = chunk[:-1]
  target_text = chunk[1:]
  return input_text, target_text

def prepare_text(text):
  # The unique characters in the file
  vocab = sorted(set(text))
  print ('{} unique characters'.format(len(vocab)))

  # Creating a mapping from unique characters to indices
  char_map = {
      'char_to_index': {char: index for index, char in enumerate(vocab)},
      'index_to_char': np.array(vocab)
  }

  text_as_int = np.array([char_map['char_to_index'][c] for c in text])

  # The maximum length sentence we want for a single input in characters
  seq_length = SEQUENCE_LENGTH
  examples_per_epoch = len(text) // (seq_length+1)

  # Create training examples / targets
  char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
  sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)
  dataset = sequences.map(split_input_target)

  # (TF data is designed to work with possibly infinite sequences,
  # so it doesn't attempt to shuffle the entire sequence in memory. Instead,
  # it maintains a buffer in which it shuffles elements).
  dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

  return dataset, vocab, examples_per_epoch, char_map
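The number of training steps per epoch follows directly from these settings. The arithmetic below is a quick check using the values from this notebook (text length, `SEQUENCE_LENGTH`, `BATCH_SIZE`), assuming both `batch` calls drop their remainders as in `prepare_text`.

```python
# Values taken from the notebook
text_length = 1_115_394   # characters in the Shakespeare file
SEQUENCE_LENGTH = 100
BATCH_SIZE = 64

# Each chunk holds SEQUENCE_LENGTH + 1 characters, so that an input and a
# target of SEQUENCE_LENGTH characters each can be cut from it.
# Leftover characters at the end are dropped.
chunk_size = SEQUENCE_LENGTH + 1
num_sequences = text_length // chunk_size

# An incomplete final batch is dropped as well
steps_per_epoch = num_sequences // BATCH_SIZE

print(num_sequences, steps_per_epoch)  # 11043 172
```

The result agrees with the `172/172` step counter shown in the training log for the Shakespeare model.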

Now we can create and train the neural network.

In [5]:
import os

def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)


def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
      tf.keras.layers.Embedding(vocab_size,
                                embedding_dim),
      tf.keras.layers.GRU(rnn_units,
                          return_sequences=True,
                          recurrent_initializer='glorot_uniform',
                          stateful=True),
      tf.keras.layers.Dense(vocab_size)
  ])

  return model

In [9]:
def create_model(text, epochs=3):
  dataset, vocab, examples_per_epoch, char_map = prepare_text(text)

  train_model = build_model(len(vocab), EMBEDDING_DIMENSION, RNN_UNITS, BATCH_SIZE)
  train_model.compile(optimizer='adam', loss=loss)

  train_model.fit(dataset, epochs=epochs)

  pred_model = build_model(len(vocab), EMBEDDING_DIMENSION, RNN_UNITS, batch_size=1)
  pred_model.build(input_shape=(1, 100))
  pred_model.set_weights(train_model.get_weights())

  return pred_model, char_map

In [14]:
shakes_model, shakes_chars = create_model(shakespeare, epochs=3)

65 unique characters
Epoch 1/3
172/172 ━━━━━━━━━━━━━━━━━━━━ 18s 49ms/step - loss: 3.2747
Epoch 2/3
172/172 ━━━━━━━━━━━━━━━━━━━━ 10s 49ms/step - loss: 2.1069
Epoch 3/3
172/172 ━━━━━━━━━━━━━━━━━━━━ 11s 49ms/step - loss: 1.8096


In [8]:
# Ignore. Use only if Colab fails.
dataset, vocab, examples_per_epoch, char_map = prepare_text(shakespeare)
mini_data = dataset.take(1)
newshake = build_model(len(vocab), EMBEDDING_DIMENSION, RNN_UNITS, batch_size=1)
newshake.build(input_shape=(1, 100))
newshake.summary()
newshake.load_weights('shakes_model.weights.h5')


65 unique characters


FileNotFoundError: [Errno 2] Unable to synchronously open file (unable to open file: name = 'shakes_model.weights.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

Now that we've trained our model, we can finally use it to generate some text. The following function takes a model and a start string as input, and repeatedly predicts and appends the next character until 1,000 characters have been generated.

In [11]:
def generate_text(model, char_map, start_string, temperature=1.0):
  # Evaluation step (generating text using the learned model)
  # Low temperatures result in more predictable text.
  # Higher temperatures result in more surprising text.
  if not start_string:
    print("start_string can't be empty")
    return ""

  # Number of characters to generate
  num_generate = 1000

  # Converting our start string to numbers (vectorizing)
  input_eval = [char_map['char_to_index'][s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # Empty list to store the generated characters
  text_generated = []

  # Here batch size == 1
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)

      # using a categorical distribution to predict the character returned by the model
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # We pass the predicted character as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)

      text_generated.append(char_map['index_to_char'][predicted_id])

  return (start_string + ''.join(text_generated))
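To see what the line `predictions = predictions / temperature` does, it helps to look at the probabilities that result from the scaled scores. The sketch below is an illustration only: `softmax_with_temperature` and `toy_logits` are hypothetical, and the softmax is written out by hand, whereas `tf.random.categorical` samples from the unnormalized scores directly.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide the scores by the temperature, then normalize to probabilities."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate characters
toy_logits = [2.0, 1.0, 0.5]

for t in (0.2, 0.8, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperatures almost all probability mass moves to the highest-scoring character, so sampling behaves nearly like always picking the most likely one. At high temperatures the distribution flattens and unlikely characters are drawn more often.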

Let's generate some text!

In [15]:
#### If connected to GPU
print(generate_text(shakes_model, shakes_chars, "ROMEO: ", temperature=1.0))

#### If not connected to GPU
# print(generate_text(newshake, char_map, "ROMEO: ", temperature=1.0))

ROMEO: Norbangsul o comfiges?
Aepo, I wirr Guck and Which well you not;-

DUKE OF YORK:
My hearth, there,
The hath ast time your lat his extage
Fride gellem and prience,
Be shorse, so not; and 'spuabte him a tood fur-thee? Wit! how in a you and growarvest see cut wrat a sholls,
Comeo it placise-he put curson?

PecHARDIUS:
So yon gurbor real? For on you with as thou west held?

HENPRYARD:
I counterfiteds siming,' in jeany and lord.

BEDNUMET:
Nor proce IGlast the? 
CAUTILISHAN:
Bome allor by the ramper king:
Tell you sar as me to vilove to earty thee:
Now feint of he would my heaved un his fou he badk'd mu.

BRINPEO:
Yow plear with lebest there of Romal,
He carlory kill best the her;

BELINDULI vit att the worrQUS:
RINCESTIO:
That I see shall sige-Be that thou hast as I my signet
To ploty for ay thee. 'Tains any this propend tit with lie.

RUCIO:
No gome must be bear,
Oy Jur nown dor'gly to comes, you.

EUCALUSn so: gear!

LUUD TORK:
VenGe and the sonf, and will his mankers,
To dever mi

# Assignment
Answer the following questions and hand in your solution in Canvas before 23:59 on Friday, September 27th. Remember to save your file before uploading it.

## Question 1
The `temperature` parameter of `generate_text()`, defined earlier in the notebook, controls how predictable the generated text will be. The lower the temperature, the more the function will tend to append the most likely character (according to the model's prediction). A higher temperature introduces some randomness, leading to more unpredictable text.

The text we generated above used a temperature of 1.0. Try generating more text using the Shakespeare model:

(a) once using a temperature of 0.2 and

(b) again using a temperature of 0.8

and describe the difference.

In [16]:
# Your solution here
point_two_text = generate_text(shakes_model, shakes_chars, "ROMEO: ", temperature=0.2)
point_eight_text = generate_text(shakes_model, shakes_chars, "ROMEO: ", temperature=0.8)
print(f'Text generated using temperature of 0.2: {point_two_text}\n\n')
print(f'Text generated using temperature of 0.8: {point_eight_text}\n\n')


# The difference seems to be that at a lower temperature there are fewer changes of speaker, and each speaker says more text at a time.
# Essentially, a lower temperature produces long speeches from each speaker, while a higher temperature appears to produce quicker exchanges with
# shorter sentences per speaker.

Text generated using temperature of 0.2: ROMEO: the stranger the come to the good sould him the comes to the will and my sone,
And the stroke the with the hands and the dead my soul son.

BRUTUS:
The king of the sent the stranger to my son,
And the will stand the stranger the soners here and the hands the earth
The sentle and the hands and the consent the comportion of your son:
The see the son of shall be a fortune of the with the will and my lord,
And the stronged the stranger the hands and my lord,
And the stand and the come to the parting to the sone the comes the hand,
The come of the ward of the with the hands and my son the come to the were his consents
The hands and the proper the hand of the will of the send the stroke the earth the wind of my son,
And the proper the hands and my lord, and where is the come to the partion of the will.

BRUTUS:
The king the stare the come of the seed and the stroke the father shall be sone of the earth.

LEONTES:
I have he hath should be so sta


The difference seems to be that at a lower temperature there are fewer changes of speaker, and each speaker says more text at a time.

Essentially, a lower temperature produces long speeches from each speaker, while a higher temperature appears to produce quicker exchanges with shorter sentences per speaker. This is probably because a lower temperature reduces the variance of the sampled characters, so the model switches speakers less often.

## Question 2
NLTK's `names` corpus contains a list of approximately 8,000 English names. Train a new model on `names_raw` for at least 20 epochs using the `create_model(text, epochs=n)` function defined earlier. Use the trained model to generate a list of names (with the `generate_text` function defined earlier), starting with your own first name. Your name should not contain any non-English characters, and should end with an `\n`.

Print out the names that do not appear in the training data.

(a) Do you get any actual names (or at least names that sound plausible)?

In [17]:
# Don't modify this code cell
import nltk
from nltk.corpus import names
nltk.download('names')

# Print out a few examples
names_raw = names.raw()
names_unique = set(names_raw.split())
names_raw = "\n".join(names_unique)
print(names_raw.splitlines()[:5])

[nltk_data] Downloading package names to /root/nltk_data...


['Sibley', 'Quintilla', 'Charmaine', 'Tildy', 'Love']


[nltk_data]   Unzipping corpora/names.zip.


In [25]:
# Your solution here
names_model, names_chars = create_model(names_raw, 20)
starting_name = "Valgard\n"
generated_names = generate_text(names_model, names_chars, starting_name, temperature=0.2).split('\n')
print(f'generated names:\n')
for name in generated_names:
  if name not in names_unique:
    print(f'{name}')

55 unique characters
Epoch 1/20
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 54ms/step - loss: 4.0902
Epoch 2/20
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 54ms/step - loss: 3.7510
Epoch 3/20
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 50ms/step - loss: 3.3749
Epoch 4/20
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 50ms/step - loss: 3.0707
Epoch 5/20
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 50ms/step - loss: 2.8029
Epoch 6/20
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 50ms/step - loss: 2.5742
Epoch 7/20
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 48ms/step - loss: 2.4606
Epoch 8/20
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 49ms/step - loss: 2.4003
Epoch 9/20
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 49ms/step - loss: 2.3698
Epoch 10/20
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 48ms/step - loss: 2.334

We do get plenty of plausible-sounding names, such as 'Shanne', 'Jenne', 'Ronne', or 'Lanne'.

## Question 3
The size of the model can make a difference when it comes to performance. Create a new model that has twice the number of hidden units as the previous model and double the size of the embeddings.

(a) How does the performance change?

(b) What happens if you decrease these parameters?
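To get a feel for how much larger the doubled model is, the trainable parameters of the Embedding, GRU and Dense stack can be estimated by hand. This is a rough sketch: `count_parameters` is a hypothetical helper, the GRU term assumes the Keras default `reset_after=True` (two bias vectors per weight block), and the vocabulary size of 55 is the one reported for the names corpus above.

```python
def count_parameters(vocab_size, embedding_dim, rnn_units):
    """Estimate trainable parameters of an Embedding -> GRU -> Dense model."""
    embedding = vocab_size * embedding_dim
    # GRU: three weight blocks (update gate, reset gate, candidate state),
    # each with an input kernel, a recurrent kernel and two bias vectors
    # (assumption: Keras default reset_after=True)
    gru = 3 * rnn_units * (embedding_dim + rnn_units + 2)
    dense = rnn_units * vocab_size + vocab_size
    return embedding + gru + dense

base = count_parameters(55, 65, 1024)
doubled = count_parameters(55, 130, 2048)
halved = count_parameters(55, 32, 512)

print(f"base:    {base:,}")
print(f"doubled: {doubled:,} ({doubled / base:.2f}x)")
print(f"halved:  {halved:,} ({halved / base:.2f}x)")
```

Because the recurrent kernel grows with the square of the number of units, doubling both settings roughly quadruples the parameter count, which is consistent with the much longer time per training step.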

In [31]:
# Your solution here
def custom_create_model(text, epochs=3, custom_embed_dims=EMBEDDING_DIMENSION, custom_rnn_units=RNN_UNITS):
  dataset, vocab, examples_per_epoch, char_map = prepare_text(text)

  train_model = build_model(len(vocab), custom_embed_dims, custom_rnn_units, BATCH_SIZE)
  train_model.compile(optimizer='adam', loss=loss)

  train_model.fit(dataset, epochs=epochs)

  pred_model = build_model(len(vocab), custom_embed_dims, custom_rnn_units, batch_size=1)
  pred_model.build(input_shape=(1, 100))
  pred_model.set_weights(train_model.get_weights())

  return pred_model, char_map


# improved model
DOUBLE_EMBEDDING_DIMENSION = EMBEDDING_DIMENSION * 2
DOUBLE_RNN_UNITS = RNN_UNITS * 2

# reduced model
REDUCED_EMBEDDING_DIMENSION = EMBEDDING_DIMENSION // 2
REDUCED_RNN_UNITS = RNN_UNITS // 2

starting_name = "Valgard\n"

double_model, double_chars = custom_create_model(names_raw, 20, DOUBLE_EMBEDDING_DIMENSION, DOUBLE_RNN_UNITS)
double_names = generate_text(double_model, double_chars, starting_name, temperature=1.0).split('\n')
print(f'generated names for the improved model:\n')
for name in double_names:
  if name not in names_unique:
    print(f'{name}')

reduced_model, reduced_chars = custom_create_model(names_raw, 20, REDUCED_EMBEDDING_DIMENSION, REDUCED_RNN_UNITS)
reduced_names = generate_text(reduced_model, reduced_chars, starting_name, temperature=0.2).split('\n')
print(f'generated names for the reduced model:\n')
for name in reduced_names:
  if name not in names_unique:
    print(f'{name}')



55 unique characters
Epoch 1/20
8/8 ━━━━━━━━━━━━━━━━━━━━ 11s 699ms/step - loss: 5.2259
Epoch 2/20
8/8 ━━━━━━━━━━━━━━━━━━━━ 10s 714ms/step - loss: 6.0185
Epoch 3/20
8/8 ━━━━━━━━━━━━━━━━━━━━ 10s 718ms/step - loss: 6.7325
Epoch 4/20
8/8 ━━━━━━━━━━━━━━━━━━━━ 10s 724ms/step - loss: 5.2512
Epoch 5/20
8/8 ━━━━━━━━━━━━━━━━━━━━ 6s 733ms/step - loss: 4.4817
Epoch 6/20
8/8 ━━━━━━━━━━━━━━━━━━━━ 6s 742ms/step - loss: 3.7971
Epoch 7/20
8/8 ━━━━━━━━━━━━━━━━━━━━ 6s 743ms/step - loss: 3.6510
Epoch 8/20
8/8 ━━━━━━━━━━━━━━━━━━━━ 6s 742ms/step - loss: 3.3743
Epoch 9/20
8/8 ━━━━━━━━━━━━━━━━━━━━ 10s 741ms/step - loss: 3.2607
Epoch 10/20
8/8 ━━━━━━━━━━━━━━━━━━━━ 10s 743ms/st

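The "improved" model takes significantly longer to train, roughly 8 seconds per epoch on average, and its results are nonsensical and very poor compared to the normal and reduced models. In the log above, the training loss rises over the first three epochs (from 5.23 to 6.73) and is still above 3 after nine epochs, so the cause is more likely unstable or slow optimization of such a large model on this small dataset than overfitting. Note also that the improved model was sampled at a temperature of 1.0 and the reduced model at 0.2, which may account for part of the difference in output quality.

The reduced model, however, takes much less time to train, about 1 second per epoch, and still performs very well, generating plausible names such as "Carrine", "Alline", or "Jannie".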

## Question 4
Transformer large language models can also generate text. The following code imports a pretrained GPT-2 model from Hugging Face's Transformers library. This model can then be used directly to generate text, given a prompt as context. Alter the prompt to have the transformer model (GPT-2) generate the beginning of an engaging story using one of the following story starters:


*   It was the day the moon fell.
*   Am I in heaven?  What happened to me?
*   Wandering through the graveyard it felt like something was watching me.
*   Three of us.  We were the only ones left, the only ones to make it to the island.

There are several different methods to choose from to generate the text (as seen in the commented out lines below). Try out the different methods and play with the parameters. This [blogpost](https://huggingface.co/blog/how-to-generate) explains their differences.

(a) Which method has the best performance?

(b) Can GPT-2 generate Shakespeare?
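The difference between greedy search, top-k and top-p sampling can be illustrated on a single next-token distribution. The sketch below is a simplified illustration and not the Transformers implementation: `toy_probs` is a made-up distribution, all function names are hypothetical, and beam search is left out because it needs scores for whole sequences rather than for one step.

```python
import random

# Hypothetical next-token distribution, made up for illustration
toy_probs = {"moon": 0.5, "sky": 0.3, "sea": 0.15, "cat": 0.05}

def greedy(probs):
    """Greedy search: always pick the most likely token."""
    return max(probs, key=probs.get)

def renormalize(probs):
    """Rescale the kept probabilities so that they sum to 1."""
    total = sum(probs.values())
    return {tok: p / total for tok, p in probs.items()}

def top_k_filter(probs, k):
    """Top-k: keep only the k most likely tokens."""
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    return renormalize(dict(kept))

def top_p_filter(probs, p):
    """Top-p: keep the smallest set of tokens whose total probability reaches p."""
    kept, cumulative = {}, 0.0
    for tok, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return renormalize(kept)

def sample(probs, rng):
    """Draw one token at random according to the given probabilities."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

rng = random.Random(0)
print(greedy(toy_probs))
print(top_k_filter(toy_probs, k=2))
print(top_p_filter(toy_probs, p=0.9))
print(sample(top_p_filter(toy_probs, p=0.9), rng))
```

Greedy search is deterministic, which is one reason it tends to fall into loops. Top-k and top-p both cut off the unlikely tail before sampling. Top-k keeps a fixed number of candidates, while top-p adapts the number of candidates to how peaked the distribution is.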

In [32]:
# Uncomment if transformers is not installed
!pip install transformers



In [33]:
# Do not modify this code
# https://huggingface.co/docs/transformers/main_classes/text_generation

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")

gpt2_model = AutoModelForCausalLM.from_pretrained("gpt2")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [34]:
# Do not modify this code

prompt = "Today I believe we can finally"

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

outputs = gpt2_model.generate(input_ids, pad_token_id=tokenizer.eos_token_id, max_length=100) # Greedy search
#outputs = gpt2_model.generate(input_ids, max_length=100, num_beams=5, no_repeat_ngram_size=3, early_stopping=True) # Beam search
#outputs = gpt2_model.generate(input_ids, do_sample=True, max_length=100, top_k=0, temperature=0.7) # Sampling
#outputs = gpt2_model.generate(input_ids, do_sample=True, max_length=100, top_k=50) # Top-k
#outputs = gpt2_model.generate(input_ids, do_sample=True, max_length=100, top_k=50, top_p=0.92) # Top-p

tokenizer.batch_decode(outputs, skip_special_tokens=True)

### To suppress the warning, add:
# pad_token_id=tokenizer.eos_token_id
# for example: outputs = gpt2_model.generate(input_ids, pad_token_id=tokenizer.eos_token_id, max_length=100)

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


['Today I believe we can finally get to the point where we can make a difference in the lives of the people of the United States of America.\n\nI believe that we can make a difference in the lives of the people of the United States of America.\n\nI believe that we can make a difference in the lives of the people of the United States of America.\n\nI believe that we can make a difference in the lives of the people of the United States of America.\n\n']

In [36]:
# Your answer here
prompt = "It was the day the moon fell."

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

output_batch = []

output_batch.append([gpt2_model.generate(input_ids, pad_token_id=tokenizer.eos_token_id, max_length=100), "Greedy"]) # Greedy search
output_batch.append([gpt2_model.generate(input_ids, max_length=100, num_beams=5, no_repeat_ngram_size=3, early_stopping=True), "Beam"]) # Beam search
output_batch.append([gpt2_model.generate(input_ids, do_sample=True, max_length=100, top_k=0, temperature=0.7), "Sampling"]) # Sampling
output_batch.append([gpt2_model.generate(input_ids, do_sample=True, max_length=100, top_k=50), "Top-k"]) # Top-k
output_batch.append([gpt2_model.generate(input_ids, do_sample=True, max_length=100, top_k=50, top_p=0.92), "Top-p"]) # Top-p

for output_search_pair in output_batch:
  print(f'Output of model {output_search_pair[1]}\n{tokenizer.batch_decode(output_search_pair[0], skip_special_tokens=True)}')


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

Output of model Greedy
['It was the day the moon fell.\n\n"I was in the middle of the night, and I saw the moon rise and fall," he said. "I was in the middle of the night, and I saw the moon rise and fall."\n\nHe said he was in the middle of the night, and he saw the moon rise and fall.\n\n"I was in the middle of the night, and I saw the moon rise and fall," he said. "I was']
Output of model Beam
['It was the day the moon fell.\n\n"It was a beautiful day," he said. "It was beautiful. It was beautiful."\n\nIt was also the first time he had ever seen the moon fall. He had never seen it before, but he knew it was there. He knew it would be there for a long time to come, and he knew he would never be able to see it again. He didn\'t know how long it would take him to get there.\n']
Output of model Sampling
['It was the day the moon fell. The idea of a molotov tree was already being imitated by the engineers and their friends.\n\n"But the moon was not the only thing in the sky," said Jeanne

Greedy search stays on topic, but produces very repetitive output that does not make much sense.

Beam search produces structured sentences with few grammatical issues, and does stay on topic. The content of these sentences, however, is not very coherent.

Sampling also produces well-structured, grammatical sentences. Read individually the sentences make sense, but together they do not form a coherent passage.

Top-k seems to be quite nonsensical and does not stay on the topic of the moon falling out of the sky.

Top-p produced very coherent sentences that are easy to understand and follow some structure or storyline, but it also does a poor job of staying on the topic of the prompt.

In [43]:
prompt = shakespeare[:180]
print(f'prompt: {prompt}')

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

output = gpt2_model.generate(input_ids, pad_token_id=tokenizer.eos_token_id, max_length=250)
tokenizer.batch_decode(output, skip_special_tokens=True)

prompt: First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First


['First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\n\nI am not a coward.\n\nAll:\n\nResolved. resolved.\n\nFirst Citizen:\n\nI am not a coward.\n\nAll:\n\nResolved. resolved.\n\nFirst Citizen:\n\nI am not a coward.\n\nAll:\n\nResolved. resolved.\n\nFirst Citizen:\n\nI am not a coward.\n\nAll:\n\nResolved. resolved.\n\nFirst Citizen:\n\nI am not a coward.\n\nAll:\n\nResolved. resolved.\n\nFirst Citizen:\n\nI am not a coward.\n\nAll:\n\nResolved. resolved.\n\nFirst Citizen:\n\nI am not a coward.\n\nAll:\n\nResolved. resolved.\n\nFirst Citizen:\n\nI am not a coward.\n\nAll:\n\nResolved. resolved.\n\nFirst Citizen:\n']

No, it appears the model cannot generate Shakespeare, at least with greedy search: it quickly devolves into repeating the same lines.