# Hugging Face for Text Generation

See [this overview](https://huggingface.co/course/chapter1/1) for an introduction to Hugging Face.

## Load Libraries and Data

In [51]:
# May need to run the following command to install Hugging Face
# !pip install transformers

import tensorflow as tf
from transformers import TFGPT2LMHeadModel, GPT2Tokenizer

tf.random.set_seed(0)

In [3]:
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# GPT-2 has no PAD token, so reuse the EOS token as the PAD token to avoid warnings
model = TFGPT2LMHeadModel.from_pretrained('gpt2', pad_token_id=tokenizer.eos_token_id)


All model checkpoint layers were used when initializing TFGPT2LMHeadModel.

All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at gpt2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.



## Generate Text

In [38]:
# Encode our input
input_ids = tokenizer.encode('I enjoy coffee in the morning', return_tensors='tf')

In [39]:
# View input vocabulary indices
input_ids

<tf.Tensor: shape=(1, 6), dtype=int32, numpy=array([[  40, 2883, 6891,  287,  262, 3329]], dtype=int32)>

In [46]:
# Greedy decoding: at each step, pick the single most probable next token.
# (early_stopping only applies to beam search, so it is omitted here.)
greedy_output = model.generate(input_ids, max_length=19)
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))

I enjoy coffee in the morning, but I'm not a fan of coffee in the evening.
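
To make the greedy step concrete, here is a minimal pure-Python sketch. It does not call GPT-2; `TOY_MODEL`, its tokens, and its probabilities are made up for illustration, and `greedy_decode` is a hypothetical helper, not part of `transformers`.

```python
# Hypothetical toy "model": maps the previous token to next-token probabilities.
# The tokens and numbers are invented for illustration only.
TOY_MODEL = {
    "I": {"enjoy": 0.6, "like": 0.4},
    "enjoy": {"coffee": 0.7, "tea": 0.3},
    "coffee": {"<eos>": 0.55, "daily": 0.45},
}


def greedy_decode(start, max_length):
    """Append the most probable next token until <eos> or max_length tokens."""
    tokens = [start]
    while len(tokens) < max_length:
        probs = TOY_MODEL.get(tokens[-1])
        if probs is None:  # the toy model has no continuation for this token
            break
        next_token = max(probs, key=probs.get)  # the greedy choice
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens


print(greedy_decode("I", max_length=10))
```

Because every step commits to the locally best token, greedy decoding can miss a sequence whose first token is less likely but whose overall probability is higher.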


In [50]:
# Beam search: keep the num_beams most likely partial sequences at each step
# and return the highest-scoring complete sequence
output = model.generate(input_ids, max_length=17, num_beams=5, early_stopping=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))

I enjoy coffee in the morning, but I don't like coffee in the evening.
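
The idea behind beam search can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions: `TOY_MODEL` and its probabilities are invented, `beam_search` is a hypothetical helper, and the sketch ignores details that `model.generate` handles, such as end-of-sequence tokens and length penalties.

```python
import math

# Hypothetical toy "model": maps the previous token to next-token probabilities.
TOY_MODEL = {
    "The": {"nice": 0.5, "dog": 0.4, "car": 0.1},
    "nice": {"woman": 0.4, "house": 0.3, "guy": 0.3},
    "dog": {"has": 0.9, "runs": 0.05, "and": 0.05},
    "car": {"is": 0.5, "drives": 0.3, "turns": 0.2},
}


def beam_search(start, num_beams, steps):
    """Keep the num_beams highest-scoring sequences after each expansion step."""
    beams = [([start], 0.0)]  # (tokens, summed log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            for token, prob in TOY_MODEL.get(tokens[-1], {}).items():
                candidates.append((tokens + [token], score + math.log(prob)))
        if not candidates:
            break
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0][0]


print(beam_search("The", num_beams=1, steps=2))  # one beam behaves like greedy search
print(beam_search("The", num_beams=2, steps=2))  # two beams recover a likelier sequence
```

With one beam the search follows "nice" (0.5) and ends with a sequence of probability 0.5 × 0.4 = 0.2; with two beams it also keeps "dog" and finds "The dog has" with probability 0.4 × 0.9 = 0.36.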


In [66]:
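# Sample from the full vocabulary distribution (top_k=0 disables top-k filtering).
# A temperature below 1 sharpens the distribution toward the most likely tokens.
sample_output = model.generate(input_ids, do_sample=True, max_length=50,
                               top_k=0, temperature=0.5)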
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))

I enjoy coffee in the morning, but I don't like to spend time with my friends and family. I'm not one of those people who likes to spend time with my kids. I like to spend time with my family and friends. I like
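
The effect of `temperature` can be illustrated without a model: the logits are divided by the temperature before the softmax. The sketch below uses made-up logits, and `softmax_with_temperature` is a hypothetical helper rather than a `transformers` function.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [value / temperature for value in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - highest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.0]  # invented scores for three candidate tokens
for temperature in (1.0, 0.5):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At a temperature of 0.5 the top token receives a larger share of the probability mass than at 1.0, so sampling becomes less random; temperatures above 1 have the opposite effect.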


In [65]:
# Top-k sampling: at each step, sample only from the 50 most probable tokens
k_sample_output = model.generate(input_ids, do_sample=True, max_length=50,
                                 top_k=50)

print(tokenizer.decode(k_sample_output[0], skip_special_tokens=True))

I enjoy coffee in the morning and on the evening, on the weekends, or on weekends in particular," she told Crain's. "The fact that we're part of this industry brings a huge potential market of consumers that we don't have access
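
As a rough sketch of what top-k filtering does to one next-token distribution: keep the k most probable tokens, renormalize, then sample. The distribution below is invented, and `top_k_filter` is a hypothetical helper, not the library's implementation.

```python
import random


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    kept = sorted(probs, key=probs.get, reverse=True)[:k]
    total = sum(probs[token] for token in kept)
    return {token: probs[token] / total for token in kept}


# Invented next-token distribution for illustration.
next_token_probs = {"coffee": 0.5, "tea": 0.3, "juice": 0.15, "soda": 0.05}

filtered = top_k_filter(next_token_probs, k=2)
print(filtered)

# Sample one token from the filtered distribution (seeded for repeatability).
rng = random.Random(0)
sampled = rng.choices(list(filtered), weights=list(filtered.values()), k=1)[0]
print(sampled)
```

A fixed k ignores the shape of the distribution: it may keep unlikely tokens when the model is confident and drop plausible ones when the distribution is flat.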


In [69]:
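# Top-p (nucleus) sampling: sample from the smallest set of tokens whose
# cumulative probability reaches 0.9 (top_k=0 disables top-k filtering)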
p_sample_output = model.generate(input_ids, do_sample=True, max_length=50,
                                 top_p=0.9, top_k=0)

print(tokenizer.decode(p_sample_output[0], skip_special_tokens=True))

I enjoy coffee in the morning. You just need to focus on the task at hand, as much as possible. Nothing will stop you from getting back in the spirit.

I also don't like drinking – I hate it.

Whatever
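
Top-p filtering can be sketched the same way for a single next-token distribution: rank tokens by probability, keep them until the running total reaches p, then renormalize. The distribution is invented and `top_p_filter` is a hypothetical helper, so treat this as an illustration rather than the library's implementation.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Invented next-token distribution for illustration.
next_token_probs = {"coffee": 0.5, "tea": 0.3, "juice": 0.15, "soda": 0.05}

print(list(top_p_filter(next_token_probs, p=0.9)))  # three tokens are needed to reach 0.9
print(list(top_p_filter(next_token_probs, p=0.4)))  # the top token alone already covers 0.4
```

Unlike top-k, the number of tokens kept adapts to the distribution: a confident prediction keeps few candidates, while a flat one keeps many.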
