# Exercise: Generating one token at a time

In this exercise, we will get to understand how an LLM generates text--one token at a time, using the previous tokens to predict the following ones.


## Step 1. Load a tokenizer and a model

First we load a tokenizer and a model from HuggingFace's transformers library. A tokenizer is a function that splits a string into a list of numbers that the model can understand.

In this exercise, all the code will be written for you. All you need to do is follow along!

In [1]:
# Student Task: Load the tokenizer and model for `gpt2` from HuggingFace
# TODO: Replace the ********** with the correct code

from transformers import AutoModelForCausalLM, AutoTokenizer

# To load a pretrained model and a tokenizer using HuggingFace, we only need two lines of code!
# tokenizer = **********
# model = **********
model_name= "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

  from .autonotebook import tqdm as notebook_tqdm


In [2]:
# We create a partial sentence and tokenize it.
# No changes needed here

text = "Udacity is the best place to learn about generative"
inputs = tokenizer(text, return_tensors="pt")

# Show the tokens as numbers, i.e. "input_ids"
inputs["input_ids"]

tensor([[  52,   67, 4355,  318,  262, 1266, 1295,  284, 2193,  546, 1152,  876]])

In [3]:
for id in inputs["input_ids"][0]:
    print(id, tokenizer.decode(id))

tensor(52) U
tensor(67) d
tensor(4355) acity
tensor(318)  is
tensor(262)  the
tensor(1266)  best
tensor(1295)  place
tensor(284)  to
tensor(2193)  learn
tensor(546)  about
tensor(1152)  gener
tensor(876) ative


## Step 2. Examine the tokenization

Let's explore what these tokens mean!

In [4]:
# Show how the sentence is tokenized
# No changes needed here

import pandas as pd


def show_tokenization(inputs):
    return pd.DataFrame(
        [(id, tokenizer.decode(id)) for id in inputs["input_ids"][0]],
        columns=["id", "token"],
    )


show_tokenization(inputs)

Unnamed: 0,id,token
0,tensor(52),U
1,tensor(67),d
2,tensor(4355),acity
3,tensor(318),is
4,tensor(262),the
5,tensor(1266),best
6,tensor(1295),place
7,tensor(284),to
8,tensor(2193),learn
9,tensor(546),about


### Subword tokenization

The interesting thing is that tokens in this case are neither just letters nor just words. Sometimes shorter words are represented by a single token, but other times a single token represents a part of a word, or even a single letter. This is called subword tokenization.

## Step 2. Calculate the probability of the next token

Now let's use PyTorch to calculate the probability of the next token given the previous ones.

In [5]:
# Calculate the probabilities for the next token for all possible choices. We show the
# top 5 choices and the corresponding words or subwords for these tokens.
# No changes needed here

import torch

with torch.no_grad():
    logits = model(**inputs).logits[:, -1, :]
    probabilities = torch.nn.functional.softmax(logits[0], dim=-1)

The line logits = model(**inputs).logits[:, -1, :] is extracting the model's predictions for the next token in text generation. Let me break it down step by step:

### What the Model Outputs
- model(**inputs): Runs the GPT-2 model on your tokenized input text
- .logits: Gets the raw prediction scores (logits) from the model
    - Shape: (batch_size, sequence_length, vocab_size)
    - For each position in your input sequence, it predicts scores for every possible next token
### The Slicing: [:, -1, :]
- [:, -1, :] selects:
    - : (first dimension): All batches (usually just 1 in this case)
    - -1 (second dimension): The last position in the sequence (the most recent token)
    - : (third dimension): All vocabulary scores (one score per possible token)
### Why This Specific Slice?
In autoregressive text generation, you want the model's prediction for what comes next after your current text
[:, -1, :] gives you the logits for the next token, based on the entire input sequence
These logits can then be converted to probabilities (via softmax) to choose the most likely next word
### Example
If your input is "Hello world" (tokenized to 2 tokens), the model outputs logits for every position:

- Position 0: Predictions after "Hello"
- Position 1: Predictions after "Hello world" ‚Üê This is what [:, -1, :] grabs

This is the core of how LLMs generate text one token at a time! The model uses all previous context to predict what's most likely to come next.

In [6]:
test_logits = model(**inputs).logits

In [7]:
test_logits.shape

torch.Size([1, 12, 50257])

In [8]:
test_logits[0][-1]

tensor([-77.0661, -75.2075, -79.4712,  ..., -79.0780, -83.3320, -76.6110],
       grad_fn=<SelectBackward0>)

In [9]:
probabilities.shape

torch.Size([50257])

In [10]:



def show_next_token_choices(probabilities, top_n=5):
    return pd.DataFrame(
        [
            (id, tokenizer.decode(id), p.item())
            for id, p in enumerate(probabilities)
            if p.item()
        ],
        columns=["id", "token", "p"],
    ).sort_values("p", ascending=False)[:top_n]


show_next_token_choices(probabilities)

Unnamed: 0,id,token,p
8300,8300,programming,0.117578
4673,4673,learning,0.082171
8950,8950,languages,0.028905
3725,3725,knowledge,0.026193
290,290,and,0.025532


In [11]:
df = pd.DataFrame(
        [
            (id, tokenizer.decode(id), p.item())
            for id, p in enumerate(probabilities)
            if p.item()
        ],
        columns=["id", "token", "p"],
    ).sort_values("p", ascending=False)

In [12]:
df.loc[8950,:]

id             8950
token     languages
p          0.028905
Name: 8950, dtype: object

Interesting! The model thinks that the most likely next word is "programming", followed up closely by "learning".

In [13]:
# Student task: Obtain the token id for the most probable next token
# TODO: Replace the ********** with the correct code

# Get the id of the most probable next token, i.e. the index of the highest value in `probabilities`
# Hint: torch.argmax will be useful
# Hint: `.item()` converts a single-value tensor to a standard Python number
# next_token_id = **********
next_token_id = torch.argmax(probabilities).item()


print(f"Next token id: {next_token_id}")
print(f"Next token: {tokenizer.decode(next_token_id)}")

Next token id: 8300
Next token:  programming


In [14]:
# We append the most likely token to the text.
# No changes needed here

new_text = text + tokenizer.decode(8300)
new_text

'Udacity is the best place to learn about generative programming'

## Step 3. Generate some more tokens

The following cell will take `text`, show the most probable tokens to follow, and append the most likely token to text. Run the cell over and over to see it in action!

In [15]:
# Press ctrl + enter to run this cell again and again to see how the text is generated.

from IPython.display import Markdown, display

# Show the text
print(new_text)

# Convert to tokens
inputs = tokenizer(text, return_tensors="pt")

# Calculate the probabilities for the next token and show the top 5 choices
with torch.no_grad():
    logits = model(**inputs).logits[:, -1, :]
    probabilities = torch.nn.functional.softmax(logits[0], dim=-1)

display(Markdown("**Next token probabilities:**"))
display(show_next_token_choices(probabilities))

# Choose the most likely token id and add it to the text
next_token_id = torch.argmax(probabilities).item()
new_text = new_text + tokenizer.decode(next_token_id)

Udacity is the best place to learn about generative programming


**Next token probabilities:**

Unnamed: 0,id,token,p
8300,8300,programming,0.117578
4673,4673,learning,0.082171
8950,8950,languages,0.028905
3725,3725,knowledge,0.026193
290,290,and,0.025532


## Step 4. Use the `generate` method

In [16]:
# Student Task: Use the model.generate method to generate lots of text quickly
# TODO: Replace the ********** with the correct code

from IPython.display import Markdown, display

# Start with some text and tokenize it
text = "Once upon a time, generative models"
inputs = tokenizer(text, return_tensors="pt")

# Use the `generate` method to generate a max of 100 tokens
# See: https://docs.pytorch.org/torchtune/0.3/generated/torchtune.generation.generate.html
# output = **********
output= model.generate(**inputs, max_new_tokens=100)

# Show the generated text
display(Markdown(tokenizer.decode(output[0])))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


: 

### That's interesting...

You'll notice that GPT-2 is not nearly as sophisticated as later models like GPT-4/5. It often repeats itself and doesn't always make much sense. But it's still pretty impressive that it can generate text that looks like English.

## Congrats for completing the exercise! üéâ

Give yourself a hand. And please take a break if you need to. We'll be here when you're refreshed and ready to learn more!

<br /><br /><br /><br /><br /><br /><br /><br />