# Demo: Generating Text One Token at a Time

This notebook briefly demonstrates how a large language model (LLM) generates text. We will load a model, tokenize a prompt, and generate new text step by step.

### Step 1. Load a Tokenizer and a Model

First, we need to load a pre-trained model and its corresponding tokenizer from the Hugging Face `transformers` library. The **tokenizer** converts text into a sequence of integer token IDs that the model can process, and the **model** predicts which token should come next.

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# For this demo, we'll use 'distilgpt2', a smaller and faster version of GPT-2
model_name = "distilgpt2"

# Load the tokenizer and model associated with our chosen model name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

### Step 2. Examine the Tokenization

Let's see how the tokenizer converts a simple sentence into a list of token IDs. This process is called **tokenization**.

In [7]:
# Define a starting phrase, also known as a prompt
prompt_text = "Machine learning is a field of"

# Use the tokenizer to convert the text prompt into input tensors for the model
inputs = tokenizer(prompt_text, return_tensors="pt")

# The 'input_ids' are the numerical representations of our text
print("Prompt text:", prompt_text)
print("Token IDs:", inputs["input_ids"])

Prompt text: Machine learning is a field of
Token IDs: tensor([[37573,  4673,   318,   257,  2214,   286]])


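To better understand the tokenization, we can decode each ID back into its corresponding text. In this prompt every token happens to be a whole word (GPT-2 tokens usually include the leading space, which the table alignment hides), but less common words are split into several pieces, and punctuation gets its own tokens. This is called **subword tokenization**.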

In [8]:
import pandas as pd

# Get the list of token IDs from our inputs
token_ids = inputs["input_ids"][0].tolist()

# Decode each token ID back to its string representation
tokens = [tokenizer.decode(token_id) for token_id in token_ids]

# Display the IDs and their corresponding tokens in a table for clarity
token_df = pd.DataFrame({"ID": token_ids, "Token": tokens})

print(token_df.to_string(index=False))

   ID     Token
37573   Machine
 4673  learning
  318        is
  257         a
 2214     field
  286        of
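
To see how a word that is missing from the vocabulary can still be encoded, here is a minimal sketch of subword splitting. It uses greedy longest-match over a tiny made-up vocabulary, which is a simplification: GPT-2 tokenizers use byte-pair encoding with learned merge rules. `TOY_VOCAB` and `split_into_subwords` are hypothetical names for illustration and are not part of `transformers`.

```python
# Toy subword splitter: greedy longest-match against a tiny, made-up vocabulary.
# This is NOT GPT-2's byte-pair encoding; it only illustrates the idea that
# a word absent from the vocabulary is broken into smaller known pieces.
TOY_VOCAB = {"token", "ization", "learn", "ing", "field", "s"}


def split_into_subwords(word, vocab=TOY_VOCAB):
    """Split `word` into the longest vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest candidate first, then shrink it.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No known piece starts here: fall back to a single character.
            pieces.append(word[start])
            start += 1
    return pieces


print(split_into_subwords("field"))         # a whole word in the toy vocabulary
print(split_into_subwords("tokenization"))  # split into two known pieces
print(split_into_subwords("learnings"))     # split into three known pieces
```

The real tokenizer works on the same principle: frequent words get a single ID, while rarer words are covered by combining smaller pieces.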


### Step 3. Generate the Next Token

Now, we'll feed our tokenized prompt to the model and ask it to predict the single most likely next token.

In [9]:
# We use torch.no_grad() to disable gradient calculations, as we are not training the model
with torch.no_grad():
    # Get the model's raw output, called 'logits'
    outputs = model(**inputs)

    # We only care about the logits for the very last token in our input sequence
    next_token_logits = outputs.logits[:, -1, :]

    # Convert logits into probabilities using the softmax function
    probabilities = torch.nn.functional.softmax(next_token_logits, dim=-1)

    # Find the token ID with the highest probability (dim=-1 searches across the vocabulary)
    most_likely_next_token_id = torch.argmax(probabilities, dim=-1).item()

print(f"The most likely next token ID is: {most_likely_next_token_id}")
print(f"This token is: '{tokenizer.decode(most_likely_next_token_id)}'")

The most likely next token ID is: 2267
This token is: ' research'
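
The softmax step can be checked by hand on a small example. The sketch below uses four made-up logits (not taken from the model) and plain Python instead of PyTorch; the helper names are illustrative. It also shows that softmax preserves the ordering of the logits, so the largest logit and the largest probability sit at the same index.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow in exp() and leaves the result unchanged.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical logits for a four-token vocabulary
toy_logits = [2.0, 0.5, -1.0, 3.5]
toy_probs = softmax(toy_logits)

print([round(p, 3) for p in toy_probs])
print("argmax of logits:", argmax(toy_logits))
print("argmax of probabilities:", argmax(toy_probs))
```

Both `argmax` calls return index 3. When only the top token is needed, the softmax can therefore be skipped and `argmax` applied to the logits directly.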


By repeatedly predicting the next token and appending it to our input, we can generate a longer sequence of text. Always picking the single most likely token, as we do here, is known as **greedy decoding**.

In [10]:
# Let's generate a few more tokens by repeating the process in a loop
generated_ids = inputs["input_ids"]

print("Generating 5 tokens one at a time:")
print(tokenizer.decode(generated_ids[0]), end="")

# This loop generates one token at a time
for _ in range(5):
    with torch.no_grad():
        outputs = model(generated_ids)
        next_token_logits = outputs.logits[:, -1, :]
        next_token_id = torch.argmax(next_token_logits, dim=-1).unsqueeze(-1)

    # Append the newly predicted token ID to our sequence
    generated_ids = torch.cat([generated_ids, next_token_id], dim=-1)

    # Print the newly generated token
    print(tokenizer.decode(next_token_id[0]), end="")

Generating 5 tokens one at a time:
Machine learning is a field of research that has been growing
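
The same loop structure can be seen without a neural network. The sketch below replaces the model with a hypothetical lookup table that scores the next word given only the previous word (a real LLM conditions on the whole sequence and works on token IDs, not words). The scores in `TOY_SCORES` and the `greedy_generate` helper are invented for illustration.

```python
# Hypothetical "model": for each previous word, a score for each candidate next word.
# All scores are made up; they are not taken from distilgpt2.
TOY_SCORES = {
    "of": {"research": 2.1, "study": 1.7, "the": 0.3},
    "research": {"that": 1.9, "and": 1.2},
    "that": {"has": 2.4, "is": 2.0},
    "has": {"been": 3.0, "a": 0.8},
    "been": {"growing": 1.5, "used": 1.4},
}


def greedy_generate(words, num_new_words):
    """Append the highest-scoring next word, one step at a time."""
    words = list(words)  # copy so the caller's list is not modified
    for _ in range(num_new_words):
        scores = TOY_SCORES.get(words[-1])
        if scores is None:
            break  # the toy table has no entry for this word
        # Greedy choice: keep only the single highest-scoring candidate
        next_word = max(scores, key=scores.get)
        words.append(next_word)
    return words


prompt = ["Machine", "learning", "is", "a", "field", "of"]
print(" ".join(greedy_generate(prompt, 5)))
```

Each iteration does the same three things as the model-based loop: score the candidates, pick the best one, and append it so that it becomes part of the input for the next step.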

### Step 4. Use the `generate` Method

Generating tokens one by one is great for understanding the process, but it's inefficient. The `transformers` library provides a convenient `.generate()` method that runs the whole loop for us.

In [11]:
from IPython.display import Markdown, display

# We start with the same tokenized prompt
inputs = tokenizer(prompt_text, return_tensors="pt")

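# Use the .generate() method to extend the prompt.
# max_length is the total length (prompt tokens + newly generated tokens).
# GPT-2 has no padding token, so we reuse the end-of-sequence token.
output_ids = model.generate(
    **inputs, max_length=50, pad_token_id=tokenizer.eos_token_id
)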

# Decode the entire sequence of token IDs into a single string
generated_text = tokenizer.decode(output_ids[0])

print("--- Text Generated with model.generate() ---")
display(Markdown(generated_text))

--- Text Generated with model.generate() ---


Machine learning is a field of research that has been growing in the past few years.
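
One reason the manual loop is inefficient is that it passes the entire sequence through the model at every step. By default, `.generate()` caches intermediate results for earlier tokens, so after the first step only the newest token has to be processed. The sketch below counts the token positions fed to the model under each strategy. It is a rough proxy for the work done, not a timing measurement, and the function names are illustrative.

```python
def positions_without_cache(prompt_len, new_tokens):
    """Each step re-processes the prompt plus every token generated so far."""
    return sum(prompt_len + step for step in range(new_tokens))


def positions_with_cache(prompt_len, new_tokens):
    """The first step processes the prompt; each later step processes one token."""
    if new_tokens == 0:
        return 0
    return prompt_len + (new_tokens - 1)


# 6 prompt tokens and 5 generated tokens, as in the manual loop from Step 3
print(positions_without_cache(6, 5))  # 6 + 7 + 8 + 9 + 10 = 40
print(positions_with_cache(6, 5))     # 6 + 1 + 1 + 1 + 1 = 10
```

The gap widens as the sequence grows, because the uncached count increases quadratically with the number of generated tokens while the cached count increases linearly.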

This demo shows the core logic of how a language model generates text. Now you're ready to try it yourself!
