# Unit 1

## Extracting Log Probabilities for Tokens

# Introduction to Log Probabilities in LLMs

Welcome to the first lesson of the course **"Scoring LLM Outputs with Logprobs and Perplexity."** In this lesson, we will explore the concept of **log probabilities** in language models. Log probabilities help measure how likely a model considers each possible next word or token given a specific prompt. This internal signal is essential for understanding how models like GPT-3.5 make predictions and how confident they are about each token.

By the end of this lesson, you will be able to extract and interpret log probabilities from a model’s response and get a deeper look into how it evaluates different word choices.

-----

## What Are Log Probabilities?

A **log probability** is the natural logarithm of a probability value. In the context of language models, the model assigns a probability ($$p$$) to each possible next token given the preceding context. The log probability is then calculated as:

$$\text{logprob} = \log(p)$$

where $$\log$$ denotes the natural logarithm (base $$e$$). Since probabilities ($$p$$) are always between 0 and 1, their log probabilities are always negative or zero.
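
As a quick check of this definition, the sketch below converts a few hand-picked probabilities (illustrative values, not model output) into log probabilities with Python's standard `math` module and then back again.

```python
import math

# Illustrative probabilities (not taken from a real model response)
probabilities = [1.0, 0.5, 0.1, 0.001]

for p in probabilities:
    logprob = math.log(p)          # natural logarithm of the probability
    recovered = math.exp(logprob)  # exp() inverts the natural logarithm
    print(f"p={p:<6} logprob={logprob:8.3f}  exp(logprob)={recovered:.3f}")
```

Only `p = 1.0` yields a log probability of exactly zero; every smaller probability yields a negative value.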

### Why use log probabilities?

  * Log probabilities are numerically more stable, especially when dealing with very small probabilities (as is common in language modeling).
  * They make it easier to combine probabilities across a sequence, since multiplying probabilities corresponds to adding their log probabilities:

$$\log(p_1 \times p_2 \times \dots \times p_n) = \log(p_1) + \log(p_2) + \dots + \log(p_n)$$

Log probabilities are widely used in evaluating and comparing model outputs, such as in perplexity calculations and sequence scoring.

In summary, log probabilities provide a convenient and robust way to represent and manipulate the likelihoods assigned by language models to tokens and sequences.
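
The stability argument can be seen in a small experiment. The sketch below assumes a hypothetical 400-token sequence in which every token has probability 0.01; the numbers are chosen only to trigger floating-point underflow and do not come from a real model.

```python
import math

# Hypothetical sequence: 400 tokens, each assigned probability 0.01
token_probs = [0.01] * 400

# Multiplying raw probabilities underflows to 0.0 in double precision
product = 1.0
for p in token_probs:
    product *= p

# Summing log probabilities keeps the value representable
total_logprob = sum(math.log(p) for p in token_probs)

print(f"Product of probabilities: {product}")
print(f"Sum of log probabilities: {total_logprob:.2f}")
```

The product collapses to zero and loses all information, while the sum of log probabilities remains an ordinary finite number that can still be compared with other sequences.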

-----

## Setting Up the Environment

To follow along, ensure you have the **OpenAI** library installed. If you're using your local environment, run:

```bash
pip install openai
```

If you're working inside the CodeSignal environment, the necessary libraries are already installed for you.

-----

## Understanding the Code Structure

Let’s take a look at a code snippet that extracts log probabilities using OpenAI’s API. We’ll use a more open-ended prompt that can yield multiple possible completions. This gives us a better opportunity to inspect how confident the model is in its different predictions.

Notice that we set `max_tokens=1` in the API call. This is essential because it tells the model to generate only a single token as output. By limiting the output to one token, we can clearly examine the log probabilities for just the immediate next token, making it much easier to interpret the model’s confidence and the alternatives at that specific position. If we allowed more tokens, the response would include log probabilities for each generated token in the sequence, which could make the analysis more complex and less focused for this introductory example.

Before iterating through the log probability data, it’s important to understand the structure of `response.choices[0].logprobs.content`. This object is a list, where each element corresponds to a generated token. Each element contains:

  * **token:** the generated token (e.g., "apple")
  * **logprob:** the log probability of the generated token itself
  * **top_logprobs:** a list of the top alternative tokens and their log probabilities for that position. Each entry in `top_logprobs` has:
      * **token:** the alternative token
      * **logprob:** the log probability assigned to that token

This structure allows you to see not only the token the model generated, but also the model’s confidence in the top alternative tokens at that position.
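
To make this shape concrete without calling the API, the following sketch models it with plain dataclasses. `TopLogprob` and `TokenInfo` are hypothetical stand-ins rather than the SDK's actual classes, and the values are made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class TopLogprob:          # hypothetical stand-in for one alternative token
    token: str
    logprob: float

@dataclass
class TokenInfo:           # hypothetical stand-in for one generated token
    token: str
    logprob: float
    top_logprobs: list[TopLogprob] = field(default_factory=list)

# A made-up "content" list with a single generated token
content = [
    TokenInfo(
        token="apple",
        logprob=-0.4,
        top_logprobs=[
            TopLogprob("apple", -0.4),
            TopLogprob("banana", -1.6),
            TopLogprob("orange", -2.5),
        ],
    )
]

# Same traversal pattern as with a real response object
for token_info in content:
    print(f"Generated token: {token_info.token}")
    for alt in token_info.top_logprobs:
        print(f"  {alt.token} → {alt.logprob:.3f}")
```

Because the mock exposes the same attribute names, the loop here mirrors the one used on a real response.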

```python
from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "A common fruit people eat is"}],
    max_tokens=1,
    temperature=1.0,
    logprobs=True,      # Request log probabilities for generated tokens
    top_logprobs=5      # Return the top 5 most likely tokens at each position
)
print("Top token probabilities:")
try:
    # response.choices[0].logprobs.content is a list of token info objects.
    # Each token info object contains:
    #   - token: the generated token
    #   - top_logprobs: a list of alternative tokens and their logprobs
    if response.choices[0].logprobs and response.choices[0].logprobs.content:
        for token_info in response.choices[0].logprobs.content:
            print(f"\nGenerated token: {token_info.token}")
            if token_info.top_logprobs:
                for alt in token_info.top_logprobs:
                    print(f"  {alt.token} → {alt.logprob:.3f}")
    else:
        print("No token probabilities returned.")
except Exception as e:
    print(f"An error occurred: {e}")
```

-----

## Extracting and Interpreting Log Probabilities

Once the model responds, we inspect the returned token and its associated alternatives. Each alternative token is associated with a **log probability**, which represents the model’s confidence. Log probabilities are typically negative, with values closer to zero indicating higher confidence. The more negative the log probability, the less likely the model considers that token as the next word.

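For example, you may see output like the following (the values are illustrative; in a real response, the probabilities `exp(logprob)` of the listed tokens sum to at most 1):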

```text
Generated token: apple
  apple → -0.018
  banana → -0.576
  orange → -1.043
  grapes → -1.832
  fruit → -2.067
```

In this example, "apple" has the highest confidence with a log probability of -0.018, making it the most likely token. "Banana" follows with a log probability of -0.576, indicating it is less likely than "apple" but more likely than "orange", "grapes", and "fruit". Understanding these values helps in assessing the model's prediction confidence and the relative likelihood of different tokens.
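
Converting log probabilities back to probabilities often makes a ranking like this easier to read. The sketch below uses hypothetical log probabilities (chosen so that they form a valid distribution, not the values printed above) and reports how much probability mass is left for tokens outside the listed alternatives.

```python
import math

# Hypothetical logprobs for the top alternatives at one position
top_logprobs = {"apple": -0.4, "banana": -1.6, "orange": -2.5}

# exp() turns each log probability back into a probability
probs = {token: math.exp(lp) for token, lp in top_logprobs.items()}

for token, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
    print(f"{token:<8} logprob={top_logprobs[token]:6.3f}  p={p:.1%}")

# Probability mass on tokens that are not in the listed alternatives
remaining = 1.0 - sum(probs.values())
print(f"Remaining mass on other tokens: {remaining:.1%}")
```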

-----

## Visualizing Log Probabilities and Probabilities

To better understand the relationship between log probabilities and probabilities, you can visualize both for the top tokens using a simple plot. This helps you see that the mapping is exponential: each probability is `exp(logprob)`, so a fixed difference in log probability corresponds to a fixed ratio between probabilities.

```python
import matplotlib.pyplot as plt
import numpy as np

# Example log probabilities returned by the model
token_probs = {
    "apple": -0.018,
    "banana": -0.576,
    "orange": -1.043,
    "grapes": -1.832,
    "fruit": -2.067
}
tokens = list(token_probs.keys())
log_probs = np.array(list(token_probs.values()))
probs = np.exp(log_probs)  # Convert logprobs to probabilities

x = np.arange(len(tokens))

plt.figure(figsize=(10, 6))

# Plot probabilities
plt.plot(x, probs, marker='o', linestyle='-', color='navy', label='Probability (exp(log(p)))')

# Plot log probabilities
plt.plot(x, log_probs, marker='s', linestyle='--', color='darkgreen', label='Log Probability (log(p))')

# Annotate each point with its value
for i in x:
    plt.text(i, probs[i] + 0.01, f"p={probs[i]:.3f}", ha='center', va='bottom', fontsize=9, color='blue')
    plt.text(i, log_probs[i] - 0.1, f"log(p)={log_probs[i]:.3f}", ha='center', va='top', fontsize=9, color='green')

plt.xticks(x, tokens)
plt.title("Log Probabilities vs Probabilities of Tokens", fontsize=14)
plt.xlabel("Tokens")
plt.ylabel("Value")
plt.grid(True, linestyle='--', alpha=0.5)
plt.legend()
plt.tight_layout()
plt.show()
```

The output plot looks like this:

This plot shows:

  * The log probability for each token (dashed green line)
  * The corresponding probability (solid blue line)
  * Value annotations for both log probability and probability

Notice how the token with the log probability closest to zero ("apple") also has the highest probability, and how the probabilities fall off exponentially as the log probabilities become more negative. This visualization can help you build intuition for interpreting log probabilities in practice.
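
One practical way to read a gap between two log probabilities is as a ratio of probabilities, since $$\exp(\log p_a - \log p_b) = p_a / p_b$$. The sketch below applies this to the "apple" and "banana" values from the example output; `likelihood_ratio` is a hypothetical helper name, not part of any library.

```python
import math

def likelihood_ratio(logprob_a: float, logprob_b: float) -> float:
    """Return p_a / p_b given the two log probabilities."""
    return math.exp(logprob_a - logprob_b)

# Log probabilities of "apple" and "banana" from the example output
ratio = likelihood_ratio(-0.018, -0.576)
print(f"'apple' is about {ratio:.2f}x as likely as 'banana'")
```

The same gap in log probability always gives the same ratio, regardless of where the two values sit on the scale.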

-----

## Summary and Next Steps

In this lesson, you learned what log probabilities are, how to retrieve them using the OpenAI API, and how to interpret them in practice. This technique is useful for gaining deeper insight into model behavior and understanding how confident it is in its generated outputs.

In the next lesson, we’ll build on this by using log probabilities to compare sentence likelihoods—giving us a tool for scoring how "natural" different sentences sound to a language model.

Stay curious and experiment with different prompts to see how token predictions vary!

## Fixing Token Probability Display Code

Now that you understand what log probabilities are and how they help us peek into a language model's "thinking," let's put that knowledge into practice! In this exercise, you'll fix a piece of code that extracts log probabilities from an OpenAI model but has some issues with processing and displaying the results.

Your tasks are to:

  * Fix the loop that iterates through tokens in the response.
  * Correct how the code accesses log probability data.
  * Format the output to show `token → logprob` with 3 decimal places.
  * Add proper error handling for when no tokens are returned.

By completing this exercise, you'll gain hands-on experience with extracting and interpreting log probabilities, a fundamental skill for analyzing model confidence that we'll build upon in future lessons.

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Name a color that people often wear:"}],
    max_tokens=1,
    temperature=1.0,
    logprobs=True,
    top_logprobs=5
)

print("Top token probabilities:")
try:
    if response.choices[0].logprobs:
        # TODO: Fix this loop to correctly iterate through the tokens in the response
        for token in response.choices[0].logprobs:
            print(f"\nGenerated token: {token}")
            # TODO: Fix this loop to correctly access the top log probabilities
            for alt_token, prob in token.items():
                # TODO: Update this print statement to format as "token → logprob" with 3 decimal places
                print(f"  {alt_token}: {prob}")
    # TODO: Add an else clause to handle cases when no token probabilities are returned
except Exception as e:
    print(f"An error occurred: {e}")
```

A corrected version might look like this:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Name a color that people often wear:"}],
    max_tokens=1,
    temperature=1.0,
    logprobs=True,
    top_logprobs=5
)

print("Top token probabilities:")
try:
    if response.choices[0].logprobs and response.choices[0].logprobs.content:
        # Fixed: Correctly iterate through the list of token information objects
        for token_info in response.choices[0].logprobs.content:
            print(f"\nGenerated token: {token_info.token}")
            
            # Fixed: Correctly access the top log probabilities, which is a list of objects
            if token_info.top_logprobs:
                for alt in token_info.top_logprobs:
                    # Fixed: Correctly format the output using the object's attributes
                    print(f"  {alt.token} → {alt.logprob:.3f}")
    else:
        # Added: Handle the case where no token probabilities are returned
        print("No token probabilities returned.")
except Exception as e:
    print(f"An error occurred: {e}")
```

## Making Token Probabilities Dynamic

Excellent work on fixing the token probability display code! Now, let's make our code more flexible by allowing it to show different numbers of alternative tokens. Currently, we're always requesting the top 5 token probabilities, but what if we want to see more or fewer options?

In this exercise, you'll modify the code to use a variable for the number of top log probabilities instead of a hardcoded value. This will allow you to easily experiment with different settings to see how the model ranks its token choices.

Your tasks are to:

  * Create a variable to control how many top token probabilities to return.
  * Update the API call to use this variable instead of the fixed value.
  * Run your code with different values (3, 5, and 7) to observe the differences.
  * Make sure the output message reflects the number you've chosen.

By making this improvement, you'll gain a more flexible tool for exploring model predictions at different levels of detail, an important skill for analyzing model behavior in various contexts.

```python
from openai import OpenAI

client = OpenAI()

# TODO: Create a variable to control how many top token probabilities to return

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Name a color that people often wear:"}],
    max_tokens=1,
    temperature=1.0,
    logprobs=True,
    # TODO: Replace the hardcoded value with your variable
    top_logprobs=5
)

print("Top token probabilities:")
try:
    if response.choices[0].logprobs and response.choices[0].logprobs.content:
        for token_info in response.choices[0].logprobs.content:
            print(f"\nGenerated token: {token_info.token}")
            if token_info.top_logprobs:
                for alt in token_info.top_logprobs:
                    print(f"  {alt.token} → {alt.logprob:.3f}")
    else:
        print("No token probabilities returned.")
except Exception as e:
    print(f"An error occurred: {e}")

# TODO: Run your code with different values (3, 5, and 7) for the number of top log probabilities
# TODO: Compare the results and notice how the number of alternative tokens changes

```

One possible solution:

```python
from openai import OpenAI

client = OpenAI()

# Task 1: Create a variable to control how many top token probabilities to return
num_top_logprobs = 5  # You can change this to 3, 5, or 7 to test

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Name a color that people often wear:"}],
    max_tokens=1,
    temperature=1.0,
    logprobs=True,
    # Task 2: Replace the hardcoded value with your variable
    top_logprobs=num_top_logprobs
)

# Task 4: Make the output message reflect the number chosen
print(f"Top {num_top_logprobs} token probabilities:")
try:
    if response.choices[0].logprobs and response.choices[0].logprobs.content:
        for token_info in response.choices[0].logprobs.content:
            print(f"\nGenerated token: {token_info.token}")
            if token_info.top_logprobs:
                for alt in token_info.top_logprobs:
                    print(f"  {alt.token} → {alt.logprob:.3f}")
    else:
        print("No token probabilities returned.")
except Exception as e:
    print(f"An error occurred: {e}")

# Task 3: Run your code with different values (3, 5, and 7) for the number of top log probabilities
# When you set num_top_logprobs = 3, you will see the top 3 alternatives.
# When you set num_top_logprobs = 5, you will see the top 5 alternatives.
# When you set num_top_logprobs = 7, you will see the top 7 alternatives.
# By changing this single variable and rerunning the script, you can easily control the number of returned probabilities.
```
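
One way to make this setting reusable is to build the request arguments in a small helper that validates the value before any API call is made. In the sketch below, `build_logprob_request` is a hypothetical helper, and the 0 to 20 range is an assumption about the limit on `top_logprobs`; check the current API reference before relying on it.

```python
def build_logprob_request(prompt: str, num_top_logprobs: int = 5) -> dict:
    """Return keyword arguments for client.chat.completions.create()."""
    # Assumed limit; verify against the current API reference
    if not 0 <= num_top_logprobs <= 20:
        raise ValueError("num_top_logprobs must be between 0 and 20")
    return {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1,
        "temperature": 1.0,
        "logprobs": True,
        "top_logprobs": num_top_logprobs,
    }

for n in (3, 5, 7):
    kwargs = build_logprob_request("Name a color that people often wear:", n)
    print(f"top_logprobs={kwargs['top_logprobs']}")
    # response = client.chat.completions.create(**kwargs)  # requires an API key
```

Keeping the request construction separate from the call also makes it possible to test the settings without network access.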

## Filtering Tokens by Probability Threshold


Now that you've made your token probability code more flexible, let's take it a step further by focusing on the most relevant information. In real-world applications, we often want to filter out low-probability tokens and focus only on the most likely options.

In this exercise, you'll add a filtering mechanism to your code that shows only tokens with log probabilities above a certain threshold. This helps clean up your output and allows you to focus on what matters most.

Your tasks are to:

  * Add a threshold variable (try starting with -1.0).
  * Modify the display loop to show only tokens above this threshold.
  * Track how many tokens pass your filter.
  * Add a summary showing how many tokens were filtered out.

Try experimenting with different threshold values (-0.5, -1.0, -2.0) to see how they affect your results. This filtering technique will help you develop a more practical understanding of how to interpret and use log probabilities in your analytical work.


```python
from openai import OpenAI

client = OpenAI()

# TODO: Add a threshold variable for filtering log probabilities

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Name a color that people often wear:"}],
    max_tokens=1,
    temperature=1.0,
    logprobs=True,
    top_logprobs=5
)

print("Top token probabilities:")
try:
    if response.choices[0].logprobs and response.choices[0].logprobs.content:
        for token_info in response.choices[0].logprobs.content:
            print(f"\nGenerated token: {token_info.token}")
            if token_info.top_logprobs:
                # TODO: Add variables to track total and filtered token counts
                
                for alt in token_info.top_logprobs:
                    # TODO: Add a conditional check to only print tokens above the threshold
                    print(f"  {alt.token} → {alt.logprob:.3f}")
                    # TODO: Update the counter for filtered tokens if needed
                
                # TODO: Add a summary showing how many tokens were filtered out
    else:
        print("No token probabilities returned.")
except Exception as e:
    print(f"An error occurred: {e}")

# Try different threshold values (-0.5, -1.0, -2.0) to see how they affect the results

```

One possible solution:

```python
from openai import OpenAI

client = OpenAI()

# Task 1: Add a threshold variable for filtering log probabilities
logprob_threshold = -1.0  # You can change this to -0.5, -1.0, or -2.0

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Name a color that people often wear:"}],
    max_tokens=1,
    temperature=1.0,
    logprobs=True,
    top_logprobs=5
)

print(f"Top token probabilities (filtered with a threshold of {logprob_threshold}):")
try:
    if response.choices[0].logprobs and response.choices[0].logprobs.content:
        for token_info in response.choices[0].logprobs.content:
            print(f"\nGenerated token: {token_info.token}")
            if token_info.top_logprobs:
                # Task 2: Add variables to track total and filtered token counts
                total_tokens = len(token_info.top_logprobs)
                filtered_tokens_count = 0
                
                print("Tokens above threshold:")
                
                for alt in token_info.top_logprobs:
                    # Task 3: Add a conditional check to only print tokens above the threshold
                    if alt.logprob >= logprob_threshold:
                        print(f"  {alt.token} → {alt.logprob:.3f}")
                    else:
                        # Task 4: Update the counter for filtered tokens
                        filtered_tokens_count += 1
                
                # Task 5: Add a summary showing how many tokens were filtered out
                shown_tokens_count = total_tokens - filtered_tokens_count
                print(f"\nSummary: {shown_tokens_count} of {total_tokens} tokens shown, {filtered_tokens_count} filtered out.")
    else:
        print("No token probabilities returned.")
except Exception as e:
    print(f"An error occurred: {e}")

# Try different threshold values (-0.5, -1.0, -2.0) to see how they affect the results
# A threshold of -0.5 keeps tokens with p >= ~0.61, so at most one token can pass.
# A threshold of -1.0 keeps tokens with p >= ~0.37, so at most two tokens can pass.
# A threshold of -2.0 keeps tokens with p >= ~0.14, so all five alternatives could pass in principle.
```
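
The filtering step can also be tried offline as a small pure function. The sketch below uses hypothetical `(token, logprob)` pairs instead of a live response, and `filter_by_threshold` is an illustrative helper name; it also prints the probability cutoff that each log probability threshold corresponds to.

```python
import math

def filter_by_threshold(alternatives, logprob_threshold):
    """Split (token, logprob) pairs into kept and dropped lists."""
    kept = [(t, lp) for t, lp in alternatives if lp >= logprob_threshold]
    dropped = [(t, lp) for t, lp in alternatives if lp < logprob_threshold]
    return kept, dropped

# Hypothetical alternatives for one position (not real model output)
alternatives = [("black", -0.5), ("blue", -1.4), ("white", -2.3), ("red", -3.2)]

for threshold in (-0.5, -1.0, -2.0):
    kept, dropped = filter_by_threshold(alternatives, threshold)
    min_prob = math.exp(threshold)  # equivalent probability cutoff
    print(f"threshold={threshold} (p >= {min_prob:.3f}): "
          f"kept {len(kept)}, filtered out {len(dropped)}")
```

Thinking of the threshold as a probability cutoff can make it easier to choose: because the probabilities at one position sum to at most 1, a strict threshold can only ever admit a small number of tokens.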