<a href="https://colab.research.google.com/github/ysuter/FHNW-BAI-DeepLearning/blob/main/LLM_BusinessAI_Colab_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Large Language Models

You will:
- Explore **tokenization** for language models
- Generate text with a **small GPT-2 model**
- Experiment with **temperature** and **top-k sampling**

> 💡 You can run this notebook on Google Colab. Just upload it and run the cells from top to bottom.


## Learning goals

By the end of this notebook, you should be able to:

1. Explain how a **tokenizer** transforms text into model inputs.
2. Run **inference** with a small, pretrained LLM (GPT-2).
3. Understand how **temperature** and **top-k** affect generation.
4. Critically reflect on when LLM outputs are **useful** vs. **unreliable** in business settings.


## 1️⃣ Setup

Run the cell below to install and import the required libraries.

If you're on Google Colab, this should work out of the box.

In [None]:
!pip install -q transformers accelerate sentencepiece torch

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"
device

## 2️⃣ Tokenization: From text to tokens

Language models don't see raw text. They see **tokens**: pieces of text (subwords, whole words, or sometimes bytes), each represented by an integer ID.

In this section you will:

- Inspect how the tokenizer splits a sentence
- See the difference between **text**, **tokens**, and **token IDs**

In [None]:
# Load a tokenizer (GPT-2-style)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Explain market segmentation in one sentence."
print("Original text:")
print(text)
print()

# Tokenize
encoded = tokenizer(text, return_tensors="pt")
input_ids = encoded["input_ids"][0]

print("Token IDs:")
print(input_ids.tolist())
print()

print("Tokens:")
print([tokenizer.decode([tid]) for tid in input_ids])
print()

print(f"Number of tokens: {len(input_ids)}")
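
To see why one word can turn into several tokens, the sketch below splits text with a toy greedy longest-match rule over a made-up vocabulary. This is an illustration only: `TOY_VOCAB` and `toy_tokenize` are hypothetical names, and GPT-2's real tokenizer uses a learned byte-level BPE vocabulary, not this rule.

```python
# Toy illustration only: a made-up vocabulary, NOT GPT-2's real one.
TOY_VOCAB = {"seg": 0, "ment": 1, "ation": 2, "market": 3, " ": 4, "s": 5, "<unk>": 6}

def toy_tokenize(text, vocab):
    """Greedy longest-match split of `text` into pieces found in `vocab`."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No known piece starts here: emit the unknown token for one character.
            tokens.append("<unk>")
            i += 1
    return tokens

toy_tokens = toy_tokenize("market segmentation", TOY_VOCAB)  # text -> tokens
toy_ids = [TOY_VOCAB[t] for t in toy_tokens]                 # tokens -> token IDs
print(toy_tokens)  # ['market', ' ', 'seg', 'ment', 'ation']
print(toy_ids)     # [3, 4, 0, 1, 2]
```

The word "segmentation" is not in the toy vocabulary, so it is covered by three smaller pieces. Real subword tokenizers behave similarly for rare words.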

👉 **Questions to discuss (or think about):**
- Does the tokenizer split by **words**, **subwords**, or something else?
- What happens if you change the input text slightly (e.g., add punctuation, numbers, emojis)?
- Why might a business care about token length (hint: API costs, context window limits)?
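
As a rough sketch of the last question, the helper below turns token counts into a cost estimate and checks them against a context window. The price is a made-up placeholder, not a real provider's rate, and the function names are hypothetical. The 1,024-token window matches the GPT-2 model used in this notebook; other models have different limits, and many providers price input and output tokens separately.

```python
# Assumptions for illustration: a made-up price and GPT-2's 1,024-token context window.
HYPOTHETICAL_PRICE_PER_1K_TOKENS = 0.002  # placeholder value, not a real rate
CONTEXT_WINDOW = 1024                     # maximum tokens GPT-2 can attend to

def estimate_cost(n_prompt_tokens, n_output_tokens,
                  price_per_1k=HYPOTHETICAL_PRICE_PER_1K_TOKENS):
    """Estimated cost of one request when every token costs the same."""
    total_tokens = n_prompt_tokens + n_output_tokens
    return total_tokens / 1000 * price_per_1k

def fits_in_context(n_prompt_tokens, n_output_tokens, context_window=CONTEXT_WINDOW):
    """True if the prompt plus the generated tokens fit in the context window."""
    return n_prompt_tokens + n_output_tokens <= context_window

# Example: an 8-token prompt with 40 generated tokens, sent 10,000 times.
per_request = estimate_cost(8, 40)
print(f"Cost per request: {per_request:.6f}")
print(f"Cost for 10,000 requests: {per_request * 10_000:.2f}")
print("Fits in context:", fits_in_context(8, 40))
```

To use real numbers, replace the token counts with `len(input_ids)` from the tokenization cell and the price with your provider's published rate.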

## 3️⃣ Generating text with a small LLM (GPT-2)

Now we load a **pretrained GPT-2 model** and ask it to generate text.

> ⚠️ GPT-2 is relatively small and **not instruction-tuned**, so it continues the prompt rather than following it as an instruction, and its answers may be short, generic, or odd. That's fine: it's perfect for learning the mechanics.

In [None]:
# Load a small GPT-2 model
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)
model.eval()

prompt = "Explain deep learning in simple terms:"

inputs = tokenizer(prompt, return_tensors="pt").to(device)
print("Prompt:")
print(prompt)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        temperature=0.7,
        top_k=50,
    )

generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print("\nGenerated text:")
print(generated_text)

👉 **Try this:**
- Change the `prompt` to something else (e.g., *"Describe the concept of churn in marketing analytics."*)
- Increase `max_new_tokens` to 100. What happens?
- Remove `do_sample=True` and sampling parameters. How does the output change?
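
The last experiment can be previewed without a model. With sampling off, `generate` falls back to greedy decoding by default (assuming beam search is not configured), which always picks the single most likely next token. The sketch below contrasts that with sampling on a made-up next-token distribution. The probabilities and function names are hypothetical.

```python
import random

# Hypothetical next-token probabilities, made up for illustration.
next_token_probs = {" the": 0.5, " a": 0.3, " our": 0.15, " zebra": 0.05}

def pick_greedy(probs):
    """Greedy decoding: always return the most likely token."""
    return max(probs, key=probs.get)

def pick_sampled(probs, rng):
    """Sampling: draw a token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
greedy_picks = {pick_greedy(next_token_probs) for _ in range(20)}
sampled_picks = {pick_sampled(next_token_probs, rng) for _ in range(20)}

print("Greedy picks: ", greedy_picks)   # always the same single token
print("Sampled picks:", sampled_picks)  # typically several different tokens
```

This is why rerunning the generation cell without sampling gives the same text every time, while the sampled version varies from run to run.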

## 4️⃣ Experiment: Temperature and top-k sampling

Two important knobs when generating text:

- **Temperature**: rescales the next-token probabilities, which controls how *random* the sampling is.
  - Low temperature (e.g., 0.2) → more **deterministic** and safer, but often repetitive.
  - High temperature (e.g., 1.2) → more **creative**, but also more chaotic.
- **Top-k**: the model only samples from the **k most likely** next tokens.
  - Small k (e.g., 10) → conservative.
  - Large k (e.g., 100) → more diverse.
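
The effect of both knobs on a single sampling step can be sketched in plain Python. The logits below are made up, and `next_token_distribution` is a hypothetical helper rather than part of `transformers`. It keeps the k highest-scoring tokens, divides their logits by the temperature, and applies a softmax. For a positive temperature the order of filtering and scaling does not change the result, because scaling preserves the ranking.

```python
import math

def next_token_distribution(logits, temperature=1.0, top_k=None):
    """Return sampling probabilities after top-k filtering and temperature scaling."""
    # Keep only the k highest-scoring tokens (all tokens if top_k is None).
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    kept = ranked[:top_k] if top_k else ranked
    # Divide each logit by the temperature, then apply a numerically stable softmax.
    scaled = [(tok, score / temperature) for tok, score in kept]
    max_score = max(score for _, score in scaled)
    exps = [(tok, math.exp(score - max_score)) for tok, score in scaled]
    total = sum(e for _, e in exps)
    return {tok: e / total for tok, e in exps}

# Hypothetical logits for four candidate next tokens.
toy_logits = {" fast": 2.0, " smart": 1.5, " simple": 1.0, " purple": -1.0}

for t in (0.2, 0.7, 1.2):
    dist = next_token_distribution(toy_logits, temperature=t, top_k=3)
    print(t, {tok: round(p, 3) for tok, p in dist.items()})
```

With `top_k=3` the unlikely token " purple" can never be sampled, and as the temperature rises the probability mass spreads more evenly across the remaining three tokens.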

Let's compare different settings side by side.


In [None]:
def generate_with_settings(prompt, temperature=0.7, top_k=50, max_new_tokens=40):
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
            top_k=top_k,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

prompt = "Write a short product description for an AI-powered analytics dashboard for business managers."

print("Prompt:")
print(prompt)
print("\n" + "="*80 + "\n")

for temp in [0.2, 0.7, 1.2]:
    print(f"--- Temperature = {temp} ---")
    text = generate_with_settings(prompt, temperature=temp, top_k=50)
    print(text)
    print("\n" + "-"*80 + "\n")

👉 **Discussion:**
- How do the outputs change as temperature increases?
- Which output looks most useful for a **marketing website**?
- Which output would you trust to send to a **client** without editing?