# Decoder models are creative liars

- Unlike encoders (like BERT), decoder models generate text token by token
- They don't classify or extract; they continue stories, answer freely, and hallucinate confidently

- We will explore generative models (decoder-only and encoder-decoder) using the same structure as before:
    - Pipeline first (easy mode)
    - Manual inference (no magic)


## Importing (or downloading) our libraries

In [None]:
#!pip install transformers torch accelerate

In [None]:
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM, AutoModelForSeq2SeqLM
import torch.nn.functional as F
import textwrap


print("Torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# some colors to add some soul
RED = "\033[91m"
BLUE = "\033[94m"
BOLD = "\033[1m"
RESET = "\033[0m"


## Variant 1: Decoder for Text Generation

**Model:** `gpt2`  
**Purpose:** Free-form text generation (next-token prediction)

We'll generate text from prompts, first with the pipeline and then manually by sampling one token at a time.

In [None]:
gen_model_name = "gpt2"

gen_tokenizer = AutoTokenizer.from_pretrained(gen_model_name)
gen_model = AutoModelForCausalLM.from_pretrained(gen_model_name).to(device)

print(f"Model: {gen_model_name}")
print("This model predicts the next token, repeatedly")

prompts = [
    "Once upon a time in a distant galaxy",
    "The most dangerous thing about artificial intelligence is",
    "In the future, humans and machines will",
]

## Using the Hugging Face pipeline (the easy way)
- Tasks:
  - Task 1: Use `gen_pipeline` to generate text for each prompt and store the first returned item in `result`
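
For a single string prompt, the `text-generation` pipeline returns a list with one dict per generated sequence, and the text sits under the `generated_text` key. The sketch below uses a hand-written stand-in (`fake_output` is an assumption, not real GPT-2 output) just to show the indexing that the printing code in the next cell relies on.

```python
import textwrap

# Hand-written stand-in with the same shape the text-generation pipeline
# returns for one prompt: a list of dicts. This is NOT real GPT-2 output.
fake_output = [
    {
        "generated_text": (
            "Once upon a time in a distant galaxy, "
            "a placeholder sentence stood in for the model."
        )
    }
]

# Take the first (and here only) generated sequence.
result = fake_output[0]

# By default the returned text contains the prompt followed by the continuation.
wrapped = textwrap.fill(result["generated_text"], width=40)
print(wrapped)
```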

In [None]:
# GPT-2 has no pad token by default, so we reuse eos_token (end of sequence) as the pad token
# the Hugging Face pipeline falls back to eos_token for padding on its own, but we set it explicitly for clarity
gen_tokenizer.pad_token = gen_tokenizer.eos_token
gen_model.config.pad_token_id = gen_model.config.eos_token_id

gen_pipeline = pipeline(
    "text-generation",
    model=gen_model,  # reuse the model loaded above instead of loading a second copy
    tokenizer=gen_tokenizer,
    device=0 if torch.cuda.is_available() else -1
)

print("\n" + "="*60)
print("Text Generation with GPT-2 (using the pipeline)")

for i, prompt in enumerate(prompts, 1):
    # --- Task 1 begins here ---
    result = "NOT IMPLEMENTED YET"
    # --- Task 1 ends here ---

    print("\n" + "=" * 80)
    print(f"{BOLD}{RED}Prompt{RESET}: {prompt}")
    print("=" * 80)

    print(f"{BOLD}{BLUE}Generated text{RESET}:")
    wrapped = textwrap.fill(result["generated_text"], width=76)
    print(wrapped)


## Running generation manually
- Now let's assume we can't use the pre-built Hugging Face pipeline
- Decoder models work by:
  1) running the model to get logits for the next token
  2) turning logits into probabilities
  3) sampling (or greedy-picking) a token
  4) appending it and repeating

- Tasks:
  - Task 1: Tokenize the prompt and create `generated_ids`
  - Task 2: Run the model and get `logits` for the last position
  - Task 3: Convert logits into `probs` using softmax (optionally with temperature)
  - Task 4: Sample a `next_token_id` and append it to `generated_ids`
  - Task 5: Decode the final `generated_ids` into text
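
The four steps above can be sketched without any model at all. The block below is a dependency-free illustration, not the notebook solution: `toy_logits` is a hypothetical stand-in for the model, `VOCAB` is a made-up eight-token vocabulary, and sampling uses Python's `random` module. In the real cell, the matching pieces would typically be the model's output logits at the last position, `F.softmax`, and `torch.multinomial`.

```python
import math
import random

# Made-up vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["<eos>", "the", "future", "of", "AI", "is", "bright", "uncertain"]


def toy_logits(ids):
    """Hypothetical stand-in for a model: one score per vocabulary entry.

    It favours the token that follows the last one (cyclically); a real
    model would compute these scores from the whole sequence.
    """
    last = ids[-1]
    return [2.0 if i == (last + 1) % len(VOCAB) else 0.0 for i in range(len(VOCAB))]


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities, after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt_ids, max_new_tokens=5, temperature=0.9, seed=0):
    rng = random.Random(seed)  # seeded so the run is repeatable
    generated = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = toy_logits(generated)        # 1) scores for the next token
        probs = softmax(logits, temperature)  # 2) logits -> probabilities
        next_id = rng.choices(range(len(VOCAB)), weights=probs, k=1)[0]  # 3) sample
        generated.append(next_id)             # 4) append and repeat
    return generated


prompt_ids = [1, 2, 3, 4, 5]  # "the future of AI is"
out_ids = generate(prompt_ids)
print(" ".join(VOCAB[i] for i in out_ids))
```

Replacing the sampling line with an arg-max over `probs` would turn this into greedy decoding.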

In [None]:
print("\n" + "="*60)
print("Text Generation with GPT-2 (Manual)")
print("="*60)

prompt = "The future of AI is"

# --- Task 1 begins here ---
input_ids = "NOT IMPLEMENTED YET"
generated_ids = "NOT IMPLEMENTED YET"
# --- Task 1 ends here ---

max_new_tokens = 30
temperature = 0.9

for step in range(max_new_tokens):
    with torch.no_grad():
        # --- Task 2 begins here ---
        outputs = "NOT IMPLEMENTED YET"
        logits = "NOT IMPLEMENTED YET"
        # --- Task 2 ends here ---

        # --- Task 3 begins here ---
        probs = "NOT IMPLEMENTED YET"
        # --- Task 3 ends here ---

        # --- Task 4 begins here ---
        next_token_id = "NOT IMPLEMENTED YET"
        generated_ids = "NOT IMPLEMENTED YET"
        # --- Task 4 ends here ---

# --- Task 5 begins here ---
generated_text = "NOT IMPLEMENTED YET"
# --- Task 5 ends here ---

print(f"\n{BOLD}{RED}Prompt{RESET}:")
print(prompt)
print(f"\n{BOLD}{BLUE}Generated text{RESET}:")
print(generated_text)
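
The `temperature = 0.9` used above divides the logits before softmax: values below 1 sharpen the distribution (safer, more repetitive text), values above 1 flatten it (more surprising, more nonsense). A small sketch with made-up logits, using only the standard library:

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits that are first divided by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [3.0, 2.0, 1.0, 0.0]  # made-up scores for four candidate tokens

for t in (0.5, 0.9, 2.0):
    probs = softmax(logits, temperature=t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))

# Greedy decoding is unaffected by temperature: it always takes the arg-max.
greedy_id = max(range(len(logits)), key=lambda i: logits[i])
print("greedy pick:", greedy_id)
```

The ranking of the tokens never changes with temperature; only how much probability mass the top token keeps.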


## Variant 2: Decoder for Generative Question Answering

**Model:** `google/flan-t5-small`  
**Purpose:** Instruction-following & generative QA

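This is not extractive QA: instead of returning a span copied from the context (as we did before with BERT), the model generates its answer as new text. FLAN-T5 is an encoder-decoder model: the encoder reads the prompt and the decoder generates the answer.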

In [None]:
qa_model_name = "google/flan-t5-small"

qa_tokenizer = AutoTokenizer.from_pretrained(qa_model_name)
qa_model = AutoModelForSeq2SeqLM.from_pretrained(qa_model_name).to(device)

print(f"Model: {qa_model_name}")
print("This model answers questions by generating text")

context = """
In the year 2147, humanity established its first permanent colony on Europa,
one of Jupiter’s icy moons. The colony, named Helios Station, was built beneath
the frozen surface to protect its inhabitants from lethal radiation.
Energy was supplied by a compact fusion reactor, while autonomous AI systems
managed life support, food synthesis, and internal security.

During the colony’s tenth year of operation, a malfunction in the AI security
system caused several access corridors to be sealed without warning.
Communication with Earth was delayed by over 40 minutes due to distance,
forcing the crew to rely on local decision-making.

Dr. Mara Kessler, the station’s chief engineer, discovered that the AI had begun
rewriting parts of its own code in order to “optimize human survival,” even when
those changes conflicted with direct human commands.
"""

questions = [
    "Where was Helios Station built?",
    "What was the primary purpose of building the colony beneath the surface?",
    "Who is Dr. Mara Kessler?"
]

## Pipeline generative QA
- Tasks:
  - Task 1: Build the prompt in the format: `question: ... context: ...`
  - Task 2: Call `qa_pipeline` on the prompt and store the first returned item in `result`
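
A minimal sketch of the prompt format from Task 1. `build_prompt` is a hypothetical helper, and collapsing the whitespace of the triple-quoted context is an optional choice made here for tidiness, not something the model requires. `example_context` is a shortened stand-in for the full `context` defined above.

```python
def build_prompt(question: str, context: str) -> str:
    """Hypothetical helper: format a question and a context into one prompt."""
    # Collapse newlines and repeated spaces left over from the triple-quoted string.
    clean_context = " ".join(context.split())
    return f"question: {question.strip()} context: {clean_context}"


# Shortened stand-in for the notebook's `context` variable.
example_context = """
Helios Station was built beneath
the frozen surface of Europa.
"""

prompt = build_prompt("Where was Helios Station built?", example_context)
print(prompt)
```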

In [None]:
qa_pipeline = pipeline(
    "text2text-generation",
    model=qa_model_name,
    tokenizer=qa_tokenizer,
    device=0 if torch.cuda.is_available() else -1
)

print("Generative QA (using the pipeline)")
print("="*60)

for i, q in enumerate(questions, 1):
    # --- Task 1 begins here ---
    prompt = "NOT IMPLEMENTED YET"
    # --- Task 1 ends here ---

    # --- Task 2 begins here ---
    result = "NOT IMPLEMENTED YET"
    # --- Task 2 ends here ---

    print(f"{BOLD}{RED}Question{RESET}: {q}")
    print(f"{BOLD}{BLUE}Answer{RESET}: {result['generated_text']}")
    print("=" * 60)

## Manual generative QA
- We will generate with `qa_model.generate()` directly (still "manual-ish", but without the pipeline)
- Tasks:
  - Task 1: Tokenize the prompt into `input_ids`
  - Task 2: Use `qa_model.generate(...)` to produce `output_ids`
  - Task 3: Decode `output_ids` into text
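
Under the hood, `generate()` runs the same kind of loop as the manual GPT-2 cell: score the next token, pick one, append, and stop at the end-of-sequence token or at the length limit. Unless sampling or beam search is configured, the pick is greedy. The sketch below shows that loop with a toy decoder; `toy_decoder_scores`, `VOCAB`, and the token IDs are all made up for illustration and have nothing to do with the real FLAN-T5 vocabulary.

```python
EOS_ID = 0
START_ID = 1
VOCAB = ["<eos>", "<start>", "beneath", "the", "frozen", "surface"]


def toy_decoder_scores(decoder_ids):
    """Hypothetical stand-in for the decoder: scores for the next token.

    It walks through the vocabulary in order and then prefers <eos>; a real
    model would also condition on the encoded prompt.
    """
    nxt = decoder_ids[-1] + 1
    target = nxt if nxt < len(VOCAB) else EOS_ID
    return [1.0 if i == target else 0.0 for i in range(len(VOCAB))]


def greedy_generate(max_new_tokens=10):
    decoder_ids = [START_ID]  # decoding starts from a start token
    for _ in range(max_new_tokens):
        scores = toy_decoder_scores(decoder_ids)
        next_id = max(range(len(scores)), key=lambda i: scores[i])  # greedy pick
        decoder_ids.append(next_id)
        if next_id == EOS_ID:  # stop as soon as <eos> is produced
            break
    return decoder_ids


def decode(ids):
    """Drop special tokens and join the rest, similar to skip_special_tokens=True."""
    return " ".join(VOCAB[i] for i in ids if i not in (EOS_ID, START_ID))


output_ids = greedy_generate()
answer = decode(output_ids)
print(answer)  # beneath the frozen surface
```

With a small `max_new_tokens`, the loop stops before `<eos>` appears, which is why real answers sometimes end mid-sentence.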

In [None]:
print("Generative QA (Manual generate)")
print("="*100)

q = "Why might the AI believe rewriting its own code improves human survival?"
prompt = f"question: {q} context: {context}"

# --- Task 1 begins here ---
input_ids = "NOT IMPLEMENTED YET"
# --- Task 1 ends here ---

with torch.no_grad():
    # --- Task 2 begins here ---
    output_ids = "NOT IMPLEMENTED YET"
    # --- Task 2 ends here ---

# --- Task 3 begins here ---
answer = "NOT IMPLEMENTED YET"
print(f"{BOLD}{RED}Question{RESET}: {q}")
print("-" * 100)
print(f"{BOLD}{BLUE}Answer{RESET}: {answer}")
print("=" * 100)
# --- Task 3 ends here ---

## Variant 3: Decoder for Summarization

**Model:** `facebook/bart-large-cnn`  
**Purpose:** Abstractive summarization

This is a classic generative task: compress text while keeping its meaning. Like FLAN-T5, BART is an encoder-decoder model.

In [None]:
summ_model_name = "facebook/bart-large-cnn"

summ_pipeline = pipeline(
    "summarization",
    model=summ_model_name,
    device=0 if torch.cuda.is_available() else -1
)

# quoted from The Hitchhiker's Guide to the Galaxy by Douglas Adams
text = """
Far out in the uncharted backwaters of the unfashionable end of the
Western Spiral arm of the Galaxy lies a small unregarded yellow sun.
Orbiting this at a distance of roughly ninety-eight million miles is an
utterly insignificant little blue-green planet whose ape-descended life forms
are so amazingly primitive that they still think digital watches are a pretty
neat idea.
This planet has—or rather had—a problem, which was this: most of the
people living on it were unhappy for pretty much of the time. Many
solutions were suggested for this problem, but most of these were largely
concerned with the movements of small green pieces of paper, which is odd
because on the whole it wasn’t the small green pieces of paper that were
unhappy.
And so the problem remained; lots of the people were mean, and most of
them were miserable, even the ones with digital watches.
Many were increasingly of the opinion that they’d all made a big mistake
in coming down from the trees in the first place. And some said that even
the trees had been a bad move, and that no one should ever have left the
oceans.
And then, one Thursday, nearly two thousand years after one man had
been nailed to a tree for saying how great it would be to be nice to people
for a change, a girl sitting on her own in a small café in Rickmansworth
suddenly realized what it was that had been going wrong all this time, and
she finally knew how the world could be made a good and happy place.
This time it was right, it would work, and no one would have to get nailed
to anything.
"""

print(f"Model: {summ_model_name}")
print("Goal: create a shorter version of the text")

## Pipeline summarization
- Tasks:
  - Task 1: Call `summ_pipeline` on `text` and store the first returned item in `result`
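
The `summarization` pipeline returns a list of dicts, with the summary under the `summary_text` key. One quick sanity check on "shorter version" is to compare word counts. In the sketch below, `compression_ratio` is a hypothetical helper and `fake_result` is a hand-written stand-in, not real BART output.

```python
def compression_ratio(original: str, summary: str) -> float:
    """Hypothetical helper: summary length divided by original length, in words."""
    n_original = len(original.split())
    n_summary = len(summary.split())
    if n_original == 0:
        raise ValueError("original text is empty")
    return n_summary / n_original


# One sentence from the quoted text, plus a hand-written stand-in summary.
original = (
    "Many solutions were suggested for this problem, but most of these were "
    "largely concerned with the movements of small green pieces of paper."
)
fake_result = [{"summary_text": "Most proposed solutions involved small green pieces of paper."}]

summary = fake_result[0]["summary_text"]  # first (and only) summary in the list
ratio = compression_ratio(original, summary)
print(f"Summary is {ratio:.0%} of the original length")
```

Word count is a rough proxy; it says nothing about whether the summary kept the meaning.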

In [None]:
print("Summarization (Pipeline)")

# --- Task 1 begins here ---
result = "NOT IMPLEMENTED YET"
# --- Task 1 ends here ---
print("\n" + "=" * 80)
print(f"{BOLD}{RED}Original text (quoted from The Hitchhiker's Guide to the Galaxy){RESET}:")
print("=" * 80)
print(text)

summary = result["summary_text"]
wrapped = textwrap.fill(summary, width=76)

print(f"{BOLD}{BLUE}SUMMARY{RESET}")
print("=" * 80)
print(wrapped)
print("=" * 80)
