## 🖼️ Lab 5 – Story with Images

In this notebook, you're going to combine everything you've built so far and bring your story to life with illustrations.

You'll use a **diffusion model** to generate one image for each part of your story, based on the text of that chapter. By the end, you'll have a complete, multi-part story — with both **text and AI-generated images** — displayed together in the notebook.

This is where your project becomes truly multimodal!

### What We'll Do:

- Generate a table of contents from your story idea (like in Lab 4)
- Use the LLM to write each chapter, one at a time
- For each chapter, generate a matching image using a diffusion model
- Display the full story with illustrations
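Step one depends on splitting the LLM's reply into one abstract per line. Small chat models sometimes add numbers or bullets ("1.", "2)", "-") even when asked not to, or return more lines than requested. Below is a minimal sketch of a more defensive parser. `parse_abstracts` is a hypothetical helper, not part of the lab code. It assumes the reply really does put one abstract per line, and it does not detect an extra intro sentence, which would still show up as the first abstract.

```python
import re

# Matches common list prefixes a chat model may add: "1.", "2)", "-", "*", "•"
_PREFIX = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s*")


def parse_abstracts(raw: str, num_parts: int) -> list[str]:
    """Split an LLM reply into at most `num_parts` one-line abstracts.

    Hypothetical helper: assumes one abstract per line, as the system prompt asks.
    """
    abstracts = []
    for line in raw.splitlines():
        cleaned = _PREFIX.sub("", line).strip()
        if cleaned:  # skip blank lines
            abstracts.append(cleaned)
    # Keep only as many parts as requested, in case the model produced extras
    return abstracts[:num_parts]


reply = "1. A fox finds a map.\n\n2) The fox crosses a river.\n- The fox meets an owl."
print(parse_abstracts(reply, num_parts=2))
# ['A fox finds a map.', 'The fox crosses a river.']
```

In the main cell, `abstracts = parse_abstracts(abstracts_raw, NUM_PARTS)` could replace the list comprehension that builds `abstracts`.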

Let’s bring your story to life!

In [None]:
# 📦 Load the diffusion model (run this cell only once)
from diffusers import StableDiffusion3Pipeline
import torch

pipe = StableDiffusion3Pipeline.from_pretrained(
    "ckpt/stable-diffusion-3.5-medium", 
    torch_dtype=torch.bfloat16
).to("cuda")

In [None]:
from IPython.display import display, Markdown
from PIL import Image
import ollama

# 🔧 Configuration
MODEL_NAME = "gemma3:4b-it-qat"
NUM_PARTS = 5

# ✍️ TODO: Fill in your story idea below
story_idea = ""

assert story_idea.strip() != "", "Please fill in story_idea above before running this cell."

# STEP 1: Generate table of contents
system_prompt_toc = f"""
You are a storyteller. We're going to generate a multi-part story.
Your task is to plan the structure, by generating story abstracts.

Given the user prompt as idea, generate {NUM_PARTS} one-line abstracts.
Each abstract should represent the content of one part of the story.
Do not write the full story yet. Just output the abstracts, one per line.
Only output the {NUM_PARTS} one-line abstracts. Nothing else.
"""

response_toc = ollama.chat(
    model=MODEL_NAME,
    messages=[
        {"role": "system", "content": system_prompt_toc},
        {"role": "user", "content": story_idea}
    ]
)

abstracts_raw = response_toc["message"]["content"].strip()
abstracts = [line.strip() for line in abstracts_raw.split("\n") if line.strip()]
abstracts_full = "\n".join(abstracts)

print("=== Abstracts ===")
print(abstracts_full)

# STEP 2: Generate story chapters + illustrations
full_story_so_far = ""

for i, abstract in enumerate(abstracts):
    # Text generation
    system_prompt_chapter = """
You are a storyteller. We're building a multi-part story.
Please write the next part of the story.
Use previous parts as context to ensure continuity.
Make sure the text flows naturally from earlier events.
"""

    user_prompt_chapter = f"""
All abstracts for the full story (one per line):
{abstracts_full}

Here is the story so far:
{full_story_so_far}

Now continue with part {i+1}: {abstract}
Each part should be no more than 2 paragraphs!
"""

    response_chapter = ollama.chat(
        model=MODEL_NAME,
        messages=[
            {"role": "system", "content": system_prompt_chapter},
            {"role": "user", "content": user_prompt_chapter}
        ]
    )

    chapter_text = response_chapter["message"]["content"].strip()
    full_story_so_far += "\n\n" + chapter_text

    # Display text
    display(Markdown(f"### Part {i+1}: {abstract}"))
    display(Markdown(chapter_text))

    # 🧠 TODO: Turn the chapter text into an image prompt for the diffusion model
    #
    # Right now, 'image_prompt' is empty, so the assert below will fail until you fix it.
    # Your task: extract or create a short prompt that describes the scene.
    #
    # ⚠️ The model's CLIP text encoders only read the first 77 tokens (not characters)
    #    of the prompt; anything beyond that is cut off, so keep prompts short!
    #
    # 💡 Tips:
    # - The simplest version is to just use the beginning of the chapter
    #   (77 characters is only a rough stand-in for the 77-token limit):
    #     image_prompt = chapter_text[:77]
    # - You can extract the first sentence with chapter_text.split(".")[0]
    # - A better solution: use the LLM to generate a visual description
    #     (e.g., "Describe this scene in one line for a text-to-image model.")
    # - Try calling your LLM again inside the loop to do this!

    image_prompt = ""  # 👈 TODO: Replace this with your own logic!

    assert image_prompt != ""

    # Generate and display image
    image = pipe(
        prompt=image_prompt,
        num_inference_steps=20,
        guidance_scale=5,
        width=512,
        height=512
    ).images[0]

    display(image)

    # Optional: Save to file
    # image.save(f"chapter-{i+1}.png")
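The simplest tip in the TODO, slicing `chapter_text[:77]`, can cut a word in half. A slightly tidier variant, sketched below, takes the first sentence and trims it at a word boundary with the standard library's `textwrap.shorten`. The helper name `simple_image_prompt` and the 77-character cap are assumptions for illustration. Characters are only a rough proxy for the model's 77-token limit, and the LLM-generated visual description suggested in the tips will usually produce better images. The sketch also assumes the chapter is plain prose rather than starting with a Markdown heading.

```python
import textwrap


def simple_image_prompt(chapter_text: str, max_chars: int = 77) -> str:
    """Hypothetical helper: first sentence of the chapter, trimmed at a word boundary."""
    first_sentence = chapter_text.strip().split(".")[0]
    # textwrap.shorten collapses whitespace and drops whole trailing words until
    # the text fits in max_chars; placeholder="" avoids appending " [...]".
    return textwrap.shorten(first_sentence, width=max_chars, placeholder="")


chapter = (
    "The old lighthouse keeper climbed the spiral stairs as the storm "
    "rolled in from the sea. He lit the lamp."
)
print(simple_image_prompt(chapter))
# The old lighthouse keeper climbed the spiral stairs as the storm rolled in
```

Inside the loop, `image_prompt = simple_image_prompt(chapter_text)` would fill in the prompt without splitting a word.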