# Lab 01: LLM API Fundamentals and Multi-turn Context (Ollama + OpenAI)

This lab is local-first, so it can be run in class at zero API cost. It covers two ways to reach a model:
1. Local model access via Ollama (primary)
2. OpenAI API usage (optional)


## 0) Setup

Before running:
- Install dependencies with `uv sync`
- Create `.env` from `.env.example`
- Ensure Ollama is running and the Qwen model named by `OLLAMA_MODEL` (default `qwen3:8b`) is installed
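
One way to confirm the last item before running the notebook is to ask the local server which models it has. The sketch below is a minimal check, not part of the lab's own code: the helper name `ollama_has_model` is hypothetical, and it assumes Ollama's native `/api/tags` endpoint returns JSON with a `models` list whose entries carry a `name` field.

```python
import json
import urllib.request


def ollama_has_model(base_url: str = "http://localhost:11434",
                     model: str = "qwen3:8b",
                     timeout: float = 2.0) -> bool:
    """Return True if the server at base_url answers and lists `model`."""
    url = f"{base_url.rstrip('/')}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Server unreachable, timed out, bad URL, or the body was not JSON.
        return False
    if not isinstance(payload, dict):
        return False
    # Assumption: each entry looks like {"name": "qwen3:8b", ...}.
    names = {m.get("name") for m in payload.get("models", []) if isinstance(m, dict)}
    return model in names


print("Ollama ready:", ollama_has_model())
```

If this prints `False`, start Ollama or pull the model, then rerun the check.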


In [None]:
import os
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

load_dotenv(override=True)

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-5-nano")

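def has_real_openai_key(value: str | None) -> bool:
    """Return True only if `value` looks like a real OpenAI key, not a placeholder."""
    if not value or not value.strip():
        return False
    cleaned = value.strip()
    lowered = cleaned.lower()
    # Reject common placeholder values left over from a template.
    if lowered in {"your_openai_api_key_here", "sk-your_key_here"}:
        return False
    if lowered.startswith("your_"):
        return False
    # Format check only: this does not verify the key with OpenAI.
    return cleaned.startswith("sk-")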

OPENAI_ENABLED = has_real_openai_key(OPENAI_API_KEY)

OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "qwen3:8b")

print("OPENAI_MODEL (optional):", OPENAI_MODEL)
print("OPENAI enabled:", OPENAI_ENABLED)
print("OLLAMA_BASE_URL:", OLLAMA_BASE_URL)
print("OLLAMA_MODEL:", OLLAMA_MODEL)


## How `chat.completions.create` Works

- `model`: the model name (for example `qwen3:8b` on Ollama or `gpt-5-nano` on OpenAI).
- `messages`: conversation history passed to the model.
- `temperature`: controls sampling randomness (lower = more deterministic, higher = more varied).
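
To make the effect of `temperature` concrete, here is a small sketch of the usual formulation, in which logits are divided by the temperature before a softmax. The function name and the toy logits are invented for illustration, and this lab does not confirm how any particular server applies the parameter internally.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        # This sketch only handles positive values.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lowest temperature nearly all probability lands on the top-scoring token, while higher values spread it across the alternatives.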

### Message Roles
- `system`: global behavior/instructions for the assistant.
- `user`: a message from the user, either the current request or an earlier turn kept as context.
- `assistant`: previous model output included as conversation context.

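
The three roles combine into a single ordered list. The sketch below builds one by hand so the structure is visible before any API call is made. `build_messages` is a hypothetical helper and is not part of the `openai` package.

```python
def build_messages(system_prompt: str,
                   turns: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Build a chat `messages` list from a system prompt and (role, content) turns."""
    allowed = {"user", "assistant"}
    messages = [{"role": "system", "content": system_prompt}]
    for role, content in turns:
        if role not in allowed:
            raise ValueError(f"unexpected role: {role!r}")
        messages.append({"role": role, "content": content})
    return messages


messages = build_messages(
    "You are a concise teaching assistant.",
    [
        ("user", "What is a token?"),
        ("assistant", "A token is a chunk of text the model reads or writes."),
        ("user", "Give one example."),
    ],
)
for m in messages:
    print(m["role"], "->", m["content"])
```

The resulting list has the same shape as the `messages` argument passed to `chat.completions.create` in the cells of this lab.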

## 1) Local Ollama (Primary): OpenAI-Compatible Qwen Call

Use this section as the default classroom path.


In [None]:
ollama_client = OpenAI(
    base_url=f"{OLLAMA_BASE_URL.rstrip('/')}/v1",
    api_key=os.getenv("OLLAMA_API_KEY", "ollama"),
)

ollama_response = ollama_client.chat.completions.create(
    model=OLLAMA_MODEL,
    messages=[
        {"role": "system", "content": "You are a concise teaching assistant."},
        {"role": "user", "content": "Explain what a token is in LLMs in 2 sentences."}
    ],
    temperature=1,
)

display(Markdown("## Qwen Response\n\n" + ollama_response.choices[0].message.content))


## 2) OpenAI API (Optional)

Run this section only if you are willing to pay for calls to a hosted model.


In [None]:
if not OPENAI_ENABLED:
    display(Markdown("**Skipped:** set a real `OPENAI_API_KEY` in `.env` to run this section."))
else:
    openai_client = OpenAI(api_key=OPENAI_API_KEY)
    openai_response = openai_client.chat.completions.create(
        model=OPENAI_MODEL,
        messages=[
            {"role": "system", "content": "You are a concise teaching assistant."},
            {"role": "user", "content": "Explain what a token is in LLMs in 2 sentences."}
        ],
        temperature=1,
    )
    display(Markdown("## OpenAI Response\n\n" + openai_response.choices[0].message.content))


## 3) Compare Responses

Send the same prompt to both backends and compare output quality and style. The cell below does not measure latency, so time the calls yourself if you want to compare it.
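
A minimal way to time a call is to wrap it with `time.perf_counter`. The helper name `timed` is hypothetical, and `fake_call` is a stand-in so the sketch runs without a model server.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> tuple[T, float]:
    """Run `fn` once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Stand-in for a real request; it sleeps briefly and returns a fixed string.
def fake_call() -> str:
    time.sleep(0.05)
    return "stub response"


text, seconds = timed(fake_call)
print(f"{seconds:.3f}s: {text}")
```

In the notebook, the same wrapper could take a lambda around a real request, for example `timed(lambda: ollama_client.chat.completions.create(model=OLLAMA_MODEL, messages=[...]))`. A single run is a rough indicator only, since the first local call may include model load time.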


In [None]:
shared_prompt = "Give 3 practical tips to write better prompts for beginner LLM users."

ollama_compare = ollama_client.chat.completions.create(
    model=OLLAMA_MODEL,
    messages=[{"role": "user", "content": shared_prompt}],
    temperature=1,
)

display(Markdown("## Ollama (Qwen)\n\n" + ollama_compare.choices[0].message.content))

if OPENAI_ENABLED:
    openai_client = OpenAI(api_key=OPENAI_API_KEY)
    openai_compare = openai_client.chat.completions.create(
        model=OPENAI_MODEL,
        messages=[{"role": "user", "content": shared_prompt}],
        temperature=1,
    )
    display(Markdown("## OpenAI\n\n" + openai_compare.choices[0].message.content))
else:
    display(Markdown("**OpenAI compare skipped** because `OPENAI_API_KEY` is not configured."))


## 4) Multi-turn Example: Feed Assistant Output Back As Context

This demonstrates how an assistant message from turn 1 is sent back in turn 2 using `role="assistant"`.


In [None]:
seed_topic = "vector databases in RAG"

turn1 = ollama_client.chat.completions.create(
    model=OLLAMA_MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful instructor."},
        {"role": "user", "content": f"Generate one strong interview question about {seed_topic}. Return the question only"},
    ],
    temperature=1,
)

assistant_question = turn1.choices[0].message.content
display(Markdown("## Turn 1: Assistant-Generated Question\n\n" + assistant_question))

turn2 = ollama_client.chat.completions.create(
    model=OLLAMA_MODEL,
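    messages=[
        {"role": "system", "content": "You are a helpful instructor."},
        # Turn 1 output replayed as prior context. The turn 1 user prompt is left out
        # to keep the demo short; a full history would include it before this line.
        {"role": "assistant", "content": assistant_question},
        {"role": "user", "content": "Now answer that question in a concise beginner-friendly way."},
    ],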
    temperature=1,
)

display(Markdown("## Turn 2: Answer Using Assistant Context\n\n" + turn2.choices[0].message.content))


## 5) Multi-turn Example (OpenAI API, Optional)

Same pattern as above, but using the OpenAI API. This is optional and may incur cost.


In [None]:
if not OPENAI_ENABLED:
    display(Markdown("**Skipped:** set a real `OPENAI_API_KEY` in `.env` to run this section."))
else:
    seed_topic = "function calling in agents"
    openai_client = OpenAI(api_key=OPENAI_API_KEY)
    openai_turn1 = openai_client.chat.completions.create(
        model=OPENAI_MODEL,
        messages=[
            {"role": "system", "content": "You are a helpful instructor."},
            {"role": "user", "content": f"Generate one strong interview question about {seed_topic}. Return the question only"},
        ],
        temperature=1,
    )

    openai_assistant_question = openai_turn1.choices[0].message.content
    display(Markdown("## OpenAI Turn 1: Assistant-Generated Question\n\n" + openai_assistant_question))

    openai_turn2 = openai_client.chat.completions.create(
        model=OPENAI_MODEL,
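        messages=[
            {"role": "system", "content": "You are a helpful instructor."},
            # Turn 1 output replayed as prior context. The turn 1 user prompt is left out
            # to keep the demo short; a full history would include it before this line.
            {"role": "assistant", "content": openai_assistant_question},
            {"role": "user", "content": "Now answer that question in 4 short bullet points for beginners."},
        ],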
        temperature=1,
    )

    display(Markdown("## OpenAI Turn 2: Answer Using Assistant Context\n\n" + openai_turn2.choices[0].message.content))


## 6) Exercises
- Build your own 3-turn loop: (1) model generates a question, (2) you pass that as an `assistant` message, (3) model answers and then proposes one follow-up question.
- Repeat the loop for 2 more turns while keeping the full `messages` history.
- Compare how the conversation quality changes when you remove prior `assistant` messages.
- Optional: run the same loop on both Ollama and OpenAI and compare results.


**Hint (persistent message history):** keep a single `messages` list, append each new `assistant` output and `user` prompt, then pass the full list again in the next call.
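
The hint can be sketched as one small function that mutates a shared list. `run_turn` and the `complete` callback are hypothetical names, and `fake_complete` is a stub so the example runs offline. It only reports how many messages it received.

```python
from typing import Callable

Message = dict[str, str]


def run_turn(messages: list[Message], user_prompt: str,
             complete: Callable[[list[Message]], str]) -> str:
    """Append the user prompt, call the model, then append and return its reply."""
    messages.append({"role": "user", "content": user_prompt})
    reply = complete(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply


# Stub model: a real callback would send `messages` to a chat endpoint.
def fake_complete(messages: list[Message]) -> str:
    return f"(reply after seeing {len(messages)} messages)"


history: list[Message] = [{"role": "system", "content": "You are a helpful instructor."}]
for prompt in ["Ask me one question about RAG.", "Answer it briefly.", "Propose a follow-up."]:
    print(run_turn(history, prompt, fake_complete))
```

To use a real model, `complete` could be a lambda that calls `ollama_client.chat.completions.create(model=OLLAMA_MODEL, messages=msgs)` and returns `.choices[0].message.content`. Because every call receives the whole `history` list, the model sees all earlier turns.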
