# Temperature Sampling Experiments

**Objective**  
Understand how temperature affects variance, repetition, and stability in LLM outputs.

**Setup**
- Fixed prompt
- Fixed model
- Multiple runs per temperature
- Only temperature varied
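
Varying temperature alone is enough to change output variance because temperature rescales the model's logits before a token is sampled. The sketch below is illustrative only: the logits are made-up numbers, `softmax_with_temperature` is a hypothetical helper, and treating a temperature of 0 as greedy argmax is a common convention assumed here, not a documented detail of the API used in this notebook.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature == 0:
        # Assumed convention: T = 0 means greedy decoding (all mass on the argmax).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for three candidate tokens.
logits = [2.0, 1.0, 0.5]
for t in [0.0, 0.7, 1.2]:
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top token, while higher temperatures flatten the distribution, so less likely tokens are sampled more often.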

**Key Questions**
- How does variance manifest?
- When does variance become instability?


## Step 1: Setup

In [1]:
!pip install -q google-generativeai


In [2]:
import google.generativeai as genai
import os


In [3]:
import os
from getpass import getpass

# Prompt for the key at runtime instead of hardcoding it; never commit a real key.
os.environ["GOOGLE_API_KEY"] = getpass("Google API key: ")


In [4]:
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])


In [5]:
# genai was already imported and configured in the cells above.

models = genai.list_models()
for m in models:
    print(m.name, "— supports generateContent:",
          "generateContent" in m.supported_generation_methods)


models/embedding-gecko-001 — supports generateContent: False
models/gemini-2.5-flash — supports generateContent: True
models/gemini-2.5-pro — supports generateContent: True
models/gemini-2.0-flash-exp — supports generateContent: True
models/gemini-2.0-flash — supports generateContent: True
models/gemini-2.0-flash-001 — supports generateContent: True
models/gemini-2.0-flash-exp-image-generation — supports generateContent: True
models/gemini-2.0-flash-lite-001 — supports generateContent: True
models/gemini-2.0-flash-lite — supports generateContent: True
models/gemini-2.0-flash-lite-preview-02-05 — supports generateContent: True
models/gemini-2.0-flash-lite-preview — supports generateContent: True
models/gemini-exp-1206 — supports generateContent: True
models/gemini-2.5-flash-preview-tts — supports generateContent: True
models/gemini-2.5-pro-preview-tts — supports generateContent: True
models/gemma-3-1b-it — supports generateContent: True
models/gemma-3-4b-it — supports generateContent: True

## Step 2: Fix the experiment constants

In [16]:
MODEL = "models/gemma-3-1b-it"


In [17]:
PROMPT = "Explain recursion using a metaphor."
RUNS_PER_TEMP = 10




## Step 3: Temperature sweep (core experiment)

In [19]:
import time

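model_cache = {}  # one GenerativeModel per temperature, created on first use

def run(temp, run_id, retries=3, delay=2):
    """Generate one completion of PROMPT at `temp`, retrying up to `retries` times."""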
    if temp not in model_cache:
        model_cache[temp] = genai.GenerativeModel(
            MODEL,
            generation_config={
                "temperature": temp,
                "top_p": 1.0,
                "top_k": 0,
                "max_output_tokens": 200,
            }
        )

    model = model_cache[temp]

    for attempt in range(retries):
        try:
            response = model.generate_content(PROMPT)
            text = response.text.strip()
            print(f"\n[T={temp}] Run {run_id}")
            print(text)
            return text
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: re-raise with the original traceback
            time.sleep(delay)  # fixed pause before the next attempt

# === RUNS_PER_TEMP runs per temperature ===
results = []

for t in [0.0, 0.7, 1.2]:
    for i in range(1, RUNS_PER_TEMP + 1):  # reuse the constant from Step 2 instead of hardcoding 10
        output = run(t, i)
        results.append({
            "temperature": t,
            "run": i,
            "output": output
        })
        time.sleep(1)  # throttle to avoid 429



[T=0.0] Run 1
Okay, let's explain recursion using a metaphor: **a set of Russian nesting dolls (Matryoshka dolls).**

Here's how it breaks down:

* **The Problem:** You have a large, complex problem – let's say you want to find the smallest doll inside a set of nested dolls.

* **The Recursive Process:**
    1. **The Base Case:** You start with the largest doll. You know *exactly* what the smallest doll is – it's inside this doll.  This is your base case.  It's the stopping point.  Without a base case, the recursion would go on forever (like trying to open a doll that's empty).
    2. **The Recursive Step:** You take the largest doll and ask, "What's the smallest doll inside *this* doll?"  You do the *same* thing – you break down the problem into a smaller, similar problem. You call

[T=0.0] Run 2
Okay, let's explain recursion using a metaphor: **a set of Russian nesting dolls (Matryoshka dolls).**

Here's how it breaks down:

* **The Problem:** You have a large, complex problem – let

In [20]:
import pandas as pd

df = pd.DataFrame(results)
df


|   | temperature | run | output |
|---|-------------|-----|--------|
| 0 | 0.0 | 1 | Okay, let's explain recursion using a metaphor... |
| 1 | 0.0 | 2 | Okay, let's explain recursion using a metaphor... |
| 2 | 0.0 | 3 | Okay, let's explain recursion using a metaphor... |
| 3 | 0.0 | 4 | Okay, let's explain recursion using a metaphor... |
| 4 | 0.0 | 5 | Okay, let's explain recursion using a metaphor... |
| 5 | 0.0 | 6 | Okay, let's explain recursion using a metaphor... |
| 6 | 0.0 | 7 | Okay, let's explain recursion using a metaphor... |
| 7 | 0.0 | 8 | Okay, let's explain recursion using a metaphor... |
| 8 | 0.0 | 9 | Okay, let's explain recursion using a metaphor... |
| 9 | 0.0 | 10 | Okay, let's explain recursion using a metaphor... |
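
The preview truncates each output, so it cannot show whether runs at the same temperature are truly identical. A minimal sketch of one way to check, assuming rows shaped like the `results` list built in Step 3. The sample rows are placeholders, not real model outputs, and `distinct_outputs_per_temperature` is a hypothetical helper.

```python
from collections import defaultdict

def distinct_outputs_per_temperature(results):
    """Map each temperature to the number of distinct output strings seen."""
    seen = defaultdict(set)
    for row in results:
        seen[row["temperature"]].add(row["output"])
    return {temp: len(outputs) for temp, outputs in seen.items()}

# Placeholder rows with the same keys as `results`; not real model outputs.
sample = [
    {"temperature": 0.0, "run": 1, "output": "A"},
    {"temperature": 0.0, "run": 2, "output": "A"},
    {"temperature": 1.2, "run": 1, "output": "A"},
    {"temperature": 1.2, "run": 2, "output": "B"},
]
print(distinct_outputs_per_temperature(sample))
```

A count of 1 for a temperature means every run returned exactly the same text; a count equal to the number of runs means no two runs matched exactly.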


## Summary — Recursion Metaphor Experiment

| Temperature | Repetition % | Structural Similarity | Metaphor Originality | Any Nonsense? |
|------------|--------------|-----------------------|----------------------|---------------|
| 0.0 | ~100% | Very High | Low | No |
| 0.7 | ~90% | High | Low | No |
| 1.2 | ~90% | Medium–High | Low | No |
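
The table does not say how the Repetition % and Structural Similarity columns were obtained. One possible way to quantify them, offered as an assumption and not as the method used here, is the mean pairwise similarity ratio from Python's `difflib.SequenceMatcher` over all runs at one temperature. `mean_pairwise_similarity` is a hypothetical helper, and the sample strings are placeholders, not real model outputs.

```python
from difflib import SequenceMatcher
from itertools import combinations

def mean_pairwise_similarity(outputs):
    """Average SequenceMatcher ratio over all unordered pairs of outputs (0.0 to 1.0)."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 1.0  # fewer than two outputs: nothing to compare
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)

# Placeholder strings standing in for the runs at one temperature.
identical = ["recursion is like nesting dolls"] * 3
varied = [
    "recursion is like nesting dolls",
    "recursion is like a hall of mirrors",
    "think of recursion as nesting dolls",
]
print(round(mean_pairwise_similarity(identical), 2))
print(round(mean_pairwise_similarity(varied), 2))
```

A character-level ratio like this rewards shared phrasing, so it says little about whether two differently worded answers mean the same thing.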


For this prompt, variance never became instability; increasing temperature changed phrasing but did not break structure or semantic correctness.

## Observations & Takeaways

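- At temperature 0.0, outputs were near-deterministic (~100% repetition across runs)
- At temperatures 0.7 and 1.2, phrasing varied more (~90% repetition), but the meaning did not change
- No instability appeared for this prompt up to temperature 1.2; whether it appears at all likely depends on the prompt, which this single-prompt experiment did not test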
