<img src="https://theaiengineer.dev/tae_logo_gw_flatter.png" width=35% align=right>

# Building a Large Language Model from Scratch — A Step-by-Step Guide Using Python and PyTorch
## Chapter 11 — Testing & Sampling
**© Dr. Yves J. Hilpisch**<br>AI-Powered by GPT-5.

## How to Use This Notebook

- Compare sampling strategies such as greedy, top-k, and nucleus sampling.
- Evaluate qualitative outputs with checklists anchored to your use case.
- Instrument temperature sweeps to understand controllability.

### Roadmap

We load a trained checkpoint, generate continuations with multiple decoding schemes, and analyze the trade-offs each introduces.

### Study Tips

Save representative generations for later review. Side-by-side comparisons are invaluable during stakeholder discussions.
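
One lightweight way to keep generations reviewable is to append them to a JSON Lines file, one record per sample, together with the decoding settings that produced them. The sketch below is only an illustration: the helper names, the record fields, and the file name `generations_demo.jsonl` are assumptions, and the `text` values are placeholders rather than real model output.

```python
import json
import tempfile
from pathlib import Path

def save_generations(records, path):
    """Write one JSON object per line so runs are easy to diff and filter."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")

def load_generations(path):
    """Read the records back as a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Placeholder records: the "text" values are stand-ins, not model output.
records = [
    {"prompt": "<prompt>", "strategy": "greedy", "T": 0.0,
     "text": "<generation A>"},
    {"prompt": "<prompt>", "strategy": "top_p", "T": 0.8, "p": 0.9,
     "text": "<generation B>"},
]

path = Path(tempfile.gettempdir()) / "generations_demo.jsonl"
save_generations(records, path)
loaded = load_generations(path)
print(len(loaded), "records reloaded")
```

Storing the strategy and its parameters next to each text makes it possible to group records by prompt later and lay the outputs out side by side.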

In [None]:
# Torch + plotting setup
import sys, subprocess
try:
    import torch  # noqa: F401
except Exception:
    idx = 'https://download.pytorch.org/whl/cpu'
    subprocess.check_call([sys.executable, '-m', 'pip', 'install',
                           '--index-url', idx, 'torch', 'torchvision',
                           'torchaudio'])
    import torch  # noqa: F401
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8')
%config InlineBackend.figure_format = 'svg'
torch.manual_seed(0); 'ok'


In [None]:
# Probability at temperature T for toy logits.
logits = torch.tensor([[2.0, 1.0, 0.2, -1.0]])

def probs_at_T(T):
    """Return softmax(logits / T) for the toy row; T must be > 0.
    Lower T sharpens the distribution, higher T flattens it.
    """
    p = torch.softmax(logits / T, dim=-1)
    return p
probs_at_T(1.0)
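
To put numbers on "sharpens" and "flattens", the following dependency-free sketch recomputes the same softmax with the standard library and reports the top probability and the Shannon entropy at each temperature. The helper names `softmax_T` and `entropy` are illustrative and not part of the notebook; entropy in nats is one reasonable summary of spread, not the only one.

```python
import math

logits = [2.0, 1.0, 0.2, -1.0]  # same toy values as the cell above

def softmax_T(logits, T):
    """Softmax of logits / T, shifted by the max for numerical stability."""
    scaled = [x / T for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(p):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

for T in (0.7, 1.0, 1.3):
    p = softmax_T(logits, T)
    print(f"T={T}: max p={max(p):.3f}, entropy={entropy(p):.3f} nats")
```

As the temperature rises, the top probability shrinks and the entropy grows, which is the controllability knob that a temperature sweep measures.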


In [None]:
# Plot temperature effects
Ts = [0.7, 1.0, 1.3]
fig, axes = plt.subplots(1, 3, figsize=(6.0, 2.2), constrained_layout=True)
for ax, T in zip(axes, Ts):
    p = probs_at_T(T)[0]
    ax.bar(range(len(p)), p, color='#0A66C2')
    ax.set_title(f'T={T}')
    ax.set_ylim(0, 1.0); ax.set_xticks([]); ax.set_yticks([])
fig.suptitle('Temperature'); fig


In [None]:
# Top-k and top-p filters: set low-prob tokens to a large negative
# logit so softmax effectively assigns zero probability.
def top_k_filter(logits, k):
    """Keep only the k largest logits per row.
    Others are set to a very negative number.
    """
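    if k <= 0: return logits
    # torch.topk raises if k exceeds the vocabulary size, so clamp it.
    k = min(k, logits.size(-1))
    v, _ = torch.topk(logits, k, dim=-1)
    thr = v[..., -1:]  # k-th largest logit per row
    return logits.masked_fill(logits < thr, -1e9)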
def top_p_filter(logits, p):
    """Keep the smallest set of tokens whose cumulative
    probability exceeds p. Works row-wise.
    """
    if p <= 0 or p >= 1: return logits
    s, idx = torch.sort(logits, dim=-1, descending=True)
    pr = torch.softmax(s, dim=-1)
    cum = torch.cumsum(pr, dim=-1)
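    # Mask a token only when the mass before it already exceeds p, so
    # the token that crosses the threshold stays in the nucleus.
    mask = (cum - pr) > p
    s = s.masked_fill(mask, -1e9)
    out = torch.empty_like(s).scatter_(-1, idx, s)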
    return out
top_k_filter(logits, 3), top_p_filter(logits, 0.9)
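
As a cross-check on the nucleus rule stated in the docstring (keep the smallest set of tokens whose cumulative probability exceeds p), here is a plain-Python sketch that walks the sorted probabilities of the toy logits. `nucleus_indices` is a hypothetical helper written for this illustration only.

```python
import math

logits = [2.0, 1.0, 0.2, -1.0]  # same toy values as above

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def nucleus_indices(logits, p):
    """Return (kept indices, kept mass) for the smallest set of tokens,
    taken in order of decreasing probability, whose mass exceeds p."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum > p:  # threshold crossed: this token is still included
            break
    return kept, cum

kept, mass = nucleus_indices(logits, 0.9)
print(kept, round(mass, 4))
```

For the toy row, the two most likely tokens carry roughly 0.86 of the probability mass, so p = 0.9 needs a third token before the threshold is crossed.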


In [None]:
# One sampling step on toy logits: apply temperature and optional
# top-k/top-p, then draw the next token (or greedy if T<=0).
def step_sample(logits, T=1.0, k=None, p=None):
    """Return next token ids for a single step.
    """
    x = logits / T if T > 0 else logits
    if k is not None: x = top_k_filter(x, k)
    if p is not None: x = top_p_filter(x, p)
    if T <= 0: return torch.argmax(x, dim=-1, keepdim=True)
    pr = torch.softmax(x, dim=-1)
    return torch.multinomial(pr, num_samples=1)
step_sample(logits, T=0.8, k=3, p=0.9)
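
A single step becomes a generation loop once each sampled token is appended to the context and the step is repeated. The sketch below shows that loop in plain Python under a strong simplification: the "model" is a made-up table `BIGRAM_LOGITS` in which the next-token logits depend only on the last token. The table, `next_token`, and `generate` are hypothetical names for illustration; a real model would compute logits from the whole context.

```python
import math
import random

# Hypothetical toy "model": row i holds the logits for the token that
# follows token i. The values are invented for this illustration.
BIGRAM_LOGITS = [
    [0.0, 2.0, 1.0, -1.0],  # after token 0
    [1.0, 0.0, 2.0, -1.0],  # after token 1
    [2.0, 1.0, 0.0, -1.0],  # after token 2
    [1.0, 1.0, 1.0, 0.0],   # after token 3
]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def next_token(last, T, rng):
    """Greedy pick if T <= 0, otherwise sample from softmax(logits / T)."""
    logits = BIGRAM_LOGITS[last]
    if T <= 0:
        return max(range(len(logits)), key=logits.__getitem__)
    probs = softmax([x / T for x in logits])
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate(prompt, steps, T=1.0, seed=0):
    """Append `steps` tokens to the prompt, one sampling step at a time."""
    rng = random.Random(seed)
    out = list(prompt)
    for _ in range(steps):
        out.append(next_token(out[-1], T, rng))
    return out

greedy = generate([0], steps=6, T=0.0)
sampled = generate([0], steps=6, T=1.0, seed=0)
print("greedy :", greedy)
print("sampled:", sampled)
```

With this table, greedy decoding falls into the cycle 0, 1, 2, 0, 1, 2, which is a small-scale picture of the repetition that greedy decoding can produce; sampling with T > 0 can leave the cycle.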


In [None]:
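# Simple dummy language model for a quick perplexity demo. Its logits
# are all zero, so it predicts a uniform distribution over V tokens and
# the perplexity should come out as V.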
class DummyLM(torch.nn.Module):
    def __init__(self, V):
        super().__init__(); self.V = V
    def forward(self, x, targets=None):
        B, T = x.size(); logits = torch.zeros(B, T, self.V)
        loss = None
        if targets is not None:
            loss = torch.nn.functional.cross_entropy(
                logits.reshape(B*T, self.V), targets.reshape(B*T)
            )
        return logits, loss
def perplexity(model, loader):
    """Return (H, exp(H)) over a loader of (x, y) pairs. H is the
    token-weighted mean cross-entropy in nats; exp(H) is the perplexity.
    """
    import math
    total, tokens = 0.0, 0
    with torch.no_grad():  # evaluation only, no gradients needed
        for x, y in loader:
            _, loss = model(x, targets=y)
            total += loss.item() * y.numel()
            tokens += y.numel()
    H = total / max(1, tokens)
    return H, math.exp(H)
V = 16; model = DummyLM(V)
ids = torch.randint(0, V, (1, 128))
class DS(torch.utils.data.Dataset):
    def __len__(self): return 64
    def __getitem__(self, i):
        x = ids[0, i:i+32]; y = ids[0, i+1:i+33]; return x, y
dl = torch.utils.data.DataLoader(DS(), batch_size=16, drop_last=True)
perplexity(model, dl)
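
Two facts behind the demo can be checked with plain arithmetic. A uniform predictor over V tokens has cross-entropy log(V), so its perplexity is V. And the running total is weighted by token count because a plain average of per-batch losses is biased when batches differ in size. The batch numbers in this sketch are invented for illustration and are not measurements.

```python
import math

V = 16  # vocabulary size, matching the demo above

# A uniform predictor assigns probability 1/V to every token, so each
# token contributes -log(1/V) = log(V) nats of cross-entropy.
H_uniform = -math.log(1.0 / V)
ppl_uniform = math.exp(H_uniform)

# Hypothetical (mean loss in nats, token count) pairs, made up to show
# the effect of weighting by the number of tokens.
batches = [(2.0, 512), (3.0, 64)]

naive_H = sum(loss for loss, _ in batches) / len(batches)
weighted_H = (sum(loss * n for loss, n in batches)
              / sum(n for _, n in batches))

print(f"uniform: H={H_uniform:.4f}, perplexity={ppl_uniform:.1f}")
print(f"naive H={naive_H:.4f}, token-weighted H={weighted_H:.4f}")
```

In the demo all batches have the same shape, so both averages would agree there; the weighting starts to matter as soon as the last batch is smaller or sequences vary in length.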


## Exercises

- Implement beam search and compare its outputs against nucleus sampling.
- Add automated toxicity or bias checks using an available open-source detector.
- Create a table summarizing how temperature and top-k interact across several prompts.
