# Benchmarking HTR Models

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/thulium-htr/Thulium/blob/main/examples/02_benchmarking.ipynb)
[![PyPI](https://img.shields.io/pypi/v/thulium-htr)](https://pypi.org/project/thulium-htr/)

This notebook shows how to evaluate the accuracy and speed of handwritten text recognition (HTR) models.

**Topics covered:**
- CER, WER, SER metrics
- Batch evaluation
- Latency measurement
- Calibration analysis

## Installation

In [None]:
# Install Thulium (uncomment in Colab)
# !pip install thulium-htr -q

In [None]:
import thulium
print(f"Thulium version: {thulium.__version__}")

## 1. Character Error Rate (CER)

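CER is the primary metric for HTR evaluation. It is the character-level edit (Levenshtein) distance between the model output (hypothesis) and the ground truth (reference), normalized by the reference length: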

$$\text{CER} = \frac{S + D + I}{N}$$

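where $S$, $D$, and $I$ are the numbers of character substitutions, deletions, and insertions in a minimum-cost alignment of the hypothesis against the reference, and $N$ is the number of characters in the reference. Because insertions are counted, CER can exceed 100%.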
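The formula can be computed with the classic Levenshtein dynamic program. The following is a from-scratch illustration, not Thulium's implementation. `edit_distance` and `char_error_rate` are illustrative names. The sketch assumes CER is computed on raw characters with no case or whitespace normalization.

```python
# Illustrative sketch, not Thulium's implementation: raw characters, no normalization.
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions, and insertions (S + D + I)
    needed to align `hyp` with `ref`. Works on strings or lists of tokens."""
    # prev[j] = distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # aligning ref[:i] with an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion: r is missing from the hypothesis
                curr[j - 1] + 1,         # insertion: h is extra in the hypothesis
                prev[j - 1] + (r != h),  # substitution (free when characters match)
            ))
        prev = curr
    return prev[-1]


def char_error_rate(reference, hypothesis):
    """CER = (S + D + I) / N, with N = number of reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


print(f"CER: {char_error_rate('Hello World', 'Hallo World'):.2%}")  # 9.09%
```

One substitution (`e` → `a`) over 11 reference characters (the space counts) gives 1/11, the same 9.09% that the library call below reports.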

In [None]:
from thulium.evaluation.metrics import cer, wer, ser

reference = "Hello World"
hypothesis = "Hallo World"

error_rate = cer(reference, hypothesis)
print(f"CER: {error_rate:.2%}")  # 9.09%

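## 2. Word Error Rate (WER)

WER applies the same formula at the word level: $S$, $D$, $I$, and $N$ count words rather than characters. In this example, one of the two reference words is misrecognized. WER is therefore 50%, while CER is only 9.09%.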

In [None]:
word_error = wer(reference, hypothesis)
print(f"WER: {word_error:.2%}")  # 50%
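As a from-scratch illustration (not Thulium's code), WER is the same edit distance run over lists of words instead of strings. `word_error_rate` is an illustrative name. Whitespace tokenization is an assumption: Thulium's `wer` may split or normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (S + D + I)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            # deletion, insertion, substitution (free when tokens match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N over words; assumes plain whitespace tokenization."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


print(f"WER: {word_error_rate('Hello World', 'Hallo World'):.2%}")  # 50.00%
```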

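## 3. Sequence Error Rate (SER)

SER is the fraction of sequences (for example, text lines) whose hypothesis differs from the reference in any way. A single wrong character makes the whole sequence an error.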

In [None]:
seq_error = ser(reference, hypothesis)
print(f"SER: {seq_error:.0%}")  # 100% (any difference = error)
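Over a set of sequences, SER is a count of exact mismatches. The following is a minimal sketch with an illustrative function name. It assumes exact string comparison with no case or whitespace normalization, which may differ from Thulium's `ser`.

```python
def sequence_error_rate(references, hypotheses):
    """Fraction of sequences whose hypothesis differs from the reference at all.
    Assumes exact string comparison (no normalization)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    if not references:
        raise ValueError("need at least one sequence")
    errors = sum(ref != hyp for ref, hyp in zip(references, hypotheses))
    return errors / len(references)


print(f"SER: {sequence_error_rate(['Hello World'], ['Hallo World']):.0%}")  # 100%
print(f"SER: {sequence_error_rate(['hello', 'world'], ['hallo', 'world']):.0%}")  # 50%
```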

## 4. Batch Evaluation

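Evaluate a list of reference/hypothesis pairs in a single call. `cer_wer_batch` returns one aggregate CER and one aggregate WER for the whole batch: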

In [None]:
from thulium.evaluation.metrics import cer_wer_batch

references = ["hello", "world", "test", "example"]
hypotheses = ["hallo", "world", "tset", "exmple"]

batch_cer, batch_wer = cer_wer_batch(references, hypotheses)
print(f"Batch CER: {batch_cer:.2%}")
print(f"Batch WER: {batch_wer:.2%}")
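A batch score depends on how per-sample errors are pooled. This notebook does not say which convention `cer_wer_batch` uses, so the following sketch compares two common choices on the same data. *Micro* averaging divides total edits by total reference characters. *Macro* averaging takes the mean of per-sample CERs. The function names are illustrative and are not part of Thulium.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (S + D + I)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1]


def micro_cer(references, hypotheses):
    """Total character edits divided by total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return edits / sum(len(r) for r in references)


def macro_cer(references, hypotheses):
    """Mean of per-sample CERs, so short and long samples weigh the same."""
    rates = [edit_distance(r, h) / len(r) for r, h in zip(references, hypotheses)]
    return sum(rates) / len(rates)


references = ["hello", "world", "test", "example"]
hypotheses = ["hallo", "world", "tset", "exmple"]
# Edits per sample: 1, 0, 2 (Levenshtein has no transposition op), 1
print(f"Micro CER: {micro_cer(references, hypotheses):.2%}")  # 19.05% (4 / 21)
print(f"Macro CER: {macro_cer(references, hypotheses):.2%}")  # 21.07%
```

Macro averaging gives more weight to errors in short samples, so the two numbers diverge. At the word level, both conventions give 75% here (3 of 4 one-word samples are wrong). Check the Metrics Reference before comparing batch scores across tools.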

## 5. Model Comparison

Accuracy and latency of the three Thulium model sizes:

| Model | CER | WER | Latency |
|-------|-----|-----|--------|
| thulium-tiny | 5.2% | 14.1% | 12ms |
| thulium-base | 3.8% | 10.2% | 28ms |
| thulium-large | 2.9% | 7.8% | 65ms |

In [None]:
# Run benchmark (placeholder)
# from thulium.evaluation.benchmarking import Benchmark
# 
# bench = Benchmark(model="thulium-base", dataset="iam")
# results = bench.run()
# print(f"CER: {results.aggregate_cer:.2%}")
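This notebook does not show how the latency column is measured, and latency depends on hardware, batch size, and input size. The following is a framework-agnostic timing sketch using only the standard library. `fake_recognize` is a hypothetical stand-in for a model's single-image inference call, because Thulium's inference API is not shown here. `measure_latency_ms` is likewise an illustrative name.

```python
import statistics
import time


def measure_latency_ms(fn, inputs, warmup=3, repeats=20):
    """Median and 95th-percentile wall-clock latency of fn(x), in milliseconds.

    Warmup calls are discarded so one-time costs (lazy loading, caching,
    JIT compilation) do not skew the numbers.
    """
    for x in inputs[:warmup]:
        fn(x)
    timings = []
    for _ in range(repeats):
        for x in inputs:
            start = time.perf_counter()
            fn(x)
            timings.append((time.perf_counter() - start) * 1000.0)
    if len(timings) < 2:
        raise ValueError("need at least two timed calls for percentiles")
    p95 = statistics.quantiles(timings, n=20)[-1]  # last of 19 cut points = 95th pct
    return statistics.median(timings), p95


# Hypothetical stand-in for real model inference; replace with your model's call.
def fake_recognize(image):
    time.sleep(0.002)  # pretend inference takes about 2 ms
    return "hello"


median_ms, p95_ms = measure_latency_ms(fake_recognize, inputs=[None] * 5, repeats=4)
print(f"median: {median_ms:.1f} ms, p95: {p95_ms:.1f} ms")
```

Reporting the median and a tail percentile, rather than only the mean, keeps occasional slow calls (garbage collection, OS scheduling) from hiding in one number.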

## 6. CLI Benchmarking

```bash
# Run benchmark from command line
thulium benchmark iam --model thulium-base --output results.json
```

## Next Steps

- [Error Analysis](03_error_analysis.ipynb) - Debug recognition errors
- [Metrics Reference](../docs/evaluation/metrics.md) - All metrics
- [Robustness Testing](../docs/evaluation/robustness.md) - Noise testing