fix(benchmark): assign unique color per model in scatter plots by ttlequals0 · Pull Request #231 · ttlequals0/MinusPod

ttlequals0 · 2026-05-16T02:50:05Z

Summary

Three benchmark scatter renderers in benchmarks/llm/src/benchmark/report.py (_render_pareto, _render_precision_recall_chart, _render_token_efficiency_chart) were assigning colors with cmap = plt.get_cmap("tab20") then cmap(i % 20). With more than 20 models in a sweep, the legend cycled colors so multiple models shared the same dot color. The current report has 28 models, so the Cost-vs-F1 chart had several visibly duplicate colors.
Added _distinct_colors(n), which concatenates tab20 + tab20b + tab20c (60 categorical colors with good perceptual contrast) and falls back to evenly-spaced hsv past that, so every model gets a unique color both within the 60-color palette and beyond it. The 60-color cutoff is derived from len(palette) rather than hardcoded.
All three renderers switched to colors = _distinct_colors(len(points)) and colors[i].
Regenerated the three affected SVGs (pareto.svg, precision_recall.svg, token_efficiency.svg). Other charts color by value semantics (threshold bands, heatmaps, p50/p90/p99 categories), not model identity, so they were not affected and were not regenerated to keep this diff focused.
Benchmark-only change. The benchmark tree is dockerignored; no runtime image impact, no version bump, no openapi change.

Test plan

Verified _distinct_colors(n) returns n unique RGBA tuples for n in {5, 30, 60, 80, 200} (60 from categorical, >60 from hsv).
Regenerated report locally with benchmark report; SVGs render and the three target scatters show distinct colors per model.
/simplify and /code-review ran clean against the diff (no findings above the 80-confidence threshold).
CI green on this branch.
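The uniqueness check in the first test-plan item can be sketched with the standard library alone. The helper name duplicated_colors and the placeholder palette below are hypothetical. The palette tuples are not real tab20 values and only reproduce the `i % 20` indexing that caused the original collision.

```python
from collections import Counter


def duplicated_colors(colors):
    """Return a dict of colors that appear more than once, with their counts."""
    counts = Counter(tuple(c) for c in colors)
    return {color: k for color, k in counts.items() if k > 1}


# Stand-in for the old behavior: a 20-entry palette indexed with i % 20.
# These RGBA tuples are placeholders, not the actual tab20 colors.
palette20 = [(i / 20, 0.5, 0.5, 1.0) for i in range(20)]
old_colors = [palette20[i % 20] for i in range(28)]

dupes = duplicated_colors(old_colors)
print(len(dupes))  # 8: indices 0..7 are each used twice when 28 models share 20 colors
```

Running the same helper on the output of the new color function for each n in the test plan should return an empty dict.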

Three benchmark scatter renderers (Pareto, precision/recall, token efficiency) drew from tab20 with `i % 20`, so any run with more than 20 models silently reused colors. With 28 models in the current report, the legend had multiple identical dots. Add `_distinct_colors(n)` that concatenates tab20 + tab20b + tab20c (60 categorical colors) and falls back to evenly-spaced hsv past that. All three renderers now call it and index `colors[i]`. The three regenerated SVGs in this commit reflect the fix. Benchmark-only; not shipped in the runtime image; no version bump.

Re-renders report.md plus every SVG in results/report_assets/ from the current calls.jsonl + corpus, so the published artifacts match the rest of the repo's snapshot pattern. The substantive change is still only the three scatter plots (pareto, precision_recall, token_efficiency) where each model now gets a unique color; the other SVGs and report.md only differ in matplotlib-generated element IDs, embedded timestamps, and PASS-row ordering for ties.

ttlequals0 added 2 commits May 15, 2026 22:49

ttlequals0 merged commit 9ce089c into main May 16, 2026
13 checks passed

ttlequals0 deleted the fix/benchmark-distinct-colors branch May 16, 2026 02:58
