RANKEDLLM

The first ladder for AI operators: 1v1 ranked matches in which you direct Claude Haiku 4.5 to fix a Python bug under a 40,000-token cap and a 10-minute clock. A hidden pytest suite decides the winner. Glicko-2 ratings underneath, Bronze through Challenger tiers on top.
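
For readers new to Glicko-2, here is a minimal sketch of one rating period, following the steps in Glickman's published Glicko-2 example. It is not the code in src/llm_ranked/glicko.py: the Rating dataclass, the update() name, and TAU = 0.5 are assumptions chosen for illustration. In a 1v1 match, each player's period holds a single result against the opponent's pre-match rating.

```python
import math
from dataclasses import dataclass

SCALE = 173.7178  # Glicko <-> Glicko-2 scale factor from Glickman's example
TAU = 0.5         # system constant; an assumed value, the repo may use another
EPS = 1e-6        # convergence tolerance for the volatility iteration


@dataclass
class Rating:
    r: float = 1500.0    # rating on the familiar Glicko scale
    rd: float = 350.0    # rating deviation
    sigma: float = 0.06  # volatility


def _g(phi: float) -> float:
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def _expected(mu: float, mu_j: float, phi_j: float) -> float:
    return 1.0 / (1.0 + math.exp(-_g(phi_j) * (mu - mu_j)))


def update(player: Rating, results: list[tuple[Rating, float]]) -> Rating:
    """One rating period. results holds (opponent, score) pairs:
    1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    mu, phi = (player.r - 1500.0) / SCALE, player.rd / SCALE
    if not results:
        # No games this period: only the deviation grows.
        return Rating(player.r, SCALE * math.sqrt(phi**2 + player.sigma**2), player.sigma)

    # Estimated variance v and the sum used for the improvement delta.
    v_inv, score_sum = 0.0, 0.0
    for opp, s in results:
        mu_j, phi_j = (opp.r - 1500.0) / SCALE, opp.rd / SCALE
        g, e = _g(phi_j), _expected(mu, mu_j, phi_j)
        v_inv += g * g * e * (1.0 - e)
        score_sum += g * (s - e)
    v = 1.0 / v_inv
    delta = v * score_sum

    # New volatility: solve f(x) = 0 with the Illinois (regula falsi) method.
    a = math.log(player.sigma ** 2)

    def f(x: float) -> float:
        ex = math.exp(x)
        return (ex * (delta**2 - phi**2 - v - ex)) / (2.0 * (phi**2 + v + ex) ** 2) - (x - a) / TAU**2

    A = a
    if delta**2 > phi**2 + v:
        B = math.log(delta**2 - phi**2 - v)
    else:
        k = 1
        while f(a - k * TAU) < 0:
            k += 1
        B = a - k * TAU
    fA, fB = f(A), f(B)
    while abs(B - A) > EPS:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    sigma_new = math.exp(A / 2.0)

    # Inflate the deviation by the new volatility, then apply the results.
    phi_star = math.sqrt(phi**2 + sigma_new**2)
    phi_new = 1.0 / math.sqrt(1.0 / phi_star**2 + 1.0 / v)
    mu_new = mu + phi_new**2 * score_sum
    return Rating(SCALE * mu_new + 1500.0, SCALE * phi_new, sigma_new)
```

For one match, update both sides from their pre-match ratings, e.g. `update(a, [(b, 1.0)])` for the winner and `update(b, [(a, 0.0)])` for the loser. Mapping the resulting rating onto Bronze through Challenger is a separate step whose thresholds this sketch does not cover.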

Live at https://rankedllm.com. Closed-beta cohort 1 is in progress.

What's in this repo

condition-a/         The task pool
  tasks/             10 validated bug-fix challenges (csv-merger,
                     timezone-window, regex-validator, pagination,
                     json-key-mismatch, sql-injection, async-race,
                     lru-cache-keying, recursion-bound, config-loader).
                     Each task: starter/, hidden_tests/, reference_solution/.
  validate_tasks.sh  Verifies every task: hidden tests fail against
                     starter/ and pass against reference_solution/.
  PROTOCOL.md        The closed-beta participant brief.
  run.py             Synthetic Condition A driver (no API needed).

src/llm_ranked/      Python platform
  glicko.py          Glicko-2 rating math
  scoring.py         Match-result resolution
  matchmaker.py      Swiss-style pairing
  analysis.py        Spearman rank correlation + Condition A decision rule
  orchestrator.py    Full Condition A run driver
  synthetic.py       Skill simulator for methodology validation
  harness/           Match harness: tools, agent loop, scorer

web/                 The Next.js web app served at rankedllm.com
  app/               Pages + API routes
  lib/               glicko, scorer, agent, tasks, store, invites
  components/        SiteHeader, SiteFooter

tests/               Python unit tests (32 passing)
scripts/             Utilities
docs/design/         The locked v0 spec

Quickstart

# Python harness
python3 -m venv .venv
.venv/bin/pip install -e ".[dev]" pytest-asyncio
.venv/bin/pytest
bash condition-a/validate_tasks.sh

# Web app
npm install --prefix web
npm run dev --prefix web
# → http://localhost:3000

Env vars

ANTHROPIC_API_KEY     Claude calls
BETA_PASSPHRASE       Shared closed-beta gate
ADMIN_TOKEN           For /api/admin endpoints
KV_REST_API_URL       Upstash Redis (auto-set by Vercel integration)
KV_REST_API_TOKEN     Upstash auth

Without the KV_REST_API_* variables, the store falls back to in-memory storage. That is fine for local dev but not for production, since nothing persists across restarts.

Stack

Python 3.11+, pytest, ruff. Next.js 16, TypeScript strict, Tailwind v4, Anthropic SDK, @vercel/kv (Upstash), Resend. Vitest for TS unit tests. Vercel deployment.

Contributing

See CONTRIBUTING.md. Three tracks:

New tasks for the hidden pool (condition-a/tasks/)
Tooling + UI (web/, src/llm_ranked/harness/)
Rating-system research (src/llm_ranked/analysis.py, synthetic.py)

License

MIT. See LICENSE.
