upneja/rankedllm

RANKEDLLM

The first ladder for AI operators. 1v1 ranked matches where you direct Claude Haiku 4.5 to fix a Python bug under a 40,000-token cap and a 10-minute clock. A hidden pytest suite decides the winner. Glicko-2 underneath, Bronze through Challenger on top.

Live at https://rankedllm.com. Closed beta cohort 1 in flight.

What's in this repo

condition-a/         The task pool
  tasks/             10 validated bug-fix challenges (csv-merger,
                     timezone-window, regex-validator, pagination,
                     json-key-mismatch, sql-injection, async-race,
                     lru-cache-keying, recursion-bound, config-loader).
                     Each task: starter/, hidden_tests/, reference_solution/.
  validate_tasks.sh  Verifies every task: starter fails, reference passes.
  PROTOCOL.md        The closed-beta participant brief.
  run.py             Synthetic Condition A driver (no API needed).

src/llm_ranked/      Python platform
  glicko.py          Glicko-2 rating math
  scoring.py         Match-result resolution
  matchmaker.py      Swiss-style pairing
  analysis.py        Spearman + Condition A decision rule
  orchestrator.py    Full Condition-A run driver
  synthetic.py       Skill simulator for methodology validation
  harness/           Match harness: tools, agent loop, scorer

web/                 The Next.js web app served at rankedllm.com
  app/               Pages + API routes
  lib/               glicko, scorer, agent, tasks, store, invites
  components/        SiteHeader, SiteFooter

tests/               Python unit tests (32 passing)
scripts/             Utilities
docs/design/         The locked v0 spec
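
For readers new to the rating math named above, here is a minimal sketch of a single Glicko-2 update, following Glickman's published step-by-step procedure. It is not the code in src/llm_ranked/glicko.py. The `Rating` class, the `update` function, the defaults (1500 / 350 / 0.06) and `tau=0.5` are assumptions taken from Glickman's description, not from this repo.

```python
import math
from dataclasses import dataclass

SCALE = 173.7178  # Glicko-2 conversion factor between rating points and internal units


@dataclass(frozen=True)
class Rating:
    # Defaults are Glickman's suggested starting values (assumed, not read from the repo).
    r: float = 1500.0
    rd: float = 350.0
    sigma: float = 0.06


def _g(phi: float) -> float:
    """Down-weights an opponent's influence according to their rating deviation."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def update(player: Rating, results: list[tuple[Rating, float]],
           tau: float = 0.5, eps: float = 1e-6) -> Rating:
    """Return the player's new rating after one rating period.

    results holds (opponent, score) pairs, with score 1.0 (win), 0.5 (draw) or 0.0 (loss).
    """
    mu = (player.r - 1500.0) / SCALE
    phi = player.rd / SCALE
    if not results:
        # No games played: only the rating deviation grows.
        return Rating(player.r, SCALE * math.sqrt(phi ** 2 + player.sigma ** 2), player.sigma)

    v_inv = 0.0
    score_sum = 0.0
    for opp, score in results:
        g_j = _g(opp.rd / SCALE)
        e_j = 1.0 / (1.0 + math.exp(-g_j * (mu - (opp.r - 1500.0) / SCALE)))
        v_inv += g_j ** 2 * e_j * (1.0 - e_j)
        score_sum += g_j * (score - e_j)
    v = 1.0 / v_inv
    delta = v * score_sum

    # Solve for the new volatility with the Illinois variant of regula falsi.
    a = math.log(player.sigma ** 2)

    def f(x: float) -> float:
        ex = math.exp(x)
        return (ex * (delta ** 2 - phi ** 2 - v - ex)
                / (2.0 * (phi ** 2 + v + ex) ** 2)) - (x - a) / tau ** 2

    lo = a
    if delta ** 2 > phi ** 2 + v:
        hi = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        hi = a - k * tau
    f_lo, f_hi = f(lo), f(hi)
    while abs(hi - lo) > eps:
        mid = lo + (lo - hi) * f_lo / (f_hi - f_lo)
        f_mid = f(mid)
        if f_mid * f_hi <= 0:
            lo, f_lo = hi, f_hi
        else:
            f_lo /= 2.0
        hi, f_hi = mid, f_mid
    sigma_new = math.exp(lo / 2.0)

    phi_star = math.sqrt(phi ** 2 + sigma_new ** 2)
    phi_new = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    mu_new = mu + phi_new ** 2 * score_sum
    return Rating(SCALE * mu_new + 1500.0, SCALE * phi_new, sigma_new)
```

Under these assumptions, a 1v1 match would be a rating period with a single `(opponent, score)` pair. Whether the platform updates per match or batches several matches per period is not stated here.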

Quickstart

# Python harness
python3 -m venv .venv
.venv/bin/pip install -e ".[dev]" pytest-asyncio
.venv/bin/pytest
bash condition-a/validate_tasks.sh

# Web app
npm install --prefix web
npm run dev --prefix web
# → http://localhost:3000
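
The invariant that validate_tasks.sh checks (starter fails, reference passes) can be sketched in Python as below. This is an illustration, not a port of the script. The helper names `run_hidden_tests` and `validate_task` are hypothetical, and exposing the solution to the hidden tests through `PYTHONPATH` is an assumption about how the tests import it.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Callable


def run_hidden_tests(solution_dir: Path, tests_dir: Path) -> bool:
    """Run pytest on tests_dir with solution_dir importable; True means every test passed."""
    # Assumption: the hidden tests import the solution from PYTHONPATH.
    env = dict(os.environ, PYTHONPATH=str(solution_dir))
    proc = subprocess.run(
        [sys.executable, "-m", "pytest", "-q", str(tests_dir)],
        env=env,
        capture_output=True,
    )
    return proc.returncode == 0  # pytest exits with 0 only when all tests pass


def validate_task(
    task_dir: Path,
    runner: Callable[[Path, Path], bool] = run_hidden_tests,
) -> list[str]:
    """Return a list of problems; an empty list means the task is valid."""
    tests = task_dir / "hidden_tests"
    problems = []
    if runner(task_dir / "starter", tests):
        problems.append("starter unexpectedly passes hidden tests")
    if not runner(task_dir / "reference_solution", tests):
        problems.append("reference solution fails hidden tests")
    return problems
```

Passing the runner as a parameter lets the rule itself be checked without invoking pytest.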

Env vars

ANTHROPIC_API_KEY     Claude calls
BETA_PASSPHRASE       Shared closed-beta gate
ADMIN_TOKEN           For /api/admin endpoints
KV_REST_API_URL       Upstash Redis (auto-set by Vercel integration)
KV_REST_API_TOKEN     Upstash auth

Without KV_REST_API_*, the store falls back to an in-memory backend. That is fine for local dev, not for production.
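
The fallback rule can be illustrated with a short Python sketch. The real store lives in web/lib and is not reproduced here. The function name `choose_store_backend`, the backend labels, and the requirement that both variables be set are assumptions made for illustration.

```python
import os
from typing import Mapping


def choose_store_backend(env: Mapping[str, str] = os.environ) -> str:
    """Pick a storage backend from the environment (hypothetical helper)."""
    url = env.get("KV_REST_API_URL")
    token = env.get("KV_REST_API_TOKEN")
    # Assumption: both the URL and the token must be present to use Upstash.
    if url and token:
        return "upstash"
    # Otherwise use process-local memory, which is lost on restart.
    return "memory"
```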

Stack

Python 3.11+, pytest, ruff. Next.js 16, TypeScript strict, Tailwind v4, Anthropic SDK, @vercel/kv (Upstash), Resend. Vitest for TS unit tests. Vercel deployment.

Contributing

See CONTRIBUTING.md. Three tracks:

  1. New tasks for the hidden pool (condition-a/tasks/)
  2. Tooling + UI (web/, src/llm_ranked/harness/)
  3. Rating-system research (src/llm_ranked/analysis.py, synthetic.py)

License

MIT. See LICENSE.
