
Roadmap

tavlean edited this page Jul 2, 2026 · 3 revisions

Roadmap — RankedAGI Raycast extension (founded 2026-07-02 by Claude Fable 5; moved to the wiki 2026-07-02)

This page is the single source of truth for this initiative's plan and status. The "Current phase" line and the checkboxes ARE the status. History lives in Worklog. The main site's roadmap lives in the site repo (~/Development/Tavlean/RankedAGI/docs/fable-plans/ROADMAP.md) and covers a different initiative (score model v2); this initiative deliberately stays out of its way — see "Parallel-work rules" below.

Current phase: R1 done and LIVE in production; R2 + R3 BUILT (2026-07-02) — both unticked until Tav tries them in Raycast (npm run dev in the extension repo, browse both commands, ask @rankedagi a scores question). https://rankedagi.com/api/export is live and verified — the extension works against production out of the box, no Data URL override needed. Repos are on GitHub (tavlean/rankedagi-raycast + this wiki). Next after Tav's confirmation: R4 store submission.

Vision

RankedAGI in the launcher: hit the Raycast hotkey, type a model name, and see how it ranks — RAGI composites, every benchmark score, pricing, links — without opening a browser. Same for benchmarks: pick SWE-Bench Pro and see how every model ranks on it. Published on the public Raycast Store, where it doubles as a discovery/marketing surface for rankedagi.com. AI tools let Raycast AI answer questions like "how does Opus 4.8 score on SWE-Bench Pro?" from the same data.

Decisions locked with Tav (2026-07-02)

  • Separate repo at ~/Development/RaycastExtensions/rankedagi-raycast — zero clash with the score-model sessions running in the site repo. The only site-repo touch is one small additive endpoint (R1).
  • Built to public Raycast Store standards from day one; docs live in THIS wiki, never in the extension repo (it gets PR'd to the store verbatim — DevServers rule).
  • v1 scope: Search Models + Search Benchmarks + AI tools. Menu-bar command and anything else → Later.
  • Data: ONE prerendered JSON endpoint on rankedagi.com (/api/export), fetched with stale-while-revalidate caching; all search/filter/rank happens client-side. Real scores only in v1 — no simulated estimates; missing cells render "—"; reasoning levels with zero real values never appear (matches the site's addendum-17 rule; filtered in the endpoint AND defensively in the extension).
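The client-side rules in the Data decision (rank locally, render missing cells as a dash, hide reasoning levels with no real values) can be sketched as three pure functions. This is a minimal illustration, not the extension's actual code: the `Model` and `Level` shapes, the field names, and the one-decimal formatting are assumptions; the real fields are defined in Data Contract.

```typescript
// Hypothetical shapes for illustration only; the real /api/export fields live in Data Contract.
type Score = number | null;

interface Level {
  name: string;
  scores: Record<string, Score>; // benchmark key -> real score, or null when missing
}

interface Model {
  name: string;
  composites: Record<string, Score>; // e.g. "overall", "code" (assumed keys)
  levels: Level[];
}

// Narrow any cell to a real number, or null when it is missing.
function real(score: Score | undefined): number | null {
  return typeof score === "number" && Number.isFinite(score) ? score : null;
}

// Missing cells render as a dash (U+2014); one decimal place is an assumed format.
function formatCell(score: Score | undefined): string {
  const value = real(score);
  return value === null ? "\u2014" : value.toFixed(1);
}

// Defensive filter: drop reasoning levels that carry no real value at all.
function visibleLevels(levels: Level[]): Level[] {
  return levels.filter((level) => Object.values(level.scores).some((s) => real(s) !== null));
}

// Rank by one composite, highest first; models without that composite sort last.
function rankBy(models: Model[], key: string): Model[] {
  return [...models].sort((a, b) => {
    const x = real(a.composites[key]);
    const y = real(b.composites[key]);
    if (x === null && y === null) return 0;
    if (x === null) return 1;
    if (y === null) return -1;
    return y - x;
  });
}
```

Keeping these as pure functions over the fetched dataset means the view commands and the AI tools can share them unchanged, which fits the "all search/filter/rank happens client-side" decision.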

Foundations

F1 — Data contract, /api/export v1. Full detail: Data Contract (consumer side) and docs/api-export.md in the site repo (producer side). Evolution rule: additive only.
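One way to make the "additive only" rule checkable is to compare two export samples and report anything that disappeared or changed type. The sketch below is an assumption about what "additive" means in practice (existing keys keep their type; new keys are free; a value may be null on either side); the function name `breakingPaths` and the sample fields are hypothetical and do not come from the site repo.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Returns the paths present in `before` that are missing or changed type in `after`.
// An empty result means the change is additive under the assumed definition above.
function breakingPaths(before: Json, after: Json | undefined, path = "$"): string[] {
  if (after === undefined) return [path]; // key was removed
  if (before === null || after === null) return []; // null cells may gain or lose a value
  if (Array.isArray(before)) {
    if (!Array.isArray(after)) return [path];
    const first = before[0];
    // Only the first element is compared, as a representative of the item shape.
    if (first === undefined || after.length === 0) return [];
    return breakingPaths(first, after[0], `${path}[0]`);
  }
  if (typeof before === "object") {
    if (typeof after !== "object" || Array.isArray(after)) return [path];
    const next = after; // const copy keeps the narrowed type inside the callback
    return Object.entries(before).flatMap(([key, value]) =>
      breakingPaths(value, next[key], `${path}.${key}`),
    );
  }
  return typeof after === typeof before ? [] : [path];
}
```

For example, adding a `cost` field to each model yields an empty list, while dropping a `score` field or turning a numeric `version` into a string is reported by path. A check like this could run in the producer's unit tests against a stored sample of the previous export.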

F2 — Extension data layer. src/lib/ is shared by both view commands AND the AI tools — build once, consume three times. Detail: Architecture; rationale: Raycast Docs Research.

Phases

  • R1 — /api/export endpoint in the site repo [execute] — done 2026-07-02: src/routes/api/export/+server.js + pure helper src/lib/server/exportDataset.js + unit tests (site commits 3b452b2, 930f221); LIVE in production the same day (verified: 208 models / 78 benchmarks / 13 families with levels, empty-levels fix in effect). Prod gotcha found & handled: the static host serves it as application/octet-stream — see Data Contract.

  • R2: Extension scaffold + the two core commands [execute]. BUILT 2026-07-02 (extension commits 15302b7, 8ee126d); npm run build + npm run lint clean. Unticked until Tav browses models + benchmarks in Raycast via npm run dev.

    • Search Models: List + isShowingDetail; dropdown re-rank by RAGI Overall / Code / Agentic / Reasoning / Math; rank + composite accessories. Detail: composites table, benchmark-scores table (only benchmarks with values), reasoning-levels table (only levels/columns with values), metadata (org, released, license, cost, links).
    • Search Benchmarks: sections (RAGI Composites first, then per category). Detail: description + top-20 ranking + metadata.
    • Both: useFetch stale-while-revalidate, Data URL preference for local dev, store-review rules baked in (Title Case, isLoading, placeholders, real icon, MIT, no analytics, US English).

  • R3 — AI tools [execute] — BUILT 2026-07-02 (extension commit 2ecce42). Unticked until Tav's end-to-end @rankedagi try. Four read-only tools in src/tools/ (search-models, get-model, rank-models-by-benchmark, compare-models) + ai.yaml (instructions + 3 evals — current docs put AI properties in root ai.yaml, NOT package.json.ai). Tools use the non-hook data path (src/lib/dataset.ts: fetch + Cache, 1 h TTL, stale fallback). Fuzzy resolution echoes back exactly what was matched; outputs capped.

  • R4 — Store submission [execute, needs Tav]

    • author is "tavlean" — CONFIRMED (DevServers ships under this handle; fork tavlean/raycast-extensions already exists and is the submission vehicle).
    • Screenshots: 3–6 PNGs at 2000×1250 in a metadata/ folder at the extension root (DevServers convention: metadata/dev-servers-1.png etc.).
    • Icon light/dark check (add icon@dark.png only if the blue-on-blue ever fails); CHANGELOG.md uses the {PR_MERGE_DATE} placeholder (it already does); README present.
    • Submit: npm run publish (opens the PR into raycast/extensions via the fork) — verify the PR carries ONLY extension files (no CLAUDE.md, no .claude/; docs already live in this wiki so there's nothing to leak). Depends on: R2 + R3 ticked. Done when: extension live on the Raycast Store.
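The non-hook data path described under R3 (fetch + Cache, 1 h TTL, stale fallback) can be sketched roughly as below. This is an illustration under stated assumptions, not the contents of src/lib/dataset.ts: the `Store` interface is a stand-in with the same get/set shape as Raycast's `Cache`, and `loadDataset`, `fetchText`, and the injected clock are hypothetical names chosen so the sketch runs outside Raycast. The body is parsed from text because R1 notes that production serves the file as application/octet-stream.

```typescript
// Stand-in for a string key-value cache (assumed to match Raycast's Cache get/set).
interface Store {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

interface Entry<T> {
  fetchedAt: number; // epoch milliseconds of the last successful fetch
  data: T;
}

const TTL_MS = 60 * 60 * 1000; // 1 h, as stated for the tools' data path

async function loadDataset<T>(
  url: string,
  store: Store,
  fetchText: (url: string) => Promise<string>,
  now: () => number = Date.now,
): Promise<T> {
  const raw = store.get(url);
  const cached = raw ? (JSON.parse(raw) as Entry<T>) : undefined;

  // Fresh hit: serve from cache without touching the network.
  if (cached && now() - cached.fetchedAt < TTL_MS) return cached.data;

  try {
    // Parse the body as text; the Content-Type header cannot be trusted to say JSON.
    const data = JSON.parse(await fetchText(url)) as T;
    store.set(url, JSON.stringify({ fetchedAt: now(), data }));
    return data;
  } catch (error) {
    // Stale fallback: an old dataset beats a failed tool call.
    if (cached) return cached.data;
    throw error;
  }
}
```

Inside the extension, the store would presumably be a `Cache` instance and `fetchText` something like `(u) => fetch(u).then((r) => r.text())`; injecting both keeps the function testable without Raycast or the network.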

Later — deliberately (tracks the site's upcoming features)

The site is mid-rebuild of what a "score" is (score-record v2). v1 already rides part of that foundation (per-reasoning-level scores render in the model detail, only where values exist). Each entry below becomes buildable when the matching site phase ships; each is additive on the extension side.

  • Per-score sources (receipts) — nearest-term. The site attaches ordered sources to every real score: first source = the shown value, optional note, a second value from a disagreeing source, retroactive-revision context (Tav's footnote model). The data side already exists: the site serves a slimmed, prerendered /api/score-provenance (draft AI-ingested entries excluded) keyed by model slug × benchmark key. Once Tav's real entries fill the sidecar (site Phase 4 ticking), add: source links per score in the model detail panel, a "where these numbers come from" block in benchmark detail, and AI tools citing the source URL when asked. Implementation choice then: fetch /api/score-provenance as a second cached dataset (zero site changes) or fold slimmed sources into /api/export (additive either way).
  • Confidence / disagreeing-source display — rides the same provenance data: when a cell holds two values from disagreeing sources, surface the second value + note instead of silently showing one number. Wait until the site settles its own display treatment (its Phase 6 / Nerd view) and mirror the language.
  • Best-level semantics updates: the site's Phase 6 decides collapsed-row semantics (coherent best level vs per-benchmark best) and expanded-level modes. The extension inherits collapse changes automatically through /api/export (the endpoint reuses the site's $lib/families.js), but re-check the level tables here after that phase lands.
  • Menu-bar command — current #1 model (or a pinned model) in the macOS menu bar; background refresh at a conservative interval (6–12 h). Tav explicitly deferred this from v1.
  • Category leaderboard commands — dedicated commands per composite if the R2 dropdown proves not enough.
  • Simulated-score toggle — would need the sidecar or a slimmed variant in the export; only if users ask.
  • Compare command (view) — side-by-side model comparison UI; pairs with the site's own future compare view.
  • Windows — Raycast is macOS-first; revisit platforms when Raycast Windows extension support matures.

Parallel-work rules (why this can run alongside score model v2)

  • The extension repo + this wiki are the workspace; the site repo is touched ONLY by /api/export (route + helper + tests + docs/api-export.md) and one-line doc pointers. No shared files with the site roadmap's phases.
  • The data contract consumes the site's public helpers ($lib/families.js); if score-model work changes collapse semantics, the endpoint inherits it automatically — that's a feature, not a clash.
  • Site-repo sessions treat /api/export as a public contract: additive changes only (their ROADMAP + docs/api-export.md both say so).

For future executors

  • Docs discipline (Tav's rule): the extension repo carries NO documentation — roadmap, status, decisions, worklog all live in THIS wiki; keep the wiki in sync when extension commits land, and mirror anything non-trivial into the site repo's docs when it concerns the data contract. .claude/PROJECT_BRIEF.md in the extension repo is local-only (untracked), DevServers-style.
  • Read Data Contract before touching data code; read Raycast Docs Research before adopting new Raycast API surface, and re-verify against the raycast-api MCP server (the API moves fast; the report was current as of 2026-07-02).
  • Versions at research time: @raycast/api 1.104.21, @raycast/utils 2.2.7. Manifest command names map to src/<name>.tsx; tool names to src/tools/<name>.ts.
  • Update this page's checkboxes + "Current phase" as phases complete — this binds every model reading it.
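The naming convention noted above (command names map to src/<name>.tsx, tool names to src/tools/<name>.ts) can be turned into a small pre-submission check. The sketch below only derives the expected paths from a manifest-like object; the `Manifest` type is a simplified assumption, and the command names in the usage example are hypothetical (the tool names are the four listed under R3).

```typescript
// Simplified, assumed view of the manifest: only the names matter for this check.
interface Manifest {
  commands?: { name: string }[];
  tools?: { name: string }[];
}

// Lists the entry files the convention expects for every declared command and tool.
function expectedEntryFiles(manifest: Manifest): string[] {
  const commands = (manifest.commands ?? []).map((c) => `src/${c.name}.tsx`);
  const tools = (manifest.tools ?? []).map((t) => `src/tools/${t.name}.ts`);
  return [...commands, ...tools];
}

// Usage with hypothetical command names and the R3 tool names.
const files = expectedEntryFiles({
  commands: [{ name: "search-models" }, { name: "search-benchmarks" }],
  tools: [
    { name: "search-models" },
    { name: "get-model" },
    { name: "rank-models-by-benchmark" },
    { name: "compare-models" },
  ],
});
```

Comparing `files` against what actually exists on disk would catch a renamed command or tool before npm run build does.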