Roadmap
Roadmap — RankedAGI Raycast extension (founded 2026-07-02 by Claude Fable 5; moved to the wiki 2026-07-02)
This page is the single source of truth for this initiative's plan and status. The "Current phase" line and the checkboxes ARE the status. History lives in Worklog. The main site's roadmap lives in the site repo (~/Development/Tavlean/RankedAGI/docs/fable-plans/ROADMAP.md) and covers a different initiative (score model v2); this initiative deliberately stays out of its way — see "Parallel-work rules" below.
Current phase: R1 done; R2 + R3 BUILT (2026-07-02) — both unticked until Tav tries them in Raycast (npm run dev in the extension repo, browse both commands, ask @rankedagi a scores question). The endpoint goes live with the next site deploy; until then set the extension's Data URL preference to a local site dev server (http://127.0.0.1:<port>/api/export). Same-day follow-ups landed: repo moved to ~/Development/RaycastExtensions/rankedagi-raycast, empty-reasoning-levels fix (both layers), docs moved to this wiki. Next after Tav's confirmation: R4 store submission.
RankedAGI in the launcher: hit the Raycast hotkey, type a model name, and see how it ranks — RAGI composites, every benchmark score, pricing, links — without opening a browser. Same for benchmarks: pick SWE-Bench Pro and see how every model ranks on it. Published on the public Raycast Store, where it doubles as a discovery/marketing surface for rankedagi.com. AI tools let Raycast AI answer questions like "how does Opus 4.8 score on SWE-Bench Pro?" from the same data.
- Separate repo at `~/Development/RaycastExtensions/rankedagi-raycast` — zero clash with the score-model sessions running in the site repo. The only site-repo touch is one small additive endpoint (R1).
- Built to public Raycast Store standards from day one; docs live in THIS wiki, never in the extension repo (it gets PR'd to the store verbatim — DevServers rule).
- v1 scope: Search Models + Search Benchmarks + AI tools. Menu-bar command and anything else → Later.
- Data: ONE prerendered JSON endpoint on rankedagi.com (`/api/export`), fetched with stale-while-revalidate caching; all search/filter/rank happens client-side. Real scores only in v1 — no simulated estimates; missing cells render "—"; reasoning levels with zero real values never appear (matches the site's addendum-17 rule; filtered in the endpoint AND defensively in the extension).
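The "real scores only" rule in the Data bullet above can be sketched as two small pure helpers. This is only an illustration: the `ReasoningLevel` shape and the helper names are hypothetical, and the real `/api/export` field names may differ.

```typescript
// Hypothetical shapes for illustration; not the actual /api/export contract.
type Score = number | null;

interface ReasoningLevel {
  name: string;
  // benchmark key -> real score, or null when no real value exists
  scores: Record<string, Score>;
}

// A missing cell renders as the "—" placeholder; a real value renders as-is.
function formatScore(score: Score | undefined): string {
  return typeof score === "number" && isFinite(score) ? String(score) : "—";
}

// True when the level carries at least one real (finite, numeric) score.
function hasRealValue(level: ReasoningLevel): boolean {
  return Object.keys(level.scores).some((key) => {
    const value = level.scores[key];
    return typeof value === "number" && isFinite(value);
  });
}

// Defensive client-side filter: levels with zero real values never appear.
function dropEmptyLevels(levels: ReasoningLevel[]): ReasoningLevel[] {
  return levels.filter(hasRealValue);
}
```

Running the same filter in the extension as in the endpoint is cheap, and it keeps an older cached payload from ever showing an empty level.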
F1 — Data contract, /api/export v1. Full detail: Data Contract (consumer side) and docs/api-export.md in the site repo (producer side). Evolution rule: additive only.
F2 — Extension data layer. src/lib/ is shared by both view commands AND the AI tools — build once, consume three times. Detail: Architecture; rationale: Raycast Docs Research.
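As a rough sketch of what "build once, consume three times" could look like, a single pure ranking helper can serve both view commands and the AI tools. The `ModelRow` shape, the `CompositeKey` names, and `rankModels` itself are assumptions made for illustration, not the actual contents of `src/lib/`.

```typescript
// Hypothetical composite keys mirroring the dropdown options
// (RAGI Overall / Code / Agentic / Reasoning / Math).
type CompositeKey = "overall" | "code" | "agentic" | "reasoning" | "math";

interface ModelRow {
  slug: string;
  name: string;
  composites: { [K in CompositeKey]?: number | null };
}

interface RankedModel {
  rank: number;
  model: ModelRow;
  value: number;
}

// Rank models by one composite, highest first. Models without a real value
// for that composite are left out rather than ranked last (an assumption).
function rankModels(models: ModelRow[], key: CompositeKey, limit?: number): RankedModel[] {
  const scored: { model: ModelRow; value: number }[] = [];
  for (const model of models) {
    const value = model.composites[key];
    if (typeof value === "number" && isFinite(value)) {
      scored.push({ model, value });
    }
  }
  // Ties fall back to name order so the ranking is deterministic.
  scored.sort((a, b) => b.value - a.value || a.model.name.localeCompare(b.model.name));
  const top = limit === undefined ? scored : scored.slice(0, limit);
  return top.map((entry, index) => ({ rank: index + 1, model: entry.model, value: entry.value }));
}
```

A list view could call this with the dropdown's current key, while a tool could call it with a `limit` to keep its output capped.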
- R1 — `/api/export` endpoint in the site repo [execute] — done 2026-07-02: `src/routes/api/export/+server.js` + pure helper `src/lib/server/exportDataset.js` + unit tests (site commits `3b452b2`, `930f221`); verified live (208 models / 78 benchmarks / 13 families with levels after the empty-levels fix).
- R2 — Extension scaffold + the two core commands [execute] — BUILT 2026-07-02 (extension commits `15302b7`, `8ee126d`); `npm run build` + `npm run lint` clean. Unticked until Tav browses models + benchmarks in Raycast via `npm run dev`. Search Models: `List` + `isShowingDetail`, dropdown re-rank by RAGI Overall / Code / Agentic / Reasoning / Math, rank + composite accessories, detail = composites table, benchmark-scores table (only benchmarks with values), reasoning-levels table (only levels/columns with values), metadata (org, released, license, cost, links). Search Benchmarks: sections (RAGI Composites first, then per category), detail = description + top-20 ranking + metadata. Both: `useFetch` stale-while-revalidate, Data URL preference for local dev, store-review rules baked in (Title Case, `isLoading`, placeholders, real icon, MIT, no analytics, US English).
- R3 — AI tools [execute] — BUILT 2026-07-02 (extension commit `2ecce42`). Unticked until Tav's end-to-end @rankedagi try. Four read-only tools in `src/tools/` (`search-models`, `get-model`, `rank-models-by-benchmark`, `compare-models`) + `ai.yaml` (instructions + 3 evals — current docs put AI properties in root `ai.yaml`, NOT `package.json.ai`). Tools use the non-hook data path (`src/lib/dataset.ts`: fetch + `Cache`, 1 h TTL, stale fallback). Fuzzy resolution echoes back exactly what was matched; outputs capped.
- R4 — Store submission [execute, needs Tav]
  - `author` is "tavlean" — CONFIRMED (DevServers ships under this handle; fork `tavlean/raycast-extensions` already exists and is the submission vehicle).
  - Screenshots: 3–6 PNGs at 2000×1250 in a `metadata/` folder at the extension root (DevServers convention: `metadata/dev-servers-1.png` etc.).
  - Icon light/dark check (`icon@dark.png` only if the blue-on-blue ever fails), `CHANGELOG.md` uses `{PR_MERGE_DATE}` placeholder (already does), README present.
  - Submit: `npm run publish` (opens the PR into `raycast/extensions` via the fork) — verify the PR carries ONLY extension files (no `.claude/`; docs already live in this wiki so there's nothing to leak).

  Depends on: R2 + R3 ticked. Done when: extension live on the Raycast Store.
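The "1 h TTL, stale fallback" behaviour noted under R3 above can be sketched as a small decision function. This is a simplified, synchronous illustration and not the code in `src/lib/dataset.ts`: the names `CacheEntry`, `FetchOutcome`, and `resolveDataset` are hypothetical, and the network call is passed in as a plain function so the logic stays testable (a real fetch would be asynchronous).

```typescript
interface CacheEntry<T> {
  value: T;
  storedAt: number; // epoch milliseconds when the value was cached
}

type FetchOutcome<T> =
  | { status: "ok"; value: T }
  | { status: "error"; error: string };

interface Resolved<T> {
  value: T;
  source: "cache" | "network" | "stale-cache";
  entry: CacheEntry<T>; // what should be kept in the cache afterwards
}

const ONE_HOUR_MS = 60 * 60 * 1000;

function isFresh<T>(entry: CacheEntry<T> | undefined, now: number, ttlMs: number = ONE_HOUR_MS): boolean {
  return entry !== undefined && now - entry.storedAt < ttlMs;
}

// Serve a fresh cache hit; otherwise try the network; if that fails,
// fall back to the stale entry rather than returning nothing.
function resolveDataset<T>(
  entry: CacheEntry<T> | undefined,
  now: number,
  attemptFetch: () => FetchOutcome<T>,
  ttlMs: number = ONE_HOUR_MS
): Resolved<T> {
  if (entry !== undefined && isFresh(entry, now, ttlMs)) {
    return { value: entry.value, source: "cache", entry };
  }
  const outcome = attemptFetch();
  if (outcome.status === "ok") {
    const refreshed: CacheEntry<T> = { value: outcome.value, storedAt: now };
    return { value: refreshed.value, source: "network", entry: refreshed };
  }
  if (entry !== undefined) {
    return { value: entry.value, source: "stale-cache", entry };
  }
  throw new Error("No dataset available: " + outcome.error);
}
```

Returning the `source` alongside the value would let a tool say when it answered from stale data, which may be worth surfacing in AI tool output.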
The site is mid-rebuild of what a "score" is (score-record v2). v1 already rides part of that foundation (per-reasoning-level scores render in the model detail, only where values exist). Each entry below becomes buildable when the matching site phase ships; each is additive on the extension side.
- Per-score sources (receipts) — nearest-term. The site attaches ordered sources to every real score: first source = the shown value, optional note, a second value from a disagreeing source, retroactive-revision context (Tav's footnote model). The data side already exists: the site serves a slimmed, prerendered `/api/score-provenance` (draft AI-ingested entries excluded) keyed by model slug × benchmark key. Once Tav's real entries fill the sidecar (site Phase 4 ticking), add: source links per score in the model detail panel, a "where these numbers come from" block in benchmark detail, and AI tools citing the source URL when asked. Implementation choice then: fetch `/api/score-provenance` as a second cached dataset (zero site changes) or fold slimmed sources into `/api/export` (additive either way).
- Confidence / disagreeing-source display — rides the same provenance data: when a cell holds two values from disagreeing sources, surface the second value + note instead of silently showing one number. Wait until the site settles its own display treatment (its Phase 6 / Nerd view) and mirror the language.
- Best-level semantics updates — the site's Phase 6 decides collapsed-row semantics (coherent best level vs per-benchmark best) and expanded-level modes. The extension inherits collapse changes automatically through `/api/export` (it reuses the site's `$lib/families.js`), but re-check the level tables here after that phase lands.
- Menu-bar command — current #1 model (or a pinned model) in the macOS menu bar; background refresh at a conservative interval (6–12 h). Tav explicitly deferred this from v1.
- Category leaderboard commands — dedicated commands per composite if the R2 dropdown proves not enough.
- Simulated-score toggle — would need the sidecar or a slimmed variant in the export; only if users ask.
- Compare command (view) — side-by-side model comparison UI; pairs with the site's own future compare view.
- Windows — Raycast is macOS-first; revisit `platforms` when Raycast Windows extension support matures.
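For the per-score sources and disagreeing-source entries in the list above, a consumer-side lookup might look roughly like the sketch below. Everything here is hypothetical: the `Provenance` shape only mirrors the description "keyed by model slug × benchmark key" with ordered sources, and the actual `/api/score-provenance` payload may be structured differently.

```typescript
// Hypothetical source entry; field names are assumptions.
interface ScoreSource {
  url: string;
  value: number;
  note?: string;
}

// Hypothetical sidecar shape: model slug -> benchmark key -> ordered sources.
type Provenance = { [modelSlug: string]: { [benchmarkKey: string]: ScoreSource[] } };

interface ScoreReceipt {
  shown: ScoreSource; // first source = the shown value
  disagreeing?: ScoreSource; // first later source whose value differs
}

function getReceipt(provenance: Provenance, modelSlug: string, benchmarkKey: string): ScoreReceipt | undefined {
  const byBenchmark = provenance[modelSlug];
  const sources = byBenchmark ? byBenchmark[benchmarkKey] : undefined;
  if (!sources || sources.length === 0) return undefined;
  const shown = sources[0];
  if (shown === undefined) return undefined;
  let disagreeing: ScoreSource | undefined;
  for (const candidate of sources.slice(1)) {
    if (candidate.value !== shown.value) {
      disagreeing = candidate;
      break;
    }
  }
  return disagreeing ? { shown, disagreeing } : { shown };
}

// Plain-text rendering an AI tool could return when asked where a number comes from.
function describeReceipt(receipt: ScoreReceipt): string {
  let text = receipt.shown.value + " (source: " + receipt.shown.url + ")";
  if (receipt.disagreeing) {
    text += "; a second source reports " + receipt.disagreeing.value + " (" + receipt.disagreeing.url + ")";
    if (receipt.disagreeing.note) text += ", note: " + receipt.disagreeing.note;
  }
  return text;
}
```

The wording of the rendered text is a placeholder; per the list above, the final language should mirror whatever display treatment the site settles on.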
- The extension repo + this wiki are the workspace; the site repo is touched ONLY by `/api/export` (route + helper + tests + `docs/api-export.md`) and one-line doc pointers. No shared files with the site roadmap's phases.
- The data contract consumes the site's public helpers (`$lib/families.js`); if score-model work changes collapse semantics, the endpoint inherits it automatically — that's a feature, not a clash.
- Site-repo sessions treat `/api/export` as a public contract: additive changes only (their ROADMAP + `docs/api-export.md` both say so).
- Docs discipline (Tav's rule): the extension repo carries NO documentation — roadmap, status, decisions, worklog all live in THIS wiki; keep the wiki in sync when extension commits land, and mirror anything non-trivial into the site repo's docs when it concerns the data contract. `.claude/PROJECT_BRIEF.md` in the extension repo is local-only (untracked), DevServers-style.
- Read Data Contract before touching data code; read Raycast Docs Research before adopting new Raycast API surface, and re-verify against the raycast-api MCP server (the API moves fast; report was current 2026-07-02).
- Versions at research time: `@raycast/api` 1.104.21, `@raycast/utils` 2.2.7. Manifest command names map to `src/<name>.tsx`; tool names to `src/tools/<name>.ts`.
- Update this page's checkboxes + "Current phase" as phases complete — this binds every model reading it.
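The name-to-file mapping mentioned in the list above is simple enough to spell out as two helpers, which could be handy for a quick sanity check that every manifest entry has a source file. The helper names are hypothetical. The tool names in the usage note come from R3, while the command name `search-models` is only an assumed manifest name for the Search Models command.

```typescript
// Manifest command name -> entry file, per the convention src/<name>.tsx.
function commandEntryPath(name: string): string {
  return "src/" + name + ".tsx";
}

// Tool name -> entry file, per the convention src/tools/<name>.ts.
function toolEntryPath(name: string): string {
  return "src/tools/" + name + ".ts";
}

// Collect the entry files a manifest would be expected to have on disk.
function expectedEntryPaths(commandNames: string[], toolNames: string[]): string[] {
  return commandNames.map(commandEntryPath).concat(toolNames.map(toolEntryPath));
}
```

For example, `expectedEntryPaths(["search-models"], ["get-model", "compare-models"])` would list `src/search-models.tsx`, `src/tools/get-model.ts`, and `src/tools/compare-models.ts`.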