A local-first notes app — a quiet place to think.
This repository exists primarily as a fixture for evaluating AI code-generation tools. The application itself is a working React + TypeScript + TipTap + Dexie notes app with no backend, but the main reason it lives here is to provide a non-trivial real-world codebase against which AI agents can be benchmarked.
The repository serves as the baseline checkout in experiments that compare the output of different LLM coding agents (different models × different reasoning-effort levels) on the same well-specified feature task.
The flow:
- Clone this repo.
- Create N git worktrees off main, one per (model, effort) combination you want to compare.
- Run an agent (e.g. OpenAI Codex CLI, Claude Code, Cursor) in each worktree with the exact same prompt; see outline-prompt.txt for the reference prompt used in recent runs.
- Score the resulting implementations on shared axes (spec adherence, type safety, pattern conformity, performance, constraint adherence, etc.) and compare across cells.
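The worktree step in the flow above can be scripted. The following is a minimal sketch, not something shipped in this repository: the `eval/` branch prefix, the `../worktrees/` directory, and the names `planWorktrees` and `slug` are all illustrative assumptions. It only builds the `git worktree add` command strings, one per (model, effort) cell, so you can inspect them before running anything.

```typescript
// Hypothetical helper: none of these names or paths come from the repository.
interface Cell {
  model: string;
  effort: string;
}

interface WorktreePlan extends Cell {
  branch: string;  // branch created for this cell (assumed "eval/" prefix)
  path: string;    // worktree directory (assumed "../worktrees/" location)
  command: string; // the git command that would create the worktree
}

// Turn an arbitrary label into something safe for branch and directory names.
function slug(label: string): string {
  return label
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build one `git worktree add` command per (model, effort) combination.
function planWorktrees(
  models: string[],
  efforts: string[],
  baseRef = "main",
): WorktreePlan[] {
  const plans: WorktreePlan[] = [];
  for (const model of models) {
    for (const effort of efforts) {
      const name = `${slug(model)}-${slug(effort)}`;
      const branch = `eval/${name}`;
      const path = `../worktrees/${name}`;
      plans.push({
        model,
        effort,
        branch,
        path,
        command: `git worktree add -b ${branch} ${path} ${baseRef}`,
      });
    }
  }
  return plans;
}

// Example grid: two placeholder models at two effort levels gives four cells.
const examplePlans = planWorktrees(["model-a", "model-b"], ["low", "high"]);
for (const plan of examplePlans) {
  console.log(plan.command);
}
```

Keeping the planning step pure (strings in, strings out) makes it easy to review the grid or pipe the commands into a shell, and every worktree branches from the same base ref so the cells stay comparable.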
The included outline-prompt.txt asks each agent to add a Table-of-Contents / Outline panel to the editor. The task is intentionally chosen to be:
- Medium build complexity — touches ~6 files, ~250 LOC of changes, integrates with TipTap and the existing Zustand store / Dexie meta-table.
- Easy to UI-verify — every requirement is something a human can eyeball in seconds.
- Constrained — has an explicit out-of-scope fence, which is itself a useful signal for measuring how well agents respect spec boundaries.
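To give a feel for the TipTap side of the task, here is a minimal sketch of deriving outline entries from a document. It assumes the ProseMirror-style JSON that a TipTap editor's `getJSON()` returns, where StarterKit headings appear as nodes of type `heading` with an `attrs.level`. The `OutlineEntry` shape and the `extractOutline` function are hypothetical; they are not taken from outline-prompt.txt or from any agent's implementation.

```typescript
// Minimal subset of the document JSON shape (assumed to match editor.getJSON()).
interface DocNode {
  type: string;
  attrs?: { level?: number };
  content?: DocNode[];
  text?: string;
}

// Hypothetical outline entry; the real task may require other fields.
interface OutlineEntry {
  level: number; // heading level, 1 through 6
  text: string;  // concatenated text content of the heading
  index: number; // position among headings, in document order
}

// Concatenate all text found beneath a node.
function nodeText(node: DocNode): string {
  if (typeof node.text === "string") return node.text;
  return (node.content ?? []).map(nodeText).join("");
}

// Walk the document depth-first and collect every heading node.
function extractOutline(doc: DocNode): OutlineEntry[] {
  const entries: OutlineEntry[] = [];
  const visit = (node: DocNode): void => {
    if (node.type === "heading") {
      entries.push({
        level: node.attrs?.level ?? 1,
        text: nodeText(node),
        index: entries.length,
      });
      return; // headings contain only inline content, so stop descending
    }
    for (const child of node.content ?? []) visit(child);
  };
  visit(doc);
  return entries;
}

// Example document with two headings and one paragraph.
const exampleDoc: DocNode = {
  type: "doc",
  content: [
    { type: "heading", attrs: { level: 1 }, content: [{ type: "text", text: "Notes" }] },
    { type: "paragraph", content: [{ type: "text", text: "Body text." }] },
    {
      type: "heading",
      attrs: { level: 2 },
      content: [
        { type: "text", text: "Next " },
        { type: "text", text: "steps" },
      ],
    },
  ],
};

console.log(extractOutline(exampleDoc));
```

A pure function over the JSON is easy to unit-test in isolation; wiring it to editor updates, the Zustand store, and the Dexie meta-table is the part the benchmark task actually exercises.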
If you're using this repo for your own evaluations, feel free to adapt the prompt or swap in your own.
    npm install
    npm run dev

Vite serves on http://localhost:5173 by default.
- React 18 + TypeScript 5.7 (strict)
- Vite 5 + Tailwind v4 (beta)
- TipTap 2.10 (StarterKit + ~10 extensions)
- Dexie 4 (IndexedDB)
- Zustand 5 (state)
- MiniSearch (client-side full-text search)
- Radix UI primitives, cmdk, Framer Motion
No license attached — treat as private code unless one is added.