Merged
10 changes: 4 additions & 6 deletions .gitignore
@@ -1,19 +1,17 @@
# Case harness marker files (created during pipeline runs)
.case-active
.case-tested
.case-manual-tested
.case-reviewed

# Repo-local Case runtime state. Target repos should ignore this directory too;
# `ca bootstrap` adds the rule automatically.
.case/

# User-local project manifest (lives in ~/.config/case/projects.json)
projects.json

# Legacy in-repo runtime state retained only as a read fallback during migration.
tasks/active/
tasks/done/
docs/learnings/
docs/proposed-amendments/*.md
!docs/proposed-amendments/.gitkeep
!docs/proposed-amendments/README.md
docs/run-log.jsonl
docs/agent-versions/

6 changes: 3 additions & 3 deletions AGENTS.md
@@ -1,4 +1,4 @@
# Case — WorkOS OSS Harness
# Case — Agent Harness

Spine repo for orchestrating agent work across WorkOS open source projects.
Humans steer. Agents execute. When agents struggle, fix the harness.
@@ -21,9 +21,9 @@ echo "$SESSION"
| authkit-session | `../authkit-session` | Framework-agnostic session management | TS/pnpm |
| authkit-tanstack-start | `../authkit-tanstack-start` | AuthKit TanStack Start SDK | TS/pnpm |
| authkit-nextjs | `../authkit-nextjs` | AuthKit Next.js SDK | TS/pnpm |
| workos-node | `../workos-node/main` | WorkOS Node.js SDK | TS/pnpm |
| workos-node | `../workos-node/main` | WorkOS Node.js SDK | TS/npm |

Full metadata (commands, remotes, language): `projects.json`
Full metadata (commands, remotes, evidence strategy): `~/.config/case/projects.json`

## Navigation

15 changes: 7 additions & 8 deletions CLAUDE.md
@@ -13,7 +13,7 @@ It provides the cross-cutting knowledge, conventions, and task dispatch that no

## Philosophy

- **Case exists to make agent-authored WorkOS OSS PRs reliable, reviewable, and self-improving.** Keep the core loop small unless reliability requires more.
- **Case exists to make agent-authored PRs reliable, reviewable, and self-improving.** Keep the core loop small unless reliability requires more.
- **Humans steer, agents execute.** Engineers define goals and acceptance criteria. Agents implement.
- **Never write code directly.** All code changes in target repos flow through agents. Engineers only improve this harness.
- **When agents struggle, fix the harness.** The fix is never "try harder" — it's a missing doc, playbook, convention, or enforcement rule.
@@ -48,34 +48,33 @@ Case depends on the skills plugin for product knowledge. They are complementary,
```
AGENTS.md # Entry point for agents (routing map)
CLAUDE.md # This file (meta-instructions for case itself)
projects.json # Manifest of target repos
projects.schema.json # JSON Schema for the manifest
projects.schema.json # JSON Schema for the project manifest
docs/
architecture/ # Canonical patterns per repo type
conventions/ # Shared rules (commits, testing, PRs)
golden-principles.md # Invariants enforced across all repos
playbooks/ # Step-by-step guides for recurring operations
tasks/
active/ # Current task files for agent execution
done/ # Completed tasks (moved after PR merge)
templates/ # Reusable task templates
src/commands/
check.ts # Cross-repo convention enforcement
bootstrap.ts # Per-repo readiness verification
onboard.ts # Human-facing onboarding for a new repo
```

## Commands

```bash
# Validate manifest
node -e "JSON.parse(require('fs').readFileSync('projects.json','utf8'))"

# Check conventions across repos
# Check conventions across repos (also validates the manifest)
ca check

# Check a single repo
ca check --repo cli

# Bootstrap a repo for agent work
ca bootstrap cli

# Onboard a new repo
ca onboard <path>
```
3 changes: 2 additions & 1 deletion CONTEXT.md
@@ -7,7 +7,7 @@ Canonical vocabulary for the case pipeline. Every term used in code, specs, and
| Term | Definition | Rejected Alternatives |
| ------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------- |
| **task** | A unit of agent work dispatched by the pipeline. Has a `taskId`, status, and associated event log. | `job`, `run` (too generic) |
| **phase** | A named pipeline stage that produces one `AgentResult`. One of: implement, verify, review, approve, close, retrospective. | `step` (too generic), `stage` (ambiguous with CI) |
| **phase** | A named pipeline stage that produces one `AgentResult`. One of: implement, verify, review, close, retrospective. | `step` (too generic), `stage` (ambiguous with CI) |
| **node** | A DAG vertex representing one phase execution at a specific revision cycle. E.g., `implement_0`, `verify_1`. Introduced in Phase 3. | `vertex` (too academic) |
| **status** | The lifecycle position of a task, derived from pipeline state. One of: active, implementing, verifying, reviewing, evaluating, closing, pr-opened, merged. | `state` (reserved for `PipelineState`, the full reconstructible object) |
| **state** | The full reconstructible pipeline state object (`PipelineState`), produced by `reduceEvents()`. | `snapshot` (used in mill for a different concept) |
@@ -18,6 +18,7 @@ Canonical vocabulary for the case pipeline. Every term used in code, specs, and
| **evaluator** | Collective term for verifier and reviewer — the two phases that assess implementation quality. | `assessor`, `checker` |
| **marker** | A file written to `.case/<task-slug>/` as evidence of a completed phase. E.g., `tested`, `reviewed`. | `flag`, `sentinel` |
| **evidence** | Proof that a phase completed successfully. Includes marker files, SHA-256 hashed test output, screenshots. | `artifact` (too broad) |
| **evidence strategy** | One of: ui-screenshot, scenario-script, test-output. Declared per project in projects.json. Drives what kind of verification evidence the pipeline requires. | |
| **ast-grep rule** | A YAML file defining a structural code pattern to match or ban. Processed by ast-grep against TypeScript ASTs. Lives in `ast-rules/`. | `lint rule` (too generic — we also have oxlint) |
| **target rule** | An ast-grep rule enforcing golden principles in target repos. Run by the implementer before committing. Lives in `ast-rules/target/`. | `repo rule`, `external rule` |
| **self-enforcement rule** | An ast-grep rule enforcing case's own codebase invariants. Run in CI and pre-commit. Lives in `ast-rules/self/`. | `internal rule`, `meta rule` |
40 changes: 24 additions & 16 deletions README.md
@@ -2,13 +2,13 @@

<img width="500" height="500" alt="Case" src="docs/case-logo.svg" />

Case is the reliability layer for agent-authored WorkOS OSS pull requests.
Case is the reliability layer for agent-authored pull requests.

Its job is narrow: turn a clearly scoped WorkOS OSS task into a reviewed PR with evidence, and make the next run better when this one fails. Case is not a generic agent platform, a dashboard product, or a place to accumulate every possible workflow idea. Humans steer. Agents execute. The harness keeps the work reviewable.
Its job is narrow: turn a clearly scoped task into a reviewed PR with evidence, and make the next run better when this one fails. Case is not a generic agent platform, a dashboard product, or a place to accumulate every possible workflow idea. Humans steer. Agents execute. The harness keeps the work reviewable.

## Why It Exists

Agents are useful when the surrounding system makes good work easier than bad work. Case provides that surrounding system for the WorkOS open source repos:
Agents are useful when the surrounding system makes good work easier than bad work. Case provides that surrounding system:

- A shared map of target repos, commands, architecture notes, and conventions.
- A task format that separates human intent from machine-updated state.
@@ -18,7 +18,7 @@ Agents are useful when the surrounding system makes good work easier than bad wo

The north star:

> Case exists to make agent-authored WorkOS OSS PRs reliable, reviewable, and self-improving.
> Case exists to make agent-authored PRs reliable, reviewable, and self-improving.

## Core Loop

@@ -106,6 +106,7 @@ ca 1234 # create or resume a GitHub issue run
ca DX-1234 # create or resume a Linear issue run
ca --agent # interactive steering session
ca --agent 1234 # steering session with issue context
ca onboard <path> # add a repo to projects.json
ca run --task <file> # run an existing task JSON
ca watch <task-slug> # live-tail the event log
```
@@ -120,7 +121,7 @@ ca mark-manual-tested
ca mark-reviewed --critical 0
ca upload <file>
ca snapshot <agent-name>
ca create --repo <name> --title <title> --description <text>
ca create --repo <name> --title <title> --description <text> --evidence <expectations>
ca analyze-failure <task.json> <agent> <error>
ca bootstrap <repo>
ca check [--repo <repo>]
@@ -167,6 +168,8 @@ CASE_DATA_DIR=/tmp/case-test ca init

Static package assets are versioned with Case and embedded into the standalone binary: `agents/`, markdown under `docs/`, and text rules under `ast-rules/`. When running from a checkout, disk files win so local prompt/doc edits are picked up immediately; set `CASE_PACKAGE_ROOT=/path/to/case` to force a specific checkout as the disk override.

Each entry in `projects.json` may optionally include `credentials` (per-repo secrets needed for verification) and `verificationNotes` (free-form context the verifier should know about the repo).

For portable binary installs, keep `projects.json` in `~/.config/case/` via `ca init --projects <path>` or `ca init --migrate-from <case-checkout>`. Repo paths in a portable `projects.json` should be absolute or relative to that `projects.json` file.

## Pipeline
@@ -182,6 +185,8 @@ Revision loops are evaluator-driven. A verifier or reviewer rubric failure can s

Every run writes an append-only event log under `<target-repo>/.case/<task-slug>/events/`. `ca watch <task-slug>` renders those events while a run is active.

Every task carries `evidenceExpectations` — the concrete artifacts the verifier must produce. The orchestrator writes these based on the target repo's `evidenceStrategy` so the verifier knows what counts as proof up front.
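
How a strategy might translate into expectations can be sketched as a simple lookup. The three strategy names are the ones declared in `projects.json`; the `expectationsFor` function, the `EXPECTATIONS` table, and the wording of each expectation are assumptions made for illustration, not the orchestrator's actual output:

```typescript
type EvidenceStrategy = "ui-screenshot" | "scenario-script" | "test-output";

// Hypothetical lookup table; the strings paraphrase what each strategy asks for.
const EXPECTATIONS: Record<EvidenceStrategy, string[]> = {
  "ui-screenshot": ["before screenshot", "after screenshot"],
  "scenario-script": ["consumer script exercising the scenario"],
  "test-output": ["automated test output"],
};

// Hypothetical helper: returns a fresh copy so callers cannot mutate the table.
function expectationsFor(strategy: EvidenceStrategy): string[] {
  return [...EXPECTATIONS[strategy]];
}
```

Keeping the mapping in one table means a new strategy fails to compile until its expectations are filled in, which is one way to keep the verifier's contract explicit.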

## Agent Roles

| Agent | Responsibility | Does Not Do |
@@ -193,7 +198,7 @@ Every run writes an append-only event log under `<target-repo>/.case/<task-slug>
| Closer | Creates the PR after evidence gates pass | Implement or test |
| Retrospective | Records learnings and proposes harness improvements | Edit target repo code |

¹ The orchestrator is TypeScript runtime code (`src/agent/orchestrator-session.ts`), not an LLM agent prompt like the others.
¹ The orchestrator runs as an LLM agent session via `ca --agent`, or as TypeScript runtime code for direct `ca <issue>` dispatch.

The key boundary is context isolation. Implementer context includes task details, playbooks, repo learnings, and revision feedback. Verifier context is intentionally fresher. Reviewer context is focused on the diff and principles.

@@ -207,6 +212,12 @@ Evidence markers live under the target repo's `.case/<task-slug>/` directory:

The closer checks these markers before opening a PR. The point is not ceremony; it is making the PR auditable without trusting a chat transcript.
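
The gate can be pictured as a set difference between required and present markers. A minimal sketch, assuming a hypothetical `missingMarkers` helper; `tested` and `reviewed` are the marker names CONTEXT.md gives as examples, and the real closer may require others:

```typescript
// Hypothetical helper: lists the required markers that are not present.
// An empty result means every evidence gate has a marker on disk.
function missingMarkers(
  required: readonly string[],
  present: ReadonlySet<string>,
): string[] {
  return required.filter((name) => !present.has(name));
}
```

For instance, `missingMarkers(["tested", "reviewed"], new Set(["tested"]))` returns `["reviewed"]`, so a closer built this way would refuse to open the PR until the review marker exists.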

Each repo declares an `evidenceStrategy` in `projects.json` that drives what the verifier produces:

- `ui-screenshot`: Playwright before/after screenshots for user-facing UI changes.
- `scenario-script`: a consumer script that exercises the specific user-facing scenario.
- `test-output`: automated test output only (for libraries and non-UI code).
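
Because the three strategy names form a closed set, a manifest loader could narrow an untyped value before using it. A minimal sketch, assuming a hypothetical `parseEvidenceStrategy` helper that is not part of Case's documented API:

```typescript
// The three names come from the list above; everything else is illustrative.
const EVIDENCE_STRATEGIES = ["ui-screenshot", "scenario-script", "test-output"] as const;
type EvidenceStrategy = (typeof EVIDENCE_STRATEGIES)[number];

// Hypothetical helper: narrows an untyped manifest value to a known strategy,
// throwing on anything outside the closed set.
function parseEvidenceStrategy(value: unknown): EvidenceStrategy {
  const match = EVIDENCE_STRATEGIES.find((s) => s === value);
  if (match === undefined) {
    throw new Error(`Unknown evidenceStrategy: ${String(value)}`);
  }
  return match;
}
```

Failing loudly on an unknown value is a design choice for the sketch; a real loader might instead report the error through schema validation.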

## Self-Improvement

After a run, the retrospective agent should leave the harness smarter:
@@ -240,18 +251,15 @@ Priority:

## Repository Map

Target repos are listed in `projects.json`.
Target repos are listed in `~/.config/case/projects.json` (created by `ca init` + `ca onboard`). The schema is `projects.schema.json` in this repo.

| Repo | Path | Purpose |
| ---------------------- | --------------------------- | ------------------------------------- |
| cli | `../cli/main` | WorkOS CLI |
| skills | `../skills` | WorkOS integration skills |
| authkit-session | `../authkit-session` | Framework-agnostic session management |
| authkit-tanstack-start | `../authkit-tanstack-start` | AuthKit TanStack Start SDK |
| authkit-nextjs | `../authkit-nextjs` | AuthKit Next.js SDK |
| workos-node | `../workos-node/main` | WorkOS Node.js SDK |
Add a repo with:

```bash
ca onboard <path>
```

Add a repo by updating `projects.json`, adding any needed architecture notes under `docs/architecture/`, and verifying with:
Then add any needed architecture notes under `docs/architecture/` and verify with:

```bash
ca check --repo <name>