Merged
43 changes: 10 additions & 33 deletions AGENTS.md
@@ -1,18 +1,11 @@
# vstackv2 — Personal AI Coding Toolkit
# vstack — Personal AI Coding Toolkit

vstackv2 is a lean skill pack for AI coding agents. The default surface is small:
keep the browser runtime, a few high-leverage workflow skills, and only enough
transition compatibility to avoid breaking old habits.
vstack is a lean skill pack for AI coding agents. Single-tier surface: the
browser runtime plus a small set of high-leverage workflow skills.

## Core layers
## Skills

1. Browser/runtime
2. Core skills
3. Optional legacy/transition skills

## Core skills

Skills live in `.agents/skills/`. The default install emphasizes this smaller set.
Skills live in `.agents/skills/`.

| Skill | What it does |
|-------|-------------|
@@ -21,28 +14,12 @@ Skills live in `.agents/skills/`. The default install emphasizes this smaller se
| `/investigate` | Root-cause debugging and implementation troubleshooting. |
| `/review` | Diff-focused code review before landing changes. |
| `/qa` | Browser-driven QA loop that tests and fixes issues. |
| `/ship` | Ship workflow for tests, review, PR prep, and release hygiene. |
| `/guard` | Combined safety mode for destructive commands and scoped edits. |
| `/ship` | Direct push to main with a generated commit message. |
| `/connect-chrome` | Launch visible Chrome with the vstack side panel. |
| `/vstack-upgrade` | Update the toolkit. |

## Transition skills

These still work in v2, but they are no longer the primary public surface:

- `/plan-ceo-review`
- `/plan-eng-review`
- `/qa-only`
- `/careful`
- `/freeze`
- `/unfreeze`
- `/codex`

## Legacy skills
| `/retro` | Weekly engineering retrospective from git history. |

The repo still retains a broader legacy layer for now, but those skills are
unsupported by default in the v2 install surface. Use `./setup --legacy` if you
explicitly want the broader historical toolkit.
The Phase 2 work in `PLAN.md` adds `/simplify`, `/sketch`, `/design-audit`, and
`/quiz` to bring the surface to twelve skills.

## Build commands

@@ -58,4 +35,4 @@ bun run test:core

- The browser command registry remains the source of truth for browse commands.
- Generated skill docs still exist where code-coupled sections must stay in sync.
- Setup now defaults to the v2 core surface. Legacy skills are opt-in.
- `config/skill-surface.sh` is the single source of truth for which skills install.
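Because `config/skill-surface.sh` decides what installs, an install-surface check mostly needs to compare that list against the skill directories on disk. A minimal TypeScript sketch, assuming the surface has already been read into an array of skill names (the shell file's format is not shown in this diff, and `findSurfaceDrift` is a hypothetical helper):

```typescript
interface SurfaceDrift {
  missingDirs: string[]; // listed in the surface but no skill directory exists
  unlisted: string[]; // skill directory exists but is not in the surface
}

// Compare the declared surface with the skill directories actually present.
function findSurfaceDrift(surface: string[], skillDirs: string[]): SurfaceDrift {
  const listed = new Set(surface);
  const present = new Set(skillDirs);
  return {
    missingDirs: surface.filter((name) => !present.has(name)).sort(),
    unlisted: skillDirs.filter((dir) => !listed.has(dir)).sort(),
  };
}

const drift = findSurfaceDrift(
  ["browse", "review", "ship"],
  ["review", "ship", "guard"],
);
console.log(drift);
```

Under the single-tier rule, anything reported in `unlisted` either joins the surface or gets deleted.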
11 changes: 5 additions & 6 deletions ARCHITECTURE.md
@@ -211,13 +211,12 @@ This is structurally sound — if a command exists in code, it appears in docs.

### The preamble

Every skill starts with a `{{PREAMBLE}}` block that runs before the skill's own logic. It handles five things in a single bash command:
Every skill starts with a `{{PREAMBLE}}` block that runs before the skill's own logic. It handles four things in a single bash command:

1. **Update check** — calls `vstack-update-check`, reports if an upgrade is available.
2. **Session tracking** — touches `~/.vstack/sessions/$PPID` and counts active sessions (files modified in the last 2 hours). When 3+ sessions are running, all skills enter "ELI16 mode" — every question re-grounds the user on context because they're juggling windows.
3. **Contributor mode** — reads `vstack_contributor` from config. When true, the agent files casual field reports to `~/.vstack/contributor-logs/` when vstack itself misbehaves.
4. **AskUserQuestion format** — universal format: context, question, `RECOMMENDATION: Choose X because ___`, lettered options. Consistent across all skills.
5. **Search Before Building** — before building infrastructure or unfamiliar patterns, search first. Three layers of knowledge: tried-and-true (Layer 1), new-and-popular (Layer 2), first-principles (Layer 3). When first-principles reasoning reveals conventional wisdom is wrong, the agent names the "eureka moment" and logs it. See `ETHOS.md` for the full builder philosophy.
1. **Session tracking** — touches `~/.vstack/sessions/$PPID` and counts active sessions (files modified in the last 2 hours). When 3+ sessions are running, all skills enter "ELI16 mode" — every question re-grounds the user on context because they're juggling windows.
2. **Local invocation log** — appends a JSONL line to `~/.vstack/analytics/skill-usage.jsonl`. Local-only, consumed by `/retro`. No remote sync, no consent prompt, no version check.
3. **AskUserQuestion format** — universal format: context, question, `RECOMMENDATION: Choose X because ___`, lettered options. Consistent across all skills.
4. **Search Before Building** — before building infrastructure or unfamiliar patterns, search first. Three layers of knowledge: tried-and-true (Layer 1), new-and-popular (Layer 2), first-principles (Layer 3). When first-principles reasoning reveals conventional wisdom is wrong, the agent names the "eureka moment" and logs it. See `ETHOS.md` for the full builder philosophy.
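The session-tracking and question-format rules translate directly into code. The preamble itself is bash, so this is an illustrative TypeScript port, not the shipped implementation: `touchSession`, `countActiveSessions`, `isEli16Mode`, and `formatAskUserQuestion` are hypothetical names, and the `A)`/`B)` option lettering is an assumed layout.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const ACTIVE_WINDOW_MS = 2 * 60 * 60 * 1000; // sessions touched in the last 2 hours
const ELI16_THRESHOLD = 3; // 3+ concurrent sessions switches on ELI16 mode

// Equivalent of `touch <dir>/<pid>`: create the marker if missing, then set its mtime.
function touchSession(sessionsDir: string, pid: number, at: Date = new Date()): void {
  fs.mkdirSync(sessionsDir, { recursive: true });
  const marker = path.join(sessionsDir, String(pid));
  fs.closeSync(fs.openSync(marker, "a"));
  fs.utimesSync(marker, at, at);
}

// Count markers whose mtime falls inside the active window.
function countActiveSessions(sessionsDir: string, now: number = Date.now()): number {
  if (!fs.existsSync(sessionsDir)) return 0;
  return fs.readdirSync(sessionsDir).filter((name) => {
    const { mtimeMs } = fs.statSync(path.join(sessionsDir, name));
    return now - mtimeMs < ACTIVE_WINDOW_MS;
  }).length;
}

function isEli16Mode(activeSessions: number): boolean {
  return activeSessions >= ELI16_THRESHOLD;
}

// Universal question layout: context, question, RECOMMENDATION line, lettered options.
interface AskUserQuestion {
  context: string;
  question: string;
  recommendation: { choice: string; because: string };
  options: string[];
}

function formatAskUserQuestion(q: AskUserQuestion): string {
  const lines = [
    q.context,
    "",
    q.question,
    "",
    `RECOMMENDATION: Choose ${q.recommendation.choice} because ${q.recommendation.because}`,
    "",
  ];
  q.options.forEach((option, i) => {
    lines.push(`${String.fromCharCode(65 + i)}) ${option}`); // A), B), C), ...
  });
  return lines.join("\n");
}

// Usage: three fresh sessions plus one stale marker from three hours ago.
const sessionsDir = fs.mkdtempSync(path.join(os.tmpdir(), "vstack-sessions-"));
for (const pid of [4101, 4102, 4103]) touchSession(sessionsDir, pid);
touchSession(sessionsDir, 4104, new Date(Date.now() - 3 * 60 * 60 * 1000));
const active = countActiveSessions(sessionsDir);
console.log(`active=${active} eli16=${isEli16Mode(active)}`);
```

The stale marker falls outside the two-hour window, so only the three live sessions count toward the threshold.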

### Why committed, not generated at runtime?

6 changes: 2 additions & 4 deletions BROWSER.md
@@ -159,7 +159,7 @@ The window has a subtle green shimmer line at the top edge and a floating "vstac
| `focus` | Bring Chrome to foreground (macOS). `focus @e3` also scrolls element into view |
| `status` | Shows `Mode: cdp` when connected, `Mode: launched` when headless |

**CDP-aware skills:** When in real-browser mode, `/qa` and `/design-review` automatically skip cookie import prompts and headless workarounds.
**CDP-aware skills:** When in real-browser mode, `/qa` and `/design-audit` automatically skip cookie import prompts and headless workarounds.
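A skill can make the CDP decision from `status` output alone. This sketch assumes only the documented `Mode: cdp` / `Mode: launched` line; the rest of the output format and both function names are hypothetical.

```typescript
type BrowseMode = "cdp" | "launched" | "unknown";

// Read the mode line from `$B status` output; everything else is ignored.
function parseBrowseMode(statusOutput: string): BrowseMode {
  const match = /^\s*Mode:\s*(cdp|launched)\b/m.exec(statusOutput);
  return match ? (match[1] as BrowseMode) : "unknown";
}

// In real-browser (CDP) mode the user's own Chrome is already logged in,
// so a skill can skip cookie-import prompts and headless workarounds.
function isRealBrowser(statusOutput: string): boolean {
  return parseBrowseMode(statusOutput) === "cdp";
}

console.log(isRealBrowser("Mode: cdp"));
```

Treating an unrecognized status as "not CDP" keeps the headless fallbacks in place when the mode cannot be read.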

### Chrome extension (Side Panel)

@@ -242,9 +242,7 @@ The Chrome side panel includes a chat interface. Type a message and a child Clau

**Session isolation:** Each sidebar session runs in its own git worktree. The sidebar agent won't interfere with your main Claude Code session.
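Per-session isolation comes down to one `git worktree add` per sidebar session. A sketch that builds the git arguments, assuming a `sidebar/<id>` branch prefix and a caller-chosen worktree root (both hypothetical; the real layout is not shown here):

```typescript
import * as path from "node:path";

// One branch and one checkout per sidebar session, so its edits never
// touch the main working tree.
function sidebarWorktree(worktreeRoot: string, sessionId: string) {
  const branch = `sidebar/${sessionId}`;
  const dir = path.join(worktreeRoot, `sidebar-${sessionId}`);
  return {
    branch,
    dir,
    // git worktree add -b <new-branch> <path>: branches from the current HEAD
    addArgs: ["worktree", "add", "-b", branch, dir],
    // git worktree remove <path>: drops the checkout when the session ends
    removeArgs: ["worktree", "remove", dir],
  };
}

const wt = sidebarWorktree("/tmp/vstack-worktrees", "a1b2c3");
console.log(wt.addArgs.join(" "));
```

Passing `addArgs` to something like Node's `execFileSync("git", ...)` from the main checkout starts a session, and `removeArgs` cleans it up afterwards.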

**Authentication:** The sidebar agent uses the same browser session as headed mode. Two options:
1. Log in manually in the headed browser ... your session persists for the sidebar agent
2. Import cookies from your real Chrome via `/setup-browser-cookies`
**Authentication:** The sidebar agent uses the same browser session as headed mode. Log in manually in the headed browser; the session persists for the sidebar agent and across `$B` invocations.

**Random delays:** If you need the agent to pause between actions (e.g., to avoid rate limits), use `sleep` in bash or `$B wait <milliseconds>`.
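For jittered pauses rather than a fixed wait, pick a random duration in a range and feed it to `$B wait` or a timer. A minimal sketch; `jitterMs` and `randomPause` are hypothetical helpers, and `rng` is injectable so the delay can be pinned in tests:

```typescript
// Random integer in [minMs, maxMs]; rng defaults to Math.random, which returns [0, 1).
function jitterMs(minMs: number, maxMs: number, rng: () => number = Math.random): number {
  if (minMs < 0 || maxMs < minMs) {
    throw new RangeError("expected 0 <= minMs <= maxMs");
  }
  return minMs + Math.floor(rng() * (maxMs - minMs + 1));
}

// Sleep for a jittered duration and resolve with the delay actually used.
function randomPause(minMs: number, maxMs: number): Promise<number> {
  const ms = jitterMs(minMs, maxMs);
  return new Promise((resolve) => setTimeout(() => resolve(ms), ms));
}

randomPause(1, 5).then((ms) => console.log(`paused ${ms}ms`));
```

Interpolating `jitterMs(500, 2000)` into a `$B wait <milliseconds>` call gives a pause between 0.5 and 2 seconds.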

45 changes: 45 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,50 @@
# Changelog

## [0.13.0.0] - 2026-05-08 — vstack v2: distillation

vstack v2 is a major redesign. The toolkit shrinks from 28 skills to a single
tier of 12, drops every piece of remote infrastructure (telemetry sync, update
checker, Supabase functions), and scrubs every line of recruitment / YC /
marketing prose from the surface. The skills that remain are the ones that
actually pull weight on a personal project. The browser runtime is unchanged.

### Surface (12 skills)

- `/browse` — persistent browser for QA, screenshots, evidence capture, dogfooding.
- `/office-hours` — shape an idea before coding (now without the YC plea).
- `/sketch` — **new**. Translate a feature description into McConnell PPP-level pseudocode before any real code. Saves to `~/.vstack/projects/<slug>/sketches/`.
- `/investigate` — root-cause debugging.
- `/review` — pre-landing diff review.
- `/qa` — browser-driven test-and-fix loop.
- `/design-audit` — **new**. Senior product designer audit of a live UI: drives `/browse` to capture configured flows × viewports, names visual tropes (gradient hero, 3-col feature grids, glassmorphism, uniform radius), interaction clarity, spacing, typography, visual a11y. Optional second pass applies fixes with atomic commits and before/after screenshots.
- `/quiz` — **new**. Five questions designed to surface gaps in your mental model of the current codebase. Stateless, picks fresh concepts every run.
- `/simplify` — **new**. Sweeping audit for yuck and dead code. Names redundant functions, bad naming, unused imports, unreachable branches, speculative generality. Proposes a plan, applies removals one bisectable commit at a time, re-runs tests after each. Removes code only with proof.
- `/ship` — rewritten as direct push to main. No PR, no coverage gate, no review ceremony. Tests pass → `git add` → `git commit` (generated message you can edit) → push. From a feature branch, fast-forwards into main and deletes the branch.
- `/connect-chrome` — visible Chrome with the side panel.
- `/retro` — weekly engineering retrospective from git history.
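The `/ship` bullet describes a fixed command order. A sketch of that plan as data, assuming `bun test` as the test command, `git add -A` for staging, and an `origin` remote (all assumptions; the changelog does not specify them). Running the steps in order and stopping at the first failure reproduces the flow:

```typescript
type Cmd = string[];

// Ordered commands: tests, add, commit, push. From a feature branch the plan
// fast-forwards main, pushes, and deletes the branch.
function planShip(opts: { branch: string; message: string; testCmd: Cmd; main?: string }): Cmd[] {
  const main = opts.main ?? "main";
  const steps: Cmd[] = [
    opts.testCmd, // a failing test run stops the plan here
    ["git", "add", "-A"],
    ["git", "commit", "-m", opts.message],
  ];
  if (opts.branch === main) {
    steps.push(["git", "push", "origin", main]);
  } else {
    steps.push(
      ["git", "checkout", main],
      ["git", "merge", "--ff-only", opts.branch], // refuses if main has diverged
      ["git", "push", "origin", main],
      ["git", "branch", "-d", opts.branch],
    );
  }
  return steps;
}

const fromFeature = planShip({ branch: "feat/login", message: "Add login form", testCmd: ["bun", "test"] });
for (const step of fromFeature) console.log(step.join(" "));
```

`--ff-only` matches the "fast-forwards into main" wording: if `main` has moved since the branch was cut, the merge refuses instead of creating a merge commit.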

### Removed

- 20 skills hard-deleted: `/cso`, `/land-and-deploy`, `/canary`, `/benchmark`, `/codex`, `/careful`, `/freeze`, `/guard`, `/unfreeze`, `/setup-browser-cookies`, `/setup-deploy`, `/vstack-upgrade`, `/design-consultation`, `/design-review`, `/plan-design-review`, `/autoplan`, `/qa-only`, `/plan-ceo-review`, `/plan-eng-review`, `/document-release`. References that survived will 404 — that's the point.
- All remote telemetry plumbing: `bin/vstack-update-check`, `bin/vstack-telemetry-sync`, `bin/vstack-telemetry-log`, `bin/vstack-analytics`, `bin/vstack-community-dashboard`, the entire `supabase/` directory (telemetry-ingest function, update-check function, community-pulse function, two RLS migrations).
- Auto-update checking entirely. v2 updates via `git pull` on your terms.
- The first-run telemetry consent prompt and the `telemetry: <tier>` config key.
- All YC / recruitment / marketing prose: "We're hiring" block, `ycombinator.com/apply` links, the `Garry's Personal Plea` block in `/office-hours` (top/middle/base-tier CTAs), the Founder Signal Synthesis phase that fed into it, the "Garry Tan / YC partner energy" framing in the skill preamble Voice section, the `garryslist.org` link in the Lake intro.

### Changed

- The skill surface collapses from three tiers (core / transition / legacy) to a single tier of peers in `config/skill-surface.sh`. The `--legacy` install flag is a no-op now; nothing lives outside the surface.
- The skill preamble runs the local invocation log inline (`echo … >> ~/.vstack/analytics/skill-usage.jsonl`). No binary needed. `/retro` reads that file unchanged.
- `/office-hours` Phase 6 collapses from a three-beat closing sequence (signal reflection + golden age + Garry's plea) to a one-paragraph handoff and three next-skill suggestions: `/sketch`, `/investigate`, `/review`.
- `/ship` template drops from 648 lines to 252. Allowed tools shrink from 9 (Bash, Read, Write, Edit, Grep, Glob, Agent, AskUserQuestion, WebSearch) to 4 (Bash, Read, Edit, AskUserQuestion).
- README rewrites to a one-paragraph "what this is" and an install command pointing at `https://github.com/vedthebear/vstack`.
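The inline log is append-only JSONL, one object per invocation, which `/retro` can tally later. A TypeScript sketch of both sides, assuming hypothetical `skill` and `ts` fields (the real line format is not shown in this diff):

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical record shape; the real preamble's fields may differ.
interface SkillUsage {
  skill: string;
  ts: string;
}

// Append one JSON object per line, like `echo '{...}' >> skill-usage.jsonl`.
function logInvocation(logFile: string, skill: string, at: Date = new Date()): void {
  fs.mkdirSync(path.dirname(logFile), { recursive: true });
  const entry: SkillUsage = { skill, ts: at.toISOString() };
  fs.appendFileSync(logFile, JSON.stringify(entry) + "\n");
}

// Tally invocations per skill, skipping blank or malformed lines.
function countBySkill(logFile: string): Map<string, number> {
  const counts = new Map<string, number>();
  if (!fs.existsSync(logFile)) return counts;
  for (const line of fs.readFileSync(logFile, "utf8").split("\n")) {
    if (line.trim() === "") continue;
    try {
      const { skill } = JSON.parse(line) as Partial<SkillUsage>;
      if (typeof skill === "string") counts.set(skill, (counts.get(skill) ?? 0) + 1);
    } catch {
      // A truncated or hand-edited line; skip it rather than fail the retro.
    }
  }
  return counts;
}

// Usage against a throwaway directory instead of ~/.vstack.
const logFile = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "vstack-")), "analytics", "skill-usage.jsonl");
logInvocation(logFile, "review");
logInvocation(logFile, "ship");
logInvocation(logFile, "review");
const counts = countBySkill(logFile);
console.log(counts);
```

Because each line is a complete JSON document, a reader can skip a damaged line without losing the rest of the history.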

### For contributors

- `test:core` is the default development loop (free, fast, 418 tests). The legacy `test:legacy` script is gone — every E2E test that depended on a deleted skill was removed.
- `scripts/resolvers/preamble.ts` no longer composes `generateUpgradeCheck` or `generateTelemetryPrompt`; the section list shrinks from 11 sections to 8.
- VERSION bumps to `0.13.0.0`. Tags `v2-subtract` and `v2-add` mark the end of Phase 1 and Phase 2.

## [0.12.12.0] - 2026-03-27 — Security Audit Compliance

Fixes 20 Socket alerts and 3 Snyk findings from the skills.sh security audit. Your skills are now cleaner, your telemetry is transparent, and 2,000 lines of dead code are gone.
57 changes: 18 additions & 39 deletions CLAUDE.md
@@ -5,8 +5,7 @@
```bash
bun install # install dependencies
bun test # broad free test sweep
bun run test:core # fast v2 core test sweep
bun run test:legacy # optional legacy/eval-heavy surface
bun run test:core # fast v2 test sweep
bun run test:evals # run paid evals: LLM judge + E2E (diff-based, ~$4/run max)
bun run test:evals:all # run ALL paid evals regardless of diff
bun run test:gate # run gate-tier tests only (CI default, blocks merge)
@@ -52,9 +51,9 @@ bun run test:evals # run before shipping when the change touches eval-sensitiv
```

`test:core` is the default v2 confidence loop: browser-safe unit tests, registry and
generation checks, install-surface checks, and worktree helpers. `test:legacy` and the
paid eval tiers exist for the broader historical surface, but they are no longer the
default development loop for v2 work.
generation checks, install-surface checks, and worktree helpers. The paid eval tiers
exist for E2E coverage of the workflow skills, but they are not the default
development loop.
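The `test:evals` comment describes diff-based selection with a ceiling of about $4 per run, which matches the per-tier figures in the project structure listing ($3.85 for Tier 2 E2E plus $0.15 for the Tier 3 LLM judge). A sketch of that selection, assuming each eval declares the path prefixes it depends on (a hypothetical mechanism; the real selector is not shown):

```typescript
interface EvalTest {
  name: string;
  tier: 2 | 3; // Tier 2: E2E via claude -p, Tier 3: LLM-as-judge
  costUsd: number; // approximate per-run cost
  touches: string[]; // path prefixes this eval depends on (hypothetical)
}

// Keep only evals whose dependencies intersect the changed files.
function selectEvals(tests: EvalTest[], changedFiles: string[]): EvalTest[] {
  return tests.filter((t) => changedFiles.some((f) => t.touches.some((p) => f.startsWith(p))));
}

function totalCost(tests: EvalTest[]): number {
  return tests.reduce((sum, t) => sum + t.costUsd, 0);
}

const suite: EvalTest[] = [
  { name: "skill-e2e", tier: 2, costUsd: 3.85, touches: ["ship/", "qa/", "review/"] },
  { name: "skill-llm-eval", tier: 3, costUsd: 0.15, touches: ["review/", "office-hours/"] },
];

const picked = selectEvals(suite, ["ship/SKILL.md.tmpl"]);
console.log(picked.map((t) => t.name), totalCost(picked).toFixed(2));
```

`test:evals:all` corresponds to skipping `selectEvals` and running the whole suite, which is the worst case of about $4.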

## Project structure

@@ -78,33 +77,14 @@ vstack/
│ ├── gen-skill-docs.test.ts # Tier 1: generator quality (free, <1s)
│ ├── skill-llm-eval.test.ts # Tier 3: LLM-as-judge (~$0.15/run)
│ └── skill-e2e-*.test.ts # Tier 2: E2E via claude -p (~$3.85/run, split by category)
├── office-hours/ # Core planning/idea-shaping skill
├── investigate/ # Core build/debug skill
├── review/ # Core review skill
├── qa/ # Core QA skill
├── ship/ # Core shipping skill
├── guard/ # Core safety mode
├── connect-chrome/ # Core visible-Chrome companion
├── codex/ # Transition skill
├── plan-ceo-review/ # Transition skill
├── plan-eng-review/ # Transition skill
├── qa-only/ # Transition skill
├── careful/ # Transition skill
├── freeze/ # Transition skill
├── unfreeze/ # Transition skill
├── autoplan/ # Legacy skill
├── benchmark/ # Legacy skill
├── canary/ # Legacy skill
├── cso/ # Legacy skill
├── design-consultation/ # Legacy skill
├── design-review/ # Legacy skill
├── bin/ # CLI utilities (vstack-repo-mode, vstack-slug, vstack-config, etc.)
├── document-release/ # Legacy skill
├── land-and-deploy/ # Legacy skill
├── plan-design-review/ # Legacy skill
├── retro/ # Legacy skill
├── setup-browser-cookies/ # Legacy skill
├── setup-deploy/ # Legacy skill
├── office-hours/ # Idea-shaping skill
├── investigate/ # Build/debug skill
├── review/ # Pre-landing review skill
├── qa/ # Browser-driven QA skill
├── ship/ # Ship skill (direct push to main)
├── connect-chrome/ # Visible-Chrome companion
├── retro/ # Weekly retrospective skill
├── bin/ # CLI utilities (vstack-config, vstack-slug, etc.)
├── .github/ # CI workflows + Docker image
│ ├── workflows/ # evals.yml (E2E on Ubicloud), skill-docs.yml, actionlint.yml
│ └── docker/ # Dockerfile.ci (pre-baked toolchain + Playwright/Chromium)
@@ -115,14 +95,13 @@ vstack/
└── package.json # Build scripts for browse
```

## vstackv2 workflow
## vstack v2 workflow

v2 keeps generation only where drift is genuinely dangerous.
v2 is a single-tier surface. Every skill in `config/skill-surface.sh` is a peer.

- Browser command syntax still comes from code.
- Host-specific skill transforms still come from `gen-skill-docs.ts`.
- The default public install surface comes from `config/skill-surface.sh`.
- Legacy skills may remain in-repo without being part of the default install.
- The install surface comes from `config/skill-surface.sh`.

## SKILL.md workflow

@@ -155,9 +134,9 @@ project-specific behavior. The project owns its config; vstack reads it.

## v2 maintenance rule

When making changes, prefer the lean public surface unless there is a strong reason
to invest in legacy skills. The repo still contains a broader historical toolkit, but
the default product is the small personal operating kit described in `docs/VSTACKV2.md`.
The default product is the small personal operating kit listed in
`config/skill-surface.sh`. There is no legacy tier — anything that isn't a peer
in the surface either gets folded in or gets deleted.

## Writing SKILL templates

1 change: 0 additions & 1 deletion CONTRIBUTING.md
@@ -260,7 +260,6 @@ bun run build
| Frontmatter | Full (name, description, allowed-tools, hooks, version) | Minimal (name + description only) |
| Paths | `~/.claude/skills/vstack` | `$VSTACK_ROOT` (`.agents/skills/vstack` in a repo, otherwise `~/.codex/skills/vstack`) |
| Hook skills | `hooks:` frontmatter (enforced by Claude) | Inline safety advisory prose (advisory only) |
| `/codex` skill | Included (Claude wraps codex exec) | Excluded (self-referential) |
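The Codex column boils down to two transforms: trim frontmatter to `name` and `description`, and pick `$VSTACK_ROOT` based on whether a repo is present. A sketch of both, with hypothetical function names; the real host transforms live in `gen-skill-docs.ts` and may differ:

```typescript
import * as path from "node:path";

type Frontmatter = Record<string, unknown>;

// Claude output keeps the full frontmatter; Codex output keeps only name + description.
function toCodexFrontmatter(fm: Frontmatter): Frontmatter {
  return { name: fm.name, description: fm.description };
}

// $VSTACK_ROOT for Codex: `.agents/skills/vstack` inside a repo, otherwise `~/.codex/skills/vstack`.
function codexVstackRoot(repoRoot: string | null, home: string): string {
  return repoRoot !== null
    ? path.join(repoRoot, ".agents", "skills", "vstack")
    : path.join(home, ".codex", "skills", "vstack");
}

// Sample frontmatter; the extra keys are illustrative.
const full: Frontmatter = {
  name: "qa",
  description: "Browser-driven QA loop",
  "allowed-tools": ["Bash"],
  version: "1.0.0",
};
const codex = toCodexFrontmatter(full);
console.log(codex, codexVstackRoot(null, "/home/me"));
```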

### Testing Codex output
