Merged
changelog.mdx (28 changes: 24 additions, 4 deletions)
@@ -7,14 +7,34 @@ description: "Stay up to date with UseZombie product updates, new features, and
UseZombie is in **Early Access Preview**. Features below are live in the current release. APIs and agent behavior may evolve before GA.
</Tip>

<Update label="v0.4.0 — April 6, 2026" tags={["New releases", "Improvements"]}>
## Steer running agents mid-run

Interrupt a running agent without aborting it. Send a message with `zombiectl runs interrupt <run_id> <message>` or `POST /v1/runs/{id}:interrupt`. Two delivery modes are available. In **queued** mode, the agent picks up the message at its next gate checkpoint. In **instant** mode, the message is delivered over IPC.
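A hypothetical client sketch for the interrupt endpoint follows. The path and the two mode names come from this entry. The JSON body shape (`{ message, mode }`), the Bearer header, and the helper names `buildInterruptRequest` and `interruptRun` are all assumptions, not the documented request schema.

```typescript
// Hypothetical client for POST /v1/runs/{id}:interrupt.
// ASSUMPTIONS: body is { message, mode }; auth is a Bearer token.
type InterruptMode = "queued" | "instant";

interface RequestInitLike {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Any fetch-compatible function, e.g. the global `fetch` in Node 18+.
type FetchLike = (url: string, init: RequestInitLike) => Promise<{ status: number }>;

function buildInterruptRequest(
  baseUrl: string,
  runId: string,
  message: string,
  mode: InterruptMode = "queued",
  token?: string,
): { url: string; init: RequestInitLike } {
  // Encode the run ID but keep the `:interrupt` custom-method suffix literal.
  const base = baseUrl.replace(/\/+$/, "");
  const url = `${base}/v1/runs/${encodeURIComponent(runId)}:interrupt`;
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (token) headers["Authorization"] = `Bearer ${token}`;
  return { url, init: { method: "POST", headers, body: JSON.stringify({ message, mode }) } };
}

async function interruptRun(
  fetchFn: FetchLike,
  baseUrl: string,
  runId: string,
  message: string,
  mode: InterruptMode = "queued",
  token?: string,
): Promise<number> {
  const { url, init } = buildInterruptRequest(baseUrl, runId, message, mode, token);
  const res = await fetchFn(url, init);
  return res.status; // the caller decides how to treat non-2xx responses
}
```

For example, `await interruptRun(fetch, "https://api.example.com", "run_123", "Skip the flaky e2e gate", "instant", token)` would send the request. The base URL is a placeholder.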

## Live run streaming (CLI)

`zombiectl run --spec <file> --watch` now streams gate results in real time. Reconnecting with `Last-Event-ID` replays only the events you missed, so there are no duplicate floods. Ctrl+C exits cleanly.
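Reconnect-without-duplicates relies on the standard SSE `Last-Event-ID` mechanism. Below is a generic sketch of a client-side parser that tracks the value to resend; it is not `zombiectl`'s implementation. It follows the event-stream field rules (`id:`, `event:`, `data:`, `:` comments, blank line dispatches) and handles `\n` and `\r\n` line endings. The class name `SseParser` is hypothetical.

```typescript
// Generic SSE frame parser (hypothetical helper, not zombiectl's code).
interface SseEvent {
  id?: string;
  event: string;
  data: string;
}

class SseParser {
  /** Value to send as the `Last-Event-ID` header when reconnecting. */
  lastEventId: string | undefined;
  private buffer = "";
  private idBuffer: string | undefined; // persists across events, per the spec
  private eventType = "message";
  private dataLines: string[] = [];

  /** Feed a decoded text chunk; returns the events it completed. */
  push(chunk: string): SseEvent[] {
    this.buffer += chunk;
    const events: SseEvent[] = [];
    let nl: number;
    while ((nl = this.buffer.indexOf("\n")) >= 0) {
      let line = this.buffer.slice(0, nl);
      this.buffer = this.buffer.slice(nl + 1);
      if (line.endsWith("\r")) line = line.slice(0, -1);

      if (line === "") {
        // Dispatch: record the id even if the event carries no data.
        this.lastEventId = this.idBuffer;
        if (this.dataLines.length > 0) {
          events.push({ id: this.idBuffer, event: this.eventType, data: this.dataLines.join("\n") });
        }
        this.eventType = "message";
        this.dataLines = [];
        continue;
      }
      if (line.startsWith(":")) continue; // comment or keep-alive

      const colon = line.indexOf(":");
      const field = colon >= 0 ? line.slice(0, colon) : line;
      let value = colon >= 0 ? line.slice(colon + 1) : "";
      if (value.startsWith(" ")) value = value.slice(1);

      if (field === "id") this.idBuffer = value;
      else if (field === "event") this.eventType = value;
      else if (field === "data") this.dataLines.push(value);
      // Other fields (e.g. `retry`) are ignored in this sketch.
    }
    return events;
  }
}
```

Keep one parser per logical stream across reconnects. Send `parser.lastEventId` as the `Last-Event-ID` request header so the server can replay only what was missed.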

## Run replay (CLI)

`zombiectl runs replay <run_id>` prints a step-by-step, per-gate narrative of a completed run: exit codes, stdout/stderr, and wall time for each step.

## Workspace billing breakdown

`zombiectl workspace billing --workspace-id <id>` breaks down completed, non-billable, and score-gated runs. The optional `--period` and `--json` flags scope and format the output. The command is backed by `GET /v1/workspaces/{id}/billing/summary`.

## Agent run observability

Every run now produces a full trace tree in Grafana Tempo. Query `{run.id="<id>"}` for a waterfall of agent calls and gate checks. Per-workspace Prometheus metrics cover token consumption, run outcomes, and the distribution of gate repair loops.

## Resource efficiency scoring

Agent runs are now scored on actual memory and CPU usage. Agents that stay within their resource limits score higher. The score formula is updated to v2, which feeds real resource data into the resource axis.
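This entry does not publish the v2 resource-axis formula. The sketch below is a hypothetical axis in [0, 1] that only illustrates the stated behavior: staying under limits scores higher, and maxing out scores lower. The field names, the use of peak values, and the formula itself are assumptions.

```typescript
// HYPOTHETICAL resource axis, NOT the UseZombie v2 formula.
interface ResourceUsage {
  peakMemoryBytes: number;
  memoryLimitBytes: number;
  peakCpuCores: number;
  cpuLimitCores: number;
}

function resourceAxisScore(u: ResourceUsage): number {
  if (u.memoryLimitBytes <= 0 || u.cpuLimitCores <= 0) {
    throw new Error("limits must be positive");
  }
  // Utilization ratios: 1.0 means the limit was reached.
  const mem = u.peakMemoryBytes / u.memoryLimitBytes;
  const cpu = u.peakCpuCores / u.cpuLimitCores;
  // The tighter constraint dominates; clamp so overruns floor at 0.
  const worst = Math.min(Math.max(mem, cpu), 1);
  return 1 - worst;
}
```

Taking the worse of the two ratios means a run cannot offset a memory overrun with low CPU use. That is one plausible design choice among several.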

## Breaking change

The SSE `id:` field on live events changed from a sequential counter to the event's `created_at` timestamp in Unix milliseconds. Clients that parse `Last-Event-ID` as a sequence number must update.
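A minimal migration sketch follows. It treats the event ID as a millisecond timestamp, compares IDs numerically, and drops any `lastId + 1` gap checks, since gaps between timestamps are expected. The function names are hypothetical.

```typescript
// Hypothetical helpers for the new `id:` format (created_at in Unix ms).
function parseEventId(id: string): Date {
  // Digits only: reject empty strings, signs, decimals, and whitespace.
  if (!/^\d+$/.test(id)) throw new Error(`unexpected SSE id: ${JSON.stringify(id)}`);
  const ms = Number(id);
  if (!Number.isSafeInteger(ms)) throw new Error(`SSE id out of range: ${id}`);
  return new Date(ms);
}

// Order two ids numerically; plain string comparison breaks when lengths differ.
function compareEventIds(a: string, b: string): number {
  return parseEventId(a).getTime() - parseEventId(b).getTime();
}
```

A client that previously asserted `next === last + 1` to detect lost events needs a different mechanism. Under the new format, that check fails on almost every event.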
</Update>

<Update label="March 30, 2026" tags={["New releases"]}>
index.mdx (8 changes: 6 additions, 2 deletions)
@@ -4,9 +4,13 @@ description: Submit a spec. Get a validated PR.
---

<Tip>
🧟 **Early Access Preview** · Pre-release. A revised release is expected by April 11. APIs, CLI, and behavior may change without notice before general availability.

UseZombie is in a product pivot. The focus is practical operator leverage, not narrow optimization of a single bottleneck that frontier models may soon eliminate.

Submit a spec. An agent implements it, self-repairs until quality gates pass, and opens a PR with a scorecard. You review one PR instead of babysitting ten agent sessions.

[Join the waitlist →](https://usezombie.com)
</Tip>

## Getting started
operator/observability/posthog-events.mdx (186 changes: 186 additions, 0 deletions)
@@ -0,0 +1,186 @@
# PostHog Product Analytics — Event Catalogue

SDKs:
- `posthog-js` in `ui/packages/website`
- `posthog-js` in `ui/packages/app`
- `posthog-node` in `zombiectl`
- [posthog-zig](https://github.com/usezombie/posthog-zig) in `zombied`

> PostHog is one of two observability tools, alongside Grafana. Langfuse was removed in M12_001; see `docs/done/v1/M12_001_OBSERVABILITY_CONSOLIDATION.md`.

## Provisioning Contract

One PostHog project key is stored per environment:

| Vault | Item | Field |
|---|---|---|
| `ZMB_CD_DEV` | `posthog-dev` | `credential` |
| `ZMB_CD_PROD` | `posthog-prod` | `credential` |

These values are propagated through the standard playbook/check flow:
- bootstrap: `playbooks/001_bootstrap/001_playbook.md`
- preflight: `playbooks/002_preflight/001_playbook.md`
- credential gate: `playbooks/002_preflight/001_gate.sh`
- M2 gate section: `playbooks/gates/m2_001/section-2-procurement-readiness.sh`

## Surface Configuration

| Surface | Env var | SDK | Default host |
|---|---|---|---|
| Website | `VITE_POSTHOG_KEY` | `posthog-js` | `https://us.i.posthog.com` |
| App | `NEXT_PUBLIC_POSTHOG_KEY` | `posthog-js` | `https://us.i.posthog.com` |
| `zombied` API + worker | `POSTHOG_API_KEY` | `posthog-zig` | `https://us.i.posthog.com` |
| `zombiectl` CLI | `ZOMBIE_POSTHOG_KEY` | `posthog-node` | `https://us.i.posthog.com` |

Optional host overrides remain available in code, but vault provisioning only requires the key.
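A sketch of resolving the CLI surface's settings from the environment is shown below. `ZOMBIE_POSTHOG_KEY` and the default host come from the table above. This document does not name the host-override variable, so `ZOMBIE_POSTHOG_HOST` is hypothetical, as is disabling analytics when no key is set.

```typescript
// Hypothetical config resolver for the zombiectl PostHog client.
const DEFAULT_POSTHOG_HOST = "https://us.i.posthog.com";

interface PosthogConfig {
  apiKey: string;
  host: string;
}

function resolveCliPosthogConfig(env: Record<string, string | undefined>): PosthogConfig | null {
  const apiKey = env.ZOMBIE_POSTHOG_KEY?.trim();
  // No key provisioned: return null so analytics stays off instead of failing the CLI.
  if (!apiKey) return null;
  // ASSUMPTION: ZOMBIE_POSTHOG_HOST is an illustrative override variable name.
  const host = env.ZOMBIE_POSTHOG_HOST?.trim() || DEFAULT_POSTHOG_HOST;
  return { apiKey, host };
}
```

With `posthog-node`, the result would feed the client constructor, e.g. `new PostHog(cfg.apiKey, { host: cfg.host })`.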

## Website Events

Emitter: `ui/packages/website/src/analytics/posthog.ts`

| Event | Typical properties | Description |
|---|---|---|
| `signup_started` | `source`, `surface`, `mode`, `path` | Website signup funnel entered |
| `signup_completed` | `source`, `surface`, `mode`, `path` | Website signup flow completed |
| `navigation_clicked` | `source`, `surface`, `target`, `path` | Primary website navigation interaction |
| `lead_capture_clicked` | `source`, `surface`, `cta_id`, `path` | Lead form CTA clicked |
| `lead_capture_opened` | `source`, `surface`, `component`, `path` | Lead form/modal opened |
| `lead_capture_submitted` | `source`, `surface`, `status`, `utm_*`, `path` | Lead form submitted |
| `lead_capture_failed` | `source`, `surface`, `status`, `path` | Lead form submit failed |
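A typed wrapper is one way to keep call sites in sync with the website table. In the sketch below, the event names mirror the table, and `CaptureFn` stands in for `posthog-js`'s `posthog.capture(event, properties)`. The helper name and the `"website"` value for `surface` are assumptions, not the contents of `posthog.ts`.

```typescript
// Hypothetical typed helper for website events.
type WebsiteEvent =
  | "signup_started"
  | "signup_completed"
  | "navigation_clicked"
  | "lead_capture_clicked"
  | "lead_capture_opened"
  | "lead_capture_submitted"
  | "lead_capture_failed";

type EventProps = Record<string, string | number | boolean>;
type CaptureFn = (event: string, properties: EventProps) => void;

function trackWebsiteEvent(
  capture: CaptureFn,
  event: WebsiteEvent,
  props: EventProps,
  path: string,
): void {
  // Shared fields are spread last so callers cannot accidentally override them.
  capture(event, { ...props, surface: "website", path });
}
```

For example, `trackWebsiteEvent((e, p) => posthog.capture(e, p), "signup_started", { source: "hero" }, location.pathname)`. The `"hero"` source value is illustrative.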

## App Events

Emitters:
- `ui/packages/app/instrumentation-client.ts`
- `ui/packages/app/components/analytics/AnalyticsBootstrap.tsx`
- dashboard pages and cards in `ui/packages/app/app/(dashboard)` and `ui/packages/app/components/domain`

App event properties are restricted to an allowlist defined in `ui/packages/app/lib/analytics/posthog.ts`.

| Event | Typical properties | Description |
|---|---|---|
| `page_navigation_started` | `source`, `surface`, `path` | Next.js router transition started |
| `ui_runtime_error` | `source`, `surface`, `path`, `error_message` | Browser runtime error or unhandled rejection |
| `navigation_clicked` | `source`, `surface`, `target`, `path` | Dashboard shell navigation clicked |
| `workspace_list_viewed` | `source`, `surface`, `workspace_count`, `path` | Workspace index viewed |
| `workspace_card_clicked` | `source`, `surface`, `workspace_id`, `workspace_plan`, `paused`, `path` | Workspace selected from card |
| `workspace_detail_viewed` | `source`, `surface`, `workspace_id`, `active_run_id`, `active_run_status`, `path` | Workspace detail page viewed |
| `run_row_clicked` | `source`, `surface`, `workspace_id`, `run_id`, `run_status`, `run_attempts`, `path` | Run selected from table/list |
| `run_detail_viewed` | `source`, `surface`, `workspace_id`, `run_id`, `run_status`, `has_error`, `has_pr_url`, `path` | Run detail page viewed |

Identity:
- user identification happens in `AnalyticsBootstrap.tsx`
- distinct fields: `user_id`, `email`
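The app's property allowlist can be pictured as a filter applied before capture. The sketch below assembles its set from the property names in the App Events table. It is a hypothetical copy, not the list in `posthog.ts`. Dropping non-primitive values is also an assumption.

```typescript
// Hypothetical allowlist built from the App Events table above.
const ALLOWED_APP_PROPERTIES = new Set([
  "source", "surface", "path", "error_message", "target",
  "workspace_count", "workspace_id", "workspace_plan", "paused",
  "active_run_id", "active_run_status",
  "run_id", "run_status", "run_attempts", "has_error", "has_pr_url",
]);

type PropValue = string | number | boolean | null;

function filterAppProperties(props: Record<string, unknown>): Record<string, PropValue> {
  const out: Record<string, PropValue> = {};
  for (const [key, value] of Object.entries(props)) {
    if (!ALLOWED_APP_PROPERTIES.has(key)) continue; // drop keys not on the allowlist
    const t = typeof value;
    // Keep primitives only; nested objects could smuggle arbitrary data.
    if (value === null || t === "string" || t === "number" || t === "boolean") {
      out[key] = value as PropValue;
    }
  }
  return out;
}
```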

## CLI Events

Emitters:
- lifecycle: `zombiectl/src/cli.js`
- domain commands: `zombiectl/src/commands/*.js`

CLI properties are sanitized to strings before capture. Shared command context may include `user_id`, `email`, `workspace_id`, `run_id`, `agent_id`, `proposal_id`, `score`, `error_code`, `reason`.

| Event | Typical properties | Description |
|---|---|---|
| `cli_command_started` | `command`, `args`, context ids | Command invocation started |
| `cli_command_finished` | `command`, `exit_code`, context ids | Command invocation completed |
| `cli_error` | `command`, `error_code`, `reason` | Top-level CLI failure |
| `user_authenticated` | `user_id`, `email` | Login or auth success |
| `login_completed` | `user_id`, `email` | Device/browser login completed |
| `logout_completed` | `user_id` | Local logout completed |
| `workspace_add_completed` | `workspace_id` | Workspace create/add completed |
| `workspace_list_viewed` | `workspace_id` or counts | Workspace list displayed |
| `workspace_removed` | `workspace_id` | Workspace removed |
| `specs_synced` | `workspace_id` | Specs sync completed |
| `run_queued` | `workspace_id`, `run_id` | Run trigger completed |
| `run_status_viewed` | `workspace_id`, `run_id`, `run_status` | Run status displayed |
| `runs_list_viewed` | `workspace_id` | Runs list displayed |
| `harness_compiled` | `workspace_id`, `agent_id` | Harness compile completed |
| `harness_active_viewed` | `workspace_id`, `agent_id` | Active harness shown |
| `harness_activated` | `workspace_id`, `agent_id` | Harness profile activated |
| `harness_source_uploaded` | `workspace_id`, `agent_id` | Harness source uploaded |
| `agent_scores_viewed` | `workspace_id`, `agent_id`, `score` | Agent score view displayed |
| `agent_profile_viewed` | `workspace_id`, `agent_id` | Agent profile displayed |
| `agent_improvement_report_viewed` | `workspace_id`, `agent_id` | Improvement report displayed |
| `agent_proposals_viewed` | `workspace_id`, `agent_id` | Proposal list displayed |
| `agent_proposal_approved` | `workspace_id`, `agent_id`, `proposal_id` | Proposal approved |
| `agent_proposal_rejected` | `workspace_id`, `agent_id`, `proposal_id`, `reason` | Proposal rejected |
| `agent_proposal_vetoed` | `workspace_id`, `agent_id`, `proposal_id`, `reason` | Proposal vetoed |
| `agent_harness_reverted` | `workspace_id`, `agent_id`, `proposal_id` | Harness revert executed |
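"Sanitized to strings before capture" could look like the sketch below. Every value becomes a string, and arrays such as `args` are JSON-encoded. The helper name, skipping `null`/`undefined`, and the 200-character cap are assumptions, not `zombiectl`'s actual rules.

```typescript
// Hypothetical CLI property sanitizer.
const MAX_VALUE_LENGTH = 200; // assumed cap

function sanitizeCliProperties(props: Record<string, unknown>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(props)) {
    if (value === null || value === undefined) continue; // omit absent context ids
    let s: string;
    if (typeof value === "string") {
      s = value;
    } else if (typeof value === "object") {
      // Arrays/objects (e.g. `args`) are JSON-encoded; fall back on cyclic values.
      try {
        s = JSON.stringify(value);
      } catch {
        s = String(value);
      }
    } else {
      s = String(value); // numbers, booleans, bigint, symbols, functions
    }
    out[key] = s.length > MAX_VALUE_LENGTH ? s.slice(0, MAX_VALUE_LENGTH) : s;
  }
  return out;
}
```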

## Runtime Events

Event helpers live in `src/observability/posthog_events.zig`; the **Emitter** column in each table below names the call site.

### Startup Lifecycle

| Event | Emitter | Properties | Description |
|---|---|---|---|
| `server_started` | `cmd/serve.zig` | `port`, `worker_concurrency` | HTTP server ready to accept traffic |
| `worker_started` | `cmd/worker.zig` | `concurrency` | Worker threads spawned |
| `startup_failed` | `cmd/worker.zig` | `command`, `phase`, `reason` | Fatal startup failure (after PostHog init) |

### Auth Lifecycle

| Event | Emitter | Properties | Description |
|---|---|---|---|
| `auth_login_completed` | `auth_sessions_http.zig` | `session_id`, `request_id` | CLI auth session completed (OIDC device flow) |
| `auth_rejected` | `common.zig` | `reason`, `request_id` | Bearer token auth failed |

### Workspace Lifecycle

| Event | Emitter | Properties | Description |
|---|---|---|---|
| `workspace_created` | `workspaces_lifecycle.zig` | `workspace_id`, `tenant_id`, `repo_url`, `request_id` | New workspace provisioned via API |
| `workspace_github_connected` | `github_callback.zig` | `workspace_id`, `installation_id`, `request_id` | GitHub App OAuth callback completed |

### Run Lifecycle

| Event | Emitter | Properties | Description |
|---|---|---|---|
| `run_started` | `runs/start.zig` | `run_id`, `workspace_id`, `spec_id`, `mode`, `request_id` | Run enqueued |
| `run_retried` | `runs/retry.zig` | `run_id`, `workspace_id`, `attempt`, `request_id` | Run retry enqueued |
| `run_completed` | `pipeline/worker_stage_executor.zig` | `run_id`, `workspace_id`, `verdict`, `duration_ms` | Run finished |
| `run_failed` | `pipeline/worker_stage_executor.zig` | `run_id`, `workspace_id`, `reason`, `duration_ms` | Run failed |

### Agent & Scoring

| Event | Emitter | Properties | Description |
|---|---|---|---|
| `agent_completed` | `pipeline/worker_stage_executor.zig` | `run_id`, `workspace_id`, `actor`, `tokens`, `duration_ms`, `exit_status` | Agent stage execution finished |
| `agent.run.scored` | `pipeline/scoring.zig` | `run_id`, `workspace_id`, `agent_id`, `score`, `tier`, `score_formula_version`, `axis_scores`, `weight_snapshot`, `scored_at`, `axis_completion`, `axis_error_rate`, `axis_latency`, `axis_resource` | Quality score computed |
| `agent.scoring.failed` | `pipeline/scoring.zig` | `run_id`, `workspace_id`, `error` | Scoring computation failed |
| `agent.trust.earned` | `pipeline/scoring.zig` | `run_id`, `workspace_id`, `agent_id`, `consecutive_count_at_event` | Consecutive positive score streak |
| `agent.trust.lost` | `pipeline/scoring.zig` | `run_id`, `workspace_id`, `agent_id`, `consecutive_count_at_event` | Trust state dropped |
| `agent.harness.changed` | `pipeline/scoring.zig` | `agent_id`, `proposal_id`, `workspace_id`, `approval_mode`, `trigger_reason`, `fields_changed` | Harness config mutation applied |
| `agent.improvement.stalled` | `pipeline/scoring.zig` | `run_id`, `workspace_id`, `agent_id`, `proposal_id`, `consecutive_negative_deltas` | Score declining after proposal |

### Policy & Billing

| Event | Emitter | Properties | Description |
|---|---|---|---|
| `entitlement_rejected` | handler-level | `workspace_id`, `boundary`, `reason_code`, `request_id` | Plan limit hit |
| `profile_activated` | handler-level | `workspace_id`, `agent_id`, `config_version_id`, `run_snapshot_version`, `request_id` | Agent harness profile activated |
| `billing_lifecycle_event` | handler-level | `workspace_id`, `event_type`, `reason`, `plan_tier`, `billing_status`, `request_id` | Plan transitions and payment events |
| `api_error` | handler-level | `error_code`, `message`, `request_id`, `workspace_id` | UZ-* error code fired at HTTP boundary |

## Error Code Coverage

| Code Prefix | PostHog Coverage |
|---|---|
| `UZ-AUTH-*` | `auth_rejected` captures auth failures |
| `UZ-ENTL-*` | `entitlement_rejected` and `api_error` |
| `UZ-BILLING-*` | `billing_lifecycle_event` and `api_error` |
| `UZ-STARTUP-*` | `startup_failed` |
| `UZ-WORKSPACE-*` | `api_error` for billing/workspace enforcement failures |
| Browser/runtime UI failures | `ui_runtime_error` in Next.js app |
| CLI command failures | `cli_error` in `zombiectl` |
| `UZ-SANDBOX-*` | structured logs and metrics first; correlate with PostHog run lifecycle via `trace_id` and `run_id` |
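The coverage table can double as a lookup when triaging an error code. This sketch encodes the prefixes above; the helper name and the example code `UZ-ENTL-003` are hypothetical.

```typescript
// Hypothetical lookup built from the Error Code Coverage table.
const COVERAGE: ReadonlyArray<[prefix: string, events: string[]]> = [
  ["UZ-AUTH-", ["auth_rejected"]],
  ["UZ-ENTL-", ["entitlement_rejected", "api_error"]],
  ["UZ-BILLING-", ["billing_lifecycle_event", "api_error"]],
  ["UZ-STARTUP-", ["startup_failed"]],
  ["UZ-WORKSPACE-", ["api_error"]],
  // Sandbox errors rely on logs and metrics; correlate via trace_id and run_id.
  ["UZ-SANDBOX-", []],
];

function posthogEventsFor(code: string): string[] {
  const hit = COVERAGE.find(([prefix]) => code.startsWith(prefix));
  return hit ? [...hit[1]] : []; // copy so callers cannot mutate the table
}
```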

## Adding New Events

1. Add or extend the surface helper closest to the emitter (`posthog.ts`, CLI analytics helper, or `posthog_events.zig`).
2. Keep analytics fail-open and non-blocking.
3. Allowlist or sanitize properties before capture.
4. Add unit or integration coverage for the new event path.
5. Update this document and the relevant done-spec if the surface contract changed.
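Steps 2 and 3 can be combined in a small wrapper, sketched below. `Sink` is a hypothetical stand-in for the surface's SDK call, adapted to whichever SDK is in use. The wrapper never awaits, swallows synchronous throws, and attaches a no-op handler to returned promises.

```typescript
// Hypothetical fail-open, non-blocking capture wrapper.
type Sink = (event: string, properties: Record<string, string>) => void | Promise<void>;
type Sanitizer = (props: Record<string, unknown>) => Record<string, string>;

function captureSafely(
  sink: Sink,
  sanitize: Sanitizer,
  event: string,
  props: Record<string, unknown>,
): void {
  try {
    const result = sink(event, sanitize(props));
    // Do not await: analytics must never delay the caller. The no-op handler
    // keeps async SDK or network failures from becoming unhandled rejections.
    if (result instanceof Promise) result.catch(() => {});
  } catch {
    // Fail open: a throwing sink or sanitizer must not break the caller.
  }
}
```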