test-capabilities

summary: Project overview and operator guide for test-capabilities.
read_when:
- You are onboarding to test-capabilities
- You need commands, structure, or current repo scope
type: reference

test-capabilities

Fail-closed testing capability framework for CLI, browser, property, healing, and diagnostic root-cause workflows.

Previously tracked in this workspace as testers.

Vision

We don't build tests. We build the immune system of software.

See docs/project/vision.md for the durable north-star vision and docs/project/product-posture.md for the current product maturity snapshot.

Components

Path	Description
`src/`	TEST-CAPABILITIES testing framework (operation kernel, orchestrator, self-healing, quantum simulator, prediction engine)
`bin/test-capabilities`	TEST-CAPABILITIES CLI
`external/bombadil`	Parked repo-local Bombadil-compatible binary fallback, intentionally excluded from packed npm artifacts
`examples/demo/`	Built-in zero-external-dependency demo fixture for first functional proof
`prompts/`	LLM testing prompts (cli-tester, web-tester, api-tester)
`docs/`	Testing guides and frameworks

Documentation

Doc	Description
docs/project/vision.md	Durable product vision and strategic direction
docs/project/product-posture.md	Current product maturity, supported/unsupported boundary, and major gaps
docs/TEST-CAPABILITIES-FRAMEWORK.md	TEST-CAPABILITIES autonomous testing framework
docs/LLM-TESTING-GUIDE.md	Guide for LLM-driven testing
docs/DECISION-MATRIX.md	Tool selection decision matrix
docs/adoption/	Greenfield, brownfield, minimal first-run, and Bombadil 0.5 adoption guides
docs/dev/ts-quality-screening.md	Repo-local deterministic screening proof path via `ts-quality`
docs/npm-publishing-checklist.md	Public npm release readiness checklist
docs/releases/release-workflow.md	GitHub Release → npm Trusted Publishing workflow
docs/api/	TEST-CAPABILITIES API reference docs

Polished core use case

The first public release is centered on one zero-external-dependency flow: CLI smoke + observation diagnostics.

node ./bin/test-capabilities doctor --json
node ./bin/test-capabilities demo --json

This proves the package can load, run a real CLI command through cli-tester, and emit observation.v1 diagnostic evidence without Surf Go, Bombadil, network access, or a target application. For the full doctor -> init -> demo -> test --json path, see the Minimal CLI smoke walkthrough (docs/adoption/minimal-cli-smoke-walkthrough.md). For adoption strategy, see the Greenfield bootstrap or Brownfield integration guides under docs/adoption/. See examples/demo/README.md for the packaged demo fixture.

Capability Contract

The runtime is now fail-closed. If a config section, agent, command, or flag is not wired to a real implementation path, the CLI errors instead of pretending success.
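As a rough illustration of the fail-closed rule, the sketch below shows a dispatcher that errors on any route without a registered implementation. It is not the project's operation kernel: the `Operation` type, the `registry` map, and the `dispatch` function are hypothetical names chosen for this example.

```typescript
// Hypothetical sketch of a fail-closed dispatcher; all names are illustrative only.
type OperationResult = { ok: true; route: string; output: string };
type Operation = (args: string[]) => OperationResult;

// Only routes that have a real implementation are registered.
const registry = new Map<string, Operation>([
  [
    "doctor",
    (args: string[]): OperationResult => ({
      ok: true,
      route: "doctor",
      output: `checked ${args.length} args`,
    }),
  ],
]);

function dispatch(route: string, args: string[]): OperationResult {
  const operation = registry.get(route);
  if (operation === undefined) {
    // Fail closed: an unwired route is an error, never a silent pass.
    throw new Error(`Unsupported route: ${route}`);
  }
  return operation(args);
}
```

The point of the design is that success can only come from an executor that actually ran, so an unsupported verb such as `predict` surfaces as an error at the boundary.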

First public happy path:

npx test-capabilities doctor
# or from a checkout
node ./bin/test-capabilities doctor --json

doctor has no Surf/Bombadil requirement: missing optional external runtimes are reported as warnings while package/runtime readiness checks remain required.

The shipped CLI verbs now run through a typed operation kernel exposed at src/core/operations.ts and implemented in trust-sized modules under src/core/operations/. That registry owns the supported routes, their input schemas, their executors, and their structured result shapes so the CLI wrapper stays thin.

For Surf-backed web exploration, Surf Go is the standard runtime. The supported orchestrator resolves it through TEST_CAPABILITIES_SURF_GO_BIN, a source checkout referenced by TEST_CAPABILITIES_SURF_GO_REPO, or surf-go on PATH. A Surf Go source checkout can run via go -C <repo>/go run ./cmd/surf-go; build surf-go first for faster runs. Explicit Surf Go repo env vars fail closed when invalid instead of silently switching to a different runtime. surf explore now runs explicit browser-state/DOM/link probes, supports bounded same-origin --depth exploration from 1 to 3, and reports graded user-flow coverage from verified probe counts; empty output, help text, warning-only output, and target URLs without a matching browser-state probe fail closed as unverified coverage.

For Bombadil-backed web exploration, the supported orchestrator resolves the binary through TEST_CAPABILITIES_BOMBADIL_BIN, a built source checkout referenced by TEST_CAPABILITIES_BOMBADIL_REPO, repo-local external/bombadil, or bombadil on PATH. A Bombadil-compatible source checkout only overrides the parked repo-local fallback once it has a built target/release/bombadil or target/debug/bombadil; upstream Bombadil 0.5 centralizes builds and no longer requires esbuild, though local source builds may still need project-specific prerequisites such as trunk or the project Nix shell. Bombadil 0.5 request headers, trace output paths, trace reproduction, viewport/instrumentation/permission knobs, and test-external debugger settings are exposed through agents.<name>.bombadil config. Bombadil's disabled-control skipping, quiescence timers, and dialog auto-accept behavior come from the resolved Bombadil binary itself.

A bounded experimental terminal-fuzzer agent wraps bombadil terminal test -- <command> [args...] and emits normalized observation.v1 runtime evidence whose subject is the resolved terminal command, without claiming production stability or autonomy.

Packed npm consumers should treat Bombadil as an external tool requirement: the package intentionally excludes external/bombadil, and npm run consumer:smoke verifies that a packed consumer without TEST_CAPABILITIES_BOMBADIL_BIN, TEST_CAPABILITIES_BOMBADIL_REPO, or bombadil on PATH receives a clear failing Bombadil finding instead of a fake pass. See docs/project/bombadil-distribution-posture.md and docs/adoption/bombadil-0.5-how-to.md. The same packed-consumer smoke also proves calibrated root_cause and low-calibration non-authoritative propagation observations survive through the distributed library API.
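The Bombadil lookup order described above can be sketched as a pure function. This is a minimal illustration, not the orchestrator's code: `resolveBombadil`, the `exists` probe, and the `onPath` argument are hypothetical, and preferring `target/release` over `target/debug` is an assumption based only on the order in which the text lists them. How the real resolver validates an explicit `TEST_CAPABILITIES_BOMBADIL_BIN` value is not modelled here.

```typescript
// Hypothetical sketch of the documented Bombadil lookup order as a pure function.
type Env = Record<string, string | undefined>;
type Resolution =
  | { source: "env-bin" | "source-checkout" | "repo-local" | "path"; path: string }
  | null;

function resolveBombadil(
  env: Env,
  exists: (path: string) => boolean, // assumed filesystem probe supplied by the caller
  onPath: string | null,             // assumed result of a PATH lookup for `bombadil`
): Resolution {
  // 1. Explicit binary from the environment.
  const bin = env["TEST_CAPABILITIES_BOMBADIL_BIN"];
  if (bin !== undefined && bin !== "") return { source: "env-bin", path: bin };

  // 2. Source checkout, but only once a built binary exists in it.
  const repo = env["TEST_CAPABILITIES_BOMBADIL_REPO"];
  if (repo !== undefined && repo !== "") {
    const candidates = [`${repo}/target/release/bombadil`, `${repo}/target/debug/bombadil`];
    for (const built of candidates) {
      if (exists(built)) return { source: "source-checkout", path: built };
    }
  }

  // 3. Parked repo-local fallback, then 4. whatever is on PATH.
  if (exists("external/bombadil")) return { source: "repo-local", path: "external/bombadil" };
  if (onPath !== null) return { source: "path", path: onPath };

  // Nothing resolved: the caller should report a failing finding, not a pass.
  return null;
}
```

Passing the filesystem and PATH lookups in as arguments keeps the ordering testable without touching the real environment.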

Implemented today

Surface	Status	Notes
`doctor` command	Implemented	Zero-external-dependency package and environment diagnostics; optional Surf Go/Bombadil-compatible runtimes warn when absent instead of failing
`init` command	Implemented	Generates a minimal valid `test-capabilities.yaml` for the zero-external-dependency `cli-tester` path and refuses overwrites without `--force`
`demo` command	Implemented	Built-in zero-external-dependency functional demo for the polished `cli-smoke-observation` use case
`test` command	Implemented	Supports `--config`, `--target`, `--quick`; URL targets apply when `quantum.enabled: true` or a supported `bombadil`/`surf` agent is enabled, and they only replace `targets.cli` when no `cli-tester` smoke is enabled
`bombadil` orchestrator agent	Implemented	Runs a bounded Bombadil exploration budget against `targets.web`; resolves the binary through explicit env, a built source checkout, repo-local parked fallback, or `PATH`
`terminal-fuzzer` orchestrator agent	Implemented	Experimental bounded wrapper for `bombadil terminal test -- <command> [args...]`; emits `observation.v1` runtime evidence for the resolved CLI/terminal command and fails closed on missing Bombadil or target command
`surf` orchestrator agent	Implemented	Runs the supported `surf explore` operation against `targets.web`; resolves Surf Go from explicit env, a source checkout, or `surf-go` on `PATH`, then reports graded user-flow coverage from verified browser-state/DOM probes
`cli-tester` orchestrator agent	Implemented	Executes `<targets.cli> --help` as a capability-backed smoke
`quantum` command	Implemented	Uses the shared simulator path
`surf explore`	Implemented	Runs Surf Go navigation plus explicit browser-state/DOM probes, optionally follows same-origin links with `--depth 1..3`, and fails closed unless the seed page verifies browser-state evidence
`heal` command	Implemented	Heuristic selector repair workflow; `--findings-input` accepts orchestrator findings JSON so proposals cite diagnostic evidence as `triggeringFindingId`; apply mode can consume a reviewed `--proposal-input` artifact and still requires an external `--checkpoint-ref`
normalized observations	Implemented	Supported orchestrator agents emit `observation.v1` diagnostic events for Surf coverage, Bombadil property exploration, and CLI smoke execution; when correlation is enabled, runs can also include component-level semantic synthesis, suite-level observation correlation, deterministic `root_cause` observations for at least two same-component independent failed-or-errored observed evidence units that agree on the same failure class, and low-calibration non-authoritative `propagation` observations across configured dependency edges
finding correlation	Implemented	Cross-finding synthesis inside the orchestrator; observation-native synthesis and calibrated root-cause observations summarize multi-sensor meaning without becoming pass/fail authority or prediction
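The `surf explore` row above mentions a bounded `--depth 1..3` and same-origin link following. The sketch below shows one way such bounds could be enforced; `parseDepth` and `sameOriginLinks` are hypothetical helpers written for this example, and the real implementation may validate and filter differently.

```typescript
// Hypothetical helpers illustrating the documented bounds; not the project's implementation.
function parseDepth(raw: string): number {
  const depth = Number(raw);
  if (!Number.isInteger(depth) || depth < 1 || depth > 3) {
    // Out-of-range or non-numeric depth is rejected rather than clamped.
    throw new Error(`--depth must be an integer from 1 to 3, got "${raw}"`);
  }
  return depth;
}

function sameOriginLinks(seed: string, links: string[]): string[] {
  const origin = new URL(seed).origin;
  const kept: string[] = [];
  for (const link of links) {
    let resolved: URL;
    try {
      resolved = new URL(link, seed); // relative links resolve against the seed page
    } catch (_err) {
      continue; // unparseable links are skipped
    }
    if (resolved.origin === origin) kept.push(resolved.href);
  }
  return kept;
}
```

For example, with a seed of `https://example.com/a`, a relative link `/b` is kept as `https://example.com/b`, while a link to another host is dropped.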

Explicitly unsupported for now

These surfaces fail clearly when enabled or invoked:

orchestrator agents: api-fuzzer
orchestrator intelligence flags: self_healing, prediction, collective
chaos execution
CLI commands: predict, visualize, report
test flags: --autonomous, --self-heal, --predict, --fail-threshold, --upload-artifacts, --report
surf actions: flow, assert, compare, replay

Commands

# Quality gates
npm run check          # Full CI check (lint + test)
npm run lint           # Lint check
npm run fix            # Auto-fix lint issues
npm run consumer:smoke # Packed-artifact consumer contract smoke
npm run truth:gate       # Cross-check portable runtime/package/docs/passport truth surfaces
npm run release:check    # Release preflight (quality + truth-gated diagnostic corpora + packed-artifact verification)

# First-run proof (full walkthrough: docs/adoption/minimal-cli-smoke-walkthrough.md)
node ./bin/test-capabilities doctor
node ./bin/test-capabilities demo --json

# Build
npm run build          # TypeScript build

# TEST-CAPABILITIES CLI
npm run test-capabilities                # Run TEST-CAPABILITIES CLI
node ./bin/test-capabilities init --output ./test-capabilities.yaml --target node --force
node ./bin/test-capabilities doctor --config ./test-capabilities.yaml --target node --json
node ./bin/test-capabilities test --config ./test-capabilities.yaml --json
node ./bin/test-capabilities quantum --target https://example.com
node ./bin/test-capabilities surf explore --url https://example.com
node ./bin/test-capabilities heal --dir ./tests --dry-run
node ./bin/test-capabilities heal --dir ./tests --dry-run \
  --proposal-output artifacts/heal-proposals.json \
  --verification-output artifacts/heal-verification.json
node ./bin/test-capabilities heal --dir ./tests --dry-run \
  --findings-input artifacts/orchestrator-findings.json \
  --proposal-output artifacts/heal-proposals.json
node ./bin/test-capabilities heal --dir ./tests --checkpoint-ref checkpoint/test-capabilities/heal-001
node ./bin/test-capabilities heal --dir ./tests \
  --proposal-input artifacts/heal-proposals.json \
  --checkpoint-ref checkpoint/test-capabilities/heal-001

# Testing
npm test                  # Run node contract tests
npm run test:property     # fast-check invariant lane for config, route selection, and orchestrator behavior
npm run test:behavior     # cucumber-backed CLI workflow scenarios mapped to docs/examples
npm run test:ci-targeted  # CI-targeted smoke tests
npm run capability:drill  # Repo-local end-to-end drill for shipped capabilities
npm run root-cause:corpus        # Dogfood calibrated root-cause diagnosis invariants
npm run runtime-diagnostic:corpus # Dogfood calibrated diagnosis through real cli-tester subprocesses
npm run bombadil:smoke           # Richer Bombadil regression smoke against a deterministic local fixture

# Docs discovery
npm run docs:list            # List relevant docs for a task
npm run docs:list:workspace  # Workspace-wide doc scan

Screening proof path

Repo-local ts-quality screening currently starts with a narrow operation-kernel slice over the test-capabilities source surface. For the wrapper, commands, and changed-scope rules, see docs/dev/ts-quality-screening.md.

Capability drill

To exercise the shipped capabilities against deterministic local fixtures, run:

npm run capability:drill

What it checks today:

test succeeds on a real CLI smoke target and rejects inert URL overrides in quick mode
quantum succeeds on an explicit local URL and fails closed when --target is missing
heal proposes selector fixes without rewriting payload literals or custom-helper strings
heal --dry-run --proposal-output <file> --verification-output <file> writes durable proposal and in-memory verification artifacts for review or future replay-ledger follow-through without mutating files
heal requires --checkpoint-ref from an external checkpoint/restore authority before applying proposals that mutate files
heal --proposal-input <artifact> --checkpoint-ref <ref> applies proposals from a previously emitted proposal artifact instead of recomputing them during apply
surf explore runs through the shipped wrapper path and rejects invalid URLs
test runs a surf-backed orchestrator agent against a deterministic local fixture
library drills for orchestrator correlation and prediction input validation; contract tests cover calibrated root-cause observation synthesis

Surf modes:

# Auto-detect: use Surf Go when available, otherwise a deterministic shim
npm run capability:drill

# Force the deterministic shim path
bash ./scripts/capability-drill.sh --surf-mode shim

# Require a real Surf Go runtime (surf-go or source-checkout Surf Go)
bash ./scripts/capability-drill.sh --surf-mode real

# Emit machine-readable JSON for automation
bash ./scripts/capability-drill.sh --json --surf-mode shim --skip-build

The JSON mode returns a structured summary with ok, surfMode, summary, and per-check status entries so CI or agent tooling can consume the drill result without scraping terminal text.
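A consumer of that JSON might look roughly like the sketch below. Only `ok`, `surfMode`, and `summary` are named in the text above; the `checks` array, its `name` and `status` fields, and the `"pass"` status value are assumptions made for illustration, so check the actual output before relying on them.

```typescript
// Assumed shape: `checks`, `name`, `status`, and the "pass" value are hypothetical.
interface DrillCheck {
  name: string;
  status: string;
}

interface DrillSummary {
  ok: boolean;
  surfMode: string;
  failed: string[];
}

function summarizeDrill(json: string): DrillSummary {
  const parsed: unknown = JSON.parse(json);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Unrecognized drill output");
  }
  const { ok, surfMode, checks } = parsed as Record<string, unknown>;
  if (typeof ok !== "boolean" || typeof surfMode !== "string" || !Array.isArray(checks)) {
    // Treat an unrecognized payload as a failure instead of assuming success.
    throw new Error("Unrecognized drill output");
  }
  const failed = (checks as DrillCheck[])
    .filter((check) => check.status !== "pass")
    .map((check) => check.name);
  return { ok, surfMode, failed };
}
```

Rejecting payloads that lack the expected fields mirrors the fail-closed posture of the CLI itself: a malformed summary is never read as a pass.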

Root-cause calibration corpus

To dogfood the calibrated diagnostic layer against deterministic fixture cases, use:

npm run root-cause:corpus

To prove the same root-cause synthesis path through real supported cli-tester subprocess execution, use:

npm run runtime-diagnostic:corpus

What it checks today:

single-agent CLI or Surf failures do not emit root_cause
two independent observed CLI command-resolution or timeout failures classify as command_resolution or timeout_or_latency, including shell not-found wording, while app crashes do not masquerade as command resolution
two independent observation-only API signals can classify as contract_mismatch, including API contract violation, property-kind payload evidence, and response-payload/required-field wording that must not masquerade as Bombadil/property failure or selector drift
bounded non-contract classes include API authentication/authorization evidence as auth_or_permission, API/web connection/DNS/TLS evidence as network_connectivity, API/CLI/web rate-limit/quota/memory/disk/file-descriptor/pool-exhaustion evidence as resource_exhaustion, and API/CLI/web missing environment/configuration evidence as configuration_error
executable-resolution evidence still classifies as command_resolution even when the missing executable is named config or app-config; real config-file/value evidence remains configuration_error
generic API runtime, property-kind runtime, stack-trace, validation, or schema exceptions without contract/auth/network/resource/configuration evidence remain component_failure_surface; recommendation-only keywords are not classifying evidence
two independent observed Surf failures classify as browser_coverage_gap, including generic DOM coverage wording that must not masquerade as selector drift
two independent observed selector/DOM drift failures classify as selector_or_dom_drift, including selector-contract wording, while single-sensor or unobserved selector drift does not emit root_cause
two independent observed Bombadil failures classify as property_violation, including required-property validation wording that must not masquerade as API contract mismatch
two independent sensors linked to the same API finding classify as contract_mismatch, even when generic browser words appear in the observations
finding-only, mixed-class evidence, all-passing linked sensors, single-sensor multi-finding, unobserved conflicting findings, partially observed evidence pairs, linked finding/current-run evidence disagreement, and same-component mixed CLI/API failure classes do not emit root_cause
unrelated ambiguous signals in one component, including a suppressed same-component mixed-class ambiguity, do not suppress a calibrated same-run diagnosis for another component
independent CLI and API failures can emit simultaneous component-scoped root_cause observations
three-sensor agreement produces high-calibration root_cause with signalCount and sensorCount reflecting all three observers
independent Bombadil + CLI failures emit two component-scoped root_cause observations (property_violation + command_resolution)
three-way simultaneous Surf + CLI + API failures emit three component-scoped root_cause observations
propagation synthesis covers default api -> web, cli -> api, and cli -> web edges, including API latency links with web runtime failures, same-timeout shared-infra links, and API schema-drift-to-UI links with web runtime failures; it supports intelligence.propagationTopology overrides for custom edges, suppresses generic component-failure-only, non-latency same-class, and Surf evidence-gap overclaims, and stays low-calibration/non-authoritative
root-cause and propagation output exclude prediction language and synthetic corr-* IDs; auth-boundary, network-connectivity, resource-exhaustion, or configuration failures do not imply downstream propagation without a separately promoted bounded link
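The core agreement rule behind several of the checks above (at least two independent sensors, same component, same failure class, observed and not passing) can be sketched as follows. This is a simplified illustration under stated assumptions: the `Evidence` and `RootCause` types and `synthesizeRootCauses` are hypothetical, and linked findings, calibration levels, and propagation are deliberately left out.

```typescript
// Hypothetical, simplified model of the agreement rule; not the shipped synthesis code.
type Status = "passed" | "failed" | "errored";

interface Evidence {
  component: string;
  sensor: string;
  status: Status;
  observed: boolean;
  failureClass: string;
}

interface RootCause {
  component: string;
  failureClass: string;
  sensorCount: number;
}

function synthesizeRootCauses(evidence: Evidence[]): RootCause[] {
  // Group observed failed-or-errored evidence by component.
  const byComponent = new Map<string, Evidence[]>();
  for (const unit of evidence) {
    if (!unit.observed || unit.status === "passed") continue;
    const group = byComponent.get(unit.component) ?? [];
    group.push(unit);
    byComponent.set(unit.component, group);
  }

  const results: RootCause[] = [];
  byComponent.forEach((units, component) => {
    const first = units[0];
    if (first === undefined) return;
    const sensors = new Set(units.map((unit) => unit.sensor));
    const sameClass = units.every((unit) => unit.failureClass === first.failureClass);
    // Emit only when at least two distinct sensors agree on one class.
    if (sameClass && sensors.size >= 2) {
      results.push({ component, failureClass: first.failureClass, sensorCount: sensors.size });
    }
  });
  return results;
}
```

Under this model a single-sensor failure or a same-component mix of classes yields nothing, while independent failures in two components yield two component-scoped results.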

Machine-readable mode emits aggregate coverage floors, exact release truth-lock counts, per-case expected/actual classification, root-cause count, calibration counts, linked finding IDs, propagation counts, propagation subjects, propagation links, and no-propagation guardrail markers for automation without scraping terminal text. Runtime observations also expose structured semantics.failureClass for root_cause and semantics.propagationLink for propagation while retaining evidence strings for backward-compatible inspection:

npm run --silent root-cause:corpus -- --json
npm run --silent runtime-diagnostic:corpus -- --json

The runtime diagnostic corpus is intentionally narrower than the fixture corpus: it creates temporary local CLI fixtures and checks single-sensor suppression, two-sensor command_resolution, two-sensor timeout_or_latency, same-component mixed-class suppression, correlation-disabled suppression, unique observation IDs, and absence of prediction/causal/repair-order language without using network, databases, or external binaries.

Bombadil richer smoke fixture

To run a richer local Bombadil regression against a deterministic multi-control fixture, use:

npm run bombadil:smoke

What it does:

serves examples/bombadil-rich/site/ on a temporary local port
runs Bombadil directly and expects trace artifacts under a temporary output directory
runs test-capabilities test --quick with a Bombadil-backed config against the same local fixture

Useful options:

# Reuse an already-built dist/
bash ./scripts/bombadil-rich-smoke.sh --skip-build

# Run only the direct Bombadil phase
bash ./scripts/bombadil-rich-smoke.sh --direct-only

# Run only the TEST-CAPABILITIES wrapper phase
bash ./scripts/bombadil-rich-smoke.sh --tc-only

# Keep the generated fixture/output directory for inspection
bash ./scripts/bombadil-rich-smoke.sh --keep-temp

The richer fixture currently lives at examples/bombadil-rich/site/ and includes intra-origin navigation, toggles, select inputs, a form, and stateful UI so Bombadil can explore more than the minimal capability-drill page.

Structure

test-capabilities/
├── bin/               # TEST-CAPABILITIES CLI
├── docs/
│   ├── api/           # TEST-CAPABILITIES API reference
│   ├── project/       # Vision, goals
│   ├── _core/         # Immutable core docs
│   ├── org_context/   # Organizational context
│   ├── learnings/     # Crystallized patterns
│   └── system4d/      # System4D framework docs
├── examples/          # Test patterns, sample specs
├── external/          # Vendored tools (bombadil)
├── flows/             # Test flow definitions
├── ontology/          # Generated test artifacts
├── policy/            # Stack lane, security policies
├── prompts/           # LLM testing prompts
├── src/               # TEST-CAPABILITIES framework source
│   ├── core/          # Orchestrator
│   ├── healing/       # Self-healing
│   ├── integrations/  # External tool clients
│   ├── prediction/    # Prediction engine
│   └── quantum/       # Quantum simulator
├── scripts/           # CI, quality gates, tooling
└── tests/             # Test files
