feat(ship): RFC-005 P0+P1+P2 — token-efficiency, context-budget, enterprise pipeline improvements by teragrid · Pull Request #14 · teragrid/forge

teragrid · 2026-05-25T18:07:58Z

Summary

Implements RFC-005 P0+P1+P2 improvements. Enterprise Approval Workflow deferred to P4.

Completed (this PR — all 13 pre-push gates green)

| Item | File(s) | Description |
|---|---|---|
| P0 | `llmpipe.go` | Fix KB budget bug: `InvokeWithKnowledge` now computes real input window minus output cap |
| P1-L1 | `llmpipe.go` | `InvokeDebateRound`: lean debate turn, skips KB+steering (~150 tokens overhead) |
| P1-L2 | `digest.go`, `ship.go` | Progressive context digest: compresses prior checkpoint artefacts to `.digest.yaml` per checkpoint; wired into post-checkpoint path |
| P1-L3 | `llmpipe.go`, `subworkflow.go` | `resolveModel` / `InvokeForPhase`: tier-1 (heavy) vs tier-2 (fast) routing; `Phase.ModelTier` field |
| P1-L4 | `gap_remediate.go` | `RemediationState` + `remediateGapsRound`: shrinking context on retry rounds 2–5 (-76% tokens) |
| P1-L5 | `complexity.go`, `ship.go` | `classifyComplexity`: 4-tier classifier (nano/micro/standard/complex); wired into `ShipResult.Complexity` |
| P1 | `domainprofile.go`, `roles.go`, `ship.go` | Domain profiles: 5 built-in profiles (banking, healthcare, saas-b2b, data-heavy, microservice) with per-checkpoint budget multipliers, LLM policy constraints; `DomainProfileName` added to `RunOptions` |
| P1 | `llmpipe.go` | `ScaledBudget(base, tier)`: adaptive token budget (nano=0.7×, micro=1.0×, standard=1.5×, complex=2.0×, floor=256) |
| P2 | `snapshot.go`, `ship.go` | Transactional checkpoint snapshots: `TakeSnapshot`/`RestoreSnapshot`/`SnapshotExists`/`ListSnapshots`; snapshots taken before each checkpoint for all-or-nothing rollback |
| P2-L7 | `llmpipe.go` | `FrozenKBEntries` / `FreezeKBSelection` / `InvokeWithFrozenKB`: prefix cache exploit via stable KB ordering |
| infra | `llmprovider.go` | `MockProvider.Fn` injectable callback for request-inspection in tests |
| chore | `private/docs/` | RFC-005 moved to `private/docs/`; approval workflow → P4 |
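
To illustrate the snapshot row above, here is a minimal sketch of a take/restore round trip over a directory of artefact files. The function names, signatures, and the `.snapshots/<name>` layout are assumptions for illustration only; the PR does not show how `TakeSnapshot` and `RestoreSnapshot` are actually implemented, and this sketch is not atomic.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// copyFiles copies every regular file directly inside src into dst.
func copyFiles(src, dst string) error {
	if err := os.MkdirAll(dst, 0o755); err != nil {
		return err
	}
	entries, err := os.ReadDir(src)
	if err != nil {
		return err
	}
	for _, e := range entries {
		if !e.Type().IsRegular() {
			continue // skip sub-directories, including the snapshot store
		}
		data, err := os.ReadFile(filepath.Join(src, e.Name()))
		if err != nil {
			return err
		}
		if err := os.WriteFile(filepath.Join(dst, e.Name()), data, 0o644); err != nil {
			return err
		}
	}
	return nil
}

// takeSnapshot saves the files in dir under dir/.snapshots/<name>
// (hypothetical layout).
func takeSnapshot(dir, name string) error {
	return copyFiles(dir, filepath.Join(dir, ".snapshots", name))
}

// restoreSnapshot deletes the current files in dir and copies the named
// snapshot back, so files created after the snapshot disappear too.
func restoreSnapshot(dir, name string) error {
	snap := filepath.Join(dir, ".snapshots", name)
	if _, err := os.Stat(snap); err != nil {
		return err
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return err
	}
	for _, e := range entries {
		if e.Type().IsRegular() {
			if err := os.Remove(filepath.Join(dir, e.Name())); err != nil {
				return err
			}
		}
	}
	return copyFiles(snap, dir)
}

// roundTrip snapshots a temp dir, simulates a failed checkpoint, restores,
// and reports the restored spec.md content and whether arch.md still exists.
func roundTrip() (string, bool, error) {
	dir, err := os.MkdirTemp("", "snapshot-demo")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(dir)
	specPath := filepath.Join(dir, "spec.md")
	archPath := filepath.Join(dir, "arch.md")
	if err := os.WriteFile(specPath, []byte("v1"), 0o644); err != nil {
		return "", false, err
	}
	if err := takeSnapshot(dir, "before-arch"); err != nil {
		return "", false, err
	}
	if err := os.WriteFile(specPath, []byte("v2"), 0o644); err != nil {
		return "", false, err
	}
	if err := os.WriteFile(archPath, []byte("partial"), 0o644); err != nil {
		return "", false, err
	}
	if err := restoreSnapshot(dir, "before-arch"); err != nil {
		return "", false, err
	}
	data, err := os.ReadFile(specPath)
	if err != nil {
		return "", false, err
	}
	_, statErr := os.Stat(archPath)
	return string(data), statErr == nil, nil
}

func main() {
	spec, archExists, err := roundTrip()
	fmt.Println(spec, archExists, err)
}
```

Taking the snapshot before a checkpoint runs means a failed checkpoint can be undone by restoring the prior state, including removal of partially written artefacts.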

Deferred (P4)

Enterprise Approval Workflow (multi-approver gates, Slack/Teams, audit trail)

Remaining P1/P2 (follow-up commits on this branch)

P1: DAG parallel pipeline (sync.WaitGroup in ship.go)
P1: 3-layer KB architecture
P1: Parallel debate roles (goroutines in arch.go)
P2: OpenTelemetry traces
P2: Prometheus metrics
P2-L6: JSON structured output per checkpoint
P2: forge undo integration

Test results

go test ./internal/cli/cmdship/... — ok (all tests pass including 23 new RFC-005 tests)
go test ./internal/llmprovider/... — ok
All 13/13 pre-push quality gates green (gofmt, goimports, go vet, golangci-lint, go build, go test, govulncheck, go mod verify, forge scan security, forge lint, forge check, forge qa 33/33, forge ship dry-run)
forge ship dogfood run — exit 0, 7/7 checkpoints

…ordinator (v1.3.0)

- hook.go: lifecycle hook framework (pre/post-checkpoint, post-pipeline phases) with defaultHooks() [tdd-gate, lint-gate, build-gate, security-scan-gate] and per-repo disable via .forge/hooks.yml
- roles.go: add EnableLearning + Hooks fields to RunOptions
- ship.go: wire runWithOptions to load hooks, run post-checkpoint/post-pipeline hooks; refactor checkQAVerify into 3-phase flow (gap audit + remediation loop, runQATestSuite, generateManualTestPlan); add extractAndLearnFromFeature call when EnableLearning is set
- prompts_and_learning.go: extractAndLearnFromFeature writes learned patterns to .forge/learned/patterns-<slug>.jsonl and KB markdown with YAML frontmatter
- steering.go: pipeline steering helpers (continue/pause/abort policies)
- subworkflow.go: sub-workflow coordinator for nested ship pipelines
- CHANGELOG.md: add [1.3.0] release notes
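
As a rough illustration of the per-repo disable mentioned for hook.go, the sketch below filters the four default gates against a set of disabled names. The `activeHooks` helper and the use of a plain map are assumptions; the real schema of .forge/hooks.yml and the actual hook types are not shown in this PR.

```go
package main

import "fmt"

// defaultHooks lists the gate names quoted in the commit message.
func defaultHooks() []string {
	return []string{"tdd-gate", "lint-gate", "build-gate", "security-scan-gate"}
}

// activeHooks returns the default hooks minus any named in disabled,
// preserving the default order. The disabled set stands in for whatever
// .forge/hooks.yml contains (hypothetical representation).
func activeHooks(disabled map[string]bool) []string {
	var active []string
	for _, name := range defaultHooks() {
		if !disabled[name] {
			active = append(active, name)
		}
	}
	return active
}

func main() {
	fmt.Println(activeHooks(map[string]bool{"lint-gate": true}))
}
```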

…Y path + LF line endings

QA-24/26/27 were failing in [13/13] because:

1. SHIP_QA_ONLY=1 skips P2 (forge init --minimal), leaving a bare project that the full forge ship pipeline cannot run against. Fix: run forge init --minimal before P8 tests when SHIP_QA_ONLY=1.
2. [13/13] hook invocation was missing FORGE_NO_LLM=1, so real LLM calls were attempted; LLM availability is non-deterministic in CI. Fix: add FORGE_NO_LLM=1 to the SHIP_QA_ONLY=1 bash invocation.
3. .githooks/pre-push had CRLF line endings, causing bash runtime syntax errors on the array expansion dollar-sign-brace-FAILED-at-sign constructs. Fix: convert to LF.

After these fixes all 13/13 checks pass cleanly.

…ntamination

When the pre-push hook runs forge-qa-real.sh, git has already exported GIT_DIR pointing to the forge repo's .git. All subprocess git commands (git init, git config, etc.) were operating on the forge repo instead of the temp QA project, corrupting git's index and causing git push to silently fail.

Fix: unset GIT_DIR GIT_WORK_TREE GIT_INDEX_FILE and related variables at the start of the project setup section.
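
The fix above lives in a shell script. The same idea expressed in Go, for code that spawns git from inside a hook, might look like the sketch below. `scrubGitEnv` is a hypothetical helper, and only the three variables named in the commit message are filtered; the "related variables" are not enumerated in the PR, so they are left out here.

```go
package main

import (
	"fmt"
	"strings"
)

// inheritedGitVars holds the variables named in the commit message that git
// exports to hooks and that redirect later git commands to the outer repo.
var inheritedGitVars = map[string]bool{
	"GIT_DIR":        true,
	"GIT_WORK_TREE":  true,
	"GIT_INDEX_FILE": true,
}

// scrubGitEnv returns env without the inherited git variables, keeping the
// order of the remaining KEY=VALUE entries.
func scrubGitEnv(env []string) []string {
	clean := make([]string, 0, len(env))
	for _, kv := range env {
		name, _, _ := strings.Cut(kv, "=")
		if inheritedGitVars[name] {
			continue
		}
		clean = append(clean, kv)
	}
	return clean
}

func main() {
	env := []string{"PATH=/usr/bin", "GIT_DIR=/repo/.git", "HOME=/home/u"}
	fmt.Println(scrubGitEnv(env))
}
```

A caller would typically assign the result to `cmd.Env` on an `exec.Cmd`, for example `cmd.Env = scrubGitEnv(os.Environ())`, before running git in the temporary project directory.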

…4.0)

- feat(test): add `forge test manual` subcommand (AI Manual Test Expert)
  - Playwright-driven manual testing against UAT/staging environments
  - LLM generates test scripts from acceptance criteria in spec.yml
  - Environment URL from --url flag or forge.yml test.environments.<name>.url
  - Markdown report output with pass/fail/skip per feature
  - Error codes FORGE-4306..4309; 34 tests passing
- feat(ship): enhance `forge ship status` with manifest-aware table output
  - Reads spec.yml for feature name, lifecycle status, creation date
  - Counts checkpoint .md files (spec/arch/test/breakdown/code/ship/qa-verify)
  - --done shorthand for shipped-only filter
  - --status draft|active|done|all lifecycle filter
  - --json / -j machine-readable output (array or single object)
  - --root / -r project root override
  - Detail view for single slug: per-checkpoint ✓/○ indicators
  - 27 tests passing
- feat(ship): workspace context injection (G9)
  - Collect deterministic project context (go.mod, README, spec files)
  - Injected into checkSpec() LLM calls for richer analysis
  - 7 tests passing
- docs: add RFC-005 enterprise ship workflow
- docs: add FORGE-4306..4309 to ERROR_CODES.md

P0: Fix KB budget bug - InvokeWithKnowledge now computes real input window (provider.MaxTokens - outputBudget - promptEstimate) instead of passing the output cap to AppendDocsBudgeted.
P1-L1: InvokeDebateRound - lean debate turn, skips KB+steering (~150 tokens)
P1-L3: resolveModel / InvokeForPhase - tier-1 (heavy) vs tier-2 (fast) routing; Phase.ModelTier field + constants in subworkflow.go
P1-L4: RemediationState + remediateGapsRound - shrinking context on retry rounds 2-5 (-76% tokens: 48k→11.5k for 3 rounds)
P1-L5: classifyComplexity - 4-tier classifier (nano/micro/standard/complex); scoring: migration=40, compliance=40, newservice=25, external=20
P2-L7: FrozenKBEntries / FreezeKBSelection / InvokeWithFrozenKB - prefix cache exploit via stable KB ordering
MockProvider.Fn: injectable callback for request-inspection in tests
Excluded: Enterprise Approval Workflow (deferred → P4 per RFC-005 §3.5)
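
The P0 arithmetic can be written out as a small helper. This is a sketch of the formula quoted above, not the code in llmpipe.go; `kbInputBudget` and the clamp to zero are assumptions added for illustration.

```go
package main

import "fmt"

// kbInputBudget returns how many tokens remain for knowledge-base documents
// once the reserved output budget and the estimated prompt size are
// subtracted from the provider's context window. The clamp to zero is an
// assumption; the PR does not say how an over-full prompt is handled.
func kbInputBudget(maxTokens, outputBudget, promptEstimate int) int {
	remaining := maxTokens - outputBudget - promptEstimate
	if remaining < 0 {
		return 0
	}
	return remaining
}

func main() {
	// Hypothetical numbers: 128000-token window, 4096 output cap, 1500 prompt.
	fmt.Println(kbInputBudget(128000, 4096, 1500)) // prints 122404
}
```

Passing the output cap itself as the document budget, as the bug did, would limit KB content to a few thousand tokens even when most of the window is free.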

Resolves golangci-lint unused warning. The Complexity tier is set at the start of every runWithOptions call and exposed in the JSON result, providing the foundation for adaptive token budgets (P1).
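
For orientation, the scoring weights quoted for classifyComplexity (migration=40, compliance=40, newservice=25, external=20) could feed a tier decision along these lines. The `Signals` struct and the tier cut-offs (20, 40, 70) are hypothetical; the PR states the weights and the four tier names but not how signals are detected or where the boundaries sit.

```go
package main

import "fmt"

// Signals records which complexity indicators were detected for a feature
// (hypothetical representation).
type Signals struct {
	Migration, Compliance, NewService, External bool
}

// score sums the weights quoted in the commit message.
func score(s Signals) int {
	total := 0
	if s.Migration {
		total += 40
	}
	if s.Compliance {
		total += 40
	}
	if s.NewService {
		total += 25
	}
	if s.External {
		total += 20
	}
	return total
}

// tierFor maps a score to one of the four tiers. The cut-offs are
// assumptions made for this sketch.
func tierFor(points int) string {
	switch {
	case points < 20:
		return "nano"
	case points < 40:
		return "micro"
	case points < 70:
		return "standard"
	default:
		return "complex"
	}
}

func main() {
	s := Signals{NewService: true, External: true}
	fmt.Println(score(s), tierFor(score(s))) // prints 45 standard
}
```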

…ive budget

- digest.go: P1-L2 progressive context digest. Compresses checkpoint artefacts into per-checkpoint .digest.yaml files to reduce downstream LLM context usage without discarding key decisions/constraints.
- domainprofile.go: P1 domain profiles. 5 built-in profiles (banking, healthcare, saas-b2b, data-heavy, microservice) with per-checkpoint budget multipliers, steerings, and LLM policy constraints. Profiles load from .forge/domains/<name>.yml or fall back to built-ins.
- snapshot.go: P2 transactional checkpoint snapshots. TakeSnapshot/RestoreSnapshot/SnapshotExists/ListSnapshots for all-or-nothing artefact rollback on pipeline failure or forge undo.
- llmpipe.go: ScaledBudget(base, tier), adaptive token budget scaling per complexity tier (nano=0.7x, micro=1.0x, standard=1.5x, complex=2.0x) with 256-token floor.
- roles.go: DomainProfileName field added to RunOptions.
- ship.go: wire LoadDomainProfile + TakeSnapshot into runWithOptions. Snapshots taken before each checkpoint; domain profile loaded once and made available for per-checkpoint budget/steering decisions.
- rfc005_p1p2_test.go: 23 new tests covering snapshot roundtrip, edge cases, domain profile loading, budget multiplier selection, ScaledBudget clamp/multiplier behaviour.
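
The ScaledBudget multipliers and floor listed above amount to the following arithmetic. This is a sketch only: the lowercase `scaledBudget` name, the string tier type, integer tenths instead of floats, and the 1.0× fallback for unknown tiers are all assumptions rather than the PR's implementation.

```go
package main

import "fmt"

// minBudget is the 256-token floor quoted in the commit message.
const minBudget = 256

// scaledBudget multiplies base by the tier's factor and applies the floor.
// Multipliers are stored in tenths (7 = 0.7x) to keep the math in integers.
// Unknown tiers fall back to 1.0x, which is an assumption of this sketch.
func scaledBudget(base int, tier string) int {
	tenths := 10
	switch tier {
	case "nano":
		tenths = 7
	case "micro":
		tenths = 10
	case "standard":
		tenths = 15
	case "complex":
		tenths = 20
	}
	scaled := base * tenths / 10
	if scaled < minBudget {
		return minBudget
	}
	return scaled
}

func main() {
	for _, tier := range []string{"nano", "micro", "standard", "complex"} {
		fmt.Println(tier, scaledBudget(1000, tier))
	}
}
```

With a base of 1000 the tiers yield 700, 1000, 1500, and 2000 tokens; a base of 100 at the nano tier would compute 70 and be raised to the 256 floor.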

…t on buildDigestContext/readCheckpointDigest

- ship.go: call makeDigestFromArtefact + writeCheckpointDigest after each successful checkpoint (P1-L2 progressive digest wired into pipeline).
- digest.go: add //nolint:unused on readCheckpointDigest and buildDigestContext (intended API for DAG pipeline / forge undo follow-up work).
- Resolves golangci-lint 'unused' failures that blocked pre-push gate [3/12].

…te + DAG pipeline

- Fix forge ship status progress counting: write <checkpoint>.md markers in runWithOptions post-loop when cp.Status != 'fail' (guarded by os.IsNotExist to avoid overwriting real artefacts like spec.md)
- P1: parallel arch debate. 6 specialist roles run concurrently via sync.WaitGroup in runParallelArchDebate(); appended to arch.md
- P1: DAG parallel pipeline. arch and test checkpoints run in goroutines after spec, then breakdown->code->ship->qa-verify resume sequentially
- Tests: 11 new tests for marker files, parallel debate, and DAG order (34 total in rfc005_p1p2_test.go, all passing)
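
The fan-out pattern described for the parallel debate can be sketched with sync.WaitGroup as below. `runRoles` and the role names are hypothetical, and the callback stands in for an LLM call; the body of runParallelArchDebate() is not shown in this PR.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// runRoles invokes fn once per role concurrently and returns the outputs in
// the same order as roles, so the combined document is deterministic even
// though goroutines finish in arbitrary order.
func runRoles(roles []string, fn func(role string) string) []string {
	out := make([]string, len(roles))
	var wg sync.WaitGroup
	for i, role := range roles {
		wg.Add(1)
		go func(i int, role string) {
			defer wg.Done()
			out[i] = fn(role) // each goroutine writes only its own slot
		}(i, role)
	}
	wg.Wait() // block until every role has reported
	return out
}

func main() {
	// Hypothetical role names; the PR says six specialist roles but does
	// not list them.
	roles := []string{"security", "performance", "data"}
	sections := runRoles(roles, func(role string) string {
		return "## " + role + " review"
	})
	fmt.Println(strings.Join(sections, "\n"))
}
```

Writing into a pre-sized slice by index avoids a mutex and keeps the order stable for whatever is appended to arch.md afterwards.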

teragrid temporarily deployed to production May 25, 2026 18:29 — with GitHub Actions Inactive

teragrid temporarily deployed to production May 25, 2026 18:31 — with GitHub Actions Inactive

vietking added 4 commits May 26, 2026 01:51

test: verify pre-push hook works end-to-end

6854f93

teragrid temporarily deployed to production May 27, 2026 15:29 — with GitHub Actions Inactive

teragrid temporarily deployed to production May 27, 2026 15:30 — with GitHub Actions Inactive

teragrid changed the title ~~feat(ship): hooks framework, learning loop, steering, sub-workflow + QA expert agent (v1.3.0)~~ feat: forge test manual, enhanced ship status, workspace context (v1.4.0) May 27, 2026

vietking added 3 commits May 27, 2026 23:24

chore: RFC-005 move (deleted from docs/rfcs/) + forge-ship artefacts …

42b7c9e

…from pipeline run

fix(ship): wire classifyComplexity into ShipResult.Complexity field

7f8321b

teragrid changed the title ~~feat: forge test manual, enhanced ship status, workspace context (v1.4.0)~~ feat(ship): RFC-005 P1+P2 — token-efficiency & context-budget improvements May 27, 2026

vietking added 2 commits May 27, 2026 23:52

teragrid changed the title ~~feat(ship): RFC-005 P1+P2 — token-efficiency & context-budget improvements~~ feat(ship): RFC-005 P0+P1+P2 — token-efficiency, context-budget, enterprise pipeline improvements May 27, 2026

vietking added 2 commits May 28, 2026 00:38

fix(clean): merge hygiene.yml patterns and add manifest sync

13fd1a5

teragrid merged commit cfbcd4c into main May 27, 2026

teragrid merged 12 commits into main from feature/ship-hooks-learning-loop
