sphinx-needs-demo feedback: setup, tailoring, and atomic-skill alignment gaps

## Summary

Bringing Pharaoh up on `useblocks/sphinx-needs-demo` (PR `useblocks/sphinx-needs-demo#51`) surfaced a series of structural gaps between what `pharaoh-setup` produces, what the canonical schemas in `examples/score/.pharaoh/project/schemas/` expect, what the atomic-skill vocabulary supports, and what an industry V-model project (SYS/SWE split + ISO 26262 safety V) actually needs.

These are not bugs in any single skill but recurring patterns across the bootstrap → tailoring → review → atomic-author chain. Filing as one consolidated feedback issue rather than splitting into per-skill bugs, because the same underlying decisions surface in multiple places. Two narrow follow-up bugs from the same exercise are tracked separately as #11 and #12.

## 1. Bootstrap output violates the canonical JSON Schema shipped in `examples/score/`

`examples/score/.pharaoh/project/schemas/` ships canonical JSON Schemas for `artefact-catalog.yaml`, `workflows.yaml`, `id-conventions.yaml`, and `checklists-frontmatter.schema.json`. They are well-formed and `score`'s own tailoring validates against them. Bootstrap-generated files do not:

* `artefact-catalog.yaml`: schema declares `additionalProperties: false`. `pharaoh-tailor-bootstrap` emits `child_of` and `lifecycle_ref` keys that are not in the schema. The skill's `expected_output` is therefore schema-invalid by Pharaoh's own published contract. Every project bootstrapped today inherits the violation.
* `workflows.yaml`: schema requires top-level `lifecycle_states: [...]` plus a flat `transitions:` array. `pharaoh-tailor-bootstrap` emits per-type maps with inline `{from, to, gate}` transitions and per-type `initial:`/`final:` strings.
* `id-conventions.yaml`: schema accepts both bootstrap and `tailor-fill` shapes. The `prefixes` value is permissive enough that bootstrap stores identifier prefixes (`FEAT_`) and `tailor-fill` SKILL.md template stores human descriptions (`"requirement (guide-level)"`). A draft skill computing `f"{prefix}{tail}"` produces garbage on the description form.
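The `prefixes` mismatch can be illustrated with a minimal sketch. The type key `feat` on the `tailor-fill` side, the tail `001`, and the helper `make_id` are assumptions for illustration only; the two values are the ones quoted above.

```python
def make_id(prefix: str, tail: str) -> str:
    """Naive ID construction, as a draft skill might do it."""
    return f"{prefix}{tail}"


# Bootstrap-style value: an identifier prefix.
bootstrap_prefixes = {"feat": "FEAT_"}
# tailor-fill-style value: a human description stored in the same slot
# (the key "feat" is a hypothetical choice for this sketch).
tailor_fill_prefixes = {"feat": "requirement (guide-level)"}

good_id = make_id(bootstrap_prefixes["feat"], "001")
bad_id = make_id(tailor_fill_prefixes["feat"], "001")

print(good_id)  # FEAT_001
print(bad_id)   # requirement (guide-level)001
```

Because the schema accepts both values, nothing upstream of the draft skill flags the second form.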

The schemas themselves are also hidden: shipped only in `examples/score/`, not at any canonical install path, not loaded at runtime by any skill, not referenced by `shared/tailoring-access.md`. `pharaoh-tailor-review` claims to enforce schema rules (cross-file C1, C2, C5) but reads no schema file. If wired up properly it would reject most real-world catalogs except `score`'s.

## 2. `pharaoh-tailor-bootstrap` and `pharaoh-tailor-fill` emit incompatible shapes

Even setting the canonical schema aside, the two sibling tailoring authors disagree on both files in different directions:

* `workflows.yaml`: bootstrap emits per-type maps with `{from, to, gate}`. `tailor-fill` emits a flat `lifecycle_states` map plus `transitions[*].requires`. The predicate vocabulary differs (`gate: "<string>"` vs `requires: [<list>]`). `pharaoh-lifecycle-check` Step 4b iterates `transitions[*].requires` and would silently see undefined on every gate-style transition.
* `id-conventions.yaml`: both emitters use a flat shape, but `tailor-fill` emits `id_regex_exceptions:`. `pharaoh-id-convention-check` reads `id_regex_by_type:`. The two never meet, so per-type regex overrides authored by `tailor-fill` are silently ignored.
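A rough sketch of the `gate` vs `requires` mismatch follows. It assumes a consumer that reads only `requires`; the gate name `review_passed`, the `from`/`to` keys on the `tailor-fill` side, and both helper functions are hypothetical and not taken from any skill.

```python
def collect_requires(transitions):
    """Mimic a consumer that reads transitions[*].requires and nothing else."""
    return [t.get("requires") for t in transitions]


def unchecked(transitions):
    """Transitions whose predicate such a consumer would silently skip."""
    return [t for t in transitions if "requires" not in t]


# Bootstrap-style transition (gate name is hypothetical).
gate_style = [{"from": "draft", "to": "reviewed", "gate": "review_passed"}]
# tailor-fill-style transition (same hypothetical predicate).
requires_style = [{"from": "draft", "to": "reviewed", "requires": ["review_passed"]}]

print(collect_requires(gate_style))      # [None]
print(collect_requires(requires_style))  # [['review_passed']]
print(len(unchecked(gate_style)))        # 1
```

A consumer that reported `unchecked(...)` instead of skipping those entries would at least surface the mismatch rather than passing silently.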

A project bootstrapped today and re-tailored with `tailor-fill` later (the natural maturation path the gate-enablement ladder anticipates) would have both files rewritten in incompatible directions.

## 3. Release-gate fields are consumed but no emitter writes them

`pharaoh-link-completeness-check` reads `required_links` / `optional_links`. `pharaoh-output-validate` reads `required_metadata_fields`. `pharaoh-review-completeness` reads `required_roles`. `pharaoh-quality-gate` aggregates these. None of them appear in any tailoring emitter or in the canonical schema. The release-gate is a silent no-op on every project that follows the documented setup path.
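The no-op can be sketched as follows, assuming a checker that treats a missing `required_links` key as an empty list. The function name, the need dictionary shape, and the link name `satisfies` are hypothetical.

```python
def missing_required_links(need, catalog_entry):
    """Return the required link names that the need does not populate."""
    required = catalog_entry.get("required_links", [])
    return [name for name in required if not need.get(name)]


# A need with no outgoing links at all.
need = {"id": "FSR_POWER_01", "type": "fsr"}

# Catalog entry as produced today: no emitter writes required_links.
entry_as_emitted = {}
# Catalog entry with a hand-written rule (link name is hypothetical).
entry_hand_written = {"required_links": ["satisfies"]}

print(missing_required_links(need, entry_as_emitted))    # []
print(missing_required_links(need, entry_hand_written))  # ['satisfies']
```

With nothing to iterate over, the first call reports no gaps, so the gate passes vacuously.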

## 4. `pharaoh.toml` has three undocumented sections

Three sections are consumed by 14+ skills but absent from `pharaoh.toml.example`:

* `[pharaoh.codelink_comments]`: read by `pharaoh-req-codelink-annotate` and `pharaoh-req-from-code` (branch on `.mode = "codelinks" | "backref"`). The annotate skill ships a `tailoring_patch` proposal pointing at this section, an extension point that has been undocumented since landing.
* `[pharaoh.diagrams]` / `[pharaoh.diagrams.<type>]` / `[pharaoh.diagrams.type_styles]`: read by 11 diagram-draft skills plus `pharaoh-feat-component-extract` plus `pharaoh-feat-flow-extract`. Documented only in `shared/diagram-tailoring.md`, invisible to a project author copying the example.
* `[pharaoh.quality_gate].strict`: read by `pharaoh-diagram-lint`. SKILL.md says "Plans wire this to ...". The section does not exist anywhere.

In addition, `[pharaoh.codelinks].src_dir` is consumed by `pharaoh-write-plan` and `pharaoh-feat-file-map`, while the example documents only `enabled` under `[pharaoh.codelinks]`.

The workflow defaults also disagree: `shared/strictness.md` defaults to `(require_change_analysis=true, require_verification=true, require_mece_on_release=false)`, while `pharaoh-gate-advisor` defaults all three to `false`. The example ships `(true, true, false)`. A skill that fills missing values from one default source reaches different gate verdicts than one that fills from the other on identical TOML.
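The divergence can be reproduced with a small sketch. The two default tables are copied from the values above; the merge helper `effective_flags` and the empty `[pharaoh.workflow]` input are assumptions for illustration.

```python
STRICTNESS_MD_DEFAULTS = {
    "require_change_analysis": True,
    "require_verification": True,
    "require_mece_on_release": False,
}
GATE_ADVISOR_DEFAULTS = {
    "require_change_analysis": False,
    "require_verification": False,
    "require_mece_on_release": False,
}


def effective_flags(toml_workflow, defaults):
    """Fill flags missing from the TOML section with the given defaults."""
    return {**defaults, **toml_workflow}


# A pharaoh.toml whose [pharaoh.workflow] omits all three flags.
toml_workflow = {}

from_strictness = effective_flags(toml_workflow, STRICTNESS_MD_DEFAULTS)
from_advisor = effective_flags(toml_workflow, GATE_ADVISOR_DEFAULTS)
differing = sorted(k for k in from_strictness if from_strictness[k] != from_advisor[k])

print(differing)  # ['require_change_analysis', 'require_verification']
```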

The `mode` field documented in `pharaoh-setup` Step 2a.bis (`reverse-eng | greenfield | steady-state`) is persisted only as a single-line comment above `[pharaoh.workflow]`. No skill parses it. The classification is non-load-bearing after setup ends.

## 5. Two user-facing slash commands are dead

`/pharaoh.author` and `/pharaoh.verify` prompts under `.github/prompts/` declare `agent: pharaoh.author` and `agent: pharaoh.verify` respectively in their frontmatter. Neither agent exists under `.github/agents/`. Two of seven user-entry slash commands dispatch to nothing.

## 6. Dangling chain reference

`pharaoh-arch-review.SKILL.md` declares `chains_from: [pharaoh-arch-draft, pharaoh-arch-regenerate]`. `pharaoh-arch-regenerate` does not exist (only `pharaoh-req-regenerate` ships). Probably a typo or unimplemented sibling.

## 7. Two parallel disjoint dependency declarations

Across 71 skills, two non-overlapping vocabularies coexist:

* SKILL.md frontmatter uses `chains_from:` and `chains_to:` (atomic mechanics).
* agent.md frontmatter uses `handoffs: [{label, agent, prompt}]` (Copilot UX prompts).

The two never agree on a single edge. 22 of 71 skill pairs have asymmetric chain declarations (one side empty). 16 of 71 SKILL.md files declare `chains_from` with no agent.md equivalent, so Copilot users have no view of skill prerequisites at all.

45 of 71 skills are graph-orphans relative to the 5 user-entry prompts (`change`, `mece`, `plan`, `release`, `trace`). The atomic-skill layer (diagrams, tailor mechanics, audit mechanics, reverse-engineering, plan orchestration) has no inbound path from any user entry. Discovery routes only through `pharaoh-flow` / `pharaoh-write-plan` / `pharaoh-audit-fanout`, which are themselves orphans.
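The orphan count is a reachability question. A minimal sketch of how it could be recomputed is below; the graph is a toy with placeholder names and does not reflect the actual `chains_to` or `handoffs` edges.

```python
from collections import deque


def reachable(edges, entries):
    """Breadth-first search from the entry prompts over declared edges."""
    seen = set(entries)
    queue = deque(entries)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def graph_orphans(all_skills, edges, entries):
    """Skills with no inbound path from any entry prompt."""
    return sorted(set(all_skills) - reachable(edges, entries))


# Toy data: one entry prompt, and a hub skill that nothing points at.
toy_edges = {
    "entry-a": ["skill-1"],
    "skill-1": ["skill-2"],
    "hub-skill": ["skill-3", "skill-4"],
}
toy_skills = ["skill-1", "skill-2", "skill-3", "skill-4", "hub-skill"]
toy_orphans = graph_orphans(toy_skills, toy_edges, ["entry-a"])

print(toy_orphans)  # ['hub-skill', 'skill-3', 'skill-4']
```

The toy hub mirrors the pattern described above: skills reachable only through an orchestrator stay orphaned as long as the orchestrator itself has no inbound edge.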

## 8. Setup invents structure instead of reading the project

`pharaoh-setup` Step 2 generates `pharaoh.toml` and `.pharaoh/project/*.yaml` from heuristic defaults rather than from the project's declared conventions. On `useblocks/sphinx-needs-demo`:

* `id_scheme.pattern` defaulted to `{TYPE}_{NUMBER}` while observed IDs use domain prefixes (`BRAKE_CTRL_01`, `FSR_POWER_01`). Pattern should reflect the observed `{DOMAIN}_{NUMBER}` shape.
* `workflows.yaml` lifecycle defaulted to `draft → reviewed → approved` (Pharaoh-internal). The corpus actually uses `open` (145 needs), `closed` (16), `passed` (7), `approved` (2). Setup never read RST status histograms.
* `artefact-catalog.yaml optional_fields` was populated with Pharaoh-internal `reviewer`, `approved_by`, `source_doc` only. The project's declared `[needs.fields.X]` (16 fields including `asil`, `severity`, `exposure`, `controllability`, `scenario`, `safe_state`, `customer`, `effort`, `approved`, `jira`, `github`, `role`, `contact`, `image`, `date`) was not consulted at all.
* `id-conventions.yaml prefixes` mirrored the broken declarations in `[[needs.types]]` verbatim, including three real-world collisions (`R_` for `req`+`release`, `T_` for `test`+`team`, `_` for `arch`+`need`). Setup did not detect the collisions.
* `traceability.required_links` was inferred from a heuristic name table (`implements -> "spec -> impl"`). Direction was inverted on every chain in this project. After correcting per-corpus, real coverage was 100% on `spec → req`, `arch → req`, `safety_goal → hazard`, `fsr → safety_goal` (tracked separately as #11).
* `mode` was heuristically classified as `reverse-eng` because `needs.json` did not exist on disk. `needs.json` is a gitignored build artefact, so a fresh clone of any project (even one with thousands of declared needs) is classified `reverse-eng` until the user runs `sphinx-build`.
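The prefix collisions in particular are cheap to detect. A minimal sketch, assuming the `[[needs.types]]` declarations have already been loaded into a type-to-prefix mapping (the function name is hypothetical; the prefixes are the ones quoted in this issue):

```python
from collections import defaultdict


def find_prefix_collisions(prefixes):
    """Group type names by prefix and keep prefixes shared by more than one type."""
    by_prefix = defaultdict(list)
    for type_name, prefix in prefixes.items():
        by_prefix[prefix].append(type_name)
    return {p: sorted(ts) for p, ts in by_prefix.items() if len(ts) > 1}


declared = {
    "req": "R_",
    "release": "R_",
    "test": "T_",
    "team": "T_",
    "arch": "_",
    "need": "_",
    "swreq": "SWREQ_",
    "fsr": "FSR_",
}
collisions = find_prefix_collisions(declared)

print(collisions)
# {'R_': ['release', 'req'], 'T_': ['team', 'test'], '_': ['arch', 'need']}
```

Setup could surface such a result as a warning instead of mirroring the declarations verbatim.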

For a `reverse-eng` mode project the design goal is to capture what exists, not impose a new standard. The current setup defaults are prescriptive rather than descriptive.

## 9. Atomic-skill vocabulary doesn't cover V-model SYS/SWE projects

`useblocks/sphinx-needs-demo` declares 19 sphinx-needs types corresponding to ASPICE-style SYS/SWE separation (`req`, `sysreq`, `sys-arch`, `swreq`, `swarch`) plus classical V-model (`spec`, `impl`, `test`) plus ISO 26262 safety V (`hazard`, `safety_goal`, `fsr`) plus structural (`arch`, `component`, `interface`, `seq_msg`, `person`, `team`, `release`, `need`).

Pharaoh atomic skills mix three vocabulary-binding strategies:

* Fully parameterised by `target_level`: `pharaoh-feat-draft-from-docs` works cleanly on any top-level type.
* Parameterised but with feat/comp_req-flavoured prose throughout (`pharaoh-req-from-code`): mechanically OK, prose examples mislead on non-feat/CREQ projects.
* Hardcoded vocabulary:
  * `pharaoh-arch-draft` accepts only `module` / `component` / `interface` as `arch_type` and FAILs on any other value. Drafting `swarch`, `sys-arch`, or higher-level architecture is unsupported.
  * `pharaoh-vplan-draft` hardcodes `tc__` prefix and `tc` catalog key. On a project whose `test` type uses prefix `T_`, the skill generates IDs that violate the project's `id_regex`.
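The `tc__` case can be sketched as follows. The `id_regex` pattern, the tail `BRAKE_01`, and both helper functions are assumptions for illustration; only the `tc__` and `T_` prefixes come from the findings above.

```python
import re


def draft_id_hardcoded(tail):
    """ID construction with the prefix baked into the skill."""
    return f"tc__{tail}"


def draft_id_from_tailoring(prefixes, type_name, tail):
    """ID construction that looks the prefix up in the project's tailoring."""
    return f"{prefixes[type_name]}{tail}"


project_prefixes = {"test": "T_"}
# Hypothetical id_regex for the project's test type.
project_id_regex = re.compile(r"^T_[A-Z0-9_]+$")

hardcoded = draft_id_hardcoded("BRAKE_01")
tailored = draft_id_from_tailoring(project_prefixes, "test", "BRAKE_01")

print(hardcoded, bool(project_id_regex.match(hardcoded)))  # tc__BRAKE_01 False
print(tailored, bool(project_id_regex.match(tailored)))    # T_BRAKE_01 True
```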

There are no drafting skills at all for safety V artefact types (`hazard`, `safety_goal`, `fsr`). `useblocks/sphinx-needs-demo`'s 1 hazard, 20 safety_goals, and 36 FSRs cannot be reverse-engineered or drafted by Pharaoh atomics. They can only be reviewed by generic `pharaoh-req-review`.

Of the 19 types declared in `useblocks/sphinx-needs-demo`, only 3 have a Pharaoh-skill authoring path (`req`, partially `arch`, partially `test`). The remaining 16 sit in the catalog as declarations with no skill bridge. A user invoking `@pharaoh.req-draft` to write a "sw-level safety requirement" today receives a `req`-typed directive with `R_` prefix regardless of whether they wanted `swreq` (`SWREQ_`) or `fsr` (`FSR_`).

## 10. `pharaoh-flow` skips the safety V

`pharaoh-flow.SKILL.md` description claims to orchestrate the V-model end-to-end (req → arch → vplan → fmea). The body never references `swreq`/`sysreq`/`sys-arch`/`swarch` and skips the safety V entirely (no `hazard`, `safety_goal`, `fsr` stages). End-to-end V-model orchestration with SYS/SWE split is unimplemented.

## 11. Copilot agents have no runtime

`.github/agents/pharaoh.*.agent.md` and `.github/prompts/pharaoh.*.prompt.md` are read by Copilot Chat as plain markdown instructions. No Jinja templating, no YAML/TOML parser, no Python runner. Anything a SKILL.md specifies as "read X, parse Y, emit Z" must be performed inline by the LLM.

Consequences:

* Every fixture-validated deterministic output (`pharaoh-tailor-bootstrap` byte-exact equality, `pharaoh-output-validate` schema validation, `pharaoh-id-allocate` regex matches) is non-binding under Copilot. The LLM emits "approximately" the right shape.
* Schema fragmentation (sections 1, 2, 4) is structurally amplified: under a real runner, a single canonical schema could be enforced. Under markdown-only execution, the schema lives in the LLM's head and drifts per call.
* Atomicity contracts (skill criterion (a) "indivisible, one input → one output") cannot be enforced. The LLM may inline-fan-out or merge tasks based on context budget.

## Suggested direction

Each section above could split into its own bug, but the underlying pattern is one of design assumptions getting baked into different layers without a mechanism to keep them coherent: schemas in `examples/score/`, generators in `pharaoh-tailor-*`, type-vocabulary hardcoded in atomic skills, defaults documented in three places that disagree.

A productive next step might be:

1. Promote `examples/score/.pharaoh/project/schemas/` to a canonical install path and have `pharaoh-tailor-review` actually load and apply the schemas. Update the `pharaoh-tailor-bootstrap` and `pharaoh-tailor-fill` outputs to validate.
2. Make every atomic skill fully parameterised by `target_level` (the `pharaoh-feat-draft-from-docs` model) and remove hardcoded type-name allow-lists from `pharaoh-arch-draft` and `pharaoh-vplan-draft`.
3. Document the consumed `pharaoh.toml` sections (`[pharaoh.codelink_comments]`, `[pharaoh.diagrams]`, `[pharaoh.quality_gate]`) in `pharaoh.toml.example` and reconcile workflow-flag defaults to a single source.
4. Add drafting skills for the safety V (`pharaoh-hazard-draft`, `pharaoh-safety-goal-draft`, `pharaoh-fsr-draft`) or document `pharaoh-req-draft` as the canonical drafter for any requirement-shaped artefact regardless of safety level.
5. Have `pharaoh-setup` consult `[needs.fields.X]` from `ubproject.toml` and the RST status histogram from existing needs before writing tailoring defaults.
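For step 5, the status histogram could be gathered with something as small as the sketch below. It assumes statuses appear as `:status:` options on need directives in the RST sources; the regex, the function name, and the sample directive titles are illustrative, not Pharaoh's implementation.

```python
import re
from collections import Counter

# Matches a directive option line such as "   :status: open".
STATUS_OPTION = re.compile(r"^[ \t]*:status:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def status_histogram(rst_sources):
    """Count :status: values across an iterable of RST file contents."""
    counts = Counter()
    for text in rst_sources:
        counts.update(STATUS_OPTION.findall(text))
    return counts


sample_rst = """\
.. req:: Example requirement
   :id: BRAKE_CTRL_01
   :status: open

.. fsr:: Example functional safety requirement
   :id: FSR_POWER_01
   :status: closed
"""
histogram = status_histogram([sample_rst])

print(sorted(histogram.items()))  # [('closed', 1), ('open', 1)]
```

The observed keys could then seed `lifecycle_states` in place of the `draft → reviewed → approved` default.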

Repro context: `useblocks/sphinx-needs-demo` PR #51; its commit history reflects the iterative correction of per-section findings.
