sphinx-needs-demo feedback: setup, tailoring, and atomic-skill alignment gaps #13

@bburda

Summary

Bringing Pharaoh up on useblocks/sphinx-needs-demo (PR useblocks/sphinx-needs-demo#51) surfaced a series of structural gaps between what pharaoh-setup produces, what the canonical schemas in examples/score/.pharaoh/project/schemas/ expect, what the atomic-skill vocabulary supports, and what an industry V-model project (SYS/SWE split + ISO 26262 safety V) actually needs.

These are not bugs in any single skill but recurring patterns across the bootstrap → tailoring → review → atomic-author chain. Filing as one consolidated feedback issue rather than splitting into per-skill bugs, because the same underlying decisions surface in multiple places. Two narrow follow-up bugs from the same exercise are tracked separately as #11 and #12.

1. Bootstrap output violates the canonical JSON Schema shipped in examples/score/

examples/score/.pharaoh/project/schemas/ ships canonical JSON Schemas for artefact-catalog.yaml, workflows.yaml, id-conventions.yaml, and checklists-frontmatter.schema.json. They are well-formed and score's own tailoring validates against them. Bootstrap-generated files do not:

  • artefact-catalog.yaml: schema declares additionalProperties: false. pharaoh-tailor-bootstrap emits child_of and lifecycle_ref keys that are not in the schema. The skill's expected_output is therefore schema-invalid by Pharaoh's own published contract. Every project bootstrapped today inherits the violation.
  • workflows.yaml: schema requires top-level lifecycle_states: [...] plus a flat transitions: array. pharaoh-tailor-bootstrap emits per-type maps with inline {from, to, gate} transitions and per-type initial:/final: strings.
  • id-conventions.yaml: schema accepts both bootstrap and tailor-fill shapes. The prefixes value is permissive enough that bootstrap stores identifier prefixes (FEAT_) while the tailor-fill SKILL.md template stores human descriptions ("requirement (guide-level)"). A draft skill computing f"{prefix}{tail}" produces garbage on the description form.
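
The prefixes ambiguity can be illustrated with a minimal sketch. Only the two value forms (FEAT_ and "requirement (guide-level)") come from the findings above; the type keys, the tail value, and the helpers make_id and looks_like_prefix are hypothetical, as is the shape check itself.

```python
import re

# Two shapes the same `prefixes` key can legally hold. The values are taken
# from the findings above; the type keys are illustrative assumptions.
bootstrap_prefixes = {"feat": "FEAT_"}
tailor_fill_prefixes = {"req": "requirement (guide-level)"}

# Hypothetical sanity check: an identifier prefix consists of upper-case
# letters, digits and underscores only.
PREFIX_SHAPE = re.compile(r"[A-Z0-9_]+")


def make_id(prefix: str, tail: str) -> str:
    """Compose an ID the way a draft skill might, via f"{prefix}{tail}"."""
    return f"{prefix}{tail}"


def looks_like_prefix(value: str) -> bool:
    """Return True when the value could plausibly be an identifier prefix."""
    return PREFIX_SHAPE.fullmatch(value) is not None


good_id = make_id(bootstrap_prefixes["feat"], "001")
bad_id = make_id(tailor_fill_prefixes["req"], "001")

print(good_id)  # FEAT_001
print(bad_id)   # requirement (guide-level)001
```

A guard such as looks_like_prefix would let a consumer refuse the description form instead of composing an unusable ID from it.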

The schemas themselves are also hidden: shipped only in examples/score/, not at any canonical install path, not loaded at runtime by any skill, not referenced by shared/tailoring-access.md. pharaoh-tailor-review claims to enforce schema rules (cross-file C1, C2, C5) but reads no schema file. If wired up properly it would reject most real-world catalogs except score's.

2. pharaoh-tailor-bootstrap and pharaoh-tailor-fill emit incompatible shapes

Even setting the canonical schema aside, the two sibling tailoring authors disagree on both files in different directions:

  • workflows.yaml: bootstrap emits per-type maps with {from, to, gate}. tailor-fill emits a flat lifecycle_states map plus transitions[*].requires. The predicate vocabulary differs (gate: "<string>" vs requires: [<list>]). pharaoh-lifecycle-check Step 4b iterates transitions[*].requires and would silently see undefined on every gate-style transition.
  • id-conventions.yaml: both emitters use a flat shape, but tailor-fill emits id_regex_exceptions:. pharaoh-id-convention-check reads id_regex_by_type:. The two never meet, so per-type regex overrides authored by tailor-fill are silently ignored.
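
The workflows.yaml mismatch can be modelled in a few lines. This is a sketch of the behaviour described above, not the skill's actual implementation (the skills are markdown instructions); the gate name review_done and the helper names are assumptions, while the draft and reviewed state names follow the bootstrap default lifecycle.

```python
# Gate-style transition, in the shape bootstrap emits inside a per-type map.
gate_style = {"from": "draft", "to": "reviewed", "gate": "review_done"}

# Requires-style transition, in the flat shape tailor-fill emits.
requires_style = {"from": "draft", "to": "reviewed", "requires": ["review_done"]}


def read_requires(transition: dict):
    """Mimic a consumer that only knows transitions[*].requires."""
    return transition.get("requires")


def unreadable_transitions(transitions: list) -> list:
    """List transitions whose predicate such a consumer would never see."""
    return [t for t in transitions if "requires" not in t]


print(read_requires(gate_style))      # None: the gate is silently skipped
print(read_requires(requires_style))  # ['review_done']
print(len(unreadable_transitions([gate_style, requires_style])))  # 1
```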

A project bootstrapped today and re-tailored with tailor-fill later (the natural maturation path the gate-enablement ladder anticipates) would have both files rewritten in incompatible directions.

3. Release-gate fields are consumed but no emitter writes them

pharaoh-link-completeness-check reads required_links / optional_links. pharaoh-output-validate reads required_metadata_fields. pharaoh-review-completeness reads required_roles. pharaoh-quality-gate aggregates these. None of them appear in any tailoring emitter or in the canonical schema. The release-gate is a silent no-op on every project that follows the documented setup path.
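
Why a missing field turns the gate into a no-op can be shown with a small model. The function below is a hypothetical stand-in for a link-completeness check; the link name satisfies, the need IDs, and the catalog entries are invented for illustration.

```python
def missing_required_links(need: dict, catalog_entry: dict) -> list:
    """Return the required link types that the need does not populate."""
    required = catalog_entry.get("required_links", [])
    return [link for link in required if not need.get(link)]


# A need with no outgoing links at all (hypothetical ID).
unlinked_need = {"id": "SPEC_001", "satisfies": []}

# Catalog entry as produced by the documented setup path: no required_links.
emitted_entry = {"optional_fields": ["reviewer"]}

# Catalog entry after a maintainer adds the field by hand.
hand_edited_entry = {"required_links": ["satisfies"]}

print(missing_required_links(unlinked_need, emitted_entry))      # []
print(missing_required_links(unlinked_need, hand_edited_entry))  # ['satisfies']
```

With nothing declared as required, the check passes vacuously, which matches the silent no-op described above.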

4. pharaoh.toml has three undocumented sections

Three sections are consumed by 14+ skills but absent from pharaoh.toml.example:

  • [pharaoh.codelink_comments]: read by pharaoh-req-codelink-annotate and pharaoh-req-from-code (branch on .mode = "codelinks" | "backref"). The annotate skill ships a tailoring_patch proposal pointing at this section, an extension point that has been undocumented since landing.
  • [pharaoh.diagrams] / [pharaoh.diagrams.<type>] / [pharaoh.diagrams.type_styles]: read by 11 diagram-draft skills plus pharaoh-feat-component-extract plus pharaoh-feat-flow-extract. Documented only in shared/diagram-tailoring.md, invisible to a project author copying the example.
  • [pharaoh.quality_gate].strict: read by pharaoh-diagram-lint. SKILL.md says "Plans wire this to ...". The section does not exist anywhere.

In addition, [pharaoh.codelinks].src_dir is consumed by pharaoh-write-plan and pharaoh-feat-file-map, while the example documents only enabled under [pharaoh.codelinks].

There is also a workflow-default disagreement: shared/strictness.md defaults to (require_change_analysis=true, require_verification=true, require_mece_on_release=false), while pharaoh-gate-advisor defaults all three to false. The example ships (true, true, false). A skill that fills missing values from one default source reaches different gate verdicts than one that fills from the other on identical TOML.
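
The default disagreement can be reproduced with a short sketch. The two default tables restate the values reported above; the effective_workflow helper and the partially filled [pharaoh.workflow] table are assumptions made for illustration.

```python
# Defaults as reported for shared/strictness.md.
STRICTNESS_MD_DEFAULTS = {
    "require_change_analysis": True,
    "require_verification": True,
    "require_mece_on_release": False,
}

# Defaults as reported for pharaoh-gate-advisor.
GATE_ADVISOR_DEFAULTS = {
    "require_change_analysis": False,
    "require_verification": False,
    "require_mece_on_release": False,
}


def effective_workflow(toml_workflow: dict, defaults: dict) -> dict:
    """Fill keys missing from the TOML table with the given defaults."""
    return {**defaults, **toml_workflow}


# Hypothetical [pharaoh.workflow] table that sets only one flag.
partial_toml = {"require_mece_on_release": False}

from_strictness = effective_workflow(partial_toml, STRICTNESS_MD_DEFAULTS)
from_advisor = effective_workflow(partial_toml, GATE_ADVISOR_DEFAULTS)

disagreements = sorted(
    key for key in from_strictness if from_strictness[key] != from_advisor[key]
)
print(disagreements)  # ['require_change_analysis', 'require_verification']
```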

The mode field documented in pharaoh-setup Step 2a.bis (reverse-eng | greenfield | steady-state) is persisted only as a single-line comment above [pharaoh.workflow]. No skill parses it. The classification is non-load-bearing after setup ends.

5. Two user-facing slash commands are dead

/pharaoh.author and /pharaoh.verify prompts under .github/prompts/ declare agent: pharaoh.author and agent: pharaoh.verify respectively in their frontmatter. Neither agent exists under .github/agents/. Two of seven user-entry slash commands dispatch to nothing.
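
A check for dead prompts could look roughly like the sketch below. It assumes each prompt file starts with a frontmatter block delimited by --- lines and containing an agent: key, and that the set of existing agent names has been collected separately; the sample prompt text and the agent names other than pharaoh.author and pharaoh.verify are illustrative.

```python
import re
from typing import Optional

AGENT_LINE = re.compile(r"^agent:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def declared_agent(prompt_text: str) -> Optional[str]:
    """Extract the agent name from the leading frontmatter block, if any."""
    parts = prompt_text.split("---")
    if len(parts) < 3:
        return None  # no complete frontmatter block
    match = AGENT_LINE.search(parts[1])
    return match.group(1) if match else None


def dead_prompts(prompts: dict, existing_agents: set) -> list:
    """Return prompt names whose declared agent does not exist."""
    return sorted(
        name
        for name, text in prompts.items()
        if declared_agent(text) not in existing_agents
    )


# Hypothetical prompt contents; only the agent names mirror the report.
prompts = {
    "author": "---\nagent: pharaoh.author\n---\nDraft a need.\n",
    "verify": "---\nagent: pharaoh.verify\n---\nVerify a need.\n",
    "trace": "---\nagent: pharaoh.trace\n---\nTrace a need.\n",
}
existing_agents = {"pharaoh.trace"}

print(dead_prompts(prompts, existing_agents))  # ['author', 'verify']
```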

6. Dangling chain reference

pharaoh-arch-review.SKILL.md declares chains_from: [pharaoh-arch-draft, pharaoh-arch-regenerate]. pharaoh-arch-regenerate does not exist (only pharaoh-req-regenerate ships). Probably a typo or unimplemented sibling.

7. Two parallel disjoint dependency declarations

Across 71 skills, two non-overlapping vocabularies coexist:

  • SKILL.md frontmatter uses chains_from: and chains_to: (atomic mechanics).
  • agent.md frontmatter uses handoffs: [{label, agent, prompt}] (Copilot UX prompts).

The two never agree on a single edge. 22 of 71 skill pairs have asymmetric chain declarations (one side empty). 16 of 71 SKILL.md files declare chains_from with no agent.md equivalent, so Copilot users have no view of skill prerequisites at all.

45 of 71 skills are graph-orphans relative to the 5 user-entry prompts (change, mece, plan, release, trace). The atomic-skill layer (diagrams, tailor mechanics, audit mechanics, reverse-engineering, plan orchestration) has no inbound path from any user entry. Discovery routes only through pharaoh-flow / pharaoh-write-plan / pharaoh-audit-fanout, which are themselves orphans.
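
The orphan count is a plain reachability question over the chains_to edges. The sketch below shows the technique on a toy graph; the edges are invented for illustration and do not reproduce the real 71-skill graph.

```python
from collections import deque


def reachable(entries: list, chains_to: dict) -> set:
    """Breadth-first search over chains_to edges from the entry prompts."""
    seen = set(entries)
    queue = deque(entries)
    while queue:
        node = queue.popleft()
        for successor in chains_to.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


def graph_orphans(all_skills: list, entries: list, chains_to: dict) -> list:
    """Skills with no inbound path from any user-entry prompt."""
    return sorted(set(all_skills) - reachable(entries, chains_to))


# Toy graph: nothing points at pharaoh-flow, so its successors are orphans too.
chains_to = {
    "plan": ["pharaoh-req-review"],
    "pharaoh-flow": ["pharaoh-arch-draft"],
}
all_skills = ["pharaoh-req-review", "pharaoh-flow", "pharaoh-arch-draft"]

print(graph_orphans(all_skills, ["plan"], chains_to))
# ['pharaoh-arch-draft', 'pharaoh-flow']
```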

8. Setup invents structure instead of reading the project

pharaoh-setup Step 2 generates pharaoh.toml and .pharaoh/project/*.yaml from heuristic defaults rather than from the project's declared conventions. On useblocks/sphinx-needs-demo:

  • id_scheme.pattern defaulted to {TYPE}_{NUMBER} while observed IDs use domain prefixes (BRAKE_CTRL_01, FSR_POWER_01). Pattern should reflect the observed {DOMAIN}_{NUMBER} shape.
  • workflows.yaml lifecycle defaulted to draft → reviewed → approved (Pharaoh-internal). The corpus actually uses open (145 needs), closed (16), passed (7), approved (2). Setup never read RST status histograms.
  • artefact-catalog.yaml optional_fields was populated with Pharaoh-internal reviewer, approved_by, source_doc only. The project's declared [needs.fields.X] (16 fields including asil, severity, exposure, controllability, scenario, safe_state, customer, effort, approved, jira, github, role, contact, image, date) was not consulted at all.
  • id-conventions.yaml prefixes mirrored the broken declarations in [[needs.types]] verbatim, including three real-world collisions (R_ for req+release, T_ for test+team, _ for arch+need). Setup did not detect the collisions.
  • traceability.required_links was inferred from a heuristic name table (implements -> "spec -> impl"). Direction was inverted on every chain in this project. After correcting per-corpus, real coverage was 100% on spec → req, arch → req, safety_goal → hazard, fsr → safety_goal. This is tracked separately as #11 ("pharaoh-setup: required_links direction inferred from link option labels, not actual link semantics").
  • mode was heuristically classified as reverse-eng because needs.json did not exist on disk. needs.json is a gitignored build artefact, so a fresh clone of any project (even one with thousands of declared needs) is classified reverse-eng until the user runs sphinx-build.
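
Collision detection on the declared prefixes is cheap. The sketch below groups [[needs.types]] entries by prefix; the three colliding pairs and the FSR_ prefix come from the findings in this issue, while the prefix_collisions helper is hypothetical and the list is a reduced excerpt rather than the full 19-type declaration.

```python
from collections import defaultdict

# Reduced excerpt of [[needs.types]] as a list of dicts.
needs_types = [
    {"directive": "req", "prefix": "R_"},
    {"directive": "release", "prefix": "R_"},
    {"directive": "test", "prefix": "T_"},
    {"directive": "team", "prefix": "T_"},
    {"directive": "arch", "prefix": "_"},
    {"directive": "need", "prefix": "_"},
    {"directive": "fsr", "prefix": "FSR_"},
]


def prefix_collisions(types: list) -> dict:
    """Map each prefix shared by more than one type to those type names."""
    by_prefix = defaultdict(list)
    for entry in types:
        by_prefix[entry["prefix"]].append(entry["directive"])
    return {
        prefix: sorted(directives)
        for prefix, directives in by_prefix.items()
        if len(directives) > 1
    }


collisions = prefix_collisions(needs_types)
print(collisions)
# {'R_': ['release', 'req'], 'T_': ['team', 'test'], '_': ['arch', 'need']}
```

Running such a check before writing id-conventions.yaml would let setup report the collisions instead of mirroring them.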

For a reverse-eng mode project the design goal is to capture what exists, not impose a new standard. The current setup defaults are prescriptive rather than descriptive.

9. Atomic-skill vocabulary doesn't cover V-model SYS/SWE projects

useblocks/sphinx-needs-demo declares 19 sphinx-needs types corresponding to ASPICE-style SYS/SWE separation (req, sysreq, sys-arch, swreq, swarch) plus classical V-model (spec, impl, test) plus ISO 26262 safety V (hazard, safety_goal, fsr) plus structural (arch, component, interface, seq_msg, person, team, release, need).

Pharaoh atomic skills mix three vocabulary-binding strategies:

  • Fully parameterised by target_level: pharaoh-feat-draft-from-docs works cleanly on any top-level type.
  • Parameterised but with feat/comp_req-flavoured prose throughout (pharaoh-req-from-code): mechanically OK, prose examples mislead on non-feat/CREQ projects.
  • Hardcoded vocabulary:
    • pharaoh-arch-draft accepts only module / component / interface as arch_type and FAILs on any other value. Drafting swarch, sys-arch, or higher-level architecture is unsupported.
    • pharaoh-vplan-draft hardcodes tc__ prefix and tc catalog key. On a project whose test type uses prefix T_, the skill generates IDs that violate the project's id_regex.
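
The effect of the hardcoded tc__ prefix can be sketched as follows. The id_regex used here is an assumption (an upper-case pattern chosen for illustration, not read from the demo project), and draft_test_id and the tail values are hypothetical.

```python
import re

# Assumed project-level id_regex: upper-case letters, digits, underscores.
ASSUMED_ID_REGEX = re.compile(r"[A-Z0-9_]{5,}")


def draft_test_id(tail: str, prefix: str = "tc__") -> str:
    """Compose a test-case ID; the default mirrors the hardcoded prefix."""
    return f"{prefix}{tail}"


def is_valid_id(need_id: str) -> bool:
    """Check a candidate ID against the assumed project regex."""
    return ASSUMED_ID_REGEX.fullmatch(need_id) is not None


hardcoded = draft_test_id("brake_timeout")
parameterised = draft_test_id("BRAKE_TIMEOUT", prefix="T_")

print(hardcoded, is_valid_id(hardcoded))          # tc__brake_timeout False
print(parameterised, is_valid_id(parameterised))  # T_BRAKE_TIMEOUT True
```

Taking the prefix from the project's catalog, as in the second call, is the parameterised behaviour that pharaoh-feat-draft-from-docs is described as already having.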

There are no drafting skills at all for safety V artefact types (hazard, safety_goal, fsr). useblocks/sphinx-needs-demo's 1 hazard, 20 safety_goals, and 36 FSRs cannot be reverse-engineered or drafted by Pharaoh atomics. They can only be reviewed by generic pharaoh-req-review.

Of the 19 types declared in useblocks/sphinx-needs-demo, only 3 have a Pharaoh-skill authoring path (req, partially arch, partially test). The remaining 16 sit in the catalog as declarations with no skill bridge. A user invoking @pharaoh.req-draft to write a "sw-level safety requirement" today receives a req-typed directive with R_ prefix regardless of whether they wanted swreq (SWREQ_) or fsr (FSR_).

10. pharaoh-flow skips the safety V

pharaoh-flow.SKILL.md description claims to orchestrate the V-model end-to-end (req → arch → vplan → fmea). The body never references swreq/sysreq/sys-arch/swarch and skips the safety V entirely (no hazard, safety_goal, fsr stages). End-to-end V-model orchestration with SYS/SWE split is unimplemented.

11. Copilot agents have no runtime

.github/agents/pharaoh.*.agent.md and .github/prompts/pharaoh.*.prompt.md are read by Copilot Chat as plain markdown instructions. There is no Jinja templating, no YAML/TOML parser, and no Python runner. Anything a SKILL.md specifies as "read X, parse Y, emit Z" must be performed inline by the LLM.

Consequences:

  • Every fixture-validated deterministic output (pharaoh-tailor-bootstrap byte-exact equality, pharaoh-output-validate schema validation, pharaoh-id-allocate regex matches) is non-binding under Copilot. The LLM emits "approximately" the right shape.
  • Schema fragmentation (sections 1, 2, 4) is structurally amplified: under a real runner, a single canonical schema could be enforced. Under markdown-only execution, the schema lives in the LLM's head and drifts per call.
  • Atomicity contracts (skill criterion (a) "indivisible, one input → one output") cannot be enforced. The LLM may inline-fan-out or merge tasks based on context budget.

Suggested direction

Each section above could split into its own bug, but the underlying pattern is one of design assumptions getting baked into different layers without a mechanism to keep them coherent: schemas in examples/score/, generators in pharaoh-tailor-*, type-vocabulary hardcoded in atomic skills, defaults documented in three places that disagree.

A productive next step might be:

  1. Promote examples/score/.pharaoh/project/schemas/ to a canonical install path and have pharaoh-tailor-review actually load and apply the schemas. Update the pharaoh-tailor-bootstrap and pharaoh-tailor-fill outputs to validate.
  2. Make every atomic skill fully parameterised by target_level (the pharaoh-feat-draft-from-docs model) and remove hardcoded type-name allow-lists from pharaoh-arch-draft and pharaoh-vplan-draft.
  3. Document the consumed pharaoh.toml sections ([pharaoh.codelink_comments], [pharaoh.diagrams], [pharaoh.quality_gate]) in pharaoh.toml.example and reconcile workflow-flag defaults to a single source.
  4. Add drafting skills for the safety V (pharaoh-hazard-draft, pharaoh-safety-goal-draft, pharaoh-fsr-draft) or document pharaoh-req-draft as the canonical drafter for any requirement-shaped artefact regardless of safety level.
  5. Have pharaoh-setup consult [needs.fields.X] from ubproject.toml and the RST status histogram from existing needs before writing tailoring defaults.
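
For item 5, a status histogram can be read straight from the RST sources without a sphinx-build. The sketch below is one possible approach, assuming statuses are written as :status: options on need directives; the sample RST, the IDs in it, and the status_histogram helper are illustrative.

```python
import re
from collections import Counter

# Matches a ":status: value" option line inside a need directive.
STATUS_OPTION = re.compile(r"^[ \t]*:status:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def status_histogram(rst_sources: list) -> Counter:
    """Count :status: option values across a list of RST source strings."""
    counts = Counter()
    for text in rst_sources:
        counts.update(STATUS_OPTION.findall(text))
    return counts


# Hypothetical RST excerpt with three needs.
sample_rst = """
.. req:: Brake controller shall time out
   :id: R_001
   :status: open

.. spec:: Timeout handling
   :id: S_001
   :status: open

.. test:: Timeout test
   :id: T_001
   :status: closed
"""

histogram = status_histogram([sample_rst])
print(histogram.most_common())  # [('open', 2), ('closed', 1)]
```

The observed values could then seed lifecycle_states in workflows.yaml instead of the draft → reviewed → approved default.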

Repro context: useblocks/sphinx-needs-demo PR #51; its commit history reflects the iterative correction of the per-section findings.
