Bringing Pharaoh up on useblocks/sphinx-needs-demo (PR useblocks/sphinx-needs-demo#51) surfaced a series of structural gaps between what pharaoh-setup produces, what the canonical schemas in examples/score/.pharaoh/project/schemas/ expect, what the atomic-skill vocabulary supports, and what an industry V-model project (SYS/SWE split + ISO 26262 safety V) actually needs.
These are not bugs in any single skill but recurring patterns across the bootstrap → tailoring → review → atomic-author chain. Filing as one consolidated feedback issue rather than splitting into per-skill bugs, because the same underlying decisions surface in multiple places. Two narrow follow-up bugs from the same exercise are tracked separately as #11 and #12.
1. Bootstrap output violates the canonical JSON Schema shipped in examples/score/
examples/score/.pharaoh/project/schemas/ ships canonical JSON Schemas for artefact-catalog.yaml, workflows.yaml, id-conventions.yaml, and checklists-frontmatter.schema.json. They are well-formed and score's own tailoring validates against them. Bootstrap-generated files do not:
artefact-catalog.yaml: schema declares additionalProperties: false. pharaoh-tailor-bootstrap emits child_of and lifecycle_ref keys that are not in the schema. The skill's expected_output is therefore schema-invalid by Pharaoh's own published contract. Every project bootstrapped today inherits the violation.
workflows.yaml: schema requires top-level lifecycle_states: [...] plus a flat transitions: array. pharaoh-tailor-bootstrap emits per-type maps with inline {from, to, gate} transitions and per-type initial:/final: strings.
id-conventions.yaml: schema accepts both bootstrap and tailor-fill shapes. The prefixes value is permissive enough that bootstrap stores identifier prefixes (FEAT_) while the tailor-fill SKILL.md template stores human descriptions ("requirement (guide-level)"). A draft skill computing f"{prefix}{tail}" produces a malformed ID on the description form.
The schemas themselves are also hidden: shipped only in examples/score/, not at any canonical install path, not loaded at runtime by any skill, not referenced by shared/tailoring-access.md. pharaoh-tailor-review claims to enforce schema rules (cross-file C1, C2, C5) but reads no schema file. If wired up properly it would reject most real-world catalogs except score's.
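The prefixes ambiguity in id-conventions.yaml can be shown with a minimal Python sketch. The type keys, the tail "001", and the helper names draft_id and looks_like_prefix are hypothetical; only the two prefix values come from the findings above. The guard regex is an assumption about what an identifier prefix looks like, not a rule taken from the schema.

```python
import re

# Two shapes the permissive `prefixes` value currently admits.
# Type keys are illustrative; the values are the ones quoted above.
bootstrap_prefixes = {"feat": "FEAT_"}
tailor_fill_prefixes = {"req": "requirement (guide-level)"}


def draft_id(prefix: str, tail: str) -> str:
    """Concatenate prefix and tail the way a draft skill would."""
    return f"{prefix}{tail}"


def looks_like_prefix(value: str) -> bool:
    """Assumed guard: a prefix is uppercase alphanumerics plus underscores."""
    return re.fullmatch(r"[A-Z][A-Z0-9_]*", value) is not None


print(draft_id(bootstrap_prefixes["feat"], "001"))   # FEAT_001
print(draft_id(tailor_fill_prefixes["req"], "001"))  # requirement (guide-level)001
```

A guard of this kind in the schema (for example a pattern on the prefixes values) would reject the description form at validation time instead of at ID-generation time.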
2. pharaoh-tailor-bootstrap and pharaoh-tailor-fill emit incompatible shapes
Even setting the canonical schema aside, the two sibling tailoring authors disagree on both files in different directions:
workflows.yaml: bootstrap emits per-type maps with {from, to, gate}. tailor-fill emits flat lifecycle_states map plus transitions[*].requires. Predicate vocabulary differs (gate: "<string>" vs requires: [<list>]). pharaoh-lifecycle-check Step 4b iterates transitions[*].requires and would silently see undefined on every gate-style transition.
id-conventions.yaml: both emitters use a flat shape, but tailor-fill emits id_regex_exceptions:. pharaoh-id-convention-check reads id_regex_by_type:. The two never meet, so per-type regex overrides authored by tailor-fill are silently ignored.
A project bootstrapped today and re-tailored with tailor-fill later (the natural maturation path the gate-enablement ladder anticipates) would have both files rewritten in incompatible directions.
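The workflows.yaml mismatch can be sketched in Python as follows. The two dictionaries approximate the shapes described above; the state names reuse the draft/reviewed/approved default, and the gate name review_done and the function collect_requirements are hypothetical stand-ins, not the actual pharaoh-lifecycle-check logic.

```python
# Shape attributed to pharaoh-tailor-bootstrap: per-type map, `gate` as a string.
bootstrap_workflows = {
    "req": {
        "initial": "draft",
        "final": "approved",
        "transitions": [{"from": "draft", "to": "reviewed", "gate": "review_done"}],
    }
}

# Shape attributed to pharaoh-tailor-fill: flat transitions, `requires` as a list.
tailor_fill_workflows = {
    "lifecycle_states": {"draft": {}, "reviewed": {}},
    "transitions": [
        {"from": "draft", "to": "reviewed", "requires": ["review_done"]}
    ],
}


def collect_requirements(workflows: dict) -> list:
    """Mimic a consumer that only iterates top-level transitions[*].requires."""
    found = []
    for transition in workflows.get("transitions", []):
        found.extend(transition.get("requires") or [])
    return found


print(collect_requirements(tailor_fill_workflows))  # ['review_done']
print(collect_requirements(bootstrap_workflows))    # []
```

The consumer raises no error on the bootstrap shape; it simply finds no predicates, which is why the gap is easy to miss.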
3. Release-gate fields are consumed but no emitter writes them
pharaoh-link-completeness-check reads required_links / optional_links. pharaoh-output-validate reads required_metadata_fields. pharaoh-review-completeness reads required_roles. pharaoh-quality-gate aggregates these. None of them appear in any tailoring emitter or in the canonical schema. The release-gate is a silent no-op on every project that follows the documented setup path.
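Why a missing field turns a gate into a no-op can be seen in a small Python sketch. The function link_completeness_findings and the link name satisfies are hypothetical; the point is only that a check which defaults an absent required_links to an empty list passes vacuously.

```python
def link_completeness_findings(catalog_entry: dict, need: dict) -> list:
    """Report each required link type that the need does not carry."""
    required = catalog_entry.get("required_links", [])  # absent key -> no rules
    return [link for link in required if not need.get(link)]


# A need with no outgoing links at all.
need = {"id": "FSR_POWER_01", "type": "fsr"}

# Entry without required_links, as the emitters write it today.
emitted_entry = {"optional_fields": ["reviewer"]}
# Hypothetical hand-written rule, for contrast.
tailored_entry = {"required_links": ["satisfies"]}

print(link_completeness_findings(emitted_entry, need))   # []
print(link_completeness_findings(tailored_entry, need))  # ['satisfies']
```

Treating an absent key as "not configured" and reporting that explicitly, rather than as "no findings", would make the gap visible.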
4. pharaoh.toml has three undocumented sections
Three sections are consumed by 14+ skills but absent from pharaoh.toml.example:
[pharaoh.codelink_comments]: read by pharaoh-req-codelink-annotate and pharaoh-req-from-code (branch on .mode = "codelinks" | "backref"). The annotate skill ships a tailoring_patch proposal pointing at this section, an extension point that has been undocumented since landing.
[pharaoh.diagrams] / [pharaoh.diagrams.<type>] / [pharaoh.diagrams.type_styles]: read by 11 diagram-draft skills plus pharaoh-feat-component-extract plus pharaoh-feat-flow-extract. Documented only in shared/diagram-tailoring.md, invisible to a project author copying the example.
[pharaoh.quality_gate].strict: read by pharaoh-diagram-lint. SKILL.md says "Plans wire this to ...". The section does not exist anywhere.
Plus [pharaoh.codelinks].src_dir consumed by pharaoh-write-plan and pharaoh-feat-file-map, while the example documents only enabled under [pharaoh.codelinks].
Workflow defaults also disagree: shared/strictness.md defaults to (require_change_analysis=true, require_verification=true, require_mece_on_release=false), pharaoh-gate-advisor defaults all three to false, and the example ships (true, true, false). A skill that fills missing values from one default source reaches different gate verdicts than one that fills from the other on identical TOML.
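The effect of two default sources can be sketched in Python. The default values are the ones listed above; the helper effective_workflow and the assumption that the flags live in a [pharaoh.workflow] table parsed into a plain dict are illustrative.

```python
# Defaults as described above for the two sources.
STRICTNESS_MD_DEFAULTS = {
    "require_change_analysis": True,
    "require_verification": True,
    "require_mece_on_release": False,
}
GATE_ADVISOR_DEFAULTS = {key: False for key in STRICTNESS_MD_DEFAULTS}


def effective_workflow(parsed_section: dict, defaults: dict) -> dict:
    """Overlay the values present in the parsed table onto a default source."""
    return {**defaults, **parsed_section}


# Identical input for both skills: a table that sets only one flag.
workflow_section = {"require_mece_on_release": False}

a = effective_workflow(workflow_section, STRICTNESS_MD_DEFAULTS)
b = effective_workflow(workflow_section, GATE_ADVISOR_DEFAULTS)
disagreeing = sorted(k for k in a if a[k] != b[k])
print(disagreeing)  # ['require_change_analysis', 'require_verification']
```

Any flag the project leaves unset is decided by whichever default table the consuming skill happens to use.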
The mode field documented in pharaoh-setup Step 2a.bis (reverse-eng | greenfield | steady-state) is persisted only as a single-line comment above [pharaoh.workflow]. No skill parses it. The classification is non-load-bearing after setup ends.
5. Two user-facing slash commands are dead
/pharaoh.author and /pharaoh.verify prompts under .github/prompts/ declare agent: pharaoh.author and agent: pharaoh.verify respectively in their frontmatter. Neither agent exists under .github/agents/. Two of seven user-entry slash commands dispatch to nothing.
6. Dangling chain reference
pharaoh-arch-review.SKILL.md declares chains_from: [pharaoh-arch-draft, pharaoh-arch-regenerate]. pharaoh-arch-regenerate does not exist (only pharaoh-req-regenerate ships). Probably a typo or unimplemented sibling.
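Both the dead slash commands and the dangling chain reference are instances of a name that is declared but never defined. A minimal Python sketch of a check for that follows; dangling_references is a hypothetical helper and the existing set is a partial, assumed inventory rather than the real skill list.

```python
def dangling_references(declared: dict, existing: set) -> dict:
    """Map each declaring file to the referenced names that do not exist."""
    missing = {}
    for source, targets in declared.items():
        absent = sorted(set(targets) - existing)
        if absent:
            missing[source] = absent
    return missing


# chains_from declaration quoted above.
declared = {
    "pharaoh-arch-review": ["pharaoh-arch-draft", "pharaoh-arch-regenerate"],
}
# Assumed inventory of shipped skills (partial).
existing = {"pharaoh-arch-draft", "pharaoh-arch-review", "pharaoh-req-regenerate"}

print(dangling_references(declared, existing))
# {'pharaoh-arch-review': ['pharaoh-arch-regenerate']}
```

The same function could be fed the agent: values from the prompt frontmatter and the set of agent files to flag /pharaoh.author and /pharaoh.verify.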
7. Two parallel disjoint dependency declarations
Across 71 skills, two non-overlapping vocabularies coexist:
SKILL.md frontmatter uses chains_from: and chains_to: (atomic mechanics).
agent.md frontmatter uses handoffs: [{label, agent, prompt}] (Copilot UX prompts).
The two never agree on a single edge. 22 of 71 skill pairs have asymmetric chain declarations (one side empty). 16 of 71 SKILL.md files declare chains_from with no agent.md equivalent, so Copilot users have no view of skill prerequisites at all.
45 of 71 skills are graph-orphans relative to the 5 user-entry prompts (change, mece, plan, release, trace). The atomic-skill layer (diagrams, tailor mechanics, audit mechanics, reverse-engineering, plan orchestration) has no inbound path from any user entry. Discovery routes only through pharaoh-flow / pharaoh-write-plan / pharaoh-audit-fanout, which are themselves orphans.
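The orphan count is a reachability question over the combined edge set. A minimal Python sketch using breadth-first search follows; the toy edges are placeholders rather than the real chain data, and reachable is a hypothetical helper.

```python
from collections import deque


def reachable(edges: dict, entries: list) -> set:
    """Breadth-first search over skill edges starting from the entry prompts."""
    seen = set(entries)
    queue = deque(entries)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Toy graph: edge targets are placeholders, not the real declarations.
edges = {
    "plan": ["pharaoh-req-draft"],
    "pharaoh-flow": ["pharaoh-arch-draft"],
}
all_skills = {"pharaoh-req-draft", "pharaoh-arch-draft", "pharaoh-flow"}
entries = ["change", "mece", "plan", "release", "trace"]

orphans = sorted(all_skills - reachable(edges, entries))
print(orphans)  # ['pharaoh-arch-draft', 'pharaoh-flow']
```

In the toy graph pharaoh-flow has an outgoing edge but no inbound path from an entry prompt, so both it and everything it leads to count as orphans, mirroring the pattern described above.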
8. Setup invents structure instead of reading the project
pharaoh-setup Step 2 generates pharaoh.toml and .pharaoh/project/*.yaml from heuristic defaults rather than from the project's declared conventions. On useblocks/sphinx-needs-demo:
id_scheme.pattern defaulted to {TYPE}_{NUMBER} while observed IDs use domain prefixes (BRAKE_CTRL_01, FSR_POWER_01). Pattern should reflect the observed {DOMAIN}_{NUMBER} shape.
workflows.yaml lifecycle defaulted to draft → reviewed → approved (Pharaoh-internal). The corpus actually uses open (145 needs), closed (16), passed (7), approved (2). Setup never read RST status histograms.
artefact-catalog.yaml optional_fields was populated with Pharaoh-internal reviewer, approved_by, source_doc only. The project's declared [needs.fields.X] (16 fields including asil, severity, exposure, controllability, scenario, safe_state, customer, effort, approved, jira, github, role, contact, image, date) was not consulted at all.
id-conventions.yaml prefixes mirrored the broken declarations in [[needs.types]] verbatim, including three real-world collisions (R_ for req+release, T_ for test+team, _ for arch+need). Setup did not detect the collisions.
traceability.required_links was inferred from a heuristic name table (implements -> "spec -> impl"). Direction was inverted on every chain in this project. After correcting per-corpus, real coverage was 100% on spec → req, arch → req, safety_goal → hazard, fsr → safety_goal (tracked separately as #11, "pharaoh-setup: required_links direction inferred from link option labels, not actual link semantics").
mode was heuristically classified as reverse-eng because needs.json did not exist on disk. needs.json is a gitignored build artefact, so a fresh clone of any project (even one with thousands of declared needs) is classified reverse-eng until the user runs sphinx-build.
For a reverse-eng mode project the design goal is to capture what exists, not impose a new standard. The current setup defaults are prescriptive rather than descriptive.
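Two of the descriptive reads that setup skipped, prefix collisions and the status histogram, can be sketched in Python. The helper names are hypothetical, the regex is a simplification of RST option parsing, and the statuses attached to the two sample needs are invented for illustration.

```python
import re
from collections import Counter, defaultdict


def prefix_collisions(types: list) -> dict:
    """Group type names by prefix; keep prefixes claimed by more than one type."""
    by_prefix = defaultdict(list)
    for entry in types:
        by_prefix[entry["prefix"]].append(entry["directive"])
    return {p: names for p, names in by_prefix.items() if len(names) > 1}


def status_histogram(rst_text: str) -> Counter:
    """Count `:status:` option values found in RST source."""
    pattern = r"^[ \t]*:status:[ \t]*(\S+)"
    return Counter(re.findall(pattern, rst_text, flags=re.MULTILINE))


# Subset of the declared types, with the colliding prefixes noted above.
needs_types = [
    {"directive": "req", "prefix": "R_"},
    {"directive": "release", "prefix": "R_"},
    {"directive": "test", "prefix": "T_"},
    {"directive": "team", "prefix": "T_"},
    {"directive": "fsr", "prefix": "FSR_"},
]
print(prefix_collisions(needs_types))
# {'R_': ['req', 'release'], 'T_': ['test', 'team']}

# Sample RST; the status values here are illustrative only.
sample_rst = """
.. req:: Brake control
   :id: BRAKE_CTRL_01
   :status: open

.. fsr:: Power supply
   :id: FSR_POWER_01
   :status: closed
"""
print(status_histogram(sample_rst))
```

Because both reads work on source files rather than on needs.json, they would also be available on a fresh clone before any sphinx-build run.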
9. Atomic-skill vocabulary doesn't cover V-model SYS/SWE projects
useblocks/sphinx-needs-demo declares 19 sphinx-needs types corresponding to ASPICE-style SYS/SWE separation (req, sysreq, sys-arch, swreq, swarch) plus classical V-model (spec, impl, test) plus ISO 26262 safety V (hazard, safety_goal, fsr) plus structural (arch, component, interface, seq_msg, person, team, release, need).
Pharaoh atomic skills mix three vocabulary-binding strategies:
Fully parameterised by target_level: pharaoh-feat-draft-from-docs works cleanly on any top-level type.
Parameterised but with feat/comp_req-flavoured prose throughout (pharaoh-req-from-code): mechanically OK, prose examples mislead on non-feat/CREQ projects.
Hardcoded vocabulary:
pharaoh-arch-draft accepts only module / component / interface as arch_type and FAILs on any other value. Drafting swarch, sys-arch, or higher-level architecture is unsupported.
pharaoh-vplan-draft hardcodes tc__ prefix and tc catalog key. On a project whose test type uses prefix T_, the skill generates IDs that violate the project's id_regex.
There are no drafting skills at all for safety V artefact types (hazard, safety_goal, fsr). useblocks/sphinx-needs-demo's 1 hazard, 20 safety_goals, and 36 FSRs cannot be reverse-engineered or drafted by Pharaoh atomics. They can only be reviewed by generic pharaoh-req-review.
Of the 19 types declared in useblocks/sphinx-needs-demo, only 3 have a Pharaoh-skill authoring path (req, partially arch, partially test). The remaining 16 sit in the catalog as declarations with no skill bridge. A user invoking @pharaoh.req-draft to write a "sw-level safety requirement" today receives a req-typed directive with R_ prefix regardless of whether they wanted swreq (SWREQ_) or fsr (FSR_).
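The difference between a hardcoded prefix and one read from the catalog can be sketched in Python. The function draft_test_id, the three-digit number format, and the project id_regex are assumptions; only the tc__ and T_ prefixes come from the findings above.

```python
import re


def draft_test_id(number: int, prefix: str = "tc__") -> str:
    """Build a test-case ID; the default mirrors the hardcoded prefix."""
    return f"{prefix}{number:03d}"


# Assumed project-wide id_regex for a project whose test prefix is T_.
project_id_regex = re.compile(r"[A-Z][A-Z0-9_]*")
catalog_prefixes = {"test": "T_"}

hardcoded = draft_test_id(1)
from_catalog = draft_test_id(1, prefix=catalog_prefixes["test"])

print(hardcoded, bool(project_id_regex.fullmatch(hardcoded)))        # tc__001 False
print(from_catalog, bool(project_id_regex.fullmatch(from_catalog)))  # T_001 True
```

Passing the type key and looking the prefix up in the tailoring files is the same parameterisation that target_level already provides in pharaoh-feat-draft-from-docs.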
10. pharaoh-flow skips the safety V
pharaoh-flow.SKILL.md description claims to orchestrate the V-model end-to-end (req → arch → vplan → fmea). The body never references swreq/sysreq/sys-arch/swarch and skips the safety V entirely (no hazard, safety_goal, fsr stages). End-to-end V-model orchestration with SYS/SWE split is unimplemented.
11. Copilot agents have no runtime
.github/agents/pharaoh.*.agent.md and .github/prompts/pharaoh.*.prompt.md are read by Copilot Chat as plain markdown instructions. No Jinja templating, no YAML/TOML parser, no Python runner. Anything a SKILL.md specifies as "read X, parse Y, emit Z" must be performed inline by the LLM.
Consequences:
Every fixture-validated deterministic output (pharaoh-tailor-bootstrap byte-exact equality, pharaoh-output-validate schema validation, pharaoh-id-allocate regex matches) is non-binding under Copilot. The LLM emits "approximately" the right shape.
Schema fragmentation (sections 1, 2, 4) is structurally amplified: under a real runner, a single canonical schema could be enforced. Under markdown-only execution, the schema lives in the LLM's head and drifts per call.
Atomicity contracts (skill criterion (a) "indivisible, one input → one output") cannot be enforced. The LLM may inline-fan-out or merge tasks based on context budget.
Suggested direction
Each section above could split into its own bug, but the underlying pattern is one of design assumptions getting baked into different layers without a mechanism to keep them coherent: schemas in examples/score/, generators in pharaoh-tailor-*, type-vocabulary hardcoded in atomic skills, defaults documented in three places that disagree.
A productive next step might be:
Promote examples/score/.pharaoh/project/schemas/ to a canonical install path and have pharaoh-tailor-review actually load and apply the schemas. Update the pharaoh-tailor-bootstrap and pharaoh-tailor-fill outputs to validate.
Make every atomic skill fully parameterised by target_level (the pharaoh-feat-draft-from-docs model) and remove hardcoded type-name allow-lists from pharaoh-arch-draft and pharaoh-vplan-draft.
Document the consumed pharaoh.toml sections ([pharaoh.codelink_comments], [pharaoh.diagrams], [pharaoh.quality_gate]) in pharaoh.toml.example and reconcile workflow-flag defaults to a single source.
Add drafting skills for the safety V (pharaoh-hazard-draft, pharaoh-safety-goal-draft, pharaoh-fsr-draft) or document pharaoh-req-draft as the canonical drafter for any requirement-shaped artefact regardless of safety level.
Have pharaoh-setup consult [needs.fields.X] from ubproject.toml and the RST status histogram from existing needs before writing tailoring defaults.
Repro context: useblocks/sphinx-needs-demo PR #51, commit history reflects the iterative correction of per-section findings.