fix: 3 LLM-fragility bugs exposed by cheap-Haiku atomic live test #54
Merged
Fixes two bugs surfaced by the first live atomic-evolution run on the
cheap Haiku tier (see journal #17).

1. Engineer oversize description (assembly-killer)
-------------------------------------------------
Haiku routinely overshoots the 250-char composite description cap by
10–40%. The previous behavior raised ValueError, which triggered an
~$0.20 LLM retry that often produced the same class of overshoot. For
2-of-3 runs it killed the whole atomic pipeline.

Fix: add `_try_truncate_description` helper that repairs oversize
descriptions at a word boundary when it can do so without clobbering
the "Use when" pushy-pattern marker. `_validate_composite_shape` now
repairs in place before raising; only truly unsalvageable cases (no
"Use when" in the first 250 chars) still raise.

Covered by 3 new unit tests plus the updated existing oversize test.

2. Managed-agents skill upload YAML frontmatter 400 (upload-killer)
-------------------------------------------------------------------
~1% of `managed_agents.upload_skill` calls returned
``400 SKILL.md must start with YAML frontmatter (---)``: the Anthropic
Skills API is byte-strict about the leading ``---``. Models
occasionally prepend a UTF-8 BOM or stray whitespace that the
structural validator (which uses `startswith("---")` after standard
string handling) happens to tolerate.

Fix: `upload_skill` now `lstrip`s BOM + whitespace from the payload
before calling the API. If the normalized content still doesn't start
with ``---``, raise ValueError with a clear message instead of
round-tripping to a generic 400. Caller (`competitor_managed`) still
falls back to inline for genuine bad content; BOM/whitespace-only
damage now uploads cleanly instead of falling back.

Covered by 2 new unit tests (reject + BOM-strip).

QA
--
ruff check skillforge - clean
mypy skillforge - 65 files pass
pytest tests/ - 408 passed (+5), 2 skipped

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
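The two boundary repairs described in this commit can be sketched roughly as below. This is an illustration only, not the code in `agents/engineer.py` or `managed_agents`: the function names `sketch_truncate_description` and `sketch_normalize_skill_md`, the constants, and the return conventions are all assumptions made for the example. Only the 250-char cap, the "Use when" marker, and the leading `---` requirement come from the commit message.

```python
from typing import Optional

DESCRIPTION_CAP = 250  # cap stated in the commit message
MARKER = "Use when"    # pushy-pattern marker named in the commit message


def sketch_truncate_description(text: str, cap: int = DESCRIPTION_CAP) -> Optional[str]:
    """Hypothetical helper: fit `text` into `cap` chars, or return None."""
    if len(text) <= cap:
        return text  # already within the cap: nothing to repair
    marker_at = text.find(MARKER)
    # Unsalvageable when the marker is not fully inside the first `cap` chars.
    if marker_at == -1 or marker_at + len(MARKER) > cap:
        return None
    # Last space at or before index `cap`, so the kept prefix fits the cap.
    cut = text.rfind(" ", 0, cap + 1)
    # Refuse to cut inside or before the marker (also covers cut == -1).
    if cut < marker_at + len(MARKER):
        return None
    return text[:cut].rstrip()


def sketch_normalize_skill_md(content: str) -> str:
    """Hypothetical helper: strip a leading BOM and whitespace, then check."""
    # str.lstrip() with no argument does not remove U+FEFF, so the BOM is
    # listed explicitly alongside the usual whitespace characters.
    normalized = content.lstrip("\ufeff \t\r\n")
    if not normalized.startswith("---"):
        raise ValueError("SKILL.md must start with YAML frontmatter (---)")
    return normalized
```

In this sketch a `None` result from the truncation helper is the signal that the caller should still raise, which mirrors the "only truly unsalvageable cases still raise" behavior described above. Failing locally in the normalizer avoids a network round trip that would only return a generic 400.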
Third LLM-fragility bug from the cheap-Haiku atomic run (same class as
the two fixed in the previous commit): the Spawner routinely emits
SKILL.md bodies that reference references/*-guide.md files in prose
but forgets to include the file in supporting_files. Structural rule 8
rejects those genomes, and in atomic mode (pop=2, 1 retry) this was
killing the whole dimension 1-of-3 times.
Fix: new _auto_repair_missing_references helper runs before
_validate_genomes. For each ${CLAUDE_SKILL_DIR}/<path> reference
missing from supporting_files, stub a placeholder file with a clear
auto-generated marker. The skill renders, the reference resolves at
runtime, validation passes, and the Breeder can flesh out the stub
in later generations if the signal warrants it.
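
A minimal sketch of the stub-on-missing-reference idea follows. It is not the actual `_auto_repair_missing_references` implementation: the function name, the regex, the stub text, and the assumption that `supporting_files` is a dict mapping relative paths to file contents are all hypothetical and chosen only for illustration.

```python
import re
from typing import Dict

# Assumed reference syntax: ${CLAUDE_SKILL_DIR}/<relative path>
REF_PATTERN = re.compile(r"\$\{CLAUDE_SKILL_DIR\}/([\w./-]+)")
# Assumed marker text; the real marker is not shown in this PR.
STUB_MARKER = "<!-- auto-generated placeholder: referenced but not supplied -->"


def sketch_repair_missing_references(
    skill_md: str, supporting_files: Dict[str, str]
) -> Dict[str, str]:
    """Return a copy of supporting_files with stubs for unresolved references."""
    repaired = dict(supporting_files)  # leave the caller's mapping untouched
    for path in REF_PATTERN.findall(skill_md):
        path = path.rstrip(".")  # drop a sentence-ending period caught by the regex
        if path not in repaired:
            repaired[path] = f"{STUB_MARKER}\n# {path}\n"
    return repaired


body = "Read ${CLAUDE_SKILL_DIR}/references/date-guide.md before answering."
files = sketch_repair_missing_references(body, {})
print(sorted(files))
```

Running the repair before validation means a rule that checks "every referenced path exists" sees a complete file set, while the marker keeps stubs easy to find and replace later.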
Same defensive-repair pattern as the Engineer description truncation:
cheap LLM produces almost-valid output, we repair at the boundary
instead of burning another ~$0.20 retry that often reproduces the
same oversight.
Covered by 2 new unit tests (stubs-missing + noop-when-present).
QA: ruff + mypy + 410 pytest (+2 from new tests) — all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Fixes three LLM-fragility bugs uncovered by the first live atomic-evolution runs. Each one was killing the whole pipeline in 1-of-3 cheap-Haiku runs; each one now repairs at the boundary instead of escalating to an LLM retry.
Bug 1: Engineer oversize composite description
Haiku routinely overshoots the 250-char composite description cap by 10–40%. Previous behavior: raise `ValueError`, burn a ~$0.20 retry that often produced the same class of overshoot.
Fix: `_try_truncate_description` helper in `agents/engineer.py` repairs oversize descriptions at a word boundary, preserving the "Use when" pushy-pattern marker. `_validate_composite_shape` now repairs in place before raising; only truly unsalvageable cases (no "Use when" in the first 250 chars) still raise.
Bug 2: Managed-agents skill upload 400 on missing YAML frontmatter
~1% of `managed_agents.upload_skill` calls returned `400 SKILL.md must start with YAML frontmatter (---)`. Models occasionally prepend a UTF-8 BOM or stray whitespace that our structural validator tolerates but the API doesn't.
Fix: `upload_skill` now `lstrip`s BOM + whitespace before calling the API. If the normalized content still doesn't start with `---`, raise `ValueError` with a clear message instead of round-tripping to a generic 400. Caller fallback-to-inline still works for genuinely bad content; BOM/whitespace damage now uploads cleanly.
Bug 3: Spawner missing `${CLAUDE_SKILL_DIR}` reference files
Spawner emits SKILL.md bodies that reference `references/*-guide.md` in prose but forgets to include the file in `supporting_files`. Structural rule 8 rejected them, killing 1-of-3 atomic dimensions.
Fix: new `_auto_repair_missing_references` helper in `agents/spawner.py` runs before `_validate_genomes`. For each missing reference, stub a placeholder file with a clear auto-generated marker. Skill renders, reference resolves at runtime, validation passes; the Breeder can flesh out stubs in later generations.
Verification: live run passed
Full end-to-end atomic evolution on this branch:
Before this PR, the same config failed at Engineer assembly. After this PR, the pipeline runs clean end-to-end.
Test plan
- `uv run ruff check skillforge`: clean
- `uv run mypy skillforge`: 65 files pass
- `uv run pytest tests/`: 410 passed (+7 from new tests), 2 skipped
Unit tests added (7 total)
- `test_try_truncate_description_noop_when_under_cap`
- `test_try_truncate_description_repairs_oversize_at_word_boundary`
- `test_try_truncate_description_returns_none_when_use_when_is_past_cap`
- `test_validate_composite_shape_truncates_oversize_description_in_place` (replaces old reject-test)
- `test_validate_composite_shape_rejects_unsalvageable_oversize_description`
- `test_upload_skill_rejects_payload_without_frontmatter` (replaces old fallback-test)
- `test_upload_skill_strips_bom_and_leading_whitespace`
- `test_auto_repair_missing_references_stubs_missing_files`
- `test_auto_repair_missing_references_noop_when_all_present`
🤖 Generated with Claude Code