v2.0 Phase 3: Variant evolution orchestrator + Scientist + Spawner #4
Merged
Wave 3-1: Variant evolution orchestrator
- skillforge/engine/variant_evolution.py: new module. run_variant_evolution reads variant_evolutions rows for the parent run, sorts foundation-first, and runs each dimension as a mini-evolution: design focused challenge -> spawn N variants -> run through Competitor -> judging pipeline -> pick winner -> persist as Variant row -> mark VariantEvolution complete. Capability dimensions receive the winning foundation as grounding. Calls a stub assembly that returns the foundation winner unchanged (Phase 4 will replace with the real Engineer).
- skillforge/engine/evolution.py: new top-level dispatcher. When run.evolution_mode == "atomic", delegates to run_variant_evolution, emits evolution_complete with evolution_mode="atomic", and fires the post-run report. Falls back to molecular cleanly if no rows exist.

Wave 3-2: Scientist (focused per-dimension challenge)
- skillforge/agents/challenge_designer.py: new design_variant_challenge function. Single streaming Anthropic call returning ONE focused Challenge per dimension. Reuses the existing JSON parser + retry path.

Wave 3-3: Spawner variant scope
- skillforge/agents/spawner.py: new spawn_variant_gen0 function. Focused mini-SKILL.md package generator. For capability tier, the winning foundation is injected into the system prompt so spawned variants are compatible with the foundation's directory layout. Stamps dimension + tier keys into each spawned genome's frontmatter.

Routes integration
- skillforge/api/routes.py: _classify_run_via_taxonomist now also persists VariantEvolution rows (one per dimension) when the final run mode is atomic. Reorders save_run before the INSERTs so the FK on parent_run_id resolves.

Test isolation fixes (latent bugs from earlier waves)
- tests/test_taxonomy_api.py::test_list_family_variants_empty_by_default pinned to a known seed family (terraform-module-full) instead of fams[0], which Phase 3 tests now populate.
- tests/test_evolve_taxonomist_integration.py mocks bypass real persistence, so routes.py downstream variant_evolution INSERTs needed manual taxonomy + family seeding. Mock slugs prefixed with test-fixture- to avoid colliding with the bootstrap loader's testing/unit-tests/python triple.

tests/test_variant_evolution.py: 8 new tests covering tier sort, fitness aggregation, happy-path orchestration with full mock stack, empty-pending fallback, design_variant_challenge happy + multi-rejection, spawn_variant_gen0 happy + all-invalid rejection.

QA
- 81 isolated v2.0 tests pass (55 Phase 1 + 18 Phase 2 + 8 Phase 3)
- ruff clean on all Phase 3 files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
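The Wave 3-1 flow in the commit message above (foundation-first sort, per-dimension mini-evolution, capability grounding, stub assembly, empty-pending fallback) can be sketched roughly as follows. This is a minimal illustration, not the code in skillforge/engine/variant_evolution.py: the VariantEvolutionRow dataclass, its status and winner fields, the evolve_dimension callback, and the dimension names in the usage note are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VariantEvolutionRow:
    """Hypothetical stand-in for a variant_evolutions row."""
    dimension: str
    tier: str                      # assumed values: "foundation" or "capability"
    status: str = "pending"        # assumed status field
    winner: Optional[str] = None   # assumed: identifier of the winning variant


def sort_foundation_first(rows: List[VariantEvolutionRow]) -> List[VariantEvolutionRow]:
    # sorted() is stable, so rows keep their relative order within a tier.
    return sorted(rows, key=lambda r: 0 if r.tier == "foundation" else 1)


def run_variant_evolution_sketch(
    rows: List[VariantEvolutionRow],
    evolve_dimension: Callable[[VariantEvolutionRow, Optional[str]], str],
) -> Optional[str]:
    """Run one mini-evolution per pending row and return the stub composite.

    evolve_dimension stands in for the whole per-dimension pipeline
    (design challenge -> spawn -> compete -> judge -> pick winner).
    Returns None when nothing is pending, so a caller can fall back
    to molecular mode.
    """
    pending = [r for r in rows if r.status == "pending"]
    if not pending:
        return None

    foundation_winner: Optional[str] = None
    for row in sort_foundation_first(pending):
        # Capability dimensions are grounded on the winning foundation.
        grounding = foundation_winner if row.tier == "capability" else None
        row.winner = evolve_dimension(row, grounding)
        row.status = "complete"
        if row.tier == "foundation" and foundation_winner is None:
            foundation_winner = row.winner

    # Stub assembly: hand back the foundation winner unchanged.
    return foundation_winner
```

Called with a capability row listed before a foundation row, the sketch still evolves the foundation first and passes its winner as grounding to every capability dimension; with no pending rows it returns None instead of raising, which is what lets a dispatcher fall back cleanly.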
This was referenced Apr 11, 2026
ty13r pushed a commit that referenced this pull request Apr 12, 2026

Full pipeline seed run #4, no shortcuts. 12 Spawner + 48 Competitor dispatches + 1 Engineer. Both v1 (seed) and v2 (spawn) competed against 2 challenges per dimension (24 challenges total, balanced medium/hard).

Real competition results:
- v1 wins 2 dims (testing-workers 0.40>0.23, transactional-jobs 0.64>0.26)
- v2 wins 5 dims (retry-strategy, cron-scheduling, return-values, recurring-jobs, worker-philosophy)
- 5 ties (perform, args, unique, queues, telemetry)
- Mean winning fitness: 0.487

Composite: 619 lines, 12 capability sections, 3 cross-cutting examples, 10 common mistakes.
Foundation: transactional-saga philosophy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ty13r added a commit that referenced this pull request Apr 12, 2026

…#31)

Full pipeline seed run #4, no shortcuts. 12 Spawner + 48 Competitor dispatches + 1 Engineer. Both v1 (seed) and v2 (spawn) competed against 2 challenges per dimension (24 challenges total, balanced medium/hard).

Real competition results:
- v1 wins 2 dims (testing-workers 0.40>0.23, transactional-jobs 0.64>0.26)
- v2 wins 5 dims (retry-strategy, cron-scheduling, return-values, recurring-jobs, worker-philosophy)
- 5 ties (perform, args, unique, queues, telemetry)
- Mean winning fitness: 0.487

Composite: 619 lines, 12 capability sections, 3 cross-cutting examples, 10 common mistakes.
Foundation: transactional-saga philosophy.

Co-authored-by: Matt (via Claude Code) <matt@skillforge.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Phase 3 of v2.0: the variant evolution orchestrator. This is the wave where atomic evolution actually starts running. One commit, three waves, stacks on Phase 2 (PR #3 / ca7d154).

- skillforge/engine/variant_evolution.py: run_variant_evolution(run) reads variant_evolutions rows for the parent run, sorts foundation-first, runs each dimension as a mini-evolution (design focused challenge → spawn N variants → run via Competitor → judging pipeline → pick winner → persist as Variant row → mark VariantEvolution complete), then calls a stub assembly that returns the foundation winner unchanged. Wired into evolution.py::run_evolution as a top-level dispatcher: run.evolution_mode == "atomic" delegates here and falls back to molecular cleanly if no rows exist.
- design_variant_challenge(specialization, dimension) added to challenge_designer.py. Single streaming Anthropic call returning ONE focused Challenge per dimension.
- spawn_variant_gen0(specialization, dimension, foundation_genome, pop_size) added to spawner.py. Focused mini-SKILL.md generator. For capability tier, the winning foundation is injected into the system prompt so spawned variants are compatible with the foundation's directory layout. Stamps dimension + tier keys into each spawned genome's frontmatter.

Routes wiring
_classify_run_via_taxonomist in routes.py now persists VariantEvolution rows (one per dimension) when the final run mode is atomic, so the orchestrator has work to pick up. The route handler reorders save_run to come before the variant_evolution INSERTs so the FK on parent_run_id resolves.

Stub assembly (Phase 4 will replace)
The Engineer agent's real merge logic is Wave 4-1. Phase 3 ships with a stub that:
- returns the foundation winner unchanged
- emits assembly_started + assembly_complete events with synergy_ratio: null

This unblocks the full atomic event sequence (taxonomy_classified → decomposition_complete → variant_evolution_started × N → variant_evolution_complete × N → assembly_started → assembly_complete → evolution_complete) without blocking on Phase 4.

Bug fixes (latent in earlier phases)
Phase 3 surfaced two test-isolation bugs from earlier waves:
- test_list_family_variants_empty_by_default asserted that fams[0] had zero variants, but Phase 3's happy-path test now persists variants under a fam_phase3 family, polluting the global DB. Pinned the assertion to a known seed family (terraform-module-full) that no other test populates.
- test_evolve_taxonomist_integration: the mocks bypass the real classify_and_decompose persistence path, so the routes.py downstream code that persists VariantEvolution rows (with FK to skill_families) failed because the mocked family was never inserted. Fix: tests now manually seed the mocked taxonomy + family before the request fires. Mock slugs prefixed with test-fixture- to avoid colliding with the bootstrap loader's testing/unit-tests/python triple.

Quantitative signal
What's not in this PR
Test plan
- uv run pytest tests/test_variant_evolution.py -q (8/8 green)
- uv run pytest tests/test_taxonomist.py tests/test_evolve_taxonomist_integration.py -q (18/18 green, Phase 2 baseline preserved)
- uv run pytest tests/test_models_v2.py tests/test_db_v2.py tests/test_taxonomy_queries.py tests/test_taxonomy_api.py tests/test_report.py -q (55/55 green, Phase 1 baseline preserved)
- uv run ruff check skillforge/engine/variant_evolution.py skillforge/engine/evolution.py skillforge/agents/challenge_designer.py skillforge/agents/spawner.py skillforge/api/routes.py tests/test_variant_evolution.py (clean)
- /new form submission with evolution_mode: atomic)

Next up
v2.0/phase4-engineer branch: the Engineer agent's real assembly logic, composite skill validation, integration test, and one refinement pass per assembly. Stacks on this PR's main.

🤖 Generated with Claude Code