refactor(agents): decompose 4 backend hotspot files into packages #57
Merged
Seven-submodule decomposition along SDK-resource-lifecycle seams:
  managed_agents/__init__.py           barrel + full docstring with all the
                                       Step-0 smoke test SDK quirks
  managed_agents/_constants.py (35)    beta headers + make_client + the
                                       $0.08/hr session rate
  managed_agents/environments.py (54)  create / archive env
  managed_agents/skills.py (167)       upload + 3-step archive dance +
                                       archive_skill_safe + name extractor
  managed_agents/agents.py (57)        create / archive competitor agent
  managed_agents/sessions.py (124)     create / archive session +
                                       send_user_message + event polling
  managed_agents/output.py (211)       post-run trace introspection:
                                       written_files, bash-write parsing,
                                       token usage, runtime cost
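
For orientation, the runtime-cost arithmetic implied by the $0.08/hr session rate can be sketched as follows. The helper name and the linear per-second proration are assumptions made for illustration; the commit does not show how output.py actually rounds or bills session time.

```python
# Rate quoted in the commit message for _constants.py.
SESSION_RATE_USD_PER_HOUR = 0.08


def estimate_runtime_cost_usd(session_seconds: float) -> float:
    """Hypothetical helper: prorate the hourly session rate linearly.

    Assumption: cost scales per second with no minimum charge or rounding.
    """
    if session_seconds < 0:
        raise ValueError("session_seconds must be non-negative")
    return session_seconds / 3600.0 * SESSION_RATE_USD_PER_HOUR


# A 15-minute session under these assumptions.
quarter_hour_cost = estimate_runtime_cost_usd(15 * 60)
```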
Every public name is re-exported from the package __init__ so 38 call
sites keep their ``from skillforge.agents import managed_agents`` +
``managed_agents.upload_skill(...)`` usage unchanged.
Tests against two private helpers (_extract_skill_name_from_md,
_normalize_output_path) were accessing them on the module directly;
those are re-exported through the barrel so test patches continue to
resolve.
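
A minimal sketch of the barrel pattern described above, using a throwaway package written to a temp directory. The package name `demo_agents`, the file contents, and the name-parsing rule are all invented for illustration; only the re-export mechanism mirrors what the commit describes.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical two-file package standing in for managed_agents/.
FILES = {
    "demo_agents/__init__.py": """
        # Barrel: re-export public and test-visible private names.
        from .skills import upload_skill, _extract_skill_name_from_md
    """,
    "demo_agents/skills.py": """
        def _extract_skill_name_from_md(text):
            # Placeholder rule for this sketch: first line, minus '#'.
            first = text.strip().splitlines()[0]
            return first.lstrip("# ").strip()

        def upload_skill(text):
            return {"name": _extract_skill_name_from_md(text)}
    """,
}


def build_demo_package() -> Path:
    """Write the demo package to a fresh temp directory."""
    root = Path(tempfile.mkdtemp())
    for rel, body in FILES.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(textwrap.dedent(body))
    return root


sys.path.insert(0, str(build_demo_package()))
importlib.invalidate_caches()
demo_agents = importlib.import_module("demo_agents")

# Call sites keep using the package-level name, unaware of the split.
result = demo_agents.upload_skill("# my-skill\nbody")
```

Because the barrel imports the names, `demo_agents.upload_skill` and `demo_agents.skills.upload_skill` are the same object, so callers and tests that reach through the package root see exactly what the submodule defines.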
QA: ruff + mypy + 411 pytest (unchanged) all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Decomposed by orchestration level — the mini-evolution loop, the
assembly step, and the top-level run entry each live in their own file:
  variant_evolution/__init__.py   barrel + re-exports run_variant_evolution
  variant_evolution/_helpers.py   constants + _tier_sort_key
                                  + _aggregate_fitness
  variant_evolution/dimension.py  _run_dimension_mini_evolution
                                  (challenge -> spawn -> compete ->
                                  score -> judge -> breed -> pick winner)
  variant_evolution/assembly.py   _real_assembly (Engineer call +
                                  integration check)
  variant_evolution/main.py       run_variant_evolution orchestrator
Largest submodule is dimension.py at 345 LOC, under the 500-LOC ceiling
in docs/clean-code.md §2. Prior to this split, the monolith held a
single 311-LOC function (_run_dimension_mini_evolution) alongside the
assembly logic and the main loop — the file was 620 LOC and every
refactor touched everything.
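
One way to check a tree against the ceiling is sketched here. This is not the project's tooling: the function names are hypothetical, and it assumes the ceiling counts every physical line (blank lines and comments included), which docs/clean-code.md may define differently.

```python
from pathlib import Path

LOC_CEILING = 500  # ceiling cited from docs/clean-code.md §2


def count_loc(source: str) -> int:
    """Count physical lines. Assumption: blanks and comments count too."""
    return len(source.splitlines())


def files_over_ceiling(root: Path, ceiling: int = LOC_CEILING) -> dict[str, int]:
    """Map each .py file under root that exceeds the ceiling to its LOC."""
    over: dict[str, int] = {}
    for path in sorted(root.rglob("*.py")):
        loc = count_loc(path.read_text(encoding="utf-8"))
        if loc > ceiling:
            over[str(path.relative_to(root))] = loc
    return over
```

Running `files_over_ceiling(Path("skillforge"))` before the split would be expected to flag the 620-LOC monolith, and to return an empty mapping for this package afterwards.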
Test-access surface preserved: tests/test_variant_evolution.py imports
_aggregate_fitness and _tier_sort_key directly from the package, so the
__init__ re-exports them.
Also rolled in: _extract_skill_name_from_md and _normalize_output_path
added to the managed_agents package __all__ (they were already
re-exported for test access, just needed the __all__ entry to satisfy
F401).
QA: ruff + mypy + 411 pytest all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Decomposed by responsibility. The six section comments in the monolith
("slot allocation", "ranking", "main breed", "mutation prompts",
"lessons + reports", "bible publishing") each correspond to a submodule:
  breeder/__init__.py  barrel: re-exports breed + compute_slots +
                       rank_skills + publish_findings_to_bible, plus
                       legacy aliases (breed_next_gen, spawn_gen0,
                       BIBLE_DIR) that tests patch on the package root
  breeder/_ranking.py  compute_slots + rank_skills + _aggregate_fitness
  breeder/_prompts.py  diagnostic + crossover instruction templates
                       + breeding-context formatter (pure)
  breeder/_reports.py  _extract_lessons_and_report + siblings (LLM calls)
  breeder/main.py      breed() + _carry_elite (orchestrator)
  breeder/bible.py     publish_findings_to_bible (disk I/O)
Largest submodule is _reports.py at 213 LOC, under the 500-LOC Python
ceiling in docs/clean-code.md §2.
Test-patch compatibility
------------------------
Tests patch three functions on the package root:
``breeder.breed_next_gen``, ``breeder.spawn_gen0``,
``breeder._extract_lessons_and_report``. Those patches don't propagate
to bindings in submodules, so ``main.breed()`` now resolves each
through the package namespace at call time (``_pkg().breed_next_gen``
etc.). BIBLE_DIR follows the same pattern in bible.py.
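
The failure mode and the fix can be reproduced with an in-memory stand-in package. Everything here is a sketch: `demo_breeder` and both `breed_*` wrappers are hypothetical, and the body of `_pkg()` is an assumption (a `sys.modules` lookup), since the commit names the helper without showing it.

```python
import sys
import types
from unittest import mock

# In-memory stand-in for the breeder package root.
pkg = types.ModuleType("demo_breeder")
sys.modules["demo_breeder"] = pkg


def breed_next_gen(parents):
    raise RuntimeError("real LLM call; tests must never reach this")


pkg.breed_next_gen = breed_next_gen

# What `from demo_breeder import breed_next_gen` does inside a submodule:
# it copies the reference once, at import time.
_bound_at_import = pkg.breed_next_gen


def breed_eager(parents):
    """Uses the import-time binding, so a later patch is never seen."""
    return _bound_at_import(parents)


def _pkg():
    """Assumed shim: fetch the package module at call time."""
    return sys.modules["demo_breeder"]


def breed_lazy(parents):
    """Resolves the attribute on the package on every call."""
    return _pkg().breed_next_gen(parents)


with mock.patch("demo_breeder.breed_next_gen", return_value=["child"]) as fake:
    lazy_result = breed_lazy(["a", "b"])  # sees the patched attribute
    try:
        breed_eager(["a", "b"])  # still calls the real function
        eager_hit_real = False
    except RuntimeError:
        eager_hit_real = True
```

The patch replaces the attribute on the package object only. `breed_eager` holds its own reference to the original function, so the patch never reaches it, while `breed_lazy` looks the name up again on each call and picks up the mock.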
QA: ruff + mypy + 411 pytest (unchanged) all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Decomposed along the pure-planner / thin-I/O-shell seam called out in
docs/clean-code.md §7. Four submodules:
  spawner/__init__.py  barrel: re-exports four entry points plus
                       every helper tests patch on the package root
                       (_generate, _read_bible_patterns, BIBLE_DIR, ...)
  spawner/_helpers.py  _generate (LLM streaming) + _parse_genomes
                       + _auto_repair_missing_references
                       + _validate_genomes + _read_bible_patterns
  spawner/_prompts.py  all _build_*_system_prompt string templates
                       + embedded JSON schema constants
                       (pure: no I/O, no LLM calls)
  spawner/main.py      four public entry points:
                       spawn_gen0, breed_next_gen,
                       spawn_from_parent, spawn_variant_gen0
Largest submodule is main.py at 411 LOC, under the 500-LOC ceiling.
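
A toy illustration of the pure-planner / thin-I/O-shell seam follows. The function names, prompt format, and the injected `generate` callable are hypothetical; in the real package the LLM call lives in `_generate` and is resolved through the package namespace rather than passed as a parameter.

```python
from typing import Callable


def build_spawn_prompt(specialization: str, patterns: list[str]) -> str:
    """Pure planner: deterministic string building, no I/O, no LLM calls."""
    bullet_list = "\n".join(f"- {p}" for p in patterns)
    return f"Specialization: {specialization}\nKnown patterns:\n{bullet_list}"


def spawn(
    specialization: str,
    patterns: list[str],
    generate: Callable[[str], str],
) -> str:
    """Thin shell: the only place the (injected) I/O call happens."""
    return generate(build_spawn_prompt(specialization, patterns))


# In a test, the I/O boundary is replaced with a stub; the planner runs as-is.
stubbed = spawn("testing", ["a"], generate=lambda prompt: prompt.upper())
```

Keeping the prompt builder pure means it can be unit-tested with plain string comparisons, while only the thin shell needs a stub or patch.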
Test-patch compatibility
------------------------
Same pattern as the breeder split: tests patch ``spawner._generate``,
``spawner._read_bible_patterns``, and ``spawner.BIBLE_DIR`` on the
package root. Those patches do not propagate to direct imports in
submodules, so ``main._generate`` / ``main._read_bible_patterns``
and ``_helpers._read_bible_patterns`` now resolve the reference
through the package namespace at call time.
Without these shims the test suite made real LLM calls for 11 minutes
before first failure — the fix is load-bearing for both test speed
and API-cost safety.
QA: ruff + mypy (83 files) + 411 pytest all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Decomposes the four remaining backend hotspots: each is an agent/engine file over the 500-LOC Python ceiling in docs/clean-code.md §2. Every split follows the pure-planner / thin-I/O-shell pattern: prompt builders + data helpers in private modules, LLM/SDK/disk I/O in one clearly labeled module, top-level orchestrators call through.

  managed_agents.py     _constants, environments, skills, agents, sessions, output
  variant_evolution.py  _helpers, dimension, assembly, main
  breeder.py            _ranking, _prompts, _reports, main, bible
  spawner.py            _helpers, _prompts, main

Largest submodule anywhere is 416 LOC, under the 500-LOC cap.

Test-patch compatibility
Two of the four packages required a lazy-lookup shim for tests that call ``patch("skillforge.agents.breeder.breed_next_gen")`` (or similar) on the package root. A reference bound at import time in a submodule bypasses the patch, because the patch only replaces the attribute on the package; calling through the package namespace at runtime lets it take effect.

This was load-bearing: without the shim the breeder/spawner test suites silently made real LLM calls (11 minutes of API spend before first failure on the spawner).

Also re-exported a handful of private helpers (``_extract_skill_name_from_md``, ``_normalize_output_path``, ``_extract_lessons_and_report``, ``_read_bible_patterns``, ``BIBLE_DIR``) through each package's ``__init__`` so existing ``monkeypatch.setattr(module, "PRIVATE_NAME", ...)`` calls resolve.

Test plan
- ``uv run ruff check skillforge``: clean
- ``uv run mypy skillforge``: 83 files pass (up from 65)
- ``uv run pytest tests/``: 411 passed, 2 skipped (unchanged)
- ``cd frontend && npm run build / lint / format:check / test``: untouched, still green

Public API unchanged
Every import site keeps its original path: ``from skillforge.agents import breeder; breeder.breed(...)``, ``from skillforge.agents.managed_agents import upload_skill``, and ``from skillforge.engine.variant_evolution import run_variant_evolution`` all still work.

🤖 Generated with Claude Code