refactor(tests): behavioral tests and shared factories by weklund · Pull Request #32 · weklund/mlx-stack

weklund · 2026-04-04T14:08:16Z

Summary

Replace brittle, mock-heavy orchestration tests with behavioral tests. TestRunUp (13 tests) previously used 10 @patch decorators per test, so it verified mock wiring rather than behavior. It now uses a FakeServiceLayer test double that fakes only the OS boundary (subprocess, HTTP, signals), while YAML loading, config reading, and pure functions run unmodified.
Consolidate ~50 duplicate helper functions into tests/factories.py: _make_entry (9 files), _make_stack_yaml (5 files), _make_test_catalog (6 files), _make_profile (5 files), and others now live in one shared module.
Add AAA structure comments (# Arrange, # Act, # Assert) to all non-trivial tests across 17 modified files.
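
The first point can be illustrated with a minimal sketch of a test double that replaces only the OS-boundary calls while the orchestration logic runs unmodified. Every name here (`FakeServices`, `start`, `is_healthy`, `run_up_sketch`, `Tier`, and the tier names) is hypothetical and chosen for illustration; the actual `FakeServiceLayer` in `tests/fakes.py` may be shaped quite differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeServices:
    """Hypothetical stand-in for process spawning and HTTP health probes."""

    fail_to_start: set[str] = field(default_factory=set)
    unhealthy: set[str] = field(default_factory=set)
    started: list[str] = field(default_factory=list)
    _next_pid: int = 1000

    def start(self, name: str) -> int:
        # Stands in for a subprocess launch; returns a fake PID.
        if name in self.fail_to_start:
            raise RuntimeError(f"could not start {name}")
        self.started.append(name)
        self._next_pid += 1
        return self._next_pid

    def is_healthy(self, name: str) -> bool:
        # Stands in for an HTTP health probe.
        return name in self.started and name not in self.unhealthy


@dataclass
class Tier:
    name: str
    status: str
    error: Optional[str] = None


def run_up_sketch(tier_names: list[str], services: FakeServices) -> list[Tier]:
    """Toy orchestration: this logic is exercised as-is, only `services` is faked."""
    tiers = []
    for name in tier_names:
        try:
            services.start(name)
        except RuntimeError as exc:
            tiers.append(Tier(name, "failed", str(exc)))
            continue
        status = "healthy" if services.is_healthy(name) else "unhealthy"
        tiers.append(Tier(name, status))
    return tiers


def test_all_tiers_healthy_by_default():
    # Arrange
    services = FakeServices()
    # Act
    tiers = run_up_sketch(["fast", "standard"], services)
    # Assert
    assert all(t.status == "healthy" for t in tiers)
```

A failure scenario then needs only a constructor argument, for example `FakeServices(unhealthy={"standard"})`, rather than another stack of patches.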

Key new files

| File | Purpose |
| --- | --- |
| `tests/factories.py` | Shared data factories (make_entry, make_stack_yaml, etc.) |
| `tests/fakes.py` | `FakeServiceLayer`, a configurable test double for OS-boundary functions |
| `tests/unit/conftest.py` | Unit-specific fixtures (stack_on_disk, fake_services, pids_dir, logs_dir) |
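
A shared factory module of this kind typically exposes functions that return a valid default object and accept keyword overrides, so each test states only the fields it cares about. The sketch below assumes a hypothetical dictionary shape for a catalog entry; the field names, defaults, and the `make_catalog` helper are illustrative and not taken from `tests/factories.py`.

```python
from typing import Any


def make_entry(**overrides: Any) -> dict[str, Any]:
    """Return a valid catalog entry; the field names here are hypothetical."""
    entry: dict[str, Any] = {
        "id": "example-model",
        "family": "example",
        "params_b": 7.0,
        "tags": ["chat"],
    }
    unknown = set(overrides) - set(entry)
    if unknown:
        # Fail loudly on typos instead of silently adding new keys.
        raise TypeError(f"unknown entry fields: {sorted(unknown)}")
    entry.update(overrides)
    return entry


def make_catalog(n: int = 2) -> list[dict[str, Any]]:
    """Build n distinct entries by varying only the id."""
    return [make_entry(id=f"example-model-{i}") for i in range(n)]
```

Rejecting unknown keys is a design choice for this sketch: a misspelled override raises immediately instead of producing a test that passes for the wrong reason.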

Before → After (TestRunUp example)

# BEFORE: 10 @patch decorators, ~35 lines of mock setup per test
@patch("mlx_stack.core.stack_up.check_local_model_exists", return_value=None)
@patch("mlx_stack.core.stack_up.start_service")
@patch("mlx_stack.core.stack_up.wait_for_healthy")
# ... 7 more @patch lines ...
def test_successful_startup(self, mock_which, mock_get_value, ...11 params...):
    mock_load_catalog.return_value = _make_test_catalog()
    mock_get_value.side_effect = lambda key: {...}.get(key, "")
    mock_which.side_effect = lambda x: f"/usr/local/bin/{x}"
    mock_lock.return_value.__enter__ = MagicMock(return_value=None)
    mock_lock.return_value.__exit__ = MagicMock(return_value=False)
    # ... more mock config ...

# AFTER: 0 @patch decorators, ~3 lines of setup
def test_successful_startup(self, stack_on_disk, fake_services):
    # Arrange — defaults: all services start and pass health check
    # Act
    result = run_up()
    # Assert
    assert all(t.status == "healthy" for t in result.tiers)

Stats

-577 net lines (2,545 added, 2,528 removed across 20 files)
1,481 tests pass (same count, zero regressions)
673 → ~180 @patch usages in unit tests (73% reduction)

Test plan

uv run pytest tests/unit/ -x -q --tb=short — all 1,481 tests pass
uv run ruff check tests/ — all lint checks pass
Hybrid TDD verification: temporarily misconfigured FakeServiceLayer to confirm tests fail for the right reasons
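
The last check can be sketched as follows: run the same behavioral assertion against a deliberately broken fake and confirm that it fails. The names `StubHealth`, `check_stack`, `assert_stack_healthy`, and `fails_when_misconfigured` are hypothetical; this illustrates the idea and is not the procedure actually used in this PR.

```python
class StubHealth:
    """Hypothetical fake whose health-probe result is configurable."""

    def __init__(self, healthy: bool = True) -> None:
        self.healthy = healthy

    def probe(self, name: str) -> bool:
        return self.healthy


def check_stack(names: list[str], health: StubHealth) -> dict[str, str]:
    # Code under test: maps each service name to a status string.
    return {n: "healthy" if health.probe(n) else "unhealthy" for n in names}


def assert_stack_healthy(health: StubHealth) -> None:
    # The behavioral assertion being scrutinized.
    statuses = check_stack(["a", "b"], health)
    assert all(s == "healthy" for s in statuses.values()), statuses


def fails_when_misconfigured() -> bool:
    """Return True if the assertion trips once the fake is broken."""
    try:
        assert_stack_healthy(StubHealth(healthy=False))
    except AssertionError:
        return True
    return False
```

If `fails_when_misconfigured()` returned `False`, the assertion would not depend on the fake's behavior at all, which is the failure mode this kind of check is meant to catch.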

🤖 Generated with Claude Code

…ts and shared factories

Replace 10-deep @patch stacks in TestRunUp with a FakeServiceLayer test double that mocks only at the OS boundary (subprocess, HTTP, signals). Consolidate ~50 duplicate helper functions (_make_entry x9, _make_stack_yaml x5, etc.) into tests/factories.py. Structure all tests with AAA comments. Net result: -577 lines, same 1481 tests passing, tests now assert behavior not mock wiring.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

🤖 I have created a release *beep* *boop*

---

## [0.3.5](v0.3.4...v0.3.5) (2026-04-04)

### Features

* expand ruff lint rules with tier 1+2 quality rulesets ([#22](#22)) ([75490f6](75490f6))

### Refactors

* **tests:** replace brittle mock-heavy tests with behavioral tests and shared factories ([#32](#32)) ([9af6078](9af6078))
  - `FakeServiceLayer` replaces 10-deep `@patch` stacks in `TestRunUp`
  - Consolidate ~50 duplicate helpers into `tests/factories.py`
  - AAA comments (`# Arrange`, `# Act`, `# Assert`) across 17 test files
  - `make lint` now includes pyright for shift-left type checking
  - Net: -577 lines, 1,481 tests pass, 73% reduction in `@patch` usage

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Wes Eklund <s.wes35@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

weklund and others added 3 commits April 4, 2026 10:07

fix(tests): add None guards for TierStatus.error to satisfy pyright

f3fb4c6

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
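
For context, a type checker such as pyright reports an error when a value typed as optional is used as if it were always present, and an explicit `is not None` check narrows the type. The sketch below assumes `TierStatus.error` is typed as an optional string, which the commit title suggests but this page does not show; the class body, the `error_mentions` helper, and the test are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TierStatus:
    # Assumed shape; only the `error` attribute is named in the commit title.
    name: str
    status: str
    error: Optional[str] = None


def error_mentions(tier: TierStatus, needle: str) -> bool:
    # Without this guard, `needle in tier.error` is flagged by the type
    # checker because `error` may be None.
    if tier.error is None:
        return False
    return needle in tier.error


def test_failed_tier_reports_port_conflict():
    # Arrange
    tier = TierStatus("fast", "failed", error="port 8000 already in use")
    # Act / Assert
    assert tier.error is not None  # narrows Optional[str] to str
    assert "port" in tier.error
```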

chore: include pyright in make lint for shift-left type checking

a791172

Running `make lint` now executes both ruff and pyright so type errors are caught locally before push, not just in CI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

weklund merged commit 9af6078 into main Apr 4, 2026
5 checks passed

weklund mentioned this pull request Apr 4, 2026

chore(main): release 0.3.5 #24

Merged

weklund merged 3 commits into main from feat/behavioral-test-refactor
