feat(batch,ui): activate max_parallel + cooperative cancellation (PRP-34) #291
Activate the three forward-compat columns PRP-33 shipped on `batch_job`
(`max_parallel`, `running_items`, `cancelled_items`) by routing
`BatchService.submit` through a new `app/features/batch/runner.py` — a
single `asyncio.Semaphore(effective_parallel)` inside an
`asyncio.TaskGroup` with per-child `AsyncSession`s and cooperative
cancellation via a per-batch `asyncio.Event` + tracked `Task` refs. No
new Alembic migration (the three columns already exist).
`DELETE /batch/{batch_id}` cancels what hasn't started and bounds the
drain of what has — 200 settled / 404 missing / 409 terminal / 504
drain-timeout (RFC 7807 via the new `GatewayTimeoutError`). The frontend
`visualize/batch.tsx` gains a max-parallel `Slider`, live `running_items`
+ `parallel` chips, and a "Cancel batch" `AlertDialog`. The settings
defaults are `BATCH_GLOBAL_MAX_PARALLEL=4`,
`BATCH_CANCEL_DRAIN_TIMEOUT_SECONDS=30`.
`BatchSubmitResponse.effective_max_parallel` is a `@computed_field`
resolved from `result_summary["effective_max_parallel"]` (legacy rows
return 0) — JSONB-only, no schema migration.
Reviewer's Guide

Implements PRP-34 by introducing an asyncio-based bounded-concurrency batch runner with cooperative cancellation, wiring `BatchService.submit` through it, exposing `DELETE /batch/{batch_id}` with RFC 7807 errors, and adding corresponding UI controls (max_parallel slider and cancel flow) plus settings and tests, all without schema changes.

Sequence diagram for cooperative batch cancellation via `DELETE /batch/{batch_id}`:

```mermaid
sequenceDiagram
    actor User
    participant FrontendBatchPage as FrontendBatchPage
    participant UseCancelBatch as useCancelBatch
    participant BatchRoutes as cancel_batch_route
    participant BatchService as BatchService
    participant Runner as runner
    participant DB as Postgres
    User->>FrontendBatchPage: Click Cancel batch
    FrontendBatchPage->>UseCancelBatch: useCancelBatch.mutate(batch_id)
    UseCancelBatch->>BatchRoutes: DELETE /batch/{batch_id}
    BatchRoutes->>BatchService: get(db, batch_id)
    BatchService-->>BatchRoutes: BatchJob or None
    alt Batch not found
        BatchRoutes-->>UseCancelBatch: 404 NotFoundError
        UseCancelBatch-->>FrontendBatchPage: isError (404)
    else Batch terminal
        BatchRoutes-->>UseCancelBatch: 409 ConflictError
        UseCancelBatch-->>FrontendBatchPage: isError (409)
    else Batch running
        BatchRoutes->>Runner: cancel_batch(batch_id)
        alt cancel_batch returns False
            BatchRoutes-->>UseCancelBatch: 409 ConflictError
        else cancel_batch returns True
            BatchRoutes->>Runner: await_drain(batch_id, timeout_seconds)
            alt drain timeout
                Runner-->>BatchRoutes: drained = False
                BatchRoutes-->>UseCancelBatch: 504 GatewayTimeoutError
            else drain success
                Runner-->>BatchRoutes: drained = True
                BatchRoutes->>BatchService: get(db, batch_id)
                BatchService-->>BatchRoutes: settled BatchJob
                BatchRoutes-->>UseCancelBatch: 200 BatchSubmitResponse
                UseCancelBatch-->>FrontendBatchPage: data (status=cancelled or partial)
            end
        end
    end
    Note over BatchService,Runner: BatchService.submit calls runner.mark_completed(batch_id)
    Note over Runner,DB: cancel_event and completed_event control cooperative drain
```
Hey - I've found 4 issues, and left some high level feedback:
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new slider component is importing `SliderPrimitive` from `"radix-ui"`, which differs from the standard shadcn pattern (`@radix-ui/react-slider`); consider aligning the import/package to the shadcn template to avoid relying on a nonstandard entry point and potential runtime issues.
- Both the backend (`_TERMINAL_BATCH_STATES` in `routes.py`) and frontend (`TERMINAL_BATCH_STATES` in `batch.tsx`) now hard-code the set of terminal batch statuses; it may be worth centralizing this mapping (or deriving it from the shared `BatchStatus` enum) to avoid future drift between the API and UI.
## Individual Comments
### Comment 1
<location path="app/features/batch/runner.py" line_range="136-145" />
<code_context>
+ handle = _ACTIVE_BATCHES.get(batch_id)
+ if handle is None:
+ return True
+ try:
+ await asyncio.wait_for(handle.completed_event.wait(), timeout=timeout_seconds)
+ return True
+ except TimeoutError:
+ return False
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Use asyncio.TimeoutError instead of built-in TimeoutError when catching wait_for timeouts
`asyncio.wait_for` raises `asyncio.TimeoutError`, not the built-in `TimeoutError`. Catching `TimeoutError` here means the timeout will escape instead of returning `False`, breaking the intended 504 behavior in the DELETE handler. Change the except clause to `except asyncio.TimeoutError:` (or import and use it directly) so the drain timeout behaves as documented and matches `cancel_batch_route`'s handling.
</issue_to_address>
### Comment 2
<location path="app/features/batch/tests/test_routes_cancel.py" line_range="34-43" />
<code_context>
+async def test_delete_409_terminal_batch(
</code_context>
<issue_to_address>
**suggestion (testing):** Add a happy-path DELETE /batch/{batch_id} test that exercises the 200 + drain-success case
Right now the tests only cover 404, 409, and 504, but not the documented 200 + successful drain path. Please add an integration test that:
- seeds an in-flight batch (similar to `_seed_synthetic_batch` in `test_runner_chaos.py`),
- registers a `CancelHandle` in the runner registry,
- calls `DELETE /batch/{batch_id}` such that `runner.cancel_batch` and `runner.await_drain` both return `True`,
- asserts a 200 and verifies the body shows a terminal status (`cancelled` or `partial`), `running_items == 0`, and `effective_max_parallel` in `result_summary`.
This will exercise the full success path for the cancel flow end-to-end.
Suggested implementation:
```python
from typing import Any
from unittest.mock import AsyncMock

from httpx import AsyncClient

# Path as suggested; adjust to wherever _seed_synthetic_batch actually lives.
from app.features.batch.tests.test_runner_chaos import _seed_synthetic_batch

# Store, Product and the runner_registry fixture are assumed to be imported in
# this module or provided by conftest.py (see the notes below).


async def test_delete_200_cancel_success(
    client: AsyncClient,
    sample_store: Store,
    sample_products_3: list[Product],
    sample_sales_120: list[Any],
    runner_registry,
) -> None:
    """Happy-path cancel: DELETE /batch/{batch_id} → 200 with successful drain.

    Seeds an in-flight batch, registers a CancelHandle in the runner registry
    whose ``cancel_batch`` and ``await_drain`` both succeed, and verifies that
    DELETE returns 200 with a terminal batch status and no running items.
    """
    # Seed an in-flight batch (same shape as the chaos tests use).
    batch = await _seed_synthetic_batch(
        store=sample_store,
        products=sample_products_3,
        sales_rows=sample_sales_120,
        start_inflight=True,
    )
    # Register a CancelHandle that reports both cancel and drain as successful.
    cancel_handle = AsyncMock()
    cancel_handle.cancel_batch.return_value = True
    cancel_handle.await_drain.return_value = True
    runner_registry.register(batch_id=batch.batch_id, handle=cancel_handle)
    # Exercise the cancel endpoint.
    resp = await client.delete(f"/batch/{batch.batch_id}")
    assert resp.status_code == 200
    body = resp.json()
    # Terminal batch status is either fully cancelled or partially completed.
    assert body["status"] in ("cancelled", "partial")
    # All items should have drained; none are still running.
    assert body["running_items"] == 0
    # The result summary should include the effective max parallelism used.
    summary = body.get("result_summary") or {}
    assert "effective_max_parallel" in summary
    assert isinstance(summary["effective_max_parallel"], int)
    assert summary["effective_max_parallel"] >= 1
```
To make this compile and pass, you will also need to:
1. **Import helpers and mocks at the top of this file (if not already present):**
- `from httpx import AsyncClient` (already likely present).
- `from unittest.mock import AsyncMock`.
- `from app.features.batch.tests.test_runner_chaos import _seed_synthetic_batch` (or the correct module path for `_seed_synthetic_batch`).
- Whatever provides `Store`, `Product`, and `runner_registry` if not already imported/declared in this file or `conftest.py`.
2. **Ensure `_seed_synthetic_batch` supports a `start_inflight=True` (or equivalent) flag:**
- It should create a batch in the “in-flight” / “running” state instead of marking it completed.
- It must return an object with a `batch_id` attribute that matches what `DELETE /batch/{batch_id}` expects.
3. **Ensure the runner registry API matches the usage in the test:**
- Provide a `runner_registry` fixture that yields the registry used by the cancel route.
- Implement `runner_registry.register(batch_id=..., handle=...)` or adapt the call in the test to your actual API (e.g., `runner_registry[batch.batch_id] = cancel_handle` or `runner_registry.add(batch.batch_id, cancel_handle)`).
4. **Align the response JSON fields with your actual schema:**
- If your cancel endpoint uses different field names (e.g., `state` instead of `status`, or `summary` instead of `result_summary`), adjust the assertions in the test accordingly.
- If terminal statuses use different strings (e.g., `"canceled"` vs `"cancelled"`, `"partially_completed"` vs `"partial"`), update the `status` assertion to use the exact values your API returns.
</issue_to_address>
### Comment 3
<location path="PRPs/PRP-34-batch-parallel-execution.md" line_range="554" />
<code_context>
+ test_cancel_pending_child_marks_cancelled_without_running
+ - max_parallel=1, 3 items. After first starts, cancel event fires.
+ - Assert items 2 and 3 transition pending → cancelled, never opened a session.
+ test_cancel_running_child_propagates_cancellederror
+ - One child sleeps 1s; cancel after 0.05s. Child observes CancelledError, finally block writes cancelled.
+
</code_context>
<issue_to_address>
**issue (typo):** The test name here likely has a typo in `cancellederror` and should match the usual `CancelledError` spelling.
For consistency with the rest of the document (where you use `asyncio.CancelledError`), please rename this to something like `test_cancel_running_child_propagates_cancelled_error` so the `CancelledError` portion is spelled correctly and easier to find via search.
```suggestion
test_cancel_running_child_propagates_cancelled_error
```
</issue_to_address>
### Comment 4
<location path="app/features/batch/runner.py" line_range="122" />
<code_context>
+ total_items=len(item_ids), max_parallel=max_parallel,
+ effective_max_parallel=effective)
+
+ async def _child(item_id: str) -> None:
+ # FAST-CANCEL BEFORE acquire — skips not-yet-started work cleanly.
+ if handle.cancel_event.is_set():
</code_context>
<issue_to_address>
**issue (complexity):** Consider reusing a single per-child AsyncSession and passing it into the helper functions instead of having each helper create its own session and transaction.
You can reduce complexity and boilerplate without changing behavior by reusing a single `AsyncSession` per child and passing it into the helpers, instead of having each helper open its own session/transaction.
That gives you:
- fewer moving parts (no per-helper `async with session_maker()`)
- more obvious “per child, per DB session” semantics
- a simpler mental model for state transitions
Concretely:
1. Create a per-child session in `_child` and pass it to the helpers:
```python
async def _child(item_id: str) -> None:
    async with session_maker() as session:
        if handle.cancel_event.is_set():
            await _mark_cancelled_skipped(session, item_id)
            return
        acquired = False
        try:
            async with sem:
                acquired = True
                if handle.cancel_event.is_set():
                    await _mark_cancelled_skipped(session, item_id)
                    return
                await _bump_running(session, batch_id, +1)
                try:
                    await execute_item(item_id)
                except asyncio.CancelledError:
                    await _mark_cancelled_running(session, item_id)
                    raise
                except Exception:
                    logger.exception(
                        "batch.runner_unexpected_child_error",
                        batch_id=batch_id,
                        item_id=item_id,
                    )
                    await _mark_failed_unexpected(session, item_id)
                finally:
                    await _bump_running(session, batch_id, -1)
        except asyncio.CancelledError:
            if not acquired:
                await _mark_cancelled_skipped(session, item_id)
            raise
```
2. Change helpers to accept an `AsyncSession` instead of a `session_maker`, and drop their internal `async with` + `commit` boilerplate. For example:
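```python
from datetime import UTC, datetime

from sqlalchemy import update
from sqlalchemy.ext.asyncio import AsyncSession

# BatchJob, BatchJobItem and BatchItemStatus are the batch feature's existing
# models/enums (import path not shown in the review).


async def _bump_running(
    session: AsyncSession,
    batch_id: str,
    delta: int,
) -> None:
    # In-SQL increment, so concurrent children don't overwrite each other.
    await session.execute(
        update(BatchJob)
        .where(BatchJob.batch_id == batch_id)
        .values(running_items=BatchJob.running_items + delta)
    )
    await session.commit()
```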
```python
async def _mark_cancelled_skipped(
    session: AsyncSession,
    item_id: str,
) -> None:
    now = datetime.now(UTC)
    await session.execute(
        update(BatchJobItem)
        .where(BatchJobItem.item_id == item_id)
        .values(
            status=BatchItemStatus.CANCELLED.value,
            completed_at=now,
        )
    )
    await session.commit()
```
```python
async def _mark_cancelled_running(
    session: AsyncSession,
    item_id: str,
) -> None:
    from sqlalchemy import select

    now = datetime.now(UTC)
    row = (
        await session.execute(
            select(BatchJobItem.started_at).where(BatchJobItem.item_id == item_id)
        )
    ).first()
    started_at = row[0] if row is not None else None
    duration_ms = (
        int((now - started_at).total_seconds() * 1000)
        if started_at is not None
        else None
    )
    await session.execute(
        update(BatchJobItem)
        .where(BatchJobItem.item_id == item_id)
        .values(
            status=BatchItemStatus.CANCELLED.value,
            completed_at=now,
            duration_ms=duration_ms,
        )
    )
    await session.commit()
```
```python
async def _mark_failed_unexpected(
    session: AsyncSession,
    item_id: str,
) -> None:
    now = datetime.now(UTC)
    await session.execute(
        update(BatchJobItem)
        .where(BatchJobItem.item_id == item_id)
        .values(
            status=BatchItemStatus.FAILED.value,
            completed_at=now,
            error_message="Runner caught unexpected exception (see structlog)",
            error_type="UnexpectedRunnerError",
        )
    )
    await session.commit()
```
This keeps all existing behavior (including per-item commits and the same cancellation semantics) while making the control flow and DB interaction substantially easier to follow and maintain.
</issue_to_address>
Six review-driven cleanups, behaviour preserved:

- runner.py: helpers now accept the per-child `AsyncSession` instead of the `session_maker`. One session is opened at the top of `_child` and reused for every state-transition write (each helper still commits its own UPDATE so the `running_items` counter is observable to concurrent DELETE handlers).
- runner.py: clarifying comment on the `except TimeoutError:` branch in `await_drain`. `asyncio.TimeoutError` has been an alias of the built-in `TimeoutError` since Python 3.11 (per the asyncio docs), and the project pins >= 3.12.
- frontend/src/components/ui/slider.tsx: switched from the bundled `radix-ui` package to per-component `@radix-ui/react-slider` to match the project's existing shadcn primitives (alert-dialog, dialog, etc.).
- app/features/batch/models.py + routes.py: `TERMINAL_BATCH_STATES` is now derived from `VALID_BATCH_TRANSITIONS` (a status with no out-edges is terminal) and exported from the models module; routes.py imports it instead of redeclaring the set.
- frontend/src/types/api.ts: exports `TERMINAL_BATCH_STATES` so batch.tsx (and any future consumer) reads from the single source of truth.
- test_routes_cancel.py: added `test_delete_200_clean_drain` covering the documented 200 happy path.
- Typo: `test_cancel_running_child_propagates_cancellederror` → `test_cancel_running_child_propagates_cancelled_error` (PRP doc updated).
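The "no out-edges means terminal" rule from the reply above can be sketched as follows. `BatchStatus` and the transition edges here are a hypothetical stand-in for the project's enum and `VALID_BATCH_TRANSITIONS`; only the derivation rule comes from the reply.

```python
from enum import Enum


class BatchStatus(str, Enum):
    # Hypothetical member set standing in for the project's BatchStatus.
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    PARTIAL = "partial"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumed edges, for illustration only.
VALID_BATCH_TRANSITIONS: dict[BatchStatus, frozenset[BatchStatus]] = {
    BatchStatus.PENDING: frozenset({BatchStatus.RUNNING, BatchStatus.CANCELLED}),
    BatchStatus.RUNNING: frozenset(
        {BatchStatus.COMPLETED, BatchStatus.PARTIAL, BatchStatus.FAILED, BatchStatus.CANCELLED}
    ),
    BatchStatus.COMPLETED: frozenset(),
    BatchStatus.PARTIAL: frozenset(),
    BatchStatus.FAILED: frozenset(),
    BatchStatus.CANCELLED: frozenset(),
}

# A status with no outgoing edges is terminal, so adding a state or edge to the
# map updates this set automatically instead of drifting from a hand-kept copy.
TERMINAL_BATCH_STATES: frozenset[BatchStatus] = frozenset(
    status for status, targets in VALID_BATCH_TRANSITIONS.items() if not targets
)
```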
Summary
Implements PRP-34: activates the three forward-compat columns PRP-33 shipped on `batch_job` (`max_parallel`, `running_items`, `cancelled_items`) by routing `BatchService.submit` through a new `app/features/batch/runner.py` (single `asyncio.Semaphore` inside an `asyncio.TaskGroup`, per-child `AsyncSession`, cooperative cancellation via per-batch `asyncio.Event` + tracked `Task` refs).

Adds `DELETE /batch/{batch_id}` (200 settled / 404 / 409 / 504 RFC 7807 via the new `GatewayTimeoutError`), a max-parallel `Slider` + cancel `AlertDialog` on `frontend/src/pages/visualize/batch.tsx`, and two new settings (`BATCH_GLOBAL_MAX_PARALLEL=4`, `BATCH_CANCEL_DRAIN_TIMEOUT_SECONDS=30`).

Closes #290.
What's new
- `app/features/batch/runner.py` (new): `_ACTIVE_BATCHES` registry, `CancelHandle`, `run_batch` / `cancel_batch` / `await_drain` / `mark_completed`. The `except* asyncio.CancelledError` catch shape is the PEP-654 form documented in `PRPs/ai_docs/asyncio-taskgroup-cancellation.md`. Children use one shared `async_sessionmaker` (no per-child engine).
- `app/features/batch/service.py`: `submit()` now delegates to `runner.run_batch`; `_settle()` writes `effective_max_parallel` into `result_summary` JSONB (no migration). The existing `_pick_next` / `_execute_item` are kept on the class so downstream PRPs can reuse them.
- `app/features/batch/routes.py`: `DELETE /batch/{batch_id}` with RFC 7807 404 / 409 / 504.
- `app/features/batch/schemas.py`: `BatchSubmitResponse.effective_max_parallel` is a `@computed_field` (legacy rows return 0).
- `app/features/batch/models.py`: `VALID_BATCH_ITEM_TRANSITIONS[RUNNING]` now includes `CANCELLED` so the cooperative-cancel path can write its terminal state truthfully.
- `app/core/{config,exceptions,problem_details}.py`: `Settings.batch_global_max_parallel` / `Settings.batch_cancel_drain_timeout_seconds`, `GatewayTimeoutError`, `ERROR_TYPES["GATEWAY_TIMEOUT"]`.
- `Slider` (added via shadcn MCP `pnpm dlx shadcn@4.7.0 add slider`), `useCancelBatch` hook, `BatchSubmitResponse.{max_parallel, effective_max_parallel}`, `Cancel batch` button + `AlertDialog`, live `running_items` + `parallel` chips.

Why no Alembic migration
PRP-33 already shipped `batch_job.max_parallel` / `running_items` / `cancelled_items` as forward-compat columns. `uv run alembic check` reports "No new upgrade operations detected" on this branch.

Test plan
- `uv run ruff check . && uv run ruff format --check .`: clean
- `uv run mypy app/`: Success, no issues in 291 source files
- `uv run pyright app/`: 0 errors (69 pre-existing warnings, unrelated)
- `uv run pytest -m "not integration"`: 1466 passed
- `uv run pytest -m integration app/features/batch/ tests/`: 15 passed, 3 skipped (docker-stack fixtures, unrelated to PRP-34)
- `uv run alembic check`: No new upgrade operations detected
- `cd frontend && pnpm tsc --noEmit && pnpm lint && pnpm test --run`: 121 frontend tests passing, 0 lint errors

New tests (load-bearing)
- `app/features/batch/tests/test_runner.py`: 9 unit tests including `test_semaphore_caps_concurrency` (catches unbounded fan-out), `test_settings_global_cap_clamps_max_parallel`, `test_cancel_pending_child_marks_cancelled_without_running`, `test_cancel_running_child_propagates_cancellederror`, plus 4 registry-hygiene tests
- `app/features/batch/tests/test_routes_cancel.py`: 3 integration tests for `DELETE /batch/{batch_id}` (404 / 409 / 504)
- `app/features/batch/tests/test_runner_chaos.py`: 2 integration tests asserting no orphaned RUNNING rows post-cancel + parent `running_items=0` post-drain
- `frontend/src/hooks/use-batches.test.ts`: 2 vitest cases for `useCancelBatch` (success path + 409 RFC 7807 surface)

Anti-pattern greps (all clean)
Deviations from the PRP
- `if handle.cancel_event.is_set():` before `async with sem:` but didn't catch the `CancelledError` raised inside `async with sem:` when a pending child is cancelled mid-acquire. Fixed by wrapping with an outer `try/except CancelledError` keyed off an `acquired` flag, so pending children still route to `_mark_cancelled_skipped`. The PRP's load-bearing `test_cancel_pending_child_marks_cancelled_without_running` covers this.
- `VALID_BATCH_ITEM_TRANSITIONS[RUNNING]` extended to include `CANCELLED` (and `test_valid_transitions_dict_item` updated to match); the doc dict was inconsistent with the new cooperative-cancel terminal state.
- `@pytest.mark.integration` (need a live `get_db`); the PRP listed them under "Level 2: Unit Tests" but they exercise the FastAPI dependency chain.
- `4.7.0` fallback (`pnpm dlx shadcn@latest` 5.x silently failed), explicitly anticipated by the PRP.

Summary by Sourcery
Activate bounded parallel execution and cooperative cancellation for batch jobs across backend and frontend.
New Features:
Enhancements:
Documentation: