fix(ops): expose processing/cancelled statuses through API and UI #1231

Merged
nicoloboschi merged 4 commits into main from fix/expose-processing-cancelled-status on Apr 23, 2026

@nicoloboschi
Collaborator

Summary

  • Stop collapsing processing into pending in list/get operation API responses — the real DB status is now returned as-is
  • Cancel sets status='cancelled' instead of deleting the row, with a guard that only pending operations can be cancelled (409 otherwise)
  • Retry accepts both failed and cancelled operations
  • UI: added processing/cancelled status badges, filters, and aligned stats card colors/labels with the operations table
  • UI: added cancel/retry buttons in the operation detail dialog
  • DB migration: adds cancelled to the async_operations_status_check constraint
  • OpenAPI spec and all client SDKs (Python, TypeScript, Go, Rust) regenerated
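
The cancel and retry rules in the summary can be sketched as a small state check. This is a minimal illustration, not the code from this PR: the `Operation` and `Conflict` names are hypothetical, `Conflict` stands in for the HTTP 409 response, and resetting a retried operation to `pending` is an assumption.

```python
from dataclasses import dataclass


class Conflict(Exception):
    """Hypothetical stand-in for an HTTP 409 Conflict response."""


@dataclass
class Operation:
    id: str
    status: str


def cancel(op: Operation) -> Operation:
    # Only pending operations may be cancelled; anything else is a conflict.
    if op.status != "pending":
        raise Conflict(f"cannot cancel operation in status {op.status!r}")
    # The record is kept and marked cancelled rather than deleted.
    op.status = "cancelled"
    return op


def retry(op: Operation) -> Operation:
    # Retry is accepted from both failed and cancelled.
    if op.status not in ("failed", "cancelled"):
        raise Conflict(f"cannot retry operation in status {op.status!r}")
    # Assumption: a retried operation is queued again as pending.
    op.status = "pending"
    return op
```

Under these assumptions, `cancel(Operation("a", "processing"))` raises `Conflict`, while a cancelled operation can be passed to `retry` and comes back as `pending`.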

Test plan

  • 9 new API tests in test_operation_status.py covering list, get, filter, cancel→cancelled, retry from cancelled, and status validation guards
  • All 10 existing test_op_cancellation.py tests pass (cancel checkpoint, cascade delete, etc.)
  • Lint passes
  • Manual verification of UI status badges and dialog buttons

The API was collapsing 'processing' into 'pending' before returning
operation status to clients. Cancel was deleting the operation row
instead of preserving it with a 'cancelled' status.

- Stop mapping processing→pending in list/get operation responses
- Add 'processing' to OperationStatusResponse Literal type
- Change cancel_operation to set status='cancelled' instead of DELETE
- Guard cancel to only accept pending operations (409 otherwise)
- Extend retry to accept both failed and cancelled operations
- Add _check_op_alive support for cancelled status
- Add DB migration for 'cancelled' in status check constraint
- Add processing/cancelled badges and filters in operations UI
- Add cancel/retry buttons in operation detail dialog
- Align stats card status colors and labels with operations table
- Regenerate OpenAPI spec and all client SDKs
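
To make the first two items concrete, here is a hedged before/after sketch of the status mapping. The function names are hypothetical, and the `Literal` lists only the statuses named in this PR; the real `OperationStatusResponse` type may include others.

```python
from typing import Literal, get_args

# Only the statuses mentioned in this PR; the real type may list more.
OperationStatus = Literal["pending", "processing", "failed", "cancelled"]


def status_before(db_status: str) -> str:
    # Old behaviour: processing was reported to clients as pending.
    return "pending" if db_status == "processing" else db_status


def status_after(db_status: OperationStatus) -> OperationStatus:
    # New behaviour: the stored status is returned unchanged.
    return db_status


print(status_before("processing"))  # pending
print(status_after("processing"))   # processing
print(get_args(OperationStatus))    # the allowed values as a tuple
```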

@nicoloboschi merged commit 80982da into main on Apr 23, 2026
53 of 54 checks passed
r266-tech added a commit to r266-tech/hindsight that referenced this pull request Apr 23, 2026
r266-tech added a commit to r266-tech/hindsight that referenced this pull request Apr 23, 2026
nicoloboschi added a commit that referenced this pull request Apr 24, 2026
Investigation and fixes for test failures on latest main:

1. test_per_operation_llm_config (2 tests): defaults were hardcoded to 10,
   but #1121 reduced DEFAULT_LLM_MAX_RETRIES to 3. Drive assertions from
   the constant so this tracks future changes automatically.

2. test_sql_schema_safety: #1210 added a docstring on task_backend.py:136
   that said "INSERTed into async_operations", which false-positived the
   unqualified-table regex (INTO+INSERT+bare table). Rephrased the prose.

3. test_memory_engine_execute_task_passes_through_defer_operation: #1231
   made execute_task short-circuit when the async_operations row is
   missing (treat as cancelled). The test created a fresh operation_id
   without inserting a row, so the handler never ran. Insert a pending
   row before execute_task.

4. 4 worker claim_batch / scan tests: assertions were counting total
   claims across the whole DB. test_async_batch_retain.py submits
   pending async_operations without sharing an xdist group, so parallel
   xdist workers polluted each other. Put test_async_batch_retain.py in
   the "worker_tests" group and also scope the worker-test assertions
   to the banks each test created, as defense-in-depth.

5. test_refresh_content_respects_max_tokens: observed ~1.9x over cap
   under Gemini's non-determinism; the 1.5x tolerance was too tight.
   Bumped to 2.5x — still well under the ~20x a "cap ignored" regression
   would produce.
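
The tolerance reasoning in item 5 can be checked with simple arithmetic. This is an illustrative sketch only: the cap value and the `within_cap` helper are hypothetical, and the multipliers are the ones quoted in the commit message.

```python
def within_cap(observed_tokens: int, max_tokens: int, tolerance: float) -> bool:
    # The assertion passes when the output stays under cap * tolerance.
    return observed_tokens <= max_tokens * tolerance


MAX_TOKENS = 1000      # hypothetical cap
flaky_run = 1900       # about 1.9x over the cap, as observed
cap_ignored = 20000    # about 20x, what a "cap ignored" regression would produce

print(within_cap(flaky_run, MAX_TOKENS, 1.5))    # False: old tolerance was too tight
print(within_cap(flaky_run, MAX_TOKENS, 2.5))    # True: new tolerance absorbs the noise
print(within_cap(cap_ignored, MAX_TOKENS, 2.5))  # False: a real regression still fails
```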
nicoloboschi added a commit that referenced this pull request Apr 24, 2026
* fix(tests): repair 9 regressions surfaced on main

* fix(tests): extend bank-scoped claim filters to 3 more worker tests

CI on the first fix commit surfaced the same cross-file isolation
problem in three additional worker tests. Apply the same bank-scoped
filter pattern so each assertion only counts claims for the bank the
test actually created:

- test_claim_batch_claims_pending_tasks
- test_concurrent_workers_claim_different_tasks
- test_worker_slot_limits_enforced (in this one the executor itself
  ignores leaked tasks so its slot-limit gating stays on our tasks)

These flake under parallel xdist because claim_batch() is global
across bank_id; any pending row from another test file gets scooped
up. The per-test filter is defense-in-depth on top of putting
test_async_batch_retain.py in the same xdist_group.
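
The bank-scoped filter pattern might look roughly like the following. The `Claim` type and `claims_for_bank` helper are hypothetical stand-ins; the real tests operate on whatever `claim_batch()` returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    operation_id: str
    bank_id: str


def claims_for_bank(claimed: list[Claim], bank_id: str) -> list[Claim]:
    # Keep only the claims that belong to the bank this test created.
    return [c for c in claimed if c.bank_id == bank_id]


# A global claim can pick up rows leaked from another test file.
claimed = [
    Claim("op-1", "bank-mine"),
    Claim("op-2", "bank-other"),  # pending row from a parallel test
    Claim("op-3", "bank-mine"),
]

mine = claims_for_bank(claimed, "bank-mine")
assert len(mine) == 2  # counts only this test's tasks
```

Asserting on `len(mine)` rather than `len(claimed)` keeps the count stable no matter what other test files have queued.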

* fix(tests): isolate more slot/executor worker tests from cross-file claims

test-api CI after the previous fix surfaced four more worker tests
flaking the same way: they assert on counts that include tasks the
poller legitimately claims from other test files running in parallel.

Same bank-scoped filter pattern applied in the executor, plus the
poller-internal counter assertions relaxed to >= (our executor
returns immediately for non-our-bank tasks, but the counter may see
them briefly before the slot frees).

Covers:
- test_worker_fire_and_forget_nonblocking
- test_consolidation_slots_reserved_when_retain_saturates
- test_per_operation_slot_reservations (multi-bank variant)
- test_shared_pool_usable_by_reserved_types (preemptive)

* fix(ui): remove unnecessary \- escape in parseBucketIso regexes

ESLint's no-useless-escape flags \- inside a character class when the
dash is not between two chars. Move the dash to the boundary so it's
always a literal without needing an escape.

Pre-existing on main (introduced by #1245); surfaced when verify-
generated-files started exercising this lint path again after #1248.
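
The dash placement rule is not specific to JavaScript. As an illustration, the same equivalence holds in Python's `re` module. The character class below is hypothetical and is not the actual `parseBucketIso` pattern.

```python
import re

# Dash escaped in the middle of the class (the form the lint rule flags in JS).
escaped = re.compile(r"[\d\-T:]+")
# Dash moved to the end of the class, where it is literal without an escape.
boundary = re.compile(r"[\dT:-]+")

sample = "2026-04-23T10:00"
print(bool(escaped.fullmatch(sample)))   # True
print(bool(boundary.fullmatch(sample)))  # True
```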

* chore: sync generated files with committed sources

verify-generated-files was failing because main's committed copies of
two generated/auto-formatted files have drifted from what the scripts
and ruff now produce:

- hindsight-api-slim/hindsight_api/db_url.py: ruff format now collapses
  a 2-line list comprehension to 1 line (long-line threshold).
- skills/hindsight-docs/references/developer/configuration.md: the
  doc-skill generator emits the Cohere output_dimensions entry that
  #1249 added to configuration.md but didn't regenerate the skill copy.

Not functional changes — just aligning the committed outputs with the
generators/formatters.

* fix(tests): isolate test_recall_time_range hardcoded-UUID fixture

This file inserts memory_units with three hardcoded UUIDs
(00000000-…-000{1,2,3}). memory_units.id is a global primary key, so
parallel xdist workers running these tests simultaneously hit
pk_memory_units uniqueness violations (seen intermittently in
test-api CI as fixture-setup ERRORs).

Two defenses:
- Share an xdist_group so the eight tests serialize on the same
  worker — prevents concurrent workers from inserting the same IDs.
- Defensive pre-DELETE at fixture setup so a previous interrupted
  run's leftover rows don't poison the next setup.

Flake, not a regression from this branch, but surfaces here so
fixing it unblocks the PR.
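
The pre-DELETE defense described above can be sketched with the standard-library `sqlite3` module as a stand-in for the real database. The table layout, the placeholder IDs, and the `setup_fixture` name are assumptions for illustration, not the fixture from the test file.

```python
import sqlite3

# Placeholders for the three hardcoded UUIDs used by the real fixture.
FIXTURE_IDS = ["fixture-id-1", "fixture-id-2", "fixture-id-3"]


def setup_fixture(conn: sqlite3.Connection) -> None:
    placeholders = ",".join("?" for _ in FIXTURE_IDS)
    # Defensive pre-DELETE: clear leftovers from a previously interrupted run.
    conn.execute(
        f"DELETE FROM memory_units WHERE id IN ({placeholders})", FIXTURE_IDS
    )
    conn.executemany(
        "INSERT INTO memory_units (id) VALUES (?)", [(i,) for i in FIXTURE_IDS]
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memory_units (id TEXT PRIMARY KEY)")

setup_fixture(conn)
setup_fixture(conn)  # simulates setup after leftover rows; no primary key error

count = conn.execute("SELECT COUNT(*) FROM memory_units").fetchone()[0]
print(count)  # 3
```

The pre-DELETE only handles stale rows from earlier runs. It does not protect against two workers inserting at the same time, which is why the shared xdist group is still needed.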

* fix(tests): filter claims in test_poller_without_tenant_extension_uses_public

One more worker test that asserted len(claimed) == 3 without scoping
to its own bank; scope the assertion to bank_id. Keeps the schema-None
invariant on every claim since no tenant extension is configured.
nicoloboschi pushed a commit that referenced this pull request Apr 24, 2026
* docs(ops): document processing + cancelled statuses from #1231

* docs(skills): mirror operations.md status update from #1231