fix(ops): expose processing/cancelled statuses through API and UI #1231
Merged
nicoloboschi merged 4 commits into main on Apr 23, 2026
Conversation
The API was collapsing 'processing' into 'pending' before returning operation status to clients, and cancel was deleting the operation row instead of preserving it with a 'cancelled' status.

- Stop mapping processing→pending in list/get operation responses
- Add 'processing' to the OperationStatusResponse Literal type
- Change cancel_operation to set status='cancelled' instead of DELETE
- Guard cancel to only accept pending operations (409 otherwise)
- Extend retry to accept both failed and cancelled operations
- Add _check_op_alive support for cancelled status
- Add DB migration for 'cancelled' in the status check constraint
- Add processing/cancelled badges and filters in the operations UI
- Add cancel/retry buttons in the operation detail dialog
- Align stats card status colors and labels with the operations table
- Regenerate the OpenAPI spec and all client SDKs
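A minimal sketch of the new cancel semantics, assuming a hypothetical in-memory store; Operation, CancelConflict, and this cancel_operation signature are illustrative, not the project's actual API:

```python
from dataclasses import dataclass


class CancelConflict(Exception):
    """Raised for non-pending operations; maps to HTTP 409 in the endpoint."""


@dataclass
class Operation:
    id: str
    status: str  # pending | processing | completed | failed | cancelled


def cancel_operation(ops: dict[str, Operation], op_id: str) -> Operation:
    """Cancel a pending operation, preserving the row instead of deleting it."""
    op = ops[op_id]
    if op.status != "pending":
        # Guard: anything past pending can no longer be cancelled.
        raise CancelConflict(f"cannot cancel operation in status {op.status!r}")
    op.status = "cancelled"  # row survives with a terminal status
    return op
```

Preserving the row is what makes the extended retry possible: a 'cancelled' (or 'failed') operation can later be flipped back to 'pending'.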
r266-tech added a commit to r266-tech/hindsight that referenced this pull request on Apr 23, 2026
nicoloboschi added a commit that referenced this pull request on Apr 24, 2026
Investigation and fixes for test failures on latest main:

1. test_per_operation_llm_config (2 tests): defaults were hardcoded to 10, but #1121 reduced DEFAULT_LLM_MAX_RETRIES to 3. Drive assertions from the constant so this tracks future changes automatically.
2. test_sql_schema_safety: #1210 added a docstring on task_backend.py:136 that said "INSERTed into async_operations", which false-positived the unqualified-table regex (INTO+INSERT+bare table). Rephrased the prose.
3. test_memory_engine_execute_task_passes_through_defer_operation: #1231 made execute_task short-circuit when the async_operations row is missing (treated as cancelled). The test created a fresh operation_id without inserting a row, so the handler never ran. Insert a pending row before execute_task.
4. 4 worker claim_batch / scan tests: assertions were counting total claims across the whole DB. test_async_batch_retain.py submits pending async_operations without sharing an xdist group, so parallel xdist workers polluted each other. Put test_async_batch_retain.py in the "worker_tests" group and also scope the worker-test assertions to the banks each test created, as defense in depth.
5. test_refresh_content_respects_max_tokens: observed ~1.9x over cap under Gemini's non-determinism; the 1.5x tolerance was too tight. Bumped to 2.5x, still well under the ~20x that a "cap ignored" regression would produce.
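The bank-scoped assertion pattern in item 4 can be sketched as follows; the claim shape and helper name are illustrative, not the project's real test code:

```python
def count_own_claims(claimed: list[dict], own_bank_id: str) -> int:
    """Count only the claims that belong to this test's bank."""
    return sum(1 for task in claimed if task["bank_id"] == own_bank_id)


# claim_batch() is global across bank_id, so a parallel test file's pending
# row can show up in our claimed batch; the scoped count ignores it.
claimed = [
    {"bank_id": "bank-a", "operation": "retain"},
    {"bank_id": "bank-b", "operation": "retain"},  # leaked from another file
    {"bank_id": "bank-a", "operation": "recall"},
]
assert count_own_claims(claimed, "bank-a") == 2
```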
nicoloboschi added a commit that referenced this pull request on Apr 24, 2026
* fix(tests): repair 9 regressions surfaced on main

Investigation and fixes for test failures on latest main:

1. test_per_operation_llm_config (2 tests): defaults were hardcoded to 10, but #1121 reduced DEFAULT_LLM_MAX_RETRIES to 3. Drive assertions from the constant so this tracks future changes automatically.
2. test_sql_schema_safety: #1210 added a docstring on task_backend.py:136 that said "INSERTed into async_operations", which false-positived the unqualified-table regex (INTO+INSERT+bare table). Rephrased the prose.
3. test_memory_engine_execute_task_passes_through_defer_operation: #1231 made execute_task short-circuit when the async_operations row is missing (treated as cancelled). The test created a fresh operation_id without inserting a row, so the handler never ran. Insert a pending row before execute_task.
4. 4 worker claim_batch / scan tests: assertions were counting total claims across the whole DB. test_async_batch_retain.py submits pending async_operations without sharing an xdist group, so parallel xdist workers polluted each other. Put test_async_batch_retain.py in the "worker_tests" group and also scope the worker-test assertions to the banks each test created, as defense in depth.
5. test_refresh_content_respects_max_tokens: observed ~1.9x over cap under Gemini's non-determinism; the 1.5x tolerance was too tight. Bumped to 2.5x, still well under the ~20x that a "cap ignored" regression would produce.

* fix(tests): extend bank-scoped claim filters to 3 more worker tests

CI on the first fix commit surfaced the same cross-file isolation problem in three additional worker tests.
Apply the same bank-scoped filter pattern so each assertion only counts claims for the bank the test actually created:

- test_claim_batch_claims_pending_tasks
- test_concurrent_workers_claim_different_tasks
- test_worker_slot_limits_enforced (in this one the executor itself ignores leaked tasks, so its slot-limit gating stays on our tasks)

These flake under parallel xdist because claim_batch() is global across bank_id; any pending row from another test file gets scooped up. The per-test filter is defense in depth on top of putting test_async_batch_retain.py in the same xdist_group.

* fix(tests): isolate more slot/executor worker tests from cross-file claims

test-api CI after the previous fix surfaced four more worker tests flaking the same way: they assert on counts that include tasks the poller legitimately claims from other test files running in parallel. The same bank-scoped filter pattern is applied in the executor, and the poller-internal counter assertions are relaxed to >= (our executor returns immediately for tasks outside our bank, but the counter may see them briefly before the slot frees). Covers:

- test_worker_fire_and_forget_nonblocking
- test_consolidation_slots_reserved_when_retain_saturates
- test_per_operation_slot_reservations (multi-bank variant)
- test_shared_pool_usable_by_reserved_types (preemptive)

* fix(ui): remove unnecessary \- escape in parseBucketIso regexes

ESLint's no-useless-escape flags \- inside a character class when the dash is not between two chars. Move the dash to the boundary so it is always a literal without needing an escape. Pre-existing on main (introduced by #1245); surfaced when verify-generated-files started exercising this lint path again after #1248.
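The character-class rule behind that lint fix is language-agnostic: a dash at the start or end of a class is always literal. A small Python illustration (these patterns are made up, not the actual parseBucketIso regexes):

```python
import re

# \- inside a character class is a useless escape: the dash is literal
# whenever it cannot form a range. Moving it to the boundary makes that
# explicit, which satisfies ESLint's no-useless-escape in the JS version.
escaped = re.compile(r"[0-9\-:]")   # escaped dash mid-class
boundary = re.compile(r"[0-9:-]")   # dash at the boundary, no escape needed

sample = "2026-04-23T10:30"
assert escaped.findall(sample) == boundary.findall(sample)
```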
* chore: sync generated files with committed sources

verify-generated-files was failing because main's committed copies of two generated/auto-formatted files had drifted from what the scripts and ruff now produce:

- hindsight-api-slim/hindsight_api/db_url.py: ruff format now collapses a 2-line list comprehension to 1 line (long-line threshold).
- skills/hindsight-docs/references/developer/configuration.md: the doc-skill generator emits the Cohere output_dimensions entry that #1249 added to configuration.md but did not regenerate into the skill copy.

Not functional changes, just aligning the committed outputs with the generators/formatters.

* fix(tests): isolate test_recall_time_range hardcoded-UUID fixture

This file inserts memory_units with three hardcoded UUIDs (00000000-…-000{1,2,3}). memory_units.id is a global primary key, so parallel xdist workers running these tests simultaneously hit pk_memory_units uniqueness violations (seen intermittently in test-api CI as fixture-setup ERRORs). Two defenses:

- Share an xdist_group so the eight tests serialize on the same worker, preventing concurrent workers from inserting the same IDs.
- Defensive pre-DELETE at fixture setup so a previous interrupted run's leftover rows do not poison the next setup.

A flake, not a regression from this branch, but it surfaces here, so fixing it unblocks the PR.

* fix(tests): filter claims in test_poller_without_tenant_extension_uses_public

One more worker test asserted len(claimed) == 3 without scoping to its own bank; scope the assertion to bank_id. Keeps the schema-None invariant on every claim, since no tenant extension is configured.
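The defensive pre-DELETE can be sketched against a toy in-memory table; the dict-backed "db" and the insert helper modeling the pk_memory_units constraint are illustrative only (the xdist_group marker is the other, pytest-level defense):

```python
FIXED_IDS = [
    "00000000-0000-0000-0000-000000000001",
    "00000000-0000-0000-0000-000000000002",
    "00000000-0000-0000-0000-000000000003",
]


def insert(db: dict, uid: str) -> None:
    """Toy stand-in for an INSERT guarded by a primary-key constraint."""
    if uid in db:
        raise KeyError(f"pk_memory_units violation: {uid}")
    db[uid] = {"id": uid}


def seed_memory_units(db: dict) -> list[str]:
    for uid in FIXED_IDS:
        db.pop(uid, None)  # defensive pre-DELETE of interrupted-run leftovers
        insert(db, uid)
    return FIXED_IDS


# Re-seeding over a half-populated table no longer hits the pk violation:
leftover = {FIXED_IDS[0]: {"id": FIXED_IDS[0]}}
seed_memory_units(leftover)
assert sorted(leftover) == sorted(FIXED_IDS)
```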
nicoloboschi pushed a commit that referenced this pull request on Apr 24, 2026
Summary
- Stop collapsing processing into pending in list/get operation API responses; the real DB status is now returned as-is
- Cancel sets status='cancelled' instead of deleting the row, with a guard that only pending operations can be cancelled (409 otherwise)
- Retry accepts both failed and cancelled operations
- Add processing/cancelled status badges and filters, and align stats card colors/labels with the operations table
- DB migration adds cancelled to the async_operations_status_check constraint

Test plan
- New test_operation_status.py covering list, get, filter, cancel→cancelled, retry from cancelled, and status validation guards
- Existing test_op_cancellation.py tests pass (cancel checkpoint, cascade delete, etc.)
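The cancel/retry rules exercised by these tests can be summarized as a small transition guard; the status literals match the PR, while the helper names are illustrative:

```python
from typing import Literal

OperationStatus = Literal[
    "pending", "processing", "completed", "failed", "cancelled"
]

CANCELLABLE = {"pending"}            # cancel guard: 409 for anything else
RETRYABLE = {"failed", "cancelled"}  # retry now also accepts cancelled


def can_cancel(status: str) -> bool:
    return status in CANCELLABLE


def can_retry(status: str) -> bool:
    return status in RETRYABLE
```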