feat(worker): add priority-based consolidation bank scheduling by nicoloboschi · Pull Request #1813 · vectorize-io/hindsight

nicoloboschi · 2026-05-28T10:32:12Z

Summary

- Adds the HINDSIGHT_API_WORKER_CONSOLIDATION_BANK_PRIORITY env var to control which banks' consolidation tasks are claimed first when a slot opens
- Prevents large banks (e.g., 486K-node shadow-meetings) from being starved by many small banks cycling through limited global consolidation slots
- Uses tiered claiming: each priority level is a separate index-friendly query (no JOINs or computed ORDER BY). Number of queries = number of distinct priority tiers (typically 2). Zero overhead when unset
- Pattern wildcards are supported (e.g., shadow-*:10,staging-*:5,*:1), using LIKE ANY / NOT LIKE ALL in PostgreSQL, expanded to OR/AND clauses for Oracle
- Bank serialization (max 1 concurrent consolidation per bank) is preserved regardless of priority
- Backwards compatible: when unset, behavior is identical to the current ORDER BY created_at
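As a rough illustration of the tiered claiming described above, the sketch below turns parsed (pattern, priority) pairs into one WHERE fragment per tier for PostgreSQL. It is not the code from this PR: the `bank_id` column name, the `$1`/`$2` placeholder style, the helper names, and the rule that a tier excludes every pattern from higher tiers are all assumptions made for the example.

```python
def to_like(pattern: str) -> str:
    """Convert a '*' wildcard pattern to a LIKE pattern, escaping LIKE metacharacters."""
    escaped = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return escaped.replace("*", "%")


def tier_predicates(priorities: list[tuple[str, int]]) -> list[tuple[int, str, list]]:
    """Return (priority, sql_fragment, params) per tier, highest priority first.

    Hypothetical helper: column name and placeholder style are assumptions.
    """
    by_priority: dict[int, list[str]] = {}
    for pattern, priority in priorities:
        by_priority.setdefault(priority, []).append(to_like(pattern))

    tiers = []
    higher: list[str] = []  # LIKE patterns already covered by higher tiers
    for priority in sorted(by_priority, reverse=True):
        patterns = by_priority[priority]
        sql = "bank_id LIKE ANY($1)"
        params = [patterns]
        if higher:
            # Keep banks that belong to a higher tier out of this one,
            # e.g. so the catch-all '%' does not re-match shadow-* banks.
            sql += " AND bank_id NOT LIKE ALL($2)"
            params.append(list(higher))
        tiers.append((priority, sql, params))
        higher.extend(patterns)
    return tiers
```

A worker would then run one claim query per tier, in order, and stop at the first tier that returns a task, so the number of queries is bounded by the number of distinct priorities.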

Design decision: priority queues vs per-bank slot reservation

The original issue proposed per-bank slot allocation (SLOTS_PER_BANK). We chose priority-based scheduling instead because:

- Consolidation batches are short-lived (capped at 100 memories, then reschedule), so the real problem is scheduling order, not slot reservation
- Priority queues waste no capacity: when a high-priority bank has no pending work, other banks use the slots freely
- This matches standard distributed task system patterns (Celery, SQS, Sidekiq)
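The "no wasted capacity" argument can be seen in a small in-memory model of the claim step. This is only a sketch under stated assumptions, not the worker's implementation: `claim_next`, the `(bank_id, created_at)` task tuples, and the use of `fnmatchcase` in place of SQL LIKE are all hypothetical, and bank names are assumed not to contain `?` or `[`.

```python
from fnmatch import fnmatchcase


def claim_next(pending, busy_banks, tiers):
    """Pick the next task to claim, or None if nothing is claimable.

    pending:    list of (bank_id, created_at) tuples
    busy_banks: set of bank ids that already have a running consolidation
    tiers:      list of (priority, [patterns]), highest priority first
    """
    if not tiers:
        # Priority unset: a single catch-all tier, i.e. plain created_at order.
        tiers = [(0, ["*"])]

    higher = []  # patterns owned by tiers that were already tried
    for _priority, patterns in tiers:
        candidates = [
            task for task in pending
            if task[0] not in busy_banks  # bank serialization still applies
            and any(fnmatchcase(task[0], p) for p in patterns)
            and not any(fnmatchcase(task[0], p) for p in higher)
        ]
        if candidates:
            # Within a tier, the oldest task wins.
            return min(candidates, key=lambda task: task[1])
        higher.extend(patterns)
    return None
```

With tiers `[(10, ["shadow-*"]), (1, ["*"])]`, a pending shadow-meetings task is claimed ahead of older small-bank tasks, but when shadow-meetings is busy or has nothing pending, the same slot falls through to the oldest small-bank task instead of sitting idle.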

Test plan

- 8 unit tests for config parsing (TestParseBankPriority)
- 5 integration tests against real PostgreSQL (TestConsolidationBankPriority):
  - High-priority bank claimed before low-priority despite newer created_at
  - Wildcard pattern matching (shadow-* matches shadow-meetings, shadow-people)
  - Catch-all default for unlisted banks
  - Backwards compat when priority unset
  - Priority respects bank serialization (busy high-priority bank still excluded)
- All 80 existing worker tests pass with no regressions
- Lint passes

Add HINDSIGHT_API_WORKER_CONSOLIDATION_BANK_PRIORITY env var to control which banks' consolidation tasks are claimed first when a slot opens. This prevents large banks from being starved by many small banks cycling through limited global consolidation slots.

Format: comma-separated bank-pattern:priority pairs (higher = claimed first). Patterns support * wildcards; bare * is the catch-all default. Example: "shadow-*:10,staging-*:5,*:1"

Implementation uses tiered claiming: each priority level is a separate index-friendly query, with no JOINs or computed ORDER BY. Bank serialization (max 1 concurrent consolidation per bank) is preserved.
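For readers who want to see the format concretely, here is a minimal parsing sketch. It is an illustration only and may differ from the PR's `_parse_bank_priority`: the function name, return shape, and error messages below are assumptions. The `from None` on the re-raise mirrors the idea of suppressing the chained exception mentioned in the follow-up commit.

```python
def parse_bank_priority(spec: str) -> list[tuple[str, int]]:
    """Parse 'pattern:priority,...' into (pattern, priority) pairs, highest first.

    Hypothetical helper; an empty spec yields an empty list (feature disabled).
    """
    pairs: list[tuple[str, int]] = []
    for raw in spec.split(","):
        entry = raw.strip()
        if not entry:
            continue  # tolerate empty input and trailing commas
        # Split on the last ':' so the priority is always the final field.
        pattern, sep, prio = entry.rpartition(":")
        pattern = pattern.strip()
        if not sep or not pattern:
            raise ValueError(f"expected 'pattern:priority', got {entry!r}")
        try:
            priority = int(prio)
        except ValueError:
            # 'from None' hides the inner int() traceback from the user.
            raise ValueError(f"priority must be an integer in {entry!r}") from None
        pairs.append((pattern, priority))
    # sorted() is stable, so equal priorities keep their written order.
    return sorted(pairs, key=lambda pair: -pair[1])
```

For example, `parse_bank_priority("shadow-*:10,staging-*:5,*:1")` returns `[('shadow-*', 10), ('staging-*', 5), ('*', 1)]`.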

nicoloboschi added 2 commits May 28, 2026 12:31

fix: suppress chained exception in _parse_bank_priority

cb7ed2c

nicoloboschi merged commit cf63779 into main May 28, 2026
71 of 72 checks passed

nicoloboschi merged 2 commits into main from feat/consolidation-bank-priority
