fix(clp-package): Prevent query scheduler from resetting job start time during sub-job dispatch (fixes #1874).#1875
Conversation
…me during sub-job dispatch (fixes y-scope#1874).
WalkthroughThe change modifies Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py`:
- Around line 629-631: Add a brief inline comment above the conditional
assignment that sets new_job.start_time only when it's None to explain that this
preserves the original dispatch start_time across recursive/subsequent calls to
dispatch_job_and_update_db and prevents overwriting when the DB CAS check
(prev_status=QueryJobStatus.PENDING) has already advanced the job; target the
block using the symbols start_time, new_job.start_time, and
dispatch_job_and_update_db so maintainers understand the in-memory vs DB
consistency rationale.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR `#1152` discussion.
📚 Learning: 2024-11-15T20:07:22.256Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 569
File: components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py:380-392
Timestamp: 2024-11-15T20:07:22.256Z
Learning: The current implementation assumes single-threaded execution, so race conditions in functions like `is_target_extracted` in `components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py` are not currently a concern.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: package-image
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: check-generated
- GitHub Check: build (macos-15)
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| start_time = datetime.datetime.now() | ||
| new_job.start_time = start_time | ||
| if new_job.start_time is None: | ||
| new_job.start_time = start_time |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
LGTM! Correctly preserves start_time for sub-job dispatches.
The conditional assignment ensures the original start_time is retained when dispatch_job_and_update_db is called again for subsequent sub-job batches. The database update on line 638 remains unchanged, but since the CAS condition (prev_status=QueryJobStatus.PENDING) prevents overwrites after the first dispatch, this fix properly aligns the in-memory state with the intended database behaviour.
Consider adding a brief comment for future maintainers:
📝 Suggested comment
start_time = datetime.datetime.now()
+ # Only set start_time on first dispatch; preserve it for sub-job batches.
if new_job.start_time is None:
new_job.start_time = start_time🧰 Tools
🪛 Ruff (0.14.11)
629-629: datetime.datetime.now() called without a tz argument
(DTZ005)
🤖 Prompt for AI Agents
In
`@components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py`
around lines 629 - 631, Add a brief inline comment above the conditional
assignment that sets new_job.start_time only when it's None to explain that this
preserves the original dispatch start_time across recursive/subsequent calls to
dispatch_job_and_update_db and prevents overwriting when the DB CAS check
(prev_status=QueryJobStatus.PENDING) has already advanced the job; target the
block using the symbols start_time, new_job.start_time, and
dispatch_job_and_update_db so maintainers understand the in-memory vs DB
consistency rationale.
Description
This PR fixes Issue #1874.
While initially reported in the WebUI, this bug affects any large search (that doesn't write to a network address directly) on a dataset with more than 16 archives.
Cause:
The issue stems from how the QueryScheduler handles sub-jobs for queries that require a reducer.
NetworkAddress Queries: For a query like
search.sh '*'that involves 85 archives, all tasks are dispatched in a single batch, and a start time is set when dispatching that batch.Reducer/WebUI/ResultsCache Queries: When a reducer is required (e.g.,
search.sh '*' --count), the scheduler breaks the query into batches (sub-jobs) of 16 archives. codeThe Bug: To dispatch reducer queries, the scheduler incorrectly reuses the simple query dispatching method
dispatch_job_and_update_db. Inside this method, the job'sstart_timeis overwritten withdatetime.now()every time a new batch is dispatched.The
start_timein database remains unaffected due to the CAS-like operation that only updates rows in aPENDINGstate.Fix:
There were several ways to approach this fix, such as implementing a dedicated dispatch function for sub-jobs, or setting the time when constructing a QueryJob
In this PR, we modify the existing dispatch function to ensure the start_time is only set if it has not been assigned previously. This mimics the CAS operation on the database.
Checklist
breaking change.
Validation performed
Duration is correctly stored in the database. WebUI displays the duration correctly.
Summary by CodeRabbit
Release Notes
✏️ Tip: You can customize this high-level summary in your review settings.