feat(huntsman): Add protobuf defition and Rust binding for scheduler service; Implement gRPC scheduler client for execution manager.#342
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds shared ChangesScheduler gRPC client and shared proto types
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
components/spider-execution-manager/src/client/grpc/scheduler.rs (1)
121-184: ⚡ Quick winConsider adding test coverage for the NoTask variant and malformed task ID conversion.
The existing tests cover assignment success, missing result, and missing task ID, but two code paths remain untested:
- The
NoTaskvariant (line 105) that returnsOk(None)- The error path when
TaskId::try_fromfails due to a malformed task ID (e.g.,common::TaskId { kind: None })Adding these tests would improve confidence in the protocol validation logic.
📋 Suggested additional tests
+ #[test] + fn scheduler_response_to_result_returns_none_for_no_task() { + let response = NextTaskResponse { + result: Some(next_task_response::Result::NoTask(common::Void {})), + }; + + let result = scheduler_response_to_result(response) + .expect("scheduler response conversion should succeed"); + + assert!(result.is_none()); + } + + #[test] + fn scheduler_response_to_result_rejects_malformed_task_id() { + let response = NextTaskResponse { + result: Some(next_task_response::Result::Assignment( + SchedulerAssignment { + job_id: 11, + task_id: Some(common::TaskId { kind: None }), + resource_group_id: 13, + session_id: 17, + }, + )), + }; + + let result = scheduler_response_to_result(response); + + assert!(result.is_err()); + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@components/spider-execution-manager/src/client/grpc/scheduler.rs` around lines 121 - 184, Add two new test cases to the tests module in scheduler.rs to improve coverage of untested code paths. First, create a test that verifies scheduler_response_to_result correctly returns Ok(None) when the NoTask variant is provided in the response result. Second, create a test that verifies scheduler_response_to_result returns an error when TaskId::try_from fails due to a malformed task ID (such as a common::TaskId with kind set to None). Both tests should follow the naming convention and assertion patterns of the existing test functions like scheduler_response_to_result_rejects_missing_result and scheduler_response_to_result_rejects_empty_assignment_task_id.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@components/spider-proto/common/common.proto`:
- Line 3: The proto compilation is failing because Buf cannot locate the module
root for the proto files in components/spider-proto/. To fix this, create a
buf.yaml configuration file in the components/spider-proto/ directory to declare
it as a Buf module, or alternatively create a buf.work.yaml at the repository
root to include components/spider-proto/ as a workspace module. This will allow
Buf to properly resolve proto imports and enable downstream files to
successfully import common.TaskId and common.Void from the common package
declaration.
---
Nitpick comments:
In `@components/spider-execution-manager/src/client/grpc/scheduler.rs`:
- Around line 121-184: Add two new test cases to the tests module in
scheduler.rs to improve coverage of untested code paths. First, create a test
that verifies scheduler_response_to_result correctly returns Ok(None) when the
NoTask variant is provided in the response result. Second, create a test that
verifies scheduler_response_to_result returns an error when TaskId::try_from
fails due to a malformed task ID (such as a common::TaskId with kind set to
None). Both tests should follow the naming convention and assertion patterns of
the existing test functions like
scheduler_response_to_result_rejects_missing_result and
scheduler_response_to_result_rejects_empty_assignment_task_id.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: c217c818-67d2-4373-98f1-3da1177d00fb
⛔ Files ignored due to path filters (3)
components/spider-proto-rust/src/generated/common.rsis excluded by!**/generated/**components/spider-proto-rust/src/generated/scheduler.rsis excluded by!**/generated/**components/spider-proto-rust/src/generated/storage.rsis excluded by!**/generated/**
📒 Files selected for processing (11)
components/spider-execution-manager/src/client/grpc/mod.rscomponents/spider-execution-manager/src/client/grpc/scheduler.rscomponents/spider-execution-manager/src/client/grpc/storage.rscomponents/spider-proto-rust/build.rscomponents/spider-proto-rust/src/error.rscomponents/spider-proto-rust/src/id.rscomponents/spider-proto-rust/src/lib.rscomponents/spider-proto/common/common.protocomponents/spider-proto/scheduler/scheduler.protocomponents/spider-proto/storage/storage.protocomponents/spider-scheduler/src/storage_client/grpc.rs
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@components/spider-execution-manager/src/client/grpc/scheduler.rs`:
- Around line 54-60: The loop starting at line 54 repeatedly sends the same
prev_assignment on every iteration when calling next_task, causing unnecessary
resends of completion records and hot-polling of the scheduler without any
delays. Modify the loop to send prev_assignment only on the first request by
clearing it after the initial send (set it to None after the first iteration),
and add retry pacing by introducing a delay when a NoTask response is received
to prevent the tight unbounded loop from aggressively polling. Apply the same
changes to the similar code referenced at lines 66-69.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 0ba65225-0809-409d-b29a-aaff099b460a
⛔ Files ignored due to path filters (2)
components/spider-proto-rust/src/generated/scheduler.rsis excluded by!**/generated/**components/spider-proto-rust/src/generated/storage.rsis excluded by!**/generated/**
📒 Files selected for processing (3)
components/spider-execution-manager/src/client/grpc/scheduler.rscomponents/spider-proto/scheduler/scheduler.protocomponents/spider-proto/storage/storage.proto
| loop { | ||
| let response = self | ||
| .client | ||
| .clone() | ||
| .next_task(scheduler::NextTaskRequest { | ||
| execution_manager_id: em_id.get(), | ||
| prev_assignment: prev_assignment.map(task_assignment_record_to_protocol), |
There was a problem hiding this comment.
Avoid re-sending prev_assignment on every NoTask retry and add retry pacing.
Line 60 currently reuses prev_assignment inside an unbounded tight loop (Line 54), so a NoTask response can repeatedly send the same completion record and hot-poll the scheduler.
Suggested patch
) -> Result<SchedulerResponse, SchedulerError> {
+ let mut prev_assignment = prev_assignment;
loop {
let response = self
.client
.clone()
.next_task(scheduler::NextTaskRequest {
execution_manager_id: em_id.get(),
- prev_assignment: prev_assignment.map(task_assignment_record_to_protocol),
+ prev_assignment: prev_assignment
+ .take()
+ .map(task_assignment_record_to_protocol),
})
.await
.map_err(to_transport_error)?
.into_inner();
if let Some(assignment) = scheduler_response_to_result(response)? {
return Ok(assignment);
}
+ tokio::time::sleep(std::time::Duration::from_millis(100)).await;
}
}Also applies to: 66-69
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@components/spider-execution-manager/src/client/grpc/scheduler.rs` around
lines 54 - 60, The loop starting at line 54 repeatedly sends the same
prev_assignment on every iteration when calling next_task, causing unnecessary
resends of completion records and hot-polling of the scheduler without any
delays. Modify the loop to send prev_assignment only on the first request by
clearing it after the initial send (set it to None after the first iteration),
and add retry pacing by introducing a delay when a NoTask response is received
to prevent the tight unbounded loop from aggressively polling. Apply the same
changes to the similar code referenced at lines 66-69.
# Conflicts: # components/spider-proto/storage/storage.proto
LinZhihao-723
left a comment
There was a problem hiding this comment.
Changes in 38d0fd1: when a new error enum is added, it should follow the convention as the other enums:
- I'm ok with
#[error("next task response is missing its result")], but still prefer to be consistent with others. - But
#[error("scheduler assignment is missing its task id")]is wrong. The error enum itself is general to a missing task ID. Task ID may be carried by many responses, not just the scheduler assignment. Thus, the error message should not includescheduler assignment.
LinZhihao-723
left a comment
There was a problem hiding this comment.
Directly modified the PR title. Notice that spider-grpc is not a valid field name since it doesn't correspond to any components in the system.
Description
This PR:
Checklist
breaking change.
Validation performed
Summary by CodeRabbit