feat(huntsman): Add protobuf definition and Rust binding for scheduler service; Implement gRPC scheduler client for execution manager.#342

Merged
LinZhihao-723 merged 31 commits into y-scope:main from sitaowang1998:grpc-scheduler Jun 26, 2026

Conversation

@sitaowang1998 sitaowang1998 commented Jun 14, 2026

Collaborator

Description

This PR:

  • Adds a gRPC proto definition for the scheduler service used by the execution manager.
  • Moves shared proto types to a common proto file.
  • Adds a gRPC scheduler client to the execution manager.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • GitHub workflows pass.

Summary by CodeRabbit

  • New Features
    • Added scheduler gRPC support, including heartbeat and shutdown.
    • Extended task polling to include previous assignment context.
  • Changes
    • Introduced shared protobuf types for task IDs and empty responses, applied across scheduler and storage APIs.
    • Expanded scheduler assignment responses to include scheduler ID and session details for follow-up storage calls.
  • Tests
    • Updated conversion/packing coverage for the new protobuf contracts and response unpacking.
  • Chores
    • Regenerated protobufs to include the new common and scheduler definitions.

@sitaowang1998 sitaowang1998 requested a review from a team as a code owner June 14, 2026 03:53
@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds shared common and scheduler protobuf definitions, migrates storage messages to shared types, updates proto-rust generation and conversions, and introduces a gRPC scheduler client in the execution manager.

Changes

Scheduler gRPC client and shared proto types

Layer: common.proto and scheduler.proto contracts
File(s): components/spider-proto/common/common.proto, components/spider-proto/scheduler/scheduler.proto
Summary: Introduces shared TaskId/Void messages and extends SchedulerService with NextTask, Heartbeat, and Shutdown plus their request and response messages.

Layer: storage.proto migration to shared types
File(s): components/spider-proto/storage/storage.proto, components/spider-storage/src/grpc.rs, components/spider-scheduler/src/storage_client/grpc.rs
Summary: Imports common.proto, replaces local storage TaskId and Void usage with shared types, removes the local declarations, updates the storage gRPC adapter signatures, and updates the scheduler storage client test input.

Layer: proto-rust exports, conversions, and unpacking
File(s): components/spider-proto-rust/build.rs, components/spider-proto-rust/src/lib.rs, components/spider-proto-rust/src/id.rs, components/spider-proto-rust/src/error.rs, components/spider-proto-rust/src/unpack/*
Summary: Expands proto generation, exports the new generated modules, switches TaskId conversions to common::TaskId, adds shared unpack helpers, adds ResponseUnpack, and updates scheduler response unpacking.

Layer: GrpcSchedulerClient implementation and wiring
File(s): components/spider-core/src/types/scheduler.rs, components/spider-execution-manager/src/client.rs, components/spider-execution-manager/src/client/scheduler.rs, components/spider-execution-manager/src/client/grpc/mod.rs, components/spider-execution-manager/src/client/grpc/scheduler.rs, components/spider-execution-manager/src/client/grpc/storage.rs
Summary: Introduces the shared SchedulerResponse type, re-exports scheduler client types, adds GrpcSchedulerClient and its gRPC call handling, and updates storage client task ID construction to the shared protobuf type.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • y-scope/spider#331: It changes the core TaskId shape that this PR’s common.TaskId conversion and unpacking code targets.
  • y-scope/spider#350: It consumes the new scheduler gRPC surface added here, including NextTask, Heartbeat, Shutdown, and TaskAssignmentRecord.
  • y-scope/spider#354: It touches the same storage gRPC client request construction paths that are updated here to use shared protobuf task IDs.

Suggested reviewers

  • LinZhihao-723

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Title check: ✅ Passed. The title matches the main change: scheduler protobufs and the execution manager gRPC scheduler client.
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🧹 Nitpick comments (1)
components/spider-execution-manager/src/client/grpc/scheduler.rs (1)

121-184: ⚡ Quick win

Consider adding test coverage for the NoTask variant and malformed task ID conversion.

The existing tests cover assignment success, missing result, and missing task ID, but two code paths remain untested:

  1. The NoTask variant (line 105) that returns Ok(None)
  2. The error path when TaskId::try_from fails due to a malformed task ID (e.g., common::TaskId { kind: None })

Adding these tests would improve confidence in the protocol validation logic.

📋 Suggested additional tests
+    #[test]
+    fn scheduler_response_to_result_returns_none_for_no_task() {
+        let response = NextTaskResponse {
+            result: Some(next_task_response::Result::NoTask(common::Void {})),
+        };
+
+        let result = scheduler_response_to_result(response)
+            .expect("scheduler response conversion should succeed");
+
+        assert!(result.is_none());
+    }
+
+    #[test]
+    fn scheduler_response_to_result_rejects_malformed_task_id() {
+        let response = NextTaskResponse {
+            result: Some(next_task_response::Result::Assignment(
+                SchedulerAssignment {
+                    job_id: 11,
+                    task_id: Some(common::TaskId { kind: None }),
+                    resource_group_id: 13,
+                    session_id: 17,
+                },
+            )),
+        };
+
+        let result = scheduler_response_to_result(response);
+
+        assert!(result.is_err());
+    }
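
The four outcomes discussed in this comment (assignment, NoTask, missing result, missing or malformed task ID) can be sketched in isolation. This is a minimal sketch, not the PR's code: every type below is a hand-written stand-in for the generated protobuf messages, and `TaskIdKind::Index`, `ProtoTaskId`, `Assignment`, and the `UnpackError` variants are hypothetical names chosen for illustration.

```rust
use std::convert::TryFrom;

// Stand-ins for the generated protobuf messages. The real generated
// code is not shown in this thread and may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum TaskIdKind {
    Index(u64), // hypothetical variant, for illustration only
}

#[derive(Debug, Clone, PartialEq)]
pub struct ProtoTaskId {
    pub kind: Option<TaskIdKind>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct SchedulerAssignment {
    pub job_id: u64,
    pub task_id: Option<ProtoTaskId>,
    pub resource_group_id: u64,
    pub session_id: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub enum NextTaskResult {
    Assignment(SchedulerAssignment),
    NoTask,
}

#[derive(Debug, Clone, PartialEq)]
pub struct NextTaskResponse {
    pub result: Option<NextTaskResult>,
}

// Hypothetical error type covering the three rejection paths.
#[derive(Debug, PartialEq)]
pub enum UnpackError {
    MissingResult,
    MissingTaskId,
    MalformedTaskId,
}

// Validated, domain-side task ID (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TaskId(pub u64);

impl TryFrom<ProtoTaskId> for TaskId {
    type Error = UnpackError;

    fn try_from(id: ProtoTaskId) -> Result<Self, Self::Error> {
        match id.kind {
            Some(TaskIdKind::Index(index)) => Ok(TaskId(index)),
            // A task ID message with no `kind` set is malformed.
            None => Err(UnpackError::MalformedTaskId),
        }
    }
}

#[derive(Debug, PartialEq)]
pub struct Assignment {
    pub job_id: u64,
    pub task_id: TaskId,
}

// Maps a response to `Ok(Some(..))` for an assignment, `Ok(None)` for
// NoTask, and an error for any protocol violation.
pub fn scheduler_response_to_result(
    response: NextTaskResponse,
) -> Result<Option<Assignment>, UnpackError> {
    match response.result.ok_or(UnpackError::MissingResult)? {
        NextTaskResult::NoTask => Ok(None),
        NextTaskResult::Assignment(a) => {
            let proto_id = a.task_id.ok_or(UnpackError::MissingTaskId)?;
            let task_id = TaskId::try_from(proto_id)?;
            Ok(Some(Assignment { job_id: a.job_id, task_id }))
        }
    }
}
```

Keeping the NoTask case as `Ok(None)` rather than an error lets the caller tell "nothing to do yet" apart from a broken response, which is the distinction the two suggested tests exercise.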
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/spider-proto/common/common.proto`:
- Line 3: The proto compilation is failing because Buf cannot locate the module
root for the proto files in components/spider-proto/. To fix this, create a
buf.yaml configuration file in the components/spider-proto/ directory to declare
it as a Buf module, or alternatively create a buf.work.yaml at the repository
root to include components/spider-proto/ as a workspace module. This will allow
Buf to properly resolve proto imports and enable downstream files to
successfully import common.TaskId and common.Void from the common package
declaration.

---

Nitpick comments:
In `@components/spider-execution-manager/src/client/grpc/scheduler.rs`:
- Around line 121-184: Add two new test cases to the tests module in
scheduler.rs to improve coverage of untested code paths. First, create a test
that verifies scheduler_response_to_result correctly returns Ok(None) when the
NoTask variant is provided in the response result. Second, create a test that
verifies scheduler_response_to_result returns an error when TaskId::try_from
fails due to a malformed task ID (such as a common::TaskId with kind set to
None). Both tests should follow the naming convention and assertion patterns of
the existing test functions like
scheduler_response_to_result_rejects_missing_result and
scheduler_response_to_result_rejects_empty_assignment_task_id.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c217c818-67d2-4373-98f1-3da1177d00fb

📥 Commits

Reviewing files that changed from the base of the PR and between 42bbdb6 and c370dea.

⛔ Files ignored due to path filters (3)
  • components/spider-proto-rust/src/generated/common.rs is excluded by !**/generated/**
  • components/spider-proto-rust/src/generated/scheduler.rs is excluded by !**/generated/**
  • components/spider-proto-rust/src/generated/storage.rs is excluded by !**/generated/**
📒 Files selected for processing (11)
  • components/spider-execution-manager/src/client/grpc/mod.rs
  • components/spider-execution-manager/src/client/grpc/scheduler.rs
  • components/spider-execution-manager/src/client/grpc/storage.rs
  • components/spider-proto-rust/build.rs
  • components/spider-proto-rust/src/error.rs
  • components/spider-proto-rust/src/id.rs
  • components/spider-proto-rust/src/lib.rs
  • components/spider-proto/common/common.proto
  • components/spider-proto/scheduler/scheduler.proto
  • components/spider-proto/storage/storage.proto
  • components/spider-scheduler/src/storage_client/grpc.rs

Comment thread components/spider-proto/common/common.proto

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/spider-execution-manager/src/client/grpc/scheduler.rs`:
- Around line 54-60: The loop starting at line 54 repeatedly sends the same
prev_assignment on every iteration when calling next_task, causing unnecessary
resends of completion records and hot-polling of the scheduler without any
delays. Modify the loop to send prev_assignment only on the first request by
clearing it after the initial send (set it to None after the first iteration),
and add retry pacing by introducing a delay when a NoTask response is received
to prevent the tight unbounded loop from aggressively polling. Apply the same
changes to the similar code referenced at lines 66-69.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0ba65225-0809-409d-b29a-aaff099b460a

📥 Commits

Reviewing files that changed from the base of the PR and between c370dea and aea00b8.

⛔ Files ignored due to path filters (2)
  • components/spider-proto-rust/src/generated/scheduler.rs is excluded by !**/generated/**
  • components/spider-proto-rust/src/generated/storage.rs is excluded by !**/generated/**
📒 Files selected for processing (3)
  • components/spider-execution-manager/src/client/grpc/scheduler.rs
  • components/spider-proto/scheduler/scheduler.proto
  • components/spider-proto/storage/storage.proto

Comment on lines +54 to +60
loop {
    let response = self
        .client
        .clone()
        .next_task(scheduler::NextTaskRequest {
            execution_manager_id: em_id.get(),
            prev_assignment: prev_assignment.map(task_assignment_record_to_protocol),

Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid re-sending prev_assignment on every NoTask retry and add retry pacing.

Line 60 currently reuses prev_assignment inside an unbounded tight loop (Line 54), so a NoTask response can repeatedly send the same completion record and hot-poll the scheduler.

Suggested patch
     ) -> Result<SchedulerResponse, SchedulerError> {
+        let mut prev_assignment = prev_assignment;
         loop {
             let response = self
                 .client
                 .clone()
                 .next_task(scheduler::NextTaskRequest {
                     execution_manager_id: em_id.get(),
-                    prev_assignment: prev_assignment.map(task_assignment_record_to_protocol),
+                    prev_assignment: prev_assignment
+                        .take()
+                        .map(task_assignment_record_to_protocol),
                 })
                 .await
                 .map_err(to_transport_error)?
                 .into_inner();

             if let Some(assignment) = scheduler_response_to_result(response)? {
                 return Ok(assignment);
             }
+            tokio::time::sleep(std::time::Duration::from_millis(100)).await;
         }
     }

Also applies to: 66-69
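
The send-once-then-pace pattern from the suggested patch can be tried outside the client. This is a minimal sketch under stated assumptions: `poll_until_assigned`, its closure parameter, and the `TaskAssignmentRecord` field are hypothetical stand-ins, and it uses the blocking `std::thread::sleep` to stay dependency-free, whereas the async client in the patch would await `tokio::time::sleep`.

```rust
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for the completion record carried by a poll.
#[derive(Debug, Clone, PartialEq)]
pub struct TaskAssignmentRecord {
    pub task_index: u64,
}

// Polls `next_task` until it yields an assignment.
//
// `next_task` models one request: it receives the previous-assignment
// record (if any) and returns `Ok(None)` when no task is available.
pub fn poll_until_assigned<T, E, F>(
    prev_assignment: Option<TaskAssignmentRecord>,
    retry_delay: Duration,
    mut next_task: F,
) -> Result<T, E>
where
    F: FnMut(Option<TaskAssignmentRecord>) -> Result<Option<T>, E>,
{
    let mut prev_assignment = prev_assignment;
    loop {
        // `take()` moves the record out and leaves `None` behind, so
        // only the first request carries it.
        if let Some(assignment) = next_task(prev_assignment.take())? {
            return Ok(assignment);
        }
        // No task available: wait before polling again instead of
        // spinning in a tight loop.
        thread::sleep(retry_delay);
    }
}
```

The sketch returns on the first request error, as the `?` in the patch does. Whether the record should be resent after a failed first request is a separate design question that neither the patch nor this sketch settles.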

Comment thread components/spider-proto/scheduler/scheduler.proto
Comment thread components/spider-proto-rust/src/unpack/mod.rs Outdated
Comment thread components/spider-proto-rust/build.rs Outdated
Comment thread components/spider-execution-manager/src/client/grpc/scheduler.rs Outdated
Comment thread components/spider-execution-manager/src/client/grpc/scheduler.rs Outdated
Comment thread components/spider-proto-rust/src/unpack/scheduler.rs Outdated
Comment thread tests/huntsman/em-runtime/tests/test_runtime.rs Outdated
Comment thread components/spider-execution-manager/src/runtime.rs Outdated

@LinZhihao-723 LinZhihao-723 left a comment

Member

Changes in 38d0fd1: when a new error enum is added, it should follow the same convention as the other enums:

  • I'm OK with #[error("next task response is missing its result")], but I'd still prefer it to be consistent with the others.
  • But #[error("scheduler assignment is missing its task id")] is wrong. The error enum itself is general to a missing task ID: a task ID may be carried by many responses, not just the scheduler assignment. Thus, the error message should not mention the scheduler assignment.
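
To make the second point concrete, here is a minimal sketch of an error enum whose missing-task-ID message names no particular response. The enum name, variant names, and the generic message text are hypothetical. The `#[error(...)]` attributes quoted above suggest a derive macro, but this sketch writes `Display` by hand so it has no dependencies.

```rust
use std::fmt;

// Hypothetical unpack error enum; names are illustrative only.
#[derive(Debug, PartialEq)]
pub enum UnpackError {
    MissingResult,
    MissingTaskId,
}

impl fmt::Display for UnpackError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Message quoted in the review as acceptable.
            UnpackError::MissingResult => {
                f.write_str("next task response is missing its result")
            }
            // Generic wording: any message carrying a task ID can
            // produce this error, so no specific message is named.
            UnpackError::MissingTaskId => f.write_str("task id is missing"),
        }
    }
}

impl std::error::Error for UnpackError {}
```

With the generic wording, the same variant can be returned from any unpacking path that expects a task ID without the message misattributing the failure.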

@LinZhihao-723 LinZhihao-723 changed the title from "feat(spider-grpc): Add gRPC for scheduler service; Add gRPC scheduler client in execution manager." to "feat(huntsman): Add protobuf definition and Rust binding for scheduler service; Implement gRPC scheduler client for execution manager." Jun 26, 2026

@LinZhihao-723 LinZhihao-723 left a comment

Member

I modified the PR title directly. Note that spider-grpc is not a valid field name, since it doesn't correspond to any component in the system.

@LinZhihao-723 LinZhihao-723 merged commit 7af2859 into y-scope:main Jun 26, 2026
14 of 15 checks passed
@sitaowang1998 sitaowang1998 deleted the grpc-scheduler branch June 26, 2026 20:54
2 participants