Skip to content

feat(rpc): implement WatchNode bidirectional streaming service (RFE370)#371

Merged
k82cn merged 1 commit into
xflops:mainfrom
k82cn:flm_370
Mar 13, 2026
Merged

feat(rpc): implement WatchNode bidirectional streaming service (RFE370)#371
k82cn merged 1 commit into
xflops:mainfrom
k82cn:flm_370

Conversation

@k82cn
Copy link
Copy Markdown
Contributor

@k82cn k82cn commented Mar 13, 2026

  • Add WatchNode gRPC service with bidirectional streaming for executor status updates
  • Implement WatchRegistry in session_manager for subscription management
  • Add stream_handler in executor_manager for handling streaming connections
  • Remove polling mode, enforce streaming-only communication
  • Refactor to send Executor directly in WatchNodeResponse (remove ExecutorAction)
  • Add server-side heartbeat timeout to prevent resource leaks
  • Fix client-side StreamHandler to handle errors internally as self-recovering task
  • Add comprehensive test plan and design documentation

Fixes #370

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the real-time capabilities of the system by introducing a new bidirectional streaming gRPC service. This service replaces an inefficient polling mechanism, aiming to drastically reduce latency and server load by pushing updates to clients as they occur. The architectural shift involves comprehensive changes across both the backend and client components, focusing on a more reactive and efficient communication model for node and executor status. The design prioritizes a state-based approach for updates, empowering clients to intelligently process information and adapt to changes.

Highlights

  • New Bidirectional Streaming Service: Implemented a WatchNode gRPC service with bidirectional streaming to provide real-time executor status updates, replacing the previous polling mechanism.
  • Streaming Enforcement: Removed the polling mode entirely, making streaming the mandatory communication method for all clients to ensure low latency and reduced server load.
  • State-Based Executor Updates: Refactored the WatchNodeResponse to send raw Executor objects directly, allowing clients to derive actions (create, update, delete) by comparing the received state with their local cache, rather than relying on an ExecutorAction enum.
  • Client-Side Stream Management: Introduced a StreamHandler in the executor_manager to manage the client-side aspects of the streaming connection, including registration, periodic heartbeats, and robust reconnection logic with exponential backoff.
  • Server-Side Stream Registry: Implemented a WatchRegistry in the session_manager to efficiently manage active WatchNode subscriptions and dispatch executor lifecycle notifications to relevant connected nodes.
Changelog
  • docs/designs/RFE17-watch-node/CHANGELOG.md
    • Added a changelog document for the RFE17 WatchNode streaming service, detailing design decisions and changes.
    • Initial HLD document (FS.md) for WatchNode streaming service.
    • Removed separate backend.proto.proposed file; proto definitions merged into FS.md.
    • Updated FS.md to follow standard design template.
    • Removed polling mode and configuration; streaming is now mandatory for all clients.
    • Clarified state-based design: HLD now explicitly states that raw Executor objects are returned (no ExecutorAction enum). Client derives actions from executor state.
    • Bidirectional streaming chosen to allow client heartbeats.
    • Heartbeat mechanism: 5-second interval with 15-second timeout for connection health.
    • Mandatory streaming: Removed polling fallback to simplify architecture and enforce low latency.
    • Reconnection strategy: Exponential backoff for stream recovery.
    • State-based notifications: Server sends raw Executor objects instead of action-wrapped messages.
    • Created RFE17-watch-node design directory.
    • Initial design document with motivation, gRPC streaming API specification, architecture diagrams, sequence diagrams, data structure definitions, and use cases.
Activity
  • The pull request introduces a new feature (feat(rpc)) for implementing the WatchNode bidirectional streaming service, as indicated by the title.
  • The author k82cn has provided a detailed description outlining the key changes and their purpose.
  • New design and test plan documents (docs/designs/RFE17-watch-node/FS.md, docs/designs/RFE17-watch-node/TEST_PLAN.md) have been added, indicating thorough planning and testing considerations.
  • Core RPC definitions have been updated (rpc/protos/backend.proto) to include the new WatchNode service and its associated messages.
  • The executor_manager client has been updated (executor_manager/src/client.rs, executor_manager/src/main.rs, executor_manager/src/manager.rs, executor_manager/src/stream_handler.rs) to integrate with the new streaming API, including a new StreamHandler module.
  • The session_manager backend has been significantly refactored (session_manager/src/apiserver/backend.rs, session_manager/src/apiserver/mod.rs, session_manager/src/controller/mod.rs, session_manager/src/model/mod.rs, session_manager/src/model/watch.rs, session_manager/src/storage/mod.rs, session_manager/src/storage/node_tests.rs) to implement the WatchNode service, manage stream registrations, and notify clients of executor changes.
  • New test files (session_manager/src/storage/node_tests.rs) have been added to verify node status update and data preservation logic related to heartbeats.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant and well-designed feature to replace polling with a bidirectional gRPC streaming service (WatchNode) for executor status updates. The changes are comprehensive, including design documents, a test plan, and modifications across the client and server to adopt the new streaming approach. The implementation of WatchRegistry for managing streams and the refactoring of the ExecutorManager to be event-driven are well-executed. However, I've identified two high-severity issues: a potential resource leak on the server due to a missing stream timeout, and a bug in the client's async task handling that will cause it to block indefinitely. Addressing these will be crucial for the stability and correctness of the new service.

Comment thread executor_manager/src/manager.rs Outdated
Comment thread session_manager/src/apiserver/backend.rs Outdated
…s#370)

- Add WatchNode gRPC service with bidirectional streaming for executor status updates
- Implement WatchRegistry in session_manager for subscription management
- Add stream_handler in executor_manager for handling streaming connections
- Remove polling mode, enforce streaming-only communication
- Refactor to send Executor directly in WatchNodeResponse (remove ExecutorAction)
- Add server-side heartbeat timeout to prevent resource leaks
- Fix client-side StreamHandler to handle errors internally as self-recovering task
- Add comprehensive test plan and design documentation

Fixes xflops#370

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
@k82cn k82cn changed the title feat(rpc): implement WatchNode bidirectional streaming service (RFE17) feat(rpc): implement WatchNode bidirectional streaming service (RFE370) Mar 13, 2026
@k82cn k82cn merged commit 7e2ab2e into xflops:main Mar 13, 2026
5 checks passed
@k82cn k82cn deleted the flm_370 branch March 13, 2026 11:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Introduce watch_node to sync up node/executors

1 participant