Skip to content

feat: ReplayBuffer example.#441

Merged
k82cn merged 2 commits intoxflops:mainfrom
k82cn:rb_example
May 9, 2026
Merged

feat: ReplayBuffer example.#441
k82cn merged 2 commits intoxflops:mainfrom
k82cn:rb_example

Conversation

@k82cn
Copy link
Copy Markdown
Contributor

@k82cn k82cn commented May 9, 2026

No description provided.

Signed-off-by: Klaus Ma <klausm@nvidia.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a distributed reinforcement learning replay buffer example that leverages Flame's patch_object API for efficient data synchronization. It includes a collector service, a replay buffer implementation with custom deserialization, and updated gRPC configurations in both the Rust cache server and Python SDK to support larger message sizes. Review feedback identified potential security risks in setting unlimited message sizes, suggesting a 2GB cap instead to prevent resource exhaustion, and pointed out a mathematical inaccuracy in the average reward calculation within the example script.

Comment thread object_cache/src/cache.rs
Comment on lines +1288 to +1292
.add_service(
FlightServiceServer::new(server)
.max_decoding_message_size(usize::MAX)
.max_encoding_message_size(usize::MAX),
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

Setting the maximum message size to usize::MAX effectively disables all size limits. While necessary for large reinforcement learning buffers, using an unlimited value can expose the server to Out-Of-Memory (OOM) issues or denial-of-service attacks if malformed or excessively large packets are received. It is safer to use a very large but finite limit (e.g., 2GB).

Suggested change
.add_service(
FlightServiceServer::new(server)
.max_decoding_message_size(usize::MAX)
.max_encoding_message_size(usize::MAX),
)
.add_service(
FlightServiceServer::new(server)
.max_decoding_message_size(2 * 1024 * 1024 * 1024)
.max_encoding_message_size(2 * 1024 * 1024 * 1024),
)

Comment on lines +268 to +271
GRPC_OPTIONS = [
("grpc.max_send_message_length", -1),
("grpc.max_receive_message_length", -1),
]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

Using -1 for gRPC message length options disables the limits entirely. Similar to the server-side change, it is recommended to use a large but bounded value to prevent potential stability issues or memory exhaustion.

Suggested change
GRPC_OPTIONS = [
("grpc.max_send_message_length", -1),
("grpc.max_receive_message_length", -1),
]
GRPC_OPTIONS = [
("grpc.max_send_message_length", 2 * 1024 * 1024 * 1024),
("grpc.max_receive_message_length", 2 * 1024 * 1024 * 1024),
]

Comment thread examples/rl/replay_buffer/main.py Outdated
stats = buffer_svc.state().get()
total_size = stats["size"]
total_added = stats["total_added"]
avg_reward = sum(r["avg_reward"] for r in collect_results) / num_collections
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This line calculates an average of averages, which can be mathematically incorrect if the number of episodes completed by each collector varies. A more accurate approach is to calculate a weighted average based on the total episodes seen by each collector.

Suggested change
avg_reward = sum(r["avg_reward"] for r in collect_results) / num_collections
total_episodes = sum(r["episode_count"] for r in collect_results)
avg_reward = sum(r["avg_reward"] * r["episode_count"] for r in collect_results) / max(1, total_episodes)

@codecov
Copy link
Copy Markdown

codecov Bot commented May 9, 2026

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
object_cache/src/cache.rs 0.00% 3 Missing ⚠️

📢 Thoughts on this report? Let us know!

- Use weighted average for avg_reward calculation based on episode count
- Add codecov.yml to disable patch coverage check while keeping project coverage
@k82cn k82cn merged commit 56579ec into xflops:main May 9, 2026
6 of 7 checks passed
@k82cn k82cn deleted the rb_example branch May 9, 2026 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant