Add RL example. #424
Conversation
Code Review
This pull request introduces a distributed Reinforcement Learning example using the REINFORCE algorithm and the Flame Runner API, supporting various Gymnasium and MuJoCo environments. It also includes infrastructure updates that optimize Python package installation via uv and improve the dynamic discovery of shared libraries for the environment's LD_LIBRARY_PATH. Feedback focuses on improving the modularity of the RL example by extracting shared components into a separate module, which avoids fragile imports during remote execution, and on optimizing the performance of the discounted-reward calculation.
```python
import numpy as np
import torch

from main import ENV_CONFIGS, create_policy
```
The `collect_episode` function, which is designed to run on remote executors, imports components directly from the main script. This creates a tight coupling and can be fragile, especially with code serialization and remote execution. It's generally better to avoid such imports from the main executable script within a distributed task.

To improve modularity and robustness, I recommend extracting shared components into a separate file (e.g., `model.py`). This file could contain:

- the `EnvConfig` dataclass
- the `ENV_CONFIGS` dictionary
- the `DiscretePolicy` and `ContinuousPolicy` classes
- the `create_policy` function

Both `main.py` and `collect_episode` can then import from this new module, as sketched below. This change will make the code cleaner and less prone to issues in a distributed environment.
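A minimal sketch of what such a `model.py` could look like. The module, class, and function names follow the review suggestion; the config fields, environment entries, and network internals are illustrative assumptions, not the PR's actual code:

```python
# model.py -- shared components importable by both main.py and remote tasks.
# Sketch only: field names, env entries, and network sizes are assumptions.
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class EnvConfig:
    env_name: str
    obs_dim: int
    action_dim: int
    continuous: bool = False


# Illustrative entries; the real dictionary would cover the PR's environments.
ENV_CONFIGS = {
    "CartPole-v1": EnvConfig("CartPole-v1", obs_dim=4, action_dim=2),
    "HalfCheetah-v4": EnvConfig("HalfCheetah-v4", obs_dim=17, action_dim=6,
                                continuous=True),
}


class DiscretePolicy(nn.Module):
    def __init__(self, obs_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, action_dim)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Returns action logits; callers sample via Categorical(logits=...).
        return self.net(obs)


class ContinuousPolicy(nn.Module):
    def __init__(self, obs_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.mu = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, action_dim)
        )
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, obs: torch.Tensor):
        # Returns mean and std of a Gaussian action distribution.
        return self.mu(obs), self.log_std.exp()


def create_policy(config: EnvConfig) -> nn.Module:
    cls = ContinuousPolicy if config.continuous else DiscretePolicy
    return cls(config.obs_dim, config.action_dim)
```

With this layout, `collect_episode` only needs `from model import ENV_CONFIGS, create_policy`, so the serialized task never references the `__main__` module.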
```python
discounted = []
R = 0
for r in reversed(rewards):
    R = r + gamma * R
    discounted.insert(0, R)
```
The current implementation for calculating discounted rewards uses `list.insert(0, ...)` inside a loop. This has a time complexity of O(n²), where n is the episode length, which can be inefficient for long episodes.

You can improve the performance to O(n) by pre-allocating the list and filling it in reverse.
Suggested change:

```diff
-discounted = []
-R = 0
-for r in reversed(rewards):
-    R = r + gamma * R
-    discounted.insert(0, R)
+discounted = [0.0] * len(rewards)
+R = 0
+for i in range(len(rewards) - 1, -1, -1):
+    R = rewards[i] + gamma * R
+    discounted[i] = R
```
…mize discounted rewards

- Extract EnvConfig, ENV_CONFIGS, DiscretePolicy, ContinuousPolicy, create_policy into model.py
- Update collect_episode to import from model instead of main (fixes fragile distributed imports)
- Optimize compute_discounted_rewards from O(n²) to O(n) using a pre-allocated list
- Update README.md to document the new file structure

Addresses PR review feedback from gemini-code-assist
Fixes build error: 'Multiple top-level modules discovered in a flat-layout' after adding model.py alongside main.py
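The commit message does not show the fix itself. For context, setuptools raises this error when auto-discovery finds more than one top-level module; one common resolution (which may or may not be the one used here) is to list the modules explicitly in pyproject.toml:

```toml
# Hypothetical snippet: declares both top-level modules so setuptools
# auto-discovery no longer has to guess the project layout.
[tool.setuptools]
py-modules = ["main", "model"]
```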
…ilures

- Add '|| true' to mkdir commands to prevent failure with 'set -e'
- Add '-r' flag to xargs so it does not run if input is empty
- Add an empty-line check in the while loop
- Redirect xargs stderr to /dev/null
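Taken together, those four hardening patterns look roughly like this; the search root, glob, and cache path below are placeholders, not the PR's actual script:

```bash
set -e

# '|| true' keeps 'set -e' from aborting when mkdir fails (for example,
# because the directory already exists).
mkdir /tmp/lib-cache || true

# Collect directories containing shared libraries.
find /opt -name 'lib*.so*' 2>/dev/null | while IFS= read -r lib; do
    # Empty-line check: skip blank lines in the stream.
    [ -z "$lib" ] && continue
    dirname "$lib"
done | sort -u > /tmp/lib-cache/dirs.txt

# 'xargs -r' runs nothing on empty input (instead of running the command
# once with zero arguments); redirecting stderr keeps noise out of logs.
xargs -r -n1 echo < /tmp/lib-cache/dirs.txt 2>/dev/null
```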
Codecov Report

✅ All modified and coverable lines are covered by tests.