
Add RL example.#424

Merged
k82cn merged 4 commits into xflops:main from k82cn:rl_example
Apr 28, 2026

Conversation

Contributor

@k82cn commented Apr 28, 2026

No description provided.

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a distributed Reinforcement Learning example using the REINFORCE algorithm and the Flame Runner API, supporting various Gymnasium and MuJoCo environments. It also includes infrastructure updates to optimize Python package installation via uv and improves the dynamic discovery of shared libraries for the environment's LD_LIBRARY_PATH. Feedback focuses on improving the modularity of the RL example by extracting shared components into a separate module to avoid fragile imports during remote execution, and optimizing the performance of the discounted reward calculation.
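For context on the algorithm the example implements: REINFORCE updates the policy by weighting each action's log-probability with the discounted return that followed it. A minimal, framework-free sketch of that objective (plain Python standing in for the PR's torch code; the function name and inputs here are illustrative, not the PR's API):

```python
def reinforce_loss(log_probs, returns):
    """Negative of the REINFORCE objective sum_t log pi(a_t|s_t) * G_t.

    Minimizing this loss raises the probability of actions that were
    followed by high discounted returns. `log_probs` and `returns` are
    per-step floats collected over one episode.
    """
    return -sum(lp * g for lp, g in zip(log_probs, returns))

# Actions followed by larger returns pull harder on the gradient.
loss = reinforce_loss([-0.1, -0.2], [1.75, 1.5])
```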

Comment thread: examples/rl/main.py (outdated)
import numpy as np
import torch

from main import ENV_CONFIGS, create_policy

Severity: medium

The collect_episode function, which is designed to run on remote executors, imports components directly from the main script. This creates a tight coupling and can be fragile, especially with code serialization and remote execution. It's generally better to avoid such imports from the main executable script within a distributed task.

To improve modularity and robustness, I recommend extracting shared components into a separate file (e.g., model.py). This file could contain:

  • EnvConfig dataclass
  • ENV_CONFIGS dictionary
  • DiscretePolicy and ContinuousPolicy classes
  • create_policy function

Both main.py and collect_episode can then import from this new module. This change will make the code cleaner and less prone to issues in a distributed environment.
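The suggested module could look roughly like the sketch below. This is a hypothetical outline only: the real `EnvConfig` fields and policy classes live in the PR's `examples/rl`, and the stubbed `create_policy` stands in for code that constructs torch modules.

```python
# model.py -- hypothetical sketch of the shared module suggested above.
from dataclasses import dataclass


@dataclass
class EnvConfig:
    env_id: str              # Gymnasium environment name, e.g. "CartPole-v1"
    obs_dim: int             # observation vector size
    act_dim: int             # action space size
    continuous: bool = False # True for MuJoCo-style continuous control


# Registry keyed by short name; this entry is illustrative, not the PR's.
ENV_CONFIGS = {
    "cartpole": EnvConfig("CartPole-v1", obs_dim=4, act_dim=2),
}


def create_policy(cfg: EnvConfig):
    """Factory selecting the policy for a config (stubbed here).

    In the real example this would return a DiscretePolicy or
    ContinuousPolicy instance depending on cfg.continuous.
    """
    return {"env": cfg.env_id, "continuous": cfg.continuous}
```

With this layout, both `main.py` and the remotely executed `collect_episode` do `from model import ENV_CONFIGS, create_policy`, so serialization of the task never drags in the executable entry point.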

Comment thread: examples/rl/main.py (outdated), on lines +147 to +151
discounted = []
R = 0
for r in reversed(rewards):
    R = r + gamma * R
    discounted.insert(0, R)

Severity: medium

The current implementation for calculating discounted rewards uses list.insert(0, ...) inside a loop. This has a time complexity of O(n^2), where n is the episode length, which can be inefficient for long episodes.

You can improve the performance to O(n) by pre-allocating a list and filling it in reverse.

Suggested change

- discounted = []
- R = 0
- for r in reversed(rewards):
-     R = r + gamma * R
-     discounted.insert(0, R)
+ discounted = [0.0] * len(rewards)
+ R = 0
+ for i in range(len(rewards) - 1, -1, -1):
+     R = rewards[i] + gamma * R
+     discounted[i] = R
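The O(n) loop can be wrapped as a standalone function and sanity-checked against a hand-computed episode. The function name follows the `compute_discounted_rewards` mentioned in the commit message; its exact signature in the PR is assumed.

```python
def compute_discounted_rewards(rewards, gamma):
    """Back-to-front accumulation of returns: G_t = r_t + gamma * G_{t+1}.

    Fills a pre-allocated list in reverse, so the whole pass is O(n)
    instead of the O(n^2) cost of list.insert(0, ...) per step.
    """
    discounted = [0.0] * len(rewards)
    R = 0.0
    for i in range(len(rewards) - 1, -1, -1):
        R = rewards[i] + gamma * R
        discounted[i] = R
    return discounted

# Three unit rewards with gamma = 0.5:
# G_2 = 1.0, G_1 = 1 + 0.5 * 1.0 = 1.5, G_0 = 1 + 0.5 * 1.5 = 1.75
print(compute_discounted_rewards([1.0, 1.0, 1.0], 0.5))  # [1.75, 1.5, 1.0]
```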

k82cn added 3 commits April 28, 2026 16:49
…mize discounted rewards

- Extract EnvConfig, ENV_CONFIGS, DiscretePolicy, ContinuousPolicy, create_policy into model.py
- Update collect_episode to import from model instead of main (fixes fragile distributed imports)
- Optimize compute_discounted_rewards from O(n²) to O(n) using pre-allocated list
- Update README.md to document new file structure

Addresses PR review feedback from gemini-code-assist
Fixes build error: 'Multiple top-level modules discovered in a flat-layout'
after adding model.py alongside main.py
…ilures

- Add '|| true' to mkdir commands to prevent failure with 'set -e'
- Add '-r' flag to xargs to not run if input is empty
- Add empty line check in while loop
- Redirect xargs stderr to /dev/null

@codecov (bot) commented Apr 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@k82cn merged commit 1535ca2 into xflops:main Apr 28, 2026
7 checks passed
@k82cn deleted the rl_example branch April 28, 2026 09:21