[RFC] Routing-aware replay recipe and diagnostics for MoE RL #6805
kaining-never-stop started this conversation in RFC
I prepared a minimal draft implementation in verl-recipe for easier review: verl-project/verl-recipe#114. The draft includes only CPU-testable utilities, a README, `REQUIRED_VERL.txt`, a synthetic demo, and unit tests. It does not modify verl core. I am happy to rename the recipe or reduce the scope if maintainers prefer a different direction.
[RFC] Routing-aware replay recipe and diagnostics for MoE RL
Summary
I would like to propose a new `verl-recipe` entry named `routing_aware_replay` for experimenting with routing-aware replay policies in MoE RL post-training.
The first version is intended to be self-contained and non-invasive. It will provide CPU-testable utilities for Fisher-weighted replay masks, budget-matched replay controls, and routing replay diagnostics. The recipe is designed to complement the existing router replay support in `verl` without changing core trainer behavior.
Motivation
MoE RL post-training often needs to balance two goals:
Existing router replay mechanisms are useful for stabilizing routing behavior, but a uniform replay policy treats every expert the same way, so it cannot distinguish between experts that should be strongly preserved and experts that can be adapted more freely. For research and debugging, it is also useful to separate the effect of the replay budget (how much routing is replayed) from the effect of the replay selection policy (which routing is replayed).
This recipe focuses on that gap. It provides a small set of routing-aware replay utilities and diagnostics so users can compare different replay policies under matched budgets.
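To make "comparing replay policies under matched budgets" concrete, here is a minimal Python sketch of a Fisher-weighted replay mask next to a budget-matched uniform control. It assumes token-level replay masks, one precomputed importance score per expert (for example, a diagonal Fisher estimate), and a "replay the highest-scoring tokens" selection rule. The function names and all of these choices are hypothetical illustrations, not the API of the draft PR or of `verl`.

```python
import random
from typing import List, Sequence


def fisher_weighted_mask(
    routed_experts: Sequence[Sequence[int]],
    expert_fisher: Sequence[float],
    budget: float,
) -> List[bool]:
    """Hypothetical helper: replay the tokens routed to the most important experts.

    routed_experts[t] holds the expert ids chosen for token t at rollout time.
    expert_fisher[e] is an assumed per-expert importance score (e.g. diagonal Fisher).
    budget is the fraction of tokens whose routing is replayed.
    """
    n_tokens = len(routed_experts)
    n_replay = round(budget * n_tokens)
    # Score each token by the summed importance of the experts it was routed to.
    scores = [sum(expert_fisher[e] for e in experts) for experts in routed_experts]
    # sorted() stays stable with reverse=True, so ties resolve by token index.
    ranked = sorted(range(n_tokens), key=lambda t: scores[t], reverse=True)
    chosen = set(ranked[:n_replay])
    return [t in chosen for t in range(n_tokens)]


def budget_matched_uniform_mask(n_tokens: int, budget: float, seed: int = 0) -> List[bool]:
    """Hypothetical control: replay the same number of tokens, chosen uniformly at random."""
    n_replay = round(budget * n_tokens)
    chosen = set(random.Random(seed).sample(range(n_tokens), n_replay))
    return [t in chosen for t in range(n_tokens)]


# Toy example: 4 tokens, top-2 routing over 4 experts.
routed = [[0, 1], [2, 3], [0, 2], [1, 3]]
fisher = [4.0, 1.0, 0.5, 0.1]  # made-up importance scores
fw = fisher_weighted_mask(routed, fisher, budget=0.5)
uni = budget_matched_uniform_mask(len(routed), budget=0.5, seed=1)
print(fw)                   # [True, False, True, False]
print(sum(fw) == sum(uni))  # True
```

Because both masks replay exactly `round(budget * n_tokens)` tokens, any difference in training behavior between them can be attributed to which tokens are selected rather than to how many.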
Proposed location
Repository:
Recipe directory:
The first version does not require changes to `verl` core. If maintainers find some interfaces generally useful, parts of the recipe can later be promoted into `verl` core as separate PRs.
Proposed contents
Scope of the first PR
The first PR will focus on a minimal, reviewable subset:
`REQUIRED_VERL.txt`.
Non-goals
The first PR will not:
`verl` core trainer behavior;
Relationship to existing router replay support
This proposal is not intended to replace existing router replay support. It is intended to provide a recipe-level space for comparing replay selection policies and collecting diagnostics around replay behavior.
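On the diagnostics side, one possible report compares the top-k experts recorded at rollout time with the top-k experts the training-time router would pick for the same tokens. The sketch below assumes both are available as per-token lists of expert ids with a non-empty top-k; the function name and the metric names (`mean_topk_overlap`, `exact_match_rate`, `load_tv_distance`) are hypothetical and not an existing verl API.

```python
from collections import Counter
from typing import Dict, Sequence


def routing_diagnostics(
    rollout_experts: Sequence[Sequence[int]],
    train_experts: Sequence[Sequence[int]],
) -> Dict[str, float]:
    """Hypothetical diagnostics comparing rollout-time and training-time top-k routing.

    Both arguments list, per token, the expert ids selected by the router.
    Assumes the same tokens in the same order and a non-empty top-k per token.
    """
    if len(rollout_experts) != len(train_experts):
        raise ValueError("both routings must cover the same tokens")
    n_tokens = len(rollout_experts)

    overlap_sum = 0.0
    exact_matches = 0
    for r, t in zip(rollout_experts, train_experts):
        r_set, t_set = set(r), set(t)
        # Fraction of rollout-time experts that the training-time router also picked.
        overlap_sum += len(r_set & t_set) / len(r_set)
        exact_matches += int(r_set == t_set)

    # Expert load = share of all routing slots that each expert receives.
    load_r = Counter(e for experts in rollout_experts for e in experts)
    load_t = Counter(e for experts in train_experts for e in experts)
    total_r, total_t = sum(load_r.values()), sum(load_t.values())
    # Total variation distance between the two load distributions (0 = identical).
    tv = 0.5 * sum(
        abs(load_r[e] / total_r - load_t[e] / total_t) for e in set(load_r) | set(load_t)
    )

    return {
        "mean_topk_overlap": overlap_sum / n_tokens,
        "exact_match_rate": exact_matches / n_tokens,
        "load_tv_distance": tv,
    }


rollout = [[0, 1], [2, 3], [0, 2], [1, 3]]
train = [[0, 1], [2, 1], [0, 2], [3, 0]]
print(routing_diagnostics(rollout, train))
# {'mean_topk_overlap': 0.75, 'exact_match_rate': 0.5, 'load_tv_distance': 0.125}
```

Reading the two kinds of metric together helps: low per-token overlap with a small load distance suggests routing churn without a global shift in expert traffic, while a large load distance indicates that some experts are gaining or losing traffic overall.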
In particular, the recipe will help distinguish:
Testing plan
The initial PR should be testable with:
The tests will validate:
Draft first PR
If the direction looks acceptable, I can start with a small PR titled:
The PR would add only the self-contained recipe utilities, a README, `REQUIRED_VERL.txt`, a synthetic demo, and CPU tests.
Open questions
`verl-recipe`, or should a small diagnostics interface be added to `verl` core later?
Is `routing_aware_replay` an acceptable recipe name?