Fix pickle error in prime reward manager when prime in main_ppo.py by cgpeter96 · Pull Request #1296 · verl-project/verl

cgpeter96 · 2025-04-28T13:17:41Z

wuxibin89 · 2025-04-28T14:46:55Z

 async def parallel_compute_score_async(evaluation_func, completions, references, tasks, extra_info=None, num_processes=64):
    scores = []
-    with ProcessPoolExecutor(max_workers=num_processes) as executor:
+    with ThreadPoolExecutor(max_workers=num_processes) as executor:


ThreadPoolExecutor is not a good idea here: compute_score is compute-intensive, and Python's GIL prevents threads from using multiple cores.

Understood, I will update the code shortly.
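The trade-off raised in this review can be sketched as follows. This is a minimal illustration, not the PR's code: `cpu_bound_score` and `score_all` are hypothetical names, and the scoring body is a stand-in for a compute-intensive `compute_score`. The same coroutine accepts either executor class, so the two can be compared directly.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_bound_score(n: int) -> int:
    """Hypothetical stand-in for a compute-intensive scoring function."""
    return sum(i * i for i in range(n))


async def score_all(executor_cls, inputs, max_workers: int = 4):
    """Run cpu_bound_score over inputs using the given executor class.

    With ThreadPoolExecutor, pure-Python work is serialized by the GIL on
    standard CPython builds. With ProcessPoolExecutor, each worker has its
    own interpreter, but the callable and its arguments must be picklable.
    """
    loop = asyncio.get_running_loop()
    with executor_cls(max_workers=max_workers) as executor:
        futures = [
            loop.run_in_executor(executor, cpu_bound_score, n) for n in inputs
        ]
        return await asyncio.gather(*futures)
```

Calling `asyncio.run(score_all(ProcessPoolExecutor, inputs))` from under an `if __name__ == "__main__":` guard and timing it against the `ThreadPoolExecutor` variant is one way to observe the difference on a multi-core machine. The process-based variant is also the one that surfaces pickling errors, since every submitted callable has to be serialized.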

patrik-bartak · 2025-04-30T12:37:47Z

I actually ran into the same issue as you, and I think your solution is unnecessarily complex. I also got the error that the function is not picklable: the problem is that wrapped_fn is defined inside another function rather than at the top level of the module.

A simple solution that works for me is replacing the wrapped function with a partial over the raw function.

...
    def wrapped_fn(*args, **kwargs):
        return raw_fn(*args, **kwargs, **reward_kwargs)

    return wrapped_fn

->

...
    return partial(raw_fn, **reward_kwargs)

No other changes are needed. Do you agree?

I don't see what the point of wrapping the function is, other than passing in the reward_kwargs, which the partial already does. I think someone added it without testing it with the prime reward manager.
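To make the difference concrete, here is a small sketch that checks both variants with `pickle.dumps`, which is what a process pool needs to do with a submitted callable. The names `compute_score` (used here as a stand-in for `raw_fn`), `make_closure`, `make_partial`, and `is_picklable` are hypothetical and exist only for this illustration.

```python
import pickle
from functools import partial


def compute_score(solution: str, reference: str, scale: float = 1.0) -> float:
    """Hypothetical top-level reward function standing in for raw_fn."""
    return scale * float(solution.strip() == reference.strip())


def make_closure(raw_fn, reward_kwargs):
    # Nested function: its qualified name contains "<locals>", so pickle
    # cannot look it up by name and refuses to serialize it.
    def wrapped_fn(*args, **kwargs):
        return raw_fn(*args, **kwargs, **reward_kwargs)

    return wrapped_fn


def make_partial(raw_fn, reward_kwargs):
    # A partial pickles as (raw_fn, args, kwargs); this works as long as
    # raw_fn itself can be found by its module-level name.
    return partial(raw_fn, **reward_kwargs)


def is_picklable(obj) -> bool:
    """Return True if pickle.dumps(obj) succeeds."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


closure_fn = make_closure(compute_score, {"scale": 2.0})
partial_fn = make_partial(compute_score, {"scale": 2.0})
```

Both callables return the same score, but only `partial_fn` passes `is_picklable`. One behavioral difference to be aware of: if a caller passes a keyword that is also in `reward_kwargs`, the closure raises a `TypeError` for the duplicate keyword, whereas the partial lets the caller's value override the stored one.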

patrik-bartak · 2025-04-30T13:04:02Z

Also, the function get_custom_reward_fn is duplicated across main_dapo, main_eval, and main_ppo. If you want, I can open a PR so that they share a single implementation.

cgpeter96 · 2025-05-06T03:36:00Z

Of course, the simple implementation is best :)

YES! You are right. The simple solution is best.

cgpeter96 · 2025-05-06T03:37:06Z

Yes!

cgpeter96 · 2025-05-15T11:46:21Z

@wuxibin89 Could you give me some feedback? 😊 This PR will help anyone who wants to use the prime reward manager in PPO/GRPO training but currently hits this failure.

CLAassistant · 2025-05-21T08:21:55Z

All committers have signed the CLA.

vadimkantorov · 2025-06-24T22:53:18Z

Hi @wuxibin89 @vermouth1992! We also stumbled on this issue with a custom reward function defined in a separate Python file: [Error] Task failed: Can't get local object 'get_custom_reward_fn.<locals>.wrapped_fn', completion: ...

We tried the partial(...) solution but then hit another bug: [Error] Task failed: Can't pickle <function compute_score at 0x7f23310b4ea0>: it's not the same object as custom_module.compute_score, completion: ...

We're calling the verl trainer as follows:

python -m verl.trainer.main_ppo \
    custom_reward_function.path=/mnt/fs/verl/custom_function.py \
    custom_reward_function.name=compute_score \
    reward_model.reward_manager=prime \
    ...

Would you have any suggestions on how to fix this?
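One way a "not the same object as custom_module.compute_score" error can arise is sketched below. It rests on an assumption that is not confirmed in this thread: that the file is executed afresh and re-registered under the fixed name `custom_module` each time the reward function is loaded. `load_fn` and `pickling_error` are hypothetical helpers written for this illustration, not verl code.

```python
import importlib.util
import pickle
import sys
import tempfile
from pathlib import Path


def load_fn(path: str, name: str, module_name: str = "custom_module"):
    """Hypothetical loader: executes the file on every call and registers
    the result under a fixed module name (an assumption, see above)."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # replaces any earlier registration
    spec.loader.exec_module(module)
    return getattr(module, name)


def pickling_error(obj):
    """Return the PicklingError message for obj, or None if it pickles."""
    try:
        pickle.dumps(obj)
    except pickle.PicklingError as exc:
        return str(exc)
    return None


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "custom_function.py"
    path.write_text("def compute_score(*args, **kwargs):\n    return 0.0\n")
    first = load_fn(str(path), "compute_score")
    second = load_fn(str(path), "compute_score")  # e.g. a second load

# pickle looks the function up as custom_module.compute_score and finds
# the object from the second load, so the first one is rejected.
first_error = pickling_error(first)
second_error = pickling_error(second)
```

Here `first_error` holds a message while `second_error` is `None`: only the function object that still matches the registered module can be pickled by reference. Under this assumption, loading the module once and reusing the same function object everywhere would avoid the mismatch.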

zhourunlong · 2025-06-30T00:50:45Z

I'm hitting exactly the same issue.

george1459 · 2025-07-24T08:06:15Z

I ran into this issue as well (even with #2239). Here is my fix for the issue which worked for me.
Let me know if you guys are interested in having this as a PR.
@vadimkantorov @zhourunlong perhaps you guys can give this a try if interested.

warmsnow-sh · 2025-08-06T07:21:23Z

This code seems to be out of sync with the latest version. Is there an up-to-date solution to this problem?

warmsnow-sh · 2025-08-06T08:03:33Z

@george1459 I tried your fix, but it reported an error:
(TaskRunner pid=1138472) [Error] Task failed: 'list' object has no attribute 'get'
I'm still checking whether my own code has other issues. I wonder why this problem with the prime reward manager has gone unfixed for so long.

wuxibin89 reviewed Apr 28, 2025

cgpeter96 force-pushed the main branch from 0143765 to 5e23822 Compare April 29, 2025 07:00

ZihengJiang added the status: review in process label Apr 29, 2025

cgpeter96 force-pushed the main branch from 85330d4 to 5249d47 Compare May 8, 2025 03:12

cgpeter96 changed the title ~~Fix pickle error in prime reward manager~~ Fix pickle error in prime reward manager when prime in main_ppo.py May 9, 2025

cgpeter96 added 4 commits May 21, 2025 16:56

fix pickle error in prime reward manager when grpo training

7bcd048

keep return of prime.py align with naive.py

93464b7

modifiy return

d2ff040

add comment

948f41b

cgpeter96 force-pushed the main branch from 2b15ac9 to 948f41b Compare May 21, 2025 08:58

vadimkantorov mentioned this pull request Jul 2, 2025

[cfg] fix: pickleing error in multiprocessing in the reward_fn #2239

Merged

george1459 mentioned this pull request Jul 24, 2025

Rollout reward evaluation is serial — how to parallelize LLM-based reward computation? #2236

Open


9 participants
