[Benchmark] Support benchmark throughput for external launcher DP #25913
Conversation
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
Code Review

This pull request introduces support for data parallel throughput benchmarking when using an external launcher such as `torchrun`. It adds a new function to filter and distribute benchmark requests across data parallel ranks, and updates argument validation to permit this new mode. My review identifies a critical issue in the new request filtering logic: missing validation of `world_size` and `data_parallel_size` could lead to a `ZeroDivisionError` or incorrect rank assignments. I've provided a code suggestion that adds the necessary checks and makes the implementation more robust.
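To make the review concrete, the request filtering described above might look roughly like the sketch below. The function name and the round-robin distribution strategy are assumptions for illustration, not the PR's actual implementation; only the `RANK`/`WORLD_SIZE` environment variables (set by `torchrun`) and the rank arithmetic mirror the diff under review.

```python
import os

def filter_requests_for_dp_rank(requests, data_parallel_size):
    # Hypothetical sketch: distribute benchmark requests across
    # data-parallel ranks when launched via an external launcher
    # such as torchrun (which sets RANK and WORLD_SIZE).
    global_rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    model_parallel_size = world_size // data_parallel_size
    data_parallel_rank = global_rank // model_parallel_size
    # Round-robin split: DP rank i keeps requests i, i + dp, i + 2*dp, ...
    return requests[data_parallel_rank::data_parallel_size]
```

With `WORLD_SIZE=4` and `data_parallel_size=2`, ranks 0 and 1 (DP rank 0) keep the even-indexed requests, while ranks 2 and 3 (DP rank 1) keep the odd-indexed ones.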
```python
global_rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])
data_parallel_rank = global_rank // (world_size // data_parallel_size)
```
The calculation of `data_parallel_rank` is susceptible to a `ZeroDivisionError` if `data_parallel_size` is greater than `world_size`. Additionally, if `world_size` is not divisible by `data_parallel_size`, the calculated `data_parallel_rank` can be incorrect for some ranks, potentially leading to out-of-bounds errors or incorrect behavior. It's crucial to add validation for these conditions to ensure the benchmark runs robustly.
Suggested change:

```python
global_rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])
if data_parallel_size > world_size:
    raise ValueError(
        f"data_parallel_size ({data_parallel_size}) cannot be larger than "
        f"world_size ({world_size}).")
if world_size % data_parallel_size != 0:
    raise ValueError(
        f"world_size ({world_size}) must be divisible by "
        f"data_parallel_size ({data_parallel_size}).")
model_parallel_size = world_size // data_parallel_size
data_parallel_rank = global_rank // model_parallel_size
```
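To illustrate why the divisibility check matters, here is a minimal standalone sketch of the corrected rank mapping (the function name is hypothetical; the logic mirrors the suggestion above):

```python
def dp_rank_of(global_rank: int, world_size: int, data_parallel_size: int) -> int:
    """Map a global rank to its data-parallel rank."""
    if data_parallel_size > world_size:
        raise ValueError("data_parallel_size cannot exceed world_size")
    if world_size % data_parallel_size != 0:
        raise ValueError("world_size must be divisible by data_parallel_size")
    # Each DP group spans world_size // data_parallel_size consecutive ranks.
    model_parallel_size = world_size // data_parallel_size
    return global_rank // model_parallel_size
```

For example, with `world_size=8` and `data_parallel_size=4`, each DP group holds 2 ranks, so global ranks 0-7 map to DP ranks `[0, 0, 1, 1, 2, 2, 3, 3]`; with `data_parallel_size=3` the same call raises instead of silently mis-assigning ranks.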
Purpose
Add a throughput benchmark for DP with external launcher mode. This benchmarks the throughput for integration with RL frameworks.
Test Plan
Test Result
on 2xH100s.
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.