Closed
Labels: P0, community-backlog, core, enhancement, gpu-objects
Description
Similar to Ray Compiled Graphs, the driver should order all collective calls to avoid deadlocks.
Example 1:
- Avoid using NCCL to pass tensors within the same actor. Instead, we should access the in-actor object store directly.
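A minimal sketch of what "access the in-actor store directly" could mean, assuming a per-actor GPU object store. `GpuObjectStore`, `produce`, and `consume` are hypothetical names used only for illustration, not an existing Ray API; the point is that when producer and consumer are the same actor, the input is resolved locally and no NCCL send/recv to self is issued.

```python
import torch

class GpuObjectStore:
    """Hypothetical per-actor store of GPU tensors keyed by object id."""
    def __init__(self):
        self._tensors = {}

    def put(self, obj_id, tensor):
        self._tensors[obj_id] = tensor

    def get(self, obj_id):
        # Same-process access: return the tensor handle directly.
        # No NCCL send/recv to self is needed (or desirable).
        return self._tensors[obj_id]

class Actor:
    """Stand-in for a Ray actor; methods play the role of tasks."""
    def __init__(self):
        self.store = GpuObjectStore()

    def produce(self, obj_id):
        t = torch.zeros(4)  # a CPU tensor stands in for a CUDA tensor here
        self.store.put(obj_id, t)
        return obj_id

    def consume(self, obj_id):
        # Producer and consumer are the same actor, so the input is
        # resolved from the local store instead of going through NCCL.
        return self.store.get(obj_id).sum()

a = Actor()
ref = a.produce("t1_1")
print(a.consume(ref))
```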
Example 2: Both actors are single-threaded and synchronous. If `t1_1` is the input for `t2_2` and `t1_2` is the input for `t2_1`, and both transfers use NCCL, then we should call the NCCL recv for `t2_2` before the one for `t2_1` to avoid deadlock.
Actor 1: t1_1, t1_2
Actor 2: t2_1, t2_2
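To make the required ordering concrete, below is a minimal, self-contained simulation of Example 2 using Python threads. Each transfer is modeled as a blocking rendezvous (send returns only once the matching recv is posted), which is an assumption standing in for synchronous NCCL send/recv; `RendezvousChannel` and the actor functions are illustrative, not Ray or NCCL APIs.

```python
import threading

class RendezvousChannel:
    """Models one blocking point-to-point transfer: send() returns only
    after the matching recv() arrives (a stand-in for a synchronous
    NCCL send/recv pair)."""
    def __init__(self):
        self._barrier = threading.Barrier(2)
        self._value = None

    def send(self, value):
        self._value = value
        self._barrier.wait(timeout=5)  # block until recv() is posted

    def recv(self):
        self._barrier.wait(timeout=5)  # block until send() is posted
        return self._value

# One channel per cross-actor transfer in Example 2.
chan_t1_1 = RendezvousChannel()  # t1_1 -> input of t2_2
chan_t1_2 = RendezvousChannel()  # t1_2 -> input of t2_1

def actor1():
    # Single-threaded and synchronous: sends t1_1, then t1_2, in order.
    chan_t1_1.send("t1_1")
    chan_t1_2.send("t1_2")

def actor2(recv_order):
    # The driver-imposed order decides which input Actor 2 waits for first.
    return {name: chan.recv() for name, chan in recv_order}

# Deadlock-prone order: Actor 2 waits for t1_2 (input of t2_1) first while
# Actor 1 is blocked sending t1_1; both sides time out (BrokenBarrierError).
bad_order = [("t2_1", chan_t1_2), ("t2_2", chan_t1_1)]
# Driver-imposed order: recv the input of t2_2 (t1_1) before t2_1 (t1_2).
good_order = [("t2_2", chan_t1_1), ("t2_1", chan_t1_2)]

t1 = threading.Thread(target=actor1)
t2 = threading.Thread(target=actor2, args=(good_order,))  # swap in bad_order to see the hang
t1.start()
t2.start()
t1.join()
t2.join()
print("completed without deadlock")
```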
Note: Check if this will work if we only have one CUDA stream.
Use case
No response