Hi,
I was wondering where MSCCLPP diverges from NVSHMEM, and in which scenarios MSCCLPP would be the better choice.
I want to implement a broadcast operation that is driven by a proxy, so that it does not occupy any SMs. With MSCCLPP, however, it seems that I would have to make (ngpus - 1) memcpy calls internally, one per destination GPU. Do you have any insights on whether MSCCLPP would be better than NVSHMEM in this scenario?
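To make the pattern concrete, here is roughly the copy schedule I have in mind. This is only a host-side sketch: `Copy` and `flat_broadcast_schedule` are hypothetical names of my own, not MSCCLPP or NVSHMEM API, and I am assuming each entry would turn into one proxy-issued copy (for example one `cudaMemcpyPeerAsync` call).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record of one proxy-issued copy:
// `bytes` bytes moved from GPU `src` to GPU `dst`.
struct Copy {
    int src;
    int dst;
    std::size_t bytes;
};

// Flat broadcast: the root sends the whole buffer to every other GPU.
// Assumption: each entry maps to one host-side copy call, so the data
// movement itself does not need a kernel running on the SMs.
std::vector<Copy> flat_broadcast_schedule(int ngpus, int root, std::size_t bytes) {
    std::vector<Copy> schedule;
    for (int peer = 0; peer < ngpus; ++peer) {
        if (peer == root) continue;  // the root already holds the data
        schedule.push_back({root, peer, bytes});
    }
    return schedule;  // ngpus - 1 entries when 0 <= root < ngpus
}
```

With 4 GPUs and root 1, this yields three copies (1 to 0, 1 to 2, 1 to 3), which is the (ngpus - 1) cost I am asking about.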