[Feature] Automatically Find the best NIC for each GPU #542

Open
@zyksir

Description

I am currently using mscclpp and found that the device has to be named explicitly for each GPU, whereas NCCL finds the best device for each GPU automatically. Most of the time we have to set MSCCLPP_HCA_DEVICES differently on different nodes to get the best performance. A feature that parses the topology and chooses the device automatically would help a lot.
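
As a rough sketch of the idea (not how mscclpp or NCCL actually does it), one simple heuristic is to pick, for each GPU, the NIC whose sysfs PCI path shares the deepest common ancestor with the GPU's path. The function names and the example paths below are hypothetical. On Linux the real paths could presumably be obtained by resolving `/sys/class/infiniband/<name>/device` and `/sys/bus/pci/devices/<bus id>`, which this sketch does not do.

```python
from typing import Dict, List


def shared_depth(path_a: str, path_b: str) -> int:
    """Count the leading path components two sysfs device paths have in common."""
    depth = 0
    for a, b in zip(path_a.strip("/").split("/"), path_b.strip("/").split("/")):
        if a != b:
            break
        depth += 1
    return depth


def pick_nics(gpu_paths: List[str], nic_paths: Dict[str, str]) -> List[str]:
    """For each GPU path, return the NIC name with the deepest shared PCIe ancestor.

    Ties are broken by NIC name order, which is an arbitrary choice for this sketch.
    """
    chosen = []
    for gpu in gpu_paths:
        best = max(sorted(nic_paths), key=lambda name: shared_depth(gpu, nic_paths[name]))
        chosen.append(best)
    return chosen


# Hypothetical two-socket topology: one GPU and one NIC under each root complex.
gpus = [
    "/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0",
    "/sys/devices/pci0000:80/0000:80:01.0/0000:81:00.0",
]
nics = {
    "mlx5_0": "/sys/devices/pci0000:00/0000:00:01.0/0000:02:00.0",
    "mlx5_1": "/sys/devices/pci0000:80/0000:80:01.0/0000:82:00.0",
}

print(pick_nics(gpus, nics))  # ['mlx5_0', 'mlx5_1']
```

The resulting per-GPU list is the kind of value that currently has to be worked out by hand for MSCCLPP_HCA_DEVICES on each node. A real implementation would likely also need to consider NUMA affinity and link speed, which this heuristic ignores.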

In addition, sometimes "libibverbs.so" cannot be found and only "libibverbs.so.1" is present. For now I work around this with `ln -sv /usr/lib/x86_64-linux-gnu/libibverbs.so.1 /usr/lib/x86_64-linux-gnu/libibverbs.so`, but it would help a lot if the mscclpp code also searched for "libibverbs.so.1".
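
The fallback being asked for could look roughly like the following, shown here in Python with `ctypes` purely as an illustration. mscclpp's own loading code is not shown in this issue, and `load_first` and `CANDIDATES` are hypothetical names. The loader tries each candidate name in order and returns the first one that opens.

```python
import ctypes

# Candidate names to try, in order: the unversioned name first, then the versioned one.
CANDIDATES = ("libibverbs.so", "libibverbs.so.1")


def load_first(names=CANDIDATES, loader=ctypes.CDLL):
    """Return a handle for the first library name that loads, or None if none do."""
    for name in names:
        try:
            return loader(name)
        except OSError:
            # This name is not available on the system; try the next candidate.
            continue
    return None


lib = load_first()
print("libibverbs loaded" if lib is not None else "libibverbs not found")
```

With a fallback like this, the manual symlink would no longer be needed on systems that only ship the versioned file.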

Related issue: sgl-project/sglang#6834
