Issue search results

128 results

microsoft/mscclpp
[Feature] Support on Multi-Node Nvlink

I found that NCCL has supported Blackwell GPUs with multi-node NVLink since NCCL 2.25. Is this feature supported in mscclpp? A related guide: https://docs.nvidia.com/multi-node-nvlink-systems/multi-node-tuning-guide/nccl.html ...

zyksir

Opened
3 days ago

#548

microsoft/mscclpp
[Feature] Automatically Find the best NIC for each GPU

I am currently using mscclpp and find that we have to explicitly name the device for each GPU, while NCCL can find the best device for each GPU automatically. Most of the time we have to set MSCCLPP_HCA_DEVICES ...

zyksir

Opened
9 days ago

#542

microsoft/mscclpp
[Feature] ProxyService avoid keeping ownership of the RegisteredMemory

Hi, the current implementation of the proxy service keeps ownership of RegisteredMemory: https://github.com/microsoft/mscclpp/blob/c184485808aeaaec5625bc97905c819db1514184/src/port_channel.cc#L41 Thus, any ...

liangyuRain

Opened
10 days ago

#540

microsoft/mscclpp
[Bug] string rep of torch dtype in `to_dlpack`

https://github.com/microsoft/mscclpp/blob/2b9b18d562fc9eea4574927e3ceb7f40b0b20d63/python/mscclpp/gpu_utils_py.cpp#L27 The correct string representations of torch.float and torch.int seem to be torch.float32 ...

liangyuRain

Opened
16 days ago

#536

microsoft/mscclpp
[Bug] Cannot bind multiple memory buffers to a single `NvlsConnection`

Hi, we observe a strange behavior with NvlsConnection. When we bind two memory buffers to the same NvlsConnection, the DeviceMulticastPointer returned for the second buffer actually points to the first ...

liangyuRain

Opened
17 days ago

#535

microsoft/mscclpp
[Bug] RegisteredMemory not properly destroyed

Hi, the following code causes GPU OOM on Hopper with NVLS enabled. I am using the latest main branch. `from mscclpp import Transport, TcpBootstrap, Communicator` `from mscclpp._mscclpp import Context, RawGpuBuffer` ...

liangyuRain

Opened
18 days ago

#533

microsoft/mscclpp
[Bug] About Performance drop in Cpp API

I am trying to implement a one-shot allreduce in sglang; you can see my code in this PR. I want to use the algorithm from allreduce_bench.py. To fit everything in sgl-kernel, I rewrote the API in C++. ...

zyksir

Opened
20 days ago

#531

microsoft/mscclpp
[Bug] memchan.put function cannot ensure completion of data writing

Hi, we observe that the implementation of memChan.put may have a bug. E.g.: `memChan.put(0, 0, nElem * sizeof(int), threadIdx.x, blockDim.x); if (threadIdx.x == 0) memChan.signal();` This put operation does not include ...

qishilu

Opened
on Apr 29

#517

microsoft/mscclpp
[Bug] `cudaMemcpyAsync` blocks proxy thread making program hang

Hi, we observe that when many kernels are pushed to the launch queue, the cudaMemcpyAsync used in flushing the FIFO can get blocked due to the full launch queue, which in turn blocks the whole proxy thread. This can ...

liangyuRain

Opened
on Apr 27

#516

microsoft/mscclpp
[Feature] Supporting cudaMemcpyBatchAsync for PortChannels

When issuing multiple sends to a PortChannel, the Memcpy kernel launch overhead may lead to bad performance for small message sizes. For example, using MemChannel is much faster than PortChannel for small ...

cubele

Opened
on Apr 16

#504

issues Search Results · repo:microsoft/mscclpp language:C++
