Description
Hi, the following code causes a GPU OOM on Hopper with NVLS enabled. I am using the latest main branch.
from mscclpp import Transport, TcpBootstrap, Communicator
from mscclpp._mscclpp import Context, RawGpuBuffer
import cupy as cp

cp.cuda.Device(0).use()
bootstrap = TcpBootstrap.create(0, 1)
bootstrap.initialize(bootstrap.create_unique_id(), 60)
comm = Communicator(bootstrap)

for i in range(100):
    if i % 10 == 0:
        print(f"{i=}", flush=True)
    mem = RawGpuBuffer(2 ** 30)
    reg = comm.register_memory(mem.data(), mem.bytes(), Transport.CudaIpc)
    del reg, mem
Output:
i=0
i=10
i=20
i=30
i=40
i=50
i=60
i=70
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
mscclpp._mscclpp.CuError: (2, 'Call to result failed./.../mscclpp/src/gpu_utils.cc:128 (Cu failure: out of memory)')
The code runs fine if the memory is not registered. Could you please check whether this can be reproduced on your side?
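For reference, here is a minimal sketch of the control case I mean (same environment and the same RawGpuBuffer allocation, just without the register_memory call); this variant completes all 100 iterations without running out of memory:

from mscclpp._mscclpp import RawGpuBuffer
import cupy as cp

cp.cuda.Device(0).use()

for i in range(100):
    if i % 10 == 0:
        print(f"{i=}", flush=True)
    # Allocate 1 GiB on the GPU but do not call comm.register_memory();
    # the buffer is released when `mem` is deleted, so memory usage stays bounded.
    mem = RawGpuBuffer(2 ** 30)
    del mem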