[Bug] Cannot bind multiple memory buffers to a single NvlsConnection #535

Open
@liangyuRain

Description

Hi, we observe strange behavior with NvlsConnection. When we bind two memory buffers to the same NvlsConnection, the DeviceMulticastPointer returned for the second buffer actually points to the first buffer. More generally, when multiple buffers are bound to a single NvlsConnection, the returned DeviceMulticastPointer always points to the first buffer that was bound.

We think the problem is at the following line:

MSCCLPP_CUTHROW(cuMemMap((CUdeviceptr)(mcPtr), devBuffSize, 0, mcHandle_, 0));

This cuMemMap call always maps mcPtr to offset 0 of mcHandle_, which is the offset of the first bound memory buffer, rather than to the newly allocated offset above. However, the CUDA documentation (link) currently requires this offset to be 0. So I guess it is technically impossible to reuse mcHandle_ (and thus an NvlsConnection) for multiple memory buffers. Do you think this is correct?
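
To make the aliasing concrete, here is a minimal host-only sketch in plain C++. It is a toy model, not mscclpp code and not the CUDA API: `ToyMulticastObject`, `bindAndMapAtZero`, and `bindAndOffsetFromBase` are hypothetical names invented for illustration. The first helper mimics the behavior described above (a new offset is allocated, but the mapping is always made at offset 0). The second shows one possible alternative, assuming the whole object can be mapped once at offset 0 and the bind offset added to that base pointer; whether real multicast mappings allow this is an assumption that this sketch does not verify.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a multicast object (hypothetical, not a CUDA type):
// a flat byte range into which buffers are bound at increasing offsets.
struct ToyMulticastObject {
  std::vector<unsigned char> storage;  // stands in for the memory behind mcHandle_
  std::size_t nextOffset = 0;          // next free offset for a bind

  explicit ToyMulticastObject(std::size_t size) : storage(size, 0) {}

  // Models binding a buffer: reserves a sub-range and returns its offset.
  std::size_t bind(std::size_t size) {
    assert(nextOffset + size <= storage.size());
    std::size_t offset = nextOffset;
    nextOffset += size;
    return offset;
  }

  // Models a map call whose offset argument must be 0: the returned
  // pointer always refers to the start of the object.
  unsigned char* mapAtZero() { return storage.data(); }
};

// Behavior described in the issue: each bind allocates a fresh offset,
// but the mapping is made at offset 0, so the offset is never used.
unsigned char* bindAndMapAtZero(ToyMulticastObject& mc, std::size_t size) {
  mc.bind(size);
  return mc.mapAtZero();
}

// Hypothetical alternative: reuse a single base mapping made at offset 0
// and add the offset returned by bind.
unsigned char* bindAndOffsetFromBase(ToyMulticastObject& mc, unsigned char* base,
                                     std::size_t size) {
  return base + mc.bind(size);
}
```

In this model, two calls to `bindAndMapAtZero` on the same object return the same pointer, which matches the symptom reported here, while two calls to `bindAndOffsetFromBase` return pointers that are `size` bytes apart.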
