Use non-CPU device type and id for host accessible memory #25043
base: main
Conversation
… the link between CPU and the non-CPU device explicit. Update the data transfer implementations to check vendor id.
You can commit the suggested changes from lintrunner.
onnxruntime/test/providers/cuda/test_cases/cuda_execution_provider_test.cc
```cpp
                              bytes,
                              ACL_MEMCPY_HOST_TO_DEVICE,
                              static_cast<aclrtStream>(stream.GetHandle())));
} else if (src_device.Type() == OrtDevice::NPU) {
```
Swapped the order to match the non-async method above, i.e. check NPU -> NPU first, then NPU -> CPU/host accessible. The `else if` in the original code is misleading, as there are only three src options (NPU, host accessible, and CPU), so it was really an `else`. This change was applied in a couple of other places as well, given that data transfer implementations have been copied from existing EPs when new ones were created.
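The branch ordering described above can be sketched as follows. This is a minimal stand-alone illustration, not the actual ORT code: `DeviceType`, `Device`, and `CopyKind` are hypothetical names, with a boolean standing in for the host-accessible memory type.

```cpp
#include <string>

// Hypothetical sketch mirroring OrtDevice::CPU / OrtDevice::NPU, with a flag
// for host-accessible (e.g. pinned) memory reported under the NPU device type.
enum class DeviceType { CPU, NPU };

struct Device {
  DeviceType type;
  bool host_accessible;
};

// Dispatch in the order described above: device-to-device first. The fallback
// is a plain `else`, because the only remaining source options are CPU and
// host-accessible memory, so an `else if` would be misleading.
std::string CopyKind(const Device& src, const Device& dst) {
  if (src.type == DeviceType::NPU && !src.host_accessible) {
    if (dst.type == DeviceType::NPU && !dst.host_accessible) {
      return "DEVICE_TO_DEVICE";
    }
    return "DEVICE_TO_HOST";  // dst is CPU or host-accessible memory
  } else {  // src is CPU or host-accessible memory
    if (dst.type == DeviceType::NPU && !dst.host_accessible) {
      return "HOST_TO_DEVICE";
    }
    return "HOST_TO_HOST";
  }
}
```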
```cpp
  }
} else if (src_device.Type() == OrtDevice::NPU) {
  if (dst_device.Type() == OrtDevice::CPU) {
```
This `if` was unnecessary: CPU in the old code covered both CPU and host-accessible memory, so there are no other destination options once NPU-to-NPU has been handled earlier. Had it been necessary, data transfer would have been broken, as there is no `else` to handle the remaining cases.
You can commit the suggested changes from lintrunner.
```diff
@@ -53,11 +53,11 @@ class CUDAExternalAllocator : public CUDAAllocator {
 // TODO: add a default constructor
 class CUDAPinnedAllocator : public IAllocator {
  public:
-  CUDAPinnedAllocator(const char* name)
+  CUDAPinnedAllocator(OrtDevice::DeviceId device_id, const char* name)
```
Do we need to associate the pinned memory allocator with a specific device? I didn't see where this `device_id` is used by the allocator implementation.
I'm not an expert in CUDA, but I believe the device is selected outside of `cudaMallocHost` via `cudaSetDevice`. I would have expected the device to have been set in general for the EP, given that an instance only uses one device. So they get the device id now, but there's probably nothing that needs to be done in the current setup.
As far as I can see, `cudaSetDevice` is called for `CUDAAllocator::Alloc`/`Free` but not for `CUDAPinnedAllocator::Alloc`/`Free`. Is a `CUDAPinnedAllocator` instance intended to only be used with one device?
CUDA pinned memory is host memory.
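To make the question above concrete, here is a minimal sketch of what it would look like if the pinned allocator did select its device before allocating, the way `CUDAAllocator` does. The CUDA runtime calls are stubbed so the sketch compiles without CUDA; `PinnedAllocator` and the stubs are hypothetical, and nothing here implies the real `CUDAPinnedAllocator` behaves this way.

```cpp
#include <cstddef>
#include <cstdlib>

// Stand-ins for the CUDA runtime so the pattern compiles anywhere. Real code
// would call the actual cudaSetDevice / cudaMallocHost / cudaFreeHost.
static int g_current_device = 0;
int cudaSetDevice(int id) { g_current_device = id; return 0; }
int cudaMallocHost(void** p, std::size_t n) { *p = std::malloc(n); return 0; }
int cudaFreeHost(void* p) { std::free(p); return 0; }

// Hypothetical pinned allocator that records the device id it was created
// with and selects that device before allocating pinned host memory.
class PinnedAllocator {
 public:
  explicit PinnedAllocator(int device_id) : device_id_(device_id) {}

  void* Alloc(std::size_t size) {
    cudaSetDevice(device_id_);  // select the device, as CUDAAllocator does
    void* p = nullptr;
    cudaMallocHost(&p, size);
    return p;
  }

  void Free(void* p) { cudaFreeHost(p); }

  int DeviceId() const { return device_id_; }

 private:
  int device_id_;
};
```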
```cpp
// so that would satisfy the alignment requirement of any other CPU consumers.
// If one device is not on CPU, we default on the one that is CPU.
auto determine_device = [](const OrtDevice& output_device, const OrtDevice& suggested_device) -> OrtDevice {
  if (output_device.Type() == OrtDevice::CPU && suggested_device.Type() == OrtDevice::CPU) {
    if (output_device.MemType() == OrtDevice::MemType::DEFAULT &&
        suggested_device.MemType() == OrtDevice::MemType::DEFAULT) {
```
The above comment mentions "default mem type". Is that still relevant or should it be updated?
```diff
@@ -399,8 +399,7 @@ void DumpTensor(
   // check tensor is on CPU before dumping it
   auto& tensor_location = tensor.Location();
   if (tensor_location.device.Type() == OrtDevice::CPU ||
```
`tensor_location.device.UsesCpuMemory()`
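The suggested predicate presumably collapses the compound check into something like the following sketch. The names mirror `OrtDevice`, but the semantics here are an assumption, not the actual ORT implementation.

```cpp
// Sketch (assumed semantics): a device "uses CPU memory" when it is the CPU
// device itself, or when its memory is host-accessible (e.g. pinned memory
// reported under a non-CPU device type, as this PR does).
enum class Type { CPU, GPU, NPU };
enum class MemType { DEFAULT, HOST_ACCESSIBLE };

struct Device {
  Type type;
  MemType mem;

  bool UsesCpuMemory() const {
    return type == Type::CPU || mem == MemType::HOST_ACCESSIBLE;
  }
};
```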
```cpp
: IAllocator(
      OrtMemoryInfo(name, OrtAllocatorType::OrtDeviceAllocator,
                    OrtDevice(OrtDevice::CPU, OrtDevice::MemType::HOST_ACCESSIBLE, OrtDevice::VendorIds::NVIDIA,
                              0 /*CPU device always with id 0*/),
                    OrtDevice(OrtDevice::GPU, OrtDevice::MemType::HOST_ACCESSIBLE, OrtDevice::VendorIds::NVIDIA,
```
```cpp
OrtDevice(OrtDevice::CPU, OrtDevice::MemType::HOST_ACCESSIBLE, OrtDevice::VendorIds::NVIDIA,
          0 /*CPU device always with id 0*/),
OrtDevice(OrtDevice::GPU, OrtDevice::MemType::HOST_ACCESSIBLE, OrtDevice::VendorIds::NVIDIA,
          device_id),
```
```cpp
return OrtDevice(OrtDevice::CPU, OrtDevice::MemType::HOST_ACCESSIBLE, OrtDevice::VendorIds::NVIDIA,
                 0 /*CPU device id always be 0*/);
return OrtDevice(OrtDevice::GPU, OrtDevice::MemType::HOST_ACCESSIBLE, OrtDevice::VendorIds::NVIDIA,
                 default_device_.Id());
```
Need to revert the change.
Description
Use the non-CPU device type and id for host accessible memory to make the link between CPU and the non-CPU device explicit.
Update the data transfer implementations to check vendor id.
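As a rough sketch of what checking the vendor id in a data-transfer capability check might look like (assumed shape only; `Dev`, `CanCopy`, and the vendor constants are illustrative, not the actual ORT API):

```cpp
// Illustrative vendor ids; the real values live in OrtDevice::VendorIds.
constexpr int kVendorNvidia = 0x10DE;
constexpr int kVendorOther = 0x1234;

struct Dev {
  int type;    // e.g. CPU = 0, GPU = 1
  int vendor;  // vendor id of the device that owns the memory
};

// Sketch: a transfer implementation claims a copy only if the source or
// destination is a device of its own type AND vendor, so host-accessible
// memory registered by another vendor's EP is not picked up by mistake.
bool CanCopy(const Dev& src, const Dev& dst, int my_type, int my_vendor) {
  auto mine = [&](const Dev& d) { return d.type == my_type && d.vendor == my_vendor; };
  return mine(src) || mine(dst);
}
```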
Motivation and Context