Use non-CPU device type and id for host accessible memory #25043


Open · wants to merge 11 commits into main

Conversation

skottmckay (Contributor)

Description

Use the non-CPU device type and id for host accessible memory to make the link between CPU and the non-CPU device explicit.

Update the data transfer implementations to check vendor id.

Motivation and Context

… the link between CPU and the non-CPU device explicit.

Update the data transfer implementations to check vendor id.

@github-actions github-actions bot left a comment


You can commit the suggested changes from lintrunner.

bytes,
ACL_MEMCPY_HOST_TO_DEVICE,
static_cast<aclrtStream>(stream.GetHandle())));
} else if (src_device.Type() == OrtDevice::NPU) {
skottmckay (Contributor, Author)


Swapped the order to match the non-async method above, i.e. check NPU -> NPU first, then NPU -> CPU/host accessible.
The else if in the original code is misleading, as there are only 3 src options (NPU, host accessible and CPU), so it was really an else.

This change was applied in a couple of other places as well, given the data transfer implementations were copied from existing EPs when new ones were created.

}
} else if (src_device.Type() == OrtDevice::NPU) {
if (dst_device.Type() == OrtDevice::CPU) {
@skottmckay (Contributor, Author) · Jun 13, 2025


This if was unnecessary: CPU in the old code covered both CPU and host accessible memory, so there are no other destination options once NPU -> NPU has been handled earlier.

If it were necessary, data transfer would have been broken, as there's no else to handle the other cases.


@github-actions github-actions bot left a comment


You can commit the suggested changes from lintrunner.

@@ -53,11 +53,11 @@ class CUDAExternalAllocator : public CUDAAllocator {
 // TODO: add a default constructor
 class CUDAPinnedAllocator : public IAllocator {
  public:
-  CUDAPinnedAllocator(const char* name)
+  CUDAPinnedAllocator(OrtDevice::DeviceId device_id, const char* name)
Contributor


do we need to associate the pinned memory allocator with a specific device? I didn't see where this device_id is used by the allocator implementation.

Contributor Author


I'm not an expert in CUDA, but I believe the device is selected outside of cudaMallocHost via cudaSetDevice. I would have expected the device to have been set in general for the EP, given an instance only uses one device. So the allocator gets the device id now, but there's probably nothing that needs to be done with it in the current setup.

Contributor


As far as I can see, cudaSetDevice is called for CUDAAllocator::Alloc/Free but not for CUDAPinnedAllocator::Alloc/Free. Is a CUDAPinnedAllocator instance intended to be used with only one device?

Contributor


CUDA pinned memory is host memory.

// so that would satisfy the alignment requirement of any other CPU consumers.
// If one device is not on CPU, we default on the one that is CPU.
auto determine_device = [](const OrtDevice& output_device, const OrtDevice& suggested_device) -> OrtDevice {
if (output_device.Type() == OrtDevice::CPU && suggested_device.Type() == OrtDevice::CPU) {
if (output_device.MemType() == OrtDevice::MemType::DEFAULT &&
suggested_device.MemType() == OrtDevice::MemType::DEFAULT) {
Contributor


The above comment mentions "default mem type". Is that still relevant or should it be updated?

@@ -399,8 +399,7 @@ void DumpTensor(
// check tensor is on CPU before dumping it
auto& tensor_location = tensor.Location();
if (tensor_location.device.Type() == OrtDevice::CPU ||
Contributor


tensor_location.device.UsesCpuMemory()

: IAllocator(
OrtMemoryInfo(name, OrtAllocatorType::OrtDeviceAllocator,
OrtDevice(OrtDevice::CPU, OrtDevice::MemType::HOST_ACCESSIBLE, OrtDevice::VendorIds::NVIDIA,
0 /*CPU device always with id 0*/),
OrtDevice(OrtDevice::GPU, OrtDevice::MemType::HOST_ACCESSIBLE, OrtDevice::VendorIds::NVIDIA,
Contributor


GPU

should still be CPU since it's host memory.

OrtDevice(OrtDevice::CPU, OrtDevice::MemType::HOST_ACCESSIBLE, OrtDevice::VendorIds::NVIDIA,
0 /*CPU device always with id 0*/),
OrtDevice(OrtDevice::GPU, OrtDevice::MemType::HOST_ACCESSIBLE, OrtDevice::VendorIds::NVIDIA,
device_id),
Contributor


device_id

should set device_id to 0 for host memory.

return OrtDevice(OrtDevice::CPU, OrtDevice::MemType::HOST_ACCESSIBLE, OrtDevice::VendorIds::NVIDIA,
0 /*CPU device id always be 0*/);
return OrtDevice(OrtDevice::GPU, OrtDevice::MemType::HOST_ACCESSIBLE, OrtDevice::VendorIds::NVIDIA,
default_device_.Id());
Contributor


need to revert the change
