Memory safety for Nvidia GPU time-slicing #24943

### Describe the feature request

Nvidia has introduced a feature called time-slicing for GPUs (see the [GPU Operator guide on GPU sharing](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-sharing.html) and the [OpenShift time-slicing guide](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/time-slicing-gpus-in-openshift.html)). However, time-slicing does not natively provide memory isolation between replicas. As the Nvidia documentation puts it:

> Unlike Multi-Instance GPU (MIG), there is no memory or fault-isolation between replicas, but for some workloads this is better than not being able to share at all.

As far as I can tell, ONNX Runtime does not currently support safely managing GPU memory when GPU time-slicing is in use: I have seen errors that appear to result from memory interference between replicas. Is safe GPU memory management under time-slicing something that has been considered for ONNX Runtime, or have I missed some existing support?

(See https://bruce-lee-ly.medium.com/nvidia-gpu-virtual-memory-management-7fdc4122226b for background on Nvidia GPU virtual memory management.)
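One partial mitigation that may already be available, assuming ONNX Runtime's CUDA execution provider is used, is to give each time-sliced replica a fixed memory budget through the provider's `gpu_mem_limit` and `arena_extend_strategy` options. The sketch below splits device memory evenly across replicas. The helper name `per_replica_cuda_options` and the 512 MiB per-replica reserve are hypothetical illustration choices, not tuned values. This is only a cooperative budget: it caps ONNX Runtime's memory arena in each process but does not provide the memory isolation this request asks for, and it does not account for memory outside the arena, such as the CUDA context itself.

```python
from typing import Dict, Union


def per_replica_cuda_options(
    total_device_bytes: int,
    num_replicas: int,
    reserved_bytes_per_replica: int = 512 * 1024**2,
    device_id: int = 0,
) -> Dict[str, Union[int, str]]:
    """Build CUDAExecutionProvider options that cap one replica's arena.

    Hypothetical helper: divides device memory evenly among time-sliced
    replicas and subtracts an assumed per-replica reserve for allocations
    made outside ONNX Runtime's arena (e.g. the CUDA context). Nothing
    here stops another process on the same GPU from exceeding its share.
    """
    if num_replicas < 1:
        raise ValueError("num_replicas must be >= 1")
    share = total_device_bytes // num_replicas
    limit = share - reserved_bytes_per_replica
    if limit <= 0:
        raise ValueError("reserve exceeds the per-replica share")
    return {
        "device_id": device_id,
        # Upper bound, in bytes, on this session's device memory arena.
        "gpu_mem_limit": limit,
        # Extend the arena by the requested size instead of rounding up
        # to powers of two (the default, kNextPowerOfTwo), so growth
        # approaches the cap gradually.
        "arena_extend_strategy": "kSameAsRequested",
    }
```

Each service would then create its session with `onnxruntime.InferenceSession(model_path, providers=[("CUDAExecutionProvider", opts)])`, where `opts` is the returned dictionary. Since the replicas do not coordinate at runtime, every service sharing the GPU has to be configured with a consistent budget for this to help.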

### Describe scenario use case

Being able to run multiple small services on a single GPU can reduce costs across a broad range of applications, because it avoids renting more GPUs than necessary. In my particular use case, it would cut the number of required GPUs by about two thirds. It also has a minor indirect environmental benefit, since it reduces the amount of electricity required for computation.
