Memory safety for Nvidia GPU time-slicing #24943

@Galadros

Description

Describe the feature request

Nvidia has introduced a GPU feature called time-slicing (see here and here), which lets multiple workloads share one GPU. However, unlike Multi-Instance GPU (MIG), time-slicing provides no memory or fault isolation between replicas. Even so, for some workloads this is better than not being able to share the GPU at all.

I've seen errors caused by memory interference between replicas, and as far as I can tell, ONNX Runtime doesn't currently support safely managing GPU memory under time-slicing. Is this something folks have considered supporting in ONNX Runtime, or have I missed some existing support?

(See https://bruce-lee-ly.medium.com/nvidia-gpu-virtual-memory-management-7fdc4122226b for background on NVIDIA GPU virtual memory management.)
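One partial workaround is to cap each replica's ONNX Runtime memory arena with the CUDA execution provider's `gpu_mem_limit` option. This is not isolation in the MIG sense: it only bounds what the arena in one process requests, and memory ONNX Runtime does not allocate through the arena (the CUDA context itself, for example) is not counted. The sketch below assumes each replica is a separate process with its own `InferenceSession` and that the GPU's memory is split evenly. The helper names, the even split, and the reserve size are illustrative assumptions, not ONNX Runtime APIs.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical helpers (not part of ONNX Runtime): split one GPU's memory
# evenly between time-sliced replicas and build the provider list that caps
# each replica's ONNX Runtime CUDA memory arena at its share.
ProviderEntry = Union[str, Tuple[str, Dict[str, Union[int, str]]]]


def per_replica_limit(total_bytes: int, replicas: int, reserve_bytes: int = 0) -> int:
    """Bytes each replica's arena may use after holding back reserve_bytes
    for memory the arena does not track (e.g. the CUDA context)."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    usable = total_bytes - reserve_bytes
    if usable <= 0:
        raise ValueError("reserve_bytes leaves no memory for the replicas")
    return usable // replicas


def cuda_providers(limit_bytes: int, device_id: int = 0) -> List[ProviderEntry]:
    """Value for the providers= argument of onnxruntime.InferenceSession."""
    cuda_options: Dict[str, Union[int, str]] = {
        "device_id": device_id,
        # Upper bound on the CUDA execution provider's arena in this process.
        "gpu_mem_limit": limit_bytes,
        # Extend the arena by the requested size rather than rounding up to
        # the next power of two, so growth stays closer to actual demand.
        "arena_extend_strategy": "kSameAsRequested",
    }
    # The CPU execution provider handles any nodes the CUDA EP does not support.
    return [("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"]
```

With hypothetical numbers (a 16 GiB GPU shared by three replicas, 1 GiB held back, and a placeholder `model.onnx`), each replica would call `onnxruntime.InferenceSession("model.onnx", providers=cuda_providers(per_replica_limit(16 * 2**30, 3, reserve_bytes=2**30)))`, giving it a 5 GiB arena cap. A replica that exceeds its cap gets an allocation failure in its own process, but nothing here provides the fault isolation MIG offers.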

Describe scenario use case

Running multiple small services on a single GPU can save costs across a broad range of applications by avoiding renting more GPUs than necessary. In my use case, it would cut the number of required GPUs by about two-thirds. It also has some minor indirect environmental benefits, since it reduces the amount of electricity required for computation.

Metadata

Assignees

No one assigned

    Labels

    feature request (request for unsupported feature or enhancement)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests