Feat/backend/list devices #1883
Conversation
Let's move the discussion from Discord to the PR, so that it's easier to track :) First of all, thank you for reacting so quickly! I was about to ask if I should file a feature request, but you are already working on a PR )) So, throwing in a few thoughts:
Would it be possible to generalize the device types to an enum which could be used for all backends? In the above output, the code that selects the device would have to be backend-specific, i.e. the GPU appears as …

You raised the issue that PyTorch reports ROCm devices as CUDA; however, I think that this can be ignored for this particular use case: the intention is to find a GPU suitable for inference, or multiple GPUs suitable for training. Identifying what kind of GPU it really is could perhaps be achieved via a dedicated API querying the device name or device properties (i.e. similar to https://pytorch.org/docs/stable/cuda.html), if someone really requires it.
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #1883      +/-   ##
==========================================
- Coverage   86.11%   86.08%   -0.04%
==========================================
  Files         777      778       +1
  Lines       90555    90846     +291
==========================================
+ Hits        77979    78202    +223
- Misses      12576    12644     +68
```

☔ View full report in Codecov by Sentry.
With the discussions we had on Discord it wasn't too difficult to figure out an easy way to draft this :) I understand your point of view a bit better now, so ideally you'd like something like this, if I got your point right?

```rust
pub enum Device<B: Backend> {
    DiscreteGpu(B::Device),
    IntegratedGpu(B::Device),
    Cpu(B::Device),
}
```

This enum could possibly grow with different backends (new variants would need to be added for backends with different devices, e.g. TPU). The issue I see (and tried to raise on Discord) is that backends are kind of free to categorize their devices as they wish. For example, with wgpu we can differentiate between a discrete and an integrated GPU (plus there are other types, like virtual and "other" detected devices). But if we take torch, their mps device for macOS doesn't differentiate between a discrete, integrated or external GPU, even if Metal supports all of them (as far as I understand). In that case, you could think that simply not differentiating between discrete and integrated GPUs could work (simplifying the enum variants), but then if you actually have a choice you would probably prefer a discrete GPU (on wgpu, for example). This would remain a choice at the backend level then... So I see the motivation, but I'm not sure about the usefulness of this abstraction at a higher level 🤔 but I can be convinced otherwise 🙂
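To make the trade-off concrete, here is a minimal, self-contained sketch of how a caller might use such an enum to prefer a discrete GPU over an integrated one, falling back to the CPU. This is an illustration only: a plain `String` stands in for the backend-associated `B::Device`, and `select_device` is a hypothetical helper, not part of burn.

```rust
// Hypothetical stand-in for the proposed `Device<B>` enum; a String
// replaces `B::Device` so the example is self-contained.
#[derive(Debug, Clone, PartialEq)]
pub enum Device {
    DiscreteGpu(String),
    IntegratedGpu(String),
    Cpu(String),
}

/// Pick the "best" device: prefer a discrete GPU, then an integrated
/// GPU, then the CPU; `None` if the list is empty.
pub fn select_device(devices: &[Device]) -> Option<&Device> {
    devices
        .iter()
        .find(|d| matches!(d, Device::DiscreteGpu(_)))
        .or_else(|| devices.iter().find(|d| matches!(d, Device::IntegratedGpu(_))))
        .or_else(|| devices.iter().find(|d| matches!(d, Device::Cpu(_))))
}
```

With `vec![Cpu, IntegratedGpu, DiscreteGpu]` as input, `select_device` returns the discrete GPU even though it is listed last, which is exactly the preference a backend like wgpu could express but torch's mps device could not.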
I think there are two ways, plus a possible middle ground, on how to look at this.

One is as you described in the above post: having a detailed enum with precise types. Of course, this is not without problems: as you correctly pointed out, some backends simply do not provide the necessary information right away, or may not provide fine-grained information at all.

The other way is to treat this as a convenience API which will be good enough for the majority of standard use cases, and here the PyTorch approach is imho sufficient: usually the most pressing question is whether there is any usable GPU on the system at all, and the wish to select it instead of the CPU. The next more advanced case is training and LLM inference, where multiple GPUs can be used at the same time (llama.cpp will even use the CPU in addition to the GPUs). In this situation one usually does not care which GPUs exactly there are: it's a "use all that are there" scenario. So for the above logic I'd argue that it would even be enough to shrink the enum to the Torch-style interpretation of: …
I recognize your point about wanting to prefer a discrete GPU over an integrated GPU, but I think selection of a specific GPU is a more special case; also, "use these two, but not all" is something that the user should decide, so such detailed settings are left to users via command line parameters.

There could perhaps still be a way to handle more sophisticated selections for those who require it. I'd have to fire up my AI server to see what these functions report and how useful the information is, but I am assuming that there will be some more detailed info about the underlying hardware there: https://pytorch.org/docs/stable/generated/torch.cuda.get_device_properties.html#torch.cuda.get_device_properties

Here, as a middle way, the idea would be, similar to Torch, to have an additional API function which allows querying a specific …

To be fair, I did not check how much of the device info/properties all backends expose, and whether this could be generalized at all. Then again, if someone wants such fine-grained control, they could indeed go down to the backend level and use the tch/wgpu/etc. functions directly; the convenience API is what it is - for conveniently handling the most common use cases.

Again, my argument is for a convenience API which does not have to be that detailed and which covers the most common use case of "give me one or more GPUs that there are on the system", regardless of the underlying details. At least this is the main use case that I see when looking at the gazillion of AI applications that are out there. Would be nice if other users share their view on this: how would you prefer to handle the CPU vs. GPU and multi-GPU scenario in the applications that you are developing?
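The "use all that are there" convenience view argued for above could be sketched like this. All names here (`DeviceKind`, `DeviceInfo`, `devices_for_training`) are assumptions made up for illustration, not burn's actual API; the point is only the selection logic.

```rust
// Torch-style coarse classification: just CPU vs. GPU.
#[derive(Debug, Clone, PartialEq)]
pub enum DeviceKind {
    Cpu,
    Gpu,
}

#[derive(Debug, Clone, PartialEq)]
pub struct DeviceInfo {
    pub kind: DeviceKind,
    pub index: usize,
}

/// "Use all that are there": return every GPU on the system, falling
/// back to the CPU devices only when no GPU is available (the
/// llama.cpp-style multi-GPU scenario described above).
pub fn devices_for_training(all: &[DeviceInfo]) -> Vec<DeviceInfo> {
    let gpus: Vec<DeviceInfo> = all
        .iter()
        .filter(|d| d.kind == DeviceKind::Gpu)
        .cloned()
        .collect();
    if gpus.is_empty() {
        all.iter()
            .filter(|d| d.kind == DeviceKind::Cpu)
            .cloned()
            .collect()
    } else {
        gpus
    }
}
```

Anything finer than this (pinning to specific GPUs, excluding one of several) would then be handled by user-facing settings such as command line parameters, as argued above.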
Ok, perhaps I misunderstood your initial request 🙂 I thought you wanted something a bit in between, but in this case it's simply differentiating high-level types.
Checklist

- [x] `run-checks all` script has been executed.

Changes

- Added `list_available_devices` to the `Backend` and `JitRuntime` traits, along with backend-specific implementations. Allows the user to get a list of available devices at runtime, e.g. …
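As a rough illustration of the shape of such a trait method, here is a self-contained mock. `MockBackend`, its `String` device type, and the returned device names are all assumptions for demonstration; the real burn implementation delegates to each backend's own device enumeration.

```rust
// Hypothetical minimal trait mirroring the described addition: backends
// can enumerate their devices at runtime via an associated function.
pub trait Backend {
    type Device: std::fmt::Debug + PartialEq;

    /// Return every device this backend can currently use.
    fn list_available_devices() -> Vec<Self::Device>;
}

// A mock backend whose "devices" are plain strings (illustration only).
pub struct MockBackend;

impl Backend for MockBackend {
    type Device = String;

    fn list_available_devices() -> Vec<String> {
        vec!["Cuda(0)".into(), "Cuda(1)".into(), "Cpu".into()]
    }
}
```

A user would then call `MockBackend::list_available_devices()` at startup and pick a device from the returned list, instead of hard-coding a device index.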