f16 with WGPU #597

Open
Gadersd opened this issue Aug 6, 2023 · 14 comments
Labels
enhancement (Enhance existing features) · performance (Anything related to performance) · wgpu (Related to WGPU backend)

Comments

@Gadersd
Contributor

Gadersd commented Aug 6, 2023

Feature description

It would be great if burn-wgpu supported f16. Is there a timeline for this?

Feature motivation

Large models such as Stable Diffusion exceed wgpu's maximum buffer size when using f32. f16 support would allow some of these models to run on the wgpu backend.

@antimora added the enhancement, performance, and wgpu labels on Aug 6, 2023
@nathanielsimard
Member

Linked to gfx-rs/wgpu#4384

@Gadersd
Contributor Author

Gadersd commented Aug 6, 2023

Does anyone know if the limited buffer size in wgpu will eventually be alleviated? Even if f16 gets supported, the buffer size limits will still be a barrier to running large models.

@nathanielsimard
Member

You can manually override the limits when selecting the device: https://github.com/burn-rs/burn/blob/ed255c5561b85876cf02cbc4d48f35e1f0d29ac0/burn-wgpu/src/context/base.rs#L228

The limits are low for compatibility reasons I think, but I can increase max_storage_buffer_binding_size on my RTX 3070 to usize::pow(8, 10) and load bigger tensors. I think we should come up with a way to change the limits for specific devices, probably with a config file or env variables (or both).
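
For reference, a minimal sketch of what such an override looks like at device creation (assuming the wgpu 0.17-era `request_device` API, with `adapter` already acquired; the limit value here is arbitrary, not a recommendation):

```rust
// Sketch: raise max_storage_buffer_binding_size above wgpu's default.
// Must run in an async context; the chosen value is illustrative only.
let mut limits = wgpu::Limits::default();
limits.max_storage_buffer_binding_size = 1 << 30; // 1 GiB, well above the default

let (device, queue) = adapter
    .request_device(
        &wgpu::DeviceDescriptor {
            label: Some("burn-wgpu device"),
            features: wgpu::Features::empty(),
            limits,
        },
        None, // no trace path
    )
    .await
    .expect("requested limits not supported by this adapter");
```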

@Gadersd
Contributor Author

Gadersd commented Aug 7, 2023

Would it be reasonable to use the `pub fn limits(&self) -> Limits` function on the adapter to get the best limits that the adapter offers instead of relying on defaults? I think this would resolve the issue.
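
Roughly, the idea would be (a sketch, under the same wgpu 0.17-era assumptions as the snippet above):

```rust
// Sketch: pass the adapter's own best-supported limits through,
// rather than wgpu's conservative defaults.
let limits = adapter.limits();
let (device, queue) = adapter
    .request_device(
        &wgpu::DeviceDescriptor {
            label: Some("burn-wgpu device"),
            features: wgpu::Features::empty(),
            limits,
        },
        None,
    )
    .await
    .expect("failed to request device");
```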

@nathanielsimard
Member

Oh yes, I didn't know that. I'll make a PR soon.

@nathanielsimard
Member

@Gadersd PR done: #601. Let me know if it helps in running your models.

@Gadersd closed this as completed on Aug 7, 2023
@Gadersd reopened this on Aug 7, 2023
@Gadersd
Contributor Author

Gadersd commented Aug 7, 2023

My bad, I accidentally tested with tch.

@Gadersd
Contributor Author

Gadersd commented Aug 7, 2023

I get the following panic when trying to run stable diffusion:

```
thread panicked at 'Error in Queue::submit: Validation Error

Caused by:
    Parent device is lost
', /home/hermes/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.17.0/src/backend/direct.rs:2289:30
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
thread 'main' panicked at 'Unable to read buffer', /home/hermes/.cargo/git/checkouts/burn-acfbee6a141c1b41/22ab534/burn-wgpu/src/context/client.rs:120:17
```

@nathanielsimard
Member

nathanielsimard commented Aug 7, 2023

It may happen when you run out of memory. You can try lowering MAX_TASKS to 1 to reduce memory usage:

https://github.com/burn-rs/burn/blob/9361193b5d62065807fdb6721e95dca8bcf8bf74/burn-wgpu/src/context/server.rs#L65

It might increase the computing time, but that's probably negligible for a big model. Once again, this is a value I'm not sure how we should set 😅.
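
Concretely, that means editing the constant at the link above; a sketch of the manual change (the exact declaration in burn-wgpu may differ):

```rust
// burn-wgpu/src/context/server.rs (sketch of the manual edit)
// Fewer queued tasks means GPU work is flushed more often,
// trading some throughput for a lower peak memory footprint.
const MAX_TASKS: usize = 1;
```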

@Gadersd
Contributor Author

Gadersd commented Aug 7, 2023

Setting MAX_TASKS to 1 allowed inference to complete, but it was very slow compared to the tch run: ~5 minutes for 1 image with wgpu vs. ~15 seconds for two images with tch. Perhaps the value should be settable by the user when the default isn't viable?

@nathanielsimard
Member

Yes, we could do that for now. There is an issue to optimize the memory strategy: #582.

@nathanielsimard
Member

@Gadersd I added a way to configure MAX_TASKS: #603.

@antimora
Collaborator

antimora commented Sep 8, 2023

There is a tweet saying "float16 in webGPU finally works now"

https://twitter.com/nisten/status/1698796718840598850

Worth looking into this to see if we need to update anything.
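
For context, f16 in WGSL is gated behind a device feature plus an in-shader extension directive. A minimal sketch of opting in with wgpu (`Features::SHADER_F16` is a real wgpu feature; the shader snippet is illustrative):

```rust
// Sketch: request the SHADER_F16 feature at device creation.
// Availability still depends on the adapter/driver.
let (device, queue) = adapter
    .request_device(
        &wgpu::DeviceDescriptor {
            label: Some("f16-capable device"),
            features: wgpu::Features::SHADER_F16,
            limits: wgpu::Limits::default(),
        },
        None,
    )
    .await
    .expect("adapter does not support SHADER_F16");

// The WGSL source must then opt in explicitly:
// enable f16;
// fn half_it(x: f16) -> f16 { return x * 0.5h; }
```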

@antimora
Collaborator

gfx-rs/wgpu#4384
