Support Stable Diffusion model #129

Open
siriux opened this issue Sep 20, 2022 · 8 comments
Labels
enhancement New feature or request

Comments

@siriux
siriux commented Sep 20, 2022

Is your feature request related to a problem? Please describe.
I would like to be able to run Stable Diffusion using wonnx.

Describe the solution you'd like
At a minimum, these operators are missing and would need to be implemented before even trying to run Stable Diffusion on wonnx:
Einsum, Erf, Expand, InstanceNormalization, Shape, Slice

This is the minimum set, based on this guide to simplifying the ONNX model (see the simplification table):
https://www.photoroom.com/tech/stable-diffusion-25-percent-faster-and-save-seconds/

Many more things will probably be needed, but I'm creating this issue because running SD in Rust directly on the GPU would be a really interesting use case.

I don't have much experience with wonnx or even ML, but I decided to create this issue because it surprised me how few operators are missing to run this model. I would need more experience with Stable Diffusion, the diffusers library, and ONNX in Python before attempting to port it here, but maybe there are more experienced users who are interested too.

@haixuanTao
Collaborator

Hello Sirius, thanks for taking an interest in wonnx!

The erf function is not yet a native operation in WGSL, see: https://www.w3.org/TR/WGSL/

Running Stable Diffusion on wonnx will therefore require an approximation of the erf function. At this point I am not sure how to implement it.

@siriux
Author

siriux commented Sep 20, 2022

Thanks for your answer. Again, let me reiterate my ignorance in this field, but this is what I've found.

The implementation used in tract seems very simple: https://github.com/sonos/tract/blob/21928fb3652d028db5be1348e6017494318d4b86/onnx-opl/src/erf.rs

Looking at other WGSL shaders for other operations, it seems translatable.

The signum in WGSL is just sign, abs is the same, powi can be replaced with pow or even unrolled since the exponent is 16 (which is short and efficient), and recip is just 1/x.

copysign is trickier, but for the erf function it should be just a multiplication by the original sign (as erf(0) == 0).

I've looked a little at the other missing ops, and they don't seem as straightforward.
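
The mapping described in this comment can be sketched in Rust as follows. This is a minimal sketch, not the tract or wonnx code: the name `erf_approx` is made up for illustration, and the six coefficients are the Abramowitz and Stegun constants (formula 7.1.28) as best recalled, so they should be checked against the linked tract source before use. It unrolls the 16th power into four squarings and replaces copysign with a multiplication by the sign.

```rust
/// Coefficients a1..a6 of a six-term polynomial erf approximation.
/// ASSUMPTION: written from memory (Abramowitz and Stegun 7.1.28);
/// verify against the linked tract source.
const ERF_COEFFS: [f32; 6] = [
    0.0705230784,
    0.0422820123,
    0.0092705272,
    0.0001520143,
    0.0002765672,
    0.0000430638,
];

/// Hypothetical helper: approximates erf(x) using only operations that
/// have a direct WGSL equivalent (abs, multiply, add, divide).
fn erf_approx(x: f32) -> f32 {
    // In WGSL this could be sign(x); sign(0) == 0 is harmless here because
    // the magnitude computed below is already 0 at x == 0.
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let t = x.abs();

    // Horner evaluation of a1*t + a2*t^2 + ... + a6*t^6.
    let mut poly = 0.0_f32;
    for &a in ERF_COEFFS.iter().rev() {
        poly = (poly + a) * t;
    }

    // (1 + poly)^16 unrolled as four successive squarings instead of pow.
    let base = 1.0 + poly;
    let p2 = base * base;
    let p4 = p2 * p2;
    let p8 = p4 * p4;
    let p16 = p8 * p8;

    // recip becomes 1/x, copysign becomes a multiplication by the sign.
    sign * (1.0 - 1.0 / p16)
}
```

Calling `erf_approx(1.0)` should give a value close to 0.8427, and negative inputs mirror the positive ones because the sign is reapplied at the end.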

@FL33TW00D

I looked into this a few weeks ago - it is a significant chunk of work for 2 reasons:

  1. The ops to implement are complicated (e.g. Einsum)
  2. WONNX does not currently support parameterized dimensions, which would be required to implement the text encoder.

@siriux
Author

siriux commented Oct 11, 2022

Thanks for looking at it. I hope one day we will be able to run something like SD in pure Rust.

@philpax
Contributor

philpax commented Nov 13, 2022

As a matter of interest, Stable Diffusion was recently implemented on top of tch-rs: https://github.com/LaurentMazare/diffusers-rs

It's not directly applicable to this, but it could inform future development efforts.

@pixelspark
Collaborator

> WONNX does not currently support parameterized dimensions, which would be required to implement the text encoder.

I am not too familiar with SD, but at least for BERT and other text encoders, parameterized dimensions can be replaced with fixed dimensions just fine (the model will then work with token sequences up to the statically set length).
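
To illustrate what a fixed sequence dimension means for the caller, here is a minimal sketch of padding or truncating token ids to a static length. The helper name `pad_to_fixed_len`, the padding id, and the attention-mask convention (1 for real tokens, 0 for padding) are assumptions for illustration and are not part of wonnx.

```rust
/// Hypothetical helper: fit a token sequence into a statically sized input.
/// Returns (token ids, attention mask), both of length `fixed_len`.
/// Sequences longer than `fixed_len` are truncated, which is the limitation
/// that comes with replacing a parameterized dimension by a fixed one.
fn pad_to_fixed_len(tokens: &[i64], fixed_len: usize, pad_id: i64) -> (Vec<i64>, Vec<i64>) {
    // Keep at most `fixed_len` real tokens.
    let kept = tokens.len().min(fixed_len);
    let mut ids = tokens[..kept].to_vec();
    let mut mask = vec![1_i64; kept];

    // Fill the remainder with padding ids and a zero mask.
    ids.resize(fixed_len, pad_id);
    mask.resize(fixed_len, 0);
    (ids, mask)
}
```

With `fixed_len = 5` and `pad_id = 0`, the tokens `[5, 6, 7]` become ids `[5, 6, 7, 0, 0]` with mask `[1, 1, 1, 0, 0]`.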

pixelspark added the enhancement label Mar 6, 2023
@pixelspark
Collaborator

> WONNX does not currently support parameterized dimensions, which would be required to implement the text encoder.

The shape inference engine in WONNX now supports this (it allows you to set parameterized dimensions and then infer shapes for the other outputs).
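
The idea of setting parameterized dimensions before inferring shapes can be sketched as a simple substitution step. This is not the wonnx API: the `Dim` enum, the `resolve_shape` function, and the parameter names are hypothetical and only show the concept of replacing named dimensions with concrete values.

```rust
use std::collections::HashMap;

/// Hypothetical model of a tensor dimension: either known or named.
#[derive(Debug, Clone, PartialEq)]
enum Dim {
    Fixed(usize),
    Param(String),
}

/// Hypothetical helper: replace every named dimension with the value the
/// caller has set for it. Fails if a parameter has no value assigned.
fn resolve_shape(shape: &[Dim], params: &HashMap<String, usize>) -> Result<Vec<usize>, String> {
    shape
        .iter()
        .map(|dim| match dim {
            Dim::Fixed(n) => Ok(*n),
            Dim::Param(name) => params
                .get(name)
                .copied()
                .ok_or_else(|| format!("no value set for dimension parameter {}", name)),
        })
        .collect()
}
```

Once every input shape is concrete, the shapes of downstream outputs can be inferred operator by operator, which is the step the comment above refers to.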

@pixelspark
Collaborator

> I looked into this a few weeks ago - it is a significant chunk of work for 2 reasons:
>
>   1. The ops to implement are complicated (e.g. Einsum)
>   2. WONNX does not currently support parameterized dimensions, which would be required to implement the text encoder.

As for Einsum: this may be feasible; a first start is in #154.
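
For readers unfamiliar with the operator, the sketch below shows what a single, fixed Einsum equation computes: `ij,jk->ik`, which is an ordinary matrix product. It is a naive CPU reference with a made-up function name, unrelated to the work in #154. A general implementation is harder because it must parse the equation string and handle arbitrary index combinations rather than one hard-coded loop nest.

```rust
/// Hypothetical reference for the einsum equation "ij,jk->ik" on
/// row-major slices: out[i][k] = sum over j of a[i][j] * b[j][k].
fn einsum_ij_jk_ik(a: &[f32], b: &[f32], i_len: usize, j_len: usize, k_len: usize) -> Vec<f32> {
    assert_eq!(a.len(), i_len * j_len);
    assert_eq!(b.len(), j_len * k_len);

    let mut out = vec![0.0_f32; i_len * k_len];
    for i in 0..i_len {
        for k in 0..k_len {
            // `j` appears in both inputs but not in the output, so it is summed over.
            let mut acc = 0.0_f32;
            for j in 0..j_len {
                acc += a[i * j_len + j] * b[j * k_len + k];
            }
            out[i * k_len + k] = acc;
        }
    }
    out
}
```

For the 2x2 inputs `[1, 2, 3, 4]` and `[5, 6, 7, 8]` this yields `[19, 22, 43, 50]`.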


5 participants