Stable Diffusion v3 is the next generation of the Stable Diffusion family of latent diffusion image models. According to human preference evaluations, it outperforms state-of-the-art text-to-image generation systems in typography and prompt adherence. Unlike previous versions, it is based on the Multimodal Diffusion Transformer (MMDiT) text-to-image architecture, which greatly improves image quality, typography, complex prompt understanding, and resource efficiency.
More details about the model can be found in the model card, research paper, and Stability.AI blog post. This tutorial shows how to convert Stable Diffusion v3 to run with OpenVINO. An additional part demonstrates how to optimize the pipeline with NNCF to speed it up. If you want to run previous Stable Diffusion versions, please check our other notebooks:
The notebooks provide a simple interface for interacting with the model through text instructions: the user provides an input prompt and the model generates an image. An additional part demonstrates how to optimize the model with NNCF to speed up the pipeline and reduce memory consumption. The Torch FX notebook follows a similar approach, but showcases how NNCF and torch.compile with the OpenVINO backend optimize the model in its Torch FX representation.
The image below shows an example of a generated image.
Note: Some of the demonstrated models can require at least 32 GB of RAM for conversion and running.
This folder contains notebooks that demonstrate the use of the Stable Diffusion v3 model with OpenVINO and Torch FX representations.
The OpenVINO tutorial consists of the following steps:
- Install prerequisites
- Convert model to OpenVINO intermediate representation (IR) format and compress weights using NNCF
- Prepare OpenVINO Inference pipeline
- Run Text-to-Image generation
- Launch interactive demo
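The conversion and weight-compression step above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper name `convert_and_compress` and the compression settings (`INT4_SYM`, `ratio=0.8`, `group_size=64`) are assumptions chosen for the example, and the notebook may use different values per submodel.

```python
from pathlib import Path


def convert_and_compress(torch_module, example_input, ir_path, compress=True):
    """Convert a PyTorch module to OpenVINO IR, optionally compressing weights.

    Hypothetical helper for illustration; not taken from the notebook.
    """
    ir_path = Path(ir_path)
    if ir_path.exists():
        # Reuse a previously converted model instead of converting again.
        return ir_path

    # Imported lazily so the cached branch above works without these packages.
    import nncf
    import openvino as ov

    # Trace the PyTorch module into an in-memory OpenVINO model.
    ov_model = ov.convert_model(torch_module, example_input=example_input)
    if compress:
        # Assumed settings: about 80% of the weight layers go to 4-bit,
        # the remainder stays in NNCF's backup precision.
        ov_model = nncf.compress_weights(
            ov_model,
            mode=nncf.CompressWeightsMode.INT4_SYM,
            ratio=0.8,
            group_size=64,
        )
    # Write the .xml/.bin IR pair to disk.
    ov.save_model(ov_model, ir_path)
    return ir_path
```

The saved IR file can then be loaded for inference with `ov.Core().compile_model(ir_path, "CPU")`, or with another device name available on your system. Checking for an existing file first avoids repeating the memory-heavy conversion on every run.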
The Torch FX tutorial consists of the following steps:
- Install prerequisites
- Collect PyTorch model pipeline
- Convert model to Torch FX representation format and compress weights using NNCF
- Compile the model using torch.compile with the openvino backend
- Run Text-to-Image generation
- Compare the results of the original and optimized pipelines
- Launch interactive demo
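The compile and compare steps above might look roughly like the sketch below. It is an illustration under assumptions rather than the notebook's code: `compile_with_openvino`, `mean_latency`, and `speedup` are hypothetical helper names, and the NNCF weight-compression step on the Torch FX graph is omitted here.

```python
import time


def compile_with_openvino(torch_module):
    """Wrap a PyTorch module with torch.compile using the OpenVINO backend."""
    # Imported lazily so the timing helpers below work without these packages.
    import torch
    import openvino.torch  # noqa: F401  (makes the "openvino" backend available)

    return torch.compile(torch_module, backend="openvino")


def mean_latency(fn, runs=3):
    """Return the average wall-clock seconds per call of fn over several runs."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


def speedup(baseline_seconds, optimized_seconds):
    """Return how many times faster the optimized pipeline is than the baseline."""
    return baseline_seconds / optimized_seconds
```

When comparing pipelines this way, run the compiled model once before timing it, because the first call to a `torch.compile` model includes compilation time. Use the same prompt, seed, and number of inference steps for both pipelines so that the comparison is fair.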
This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the Installation Guide.