Disty0 edited this page Nov 10, 2023 · 32 revisions

SD.Next includes experimental support for additional model pipelines.
This adds support for models such as:

  • Stable Diffusion XL
  • Kandinsky
  • Deep Floyd IF

And soon:

  • Shap-E, UniDiffuser, Consistency Models, Diffedit Zero-Shot
  • Text2Video, Video2Video, etc...

This has been made possible by integrating the Hugging Face diffusers library, with the help of the Hugging Face team!

How to

Moved to Installation and SDXL

Integration

Standard workflows

  • txt2img
  • img2img
  • inpaint
  • process

Model Access

  • For standard SD 1.5 and SD 2.1 models, you can use either
    standard safetensor models (single file) or diffusers models (folder structure)
  • For additional models, you can use diffusers models only
  • You can download diffusers models directly from the Hugging Face hub
    or use the built-in model search & download in SD.Next: UI -> Models -> Huggingface
  • Note that access to some models is gated,
    in which case you need to accept the model EULA and provide your Hugging Face token
  • When loading safetensors models, you must specify model pipeline type in:
    UI -> Settings -> Diffusers -> Pipeline
    When loading huggingface models, pipeline type is automatically detected
  • If you get an error such as "Diffuser model downloaded error: model=stabilityai/stable-diffusion-etc [Errno 2] No such file or directory:",
    go to the model's Hugging Face page and accept the EULA for that model.
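
The two on-disk layouts mentioned above (single-file safetensors versus a diffusers folder) can be told apart with a simple check. This is a minimal sketch, assuming a diffusers folder is recognizable by a `model_index.json` file at its root, which is what diffusers writes when a pipeline is saved. The helper name `detect_model_format` is hypothetical and is not part of SD.Next.

```python
from pathlib import Path


def detect_model_format(path):
    """Classify a model path as 'safetensors', 'diffusers' or 'unknown'.

    Hypothetical helper for illustration; not part of SD.Next.
    """
    p = Path(path)
    if p.is_file() and p.suffix.lower() == ".safetensors":
        return "safetensors"  # single-file checkpoint
    if p.is_dir() and (p / "model_index.json").is_file():
        return "diffusers"  # folder structure with a pipeline index file
    return "unknown"
```

A single-file result is the case where the pipeline type has to be chosen manually in settings. For a diffusers folder, the pipeline class is recorded in `model_index.json`.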

Extra Networks

  • Lora networks
  • Textual inversions (embeddings)

Note that Lora and TI are still model-specific, so you cannot use a Lora trained on SD 1.5 with SD-XL
(just as you cannot use it with an SD 2.1 model): it needs to be trained for the specific model

Support for SD-XL training is expected shortly

Diffuser Settings

  • UI -> Settings -> Diffuser Settings
    contains additional tunable parameters

Samplers

  • Samplers (schedulers) are pipeline-specific, so when running with the diffusers backend, you'll see a different list of samplers
  • UI -> Settings -> Sampler Settings shows different configurable parameters depending on backend
  • Recommended sampler for diffusers is DEIS
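
For readers using the diffusers library directly, changing the sampler amounts to replacing the pipeline's scheduler object. The sketch below shows the usual diffusers pattern of building a new scheduler from the current scheduler's config. The helper names `swap_scheduler` and `use_deis` are hypothetical, and this is not a description of how SD.Next wires its sampler list internally.

```python
def swap_scheduler(pipe, scheduler_cls):
    """Replace the pipeline's scheduler, reusing the current scheduler's config."""
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
    return pipe


def use_deis(pipe):
    """Switch an already loaded diffusers pipeline to the DEIS multistep scheduler."""
    # Imported lazily so this module loads even without diffusers installed.
    from diffusers import DEISMultistepScheduler

    return swap_scheduler(pipe, DEISMultistepScheduler)
```

Reusing the existing config keeps model-specific values, such as the noise schedule, consistent when the scheduler class changes.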

Other

  • Updated System Info tab with additional information
  • Support for lowvram and medvram modes: both work well with little loss of performance (see Performance)
    Additional tunables are available in UI -> Settings -> Diffuser Settings
  • Support for both default SDP and xFormers cross-optimizations
    Other cross-optimization methods are not available
  • Extra Networks UI will show available diffusers models
  • CUDA model compile
    UI -> Settings -> Compute settings
    Requires a GPU with high VRAM
    Diffusers recommends the reduce-overhead compile mode, but other modes are available as well
    Fullgraph compile is possible (with sufficient VRAM) when using diffusers
  • Note that some CUDA compile modes only work on Linux
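
In plain diffusers code, the memory modes and model compile described above correspond roughly to the calls below. This is a sketch under stated assumptions: mapping medvram to `enable_model_cpu_offload()` and lowvram to `enable_sequential_cpu_offload()` is a guess made for illustration, not a statement about SD.Next internals, and the helper names `apply_memory_mode` and `compile_unet` are hypothetical.

```python
def apply_memory_mode(pipe, mode="none", device="cuda"):
    """Apply an offload strategy to a diffusers pipeline.

    The medvram/lowvram mapping is an assumption for illustration only.
    """
    if mode == "medvram":
        # Moves whole sub-models (text encoder, UNet, VAE) to the GPU on demand.
        pipe.enable_model_cpu_offload()
    elif mode == "lowvram":
        # Offloads at a finer granularity: lowest VRAM use, slowest option.
        pipe.enable_sequential_cpu_offload()
    elif mode == "none":
        pipe.to(device)
    else:
        raise ValueError(f"unknown memory mode: {mode}")
    return pipe


def compile_unet(pipe, mode="reduce-overhead", fullgraph=False):
    """Compile the UNet with torch.compile (requires PyTorch 2.x)."""
    import torch  # imported lazily so the helper above works without torch

    pipe.unet = torch.compile(pipe.unet, mode=mode, fullgraph=fullgraph)
    return pipe
```

Both offload calls rely on the accelerate package being installed. Compilation adds a warm-up cost on the first generation, so it tends to pay off only over longer sessions.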

SD-XL Notes

  • SD-XL Technical Report
  • The SD-XL model is designed as a two-stage model
    You can run the SD-XL pipeline using just the base model or load both the base and refiner models
    • base: Trained on images with a variety of aspect ratios; uses OpenCLIP-ViT/G and CLIP-ViT/L for text encoding
    • refiner: Trained to denoise small noise levels of high-quality data; uses the OpenCLIP model
    • Having both base model and refiner model loaded can require significant VRAM
    • If you want to use the refiner model, it is advised to add sd_model_refiner to quicksettings
      in UI -> Settings -> User Interface
  • SD-XL model was trained on 1024px images
    You can use it with smaller sizes, but you will likely get better results with SD 1.5 models
  • SD-XL model NSFW filter has been turned off
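
The two-stage design can be sketched with the diffusers SD-XL pipelines: the base model handles the high-noise part of the schedule and hands latents to the refiner for the remaining steps. The function names `run_two_stage` and `load_sdxl_pair`, the 40-step count and the 0.8 split are illustrative assumptions, and this is not necessarily how SD.Next invokes the refiner.

```python
def run_two_stage(base, refiner, prompt, steps=40, high_noise_frac=0.8):
    """Run base then refiner, splitting the schedule at high_noise_frac."""
    latents = base(
        prompt=prompt,
        num_inference_steps=steps,
        denoising_end=high_noise_frac,  # base stops early
        output_type="latent",  # keep latents, skip VAE decode
    ).images
    return refiner(
        prompt=prompt,
        num_inference_steps=steps,
        denoising_start=high_noise_frac,  # refiner resumes where base stopped
        image=latents,
    ).images[0]


def load_sdxl_pair(dtype=None):
    """Load base and refiner, sharing the second text encoder and the VAE."""
    # Lazy imports: requires torch and diffusers, and downloads several GB.
    import torch
    from diffusers import StableDiffusionXLImg2ImgPipeline, StableDiffusionXLPipeline

    dtype = dtype or torch.float16
    base = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=dtype
    )
    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        text_encoder_2=base.text_encoder_2,
        vae=base.vae,
        torch_dtype=dtype,
    )
    return base, refiner
```

Sharing `text_encoder_2` and `vae` between the two pipelines reduces the extra memory needed to keep both models loaded, which is the VRAM concern noted above.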

Download SD-XL 1.0

  1. Enter stabilityai/stable-diffusion-xl-base-1.0 in Select Model and press Download
  2. Enter stabilityai/stable-diffusion-xl-refiner-1.0 in Select Model and press Download
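
Outside the UI, the same two repositories can be fetched with the `huggingface_hub` package. This is a minimal sketch: `download_models` and its `downloader` parameter are hypothetical names introduced here, and files land in the Hugging Face cache directory, which may differ from the folder SD.Next uses for its own downloads.

```python
SDXL_REPOS = (
    "stabilityai/stable-diffusion-xl-base-1.0",
    "stabilityai/stable-diffusion-xl-refiner-1.0",
)


def download_models(repo_ids=SDXL_REPOS, token=None, downloader=None):
    """Download each repo and return a mapping of repo id to local path.

    `token` is only needed for gated models; `downloader` exists so the
    function can be exercised without network access.
    """
    if downloader is None:
        # Imported lazily; requires the huggingface_hub package.
        from huggingface_hub import snapshot_download

        downloader = snapshot_download
    return {repo_id: downloader(repo_id=repo_id, token=token) for repo_id in repo_ids}
```

Calling `download_models(token="hf_...")` with your own token covers the gated-model case, provided the EULA has already been accepted on the model page.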

Limitations

  • Any extension that requires access to model internals will likely not work when using the diffusers backend
    This includes, for example, standard extensions such as ControlNet and MultiDiffusion
    Note: the application will auto-disable incompatible built-in extensions when running in diffusers mode
  • Explicit refiner as postprocessing is not yet implemented
  • Hypernetworks are not supported
  • Limited callback support for scripts/extensions: additional callbacks will be added as needed

Performance

Comparison of the original Stable Diffusion pipeline and the diffusers pipeline when using a standard SD 1.5 model
Performance is measured for batch sizes 1, 2, 4, 8 and 16

pipeline                   | performance it/s (batch 1 / 2 / 4 / 8 / 16) | memory cpu/gpu
original                   | 7.99 / 7.93 / 8.83 / 9.14 / 9.2             | 6.7 / 7.2
original medvram           | 6.23 / 7.16 / 8.41 / 9.24 / 9.68            | 8.4 / 6.8
original lowvram           | 1.05 / 1.94 / 3.2 / 4.81 / 6.46             | 8.8 / 5.2
diffusers                  | 9 / 7.4 / 8.2 / 8.4 / 7.0                   | 4.3 / 9.0
diffusers medvram          | 7.5 / 6.7 / 7.5 / 7.8 / 7.2                 | 6.6 / 8.2
diffusers lowvram          | 7.0 / 7.0 / 7.4 / 7.7 / 7.8                 | 4.3 / 7.2
diffusers with safetensors | 8.9 / 7.3 / 8.1 / 8.4 / 7.1                 | 5.9 / 9.0

Notes:

  • Test environment: NVIDIA RTX 3060 GPU, Torch 2.1-nightly with CUDA 12.1, Cross-optimization: SDP
  • All else being equal, diffusers seem to:
    • Use slightly less RAM and more VRAM
    • Have highly efficient medvram/lowvram equivalents which don't lose a lot of performance
    • Run faster at the smallest batch size and slower at larger batch sizes
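
The batch-size trend in the notes can be checked against the table with a few lines of arithmetic. The sketch below copies the it/s figures from the table and computes the diffusers-to-original speed ratio per batch size. The variable and function names are chosen here for illustration.

```python
BATCH_SIZES = [1, 2, 4, 8, 16]

# it/s figures copied from the table above
ORIGINAL = [7.99, 7.93, 8.83, 9.14, 9.2]
DIFFUSERS = [9.0, 7.4, 8.2, 8.4, 7.0]
ORIGINAL_LOWVRAM = [1.05, 1.94, 3.2, 4.81, 6.46]
DIFFUSERS_LOWVRAM = [7.0, 7.0, 7.4, 7.7, 7.8]


def speed_ratios(new, old):
    """Ratio of new to old throughput per batch size; above 1.0 means new is faster."""
    return [round(n / o, 2) for n, o in zip(new, old)]


ratios = speed_ratios(DIFFUSERS, ORIGINAL)
lowvram_ratios = speed_ratios(DIFFUSERS_LOWVRAM, ORIGINAL_LOWVRAM)
faster_at = [b for b, r in zip(BATCH_SIZES, ratios) if r > 1.0]

print("default:", dict(zip(BATCH_SIZES, ratios)))
print("lowvram:", dict(zip(BATCH_SIZES, lowvram_ratios)))
print("diffusers faster (default mode) at batch sizes:", faster_at)
```

With these figures, the default diffusers pipeline is ahead only at batch size 1 and falls behind from batch size 2 upward, while in lowvram mode diffusers is faster at every measured batch size.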