# Notebook: Generating Images with Stable Diffusion using Gradio

This notebook demonstrates how to build an interactive image generation application with the Stable Diffusion model and the Gradio library. It uses the following libraries:

- **Gradio**: A Python library for creating user-friendly web interfaces.
- **PIL (Python Imaging Library, installed as Pillow)**: Represents the generated images; the pipeline returns PIL image objects.
- **Torch (PyTorch)**: A deep learning framework; here it runs the model and selects the GPU or CPU.
- **Diffusers**: A library from Hugging Face for running diffusion models like Stable Diffusion.

In [4]:
pip install gradio torch diffusers transformers accelerate

Collecting diffusers
  Downloading diffusers-0.30.0-py3-none-any.whl.metadata (18 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch)
  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch)
  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch)
  Using cached nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch)
  Using cached nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch)
  Using cached nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.0.2.54 (from torch)
  Using cached nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-

In [5]:
import gradio as gr
from PIL import Image
import torch
from diffusers import StableDiffusionPipeline

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.


0it [00:00, ?it/s]

The 'auth_token' is your personal access token from Hugging Face.
It authenticates you when a model is gated or private; public models can be downloaded without it.
To get a token, sign up or log in at https://huggingface.co/, open the tokens page in your account settings (https://huggingface.co/settings/tokens), and generate an Access Token. Paste the token as the value of 'auth_token' below, and keep it out of any notebook you share.
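A hard-coded token is easy to leak when a notebook is shared. As a minimal sketch, assuming the token has been stored in an environment variable named `HF_TOKEN` beforehand, it could be read at runtime instead. The helper name `read_hf_token` is hypothetical and not part of any library.

```python
import os


def read_hf_token(env_var="HF_TOKEN"):
    """Return the token stored in `env_var`, or None when it is unset or blank."""
    # Strip stray whitespace that often sneaks in when a token is copied and pasted.
    token = os.environ.get(env_var, "").strip()
    return token or None


# None here simply means "no token configured"; public models still download.
auth_token = read_hf_token()
```

With this in place, the token never appears in the notebook source or in its saved outputs.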

In [9]:
auth_token = "hf_..."  # placeholder: paste your own token here, and never publish a real one

This code does the following:

- Check if a CUDA-compatible GPU is available; if so, use it (device="cuda"), otherwise default to using the CPU (device="cpu").
- Specify the model ID for the pre-trained Stable Diffusion model. "CompVis/stable-diffusion-v1-4" is the name of the model hosted on Hugging Face.
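- Initialize the Stable Diffusion pipeline with the specified model. The 'auth_token' is passed for authentication, but as the warning in the cell output shows, diffusers 0.30.0 ignores the `use_auth_token` argument; the download still succeeds because authentication is optional for public models.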
- Move the model to the specified device (GPU if available, otherwise CPU) for computation.

In [10]:
# Initialize the model
device = "cuda" if torch.cuda.is_available() else "cpu"
modelid = "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionPipeline.from_pretrained(modelid, use_auth_token=auth_token)
pipe.to(device)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


model_index.json:   0%|          | 0.00/541 [00:00<?, ?B/s]

Fetching 16 files:   0%|          | 0/16 [00:00<?, ?it/s]

model.safetensors:   0%|          | 0.00/492M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.22G [00:00<?, ?B/s]

safety_checker/config.json:   0%|          | 0.00/4.56k [00:00<?, ?B/s]

scheduler/scheduler_config.json:   0%|          | 0.00/313 [00:00<?, ?B/s]

(…)kpoints/scheduler_config-checkpoint.json:   0%|          | 0.00/209 [00:00<?, ?B/s]

tokenizer/merges.txt:   0%|          | 0.00/525k [00:00<?, ?B/s]

text_encoder/config.json:   0%|          | 0.00/592 [00:00<?, ?B/s]

(…)ature_extractor/preprocessor_config.json:   0%|          | 0.00/342 [00:00<?, ?B/s]

tokenizer/special_tokens_map.json:   0%|          | 0.00/472 [00:00<?, ?B/s]

tokenizer/tokenizer_config.json:   0%|          | 0.00/806 [00:00<?, ?B/s]

unet/config.json:   0%|          | 0.00/743 [00:00<?, ?B/s]

tokenizer/vocab.json:   0%|          | 0.00/1.06M [00:00<?, ?B/s]

diffusion_pytorch_model.safetensors:   0%|          | 0.00/3.44G [00:00<?, ?B/s]

diffusion_pytorch_model.safetensors:   0%|          | 0.00/335M [00:00<?, ?B/s]

vae/config.json:   0%|          | 0.00/551 [00:00<?, ?B/s]

Keyword arguments {'use_auth_token': 'hf_...'} are not expected by StableDiffusionPipeline and will be ignored.


Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

StableDiffusionPipeline {
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.30.0",
  "_name_or_path": "CompVis/stable-diffusion-v1-4",
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "image_encoder": [
    null,
    null
  ],
  "requires_safety_checker": true,
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
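The "Keyword arguments ... will be ignored" warning in the output above means the token never reached the Hub request. A hedged sketch of one way to pass it, assuming a recent diffusers release in which `from_pretrained` accepts a `token` keyword: the helpers `auth_kwargs` and `load_pipeline` are hypothetical names introduced only for illustration.

```python
def auth_kwargs(token=None):
    """Build the authentication keyword arguments for from_pretrained."""
    # Omit the argument entirely when no token is available;
    # public models load without one.
    return {"token": token} if token else {}


def load_pipeline(model_id, token=None):
    """Load a Stable Diffusion pipeline, passing the token as `token`."""
    # Imported lazily so auth_kwargs can be used without diffusers installed.
    from diffusers import StableDiffusionPipeline

    return StableDiffusionPipeline.from_pretrained(model_id, **auth_kwargs(token))
```

Under these assumptions, `pipe = load_pipeline("CompVis/stable-diffusion-v1-4", token=auth_token)` followed by `pipe.to(device)` would replace the two corresponding lines of the cell.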

In [11]:
def generate(prompt):
    # Disable gradient tracking: this is inference only, so no gradients are needed,
    # which saves memory and time.
    with torch.no_grad():
        # Generate an image from the text prompt with the Stable Diffusion pipeline.
        # 'guidance_scale' controls how strongly the image follows the prompt: higher values
        # track the prompt more closely, usually at the expense of image quality (the pipeline default is 7.5).
        image = pipe(prompt, guidance_scale=8.5).images[0]
    return image
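To make the `guidance_scale` comment concrete, here is a toy sketch of the classifier-free guidance rule that Stable Diffusion pipelines apply at each denoising step: the prediction made without the prompt is pushed toward the prediction made with it. The function name `apply_guidance` is hypothetical, and the two-element lists are made-up stand-ins for the real noise tensors.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Combine unconditional and prompt-conditioned noise predictions."""
    # guided = uncond + scale * (text - uncond), applied element by element.
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


uncond = [0.0, 1.0]  # toy prediction made without the prompt
text = [1.0, 3.0]    # toy prediction made with the prompt

print(apply_guidance(uncond, text, 1.0))  # [1.0, 3.0]: scale 1 returns the prompt prediction
print(apply_guidance(uncond, text, 8.5))  # [8.5, 18.0]: the difference is amplified
```

A scale of 1.0 adds no extra guidance, while larger values such as the 8.5 used in `generate` exaggerate the difference between the two predictions.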

In [12]:
# Create a Gradio interface
iface = gr.Interface(
    # The function 'generate' will be called when the interface is used.
    fn=generate,
    # The input to the interface will be a text prompt.
    inputs="text",
    # The output of the interface will be an image.
    outputs="image",
    # The title of the interface.
    title="Stable Diffusion",
    # A brief description of the interface.
    description="Enter a prompt to generate an image using Stable Diffusion."
)
# Launch the interface; share=True also creates a temporary public URL
iface.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://747fc216e39f522a6b.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


