In [1]:
!pip install -qU transformers accelerate safetensors

The command `!pip install -qU transformers accelerate safetensors` installs or updates three Python libraries.

* `!pip install`: The leading `!` tells the notebook to run the rest of the line as a shell command; `pip install` is the standard command for installing Python packages.
* `-q`: This option means "quiet," so less output is shown during installation.
* `-U`: This option means "upgrade," so if the packages are already installed, they will be updated to the newest version.

The packages being installed are:

* **transformers**: A library from Hugging Face that provides pre-trained models for Natural Language Processing (NLP) tasks like text summarization, translation, and question-answering.
* **accelerate**: A library that helps run PyTorch code on different kinds of hardware (like GPUs or TPUs) with minimal code changes.
* **safetensors**: A library and file format for storing model weights (tensors) safely and efficiently. Unlike pickle-based formats, loading a safetensors file does not execute arbitrary code.
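Because `-q` hides most of pip's output, it can be useful to confirm afterwards which versions were actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of the notebook.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Report the three packages installed by the cell above
print(installed_versions(["transformers", "accelerate", "safetensors"]))
```

A value of `None` for any of the three would suggest the install cell did not complete.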

In [2]:
!pip install git+https://github.com/huggingface/diffusers.git

Collecting git+https://github.com/huggingface/diffusers.git
  Cloning https://github.com/huggingface/diffusers.git to /tmp/pip-req-build-xu6qefad
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/diffusers.git /tmp/pip-req-build-xu6qefad
  Resolved https://github.com/huggingface/diffusers.git to commit 425a715e35479338c06b2a68eb3a95790c1db3c5
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


This command installs the `diffusers` library directly from its source code on GitHub.

Instead of getting a stable version from the Python Package Index (PyPI), this command uses **Git**, a version control system, to download the most current code from the Hugging Face `diffusers` repository. This is often done to get the latest features or bug fixes that have not yet been released in an official package version, at the cost of running code that is less tested than a release. In this notebook, the source install is what provides `FluxKontextPipeline`.
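One way to confirm that the installed `diffusers` build actually exposes the pipeline is to probe for it before going further. This is a small sketch; the helper name `module_exports` is hypothetical, and it only checks that the name can be resolved, not that the pipeline works.

```python
import importlib


def module_exports(module_name, attribute):
    """Return True if `module_name` can be imported and exposes `attribute`."""
    try:
        module = importlib.import_module(module_name)
        getattr(module, attribute)
    except (ImportError, AttributeError):
        return False
    return True


# False here would suggest the installed diffusers build lacks the pipeline
print(module_exports("diffusers", "FluxKontextPipeline"))
```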

In [3]:
import gradio as gr
import numpy as np
import os
import torch
import random
from PIL import Image

from diffusers import FluxKontextPipeline
from diffusers.utils import load_image


os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

This code imports several Python libraries needed to run an AI image generation model and sets an environment variable for faster model downloads.

---
## Library Imports

* **`import gradio as gr`**: Imports the **Gradio** library, which is used to create simple web interfaces for machine learning models.
* **`import numpy as np`**: Imports the **NumPy** library, a fundamental package for scientific computing and working with arrays of numbers.
* **`import os`**: Imports the **os** module, which allows the program to interact with the operating system, for instance, to manage files and directories.
* **`import torch`**: Imports **PyTorch**, a popular machine learning framework that provides tools for building and training neural networks.
* **`import random`**: Imports the **random** module for generating random numbers.
* **`from PIL import Image`**: Imports the **Image** module from Pillow (the maintained fork of the Python Imaging Library, still imported as `PIL`), used for opening, manipulating, and saving many different image file formats.
* **`from diffusers import FluxKontextPipeline`**: From the **`diffusers`** library, this imports a specific image generation model pipeline named **`FluxKontextPipeline`**.
* **`from diffusers.utils import load_image`**: Imports a helper function named **`load_image`** from the `diffusers` library to easily load images.

---
## Environment Variable

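* **`os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"`**: This line sets an environment variable that asks the Hugging Face Hub client to download models and datasets through the faster `hf_transfer` library, which can shorten the initial setup. Two caveats apply: `hf_transfer` is a separate package that the install cells in this notebook do not install, and the variable is typically read when `huggingface_hub` is first imported, so setting it after the `diffusers` import may have no effect.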
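A more defensive variant is sketched below: it sets the variable only when `hf_transfer` is importable, and it is meant to run before `diffusers` or `huggingface_hub` are imported. The helper name `enable_hf_transfer_if_installed` is hypothetical, and the assumption that the variable is read at import time should be checked against the installed `huggingface_hub` version.

```python
import importlib.util
import os


def enable_hf_transfer_if_installed(environ=os.environ):
    """Set HF_HUB_ENABLE_HF_TRANSFER=1 only if the hf_transfer package is importable.

    Returns True if the variable was set, False otherwise.
    """
    if importlib.util.find_spec("hf_transfer") is None:
        return False
    environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
    return True


# Assumption: call this before importing diffusers or huggingface_hub
enabled = enable_hf_transfer_if_installed()
print("hf_transfer enabled:", enabled)
```

Passing the environment mapping as a parameter keeps the helper easy to try out without touching the real process environment.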

In [4]:
class CFG:
    model = "black-forest-labs/FLUX.1-Kontext-dev"
    device = 'cuda'
    dtype = torch.bfloat16
    variant = "fp16"
    seed = 42

This code defines a configuration class named `CFG`.

The class acts as a container to hold various settings for a machine learning script. This approach groups all the important parameters in one place, making them easy to find and change.

---
### **Configuration Parameters**

* **`model`**: This sets the model identifier to **`"black-forest-labs/FLUX.1-Kontext-dev"`**. This string is the name of a pre-trained model, likely hosted on the Hugging Face Hub.
* **`device`**: This specifies the hardware device for running the model as **`'cuda'`**, which refers to an NVIDIA GPU. Using a GPU significantly speeds up calculations for machine learning models.
* **`dtype`**: This sets the data type for the model's calculations to **`torch.bfloat16`**. This is a 16-bit floating-point format from the PyTorch library that can speed up computations and reduce memory use while maintaining a good level of precision.
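* **`variant`**: This sets the model variant to **`"fp16"`**, which would normally select weight files saved in 16-bit floating-point precision. Note that the loading cell in this notebook does not pass `CFG.variant` to `from_pretrained`, so this setting is currently unused.
* **`seed`**: This sets a random seed to **`42`**. A fixed seed makes processes that involve randomness repeatable. In this notebook it seeds the random number generator passed to the pipeline, so the same inputs and settings should produce the same image when run on the same hardware and library versions.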
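To make the memory benefit of a 16-bit `dtype` concrete, the arithmetic can be sketched as bytes per parameter times parameter count. The parameter count below is an illustrative assumption, not a figure taken from this notebook, and the estimate covers weights only (no activations or auxiliary models).

```python
# Bytes used to store one parameter in each floating-point format
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weights_size_gib(num_params, dtype_name):
    """Approximate size of the weights alone, in GiB, for a parameter count and dtype."""
    return num_params * BYTES_PER_PARAM[dtype_name] / 2**30


ASSUMED_PARAMS = 12_000_000_000  # illustrative assumption, not from the notebook

for name in ("float32", "bfloat16"):
    print(f"{name}: {weights_size_gib(ASSUMED_PARAMS, name):.1f} GiB")
```

Whatever the true parameter count, the 16-bit formats need half the memory of `float32` for the same weights.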

In [None]:
pipe = FluxKontextPipeline.from_pretrained(CFG.model, torch_dtype=CFG.dtype).to(CFG.device)


This code initializes and prepares an AI model for use.

It uses the **`FluxKontextPipeline`** class and performs two main actions:

1.  **`from_pretrained(CFG.model, torch_dtype=CFG.dtype)`**: This part loads the pre-trained model identified by `CFG.model`. It also sets the model's numerical precision to `CFG.dtype` (`torch.bfloat16`) to optimize its performance and memory usage.

2.  **`.to(CFG.device)`**: After loading, this method moves the entire model onto the specified hardware, which is the **GPU** (`'cuda'`). Running the model on a GPU is much faster than using a CPU.

The final, ready-to-use model is then stored in the variable named `pipe`.
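The cell above assumes a CUDA GPU is present. A hedged sketch of a more forgiving loader follows: a small pure function picks the device and dtype, and a wrapper applies the choice. The names `choose_device_and_dtype` and `load_pipeline` are hypothetical, and the CPU fallback is an assumption of this sketch that would be very slow for a model of this size.

```python
def choose_device_and_dtype(cuda_available):
    """Pick a (device, dtype name) pair: bfloat16 on GPU, float32 on CPU."""
    if cuda_available:
        return "cuda", "bfloat16"
    return "cpu", "float32"


def load_pipeline(model_id):
    """Load the pipeline on the chosen device.

    The heavy imports are local so that choose_device_and_dtype stays dependency-free.
    """
    import torch
    from diffusers import FluxKontextPipeline

    device, dtype_name = choose_device_and_dtype(torch.cuda.is_available())
    pipe = FluxKontextPipeline.from_pretrained(
        model_id, torch_dtype=getattr(torch, dtype_name)
    )
    return pipe.to(device)


print(choose_device_and_dtype(cuda_available=True))
```

With this in place, the loading cell could become `pipe = load_pipeline(CFG.model)`.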

In [14]:
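def infer(input_image, prompt, guidance_scale=2.5, steps=28, progress=gr.Progress(track_tqdm=True)):
    # Normalize to three-channel RGB (drops alpha, expands palette or grayscale images)
    input_image = input_image.convert("RGB")

    image = pipe(
        image=input_image,
        prompt=prompt,
        guidance_scale=guidance_scale,
        width=input_image.size[0],   # PIL's size is (width, height)
        height=input_image.size[1],
        num_inference_steps=steps,
        # Fixed seed so repeated runs with the same inputs are reproducible
        generator=torch.Generator().manual_seed(CFG.seed),
    ).images[0]

    # The second value makes the hidden "Reuse this image" button visible
    return image, gr.Button(visible=True)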

This Python code defines a function named `infer` that generates a new image based on an existing image and a text description.

### Function Purpose

The **`infer`** function is designed to be the core logic for an image editing or generation task. It takes an input image, a text **`prompt`**, and a few settings to create an output image. The `progress` parameter shows it's intended to be used with a Gradio interface to display a progress bar.

---

### Step-by-Step Process

1.  **Convert Image Format**: The line `input_image = input_image.convert("RGB")` ensures the input image is in the standard **RGB** (Red, Green, Blue) color format, which is a common requirement for image models.

2.  **Generate New Image**: The function then calls the **`pipe`** object (the model pipeline loaded earlier). It provides the model with all the necessary information:
    * **`image`**: The original input image.
    * **`prompt`**: The text instructions for how to change the image.
    * **`guidance_scale`**: A number that controls how strictly the model should follow the prompt.
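    * **`width`** and **`height`**: These are set to the dimensions of the input image, requesting an output of the same size. The pipeline may still adjust them to fit the model's requirements, as the log messages in the final cell's output show.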
    * **`num_inference_steps`**: The number of steps the model takes to generate the image.
    * **`generator`**: This creates a random number generator with a fixed **seed** (`CFG.seed`). Using a seed ensures that the generation process is repeatable and will produce the same output for the same inputs.

3.  **Return Output**: The function returns two items:
    * The newly generated **`image`**.
    * A Gradio button update with `visible=True`. In the interface this value is routed to the "Reuse this image" button, which starts hidden and becomes visible once generation is complete.
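The notebook's final output reports that `height` and `width` were adjusted to values such as 1024, 1008, 880 and 1168, all of which are multiples of 16. The sketch below shows one plausible adjustment rule that yields such sizes: rescale to a target pixel area while keeping the aspect ratio, then round each side down to a multiple of 16. The function name, the target area, and the rule itself are assumptions for illustration, not the confirmed `diffusers` implementation.

```python
def fit_dimensions(width, height, max_area=1024 * 1024, multiple_of=16):
    """Rescale (width, height) to roughly `max_area` pixels, keeping the aspect ratio,
    then round each side down to a multiple of `multiple_of`."""
    aspect_ratio = width / height
    new_width = round((max_area * aspect_ratio) ** 0.5)
    new_height = round((max_area / aspect_ratio) ** 0.5)
    # Floor division followed by multiplication rounds down to the nearest multiple
    new_width = new_width // multiple_of * multiple_of
    new_height = new_height // multiple_of * multiple_of
    return new_width, new_height


print(fit_dimensions(1920, 1080))
```

Under a rule like this, the output size depends on the input's aspect ratio rather than its exact pixel dimensions.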

In [15]:
with gr.Blocks() as demo:

    with gr.Column(elem_id="col-container"):

        with gr.Row():
            with gr.Column():
                input_image = gr.Image(label="Upload the image for editing", type="pil")
                with gr.Row():
                    prompt = gr.Text(
                        label="Prompt",
                        show_label=False, max_lines=1,
                        placeholder="Enter your prompt for editing",
                        container=False,
                    )
                    run_button = gr.Button("Run", scale=0)

                with gr.Accordion("Advanced Settings", open=False):


                    guidance_scale = gr.Slider(
                        label="Guidance Scale", minimum=1,  maximum=10, step=0.1, value=2.5, )

                    steps = gr.Slider( label="Steps",  minimum=1, maximum=30,  value=28, step=1  )

            with gr.Column():
                result = gr.Image(label="Result", show_label=False, interactive=False)
                reuse_button = gr.Button("Reuse this image", visible=False)


    gr.on(
        triggers=[run_button.click, prompt.submit],
        fn = infer,
        inputs = [input_image, prompt, guidance_scale, steps],
        outputs = [result, reuse_button]
    )
    reuse_button.click(
        fn = lambda image: image,
        inputs = [result],
        outputs = [input_image]
    )

This code uses the **Gradio** library to create a web interface for the image generation function described previously.

The interface allows a user to upload an image, type a text prompt, adjust settings, and see the resulting edited image.

---

### UI Layout

The code defines the visual layout of the web application.

* **Main Container**: The entire interface is organized into a main column.
* **Two-Column Layout**: Inside, the layout is split into two columns, side-by-side.
    * **Left Column (Inputs)**:
        * An **`Image`** upload box for the user to provide the initial picture.
        * A **`Text`** input box for the user to type their prompt (e.g., "make it look like a watercolor painting").
        * A **`Button`** labeled "**Run**" to start the process.
        * An **`Accordion`** labeled "**Advanced Settings**" which is closed by default. It contains two **`Slider`** controls:
            * **Guidance Scale**: To adjust how strongly the model follows the prompt.
            * **Steps**: To control the number of generation steps.
    * **Right Column (Outputs)**:
        * An **`Image`** display area to show the final **`Result`**.
        * A hidden **`Button`** labeled "**Reuse this image**."

---

### Functionality

This part of the code connects the user interface elements to Python functions.

* **Generating an Image**:
    * **Trigger**: Clicking the "**Run**" button or pressing Enter in the prompt box.
    * **Action**: This calls the **`infer`** function.
    * **Inputs**: It sends the uploaded image, the prompt text, and the values from the two sliders (`guidance_scale` and `steps`) to the `infer` function.
    * **Outputs**: The generated image is displayed in the **`result`** box, and the "**Reuse this image**" button becomes visible.

* **Reusing an Image**:
    * **Trigger**: Clicking the "**Reuse this image**" button.
    * **Action**: It copies the image from the **`result`** box into the **`input_image`** box on the left. This allows the user to perform further edits on the newly created image.
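If the user clicks "Run" without uploading an image, the image component passes `None` to `infer`, and the call to `.convert("RGB")` fails. A minimal sketch of an input check that could run at the top of `infer` is shown below; the helper name `validate_inputs` is hypothetical, and it raises a plain `ValueError` to stay dependency-free.

```python
def validate_inputs(input_image, prompt):
    """Return the prompt with surrounding whitespace removed.

    Raises ValueError if the image is missing or the prompt is empty.
    """
    if input_image is None:
        raise ValueError("Please upload an image before running.")
    cleaned = (prompt or "").strip()
    if not cleaned:
        raise ValueError("Please enter a prompt.")
    return cleaned


# Any non-None object stands in for a PIL image in this demonstration
print(validate_inputs(object(), "  make it a watercolor painting  "))
```

Inside a Gradio app, raising `gr.Error` with the same message instead would show the text to the user in the interface.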

In [None]:
demo.launch(debug=True)

It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://428945cd0a4590f21e.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


Generation `height` and `width` have been adjusted to 1024 and 1008 to fit the model requirements.


  0%|          | 0/28 [00:00<?, ?steps/s]

Generation `height` and `width` have been adjusted to 880 and 1168 to fit the model requirements.


  0%|          | 0/28 [00:00<?, ?steps/s]
