# Setup

In [None]:
!pip install -q git+https://github.com/huggingface/diffusers.git


This command installs the **Diffusers** library directly from its GitHub repository.

* **!pip install -q**: This is a command-line instruction to the **pip** package installer. The exclamation mark `!` at the beginning tells a Jupyter or Colab notebook environment to execute the command in the shell. The `-q` flag stands for "quiet," which suppresses most of the output during the installation process, so you only see minimal information.

* **`git+https://github.com/huggingface/diffusers.git`**: This part specifies the source of the package to be installed. Instead of installing a stable release from the Python Package Index (PyPI), pip uses **git** to clone the repository and install the **diffusers** library from its default branch on GitHub. This gives access to features and bug fixes that haven't been released on PyPI yet, at the cost of running unreleased code that may be less stable.
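
Because a source install tracks whatever is currently on the repository, it can help to record which version actually ended up in the environment. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of the notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Development installs from a repository typically report a pre-release version.
print("diffusers:", installed_version("diffusers"))
```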

In [None]:
!pip install -qU transformers accelerate safetensors


This command installs or upgrades three Python libraries: **transformers**, **accelerate**, and **safetensors**.

* **!pip install**: This is the command used to install packages from the Python Package Index (PyPI).
* **-q**: This stands for "quiet" and suppresses much of the installation's output, so you only see minimal information.
* **-U**: This stands for "upgrade" and tells `pip` to update the packages to their newest available version if they're already installed.
* **transformers**: This library from Hugging Face provides tools for using state-of-the-art machine learning models, particularly those based on the transformer architecture, for tasks in natural language processing (NLP), computer vision, and more.
* **accelerate**: This library, also from Hugging Face, helps to run PyTorch code on different hardware setups, such as a single GPU, multiple GPUs, or CPUs, with minimal changes. In inference notebooks like this one, Diffusers also relies on it for faster, more memory-efficient model loading.
* **safetensors**: This is a new format for storing and loading machine learning model weights. It is designed to be safer and faster than the traditional PyTorch `.pt` or `.bin` files, as it avoids executing arbitrary code during the loading process. This prevents potential security vulnerabilities when loading models from untrusted sources.
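
To make the security point concrete: traditional PyTorch checkpoint files are based on Python's `pickle` format, and a pickle file can instruct the loader to call a function while it is being read. The sketch below demonstrates this with a harmless callable (`len`); the `Payload` class is hypothetical and exists only for illustration. A safetensors file, by contrast, holds a header plus raw tensor data, so loading it does not involve this mechanism.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle data names a callable to run on load."""

    def __reduce__(self):
        # pickle stores "call len('abc')" and performs that call when loading.
        # A malicious file could name a far more dangerous callable instead.
        return (len, ("abc",))


blob = pickle.dumps(Payload())
loaded = pickle.loads(blob)  # the callable runs here, during loading

print(loaded)  # the result of len("abc"), not a Payload instance
```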

In [3]:
import gradio as gr
import numpy as np
import os
import torch
import random
from PIL import Image

from diffusers import FluxKontextPipeline
from diffusers.utils import load_image


os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

Flax classes are deprecated and will be removed in Diffusers v1.0.0. We recommend migrating to PyTorch classes or pinning your version of Diffusers.


This code imports several libraries and configures an environment setting, preparing a script for tasks related to machine learning and image generation.


## Library Imports

* `import gradio as gr`: This line imports the **Gradio** library, which is used to create simple web-based user interfaces for machine learning models, and assigns it the shorter name `gr`.
* `import numpy as np`: This imports the **NumPy** library and gives it the alias `np`. NumPy is essential for scientific computing in Python, providing support for large, multi-dimensional arrays and matrices.
* `import os`: This imports the **os** module, which allows the program to interact with the operating system, for example, to manage files or access environment variables.
* `import torch`: This imports **PyTorch**, a popular open-source machine learning framework used for building and training neural networks.
* `import random`: This imports Python's built-in **random** module, which provides functions for generating random numbers.
* `from PIL import Image`: This imports the `Image` class from **Pillow**, the maintained fork of the Python Imaging Library (PIL). This class is used to open, manipulate, and save various image file formats.


## Diffusers Imports

* `from diffusers import FluxKontextPipeline`: This line imports a specific class named `FluxKontextPipeline` from the **`diffusers`** library by Hugging Face. This class represents a complete image editing pipeline built on the FLUX.1 Kontext model: it takes an input image together with a text prompt and generates an edited image.
* `from diffusers.utils import load_image`: This imports the `load_image` function from the utility tools of the **`diffusers`** library. It is a helper function designed to load images from a file path or URL into a format compatible with the library's models.


## Environment Configuration

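* `os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"`: This line sets an environment variable for the current process. It asks the Hugging Face Hub client to use the accelerated `hf_transfer` download backend, which can significantly speed up downloads of large model files. Two caveats apply: the backend only works if the separate `hf_transfer` package is installed (the install commands above do not include it), and the variable is generally read when `huggingface_hub` is first imported, so it is safest to set it before importing `diffusers`.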
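
One way to make this setting more robust is to enable the flag only when the optional backend is actually present, and to do so before any Hugging Face library is imported. The following is a minimal sketch under the assumption that the backend ships as an importable package named `hf_transfer`; it uses only the standard library.

```python
import importlib.util
import os

# Assumption: the accelerated backend is provided by a package named `hf_transfer`.
has_hf_transfer = importlib.util.find_spec("hf_transfer") is not None

# Set the flag before importing libraries that read it.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1" if has_hf_transfer else "0"

print("hf_transfer enabled:", os.environ["HF_HUB_ENABLE_HF_TRANSFER"] == "1")
```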

In [4]:
class CFG:
    model = "black-forest-labs/FLUX.1-Kontext-dev"
    device = 'cuda'
    dtype = torch.bfloat16
    variant = "fp16"
    seed = 42

This code defines a class named `CFG` that serves as a container for configuration settings, which are typically used to control the behavior of a machine learning script.

## Attributes

* `model = "black-forest-labs/FLUX.1-Kontext-dev"`: This sets the **model** attribute to a string that is an identifier for a pre-trained model on the Hugging Face Hub. This specific identifier points to the `FLUX.1-Kontext-dev` model from the `black-forest-labs` organization.
* `device = 'cuda'`: This attribute specifies the computational **device** on which the model will run. The value `'cuda'` directs the script to use a compatible NVIDIA GPU for hardware acceleration, which is significantly faster for deep learning tasks than running on a CPU.
* `dtype = torch.bfloat16`: This sets the data type (**dtype**) for the model's calculations to `bfloat16`, a 16-bit floating-point format from PyTorch. Using `bfloat16` can reduce memory consumption and speed up computations compared to the standard 32-bit float, often with a minimal impact on model accuracy.
* `variant = "fp16"`: This **variant** attribute names a version of the pre-trained weights, here one stored in a 16-bit floating-point format, which makes the files smaller and faster to load. Note that in this notebook the attribute is defined but never passed to `from_pretrained`, so it has no effect on which files are downloaded.
* `seed = 42`: This sets a **seed** for random number generation. In this notebook the seed initializes the `torch.Generator` passed to the pipeline, which controls the random noise that image generation starts from. Fixing it to a specific integer, like `42`, means the same image, prompt, and settings should yield the same result on each run, which makes experiments reproducible.
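
The effect of a fixed seed can be illustrated without a GPU or a model. The sketch below uses Python's built-in `random.Random` as a stand-in for a seeded generator (the notebook itself uses `torch.Generator`); the helper name `draw_noise` is hypothetical.

```python
import random


def draw_noise(seed: int, n: int = 3):
    """Return n pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator, does not touch global state
    return [rng.random() for _ in range(n)]


first = draw_noise(42)
second = draw_noise(42)
other = draw_noise(43)

print(first == second)  # same seed, same sequence
print(first == other)   # different seed, different sequence
```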

# Functions

In [5]:
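def infer(input_image, prompt, guidance_scale=2.5, steps=28, progress=gr.Progress(track_tqdm=True)):
    """Edit `input_image` according to `prompt` and return the result."""
    # Drop any alpha channel so the pipeline receives a plain RGB image.
    input_image = input_image.convert("RGB")
    image = pipe(
        image=input_image,
        prompt=prompt,
        guidance_scale=guidance_scale,
        width=input_image.size[0],
        height=input_image.size[1],
        num_inference_steps=steps,
        # Re-seeding on every call keeps results repeatable for the same inputs.
        generator=torch.Generator().manual_seed(CFG.seed),
    ).images[0]

    # Return the edited image and make the follow-up button visible.
    return image, gr.Button(visible=True)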


This function, named `infer`, uses a diffusion model pipeline to generate a new image based on an input image and a text prompt.

**Function Definition and Parameters**

The function is defined to accept several arguments that control the image generation process:

* `input_image`: The initial image that will be modified.
* `prompt`: A text string describing the desired content or style of the output image.
* `guidance_scale`: A number that controls how closely the generated image follows the `prompt`. A higher value means stricter adherence. It defaults to `2.5`.
* `steps`: The number of **inference steps** the model takes to generate the image. More steps can lead to higher quality but increase processing time. It defaults to `28`.
* `progress`: A **Gradio** progress bar object that is used to visually track the generation process in the user interface.


**Image Generation Process**

Inside the function, two main operations occur:

1.  **Image Conversion**: The line `input_image = input_image.convert("RGB")` ensures the input image is in the standard **RGB** (Red, Green, Blue) color format. This is a common preprocessing step to remove any transparency channel (alpha) and standardize the input for the model.
2.  **Pipeline Execution**: The code calls an object named `pipe` (which represents the pre-loaded `FluxKontextPipeline`) to perform the main image generation. It passes several key arguments:
    * The `input_image` and `prompt` are provided as the main inputs.
    * The output `width` and `height` are set to match the dimensions of the `input_image`.
    * The `num_inference_steps` and `guidance_scale` are passed from the function's arguments.
    * A new PyTorch `Generator` (a CPU generator, since no device is specified) is created and seeded with `CFG.seed` on every call. This makes the random aspects of the generation process **reproducible**: the same inputs should produce the same output image, at least on the same hardware and library versions.
    * Finally, `.images[0]` extracts the first generated image from the pipeline's output.
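
The pipeline call above passes the input image's own width and height. Latent diffusion pipelines commonly operate on dimensions that are a multiple of some block size and may resize or round inputs internally. If explicit control over the output size is wanted, the dimensions can be snapped beforehand. The following is a minimal sketch; the helper name is hypothetical, and the multiple of 16 is an assumption, not a value taken from this notebook or confirmed for `FluxKontextPipeline`.

```python
def round_down_to_multiple(value: int, multiple: int = 16) -> int:
    """Round `value` down to a multiple of `multiple`, but never below one block."""
    return max(multiple, (value // multiple) * multiple)


width, height = 1023, 769  # hypothetical input image size
print(round_down_to_multiple(width), round_down_to_multiple(height))
```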

**Return Values**

The function returns a tuple containing two items:

1.  `image`: The final, newly generated image.
2.  `gr.Button(visible=True)`: An update for a Gradio button component, making it visible in the user interface. In this notebook it reveals the "Reuse this image" button, which stays hidden until a result has been generated.

# Edit 1

In [None]:
pipe = FluxKontextPipeline.from_pretrained(CFG.model, torch_dtype = CFG.dtype)
pipe.to(CFG.device)

This code initializes a pre-trained image generation model and prepares it for use on a GPU.


**Loading the Model Pipeline**

The first line, `pipe = FluxKontextPipeline.from_pretrained(CFG.model, torch_dtype=CFG.dtype)`, handles downloading and setting up the model.

It uses the `from_pretrained` method to fetch a specific **pre-trained model**, identified by `CFG.model` (which is `"black-forest-labs/FLUX.1-Kontext-dev"`), from the Hugging Face Hub. The entire toolset, including the model and its supporting components, is loaded as a complete **pipeline** object and assigned to the variable `pipe`.

The `torch_dtype=CFG.dtype` argument tells the library to load the model's weights using the `bfloat16` **data type**, which reduces memory usage and can speed up calculations.


**Moving the Model to the GPU**

The second line, `pipe.to(CFG.device)`, moves the entire loaded `pipe` object to the specified computational device.

Since `CFG.device` is set to `'cuda'`, this command transfers the model from the computer's RAM to the **GPU's** memory (VRAM). This step is crucial for **hardware acceleration**, as GPUs can perform the complex mathematical operations required for image generation much faster than a CPU.
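
Hard-coding `'cuda'` fails on machines without a compatible GPU. A defensive variant can check availability first and fall back to the CPU. The following is a minimal sketch; `pick_device` is a hypothetical helper, not part of the notebook, and it imports PyTorch lazily so that it also runs where PyTorch is missing.

```python
def pick_device(preferred: str = "cuda") -> str:
    """Return "cuda" when it is requested and available, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    if preferred == "cuda" and torch.cuda.is_available():
        return "cuda"
    return "cpu"


device = pick_device()
print("Using device:", device)
```

The result could then be passed to `pipe.to(...)` in place of `CFG.device`. Keep in mind that a model of this size is likely to be impractically slow on a CPU, so the fallback mainly serves to produce a clearer failure mode.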

In [7]:
with gr.Blocks() as demo:

    with gr.Column(elem_id="col-container"):

        with gr.Row():
            with gr.Column():
                input_image = gr.Image(label="Upload the image for editing", type="pil")
                with gr.Row():
                    prompt = gr.Text(
                        label="Prompt",
                        show_label=False, max_lines=1,
                        placeholder="Enter your prompt for editing",
                        container=False,
                    )
                    run_button = gr.Button("Run", scale=0)

                with gr.Accordion("Advanced Settings", open=False):
                    guidance_scale = gr.Slider(
                        label="Guidance Scale", minimum=1, maximum=10, step=0.1, value=2.5,
                    )

                    steps = gr.Slider(label="Steps", minimum=1, maximum=30, value=28, step=1)

            with gr.Column():
                result = gr.Image(label="Result", show_label=False, interactive=False)
                reuse_button = gr.Button("Reuse this image", visible=False)


    gr.on(
        triggers=[run_button.click, prompt.submit],
        fn = infer,
        inputs = [input_image, prompt, guidance_scale, steps],
        outputs = [result, reuse_button]
    )
    reuse_button.click(
        fn = lambda image: image,
        inputs = [result],
        outputs = [input_image]
    )

This code uses the **Gradio** library to build a web-based user interface for an image editing application. It defines the layout of the interface and connects the UI elements to the backend processing function.


## UI Layout

The entire interface is created within a `gr.Blocks()` context, which allows for a custom layout.

The layout consists of a main vertical `Column` that holds a single `Row`. This `Row` is then split into two columns, creating a side-by-side view:

* **Left Column (Inputs):** This area is for user input. It contains:
    * An `Image` component for uploading the initial image.
    * A `Text` input box for the user to type their editing `prompt`.
    * A `Button` labeled "Run" to start the process.
    * A collapsible `Accordion` section for "Advanced Settings," which holds two `Slider` controls: one for **Guidance Scale** and one for **Steps**.
* **Right Column (Outputs):** This area displays the results. It contains:
    * An `Image` component to show the final generated `result`.
    * A `Button` labeled "Reuse this image," which is hidden by default.


## Functionality and Events

The code defines how the interface elements interact with the Python functions.

* **Main "Run" Event**: The `gr.on(...)` block sets up the primary event listener.
    * **Triggers**: The process starts when the user either clicks the `run_button` or presses Enter after typing in the `prompt` box.
    * **Function**: When triggered, it calls the `infer` function.
    * **Inputs/Outputs**: It gathers the current values from the `input_image` uploader, the `prompt` box, and the two sliders (`guidance_scale` and `steps`), and passes them to the `infer` function in that order. It then takes the two values returned by `infer` and uses them to update the `result` image display and the visibility of the `reuse_button`.

* **"Reuse" Button Event**: The `reuse_button.click(...)` block defines a second, simpler interaction.
    * When the "Reuse this image" button is clicked, it takes the image currently displayed in the `result` component and copies it directly into the `input_image` component on the left. This allows a user to perform sequential edits on a generated image.
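
The input/output wiring described above can be pictured with a toy model in plain Python. This is only an illustration of the data flow, not how Gradio is implemented: `FakeComponent`, `dispatch`, and `fake_infer` are all hypothetical names, and the "image" is just a string.

```python
class FakeComponent:
    """Hypothetical stand-in for a UI component that just holds a value."""

    def __init__(self, value=None):
        self.value = value


def dispatch(fn, inputs, outputs):
    """Call fn with the inputs' values and write the results into outputs."""
    results = fn(*[component.value for component in inputs])
    if len(outputs) == 1:
        results = (results,)  # a single output takes the return value as-is
    for component, value in zip(outputs, results):
        component.value = value


def fake_infer(image, prompt, guidance_scale, steps):
    # Placeholder for the real `infer`: tags the "image" instead of editing it.
    return f"{image}+{prompt}", True


input_image, prompt = FakeComponent("photo"), FakeComponent("add a hat")
guidance_scale, steps = FakeComponent(2.5), FakeComponent(28)
result, reuse_visible = FakeComponent(), FakeComponent(False)

# "Run": four input values in, two output values out.
dispatch(fake_infer, [input_image, prompt, guidance_scale, steps], [result, reuse_visible])
# "Reuse this image": copy the result back into the input slot.
dispatch(lambda image: image, [result], [input_image])

print(input_image.value)
```

After both events, the input slot holds the previous result, which is what makes chained edits possible.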

In [None]:
demo.launch(debug=True)