# Computer vision and deep learning - Laboratory 6

In this last laboratory, we will switch our focus from implementing and training neural networks to developing a machine learning application.
More specifically, you will learn how to convert your saved PyTorch model into a more portable format using TorchScript and how to create a simple demo application for your model.



In [1]:
!pip install gradio
!pip install torch
!pip install torchvision
!pip install opencv-python



# Creating a simple UI with gradio


[Gradio](https://www.gradio.app/docs/interface) is an open-source Python library for creating customizable UI components for machine learning models with just a few lines of code. It greatly simplifies building web-based interfaces for interacting with ML models, without requiring extensive knowledge of web development, and allows you to quickly build an MVP (minimum viable product) and get feedback from users.


To get an application running, you just need to specify three parameters:
1. ``fn``: the function to wrap the interface around;
2. ``inputs``: the desired input component(s);
3. ``outputs``: the desired output component(s).


This is achieved through the ``gradio.Interface`` class, the central component in gradio, responsible for creating the user interface for your machine learning model.


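```
import gradio as gr

# placeholder classifier: receives the image as a numpy array and
# returns a dict {class_name: confidence}
def image_classifier(image):
    return {"cat": 0.7, "dog": 0.3}

demo = gr.Interface(fn=image_classifier,
                    inputs="image",
                    outputs="label")
```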


Once you've defined the ``gr.Interface``, call its ``launch()`` method to start the interface and make it accessible through a web browser.


```
demo.launch()
```


When the ```launch``` method is called, ```gradio``` starts a simple local web server that serves the demo. If you specify ```share=True``` when calling ```launch()```, ```gradio``` also creates a public link that anyone can use to access the demo from their browser.


## Simple UI for image classification in gradio

Below is an example of how you could use ```gradio``` to create a simple UI for an image classification problem.

In [132]:
import numpy as np
import gradio as gr

CLASSES = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

def softmax(x):
    return(np.exp(x - np.max(x)) / np.exp(x - np.max(x)).sum())


def classify_image(img):
    # TODO run a classification model to get the class scores
    prediction = softmax(np.random.randn(10, ))
    confidences = {CLASSES[i]: float(prediction[i]) for i in range(len(CLASSES))}
    return confidences

ui = gr.Interface(fn=classify_image,
                  inputs=gr.Image(),
                  outputs=gr.Label(num_top_classes=3),
                  # TODO replace example1.png example2.png with some images from your device
                  # examples=['example1.png', 'example2.png']
                  )
ui.launch()

Running on local URL:  http://127.0.0.1:7896

To create a public link, set `share=True` in `launch()`.
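When ```np.random.randn``` is replaced by a real model, ```classify_image``` has to turn the model's raw scores (logits) into the ```{class: confidence}``` dict that ```gr.Label``` displays. Below is a minimal numpy sketch, assuming the model yields one score per entry of ```CLASSES```; ```logits_to_confidences``` is a hypothetical helper name. Subtracting the maximum before ```np.exp``` leaves the softmax unchanged (the factor cancels in the ratio) but prevents overflow for large logits.

```python
import numpy as np

CLASSES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']


def softmax(x: np.ndarray) -> np.ndarray:
    # exp(x - max) / sum(exp(x - max)) equals exp(x) / sum(exp(x)),
    # but the shifted exponent is at most 0, so np.exp cannot overflow
    z = np.exp(x - np.max(x))
    return z / z.sum()


def logits_to_confidences(logits, classes=CLASSES) -> dict:
    """Hypothetical helper: one image's raw scores -> {class_name: probability}."""
    probs = softmax(np.asarray(logits, dtype=np.float64).ravel())
    return {name: float(p) for name, p in zip(classes, probs)}
```

With a PyTorch model, the scores for a single image could be obtained as ```model(x)[0].detach().cpu().numpy()``` (assuming ```x``` is a batch containing one preprocessed image) and passed to the helper. ```gr.Label(num_top_classes=3)``` already keeps only the top three entries, so the function can return all classes.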




## Accessing the webcam with gradio

The example below takes its input images from your webcam.
The function wrapped by gradio blurs the input image everywhere outside a mask. If you plan to do background blurring, the mask could be the segmentation mask predicted by your model.



In [133]:
import cv2
import gradio as gr
import numpy as np

def blur_background(input_image):
    # gradio may pass None when no webcam frame is available yet
    if input_image is None:
        return None
    input_image = cv2.cvtColor(input_image, cv2.COLOR_RGB2BGR)

    # Generate a blank mask
    # TODO your code here: call a segmentation model to get predicted mask
    mask = np.zeros_like(input_image)

    # for demo purposes, we use a fixed (not predicted) segmentation mask:
    #  a filled circle of radius 100 px centered in the middle of the image
    center_x, center_y = mask.shape[1] // 2, mask.shape[0] // 2
    cv2.circle(mask, (center_x, center_y), 100, (255, 255, 255), -1)

    # Convert the mask to grayscale and add a channel axis so that it
    # broadcasts against the (H, W, 3) image in np.where
    mask_gray = cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY)
    mask_gray = mask_gray[:, :, np.newaxis]

    # apply a strong Gaussian blur to the areas outside the mask
    blurred = cv2.GaussianBlur(input_image, (51, 51), 0)
    result = np.where(mask_gray, input_image, blurred)

    # Convert the result back to RGB format for Gradio
    result = cv2.cvtColor(result, cv2.COLOR_BGR2RGB)
    return result


ui = gr.Interface(
    fn=blur_background,
    inputs=gr.Image(sources=["webcam"]),
    outputs="image",
    title="Image segmentation demo!"
)
ui.launch()

Running on local URL:  http://127.0.0.1:7897

To create a public link, set `share=True` in `launch()`.
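To plug a real segmentation model into ```blur_background```, its per-pixel prediction (an ```(H, W)``` array of class indices, e.g. the ```argmax``` over the class dimension, already resized to the frame size) has to become the mask used by ```np.where```. Below is a numpy sketch; ```class_map_to_mask```, ```composite```, and the choice of foreground class indices are hypothetical and depend on how your model's classes are defined.

```python
import numpy as np


def class_map_to_mask(labels: np.ndarray, foreground_classes) -> np.ndarray:
    """Turn an (H, W) class-index map into an (H, W, 1) boolean foreground mask."""
    mask = np.isin(labels, foreground_classes)
    # the trailing axis lets the mask broadcast against (H, W, 3) images
    return mask[:, :, np.newaxis]


def composite(image: np.ndarray, blurred: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep the sharp image where mask is True and the blurred image elsewhere."""
    return np.where(mask, image, blurred)
```

Inside ```blur_background```, ```mask_gray``` would then be replaced by ```class_map_to_mask(labels, [1])```, assuming (hypothetically) that class 1 is the foreground you want to keep sharp.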




## Laboratory assignment


Now you have all the knowledge required to build your own ML semantic segmentation application.


1. First, use TorchScript (```torch.jit```) to obtain a serialized model binary.
2. Using gradio, create a simple application that uses the semantic segmentation model you developed. Feel free to define the scope and the functional requirements of your app.
3. __[Optional, independent work]__ Use a serverless cloud function on [AWS Lambda](https://aws.amazon.com/lambda/) (this requires an account on Amazon AWS and you need to provide the details of a credit card) to run the prediction and get the results.
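For step 1, here is a minimal export sketch. It applies ```torch.jit.script``` to a small hypothetical ```TinySegmenter``` module rather than to the lab's UNet; the saved file can be reloaded with ```torch.jit.load``` without the Python class definition. Scripting the UNet itself may need small adjustments if TorchScript rejects some construct (for example the reversed ```ModuleList``` slice); in that case ```torch.jit.trace(model, example_input)``` is an alternative, with the caveat that a traced model may only be valid for inputs shaped like the example (which is acceptable here, since inputs are resized to ```INPUT_SHAPE```).

```python
import os
import tempfile

import torch


class TinySegmenter(torch.nn.Module):
    """Hypothetical stand-in for the trained UNet, used only to illustrate export."""

    def __init__(self, in_channels: int = 3, num_classes: int = 3):
        super().__init__()
        self.conv = torch.nn.Conv2d(in_channels, num_classes, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(x)


def export_torchscript(model: torch.nn.Module, path: str) -> None:
    model.eval()  # export in inference mode (affects BatchNorm/Dropout layers)
    scripted = torch.jit.script(model)  # compile forward() and submodules to TorchScript
    scripted.save(path)


model = TinySegmenter()
path = os.path.join(tempfile.mkdtemp(), "segmenter.pt")
export_torchscript(model, path)

# the saved binary loads without the TinySegmenter class being defined
restored = torch.jit.load(path, map_location="cpu")
x = torch.rand(1, 3, 64, 64)
with torch.no_grad():
    same = torch.allclose(model(x), restored(x))
print(same)
```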


Congratulations, you've just completed all the practical work for Computer Vision and Deep Learning!
May your data always be clean, your models accurate, and your code bug-free!





In [2]:
import torch

def center_crop(original_image, target_image):
    # get dimensions of the original and target images
    original_height, original_width = original_image.shape[-2], original_image.shape[-1]
    target_height, target_width = target_image.shape[-2], target_image.shape[-1]

    # calculate starting indices for cropping
    start_height = max(0, (original_height - target_height) // 2)
    start_width = max(0, (original_width - target_width) // 2)

    # calculate ending indices for cropping
    end_height = start_height + target_height
    end_width = start_width + target_width

    # perform cropping
    cropped_image = original_image[..., start_height:end_height, start_width:end_width]

    return cropped_image


class DoubleSamplingBlock(torch.nn.Module):
    def __init__(self, in_channels, out_channels, mid_channels=None):
        super().__init__()

        if mid_channels is None:
            mid_channels = out_channels

        # sequential block with two convolutional layers
        self.double_conv = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=in_channels, out_channels=mid_channels, kernel_size=3),
            torch.nn.BatchNorm2d(mid_channels),
            torch.nn.ReLU(inplace=True),

            torch.nn.Conv2d(in_channels=mid_channels, out_channels=out_channels, kernel_size=3),
            torch.nn.BatchNorm2d(out_channels),
            torch.nn.ReLU(inplace=True),
        )

    def forward(self, X):
        return self.double_conv(X)
    

class EncoderBlock(torch.nn.Module):
    def __init__(self, in_channels, out_channels) -> None:
        super().__init__()
        self.encoder = torch.nn.Sequential(
            # 2x2 max pooling at the beginning
            torch.nn.MaxPool2d(kernel_size=2),

            # followed by two 3x3 convolutions (DoubleSamplingBlock)
            DoubleSamplingBlock(in_channels, out_channels=out_channels),
        )

    def forward(self, X):
        return self.encoder(X)


class DecoderBlock(torch.nn.Module):
    def __init__(self, input_channels, output_channels) -> None:
        super().__init__()

        self.up = torch.nn.Sequential(
            torch.nn.ConvTranspose2d(
                in_channels=input_channels,
                out_channels=input_channels // 2,
                kernel_size=2,
                stride=2,
                padding=0
            ),
            torch.nn.BatchNorm2d(input_channels // 2),
            torch.nn.ReLU(inplace=True),
        )
        self.sampling_block = DoubleSamplingBlock(
            in_channels=input_channels,
            out_channels=output_channels,
            mid_channels=input_channels // 2
        )

    def forward(self, encoder_features, X):
        X = self.up(X)

        # center crop the encoder features to match the input
        encoder_features = center_crop(encoder_features, X)

        # concatenate the encoder features and input along the channel dimension
        X = torch.cat([encoder_features, X], dim=1)

        # apply the sampling block
        X = self.sampling_block(X)

        return X
    

class UNet(torch.nn.Module):
    def __init__(self, in_channels, num_classes, intermediary_filters=64, num_layers=4):
        super().__init__()

        # initial convolution block
        self.in_convolution = DoubleSamplingBlock(
            in_channels=in_channels,
            out_channels=intermediary_filters,
            mid_channels=intermediary_filters
        )

        # list of encoder blocks
        self.encoders = torch.nn.ModuleList([
            EncoderBlock(
                in_channels=intermediary_filters * 2**i,
                out_channels=2 * intermediary_filters * 2**i
            ) for i in range(num_layers)
        ])

        # list of decoder blocks
        self.decoders = torch.nn.ModuleList([
            DecoderBlock(
                input_channels=2 * intermediary_filters * 2**i,
                output_channels=intermediary_filters * 2**i
            ) for i in range(num_layers)
        ])

        # output convolution layer
        self.out_convolution = torch.nn.Conv2d(
            in_channels=intermediary_filters,
            out_channels=num_classes,
            kernel_size=1
        )

    def forward(self, X):
        X = self.in_convolution(X)

        # list to store intermediate outputs from each encoder block
        outputs = [X]

        # forward pass through encoder blocks
        for i, encoder in enumerate(self.encoders):
            outputs.append(encoder(outputs[i]))

        # get the output from the last encoder block
        X = outputs[-1]

        # forward pass through decoder blocks in reverse order
        for i, decoder in enumerate(self.decoders[::-1]):
            X = decoder(outputs[len(self.decoders) - 1 - i], X)

        # apply the final output convolution layer
        X = self.out_convolution(X)

        return X
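Because ```DoubleSamplingBlock``` uses 3x3 convolutions without padding, each block shrinks every spatial side by 4 pixels, so the network's output is smaller than its input. This is why ```DecoderBlock``` center-crops the skip connection and why the prediction code below interpolates the logits back to ```INPUT_SHAPE```. The pure-Python sketch below (```unet_output_size``` is a hypothetical name) traces one spatial side through the architecture, assuming a square input and the layer settings shown above.

```python
def unet_output_size(input_size: int, num_layers: int) -> int:
    """Output side length of the unpadded UNet above for a square input."""
    # in_convolution: two unpadded 3x3 convolutions, each removes 2 pixels
    size = input_size - 4
    if size <= 0:
        raise ValueError("input is too small for the initial convolution block")

    # encoders: MaxPool2d(kernel_size=2) halves (rounding down), then two unpadded 3x3 convs
    for _ in range(num_layers):
        size = size // 2 - 4
        if size <= 0:
            raise ValueError(f"a {input_size}px input is too small for {num_layers} layers")

    # decoders: ConvTranspose2d(kernel_size=2, stride=2) doubles the size, the skip
    # connection is center-cropped to match, then two unpadded 3x3 convs follow
    for _ in range(num_layers):
        size = size * 2 - 4
        if size <= 0:
            raise ValueError("decoder output collapsed to zero size")
    return size
```

For example, ```unet_output_size(572, 4)``` gives 388, matching the 572 to 388 tile sizes of the original U-Net. By this arithmetic, a 128x128 input works with ```num_layers=3``` (output 36x36) but is too small for the default ```num_layers=4```.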

In [4]:
import pickle

# unpickling needs the UNet-related class definitions above to be in scope
with open('./Downloads/saved_model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)
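Unpickling the whole module only works when the class definitions are available under the same names, and tensors saved from a GPU can fail to load on a CPU-only machine. A commonly used alternative is to save only the weights (```state_dict```) with ```torch.save``` and load them with ```map_location```. In the sketch below, the hypothetical ```build_model()``` builds a small stand-in network; for this lab it would construct ```UNet(...)``` with the same hyperparameters used during training.

```python
import io

import torch


def build_model() -> torch.nn.Module:
    # hypothetical stand-in for UNet(in_channels=3, num_classes=3, ...)
    return torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
        torch.nn.ReLU(),
        torch.nn.Conv2d(8, 3, kernel_size=1),
    )


trained = build_model()

# save only the weights (a dict of tensors), not the pickled module object;
# a file path such as "unet_weights.pt" works the same way as this buffer
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# rebuild the architecture from code, load the weights onto the CPU,
# then move the model to whichever device is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
restored = build_model()
restored.load_state_dict(torch.load(buffer, map_location="cpu"))
restored = restored.to(device).eval()
```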

In [21]:
import cv2
import torch

INPUT_SHAPE = (128, 128)

def custom_transform(X):
    with torch.no_grad():
        if X is not None:
            # convert from OpenCV's BGR channel order to RGB
            X = cv2.cvtColor(X, cv2.COLOR_BGR2RGB)
            # transpose channels to PyTorch format (C, H, W)
            X = X.transpose([2, 0, 1])
            # convert to PyTorch tensor
            X = torch.from_numpy(X)
            X = X.to(torch.float32)
            # normalize pixel values to the range [0, 1]
            X = X / 255
            # resize image to the desired input shape: add a batch axis for interpolate,
            # then reshape back to (C, H, W)
            X = torch.nn.functional.interpolate(X.view(-1, *X.shape), size=INPUT_SHAPE).view(-1, *INPUT_SHAPE)

    return X

def apply_inverse_transform(image, label):
    with torch.no_grad():
        if label is not None:
            # convert label to one-hot encoding and scale
            label = torch.nn.functional.one_hot(label, num_classes=3) * torch.tensor(255)
            label = label.to(torch.uint8)
            label = label.cpu().numpy()

        if image is not None:
            # scale image values and transpose
            image = image.squeeze().permute(1, 2, 0)
            image = (image * 255).to(torch.uint8)
            image = cv2.cvtColor(image.cpu().numpy(), cv2.COLOR_RGB2BGR)

    return image, label


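def predict_image(input_image):
    # gradio may call the function with None (e.g. when the input is cleared with live=True)
    if input_image is None:
        return None

    # gradio provides RGB images, while custom_transform expects BGR input (as returned
    # by cv2.imread) and converts it to RGB itself, so convert to BGR first
    image = cv2.cvtColor(input_image, cv2.COLOR_RGB2BGR)
    initial_height, initial_width = image.shape[:2]

    image = custom_transform(image)
    image = image.unsqueeze(dim=0)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    image = image.to(device)

    # the model and its input must live on the same device
    loaded_model.to(device)
    loaded_model.eval()
    with torch.no_grad():
        predicted_labels = loaded_model(image)

    # the unpadded UNet output is smaller than its input, so upsample the logits first
    predicted_labels = torch.nn.functional.interpolate(predicted_labels, size=tuple(INPUT_SHAPE))
    predicted_labels = predicted_labels.squeeze(dim=0).argmax(dim=0)

    image, predicted_labels = apply_inverse_transform(image, predicted_labels)

    # cv2.resize expects (width, height); nearest-neighbour keeps the class colors intact
    predicted_labels = cv2.resize(predicted_labels, (initial_width, initial_height),
                                  interpolation=cv2.INTER_NEAREST)

    return predicted_labels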

In [22]:
# define the Gradio interface
iface = gr.Interface(
    fn=predict_image,
    inputs=gr.Image(sources=["upload"]),
    outputs=["image"],
    live=True,
    title="Image segmentation"
)

iface.launch()

# it has pretty bad results because I used a poorly trained version of the network

Running on local URL:  http://127.0.0.1:7919

To create a public link, set `share=True` in `launch()`.


