<a href="https://colab.research.google.com/github/yangyangkiki/LTU-CSE5CV-Term5-Labs/blob/main/Lab_07_Object_Detection_with_Azure.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CSE5CV - Object Detection with Azure and the Microsoft Video Analyzer

In this week's lab, we'll be training an object detection model with Microsoft Azure. We'll also learn how to use the Microsoft Video Analyzer.

## Colab preparation

Google Colab is a free online service for editing and running code in notebooks like this one. To get started, follow the steps below:

1. Click the "Copy to Drive" button at the top of the page. This will open a new tab with the title "Copy of...". This is a copy of the lab notebook which is saved in your personal Google Drive. **Continue working in that copy, otherwise you will not be able to save your work**. You may close the original Colab page (the one which displays the "Copy to Drive" button).
2. Run the code cell below to prepare the Colab coding environment by downloading sample files. Note that if you close this notebook and come back to work on it again later, you will need to run this cell again.

In [None]:
!git clone https://github.com/ltu-cse5cv/cse5cv-labs.git
%cd cse5cv-labs/Lab10

## Packages

We'll be using similar packages to those in the last lab.

In [None]:
%pip install azure-cognitiveservices-vision-customvision~=3.1.0

In [None]:
from pathlib import Path

import cv2
import matplotlib.pyplot as plt

from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials

# Utility functions
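def load_image_rgb(filepath):
    image = cv2.imread(filepath)
    # cv2.imread returns None (rather than raising) when the file cannot be read
    if image is None:
        raise FileNotFoundError(f'Could not read image: {filepath}')
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    return image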

def display_image(image, title=None):
    fig, axes = plt.subplots(figsize=(18, 12))

    if image.ndim == 2:
        axes.imshow(image, cmap='gray', vmin=0, vmax=255)
    else:
        axes.imshow(image)

    axes.axis('off')

    if title is not None:
        plt.title(title)

    plt.show()

# Some useful functions to help overlay detections (Taken from Lab 9)
COLOURS = [
    tuple(int(colour_hex.strip('#')[i:i+2], 16) for i in (0, 2, 4))
    for colour_hex in plt.rcParams['axes.prop_cycle'].by_key()['color']
]

def draw_detections(img, dets, colours=COLOURS):
    """Draw a rectangle for each (class, tlx, tly, brx, bry) detection, in place."""
    for i, (cl, tlx, tly, brx, bry) in enumerate(dets):
        colour = colours[i % len(colours)]  # cycle through the colours
        cv2.rectangle(img, (tlx, tly), (brx, bry), color=colour, thickness=2)

def annotate_class(img, dets, conf=None, colours=COLOURS):
    """Write the class name (and optional confidence) above each detection, in place."""
    for i, (cl, tlx, tly, brx, bry) in enumerate(dets):
        colour = colours[i % len(colours)]  # cycle colours, as in draw_detections
        txt = cl
        if conf is not None:
            txt += f' {conf[i]:1.3f}'
        # A box with a border thickness draws half of that thickness to the left of the
        # boundaries, while filling fills only within the boundaries, so we expand the filled
        # region to match the border
        offset = 1

        cv2.rectangle(img,
                      (tlx-offset, tly-offset-12),
                      (tlx-offset+len(txt)*12, tly),
                      color=colour,
                      thickness=cv2.FILLED)

        ff = cv2.FONT_HERSHEY_PLAIN
        cv2.putText(img, txt, (tlx, tly-1), fontFace=ff, fontScale=1.0, color=(255,)*3)

# 1. Creating Resources

Just like in the previous lab, before we can work with Azure we need to set up a "resource".

We will be using the Custom Vision cognitive service to perform object detection in Azure. Because we will both train a model and use it for inference on new data, we'll need to set up resources to do this (specifically, a "Custom Vision" resource).

**Remember**: Before finishing this lab, make sure you remove the Custom Vision resource you create (see Section 1.2).

## 1.1 Create Custom Vision Resource

**Task**: Create a "Custom Vision" resource by following the instructions below.

Starting from: https://portal.azure.com:

1. Click "Create a Resource"
2. Search for "Custom Vision"
3. Select "Custom Vision" and then click "Create"
4. Create a new resource group by clicking "Create new" under "Resource Group". Name it "CSE5CV-Azure".
5. Enter a unique instance name including your student ID and "customvision". For example, `22222222customvision`.
6. Select the free tier for both training and prediction.
7. Click "Review + Create". Then click "Create".

## 1.2 Deleting Resource Groups

**Important**: **To do only when you are done with this lab**

*When you are done with this lab* you can follow these instructions to delete your resource group. This will remove all resources you created in one go.

1. Visit https://portal.azure.com/#blade/HubsExtension/BrowseResourceGroups.
2. Select the resource group (click the name)
3. Click "Delete resource group"
4. Enter the name "CSE5CV-Azure" (this is a clever UI to ensure that you are deleting the resource group you intend to).
5. Click "Delete".

## 1.3 Collecting Communication Credentials

These resources come pre-configured to receive REST requests. However, they are not open to the world; you need to use specific credentials to access them. Here we describe how to get the credentials for your Custom Vision resource.

We will publish models to the Custom Vision resource and use them for prediction. We choose which model to query using the project ID and the model's name; however, all of the REST requests go to the same resource using the same credentials.

**Task**: Collect your Custom Vision resource's REST credentials from Azure.

1. Visit: https://www.customvision.ai
2. Log in with your La Trobe Student account.
3. Click the cog in the top right to view your resources.
4. Open your "Prediction" resource to see the *Prediction endpoint* and the *Prediction key*.
5. Record the endpoint and the key in the following code cell

In [None]:
# TODO: fill with your data
custom_vision_endpoint = ''
custom_vision_key = ''

custom_vision_credentials = ApiKeyCredentials(
    in_headers={"Prediction-key": custom_vision_key}
)
custom_vision_client = CustomVisionPredictionClient(
    endpoint=custom_vision_endpoint,
    credentials=custom_vision_credentials
)
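Under the hood, `custom_vision_client` sends an ordinary HTTPS request. The sketch below builds (but does not send) the equivalent raw request with the standard library, to show where the endpoint, key, project ID and model name end up. It assumes the v3.0 prediction URL layout `{endpoint}/customvision/v3.0/Prediction/{project_id}/detect/iterations/{published_name}/image`; check the Custom Vision prediction API reference for your resource before relying on it. `build_detect_request` is a hypothetical helper, not part of the Azure SDK.

```python
import urllib.request

def build_detect_request(endpoint, prediction_key, project_id, published_name, image_bytes):
    """Build (but do not send) a raw object-detection request.

    Assumption: the v3.0 prediction URL layout described above.
    """
    url = (
        f"{endpoint.rstrip('/')}/customvision/v3.0/Prediction/"
        f"{project_id}/detect/iterations/{published_name}/image"
    )
    return urllib.request.Request(
        url,
        data=image_bytes,  # raw image bytes form the request body
        headers={
            "Prediction-Key": prediction_key,  # same key the SDK puts in its headers
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )

# Placeholder values only; nothing is sent over the network here.
req = build_detect_request(
    "https://example-resource.cognitiveservices.azure.com/",
    "<your-prediction-key>",
    "00000000-0000-0000-0000-000000000000",
    "detect-produce",
    b"<image bytes>",
)
print(req.get_method(), req.full_url)
```

Passing such a request to `urllib.request.urlopen` would actually send it; in this lab we stick with `custom_vision_client`, which handles this for us.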

# 2. Fruit Object Detection

Using the Custom Vision resource we created, we will train a model that can detect fruit within an image. To do this, we will annotate images of fruit using the Azure interface. Once we train the model, we will deploy it onto our Custom Vision resource, and make predictions on images from within this notebook using a REST API.

## 2.1 Create a New Project

To get started, we first need to create a new project.

Visit: https://customvision.ai and sign in using the Microsoft account associated with your Azure subscription.

**Task**: Create a new project with the following settings:

* **Name**: Grocery Detection
* **Description**: Object detection for groceries
* **Resource**: (Select the name of the Custom Vision resource you created earlier)
* **Project Types**: Object Detection
* **Domains**: General

After clicking "Create Project" the project will automatically open in your browser.

## 2.2 Upload Training Images

To train our object detection model, we need to upload images containing the classes we want to detect, and then annotate them.

**Task**: Upload all of your training images.
1. Download and extract the following .zip archive containing fruit images: https://github.com/ltu-cse5cv/cse5cv-labs/releases/download/v0.0.0/fruitdet.zip
2. Back on the Custom Vision web page, click "Add images".
3. Navigate to `fruitdet/train`. Select all the images in this folder (e.g. click the first image, then Shift+click the last) and upload them.

**Question**: What things do you think we should annotate for this task?

<details>
<summary style='cursor:pointer;'><u>Answer</u></summary>

For every piece of fruit in the image, we should annotate:
* The class of the fruit (e.g. apple, banana, orange)
* The bounding box coordinates of the fruit (e.g. tlx, tly, brx, bry coordinates)
    
</details>

## 2.3 Annotating the Images

After all images have been uploaded, the next step is to annotate them.

**Task**: Annotate all training images that you have uploaded.

1. Click on the first image that has been uploaded. Once you do this, you should see an "Image Detail" screen.
2. There are two ways you can draw boxes around objects in the image:
    * Click and drag your mouse to draw a box around one of the objects
    * Hover your mouse over any of the objects in the image until an automatically detected region is displayed, then click the region.
3. Once the region is selected, resize the region as necessary to capture the whole object, then assign a tag (class) to the object. This should be: *apple*, *banana*, or *orange*.
4. Repeat this process to annotate all objects within the image.
5. Use the **>** button on the right to go to the next image, and tag its objects. Keep working through the entire set of images, tagging each *apple*, *banana*, and *orange*.
6. Once you've finished tagging the last image, close the **Image Detail** editor and in the left hand panel, under **Tags**, select **Tagged** to see all of your tagged images.

**Question**: How many examples of each class do we have?

<details>
<summary style='cursor:pointer;'><u>Answer</u></summary>

Across the 33 images, you should have:
* 18 apples
* 20 bananas
* 20 oranges
    
</details>
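The portal tallies these counts for you, but it can help to see how per-class counts fall out of annotation records. The sketch below uses a hypothetical `Region` record (a tag plus a pixel box in `tlx, tly, brx, bry` form) and two made-up images; it is not how Custom Vision stores its data.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Region:
    """Hypothetical annotation record: one tagged box in pixel coordinates."""
    tag: str
    tlx: int
    tly: int
    brx: int
    bry: int

# Made-up annotations for two images (not the real lab data).
annotations = {
    "img_001.jpg": [Region("apple", 10, 20, 110, 130), Region("banana", 150, 40, 300, 120)],
    "img_002.jpg": [Region("orange", 30, 30, 90, 95), Region("apple", 100, 25, 170, 100)],
}

def count_tags(annotations):
    """Count how many regions carry each tag across all images."""
    return Counter(region.tag for regions in annotations.values() for region in regions)

print(count_tags(annotations))
```

Checking these counts before training is a quick way to spot class imbalance or a forgotten tag.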

## 2.4 Train Model

Now that the dataset is fully annotated, you're ready to train a model.

**Task**: Click the green "Train" button in the top-right. Choose "Quick Training". Click "Train".

While it is training, we will collect the project ID.

**Task**: Click the cog in the top-right to get to the project settings. Copy the "Project ID" into the `project_id` variable in the next cell.

In [None]:
# TODO: Fill in project id
project_id = ''
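A common mistake is pasting the project name (or nothing at all) instead of the ID. Assuming Custom Vision project IDs are GUIDs (strings like `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`), the hypothetical helper below checks the format with Python's `uuid` module before you make any requests. It only checks the format; it cannot tell whether the project exists.

```python
import uuid

def looks_like_project_id(value):
    """Return True if value parses as a GUID.

    Assumption: Custom Vision project IDs are GUIDs. Format check only.
    """
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return True

# Example use in the notebook, after filling in project_id above:
# assert looks_like_project_id(project_id), 'Paste the Project ID from the project settings page'
print(looks_like_project_id("00000000-0000-0000-0000-000000000000"))
print(looks_like_project_id("Grocery Detection"))
```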

**Task**: Click "Performance" in the top-right. Training takes about 10 minutes to complete. While you're waiting for the model to train, do the task at the start of [Section 3](#3.-Microsoft-Video-Analyzer), then come back here.

## 2.5 Evaluate Model

When your model has finished training you will be presented with a page showing precision, recall, and AP performance metrics.

**Question**: In the context of object detection, how do we know if a detection is a true positive?

<details>
<summary style='cursor:pointer;'><u>Answer</u></summary>

A detection is considered a true positive if its predicted class matches the ground-truth class and the Intersection over Union (IoU) between the detected box and the ground-truth box is greater than some threshold. By default, Azure sets this threshold to 30% (see the left-hand panel).
    
</details>
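To make the true-positive rule concrete, here is a minimal sketch of IoU for two boxes in the `tlx, tly, brx, bry` form used elsewhere in this lab, plus a hypothetical `is_true_positive` helper that applies the class check and the 30% threshold. It is an illustration, not Azure's evaluation code; a full evaluation also matches each ground-truth box to at most one detection.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (tlx, tly, brx, bry)."""
    # Corners of the overlapping rectangle (may be empty)
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred_class, pred_box, gt_class, gt_box, iou_thresh=0.3):
    """Illustrative only: class must match and IoU must exceed the threshold."""
    return pred_class == gt_class and iou(pred_box, gt_box) > iou_thresh

# Two 10x10 boxes offset by 5 pixels: overlap 50, union 150
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.3333333333333333
```

With an IoU of one third, this pair just clears the 30% threshold, so it would count as a true positive if the classes agree.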

Before deploying our model, let's see how we can use the Azure interface to test our newly trained model.


**Task**: Click the "Quick Test" button in the top-right and upload the image found at `fruitdet/test/apple_orange.png` (Click "Browse local files")

**Question**: How were the predictions? Were the class predictions correct? Did the boxes bound the objects well?

<details>
<summary style='cursor:pointer;'><u>Answer</u></summary>

Tutor's answer: The trained model detected both the apple and the orange well, with high confidence (~97% and ~95% respectively). The predicted boxes also covered the objects well.
    
</details>

Whilst still in Azure, let's also test on an image we can find on the web.

Take a look at the image here: https://unsplash.com/photos/DapP9j2DJMQ. How do you think your detection model will perform?

**Task**: In the "Quick Test" window, paste the following URL into the "Image URL" section (This is a direct link to the image shown just before): https://images.unsplash.com/photo-1507260385058-676ee3f043e3?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=1400&q=80

**Question**: How were the predictions?

<details>
<summary style='cursor:pointer;'><u>Answer</u></summary>

Tutor's answer: My model was able to detect all 3 apples with reasonably high confidence (the lowest was ~75%), but your results may vary.
    
</details>

## 2.5 Deploy model

**Task**: Navigate back to the "Performance" page and click "Publish" (in the top-left corner). Use the following settings:
* **Model name**: detect-produce
* **Prediction resource**: (the name of your resource ending in "-Prediction")

Then click "Publish".

In the next cell, we interact with the model that we just trained and published (deployed) on our resource. Publishing takes effect immediately, so you can run the next cell right away.

In [None]:
model_name = 'detect-produce'

with open(Path('fruitdet', 'test', 'produce.jpg'), 'rb') as f:
    response = custom_vision_client.detect_image(project_id, model_name, f.read())

# The detections are returned in descending order of probability
# Here, we inspect the most confident detection
print('The whole response:    -------')
print(response)
print('------------------------------')
print()
print('Most confident detection: ', response.predictions[0])
print('Associated bounding box: ', response.predictions[0].bounding_box)

**Question**: Looking at the bounding box information returned for the most confident detection, what form are the bounding box coordinates in?

<details>
<summary style='cursor:pointer;'><u>Answer</u></summary>

The keys of the bounding box data are *left*, *top*, *width*, and *height*, so the box is in `tlx`, `tly`, `bw`, `bh` form.
    
Additionally, the coordinates are normalized to the image dimensions (you can tell this because the coordinates are in the range [0, 1]).
    
</details>
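The conversion from this normalized `left, top, width, height` form to pixel corners, which the next cell performs inline, can be written as a small helper. `to_pixel_corners` is a hypothetical name; like the lab code, it truncates with `int`, which can shift a corner by up to one pixel.

```python
def to_pixel_corners(left, top, width, height, im_w, im_h):
    """Convert a normalized (left, top, width, height) box to pixel (tlx, tly, brx, bry)."""
    tlx = int(left * im_w)
    tly = int(top * im_h)
    brx = int((left + width) * im_w)   # right edge = left + width
    bry = int((top + height) * im_h)   # bottom edge = top + height
    return tlx, tly, brx, bry

print(to_pixel_corners(0.25, 0.1, 0.5, 0.4, im_w=800, im_h=600))  # (200, 60, 600, 300)
```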

## 2.6 Test Model

Now we will visually evaluate another test image, this time with Python!

In [None]:
test_image_path = Path('fruitdet', 'test', 'produce.jpg')
conf_thresh = 0.5          # Only overlay detections with a confidence score > this

# Detect objects in image
with open(test_image_path, 'rb') as f:
    results = custom_vision_client.detect_image(project_id, model_name, f.read())

# Load the image data (and determine dimensions)
image = load_image_rgb(str(test_image_path))
im_h, im_w = image.shape[:2]

class_colours = {
    'apple': (255, 0, 0),
    'banana': (187, 88, 149),
    'orange': (0, 0, 255),
}

# Overlay all detected boxes
if results.predictions:
    objs = []
    confs = []
    colours = []
    for detection in results.predictions:
        if detection.probability > conf_thresh:
            box = detection.bounding_box
            tlx, tly, brx, bry = box.left, box.top, box.left + box.width, box.top + box.height
            tlx, tly, brx, bry = int(tlx * im_w), int(tly * im_h), int(brx * im_w), int(bry * im_h)
            objs.append((detection.tag_name, tlx, tly, brx, bry))
            confs.append(detection.probability)
            colours.append(class_colours[detection.tag_name])
    draw_detections(image, objs, colours=colours)
    annotate_class(image, objs, conf=confs, colours=colours)


# Display the image
display_image(image)

**Question**: Looking at the response printed in the deploy model section, the list of predictions was much longer than 3. Why do we only see 3 detections overlaid onto the image?

<details>
<summary style='cursor:pointer;'><u>Answer</u></summary>

Because our confidence threshold is set to 0.5, all predictions with a confidence at or below 0.5 have been filtered out. Try setting `conf_thresh` to 0 and re-running the cell to see every detection overlaid.
    
This is a good example showing the importance of setting a confidence threshold on the predictions of your model. Choosing a good threshold can be a tricky process, and you will find that when choosing this threshold, there is a tradeoff between precision and recall.    
</details>
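The precision/recall tradeoff can be seen with a few hypothetical scored detections. The sketch below assumes each detection has already been labelled as a true or false positive (e.g. by IoU matching) and reports precision and recall when keeping only detections above a threshold. The scores are made up for illustration, and treating "no kept detections" as precision 1.0 is just one convention.

```python
def precision_recall_at(detections, num_ground_truth, conf_thresh):
    """Precision and recall when keeping detections with probability > conf_thresh.

    detections: list of (probability, is_true_positive) pairs.
    """
    kept = [is_tp for prob, is_tp in detections if prob > conf_thresh]
    tp = sum(kept)  # True counts as 1
    precision = tp / len(kept) if kept else 1.0  # convention for an empty set
    recall = tp / num_ground_truth if num_ground_truth else 0.0
    return precision, recall

# Hypothetical detections for an image containing 3 objects
dets = [(0.97, True), (0.95, True), (0.60, False), (0.40, True), (0.10, False)]
for t in (0.0, 0.5, 0.9):
    p, r = precision_recall_at(dets, num_ground_truth=3, conf_thresh=t)
    print(f"threshold {t:.1f}: precision {p:.2f}, recall {r:.2f}")
```

Here, raising the threshold from 0.0 to 0.9 pushes precision up from 0.60 to 1.00 while recall drops from 1.00 to 0.67.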

## 2.7 Summary

You used www.customvision.ai to upload and annotate a training set for object detection, then used it to train a model and deploy it on Azure infrastructure. The deployed model is accessible through a REST API, and Azure provides a simple client wrapper, `custom_vision_client`, for communicating with it. In each request, we provided the `project_id`, the `model_name` and some image data, and got back a prediction. Finally, we checked that the model was working by sending it unseen images.

## 2.8 Remove Project

The free tier only allows up to 2 projects active at once. Here are the instructions for removing this project.

1. Unpublish all of your models
2. Click the eye icon in the top left to return to your list of projects
3. Hover your project and click the trash can

# 3. Microsoft Video Analyzer

In Lab 9, we saw how we could provide an image to Microsoft Azure Cognitive Services to perform a suite of analyses. Now we will do something similar with video, but through an entirely new interface. This interface is not tied to Azure directly, and you cannot use a La Trobe student account with it. If you do not have a personal Microsoft account, you should make one. You do not need any credits or a special subscription for this section.

There is no need to clean up anything after this task.

**Task**: Analyze a video with Microsoft Video Analyzer.
 1. Visit www.videoindexer.ai
 2. Sign in with "Personal Microsoft account"
 3. Click "Upload"
 4. Click "Enter URL"
 5. Enter the URL: https://aka.ms/responsible-ai-video and click "Add".
 6. Click "Upload + index".
 7. It will take about 5-10 minutes to analyze. Go stand outside and look at a plant in your garden. I mean really look at it. See if there are any insects crawling on it. How many branches does it have? Or just wait at your computer if you prefer.

## 3.1 Observe Insights

The analysis has detected many "insights".

The insights listed are:

 1. People: A list of the people shown in the video using facial features. Given the number of quick shots of groups of people, it is unlikely to have found every person. Some of the people have been identified. These are "famous" people that can be reliably searched for on the internet.
 2. Topics: A list of topics covered by the video. Microsoft has defined its own categorisation of topics and fits the video into them based on the transcription.
 3. Keywords: A list of keywords; these are distinct from the topics and more free-form.
 4. Labels: A list of objects that were detected in the video.
 5. Named entities: People and brands mentioned/shown in the video.
 6. Scenes: Segmenting the video into distinct scenes/shots.

Spend some time looking through the insights. Each insight shows where in the video timeline it was detected, and you can click on the timelines in the "Insights" pane to skip to that location. Look at the things it got correct and incorrect. The next section is a guided tour of some errors.

## 3.2 Errors

While it gets a lot right, it also makes some errors. It is useful to know in what ways this system will fail.

**Error 1**: It missed many people. In particular, in the classroom shots, only the front-most students are detected. Perhaps this isn't an "error", since it finds the most prominent people well enough.

**Error 2**: The scenes group shots haphazardly. Although the shots appear to be well detected, the scenes are apparently random collections of adjacent shots.

What errors did you find?

## 3.3 OCR

During the video, a number of text elements appeared to complement the narrator's spoken words. Part of the indexing/analysis used OCR on the video. Let's see how it did.

**Task**: List all the text detected with OCR.
  1. In the "Insights" pane, click "Timeline".
  2. Click the "View" dropdown.
  3. Untick "Transcript".
  4. Tick "OCR".
  5. Click "View" again to hide the dropdown.

**Question**: How well did it do?

<details>
<summary style='cursor:pointer;'><u>Answer</u></summary>

It appears to have done quite well overall! It finds the name plates for the people, and some of the code shown for a few seconds. It also recognises the 6 principles of AI ethics at 0:38.
</details>
<br />

## 3.4 Content Searching

Another useful feature of the Microsoft Video Analyzer is the ability to search your video for content by keyword. For example, do you remember where the 2-second shot of a bee appears in the video? Let's use the Insights to find it.

**Task**: Find where the bee appears in the video.
  1. Ensure "Insights" is selected on the right-hand pane.
  2. Search "bee".
  3. Scroll to "Labels" in the insights.

**Question**: What timestamp did the bee appear?

<details>
<summary style='cursor:pointer;'><u>Answer</u></summary>

The bee appears at 0:06 for about 2s.
</details>
<br />

**Question**: Name 3 other "labels" that exclusively point to this shot of a bee. (You will have to clear "bee" from the search to see the full list of labels again)

<details>
<summary style='cursor:pointer;'><u>Answer</u></summary>

I could find: "invertebrate", "insect", "fly", "arthropod", "pest", "animal", "membrane-winged insect", "macro photography" and "net-winged insects". There are perhaps more.

This is good. You could use any of those words in your search to find the bee.
</details>
<br />
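Conceptually, keyword search over insights amounts to looking a term up in an index from labels to time ranges. The sketch below is an illustration of that idea only, not how Video Analyzer is implemented or its API. Apart from the bee shot at 0:06 (about 2 seconds), the label data is made up.

```python
from collections import defaultdict

# Hypothetical insight data: (label, start_seconds, end_seconds)
labels = [
    ("bee", 6.0, 8.0),
    ("insect", 6.0, 8.0),
    ("person", 12.0, 20.0),
    ("classroom", 21.0, 27.0),
]

def build_index(labels):
    """Map each lower-cased label to the list of time ranges where it appears."""
    index = defaultdict(list)
    for name, start, end in labels:
        index[name.lower()].append((start, end))
    return index

def search(index, query):
    """Case-insensitive lookup; .get avoids adding unknown queries to the index."""
    return index.get(query.lower(), [])

index = build_index(labels)
print(search(index, "Bee"))  # [(6.0, 8.0)]
```

Because several labels ("bee", "insect", ...) point to the same time range, searching for any of them leads to the same shot.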

## 3.5 Microsoft Video Analyzer Summary

We uploaded a video to Microsoft Video Analyzer, observed its insights and qualitatively evaluated its results.

# Reminder: Shut down resource group

Make sure to delete the resource group you used for Section 2 when you are done. See Section 1.2 for instructions.

# Summary

In this lab, you saw how to train a custom object detection model using Microsoft Azure with Custom Vision resources. In addition, you also learned how to use the Microsoft Video Analyzer.