# Object Detection: AML Package for Computer Vision

Object Detection is one of the main problems in Computer Vision. Traditionally, this required expert knowledge to identify and implement so-called “features” that highlight the position of objects in the image. Since 2012, starting with the famous [AlexNet paper](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf), Deep Neural Networks have been used to learn these features automatically.

This notebook shows how the Azure Machine Learning Package for Computer Vision can be used to train, evaluate, and deploy a [Faster R-CNN](https://arxiv.org/abs/1506.01497) object detection model. The **Azure Machine Learning Package for Computer Vision** makes it easy to perform all these steps, and internally uses [TensorFlow's implementation](https://arxiv.org/abs/1611.10012) of the model. This model has been shown to produce state-of-the-art results on Pascal VOC, one of the main object detection challenges in the field. For more information, see the [TensorFlow object detection website](https://github.com/tensorflow/models/tree/master/research/object_detection).

When building and deploying a model with this package, you go through the following steps:
1.	Dataset Creation
2.	Deep Neural Network (DNN) Model Definition
3.	Model Training
4.	Evaluation and Visualization
5.	Web service Deployment
6.	Web service Load Testing

## Example Dataset

For this demo, a dataset of grocery items inside refrigerators is provided, consisting of 30 images and 8 classes (eggBox, joghurt, ketchup, mushroom, mustard, orange, squash, and water). Each jpg image has an annotation xml-file with the same base name.

The following figure shows the recommended folder structure. 
![folder structure](media/how-to-build-deploy-object-detection-models/data_directory.JPG)
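Because each image is paired with its annotation by base name, it can help to check for unpaired files before creating a dataset. The sketch below assumes the JPEGImages/Annotations subfolder names that are passed to create_from_dir later in this notebook; the helper `find_unpaired_files` is hypothetical and not part of the package.

```python
from pathlib import Path


def find_unpaired_files(data_dir, image_subdir="JPEGImages", annotations_subdir="Annotations"):
    """Return (images without an annotation, annotations without an image), paired by base name."""
    data_dir = Path(data_dir)
    # Base names ("stems") of the JPEG images, e.g. "1" for "1.jpg"
    image_stems = {p.stem for p in (data_dir / image_subdir).iterdir()
                   if p.suffix.lower() in (".jpg", ".jpeg")}
    # Base names of the Pascal VOC xml annotation files
    xml_stems = {p.stem for p in (data_dir / annotations_subdir).iterdir()
                 if p.suffix.lower() == ".xml"}
    return sorted(image_stems - xml_stems), sorted(xml_stems - image_stems)


# Example (path taken from the dataset cell of this notebook):
# missing_xml, missing_jpg = find_unpaired_files("../sample_data/foods/train")
```

Two empty lists mean every image has exactly one matching annotation file.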

## Image Annotation

Annotated object locations are required to train and evaluate an object detector. [LabelImg](https://tzutalin.github.io/labelImg) is an open-source tool that can be used to annotate images. LabelImg writes one xml-file per image in Pascal VOC format, which can be read by this package.
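In a Pascal VOC xml-file, each object is an `<object>` element with a `<name>` and a `<bndbox>` holding the pixel coordinates `xmin`, `ymin`, `xmax`, `ymax`. A minimal sketch that reads these files with Python's standard `xml.etree.ElementTree`, for example to count annotated objects per class. The function names and the simplified inline annotation are illustrative only; real LabelImg files contain additional elements (such as `<pose>` and `<difficult>`) that this sketch ignores.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def parse_voc_annotation(xml_text):
    """Return (label, xmin, ymin, xmax, ymax) tuples for every object in a Pascal VOC annotation."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # Coordinates are pixels; some tools write them as floats, so parse via float()
        coords = [int(float(box.find(tag).text)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        objects.append((obj.find("name").text, *coords))
    return objects


def count_labels(xml_texts):
    """Count annotated objects per class over several annotation files."""
    return Counter(obj[0] for text in xml_texts for obj in parse_voc_annotation(text))


# Simplified, made-up annotation in Pascal VOC layout
example_xml = """<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>ketchup</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>mustard</name>
    <bndbox><xmin>200</xmin><ymin>30</ymin><xmax>260</xmax><ymax>190</ymax></bndbox>
  </object>
</annotation>"""

print(parse_voc_annotation(example_xml))
```

To count classes for a whole folder, read each file's text and pass the list to `count_labels`.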

Here's a screenshot of the tool's UI.
![tool UI](media/how-to-build-deploy-object-detection-models/labeImg.JPG)

## Storage Context
The storage context determines where output files, such as DNN model files, are stored. For more information, see the [StorageContext documentation](https://docs.microsoft.com/en-us/python/api/cvtk.core.context.storagecontext?view=azure-ml-py-latest). Normally, the storage context does not need to be set explicitly. However, to stay within the 25 MB Workbench project size limit, set the outputs directory to a location outside the AML project (here "../../../cvtk_output", as in the following code). Remove the "cvtk_output" directory once it is no longer needed.
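To check how close a folder is to the 25 MB limit, and to delete the output directory when you are done, a standard-library sketch follows. The helper names are hypothetical, and the sketch assumes the limit is measured in units of 1024 × 1024 bytes, which may not match exactly how Workbench counts.

```python
import os
import shutil


def dir_size_mb(path):
    """Total size of all files under `path`, in megabytes (1 MB = 1024 * 1024 bytes)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total / (1024 * 1024)


def remove_output_dir(path):
    """Delete an output directory such as cvtk_output; does nothing if it does not exist."""
    if os.path.isdir(path):
        shutil.rmtree(path)


# Example usage (paths as used in this notebook):
# print("Output size: {:.1f} MB".format(dir_size_mb("../../../cvtk_output")))
# remove_output_dir("../../../cvtk_output")
```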

In [None]:
import warnings
warnings.filterwarnings("ignore")
import os, time
from cvtk.core import Context, ObjectDetectionDataset, TFFasterRCNN
from cvtk.utils import detection_utils
from matplotlib import pyplot as plt

# Disable printing of logging messages
from azuremltkbase.logging import ToolkitLogger
ToolkitLogger.getInstance().setEnabled(False)

# Initialize the context object
out_root_path = "../../../cvtk_output"
Context.create(outputs_path=out_root_path, persistent_path=out_root_path, temp_path=out_root_path)

# Display the images
%matplotlib inline

## Create a dataset

Create a CVTK dataset that consists of a set of images with their respective bounding box annotations. In this example, the refrigerator images provided in the "../sample_data/foods/train" folder are used. Only JPEG images are supported.

In [None]:
image_folder = "../sample_data/foods/train"
data_train = ObjectDetectionDataset.create_from_dir(dataset_name='training_dataset', data_dir=image_folder,
                                                    annotations_dir="Annotations", image_subdirectory='JPEGImages')

# Show some statistics of the training images, and one example of the ground-truth rectangle annotations
data_train.print_info()
_ = data_train.images[2].visualize_bounding_boxes(image_size = (10,10))

## Define a model

In this example, the Faster R-CNN model is used. Various parameters can be provided when defining this model. The meaning of these parameters, as well as of the parameters used for training (see next section), can be found in CVTK's API docs or on the [TensorFlow object detection website](https://github.com/tensorflow/models/tree/master/research/object_detection). More information about the Faster R-CNN model can be found at [this link](https://docs.microsoft.com/en-us/cognitive-toolkit/Object-Detection-using-Faster-R-CNN#technical-details). This model is based on Fast R-CNN, which is described [here](https://docs.microsoft.com/en-us/cognitive-toolkit/Object-Detection-using-Fast-R-CNN#algorithm-details).

In [None]:
score_threshold = 0.0       # Threshold on the detection score, used to discard lower-confidence detections.
max_total_detections = 300  # Maximum number of detections. A high value slows down training but might increase accuracy.
my_detector = TFFasterRCNN(labels=data_train.labels, 
                           score_threshold=score_threshold, 
                           max_total_detections=max_total_detections)
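Conceptually, score_threshold and max_total_detections act on the raw detections like this: detections scoring below the threshold are dropped, and at most max_total_detections of the highest-scoring ones are kept. In the TensorFlow Object Detection API, parameters with these names belong to the non-maximum-suppression post-processing step; this sketch omits the suppression itself and is not the package's implementation (`filter_detections` is a hypothetical name).

```python
def filter_detections(boxes, scores, classes, score_threshold=0.0, max_total_detections=300):
    """Keep detections with score >= score_threshold, at most max_total_detections of them,
    ordered by decreasing score."""
    # Indices of all detections, highest score first
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = [i for i in order if scores[i] >= score_threshold][:max_total_detections]
    return [boxes[i] for i in kept], [scores[i] for i in kept], [classes[i] for i in kept]
```

With the default score_threshold of 0.0 used above and non-negative scores, only the max_total_detections cap removes anything.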

## Train the model

The COCO-trained Faster R-CNN model with ResNet50 is used as the starting point for training. 

In this example, the number of detector training steps is set to 350 for speedy training (about 5 minutes on a GPU). In practice, however, a good rule of thumb is to set the number of steps to 10 or more times the number of images in the training set.

Two key parameters for training are:
- Number of steps to train the model, set by the num_steps argument. Each step trains the model on a minibatch of size one.
- Learning rate(s), set by the initial_learning_rate argument.
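Because each step uses a minibatch of one image, the number of steps divided by the number of training images gives the number of passes (epochs) over the training set. A small worked example of the rule of thumb above; the helper names and the training-set size of 25 images are hypothetical.

```python
def suggested_num_steps(num_train_images, factor=10):
    """Rule of thumb from the text: at least `factor` training steps per training image."""
    return factor * num_train_images


def epochs_covered(num_steps, num_train_images, batch_size=1):
    """Number of passes over the training set that `num_steps` minibatches amount to."""
    return num_steps * batch_size / num_train_images


n_images = 25  # hypothetical training-set size, for illustration only
print(suggested_num_steps(n_images))  # 250
print(epochs_covered(350, n_images))  # 14.0
```

So with 25 training images, the 350 steps used here would already exceed the 10x rule, while 5000 steps would mean 200 passes over the data.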

In [None]:
print("tensorboard --logdir={}".format(my_detector.train_dir))

# To get good results, use a larger value for num_steps, e.g., 5000.
num_steps = 350
learning_rate = 0.001

start_train = time.time()
my_detector.train(dataset=data_train, num_steps=num_steps, 
                  initial_learning_rate=learning_rate)
end_train = time.time()
print("Training time: {:.1f} seconds".format(end_train - start_train))

TensorBoard can be used to visualize the training progress. TensorBoard events are located in the folder specified by the model object's train_dir attribute. To view TensorBoard, follow these steps:
1. Copy the printout that starts with 'tensorboard --logdir' to a command line and run it. 
2. Copy the returned URL from the command line to a web browser to view the TensorBoard. 

The TensorBoard should look like the following screenshot. It takes a few moments for the training folder to be populated, so if TensorBoard does not show up correctly the first time, try repeating steps 1-2.

![tensorboard](media/how-to-build-deploy-object-detection-models/tensorboard.JPG)

## Evaluate the model

The 'evaluate' method is used to evaluate the model. It requires an ObjectDetectionDataset object as input, which can be created with the same function used for the training dataset. The supported metric is Average Precision (AP) as defined for the [PASCAL VOC Challenge](http://host.robots.ox.ac.uk/pascal/VOC/pubs/everingham10.pdf).
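To compute AP, each detection is first classified as a true or false positive by comparing its box with the ground truth using intersection over union (IoU); the metric keys printed in this notebook (AP@0.5IOU) indicate a threshold of 0.5. A minimal sketch of this matching step, loosely following the PASCAL VOC procedure: boxes are assumed to be (top, left, bottom, right) tuples, the order the notebook prints for 'detection_boxes', VOC's special handling of "difficult" objects is omitted, and the function names are hypothetical rather than the package's evaluator.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (top, left, bottom, right)."""
    top, left = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    bottom, right = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0.0, bottom - top) * max(0.0, right - left)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def match_detections(detections, ground_truth, iou_threshold=0.5):
    """Label detections of one class as true or false positives.

    detections: list of (score, box) pairs; ground_truth: list of boxes of the same class.
    Returns (score, is_true_positive) pairs in order of decreasing score.
    """
    matched = [False] * len(ground_truth)
    results = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        overlaps = [iou(box, gt) for gt in ground_truth]
        # Index of the ground-truth box this detection overlaps most (None if there is none)
        best = max(range(len(overlaps)), key=overlaps.__getitem__, default=None)
        is_tp = best is not None and overlaps[best] >= iou_threshold and not matched[best]
        if is_tp:
            matched[best] = True  # each ground-truth box can be matched only once
        results.append((score, is_tp))
    return results
```

Processing detections by decreasing score means that a duplicate detection of an already-matched object counts as a false positive, which is what penalizes redundant boxes in the AP metric.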

In [None]:
image_folder = "../sample_data/foods/test"
data_val = ObjectDetectionDataset.create_from_dir(dataset_name='val_dataset', data_dir=image_folder)
eval_result = my_detector.evaluate(dataset=data_val)

The evaluation results can be printed out in a clean format.

In [None]:
# print out the performance metric values
for label_obj in data_train.labels:
    label = label_obj.name
    key = 'PASCAL/PerformanceByCategory/AP@0.5IOU/' + label
    print('{0: <15}: {1: <3}'.format(label, round(eval_result[key], 2)))
print('{0: <15}: {1: <3}'.format("overall", round(eval_result['PASCAL/Precision/mAP@0.5IOU'], 2))) 
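The printed values are the per-class average precision at an IoU threshold of 0.5 and their mean over classes (mAP). As a sketch of what AP measures, the following computes the 11-point interpolated AP described in the cited PASCAL VOC paper from (score, is_true_positive) pairs, such as those obtained by matching detections to ground truth at IoU 0.5. Later VOC releases switched to interpolating over all recall points, and the package's evaluator may differ in such details, so this sketch is not expected to reproduce the numbers above exactly; the function names are hypothetical.

```python
def average_precision_11pt(scored_flags, num_ground_truth):
    """11-point interpolated AP for one class.

    scored_flags: (score, is_true_positive) pairs for all detections of the class.
    """
    if num_ground_truth == 0:
        return 0.0
    tp = fp = 0
    precisions, recalls = [], []
    # Walk down the ranked list, recording precision and recall after each detection
    for _score, is_tp in sorted(scored_flags, key=lambda f: f[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # At each recall level t in {0, 0.1, ..., 1}, take the best precision at recall >= t
    ap = 0.0
    for t in (i / 10 for i in range(11)):
        ap += max((p for p, r in zip(precisions, recalls) if r >= t), default=0.0)
    return ap / 11


def mean_average_precision(ap_per_class):
    """mAP as the unweighted mean of per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)
```

For example, a class whose only object is found by the second-ranked detection (after one false positive) gets an AP of 0.5, since precision never exceeds 1/2.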

Similarly, you can compute the accuracy of the model on the training set. Doing this helps make sure training converged to a good solution. The accuracy on the training set after successful training is often close to 100%.

Evaluation results, including mAP values and images with predicted bounding boxes, can also be viewed in TensorBoard. Copy the printout from the following code into a command-line window to start TensorBoard. Port 8008 is used here to avoid a conflict with the default port 6006, which was used for viewing the training progress.

In [None]:
print("tensorboard --logdir={} --port=8008".format(my_detector.eval_dir))

## Score an image

Once you're satisfied with the performance of the trained model, use the model object's 'score' function to score new images. The returned scores can be visualized with the 'visualize' function.

In [None]:
image_path = data_val.images[1].storage_path
detections_dict = my_detector.score(image_path)
path_save = out_root_path + "/scored_images/scored_image_preloaded.jpg"
ax = detection_utils.visualize(image_path, detections_dict, image_size=(8, 12))
path_save_dir = os.path.dirname(os.path.abspath(path_save))
os.makedirs(path_save_dir, exist_ok=True)
ax.get_figure().savefig(path_save)

## Save the model

The trained model can be saved to disk, and loaded back into memory, as shown in the following code examples.

In [None]:
save_model_path = out_root_path + "/frozen_model/faster_rcnn.model" # Save the model outside the AML Workbench project folder because of the project size limit
my_detector.save(save_model_path)

## Load the saved model for scoring

To use the saved model, load it into memory with the 'load' function. You only need to load it once.

In [None]:
my_detector_loaded = TFFasterRCNN.load(save_model_path)

After the model is loaded, it can be used to score an image or a list of images. For a single image, a dictionary is returned with keys such as 'detection_boxes', 'detection_scores', and 'num_detections'. If the input is a list of images, a list of dictionaries is returned, one per image.
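Assuming, as in the TensorFlow Object Detection API, that 'detection_boxes' holds (top, left, bottom, right) coordinates normalized to [0, 1], a box can be converted to pixel coordinates by scaling with the image height and width. A minimal sketch; `to_pixel_box` is a hypothetical helper, not part of the package.

```python
def to_pixel_box(normalized_box, image_width, image_height):
    """Convert a normalized (top, left, bottom, right) box to integer pixel coordinates."""
    top, left, bottom, right = normalized_box
    # Vertical coordinates scale with the height, horizontal ones with the width
    return (round(top * image_height), round(left * image_width),
            round(bottom * image_height), round(right * image_width))
```

For example, `to_pixel_box(detections_dict['detection_boxes'][0], width, height)`, where width and height are the image size (with Pillow, `Image.open(image_path).size` returns `(width, height)`).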

In [None]:
detections_dict = my_detector_loaded.score(image_path)

Detected objects with scores above 0.5 can be printed out, including their labels, scores, and coordinates.

In [None]:
look_up = dict((v,k) for k,v in my_detector.class_map.items())
n_obj = 0
for i in range(int(detections_dict['num_detections'])):
    if detections_dict['detection_scores'][i] > 0.5:
        n_obj += 1
        print("Object {}: label={:11}, score={:.2f}, location=(top: {:.2f}, left: {:.2f}, bottom: {:.2f}, right: {:.2f})".format(
            i, look_up[detections_dict['detection_classes'][i]], 
            detections_dict['detection_scores'][i], 
            detections_dict['detection_boxes'][i][0],
            detections_dict['detection_boxes'][i][1], 
            detections_dict['detection_boxes'][i][2],
            detections_dict['detection_boxes'][i][3]))    
        
print("\nFound {} objects in image {}.".format(n_obj, image_path))           

Visualize the scores just like before.

In [None]:
path_save = out_root_path + "/scored_images/scored_image_frozen_graph.jpg"
ax = detection_utils.visualize(image_path, detections_dict, path_save=path_save, image_size=(8, 12))
# ax.get_figure() # use this to extract the returned figure

## Operationalization: deploy and consume

<b>Prerequisites:</b> 
Check the **Prerequisites** section of the deployment notebook to set up your deployment CLI. You only need to set it up once for all your deployments. More deployment-related topics, including IoT Edge deployment, can be found in the deployment notebook.
       
<b>Deployment API:</b>

> **Examples:**
- `deploy_obj = AMLDeployment(deployment_name=deployment_name, associated_DNNModel=dnn_model, aml_env="cluster")`: create a deployment object
- `deploy_obj.deploy()`: deploy the web service
- `deploy_obj.status()`: get the status of the deployment
- `deploy_obj.score_image(local_image_path_or_image_url)`: score an image
- `deploy_obj.delete()`: delete the web service
- `deploy_obj.build_docker_image()`: build a Docker image without creating a web service
- `AMLDeployment.list_deployment()`: list existing deployments
- `AMLDeployment.delete_if_service_exist(deployment_name)`: delete the service with this deployment name, if it exists

<b>Deployment management with portal:</b>

Use the [Azure portal](https://ms.portal.azure.com/) to track and manage your deployments. In the Azure portal, find your Machine Learning Model Management account page (you can search for your model management account name), then go to: the model management account page->Model Management->Services.

In [None]:
# ##### OPTIONAL - Interactive CLI setup helper ###### 
# # Interactive CLI setup helper, including model management account and deployment environment.
# # If you haven't set up your CLI before, or if you want to change your CLI settings, you can use this block to help you interactively.
# # UNCOMMENT THE FOLLOWING LINES IF YOU HAVE NOT CREATED OR SET THE MODEL MANAGEMENT ACCOUNT AND DEPLOYMENT ENVIRONMENT

# from azuremltkbase.deployment import CliSetup
# CliSetup().run()

In [None]:
from cvtk.operationalization import AMLDeployment

# set deployment name
deployment_name = "wsdeployment"

# Create deployment object
# It will use the current deployment environment (you can check it with CLI command "az ml env show").
deploy_obj = AMLDeployment(deployment_name=deployment_name, aml_env="cluster", associated_DNNModel=my_detector, replicas=1)

# Alternatively, you can provide the Azure Machine Learning deployment cluster name (environment name) and resource group name
# to deploy your model to that cluster. To do so, uncomment the following lines to create
# the deployment object.

# azureml_rscgroup = "<resource group>"
# cluster_name = "<cluster name>"
# deploy_obj = AMLDeployment(deployment_name=deployment_name, associated_DNNModel=my_detector,
#                            aml_env="cluster", cluster_name=cluster_name, resource_group=azureml_rscgroup, replicas=1)

# If a service with this deployment name already exists, delete it first.
if deploy_obj.is_existing_service():
    AMLDeployment.delete_if_service_exist(deployment_name)
    
# create the webservice
print("Deploying to Azure cluster...")
deploy_obj.deploy()
print("Deployment DONE")

### Consume the web service

Once you have created the web service, you can score images with it. You have several options:

   - Score directly with the deployment object: deploy_obj.score_image(image_path_or_url)
   - Use the service endpoint URL and service key (None for local deployment): AMLDeployment.score_existing_service_with_image(image_path_or_url, service_endpoint_url, service_key=None)
   - Form the HTTP requests to the web service endpoint yourself (for advanced users).

### Score with existing deployment object
```
deploy_obj.score_image(image_path_or_url)
```

In [None]:
# Score with existing deployment object

# Score local image with file path
print("Score local image with file path")
image_path_or_url = data_train.images[0].storage_path
print("Image source:",image_path_or_url)
serialized_result_in_json = deploy_obj.score_image(image_path_or_url, image_resize_dims=[224,224])
print("serialized_result_in_json:", serialized_result_in_json[:50])

# Score image url and remove image resizing
print("Score image url")
image_path_or_url = "https://cvtkdata.blob.core.windows.net/publicimages/microsoft_logo.jpg"
print("Image source:",image_path_or_url)
serialized_result_in_json = deploy_obj.score_image(image_path_or_url)
print("serialized_result_in_json:", serialized_result_in_json[:50])


In [None]:
# Time image scoring
import timeit

num_images = 3
for img_index, img_obj in enumerate(data_train.images[:num_images]):
    print("Calling API for image {} of {}: {}...".format(img_index + 1, num_images, img_obj.name))
    tic = timeit.default_timer()
    return_json = deploy_obj.score_image(img_obj.storage_path, image_resize_dims=[224,224])
    print("   Time for API call: {:.2f} seconds".format(timeit.default_timer() - tic))
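The loop above prints one latency per call. For load testing, it is usually more informative to collect the timings in a list (for example, `latencies.append(timeit.default_timer() - tic)` inside the loop) and summarize them. A sketch using the standard `statistics` module; `summarize_latencies` is a hypothetical helper, and a real load test would also issue concurrent requests, which this sketch does not do.

```python
import statistics


def summarize_latencies(seconds):
    """Summarize per-call latencies (in seconds) from a list of timings."""
    summary = {
        "calls": len(seconds),
        "mean": statistics.mean(seconds),
        "median": statistics.median(seconds),
        "max": max(seconds),
    }
    # 90th percentile; statistics.quantiles needs at least two data points
    if len(seconds) >= 2:
        summary["p90"] = statistics.quantiles(seconds, n=10, method="inclusive")[-1]
    return summary
```

The median and 90th percentile are less sensitive than the mean to a single slow first call, for example while the service warms up.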

### Score with service endpoint url and service key
```
AMLDeployment.score_existing_service_with_image(image_path_or_url, service_endpoint_url, service_key=None)
```

In [None]:
# Import related classes and functions
from cvtk.operationalization import AMLDeployment

service_endpoint_url = "http://xxx" # please replace with your own service url
service_key = "xxx" # please replace with your own service key

# score image url
image_path_or_url = "https://cvtkdata.blob.core.windows.net/publicimages/microsoft_logo.jpg"
print("Image source:",image_path_or_url)
serialized_result_in_json = AMLDeployment.score_existing_service_with_image(image_path_or_url,service_endpoint_url, service_key = service_key, image_resize_dims=[224,224])
print("serialized_result_in_json:", serialized_result_in_json[:50])

### Score endpoint with http request directly
The following example code forms the HTTP request directly in Python. The same request can be built in other programming languages.

In [None]:
def score_image_with_http(image, service_endpoint_url, service_key=None, parameters=None):
    """Score a local image with an HTTP request

    Args:
        image (str): Image file path
        service_endpoint_url (str): Web service endpoint URL
        service_key (str): Service key. None for local deployment.
        parameters (dict): Additional request parameters. Default is None (no parameters).

    Returns:
        str: serialized result 
    """
    import base64
    import json

    import requests

    if parameters is None:
        parameters = {}
    headers = {'Content-Type': 'application/json'}
    if service_key is not None:
        headers["Authorization"] = 'Bearer ' + service_key

    # Read the raw image bytes
    with open(image, 'rb') as f:
        image_bytes = f.read()

    # Convert the image to a base64 string.
    # The payload field holds str() of the bytes object, i.e. "b'{base64}'"
    encoded = base64.b64encode(image_bytes)
    image_request = {"image_in_base64": "{0}".format(encoded), "parameters": parameters}
    body = json.dumps([image_request])
    r = requests.post(service_endpoint_url, data=body, headers=headers)
    try:
        result = json.loads(r.text)
        json.loads(result[0])  # check that the first entry is itself valid JSON
    except (ValueError, IndexError, KeyError, TypeError):
        raise ValueError("Incorrect output format. Result cannot be parsed: " + r.text)
    return result[0]


### Parse serialized result from webservice
The result from the web service is a JSON string that can be parsed.

In [None]:
image_path_or_url = image_path
print("Image source:",image_path_or_url)
serialized_result_in_json = deploy_obj.score_image(image_path_or_url)
print("serialized_result_in_json:", serialized_result_in_json[:50])

In [None]:
# Parse result from json string
parsed_result = TFFasterRCNN.parse_serialized_result(serialized_result_in_json)
print("Parsed result:", parsed_result)

In [None]:
ax = detection_utils.visualize(image_path, parsed_result)
path_save = "../../../cvtk_output/scored_images/scored_image_web.jpg"
path_save_dir = os.path.dirname(os.path.abspath(path_save))
os.makedirs(path_save_dir, exist_ok=True)
ax.get_figure().savefig(path_save)

## APPENDIX 
### (A) Using a pre-trained model

Sometimes you may want to use a pre-trained model out of the box. For example, since TensorFlow's Faster R-CNN model has been pre-trained on the [COCO dataset](http://mscoco.org), you can initialize a model object and use its 'score' function to score images directly.

#### Initialize the model

In [None]:
my_detector_pt = TFFasterRCNN(labels=None, name="pretrained")
frozen_model_path, label_map_path = my_detector_pt.init_pretrained(use_frozen=True)
print("Frozen model written to path: " + frozen_model_path)
print("Labels written to path: " + label_map_path)

#### Score using the preloaded model

The 'score' function can be used to score images and the returned scores can be visualized.

In [None]:
detections_dict = my_detector_pt.score(image_path)
path_save = "../../../cvtk_output/scored_images/scored_image_pretrained.jpg"
image_size = (8, 12)
ax = detection_utils.visualize(image_path, detections_dict, label_map_path, path_save=path_save,
                              image_size=image_size)
# ax.get_figure()

#### Score using the frozen graph

Alternatively, you can score images with a utility function. To do that, first load the frozen graph. This only needs to be done once, since the loaded graph can be reused. A frozen graph is a single file generated from a graph definition and a set of checkpoints, with all nodes that aren't used for forward inference stripped away (for more information, see [A Tool Developer's Guide to TensorFlow Model Files](https://www.tensorflow.org/extend/tool_developers/)). This makes a frozen graph easier to maintain. However, frozen graphs have been [reported](https://github.com/tensorflow/models/issues/3270) to be slow because they cannot optimize the GPU/CPU assignment.

In [None]:
detection_graph = detection_utils.load_graph(frozen_model_path)

Then score an image and visualize the scores.

In [None]:
detections_dict = detection_utils.score(detection_graph, image_path)
path_save = "../../../cvtk_output/scored_images/scored_image_pretrained_frozen.jpg"
image_size = (8, 12)
ax = detection_utils.visualize(image_path, detections_dict, label_map_path, path_save=path_save,
                              image_size=image_size)
# ax.get_figure()

### (B) Webcam scoring

The model can also be used to read frames from a webcam (or, optionally, from disk) and score them. A pre-trained COCO model is used as the detector here, but any trained detector can be used.

To run the following code successfully using data from disk, it is recommended that you copy the code into a Python script, remove the line "%matplotlib inline" and run it outside of Jupyter notebook. Otherwise, only a single scored image shows up and the Jupyter kernel might stop responding. This problem does not occur if the images come from an actual webcam.

In [None]:
import cv2
from cvtk.core import Context, ObjectDetectionDataset, TFFasterRCNN
from cvtk.utils.detection_utils import FilepathImageProvider, VideoImageProvider
%matplotlib inline

out_root_path = "../../../cvtk_output"
Context.create(outputs_path=out_root_path, persistent_path=out_root_path, temp_path=out_root_path)

# Initialize detector with pre-trained model
my_detector = TFFasterRCNN(labels=None, name="pretrained")
my_detector.init_pretrained()

# Choose image provider
image_provider = VideoImageProvider() # read images from webcam
# image_folder = "../sample_data/foods/test"
# data_val = ObjectDetectionDataset.create_from_dir(dataset_name='val_dataset', data_dir=image_folder)
# image_provider = FilepathImageProvider([image.storage_path for image in data_val.images])  #read images from disk
# image_provider = VideoImageProvider(cv2_video_capture=cv2.VideoCapture("movie.mp4")) #read images from video file

# Run object detection
_ = my_detector.score_multiple(image_provider, visualize=True) 

© 2018 Microsoft. All rights reserved. 