# Practice

Now try to use the model to detect faces in a video. Fill in the gaps in the code blocks below. For more information about the OpenVINO Inference Engine Python API, see the [official documentation](https://docs.openvinotoolkit.org/latest/ie_python_api/annotated.html).

In [None]:
!pip install -r ${INTEL_OPENVINO_DIR}/deployment_tools/open_model_zoo/tools/downloader/requirements.in

In [None]:
!pip install -r requirements.txt

In [None]:
import os
from pathlib import Path

# Directory with the model files for the workshop
WORKSHOP_MODEL_PATH = Path('./model')

# Paths to the Inference Engine model: topology (.xml) and weights (.bin)
# You can use the INT8 model instead
MODEL_PATH_XML = WORKSHOP_MODEL_PATH / 'retinaface-resnet50-pytorch.xml'
MODEL_PATH_BIN = WORKSHOP_MODEL_PATH / 'retinaface-resnet50-pytorch.bin'

DEVICE = 'CPU'

DATA_PATH = Path('./data')
INPUT_VIDEO = DATA_PATH / 'input.mp4'
OUTPUT_VIDEO = DATA_PATH / 'output.MP4'

In [None]:
from IPython.display import HTML


# Show a source video
HTML(f"""<video width="600" height="400" controls><source src="{INPUT_VIDEO}" type="video/mp4"></video>""")

In [None]:
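def prapare_out_video_stream(input_video_stream):
    # Reuse the frame size of the input video for the output video
    width = int(input_video_stream.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(input_video_stream.get(cv2.CAP_PROP_FRAME_HEIGHT))
    # Write the output with the avc1 codec at 20 frames per second
    fourcc = cv2.VideoWriter_fourcc(*'avc1')
    return cv2.VideoWriter(str(OUTPUT_VIDEO), fourcc, 20, (width, height))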

In [None]:
# Import OpenCV to work with video and images
import cv2

# Import the Inference Engine
from openvino.inference_engine import IECore, IENetwork

from RetinaFacePostProcessing.retinaface_post_processing import RetinaFacePostPostprocessor

### Step 1: Create an instance of the OpenVINO Inference Engine `IECore` class
This class represents an Inference Engine entity 
and allows you to manipulate plugins using unified interfaces. 

In [None]:
ie = IECore()

### Step 2: Read the prepared model

**WARNING**
The `IENetwork` class represents a model in the Inference Engine.
It contains information about the network read from the Intermediate Representation (IR)
and allows you to manipulate some model parameters, such as layer affinity and output layers.

To get an instance of the `IENetwork` class, call the `read_network` function of `IECore`.
In this case, it takes two parameters:
 1. path to the .xml file of the model
 2. path to the .bin file of the model
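
The two files of an IR model usually share a base name, so the `.bin` path can be derived from the `.xml` path. The sketch below shows this with `pathlib` only. The helper name `ir_pair` is hypothetical and is not part of the OpenVINO API.

```python
from pathlib import Path


def ir_pair(xml_path):
    """Return (xml, bin) paths, assuming both files share a base name."""
    xml_path = Path(xml_path)
    if xml_path.suffix != '.xml':
        raise ValueError(f'Expected an .xml file, got {xml_path.name}')
    # Swap the extension to get the path to the weights file
    return xml_path, xml_path.with_suffix('.bin')


xml_file, bin_file = ir_pair('model/retinaface-resnet50-pytorch.xml')
print(xml_file.name, bin_file.name)
```

Deriving one path from the other keeps the two files in sync if you later switch to another model, for example the INT8 one.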

In [None]:
# Convert the paths to strings: not all versions of the API accept `Path` objects
net = ie.read_network(str(MODEL_PATH_XML), str(MODEL_PATH_BIN))

### Step 3: Get the name of the input layer of the model

To infer a model, you need to know its input layers.
The `net` object describes the inputs of the network in the `input_info` property,
which is a dictionary: key - name of the input layer, value - description of the input.
In this case, you need the name of the input (`input_name`, a string) and the description of its data (`input_blob`).
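
The `next(iter(...))` idiom used in the next cell returns the first key of a dictionary without building a list. Here is a minimal sketch with a plain dictionary standing in for the network inputs. The layer name `data` and the shape are made up for illustration.

```python
# Stand-in for the inputs of a network: layer name -> input description.
# The name and the shape are illustrative, not taken from a real model.
fake_inputs = {'data': {'shape': [1, 3, 640, 640]}}

# Iterating over a dictionary yields its keys, so this picks the first key
first_input_name = next(iter(fake_inputs))

print(first_input_name)               # data
print(fake_inputs[first_input_name])  # {'shape': [1, 3, 640, 640]}
```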

In [None]:
input_name = next(iter(net.input_info))
input_blob = net.input_info[input_name].input_data

print(f'Input layer of the network is {input_name}')

### Step 4: Get shape (dimensions) of the input layer of the network

* n - batch size (number of images processed at once)
* c - number of input image channels (usually 3 - R, G and B)
* h - height
* w - width
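
OpenCV returns frames in the HWC layout (height, width, channels), while the network expects NCHW. The sketch below shows the conversion on a dummy NumPy array. The 4x6 frame size is arbitrary.

```python
import numpy as np

# Dummy frame in the HWC layout: height 4, width 6, 3 color channels
frame = np.zeros((4, 6, 3), dtype=np.uint8)

# Move the channel axis to the front: HWC -> CHW
chw = frame.transpose((2, 0, 1))

# Add the batch dimension: CHW -> NCHW with a batch of one image
nchw = chw[np.newaxis, ...]

n, c, h, w = nchw.shape
print(n, c, h, w)  # 1 3 4 6
```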

In [None]:
n, c, input_layer_h, input_layer_w = input_blob.shape

print(f'Input shape of the network: [{n}, {c}, {input_layer_h}, {input_layer_w}]')

### Step 5: Get the name of the output layer of the model

In [None]:
out_blob = next(iter(net.outputs))

### Step 6: Load the network to a device

Use the instance of `IECore`.
The class `IECore` has a special function called `load_network`, which loads a network to a device.
This function prepares the network for the first inference on the device 
and returns an instance of the network prepared for an inference (execution). 
This function has many parameters, but in this case, you need to know only about two of them:
* `network` - instance of `IENetwork`
* `device_name` - string, contains a device name to infer a model on: CPU, GPU and so on.
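
If the requested device is not present on the machine, loading typically fails, so a common precaution is to fall back to the CPU. The sketch below works on a plain list of device names. In a real session, the list could come from the `available_devices` property of `IECore`. The helper `pick_device` is hypothetical.

```python
def pick_device(preferred, available, fallback='CPU'):
    """Return `preferred` if it is available, otherwise `fallback`."""
    return preferred if preferred in available else fallback


# Example list of devices; a real one could be read from ie.available_devices
devices = ['CPU']

print(pick_device('GPU', devices))  # CPU
print(pick_device('CPU', devices))  # CPU
```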

In [None]:
network_loaded_to_device = ie.load_network(net, DEVICE)

### Step 7: Open the input video

In [None]:
input_video_stream = cv2.VideoCapture(str(INPUT_VIDEO))

### Step 8: Create an output video stream

In [None]:
out = prapare_out_video_stream(input_video_stream)

### Step 9: Function for processing inference results

The postprocessor returns one entry per detected face.
Data layout of an entry (the `obj` variable), as used in the function below:
 * elements 0 and 1 - coordinates of the upper-left corner of the face box
 * elements 2 and 3 - coordinates of the bottom-right corner of the face box
 * element 4 - the confidence for the detected face
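
The drawing function below indexes each detection as `[xmin, ymin, xmax, ymax, confidence]`. Assuming that layout, the following sketch filters detections by a confidence threshold and converts the coordinates to integers. The helper `filter_detections` and the threshold value are illustrative and are not part of the workshop code.

```python
def filter_detections(detections, threshold=0.5):
    """Keep detections whose confidence exceeds `threshold`.

    Each detection is assumed to start with
    [xmin, ymin, xmax, ymax, confidence].
    Returns a list of ((xmin, ymin, xmax, ymax), confidence) tuples.
    """
    kept = []
    for detection in detections:
        xmin, ymin, xmax, ymax, confidence = detection[:5]
        if confidence > threshold:
            box = (int(xmin), int(ymin), int(xmax), int(ymax))
            kept.append((box, confidence))
    return kept


sample = [
    [10.2, 20.7, 110.0, 140.9, 0.98],  # confident detection
    [5.0, 5.0, 15.0, 15.0, 0.12],      # likely a false positive
]
print(filter_detections(sample))  # [((10, 20, 110, 140), 0.98)]
```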

In [None]:
def draw_boxes_in_frame(frame, obj):
    # Get the coordinates of a detected face
    xmin = int(obj[0])
    ymin = int(obj[1])
    xmax = int(obj[2])
    ymax = int(obj[3])

    # Get the confidence for the detected face as a percentage
    confidence = round(obj[4] * 100, 1)

    # Draw a green bounding box
    color = (0, 255, 0)
    cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color, 2)

    # Create the label of the detected face
    text = f'{confidence}%'

    # Put the label on the frame, just above the box
    cv2.putText(frame, text, (xmin, ymin - 7), cv2.FONT_HERSHEY_COMPLEX, 2, color, 2)

### Step 10: Loop over frames in the input video

In [None]:
frame_w = int(input_video_stream.get(cv2.CAP_PROP_FRAME_WIDTH))   # frame width
frame_h = int(input_video_stream.get(cv2.CAP_PROP_FRAME_HEIGHT))  # frame height

postprocessor = RetinaFacePostPostprocessor(origin_image_size=[frame_w, frame_h], input_image_size=[input_layer_h, input_layer_w])
    
while input_video_stream.isOpened():
    # Read the next frame from the input video
    ret, frame = input_video_stream.read()
    # Check if the video is over
    if not ret:
        # Exit from the loop if the video is over
        break 
    
    # Resize the frame to the network input 
    in_frame = cv2.resize(frame, (input_layer_w, input_layer_h))
    
    # Change the data layout from HWC to CHW
    in_frame = in_frame.transpose((2, 0, 1))  
    
    # Reshape the frame to the network input 
    in_frame = in_frame.reshape((n, c, input_layer_h, input_layer_w))
    
    # To infer the frame, prepare the data.
    # This must be a dictionary:
    #   key - name of the input layer (you got it earlier)
    #   value - input data (the prepared frame)
    feed_dict = {
        input_name: in_frame
    }
    
    # Everything is ready for the main thing - inference!
    # You have read the network, loaded it to the device, and prepared the input data.

    # Step 11:
    # To start inference, call the `infer` function of the `network_loaded_to_device` variable
    # and pass the input data (a dictionary).
    inference_result = network_loaded_to_device.infer(feed_dict)

    # Great! The `inference_result` variable contains the output data of the network.
    # `inference_result` is a dictionary,
    #  where key is the name of an output layer,
    #        value is data from the blob.
    
    # Step 12: Post-process the output, then iterate over all detected faces
    detected_faces = postprocessor.process_output(inference_result)
    
    for detected_face in detected_faces:
        draw_boxes_in_frame(frame, detected_face)
    
    # Write the resulting frame to the output stream
    out.write(frame)
    
input_video_stream.release()
# Save the resulting video
out.release()

In [None]:
from IPython.display import HTML

# Show the resulting video
HTML(f"""<video width="600" height="400" controls><source src="{OUTPUT_VIDEO}" type="video/mp4"></video>""")

Do you see boxes in the video?
If yes, you did everything right!
**Good work!**

## Section 16: Practice (Part 2)

What is the next step? Neural networks are often combined into pipelines, where the results of the first network are used as the input for the next one.
Let's try to build a pipeline from two networks: the first finds a person in the video, and the second recognizes the emotions of this person.

We have already run the first network and found the person in the video.
The next step is to find a network for emotion recognition.
There is a good neural network in the [Open Model Zoo](https://docs.openvinotoolkit.org/2019_R1/_docs_Pre_Trained_Models.html) - the [emotions-recognition-retail-0003 network](https://docs.openvinotoolkit.org/2019_R1/_emotions_recognition_retail_0003_description_emotions_recognition_retail_0003.html).
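
The idea of such a pipeline can be sketched without any real networks: the first stage returns boxes, each box is cropped from the frame, and the crop is passed to the second stage. Both stage functions below are stand-ins invented for illustration, and the frame is a plain 2D list instead of an image.

```python
def run_pipeline(frame, detect, classify):
    """Run `detect` on the frame, then `classify` on every detected crop.

    `frame` is a 2D list of pixels (rows of values); `detect` returns boxes
    as (xmin, ymin, xmax, ymax); `classify` receives the cropped region.
    """
    results = []
    for xmin, ymin, xmax, ymax in detect(frame):
        # Crop the region of interest: rows first, then columns
        crop = [row[xmin:xmax] for row in frame[ymin:ymax]]
        results.append(((xmin, ymin, xmax, ymax), classify(crop)))
    return results


# Stand-in stages: a fixed box and a classifier based on mean brightness
def fake_detector(frame):
    return [(1, 1, 3, 3)]


def fake_classifier(crop):
    pixels = [p for row in crop for p in row]
    return 'bright' if sum(pixels) / len(pixels) > 127 else 'dark'


frame = [[0, 0, 0, 0], [0, 255, 255, 0], [0, 255, 255, 0], [0, 0, 0, 0]]
print(run_pipeline(frame, fake_detector, fake_classifier))
# [((1, 1, 3, 3), 'bright')]
```

Passing the stages as functions keeps the pipeline logic independent of the networks, so either stage can be replaced without touching the loop.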

### Step 1: Download emotions-recognition-retail-0003 network
Run the Model Downloader with the needed arguments to download the emotions-recognition-retail-0003 network:

In [None]:
!python3 ${INTEL_OPENVINO_DIR}/deployment_tools/open_model_zoo/tools/downloader/downloader.py --name emotions-recognition-retail-0003 --precision FP16

This model is already in the OpenVINO format, so you do not need to convert it.

After downloading the model you can use it:

### Step 2: Read the prepared model
The `IENetwork` class represents a model in the Inference Engine. It contains information about the network read from the Intermediate Representation and allows you to manipulate some model parameters, such as layer affinity and output layers.

To get an instance of the `IENetwork` class, call the `read_network` function of `IECore`. In this case, it takes two parameters:

 1. path to the .xml file of the model
 2. path to the .bin file of the model

In [None]:
# By default, the Model Downloader saves models relative to the current working directory
EMOTION_MODEL_PATH = Path('intel/emotions-recognition-retail-0003/FP16')
emotion_recognition_network = ie.read_network(
    str(EMOTION_MODEL_PATH / 'emotions-recognition-retail-0003.xml'),
    str(EMOTION_MODEL_PATH / 'emotions-recognition-retail-0003.bin'))

### Step 3: Load the network to a device

Use the instance of `IECore`.
The class `IECore` has a special function called `load_network`, which loads a network to a device.
This function prepares the network for the first inference on the device 
and returns an instance of the network prepared for an inference (execution). 
This function has many parameters, but in this case, you need to know only about two of them:
* `network` - instance of `IENetwork`
* `device_name` - string, contains a device name to infer a model on: CPU, GPU and so on.

In [None]:
emotion_recognition_network_loaded_on_device = ie.load_network(emotion_recognition_network, DEVICE)

### Step 4: Open the input video

In [None]:
input_video_stream = cv2.VideoCapture(str(INPUT_VIDEO))

### Step 5: Create an output video stream

In [None]:
out = prapare_out_video_stream(input_video_stream)

### Step 6: Prepare a frame and run inference

In [None]:
def emotion_infer(frame):
    # Find the input of the model
    em_input_layer = next(iter(emotion_recognition_network.input_info))
    em_input_blob = emotion_recognition_network.input_info[em_input_layer].input_data

    # Get the input shape of the network
    n, c, h, w = em_input_blob.shape

    # Resize the frame to the network input
    em_in_frame = cv2.resize(frame, (w, h))

    # Change the data layout from HWC to CHW
    em_in_frame = em_in_frame.transpose((2, 0, 1))

    # Reshape the frame to the network input
    em_in_frame = em_in_frame.reshape((n, c, h, w))

    # Find the output of the emotion recognition model (not of the first network)
    em_output_layer = next(iter(emotion_recognition_network.outputs))

    # Run the inference as you did earlier
    em_results = emotion_recognition_network_loaded_on_device.infer({
        em_input_layer: em_in_frame
    })

    # To understand the result of inference of this model, check the documentation:
    # https://docs.openvinotoolkit.org/latest/_models_intel_emotions_recognition_retail_0003_description_emotions_recognition_retail_0003.html
    return em_results[em_output_layer]
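
The network returns one probability per emotion. Assuming an output blob of shape `[1, 5, 1, 1]` and the label order used later in this notebook, the most probable emotion can be picked as in the sketch below. The probabilities are made up.

```python
import numpy as np

# Label order as used later in this notebook
EMOTIONS = ['neutral', 'happy', 'sad', 'surprise', 'angry']

# Made-up output blob with an assumed shape of [1, 5, 1, 1]
probs = np.array([0.05, 0.80, 0.05, 0.07, 0.03]).reshape(1, 5, 1, 1)

# Flatten the blob and pick the index of the highest probability
best = int(np.argmax(probs.ravel()))
print(EMOTIONS[best])  # happy
```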

### Step 16: Draw boxes and emotions in a frame

In [None]:
def draw_boxes_and_emotion_in_frame(obj, frame_w, frame_h, labels_map):
    threshold = 0.5
        
    # Step 13: Get the confidence for a discovered object
    confidence = 
        
    # Step 14: Draw bounding boxes
    # Draw a bounding box only for objects whose confidence is greater than the specified threshold
    if confidence > threshold:
        # Get coordinates of a discovered object
        xmin =
        ymin =
            
        xmax =
        ymax =
            
        # and scale it to the original size of the frame
        scaled_xmin = int( xmin * frame_w )
        scaled_ymin = int( ymin * frame_h )
            
        scaled_xmax = int( xmax * frame_w )
        scaled_ymax = int( ymax * frame_h )

        # Get class ID of a discovered object
        class_id =

        # If class is person, run inference of emotions-recognition-retail-0003
        if class_id == 1:
            person_image = frame[scaled_ymin:scaled_ymax, scaled_xmin:scaled_xmax]
            # Run emotions-recognition-retail-0003
            emotions_prob = 
            
            emotions = ['neutral', 'happy', 'sad', 'surprise', 'angry']
            emotion = emotions[np.argmax(emotions_prob)]
            
        # Get confidence for a discovered object
        confidence = round(confidence * 100, 1)

        # Draw a box and a label
        color = (min(class_id * 12.5, 255), min(class_id * 7, 255), min(class_id * 5, 255))
        cv2.rectangle(frame, (scaled_xmin, scaled_ymin), (scaled_xmax, scaled_ymax), color, 2)

        # Get the label of a class
        label = labels_map[class_id]

        # Create the title of an object
        text = '{}: {}% '.format(label, confidence)
        text = '{} {}'.format(text, emotion) if class_id == 1 else text
        # Put the title to a frame
        cv2.putText(frame, text, (scaled_xmin, scaled_ymin - 7), cv2.FONT_HERSHEY_COMPLEX, 2, color, 2)
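
The function above assumes that the box coordinates are normalized to the range [0, 1] and scales them by the frame size. Below is a small sketch of that conversion. Clamping is added as an extra precaution so that a crop never leaves the frame. The helper `to_pixels` is hypothetical.

```python
def to_pixels(box, frame_w, frame_h):
    """Convert a normalized (xmin, ymin, xmax, ymax) box to pixel coordinates."""
    xmin, ymin, xmax, ymax = box

    def clamp(value, upper):
        # Keep the coordinate inside [0, upper]
        return max(0, min(int(value), upper))

    return (clamp(xmin * frame_w, frame_w), clamp(ymin * frame_h, frame_h),
            clamp(xmax * frame_w, frame_w), clamp(ymax * frame_h, frame_h))


print(to_pixels((0.25, 0.5, 0.75, 1.2), 640, 480))  # (160, 240, 480, 480)
```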

### Step 17: Loop over frames in the input video

In [None]:
import numpy as np

while input_video_stream.isOpened():
    
    # Read the next frame from the input video
    ret, frame = input_video_stream.read()
    # Check if the video is over
    if not ret:
        # Exit from the loop if the video is over
        break 
    # Get height and width of the frame
    frame_h, frame_w = frame.shape[:2]
    
    # Resize the frame to the network input
    in_frame = cv2.resize(frame, (input_layer_w, input_layer_h))

    # Change the data layout from HWC to CHW
    in_frame = in_frame.transpose((2, 0, 1))

    # Reshape the frame to the network input
    in_frame = in_frame.reshape((n, c, input_layer_h, input_layer_w))
    
    # To infer the frame, prepare the data.
    # This must be a dictionary:
    #   key - name of the input layer (you got it earlier)
    #   value - input data (the prepared frame)
    feed_dict = {
    }
    
    # Everything is ready for the main thing - inference!
    # You have read the network, loaded it to the device, and prepared the input data.

    # Step 11:
    # To start inference, call the `infer` function of the `network_loaded_to_device` variable
    # and pass the input data (a dictionary).
    inference_result = 
    
    # Great! The `inference_result` variable contains the output data of the network.
    # `inference_result` is a dictionary,
    #  where key is the name of an output layer,
    #        value is data from the blob.
    
    # Step 12: Then iterate over all discovered objects
    for obj in inference_result[out_blob][0][0]:
        draw_boxes_and_emotion_in_frame()  # call draw_boxes_and_emotion_in_frame with the needed arguments

    # Write the resulting frame to the output stream
    out.write(frame)

input_video_stream.release()
# Save the resulting video
out.release()

Now the person (Artyom) in the resulting video is detected together with an emotion:

In [None]:
from IPython.display import HTML

# Show the resulting video
HTML("""<video width="600" height="400" controls><source src="{}" type="video/mp4"></video>""".format(OUTPUT_VIDEO))

![](pictures/thankyou.PNG)