# Face Hiding Workshop Practice

Now try to use the model to detect faces in a video. Fill in the gaps in the code blocks below. For more information about the OpenVINO Inference Engine Python API, see the [official documentation](https://docs.openvinotoolkit.org/latest/ie_python_api/annotated.html).

## Step 0. Preparation.

First, install the requirements for this workshop.
We prepared a dedicated package for processing the inference results of RetinaFace. In addition, we need packages such as NumPy to work with tensors and IPython to show a video in the notebook.

In [4]:
!pip3 install -r requirements.txt

Defaulting to user installation because normal site-packages is not writeable
Collecting IPython==7.5.0
  Downloading ipython-7.5.0-py3-none-any.whl (770 kB)
Collecting matplotlib==3.3.4
  Using cached matplotlib-3.3.4-cp36-cp36m-manylinux1_x86_64.whl (11.5 MB)
Collecting requests>=2.25.1
  Downloading requests-2.26.0-py2.py3-none-any.whl (62 kB)
Collecting prompt-toolkit<2.1.0,>=2.0.0
  Downloading prompt_toolkit-2.0.10-py3-none-any.whl (340 kB)
Collecting charset-normalizer~=2.0.0
  Downloading charset_normalizer-2.0.4-py3-none-any.whl (36 kB)
Installing collected packages: prompt-toolkit, charset-normalizer, requests, matplotlib, IPython
  Attempting uninstall: requests
    Found existing installation: requests 2.24.0
    Uninstalling requests-2.24.0:
      Successfu

The next preparation step is to set some constants: the paths to the input video, the resulting video, and the model.

In [62]:
from pathlib import Path

# Directory with the models for the workshop
WORKSHOP_MODEL_PATH = Path('./data') / 'models'

# Paths to the Inference Engine model files (.xml - topology, .bin - weights).
# You can use the INT8 model instead.
FACE_DETECTION_MODEL_PATH_XML = WORKSHOP_MODEL_PATH / 'face-detection-adas-0001.xml'
FACE_DETECTION_MODEL_PATH_BIN = WORKSHOP_MODEL_PATH / 'face-detection-adas-0001.bin'

DEVICE = 'CPU'

DATA_PATH = Path('./data')
INPUT_VIDEO = str(DATA_PATH / 'input.mp4')
OUTPUT_VIDEO = str(DATA_PATH / 'output.MP4')
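
Before going further, it can help to confirm that the files behind these constants exist, because a wrong path otherwise only shows up later as a less obvious error. The sketch below is a minimal illustration using only `pathlib`; the helper `find_missing` and the temporary file names are hypothetical and are not part of the workshop code.

```python
from pathlib import Path
import tempfile


def find_missing(paths):
    """Return the paths that do not point to an existing file."""
    return [Path(p) for p in paths if not Path(p).is_file()]


# Demonstration in a temporary directory so that the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / 'input.mp4'
    present.write_bytes(b'')            # empty placeholder file
    absent = Path(tmp) / 'model.xml'    # intentionally not created
    missing = find_missing([present, absent])
    missing_names = [p.name for p in missing]

print(missing_names)
```

In the notebook, the same helper could be called with `FACE_DETECTION_MODEL_PATH_XML`, `FACE_DETECTION_MODEL_PATH_BIN` and `INPUT_VIDEO`.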

Now let's show the input video.

In [63]:
from IPython.display import HTML

# Show a source video
HTML(f"""<video width="600" height="400" controls><source src="{INPUT_VIDEO}" type="video/mp4"></video>""")

In [64]:
# Import OpenCV to work with videos and images
import cv2

# Import the Inference Engine
from openvino.inference_engine import IECore, IENetwork

import numpy as np

Our first function creates the output video writer.

In [65]:
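def prapare_out_video_stream(input_video_stream: cv2.VideoCapture, output_video_file_path: str) -> cv2.VideoWriter:
    # Use the frame size of the input video for the output video
    width = int(input_video_stream.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(input_video_stream.get(cv2.CAP_PROP_FRAME_HEIGHT))
    # Write the video with the 'avc1' (H.264) codec at 20 frames per second
    video_writer = cv2.VideoWriter(output_video_file_path, cv2.VideoWriter_fourcc(*'avc1'), 20, (width, height))
    return video_writer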

### Step 1: Create an instance of the OpenVINO Inference Engine `IECore` class
This class represents an Inference Engine entity 
and allows you to manipulate plugins using unified interfaces. 

In [66]:
ie_core = IECore()

In [16]:
!pwd

/home/u44598/face-hiding-workshop/notebook


### Step 2: Read the prepared model

You need to create an instance of the `IENetwork` class.
To do this, call the `read_network` function of `IECore` with two parameters:
 1. path to the .xml file of the model
 2. path to the .bin file of the model

In [68]:
face_detection_network = ie_core.read_network(FACE_DETECTION_MODEL_PATH_XML, FACE_DETECTION_MODEL_PATH_BIN)

### Step 3: Get the name of the input layer of the model

To infer a model, you need to know its input layers.
The object `face_detection_network` contains information about the inputs of the network in the `input_info` property,
which is a dictionary: key - name of the input layer, value - representation of the network input.
In this case, you need to get the name and the blob of the input. `face_detection_input_name` should be a string, `face_detection_input_blob` should be a `DataPtr`.

In [69]:
face_detection_input_name = next(iter(face_detection_network.input_info))
face_detection_input_blob = face_detection_network.input_info[face_detection_input_name].input_data

print(f'Input layer of the face-detection-adas-0001 is {face_detection_input_name}')

Input layer of the face-detection-adas-0001 is data
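
The `next(iter(...))` idiom used above works because iterating over a dictionary yields its keys. A minimal sketch with a plain dictionary standing in for `input_info` follows; the shape tuple is only a placeholder for the real Inference Engine object stored as the value.

```python
# Stand-in for `network.input_info`: a mapping from input name to input description.
# The value here is a plain shape tuple; the real values are Inference Engine objects.
input_info = {'data': (1, 3, 384, 672)}

# Iterating over a dict yields its keys, so this takes the first (and only) input name.
input_name = next(iter(input_info))
input_shape = input_info[input_name]

print(input_name, input_shape)
```

This is sufficient for a network with a single input. For a network with several inputs, the names would have to be selected explicitly.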


### Step 3 (continued): Get the shape (dimensions) of the input layer of the network

The input shape has four dimensions:

* n - batch size (number of images processed at once)
* c - number of input image channels (usually 3 - R, G and B)
* h - height
* w - width

In [70]:
face_detection_batch, face_detection_channels, face_detection_input_layer_h, face_detection_input_layer_w = face_detection_input_blob.shape

print(f'Input shape of the face-detection-adas-0001 is [{face_detection_batch}, {face_detection_channels}, {face_detection_input_layer_h}, {face_detection_input_layer_w}]')

Input shape of the face-detection-adas-0001 is [1, 3, 384, 672]


In [71]:
face_detection_blob = next(iter(face_detection_network.outputs))

### Step 4: Load the network to a device

Use the instance of `IECore`.
The class `IECore` has a special function called `load_network`, which loads a network to a device.
This function prepares the network for the first inference on the device 
and returns an instance of the network prepared for an inference (execution). 
This function has many parameters, but in this case, you need to know only about two of them:
* `network` - instance of `IENetwork`
* `device_name` - string, contains a device name to infer a model on: CPU, GPU and so on.

In [72]:
face_detection_loaded_to_device = ie_core.load_network(face_detection_network, DEVICE)

### Step 5: Open the input video

In [73]:
input_video_stream = cv2.VideoCapture(INPUT_VIDEO)

### Step 6: PreProcessing 

In [74]:
def face_detection_pre_processing(input_frame: np.ndarray, batch: int, channels: int, input_layer_height: int, input_layer_width: int) -> np.ndarray:
    # Resize the frame to the network input 
    resized_frame = cv2.resize(input_frame, (input_layer_width, input_layer_height))
    
    # Change the data layout from HWC to CHW
    transposed_frame = resized_frame.transpose((2, 0, 1))  
    
    # Reshape the frame to the network input 
    reshaped_frame = transposed_frame.reshape((batch, channels, input_layer_height, input_layer_width))
    
    return reshaped_frame
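
To see what the layout change does, here is a minimal NumPy-only sketch on a tiny synthetic image. It assumes nothing about the model; the 2x3 image size is chosen only to keep the arrays readable.

```python
import numpy as np

# Synthetic 2x3 "image" with 3 channels in HWC layout (height, width, channels).
hwc = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# Move the channel axis to the front: HWC -> CHW.
chw = hwc.transpose((2, 0, 1))

# Add a leading batch axis: CHW -> NCHW with N = 1.
nchw = chw[np.newaxis, ...]

print(nchw.shape)  # (1, 3, 2, 3)

# The pixel at row 1, column 2, channel 0 keeps its value; only the index order changes.
same_value = hwc[1, 2, 0] == nchw[0, 0, 1, 2]
print(same_value)
```

Note that `transpose` reorders axes, while `reshape` alone would keep the memory order and scramble the channels. That is why the function above transposes before reshaping.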

### Step 7: Inference

In [75]:
def face_detection_inference(input_frame: np.ndarray) -> np.ndarray:
    # Input data is a dictionary: key - name of the input layer, value - input frame
    feed_dict = {
        face_detection_input_name: input_frame
    }
    
    # Everything is ready for the main thing - inference!
    # You have read the network, loaded it to the device, and prepared the input data.
    
    # To start an inference, call the `infer` function of the `face_detection_loaded_to_device` variable
    # and pass the input data (a dictionary) to it.
    inference_result = face_detection_loaded_to_device.infer(feed_dict)
    
    # Great! The `inference_result` variable contains the output data of the network.
    # `inference_result` is a dictionary,
    #  where key is the name of the output layer,
    #        value is data from the blob.
    
    return inference_result[face_detection_blob]
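
According to the model description, the output of face-detection-adas-0001 has the shape `[1, 1, N, 7]`, where each of the `N` rows is `[image_id, label, conf, x_min, y_min, x_max, y_max]` with coordinates normalized to the range from 0 to 1. The sketch below filters such rows by confidence on a synthetic array; the detection values are invented for illustration and the 0.5 threshold is the one used later in this notebook.

```python
import numpy as np

# Synthetic output with three candidate detections (values invented for illustration).
detections = np.array([[[
    [0, 1, 0.98, 0.10, 0.20, 0.30, 0.60],
    [0, 1, 0.40, 0.50, 0.50, 0.70, 0.90],
    [0, 1, 0.75, 0.60, 0.10, 0.80, 0.40],
]]], dtype=np.float32)

# Drop the two leading axes to get one row per detection: shape (N, 7).
rows = detections[0][0]

# Keep only the rows whose confidence (column 2) is at least 0.5.
confident = rows[rows[:, 2] >= 0.5]

print(len(confident))
```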

### Step 8: Prepare for post-processing

In [76]:
# Create the output video stream
output_video_stream = prapare_out_video_stream(input_video_stream, OUTPUT_VIDEO)

# Get the width and height of the input frames
input_frame_width = int(input_video_stream.get(3))   # float `width`
input_frame_height = int(input_video_stream.get(4))  # float `height`

### Step 9: Function for processing inference results

In [77]:
def add_face_detection_inference_result_in_frame(original_frame: np.ndarray, detected_face: np.ndarray):
    # Each detection is [image_id, label, confidence, x_min, y_min, x_max, y_max]
    confidence = detected_face[2]

    # Process only detections with a confidence of at least 0.5
    if confidence < 0.5:
        return

    frame_h, frame_w = original_frame.shape[:2]

    # Convert normalized coordinates to pixels and keep them inside the frame
    xmin = max(0, int(detected_face[3] * frame_w))
    ymin = max(0, int(detected_face[4] * frame_h))
    xmax = min(frame_w, int(detected_face[5] * frame_w))
    ymax = min(frame_h, int(detected_face[6] * frame_h))

    # Skip degenerate boxes: blurring an empty region can fail
    if xmax <= xmin or ymax <= ymin:
        return

    # Hide the face: blur the region and write it back to the frame
    face = original_frame[ymin:ymax, xmin:xmax]
    blurred_face = cv2.GaussianBlur(face, (23, 23), 50)
    original_frame[ymin:ymax, xmin:xmax] = blurred_face

    # Label the face with the confidence in percent
    confidence = round(confidence * 100, 1)
    color = (0, 255, 0)
    text = f'{confidence}%'
    cv2.putText(original_frame, text, (xmin, ymin - 7), cv2.FONT_HERSHEY_COMPLEX, 2, color, 2)

### Step 10: Loop over frames in the input video

In [78]:
while input_video_stream.isOpened():
    # Read the next frame from the input video
    ret, frame = input_video_stream.read()
    # Check if the video is over
    if not ret:
        # Exit from the loop if the video is over
        break

    # Prepare the frame for inference
    in_frame = face_detection_pre_processing(frame, face_detection_batch, face_detection_channels, face_detection_input_layer_h, face_detection_input_layer_w)

    # Run inference on the prepared frame
    inference_result = face_detection_inference(in_frame)

    # Process every detection: blur the face and label it
    for detected_face in inference_result[0][0]:
        add_face_detection_inference_result_in_frame(frame, detected_face)

    # Write the resulting frame to the output stream
    output_video_stream.write(frame)

input_video_stream.release()
# Save the resulting video
output_video_stream.release()
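
The read-until-`ret`-is-false pattern in this loop can also be wrapped in a generator, which keeps the loop body focused on processing. The sketch below is one possible way to do that, not the workshop's method. `FakeStream` is a hypothetical stand-in for `cv2.VideoCapture` so that the example runs without a video file.

```python
def iter_frames(stream):
    """Yield frames until `stream.read()` reports that no frame is left."""
    while True:
        ret, frame = stream.read()
        if not ret:
            break
        yield frame


class FakeStream:
    """Hypothetical stand-in for cv2.VideoCapture that serves a fixed list of frames."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        # Mimic the (ret, frame) pair returned by VideoCapture.read()
        if self._frames:
            return True, self._frames.pop(0)
        return False, None


frames = list(iter_frames(FakeStream(['frame0', 'frame1', 'frame2'])))
print(frames)
```

With a real capture object the call would look like `for frame in iter_frames(input_video_stream): ...`.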

In [79]:
from IPython.display import HTML

# Show the resulting video
HTML(f"""<video width="600" height="400" controls><source src="{OUTPUT_VIDEO}" type="video/mp4"></video>""")

Do you see blurred faces with confidence labels in the video?
If so, you did everything right!
**Good Work!** 

## Section 16: Practice (Part 2)

What is the next step? Neural networks are often combined into pipelines, where the results of the first network are used as the input for the next one.
Let's build a pipeline from two networks: the first finds a person in the video, and the second recognizes the emotions of this person.

We have already run the first network and found the person in the video.
The next step is to find a network for emotion recognition.
There is a good neural network in the [Open Model Zoo](https://docs.openvinotoolkit.org/2019_R1/_docs_Pre_Trained_Models.html) - the [emotions-recognition-retail-0003 network](https://docs.openvinotoolkit.org/2019_R1/_emotions_recognition_retail_0003_description_emotions_recognition_retail_0003.html).
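
The pipeline idea can be sketched without any neural network: one function proposes regions, and a second function is applied to each cropped region. In the sketch below, `fake_detect` and `fake_classify` are hypothetical stand-ins for the two networks, and the frame is a small nested list rather than a real image.

```python
def run_pipeline(frame, detect, classify):
    """Run `detect` on the frame, then `classify` on every region it returns."""
    results = []
    for box in detect(frame):
        xmin, ymin, xmax, ymax = box
        # Crop the region from the frame (rows first, then columns)
        crop = [row[xmin:xmax] for row in frame[ymin:ymax]]
        results.append((box, classify(crop)))
    return results


# Hypothetical stand-ins for the two networks.
def fake_detect(frame):
    # Always report one box covering the top-left 2x2 corner
    return [(0, 0, 2, 2)]


def fake_classify(crop):
    # Label the crop by whether it contains any non-zero value
    return 'happy' if sum(map(sum, crop)) > 0 else 'neutral'


frame = [[1, 1, 0],
         [1, 1, 0],
         [0, 0, 0]]
pipeline_result = run_pipeline(frame, fake_detect, fake_classify)
print(pipeline_result)
```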

### Step 1: Download emotions-recognition-retail-0003 network
Run the Model Downloader with the required arguments to download the emotions-recognition-retail-0003 network:

In [34]:
!python3 /opt/intel/openvino_2021/deployment_tools/open_model_zoo/tools/downloader/downloader.py --name emotions-recognition-retail-0003 --precision FP16 --output_dir data/model
!mv data/model/intel/emotions-recognition-retail-0003/FP16/emotions-recognition-retail-0003.* data/models/

################|| Downloading emotions-recognition-retail-0003 ||################

... 100%, 39 KB, 291 KB/s, 0 seconds passed

... 100%, 4848 KB, 4297 KB/s, 1 seconds passed



This model is already in the OpenVINO format, so you do not need to convert it.

After downloading the model you can use it:

### Step 2: Read the prepared model
The `IENetwork` class is designed to work with a model in the Inference Engine. This class contains information about the network model read from the Intermediate Representation and allows you to manipulate some model parameters, such as layer affinity and output layers.

You need to create an instance of the `IENetwork` class. To do this, call the `read_network` function of `IECore` with two parameters:

 1. path to the .xml file of the model
 2. path to the .bin file of the model

In [80]:
emotion_recognition_network = ie_core.read_network('data/models/emotions-recognition-retail-0003.xml', 'data/models/emotions-recognition-retail-0003.bin')

### Step 3: Load the network to a device

Use the instance of `IECore`.
The class `IECore` has a special function called `load_network`, which loads a network to a device.
This function prepares the network for the first inference on the device 
and returns an instance of the network prepared for an inference (execution). 
This function has many parameters, but in this case, you need to know only about two of them:
* `network` - instance of `IENetwork`
* `device_name` - string, contains a device name to infer a model on: CPU, GPU and so on.

In [81]:
emotion_recognition_network_loaded_on_device = ie_core.load_network(emotion_recognition_network, 'CPU')

### Step 4: Open the input video

In [82]:
input_video_stream = cv2.VideoCapture(INPUT_VIDEO)

### Step 5: Create an output video stream

In [83]:
output_video_stream = prapare_out_video_stream(input_video_stream, OUTPUT_VIDEO)

In [84]:
emotion_recognition_input_layer = next(iter(emotion_recognition_network.input_info))
emotion_recognition_input_blob = emotion_recognition_network.input_info[emotion_recognition_input_layer].input_data

print(f'Input layer of the emotions-recognition-retail-0003 is {emotion_recognition_input_layer}')

Input layer of the emotions-recognition-retail-0003 is data


In [85]:
emotion_recognition_batch, emotion_recognition_channels, emotion_recognition_input_layer_h, emotion_recognition_input_layer_w = emotion_recognition_input_blob.shape

print(f'Input shape of the emotions-recognition-retail-0003 is [{emotion_recognition_batch}, {emotion_recognition_channels}, {emotion_recognition_input_layer_h}, {emotion_recognition_input_layer_w}]')

Input shape of the emotions-recognition-retail-0003 is [1, 3, 64, 64]


In [86]:
emotion_recognition_output_layer = next(iter(emotion_recognition_network.outputs))

### Step 6: Prepare a frame and run inference

In [87]:
def emotion_infer(face):
    # Resize the face image to the network input
    resized_frame = cv2.resize(face, (emotion_recognition_input_layer_w, emotion_recognition_input_layer_h))
    
    # Change the data layout from HWC to CHW
    transposed_frame = resized_frame.transpose((2, 0, 1))  
    
    # Reshape the face image to the network input
    reshaped_frame = transposed_frame.reshape((emotion_recognition_batch, emotion_recognition_channels, emotion_recognition_input_layer_h, emotion_recognition_input_layer_w))

    # Run the inference the same way as you did earlier
    inference_results = emotion_recognition_network_loaded_on_device.infer({
        emotion_recognition_input_layer: reshaped_frame
    })
    # To understand the inference result of this model, check the documentation:
    # https://docs.openvinotoolkit.org/latest/_models_intel_emotions_recognition_retail_0003_description_emotions_recognition_retail_0003.html
    return inference_results[emotion_recognition_output_layer]

### Step 7: Draw boxes and emotions in a frame

In [88]:
def get_smile_by_index(emotion_inference_result: np.ndarray) -> np.ndarray:
    emotions = ['neutral', 'happy', 'sad', 'surprise', 'anger']
    emotion_index = np.argmax(emotion_inference_result.flatten()) 
    smile_path = f'./data/{emotions[emotion_index]}.png'
    return cv2.imread(smile_path)
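
According to the model description, emotions-recognition-retail-0003 returns a softmax output of shape `[1, 5, 1, 1]` over the five emotions listed above. The sketch below shows how `flatten` and `argmax` turn such an output into a label and its probability. The helper `top_emotion` is hypothetical and the probability values are invented.

```python
import numpy as np

EMOTIONS = ['neutral', 'happy', 'sad', 'surprise', 'anger']


def top_emotion(probabilities):
    """Return the most probable emotion label and its probability."""
    flat = np.asarray(probabilities).flatten()
    index = int(np.argmax(flat))
    return EMOTIONS[index], float(flat[index])


# Synthetic output with the shape [1, 5, 1, 1]; the values are invented.
fake_output = np.array([0.05, 0.80, 0.05, 0.05, 0.05], dtype=np.float32).reshape(1, 5, 1, 1)
label, score = top_emotion(fake_output)

print(label)
```

Returning the probability next to the label makes it possible to skip low-confidence predictions, if that is desired.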

In [92]:
def emotion_recognition_inference_postpprocess(original_frame, detected_face, emotion_result, x_limits, y_limits):
    # Load the image for the recognized emotion
    smile = get_smile_by_index(emotion_result)
    # Get the width and height of the face region
    w = x_limits[1] - x_limits[0]
    h = y_limits[1] - y_limits[0]
    print(w, h)
    # Resize the emotion image to the size of the face region
    resized_smile = cv2.resize(smile, (w, h))
    
    # Put the emotion image over the face
    original_frame[y_limits[0]:y_limits[1], x_limits[0]:x_limits[1]] = resized_smile
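
The overlay step depends on two things that can go wrong at run time: `cv2.imread` returns `None` when it cannot load a file, and a face region can be empty. A defensive variant might check both before resizing. The sketch below is an illustration under assumptions: `overlay_icon` is a hypothetical helper, the resize is a nearest-neighbour stand-in written in NumPy instead of `cv2.resize`, and the region is assumed to lie inside the frame.

```python
import numpy as np


def overlay_icon(frame, icon, x_limits, y_limits):
    """Paste `icon`, resized to the region, into `frame`. Return False if nothing was drawn."""
    w = x_limits[1] - x_limits[0]
    h = y_limits[1] - y_limits[0]
    # Skip a missing image (for example, a failed cv2.imread) and empty regions
    if icon is None or icon.size == 0 or w <= 0 or h <= 0:
        return False
    # Nearest-neighbour resize in plain NumPy (stand-in for cv2.resize)
    rows = np.arange(h) * icon.shape[0] // h
    cols = np.arange(w) * icon.shape[1] // w
    resized = icon[rows][:, cols]
    frame[y_limits[0]:y_limits[1], x_limits[0]:x_limits[1]] = resized
    return True


frame = np.zeros((4, 6, 3), dtype=np.uint8)
icon = np.full((2, 2, 3), 255, dtype=np.uint8)

drawn = overlay_icon(frame, icon, (1, 5), (0, 4))
skipped_missing = overlay_icon(frame, None, (1, 5), (0, 4))
skipped_empty = overlay_icon(frame, icon, (3, 3), (0, 4))

print(drawn, skipped_missing, skipped_empty)
```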

### Step 8: Loop over frames in the input video

In [93]:
while input_video_stream.isOpened():
    
    # Read the next frame from the input video
    ret, original_frame = input_video_stream.read()
    # Check if the video is over
    if not ret:
        # Exit from the loop if the video is over
        break 
    # Detect faces in the frame
    face_detection_frame = face_detection_pre_processing(original_frame, face_detection_batch, face_detection_channels, face_detection_input_layer_h, face_detection_input_layer_w)
    face_detection_inference_result = face_detection_inference(face_detection_frame)
    # Get height and width of the frame
    frame_h, frame_w = original_frame.shape[:2]
    for detected_face in face_detection_inference_result[0][0]:
        # Skip detections with a low confidence
        if detected_face[2] < 0.5:
            continue
        # Convert normalized coordinates of the face to pixels
        xmin = int(detected_face[3]*frame_w)
        ymin = int(detected_face[4]*frame_h)

        xmax = int(detected_face[5]*frame_w)
        ymax = int(detected_face[6]*frame_h)
        
        # Crop the face from the frame
        emotion_recognition_frame = original_frame[ymin:ymax, xmin:xmax]
    
        # Recognize the emotion and draw it over the face
        emotion_recognition_result = emotion_infer(emotion_recognition_frame)
        emotion_recognition_inference_postpprocess(original_frame, detected_face, emotion_recognition_result, (xmin, xmax), (ymin, ymax))
    
    # Write the resulting frame to the output stream
    output_video_stream.write(original_frame)
    
input_video_stream.release()
# Save the resulting video
output_video_stream.release()

435 571


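error: OpenCV(4.5.3-openvino) ../opencv/modules/imgproc/src/resize.cpp:4051: error: (-215:Assertion failed) !ssize.empty() in function 'resize'

The `!ssize.empty()` assertion means that `cv2.resize` received an empty source image. In this loop, that can happen if `cv2.imread` cannot load the emotion image and returns `None`, or if a detected box produces an empty face crop.

Once the loop finishes without errors, the person (Artyom) in the resulting video is shown with the recognized emotion: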

In [None]:
# Show the resulting video
HTML(f"""<video width="600" height="400" controls><source src="{OUTPUT_VIDEO}" type="video/mp4"></video>""")