# Techniques for high-performance inference programs: asynchronous and simultaneous inferencing
You will learn the basic technique for developing an efficient, high-performance OpenVINO application using asynchronous and simultaneous inferencing.  
We'll continue to use a simple image classification program as the example to keep things as simple as possible.

### Check the hardware configuration of your system (the DevCloud development server, in this case)
Before starting, let's check how many CPU cores the system has. We'll need this for the optimization work later.  
We'll use the `psutil` Python module for this.

In [None]:
# Linux
!pip3 install psutil
import psutil
print('# of CPU cores = {}C/{}T'.format(psutil.cpu_count(logical=False), psutil.cpu_count(logical=True)))

In [None]:
# Windows
!pip install psutil
import psutil
print('# of CPU cores = {}C/{}T'.format(psutil.cpu_count(logical=False), psutil.cpu_count(logical=True)))

### Preparing an input image and label text data files
Next, let's prepare the input image file and the class label text file. Both files are in the OpenVINO install directory, so we'll simply copy them to the current working directory.

In [None]:
# Linux
!cp $INTEL_OPENVINO_DIR/deployment_tools/demo/car.png .
!cp $INTEL_OPENVINO_DIR/deployment_tools/demo/squeezenet1.1.labels synset_words.txt

In [None]:
# Windows
!copy "%INTEL_OPENVINO_DIR%\deployment_tools\demo\car.png" .
!copy "%INTEL_OPENVINO_DIR%\deployment_tools\demo\squeezenet1.1.labels" synset_words.txt
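
If you prefer a single cell that works on both Linux and Windows, the copy can also be done from Python with `shutil`. This is a minimal sketch, assuming `INTEL_OPENVINO_DIR` is set in the environment and the demo files sit under `deployment_tools/demo` as in the commands above; `copy_demo_files` is a hypothetical helper name.

```python
import os
import shutil


def copy_demo_files(openvino_dir, dest_dir='.'):
    """Copy the demo image and label file into dest_dir (hypothetical helper)."""
    demo_dir = os.path.join(openvino_dir, 'deployment_tools', 'demo')
    # Same files as the shell commands: the image keeps its name,
    # the label file is renamed to synset_words.txt.
    copies = {
        'car.png': 'car.png',
        'squeezenet1.1.labels': 'synset_words.txt',
    }
    copied = []
    for src_name, dst_name in copies.items():
        dst = os.path.join(dest_dir, dst_name)
        shutil.copy(os.path.join(demo_dir, src_name), dst)
        copied.append(dst)
    return copied


# Usage (requires the OpenVINO environment to be initialized):
# copy_demo_files(os.environ['INTEL_OPENVINO_DIR'])
```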

### Preparing a DL model for inferencing
Download a DL model for image classification using the `Model downloader` and convert it into an OpenVINO IR model with the `Model converter`.  
We'll use the `googlenet-v3` model for this practice.

In [None]:
# Linux
!python3 $INTEL_OPENVINO_DIR/deployment_tools/open_model_zoo/tools/downloader/downloader.py --name googlenet-v3
!python3 $INTEL_OPENVINO_DIR/deployment_tools/open_model_zoo/tools/downloader/converter.py  --name googlenet-v3 --precisions FP16
!ls public/googlenet-v3/FP16 -l

In [None]:
# Windows
!python "%INTEL_OPENVINO_DIR%\deployment_tools\open_model_zoo\tools\downloader\downloader.py" --name googlenet-v3
!python "%INTEL_OPENVINO_DIR%\deployment_tools\open_model_zoo\tools\downloader\converter.py"  --name googlenet-v3 --precisions FP16
!dir public\googlenet-v3\FP16

----
The Python inferencing code starts from here.

### Initialize application for OpenVINO
This part is identical to the program in the previous image classification exercise. 
1. Import required Python modules
2. Load class label text file
3. Create an inference engine core object
4. Load IR model to memory
5. Obtain the names of the input and output blobs and the input blob shape

In [None]:
import time

import cv2
import numpy as np
from openvino.inference_engine import IECore

# Load class labels (one per line) and close the file afterwards
with open('synset_words.txt') as f:
    label = [s.rstrip('\n') for s in f]

# Create an Inference Engine core object
ie = IECore()

# Read an IR model data to memory
model = './public/googlenet-v3/FP16/googlenet-v3'
net = ie.read_network(model=model+'.xml', weights=model+'.bin')

# Obtain the name of the input and output blob, and input blob shape
input_blob_name  = list(net.input_info.keys())[0]
output_blob_name = list(net.outputs.keys())[0]
batch,channel,height,width = net.input_info[input_blob_name].tensor_desc.dims

#### (Optional) `Query` API
The Inference Engine has a query API: `get_metric()` returns information from an IE plugin for a given metric key, such as the supported range of async infer requests or streams.

In [None]:
print(ie.get_metric('CPU', 'RANGE_FOR_ASYNC_INFER_REQUESTS'))
print(ie.get_metric('CPU', 'RANGE_FOR_STREAMS'))
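
According to the Inference Engine metric documentation, `RANGE_FOR_ASYNC_INFER_REQUESTS` is a `(min, max, step)` tuple and `RANGE_FOR_STREAMS` is a `(min, max)` tuple. The sketch below expands those tuples into candidate values for `num_requests` and `CPU_THROUGHPUT_STREAMS` to try during tuning. The helper names are hypothetical, and the sample tuples are illustrative rather than values read from a real device.

```python
def request_candidates(range_for_async):
    """Expand a (min, max, step) tuple into a list of num_requests candidates."""
    lo, hi, step = range_for_async
    return list(range(lo, hi + 1, max(step, 1)))


def stream_candidates(range_for_streams, limit=None):
    """Expand a (min, max) tuple into stream counts, optionally capped at `limit`
    (for example, the physical core count reported by psutil)."""
    lo, hi = range_for_streams
    if limit is not None:
        hi = min(hi, limit)
    return list(range(lo, hi + 1))


# Illustrative tuples only; on a real system pass the values printed above, e.g.
# request_candidates(ie.get_metric('CPU', 'RANGE_FOR_ASYNC_INFER_REQUESTS'))
print(request_candidates((1, 8, 1)))
print(stream_candidates((1, 16), limit=4))
```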

### Plugin configuration
You can set plugin-specific parameters with the `set_config()` API.  
The **inferencing performance can be boosted** by setting parameters such as `CPU_THREADS_NUM`, `CPU_BIND_THREAD`, and `CPU_THROUGHPUT_STREAMS` properly.  
Other plugins and devices have their own parameter keys. Please refer to the OpenVINO technical documentation for details.
https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_Supported_Devices.html

In [None]:
ie.set_config({'CPU_THREADS_NUM'       : '0'   }, 'CPU')  # default = 0
ie.set_config({'CPU_BIND_THREAD'       : 'YES' }, 'CPU')  # default = YES
ie.set_config({'CPU_THROUGHPUT_STREAMS': '1'   }, 'CPU')  # default = 1
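
Instead of a fixed stream count, the CPU plugin also accepts the special value `CPU_THROUGHPUT_AUTO` for `CPU_THROUGHPUT_STREAMS`, and the `IECore.load_network()` documentation states that `num_requests=0` creates the optimal number of infer requests. This is a minimal sketch, assuming the 2020/2021-era `openvino.inference_engine` Python API used in this notebook (check both behaviors against your installed version); `load_for_throughput` is a hypothetical helper that takes the `ie` and `net` objects created earlier.

```python
def load_for_throughput(ie, net, device='CPU'):
    """Load `net` with plugin-chosen streams and infer request count (hypothetical helper)."""
    # Let the CPU plugin pick the number of throughput streams.
    ie.set_config({'CPU_THROUGHPUT_STREAMS': 'CPU_THROUGHPUT_AUTO'}, device)
    # num_requests=0 asks the plugin to create its optimal number of infer requests.
    exec_net = ie.load_network(network=net, device_name=device, num_requests=0)
    streams = exec_net.get_config('CPU_THROUGHPUT_STREAMS')
    return exec_net, streams, len(exec_net.requests)


# Usage in this notebook:
# exec_net_auto, streams, nreq = load_for_throughput(ie, net)
# print('streams =', streams, 'infer requests =', nreq)
```

Note that calling it changes the CPU configuration of `ie`, so later `load_network()` calls on the same object also use the automatic stream count.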

### Loading the model onto the device
Load the network onto the device with `load_network()`, which returns an executable network object.  
You can specify how many infer request objects to create with the `num_requests` parameter.  
Each infer request object handles one inference at a time, so the number of infer request objects is the maximum number of inferences that can be in flight simultaneously.

Here, we create 4 infer request objects, so up to 4 inferences can run simultaneously.

In [None]:
exec_net = ie.load_network(network=net, device_name='CPU', num_requests=4)

### Setting callback function to the infer_request object
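Set a callback function on each `infer_request` object. You can use an anonymous function (lambda expression) as well. The second argument to `set_completion_callback()` is user data passed to the callback; it is unused here.  

In this case, the callback does nothing but count completed infer requests. 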

In [None]:
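total_infer = 0

# Called each time an infer request completes; counts completed inferences
def callback(status_code, user_data):
    global total_infer
    total_infer += 1

# Register the callback on every infer request (the second argument is passed as user_data)
for req in exec_net.requests:
    req.set_completion_callback(callback, req.output_blobs[output_blob_name].buffer[0])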
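
The completion callback is typically invoked from an Inference Engine worker thread rather than the notebook's main thread, so the global counter is updated concurrently with the polling loop. Below is a minimal sketch of a lock-protected counter that can be registered as a completion callback directly or wrapped in a lambda. `CompletionCounter` is a hypothetical name, and the demo simulates callbacks with plain Python threads instead of real infer requests.

```python
import threading


class CompletionCounter:
    """Thread-safe counter of completed infer requests (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def __call__(self, status_code, user_data):
        # Same (status_code, user_data) signature as the callback above.
        with self._lock:
            self._count += 1

    @property
    def count(self):
        with self._lock:
            return self._count


counter = CompletionCounter()
# With real infer requests, register it like the callback above:
# for req in exec_net.requests:
#     req.set_completion_callback(counter)

# Simulate 4 "infer requests", each completing 100 times on its own thread.
workers = [threading.Thread(target=lambda: [counter(0, None) for _ in range(100)])
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(counter.count)  # 400
```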

### Reading and preprocessing the input image
Read the input image file with OpenCV, then resize and transform it to fit the input blob of the DL model.

In [None]:
img = cv2.imread('car.png')
img = cv2.resize(img, (width,height))
img = img.transpose((2, 0, 1))
img = img.reshape((1, channel, height, width))
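
The transpose converts OpenCV's HWC (height, width, channel) layout into the CHW layout the input blob expects, and the reshape adds the batch dimension to get NCHW. Here is a NumPy-only sketch of the same transformation, using a synthetic array in place of a resized `cv2.imread()` result; `to_nchw` is a hypothetical helper.

```python
import numpy as np


def to_nchw(img_hwc):
    """Convert an HxWxC image (OpenCV layout) into a 1xCxHxW array."""
    chw = img_hwc.transpose((2, 0, 1))   # HWC -> CHW
    return chw[np.newaxis, ...]          # CHW -> NCHW with batch size 1


# Synthetic 2x3 image with 3 channels; channel c is filled with the value c.
demo_h, demo_w = 2, 3
demo_img = np.zeros((demo_h, demo_w, 3), dtype=np.uint8)
for c in range(3):
    demo_img[:, :, c] = c
blob = to_nchw(demo_img)
print(blob.shape)  # (1, 3, 2, 3)
```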

### Running inference  
- Run 400 inferences asynchronously, with 4 inferences in flight at a time
- Wait for the completion of all inference tasks
- Display performance data and inference result

In [None]:
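# (Workaround for a bug in Python API. Run dummy inferencing on all infer_request objects)
for req in exec_net.requests:
    req.async_infer(inputs={input_blob_name: img})
# Wait for the dummy inferences to finish before resetting the counter below
for req in exec_net.requests:
    req.wait()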

infer_slot = 0
total_infer= 0
max_infer = len(exec_net.requests)

start=time.time()

# Run inference 400 times
while total_infer<400:
    req = exec_net.requests[infer_slot]
    status = req.wait(0)
    if status == 0 or status==-11:   # Send infer request to IE when infer_request status is 0(OK) or -11(INFER_NOT_STARTED)
        req.async_infer(inputs={input_blob_name: img})   # returns immediately; completion is reported via the callback
    infer_slot = (infer_slot+1) % max_infer

# Wait until all inference requests are completed
for req in exec_net.requests:
    while req.wait()!=0: pass

# Display performance data
total=time.time()-start
print('max_infer={} time={:.4}sec fps={}\n'.format(max_infer, total, total_infer/total))

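# Display inference result (top-5 classes for each infer request)
for i, req in enumerate(exec_net.requests):
    output = req.output_blobs[output_blob_name].buffer[0]
    idx = np.argsort(output)[::-1]
    print('infer_request ', i)
    for rank in range(5):
        # googlenet-v3 has 1001 outputs (index 0 = background), hence the -1 offset into the 1000 labels
        print(idx[rank], output[idx[rank]], label[idx[rank]-1])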
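
The `label[idx[i]-1]` lookup reflects that `googlenet-v3` produces 1001 scores, with index 0 reserved for a background class, while `synset_words.txt` holds 1000 ImageNet labels. The helper below makes that offset explicit and avoids silently mapping the background class to the last label. `top_k` and its `offset` parameter are hypothetical, and the scores and labels in the demo are made up.

```python
import numpy as np


def top_k(scores, labels, k=5, offset=1):
    """Return (class_id, score, label) for the k highest scores.

    offset=1 skips the background class at index 0, so class_id N maps to labels[N-1].
    """
    order = np.argsort(scores)[::-1][:k]
    results = []
    for class_id in order:
        pos = int(class_id) - offset
        name = labels[pos] if 0 <= pos < len(labels) else '(background)'
        results.append((int(class_id), float(scores[class_id]), name))
    return results


demo_scores = np.array([0.01, 0.05, 0.70, 0.20, 0.04])   # made-up scores, index 0 = background
demo_labels = ['cat', 'car', 'dog', 'tree']
for class_id, score, name in top_k(demo_scores, demo_labels, k=3):
    print(class_id, round(score, 2), name)
```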

----
Now you have learned the basic technique for developing an efficient, high-performance OpenVINO program using asynchronous and simultaneous inferencing.  
The key points in this exercise are:
- Use asynchronous inference
- Submit an appropriate number of infer requests to keep the processor saturated (busy)
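
The polling loop keeps every infer request busy by resubmitting as soon as a request is free. The same keep-N-in-flight idea can be sketched in plain Python with `concurrent.futures`, where a sleep stands in for inference. `fake_infer` and the timings are hypothetical; the sketch only illustrates the scheduling pattern, not real CPU behavior, where throughput stops improving once the hardware is saturated.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_infer(job_id):
    """Stand-in for one inference (hypothetical); sleeps instead of computing."""
    time.sleep(0.01)
    return job_id


def run_jobs(total_jobs, in_flight):
    """Run total_jobs fake inferences, at most in_flight at a time; return elapsed seconds."""
    start = time.perf_counter()
    # The pool keeps up to `in_flight` jobs running, like a pool of infer requests.
    with ThreadPoolExecutor(max_workers=in_flight) as pool:
        list(pool.map(fake_infer, range(total_jobs)))
    return time.perf_counter() - start


for n in (1, 4):
    t = run_jobs(40, n)
    print('in_flight={} time={:.3f}s fps={:.1f}'.format(n, t, 40 / t))
```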

This time, we used the default values for `CPU_THREADS_NUM`, `CPU_BIND_THREAD`, and `CPU_THROUGHPUT_STREAMS`.  
The inferencing performance **could more than double** (on the DevCloud development server) if you set optimal values for those parameters and tune `num_requests`.

Try to find the best configuration by modifying those parameters.
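
One way to search is a small grid sweep over stream counts and `num_requests`, keeping the fastest combination. This is a minimal sketch: `measure_fps` is a hypothetical callable you would implement by setting `CPU_THROUGHPUT_STREAMS` with `ie.set_config()`, reloading the network with `ie.load_network(..., num_requests=nreq)`, and timing the inference loop above. The demo uses a made-up stand-in function rather than real measurements.

```python
import itertools


def sweep(measure_fps, stream_options, request_options):
    """Try every (streams, num_requests) pair and return (best_fps, streams, num_requests)."""
    best = None
    for streams, nreq in itertools.product(stream_options, request_options):
        fps = measure_fps(streams, nreq)
        print('streams={} num_requests={} fps={:.1f}'.format(streams, nreq, fps))
        if best is None or fps > best[0]:
            best = (fps, streams, nreq)
    return best


# Made-up stand-in so the sketch runs without OpenVINO; replace with a real measurement.
def fake_measure(streams, nreq):
    return 100.0 * min(streams, nreq) / (1 + 0.1 * abs(streams - nreq))


print(sweep(fake_measure, [1, 2, 4], [1, 2, 4, 8]))
```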

End of course

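For reference, these are the `StatusCode` values returned by `wait()`, as defined in the Inference Engine C++ API (`ie_common.h`). The polling loop above checks for `OK` (0) and `INFER_NOT_STARTED` (-11):

~~~cpp
enum StatusCode : int {
  OK = 0, GENERAL_ERROR = -1, NOT_IMPLEMENTED = -2, NETWORK_NOT_LOADED = -3,
  PARAMETER_MISMATCH = -4, NOT_FOUND = -5, OUT_OF_BOUNDS = -6, UNEXPECTED = -7,
  REQUEST_BUSY = -8, RESULT_NOT_READY = -9, NOT_ALLOCATED = -10, INFER_NOT_STARTED = -11,
  NETWORK_NOT_READ = -12
};
~~~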
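
In Python, the same codes can be given names so the polling condition reads `status in (StatusCode.OK, StatusCode.INFER_NOT_STARTED)` instead of `0` and `-11`. This is a minimal sketch; the `StatusCode` class here is a hypothetical Python mirror of the C++ enum, not part of the `openvino.inference_engine` API, and `can_submit` is likewise a hypothetical helper.

```python
from enum import IntEnum


class StatusCode(IntEnum):
    """Python mirror of the Inference Engine StatusCode values (hypothetical helper)."""
    OK = 0
    GENERAL_ERROR = -1
    NOT_IMPLEMENTED = -2
    NETWORK_NOT_LOADED = -3
    PARAMETER_MISMATCH = -4
    NOT_FOUND = -5
    OUT_OF_BOUNDS = -6
    UNEXPECTED = -7
    REQUEST_BUSY = -8
    RESULT_NOT_READY = -9
    NOT_ALLOCATED = -10
    INFER_NOT_STARTED = -11
    NETWORK_NOT_READ = -12


def can_submit(status):
    """True if an infer request is idle: finished (OK) or never started."""
    return StatusCode(status) in (StatusCode.OK, StatusCode.INFER_NOT_STARTED)


print(can_submit(0), can_submit(-9), can_submit(-11))
```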