<p style="text-align: right">
  <img src="../images/dtlogo.png" alt="Logo" width="200">
</p>

### 💻 03 - OAK-D semantic segmentation

In [45]:
%matplotlib inline

import os
import sys
import time
import yaml
import numpy as np
import cv2
from typing import List, Dict, Callable, Optional, Any

import depthai as dai

## Creating a pipeline

The "pipeline" is an object documented in the [`depthai` API](https://docs.luxonis.com/projects/api/en/latest/) and can be initialized as follows:

In [46]:
def create_pipeline() -> dai.Pipeline:
    pipeline = dai.Pipeline()
    pipeline.setOpenVINOVersion(version=dai.OpenVINO.Version.VERSION_2021_3)
    return pipeline

## Creating and configuring nodes

"Nodes" (not ROS nodes) represent the different sensing and processing modalities of the OAK-D, for example images, stereo, or neural networks. They are created through a "pipeline" object, which tracks all of the nodes that get created and passes them on to the OAK-D hardware to configure the device internally once the structure is defined.

In this example, we will configure a node for the central RGB camera and the nodes needed to run an ML model trained for semantic segmentation in the Duckietown world. An image manipulation node pre-processes the RGB images into the input shape the network expects. Pixel intensities are normalized within the NN model itself to have the expected mean and standard deviation (this is taken care of during model conversion in OpenVINO).

In [69]:
def create_nodes(pipeline: dai.Pipeline) -> Dict[str, dai.Node]:
    # RGB CAMERA
    cam_rgb = pipeline.createColorCamera()
    cam_rgb.setBoardSocket(dai.CameraBoardSocket.RGB)
    cam_rgb.setResolution(
        dai.ColorCameraProperties.SensorResolution.THE_1080_P)
    cam_rgb.setColorOrder(
        dai.ColorCameraProperties.ColorOrder.RGB)
    cam_rgb.setInterleaved(False)
    cam_rgb.setPreviewSize(640, 400)

    # SEMANTIC SEGMENTATION NEURAL NETWORK
    blobfile = '../../../oakd/assets/segmentation_model.blob'
    # Fail early with a descriptive error if the model blob is missing
    if not os.path.isfile(blobfile):
        raise FileNotFoundError('Model not found! Blob file: %s' % blobfile)
    segmentation_nn = pipeline.createNeuralNetwork()
    segmentation_nn.setBlobPath(blobfile)
    segmentation_nn.input.setBlocking(False)
    segmentation_nn.setNumInferenceThreads(2)  # 3 = max for OAK-D

    # IMAGE MANIPULATION
    manip_rgb = pipeline.createImageManip()
    manip_rgb.initialConfig.setResize(640, 480)
    manip_rgb.setWaitForConfigInput(False)
    manip_rgb.setKeepAspectRatio(False)

    return {
        'rgb': cam_rgb,
        'manip': manip_rgb,
        'seg': segmentation_nn
    }
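
The call `cam_rgb.setInterleaved(False)` requests planar rather than interleaved preview frames. As a rough illustration of the difference, here is a minimal NumPy sketch on a tiny synthetic image. The array names and sizes are made up for this example and are not part of the `depthai` API.

```python
import numpy as np

# Tiny synthetic "interleaved" image: height 2, width 3, 3 channels -> (H, W, C).
# Each pixel stores its three channel values next to each other.
interleaved = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# A planar layout stores each channel as a full H x W plane -> (C, H, W).
planar = interleaved.transpose(2, 0, 1)

# Transposing back to (H, W, C) recovers the original image.
restored = planar.transpose(1, 2, 0)

print(interleaved.shape, planar.shape)
```

Both arrays hold the same values; only the order in which they are laid out differs.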

## Outputs

Output links are [a special type of node](https://docs.luxonis.com/projects/api/en/latest/components/nodes/xlink_out/) that sits at the interface between the OAK-D hardware and your computer or Duckiebot. They provide a way to access the data perceived by the OAK-D.

In [70]:
def create_output_links(pipeline: dai.Pipeline) -> Dict[str, dai.XLinkOut]:
    xout_links = {
        'rgb': pipeline.createXLinkOut(),
        'seg': pipeline.createXLinkOut()
    }
    for name, xout_link in xout_links.items():
        xout_link.setStreamName(name)
        xout_link.input.setBlocking(False)
    return xout_links

## Linking nodes to each other

The various inputs and outputs of nodes need to be linked together to define the data flow, i.e. _what_ goes _where_? This is tracked internally by the "pipeline" object and later used to configure the OAK-D device. Each output node needs to know which "upstream" node in the computational graph it should receive data from.

In [71]:
def link_nodes_and_outputs(nodes: Dict[str, dai.Node],
                           outputs: Dict[str, dai.XLinkOut]) -> None:
    # Raw RGB image --> Output to host
    nodes['rgb'].preview.link(outputs['rgb'].input)
    # Raw RGB image --> Image manipulation module
    nodes['rgb'].preview.link(nodes['manip'].inputImage)
    # Manipulated RGB image --> Neural network
    nodes['manip'].out.link(nodes['seg'].input)
    # Segmentation image --> Output to host
    nodes['seg'].out.link(outputs['seg'].input)
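
To make the resulting data flow easier to picture, the four links above can be written down as a plain list of edges and traced backwards from an output. This is only a sketch in pure Python: the labels and the `upstream_chain` helper are hypothetical and do not correspond to anything in `depthai`.

```python
from typing import List, Tuple

# Directed edges (source, destination) mirroring the four links above.
# The labels are illustrative names, not depthai identifiers.
links: List[Tuple[str, str]] = [
    ("rgb.preview", "xout_rgb"),
    ("rgb.preview", "manip"),
    ("manip", "seg"),
    ("seg", "xout_seg"),
]


def upstream_chain(target: str, edges: List[Tuple[str, str]]) -> List[str]:
    """Return the path from the original source down to `target`.

    Assumes every destination has exactly one upstream node,
    which holds for the small graph in this notebook.
    """
    parents = {dst: src for src, dst in edges}
    chain = [target]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return list(reversed(chain))


print(upstream_chain("xout_seg", links))
print(upstream_chain("xout_rgb", links))
```

The segmentation output depends on the camera through the image manipulation and neural network nodes, while the RGB output is fed directly by the camera preview.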

## Creating a device

Once all of the nodes are initialized, configured, and properly linked together, the device can be created. The computer or Duckiebot will search for an OAK-D device using the `depthai` library and configure the hardware using our "pipeline" object.

In [72]:
def create_device(pipeline: dai.Pipeline) -> dai.Device:
    device = dai.Device(pipeline, usb2Mode=False)
    device.setLogLevel(dai.LogLevel.DEBUG)
    return device

## Output queues

Remember those special "output links"? In order to read them, we need to initialize some "output queues" which will link to those outputs on the computer or Duckiebot side of things.

If a queue is "blocking" and it is full, new incoming data has to wait until the oldest item is read (popped) from the queue and a slot is freed up. In contrast, a non-blocking queue always accepts new incoming data, and if the queue is full, the oldest data gets dropped to make room.

In [73]:
def create_output_queues(device: dai.Device,
                         outputs: Dict[str, dai.XLinkOut]) -> Dict[str, dai.DataOutputQueue]:
    queues = dict()
    for name, xout_link in outputs.items():
        queues[name] = device.getOutputQueue(name=name, maxSize=1, blocking=False)
    return queues
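
The effect of `maxSize=1, blocking=False` can be imitated on the host with a bounded `collections.deque`, which also discards its oldest element when a new one arrives and it is already full. This is an analogy in plain Python, not how `depthai` implements its queues, and the frame numbers are made up.

```python
from collections import deque

# A bounded deque with room for a single item, standing in for a
# non-blocking output queue created with maxSize=1.
latest_only = deque(maxlen=1)

# Pretend the device produced five frames before the host read anything.
for frame_id in range(5):
    latest_only.append(frame_id)  # the oldest entry is silently discarded

# Only the most recent frame is still available to the reader.
print(list(latest_only))
```

With this configuration a slow reader always gets the newest frame rather than a backlog of stale ones, which is usually what a live visualization wants.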

## Reading data

With everything set up, let's create a utility function to read some specific requested data from the OAK-D, and return it in a dictionary mapping from the data name to the raw data (e.g. a Numpy array containing image data).

In [74]:
def read_data(requested_data: List[str],
              output_queues: Dict[str, dai.DataOutputQueue]) -> Dict[str, Any]:
    data = {name: None for name in requested_data}
    for name in requested_data:
        if name not in output_queues.keys():
            continue
        if name == 'rgb':
            data[name] = output_queues[name].get().getCvFrame()
        elif name == 'seg':
            # Read the final layer (output) of the neural net as a flat list of FP16 values
            segmentation = output_queues[name].get().getLayerFp16('output')
            # Convert to numpy array
            segmentation = np.asarray(segmentation, dtype=np.float16)
            # Reshape to (C, H, W)
            segmentation = segmentation.reshape((-1, 480, 640))
            # Collapse the per-class scores to a single class index per pixel
            segmentation = np.argmax(segmentation, axis=0).astype(np.uint8)
            # Resize to the RGB preview size; nearest-neighbor keeps class indices intact.
            # `interpolation` must be a keyword: the third positional argument is `dst`.
            segmentation = cv2.resize(segmentation, (640, 400),
                                      interpolation=cv2.INTER_NEAREST)
            # Save to dictionary
            data[name] = segmentation
    return data
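
The post-processing applied to the `'seg'` output can be tried without a camera. The sketch below runs the same reshape and argmax steps on a tiny synthetic score volume. The sizes, the scores, and the `decode_segmentation` helper are invented for illustration, and the integer-factor upscaling with `np.repeat` is only a stand-in for a nearest-neighbor resize.

```python
import numpy as np

# Toy dimensions instead of the real (C, 480, 640) network output.
num_classes, height, width = 3, 2, 4

# Synthetic per-class scores: class 1 wins on the left, class 2 on the right.
scores = np.zeros((num_classes, height, width), dtype=np.float16)
scores[0] = 0.1
scores[1, :, :2] = 0.9
scores[2, :, 2:] = 0.8

# A flat list of values, similar in shape to a flattened layer readout.
flat = scores.flatten().tolist()


def decode_segmentation(flat_scores, height, width):
    """Turn a flat list of class scores into an (H, W) map of class indices."""
    arr = np.asarray(flat_scores, dtype=np.float16).reshape((-1, height, width))
    return np.argmax(arr, axis=0).astype(np.uint8)


labels = decode_segmentation(flat, height, width)

# Upscale by 2x through repetition: every output pixel copies an existing
# label, so no new (meaningless) class indices are created.
upscaled = np.repeat(np.repeat(labels, 2, axis=0), 2, axis=1)

print(labels)
print(upscaled.shape)
```

Copying labels rather than averaging them matters for a class map: blending class 1 and class 3 would not produce a meaningful class 2.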

## Test and visualize

Here is a simple helper function which uses the functions we've written to set up the OAK-D, continuously read data, and visualize it. It optionally takes an "image processing" function that post-processes each frame before it is displayed. This will come in handy for visualizing segmentation data.

In [75]:
from IPython.display import display, Image

ImageProcFunc = Callable[[np.ndarray], np.ndarray]

def hello_oakd(output_name: str,
               im_proc: Optional[ImageProcFunc]=None) -> None:
    display_handle = display(None, display_id=True)
    pipeline = create_pipeline()
    nodes = create_nodes(pipeline)
    outputs = create_output_links(pipeline)
    link_nodes_and_outputs(nodes, outputs)
    device = create_device(pipeline)
    output_queues = create_output_queues(device, outputs)
    try:
        while True:
            data = read_data([output_name], output_queues)
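            frame = data[output_name]
            if frame is None:
                # read_data leaves None for names it cannot serve
                raise ValueError('No data available for output: %s' % output_name)
            if im_proc is not None:
                frame = im_proc(frame)
            _, frame = cv2.imencode('.jpeg', frame)
            display_handle.update(Image(data=frame.tobytes()))
    except KeyboardInterrupt:
        pass
    finally:
        # Release the OAK-D so that a later cell can connect to it again
        device.close()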

## Test it!

In [None]:
hello_oakd('rgb')

In [77]:
def colorize(im: np.ndarray) -> np.ndarray:
    # One color per class index; cv2.imencode interprets each triplet as BGR
    colors = [
        (0, 0, 0),
        (0, 220, 220),
        (35, 142, 107),
        (70, 70, 70),
        (60, 20, 220),
        (128, 64, 128),
        (142, 0, 0),
        (30, 170, 250),
        (140, 150, 230)
    ]
    out = np.take(colors, im, axis=0).astype(np.uint8)
    return out

hello_oakd('seg', im_proc=colorize)
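
The lookup performed by `colorize` can be checked on a small label map without any hardware. The sketch below uses a shortened three-color palette and a made-up 2x2 label image, and shows that indexing a palette array with the labels gives the same result as `np.take(..., axis=0)`.

```python
import numpy as np

# Shortened palette: one color triplet per class index (illustrative subset).
palette = np.array([(0, 0, 0), (0, 220, 220), (35, 142, 107)], dtype=np.uint8)

# Made-up 2x2 map of class indices.
labels = np.array([[0, 1],
                   [2, 1]], dtype=np.uint8)

# Each label is replaced by its palette row, adding a trailing color axis.
colored = palette[labels]

print(colored.shape)
print(colored[1, 0])
```

Note that a label outside the palette range raises an `IndexError`, so the palette needs at least as many entries as the model has classes.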
