## Introduction

The goal of this tutorial is to show you some of the practical basics of computer vision, and to allow you to get your hands dirty. You will learn the following:

- How to get image and video data in and out of Python
- How to do basic image manipulation
- How to perform color thresholding to detect the position of objects
- How to perform face detection using a Haar-Cascade classifier

We will focus on offline processing, i.e., we will record a video with your webcam, store it, and then process it, rather than on online processing, i.e., augmenting an image in real time. Offline and online processing use the same techniques, which we introduce here; online processing simply adds the constraint that your code also has to be fast.

For each topic, you will first get a small demo introducing the concepts, and then you will solve exercises using what you have learned.

First things first, import the libraries you will use throughout the tutorial.

In [1]:
import cv2
import numpy as np

# For Visualization in Jupyter
import ipywidgets as widgets
from matplotlib import pyplot as plt
from IPython.display import display

# Get images and video into Jupyter from your webcam
from ipywebrtc import CameraStream, ImageRecorder, VideoRecorder

## How to get image and video data in and out of Python

### Load an Image from File

In [None]:
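# Load the image from disk (OpenCV returns it in BGR channel order)
path_to_file = "logo.png"
image = cv2.imread(path_to_file)
# imread does not raise on a missing or unreadable file; it returns None instead
assert image is not None, "Could not read {}".format(path_to_file)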

### "Display" an Image Using NumPy

In [None]:
print(type(image))
print("Image Shape: {}".format(image.shape))
print("Image dType: {}".format(image.dtype))

As you can see, in Python an image is nothing but a NumPy array. This is very powerful, because it lets you manipulate images with ordinary array operations, as you will see later.
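Because the image is a plain NumPy array, ordinary array indexing already gives you basic image manipulation. The sketch below uses a small synthetic array as a stand-in for `image` (so it runs without a file) to show pixel access, cropping, and single-channel access; the `(height, width, channels)` layout and the BGR channel order are assumed to match what `cv2.imread` returns for a color image.

```python
import numpy as np

# synthetic 4x6 BGR image (height=4, width=6, 3 channels), standing in for `image`
img = np.zeros((4, 6, 3), dtype=np.uint8)
img[1, 2] = (255, 0, 0)        # row 1, column 2 -> pure blue in BGR order

height, width, channels = img.shape
pixel = img[1, 2]              # one pixel: array of 3 values (B, G, R)
crop = img[0:2, 1:4]           # rows 0-1, columns 1-3 (a 2x3 crop)
green = img[:, :, 1]           # a single channel is a 2-D matrix

print(height, width, channels)  # 4 6 3
print(pixel)                    # [255   0   0]
print(crop.shape, green.shape)  # (2, 3, 3) (4, 6)
```

Slices such as `crop` are views into the original array, so writing into them modifies `img`; call `.copy()` when you need an independent image.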

### Display an Image in Jupyter

In [None]:
# ndarray.tostring() was removed in NumPy 2.0; tobytes() returns the same bytes
display_image = cv2.imencode('.png', image)[1].tobytes()
widgets.Image(value=display_image)

This code encodes the image, a NumPy array, into a byte string that holds a .png file, which you then pass to a Jupyter image widget as input data.

### Display an Image Using matplotlib

In [None]:
%matplotlib inline

plt.imshow(image)
plt.show()

What happened here? The image in matplotlib doesn't look like the original image or the one we displayed in the image widget above; the colors are mixed up.

The reason is that OpenCV and matplotlib expect the color channels in a different order. As you saw above, an image in Python is just a NumPy array: the first axis corresponds to the vertical direction (rows), the second axis to the horizontal direction (columns), and the third axis to the color channels.

In OpenCV the color channels are, by default, ordered Blue-Green-Red, whereas in matplotlib, and most other image editing tools, they are ordered Red-Green-Blue. In both cases each color channel is stored as an 8-bit unsigned integer (0 to 255), which is why these formats are called BGR8 and RGB8, respectively. Further down in this tutorial you will encounter two additional important formats.

If the order of the color channels gets mixed up, the colors of the resulting image look wrong. This is what happened above, so let's fix it using another of OpenCV's many functions.
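Reordering channels is just an index reversal along the last axis. The sketch below, on a single synthetic pixel, shows that reversing the channel axis with `[..., ::-1]` turns BGR into RGB; for a 3-channel image this should give the same pixel values as `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)`, except that the slice is a view while `cvtColor` returns a new array.

```python
import numpy as np

# one orange pixel stored in OpenCV's BGR order: B=0, G=128, R=255
bgr = np.array([[[0, 128, 255]]], dtype=np.uint8)

# reverse the last (channel) axis: BGR -> RGB
rgb = bgr[..., ::-1]

print(bgr[0, 0], "->", rgb[0, 0])   # [  0 128 255] -> [255 128   0]

# the slice is a view with a negative stride; OpenCV drawing functions need
# contiguous memory, so make a copy before passing it back to cv2
rgb_contiguous = np.ascontiguousarray(rgb)
```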

In [None]:
image2 = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image2)
plt.show()

### Load Image from a Webcam in Jupyter

First we get a handle to the camera. For this to work, you will have to allow this website to use your webcam.

In [None]:
camera = CameraStream(constraints=
                      {'facing_mode': 'user',
                       'audio': False,
                       'video': { 'width': 640, 'height': 480 }
                       })
camera

In [None]:
recorder = ImageRecorder(stream=camera)
recorder

Once you are happy with the snapshot, make sure to close the webcam.

In [None]:
camera.close()

A snapshot is a byte string in .png format (by default). Therefore, after you have taken a snapshot, you need to convert it to a numpy array for further processing.

In [None]:
# extract the value from the ImageRecorder
snapshot = recorder.image.value
snapshot = np.frombuffer(snapshot, dtype=np.uint8)
snapshot = cv2.imdecode(snapshot, cv2.IMREAD_COLOR)
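If no snapshot has been taken yet, `recorder.image.value` is an empty byte string and decoding fails, either with an error or by returning `None`, depending on the OpenCV version. A small guard, sketched below as a hypothetical helper `looks_like_png` (not part of ipywebrtc or OpenCV), checks the fixed 8-byte PNG signature before you decode.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed first 8 bytes of every PNG file

def looks_like_png(data: bytes) -> bool:
    """Hypothetical helper: True if `data` starts with the PNG signature."""
    return data[:8] == PNG_SIGNATURE

print(looks_like_png(b""))                          # False: no snapshot yet
print(looks_like_png(PNG_SIGNATURE + b"rest..."))   # True
```

In the notebook you could, for example, run `if not looks_like_png(recorder.image.value): print("Take a snapshot first")` before decoding.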

### Exercise 1

Let's make sure everything works as expected by visualizing the extracted image (`snapshot`) using what you learned above. You can choose to display the image using either an image widget or matplotlib.

In [None]:
# TODO: Place code here to display the image
display_image = cv2.imencode('.png', snapshot)[1].tobytes()
widgets.Image(value=display_image)

### Write Image to Disk

The line below stores the image as a .png file. You can also save a .jpg file by changing the file extension: OpenCV chooses the encoder based on the extension of the filename you pass in.

In [None]:
cv2.imwrite('snapshot.png', snapshot)

### Load Video from Webcam in Jupyter

In [None]:
camera = CameraStream(constraints=
                      {'facing_mode': 'user',
                       'audio': False,
                       'video': { 'width': 640, 'height': 480 }
                       })
camera

In [None]:
recorder2 = VideoRecorder(stream=camera)
recorder2

In [None]:
camera.close()

This time you are not using an `ImageRecorder`, but a `VideoRecorder`. It also returns a byte string; however, this time it is returned in the format `.webm`.

Getting videos into Python is a bit more involved. You first have to save the encoded video to a file and then load it again using OpenCV, which takes care of decoding the video so that you can access it frame by frame.

In [None]:
# write the file to disk
with open('capture.webm', 'wb') as out_file:
    out_file.write(recorder2.video.value)

In [None]:
# open the video file using OpenCV and display each frame using widgets
disp = widgets.Image()
display(disp)

video_reader = cv2.VideoCapture("capture.webm")
while True:
    # read() returns ret=False once there are no frames left
    ret, frame = video_reader.read()
    if not ret:
        break

    display_image = cv2.imencode('.png', frame)[1].tobytes()
    disp.value = display_image
video_reader.release()

As you may have noticed, when you run the snippet above, the video is stuttering. Some frames are displayed in real time, whereas at other times it freezes and then skips a few frames.

This is because image widgets aren't built to display video. The code above pushes each frame to the widget as fast as possible rather than at a consistent frame rate. We will address this in the next section, when we cover saving video.
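If you still want a quick preview in an image widget, one option is to pace the loop so that frames are shown on a fixed schedule. The sketch below is a generic, hypothetical helper `paced` (not part of OpenCV or ipywidgets) that wraps any iterable of frames and sleeps until each frame's scheduled time; it cannot help when decoding and encoding a single frame already takes longer than one interval.

```python
import time

def paced(frames, fps=24.0):
    """Yield items from `frames` on a fixed schedule of `fps` items per second."""
    interval = 1.0 / fps
    next_time = time.perf_counter()
    for frame in frames:
        # wait until this frame's scheduled time (no wait if we are behind)
        delay = next_time - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        yield frame
        next_time += interval

# usage with dummy frames: 5 frames at 50 fps should take about 0.08 s
start = time.perf_counter()
shown = list(paced(range(5), fps=50))
elapsed = time.perf_counter() - start
print(shown, round(elapsed, 2))
```

To use it in the display loop, you would move the `read()` loop into a generator that yields frames and iterate over `paced(...)` of that generator.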

### Display a Video File in Jupyter

In [None]:
widgets.Video.from_file("capture.webm")

### Create a Video from Frames

As mentioned above, the image widget is not meant to display video. Instead, you can open the video, process it frame by frame, and write the frames back to disk using a suitable video encoding. Once the result is stored, you can view it using a video widget.

In [None]:
input_video = cv2.VideoCapture("capture.webm")

output_file_name = "output.mp4"
backend = cv2.CAP_ANY
fourcc_code = cv2.VideoWriter_fourcc(*"H264")
fps = 24
# must match the size of the frames you write, otherwise the output may stay empty
frame_size = (640, 480)
output_video = cv2.VideoWriter(output_file_name, backend, fourcc_code, fps, frame_size)

while True:
    ret, frame = input_video.read()
    if not ret:
        break

    output_video.write(frame)
input_video.release()
output_video.release()

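*Note: If you have trouble writing .mp4 files (some OpenCV builds, such as the pip wheels, may lack an H264 encoder), you can write a .webm file instead. To do this, replace the line that creates `output_video` with* `output_video = cv2.VideoWriter("output.webm", cv2.CAP_ANY, cv2.VideoWriter_fourcc(*"VP80"), 24, (640,480))` *and load `output.webm` in the exercises below.*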

### Exercise 2

Display the created file using a video widget.

In [None]:
widgets.Video.from_file("output.mp4")

### Exercise 3

Load the video you created above, split it frame by frame, and save the frames to disk. 

Note: Since a video has a lot of frames, it is advisable to store them in a sub-folder. You can create one from Jupyter by prefixing a shell command with `!`. For example, to create the directory `img` used below, run `!mkdir img`.

In [None]:
import os

foo = cv2.VideoCapture("capture.webm")
path = "img/img_{}.png"
# cv2.imwrite does not create missing folders (it typically just returns False)
os.makedirs("img", exist_ok=True)

counter = 0
while True:
    ret, frame = foo.read()
    if not ret:
        break

    cv2.imwrite(path.format(counter), frame)
    counter += 1
foo.release()
print("Saved {} frames".format(counter))

### Exercise 4

Using the batch of images you created in Exercise 3, convert the images back to a video. Afterwards, inspect the created video and verify that it matches the original input video.

In [None]:
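import glob

# collect the frames written in Exercise 3 and sort them by frame number
# (a plain string sort would put img_10.png before img_2.png)
frame_files = glob.glob("img/img_*.png")
frame_files.sort(key=lambda name: int(name.split("_")[-1].split(".")[0]))

# use the size of the stored frames for the writer
height, width = cv2.imread(frame_files[0]).shape[:2]
writer = cv2.VideoWriter("exercise4.mp4", cv2.CAP_ANY, cv2.VideoWriter_fourcc(*"H264"), 24, (width, height))

for location in frame_files:
    img = cv2.imread(location)
    writer.write(img)

writer.release()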

## How to perform basic image manipulation in OpenCV

You already know about RGB and BGR from above (displaying images with matplotlib); here, you will learn how to convert images into grayscale and what the HSV color space is about.

For these conversions, you will have to use the snapshot you took at the very beginning of this tutorial.

In [None]:
display_image = cv2.imencode('.png', snapshot)[1].tobytes()
widgets.Image(value=display_image)

### Convert a BGR Image to Grayscale

In [None]:
#conversion
snapshot_gray = cv2.cvtColor(snapshot, cv2.COLOR_BGR2GRAY)

# display the image
display_image = cv2.imencode('.png', snapshot_gray)[1].tobytes()
widgets.Image(value=display_image)

An important thing to notice here is that the gray color space only has a single channel (intensity). The resulting image is a rank 2 tensor, i.e., a matrix.
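Under the hood, `cv2.COLOR_BGR2GRAY` computes a weighted sum of the channels, Y = 0.299 R + 0.587 G + 0.114 B, as listed in the OpenCV color conversion documentation. The NumPy sketch below reproduces that formula to show where the third axis goes; the helper `bgr_to_gray` is hypothetical, and since OpenCV uses fixed-point arithmetic internally its results may differ from this float version by one gray level.

```python
import numpy as np

def bgr_to_gray(img_bgr):
    """Weighted sum of the B, G, R channels -> one intensity channel (uint8)."""
    weights = np.array([0.114, 0.587, 0.299])    # B, G, R order to match OpenCV
    gray = img_bgr.astype(np.float64) @ weights  # (H, W, 3) @ (3,) -> (H, W)
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# 1x3 test image: pure blue, pure green, pure red (BGR order)
img = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
gray = bgr_to_gray(img)
print(img.shape, "->", gray.shape)  # (1, 3, 3) -> (1, 3)
print(gray)                         # [[ 29 150  76]]
```

Green contributes most to perceived brightness, which is why the pure green pixel ends up brightest.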

In [None]:
print("Image Shape: {}".format(snapshot_gray.shape))

### Convert BGR into HSV

In [None]:
#convert to HSV
snapshot_hsv = cv2.cvtColor(snapshot, cv2.COLOR_BGR2HSV)

#display the image
display_image = cv2.imencode('.png', snapshot_hsv)[1].tobytes()
widgets.Image(value=display_image)

HSV stands for Hue, Saturation, Value and is a [very interesting color space](https://en.wikipedia.org/wiki/HSL_and_HSV). It is designed to match more closely how humans describe color, loosely modeled on mixing paint. It is very handy for color thresholding, because it (roughly) puts the color itself, the hue, on one axis, while changes in illumination mostly affect the other two axes, saturation and value. This allows for a more robust color segmentation. You will do color thresholding further down in this tutorial.

The image looks strange because we are again violating an assumption: `imencode` converts a matrix to a byte string and assumes that the matrix is a BGR image, but we feed it an HSV image. The hue, saturation, and value channels are therefore displayed as if they were blue, green, and red, which produces some interesting colors.
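One detail matters once you start thresholding: for 8-bit images, OpenCV stores hue as degrees divided by two (0 to 179) so that it fits in a byte, while saturation and value are scaled to 0 to 255. The sketch below uses Python's standard `colorsys` module to reproduce that convention for single pixels; the helper `bgr_pixel_to_opencv_hsv` is hypothetical, and its rounding may differ from `cv2.cvtColor` by one unit.

```python
import colorsys

def bgr_pixel_to_opencv_hsv(b, g, r):
    """Convert one 8-bit BGR pixel to OpenCV-style 8-bit HSV (H in 0..179)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)  # each in [0, 1]
    return (round(h * 180) % 180, round(s * 255), round(v * 255))

print(bgr_pixel_to_opencv_hsv(0, 0, 255))   # red   -> (0, 255, 255)
print(bgr_pixel_to_opencv_hsv(0, 255, 0))   # green -> (60, 255, 255)
print(bgr_pixel_to_opencv_hsv(255, 0, 0))   # blue  -> (120, 255, 255)
```

Because hue is circular, red sits at both ends of the range (near 0 and near 179), so thresholding a red object may need two `cv2.inRange` calls whose masks are combined with `cv2.bitwise_or`.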

### Stacking Images

In [None]:
# convert the gray image into a rank 3 tensor to match the format of the colored images
snapshot_gray_img = np.stack([snapshot_gray] * 3, axis = 2)

# stack up the images - like you would stack matrices
stacked = np.hstack([snapshot_hsv, snapshot, snapshot_gray_img])

# display the new image
display_image = cv2.imencode('.png', stacked)[1].tobytes()
widgets.Image(value=display_image)

### Displaying Text on an Image

In [None]:
#convert to HSV
snapshot_hsv = cv2.cvtColor(snapshot, cv2.COLOR_BGR2HSV)

# display a text on the image
text_string = "HSV"
position = (5, 50) #of bottom left corner of the text
font = cv2.FONT_HERSHEY_SIMPLEX
font_size = 2
font_color = (255, 255, 255) # Remember: colors are given in BGR order
font_thickness = 2
cv2.putText(snapshot_hsv, text_string, position,
        font, font_size, font_color, font_thickness)

#display the image
display_image = cv2.imencode('.png', snapshot_hsv)[1].tobytes()
widgets.Image(value=display_image)

### Exercise 5

Load the video that you took in the first part of this tutorial and convert it to grayscale. You can either save the result as a file or display the gray frames directly. Either way, show your result below.

In [None]:
foo = cv2.VideoCapture("capture.webm")

img_display = widgets.Image()
display(img_display)

while True:
    ret, frame = foo.read()
    if not ret:
        break
    
    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        
    display_image = cv2.imencode('.png', frame_gray)[1].tobytes()
    img_display.value = display_image
    
foo.release()

### Exercise 6

Record a video that is at least 10 seconds long. Then, use this video and place a label in the top right corner denoting that it is in BGR format. After 5 seconds convert the remaining video into HSV and change the label to indicate that the video is now in HSV.

In [None]:
foo = cv2.VideoCapture("capture.webm")

img_display = widgets.Image()
display(img_display)

fps = 24  # assumed frame rate of the recording; adjust it if yours differs
counter = 0
while True:
    ret, frame = foo.read()
    if not ret:
        break
    
    if counter < fps * 5:
        label = "BGR"
    else:
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        label = "HSV"

    # place the label in the top right corner: measure the text width first
    (text_w, text_h), baseline = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 2, 2)
    position = (frame.shape[1] - text_w - 5, 50)
    cv2.putText(frame, label, position,
            cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 255, 255), 2)
  
    display_image = cv2.imencode('.png', frame)[1].tobytes()
    img_display.value = display_image
    counter += 1

foo.release()

### Exercise 7

Write a piece of code that loads the video you recorded in the first part of the tutorial and, for each frame, converts it into both a grayscale image and an HSV image. Then, stack the images in the order HSV - BGR (original) - gray and write them to disk.

In [None]:
foo = cv2.VideoCapture("capture.webm")
writer = cv2.VideoWriter("exercise7.mp4", cv2.CAP_ANY, cv2.VideoWriter_fourcc(*"H264"), 24, (640*3,480))

img_display = widgets.Image()
display(img_display)

while True:
    ret, frame = foo.read()
    if not ret:
        break
    
    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    frame_hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    
    # give the gray image 3 channels so that it can be stacked with the color images
    frame_gray_img = np.stack([frame_gray] * 3, axis = 2)
    frame_stacked = np.hstack([frame_hsv, frame, frame_gray_img])
    
    display_image = cv2.imencode('.png', frame_stacked)[1].tobytes()
    img_display.value = display_image
    
    writer.write(frame_stacked)
foo.release()
writer.release()

In [None]:
widgets.Video.from_file("exercise7.mp4")

## How to perform color thresholding to detect the position of objects

First you will have to record an image that contains an object we want to detect. For this, grab an object that has a solid color, such as a pen, or a banana.

In [None]:
camera = CameraStream(constraints=
                      {'facing_mode': 'user',
                       'audio': False,
                       'video': { 'width': 640, 'height': 480 }
                       })

In [None]:
recorder = ImageRecorder(stream=camera)
recorder

In [None]:
camera.close()

In [None]:
snapshot = recorder.image.value
snapshot = np.frombuffer(snapshot, dtype=np.uint8)
snapshot = cv2.imdecode(snapshot, cv2.IMREAD_COLOR)

Using `inRange` you can filter an interval for each color channel. For each pixel, the function checks for every channel whether
```
lower <= pixel <= upper
```
(both bounds are inclusive) and creates a mask. The result is a 2-dimensional tensor (matrix) in which a value is 255 if all channels satisfy the above condition, or 0 if at least one of the color values of the pixel lies outside the specified range. You can then use this mask to remove the parts of the image that you are not interested in, i.e., set them to black.
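To make the mask semantics concrete, here is a NumPy-only sketch of what `cv2.inRange` computes for a 3-channel image: a pixel is kept only if every channel lies inside its inclusive interval, and kept pixels get the value 255. The helper name `in_range` and the pixel values are hypothetical; in the notebook you would call `cv2.inRange` directly.

```python
import numpy as np

def in_range(img, lower, upper):
    """NumPy version of cv2.inRange for an (H, W, C) image: returns a uint8 mask."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    inside = np.all((img >= lower) & (img <= upper), axis=-1)  # every channel in range
    return inside.astype(np.uint8) * 255

img = np.array([[[10, 200, 50], [10, 100, 50]]], dtype=np.uint8)  # two BGR pixels
mask = in_range(img, [0, 150, 0], [20, 255, 60])
print(mask)   # [[255   0]]

# applying the mask: keep pixels where the mask is 255, set the rest to black
masked = np.where(mask[..., None] == 255, img, 0)
```

The last line should give the same result as `cv2.bitwise_and(img, img, mask=mask)`, which the next cell uses.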

In [None]:
disp = widgets.Image()

def threshold(img, lower, upper):
    mask = cv2.inRange(img, np.array(lower), np.array(upper))
    masked = cv2.bitwise_and(img, img, mask = mask)
        
    display_image = cv2.imencode('.png', masked)[1].tobytes()
    disp.value = display_image

To explore this, you can use the `interact` function from ipywidgets. It lets you set the upper and lower threshold values dynamically using sliders. Try moving them around and see if you can isolate the object.

In [None]:
from ipywidgets import interact, IntSlider

def foo(lower_r, lower_g, lower_b, upper_r, upper_g, upper_b):
    lower = [lower_b, lower_g, lower_r]
    upper = [upper_b, upper_g, upper_r]
    threshold(snapshot, lower, upper)
    display(lower, upper)


widgets.interact(foo, 
                 lower_r=IntSlider(min=0, max=255, step=1, value=0), 
                 lower_g=IntSlider(min=0, max=255, step=1, value=0),
                 lower_b=IntSlider(min=0, max=255, step=1, value=0),
                 upper_r=IntSlider(min=0, max=255, step=1, value=255),
                 upper_g=IntSlider(min=0, max=255, step=1, value=255), 
                 upper_b=IntSlider(min=0, max=255, step=1, value=255))

In [None]:
display(disp)

While this is already quite potent, you may have noticed that the threshold values are quite sensitive to changes in illumination. You can test this by shining a light, e.g., your phone's flash, onto the object and taking a picture. This should create a nice color gradient and you will see the problem.

To make the detection more robust, it is common to use the HSV color space that we saw above. Because color is (roughly) separated from illumination, changes in illumination don't affect the Hue (color) channel as much. Hence, even if there is a color gradient on the object, you should still be able to detect it entirely, as long as the gradient isn't too stark.
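A quick way to check this claim is to take one pixel, halve its brightness (roughly what a shadow does), and compare the numbers in both color spaces. The sketch uses Python's `colorsys` module with OpenCV's 8-bit HSV scaling (H in 0 to 179); the pixel values are made up for illustration and `to_opencv_hsv` is a hypothetical helper.

```python
import colorsys

def to_opencv_hsv(b, g, r):
    # OpenCV's 8-bit convention: H in 0..179, S and V in 0..255
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 180) % 180, round(s * 255), round(v * 255)

lit = (40, 160, 220)                       # a yellow-orange pixel in BGR
shadow = tuple(c // 2 for c in lit)        # same surface, half the light

print("BGR:", lit, "->", shadow)           # BGR: (40, 160, 220) -> (20, 80, 110)
print("HSV:", to_opencv_hsv(*lit), "->", to_opencv_hsv(*shadow))
# HSV: (20, 209, 220) -> (20, 209, 110)
```

All three BGR channels change, while only V changes in HSV. Real shadows also shift the color slightly and add sensor noise, so in practice hue varies a little too, which is why the thresholds still need some slack.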

Let's look at the difference in code, and explore it similar to what we did previously.

In [None]:
disp2 = widgets.Image()

def thresholdHSV(img, lower, upper):
    img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)    
    mask = cv2.inRange(img_hsv, np.array(lower), np.array(upper))
    masked = cv2.bitwise_and(img, img, mask = mask)
    
    display_image = cv2.imencode('.png', masked)[1].tobytes()
    disp2.value = display_image

In [None]:
from ipywidgets import interact, IntSlider

def foo2(lower_H, lower_S, lower_V, upper_H, upper_S, upper_V):
    lower = [lower_H, lower_S, lower_V]
    upper = [upper_H, upper_S, upper_V]
    thresholdHSV(snapshot, lower, upper)
    display(lower, upper)

widgets.interact(foo2, 
                 # OpenCV stores 8-bit hue as 0-179 (degrees / 2)
                 lower_H=IntSlider(min=0, max=179, step=1, value=0), 
                 lower_S=IntSlider(min=0, max=255, step=1, value=0),
                 lower_V=IntSlider(min=0, max=255, step=1, value=0),
                 upper_H=IntSlider(min=0, max=179, step=1, value=179),
                 upper_S=IntSlider(min=0, max=255, step=1, value=255), 
                 upper_V=IntSlider(min=0, max=255, step=1, value=255))

In [None]:
disp2

Detecting the pixels that correspond to an object is nice; however, we usually want to do more than just know which pixels might be part of the object. This is where contours come in handy.

A contour is a curve that joins neighboring pixels of equal intensity (the same value), much like a contour line on a map. In color images it is hard to specify what "equal value" means, so this type of analysis is typically done on grayscale or binary images. For a binary image such as the mask, finding the outer contours is closely related to connected component analysis: each outer contour outlines one connected blob of white pixels. `findContours` is the function that performs this analysis for you; its usage is shown below.
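The connected-component idea can be sketched in plain Python and NumPy: every group of touching non-zero mask pixels forms one blob, and an outer contour traces the boundary of each blob. The function `label_blobs` below is a hypothetical illustration using a breadth-first flood fill with 8-connectivity (diagonal neighbors count as touching, which is also the default of `cv2.connectedComponents`); OpenCV's own functions do this much faster.

```python
from collections import deque
import numpy as np

def label_blobs(mask):
    """Label 8-connected groups of non-zero pixels; returns (labels, count)."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)   # 0 = background
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y, x] == 0 or labels[y, x] != 0:
                continue
            count += 1                           # found a new blob
            labels[y, x] = count
            queue = deque([(y, x)])
            while queue:                         # flood fill the blob
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] != 0 and labels[ny, nx] == 0):
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    return labels, count

mask = np.array([[255, 255,   0,   0],
                 [255,   0,   0, 255],
                 [  0,   0,   0, 255]], dtype=np.uint8)
labels, count = label_blobs(mask)
print(count)    # 2
print(labels)
```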

After you have computed the contours in the image, you may want to know where the detected shape is located. For this you can compute the [image moments](https://en.wikipedia.org/wiki/Image_moment) of the detected shape. The term moment comes from physics, where the moments of a mass distribution describe, for example, its center of mass and its moment of inertia. We treat the detected shape as an object with a certain amount of mass located at each pixel and compute the moments that an object with that shape and mass distribution would have. With this trick, it is possible to compute the center of mass of an object (as well as its area, orientation, ...).
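For a binary mask the moments reduce to simple sums: treat every foreground pixel as unit mass, then m00 is the area, m10 and m01 are the sums of the x and y coordinates, and the center of mass is (m10/m00, m01/m00). The NumPy sketch below, with a hypothetical helper `mask_centroid`, computes these raw moments directly from a mask; note that `cv2.moments(contour)` integrates over the contour polygon instead, so its values can differ slightly from a pixel count.

```python
import numpy as np

def mask_centroid(mask):
    """Center of mass (x, y) of the non-zero pixels of a 2-D mask, or None if empty."""
    ys, xs = np.nonzero(mask)          # row (y) and column (x) of every foreground pixel
    m00 = len(xs)                      # zeroth moment: area in pixels
    if m00 == 0:
        return None                    # nothing detected -> no center
    m10 = xs.sum()                     # first moment along x
    m01 = ys.sum()                     # first moment along y
    return float(m10 / m00), float(m01 / m00)

mask = np.zeros((5, 7), dtype=np.uint8)
mask[1:4, 2:6] = 255                   # 3x4 rectangle: rows 1-3, columns 2-5
print(mask_centroid(mask))             # (3.5, 2.0)
```

The `m00 == 0` check mirrors the guard in the OpenCV cell below, which skips contours with zero area to avoid a division by zero.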

In the example below we compute the center of mass, and then put a dot there and label it as center. You can play around with the image again, and see what it does.

In [None]:
disp2 = widgets.Image()

def thresholdHSV(img, lower, upper):
    img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)    
    mask = cv2.inRange(img_hsv, np.array(lower), np.array(upper))
    masked = cv2.bitwise_and(img, img, mask = mask)
    
    contours, hierarchy = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    cv2.drawContours(masked, contours, -1, (0, 0, 255), 2)
    
    for c in contours:
        # compute the center of the contour
        M = cv2.moments(c)
        if M["m00"] == 0:
            continue

        cX = int(M["m10"] / M["m00"])
        cY = int(M["m01"] / M["m00"])
        cv2.circle(masked, (cX, cY), 7, (255, 255, 255), -1)
        cv2.putText(masked, "center", (cX - 20, cY - 20),
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)
    
    display_image = cv2.imencode('.png', masked)[1].tobytes()
    disp2.value = display_image

In [None]:
from ipywidgets import interact, IntSlider

def foo2(lower_H, lower_S, lower_V, upper_H, upper_S, upper_V):
    lower = [lower_H, lower_S, lower_V]
    upper = [upper_H, upper_S, upper_V]
    thresholdHSV(snapshot, lower, upper)
    display(lower, upper)

widgets.interact(foo2, 
                 # OpenCV stores 8-bit hue as 0-179 (degrees / 2)
                 lower_H=IntSlider(min=0, max=179, step=1, value=0), 
                 lower_S=IntSlider(min=0, max=255, step=1, value=0),
                 lower_V=IntSlider(min=0, max=255, step=1, value=0),
                 upper_H=IntSlider(min=0, max=179, step=1, value=179),
                 upper_S=IntSlider(min=0, max=255, step=1, value=255), 
                 upper_V=IntSlider(min=0, max=255, step=1, value=255))

In [None]:
disp2

As you can see, the algorithm picks up some artifacts and treats them as objects, too. How to deal with this depends on the concrete scenario. Here we can assume that artifacts are small compared to the desired object, so all but the largest contour can be discarded.
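`cv2.contourArea` computes a polygon's area with Green's formula, i.e. the shoelace formula over the contour points. The sketch below re-implements that formula in NumPy for contours shaped like OpenCV's `(N, 1, 2)` point arrays and picks the largest one with `max(..., key=...)`, guarding against an empty list, which is what you get whenever no pixel passes the threshold. Both helper names are hypothetical.

```python
import numpy as np

def polygon_area(contour):
    """Shoelace formula for a closed polygon given as an (N, 1, 2) or (N, 2) array."""
    pts = np.asarray(contour, dtype=np.float64).reshape(-1, 2)
    x, y = pts[:, 0], pts[:, 1]
    # sum of cross products of consecutive vertices (np.roll closes the polygon)
    return abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1))) / 2.0

def largest_contour(contours):
    """Return the contour with the biggest area, or None if the list is empty."""
    if len(contours) == 0:
        return None
    return max(contours, key=polygon_area)

small = np.array([[[0, 0]], [[2, 0]], [[2, 2]], [[0, 2]]])          # 2x2 square
big = np.array([[[10, 10]], [[20, 10]], [[20, 15]], [[10, 15]]])    # 10x5 box
print(polygon_area(small), polygon_area(big))   # 4.0 50.0
print(largest_contour([small, big]) is big)     # True
print(largest_contour([]))                      # None
```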

In [None]:
disp2 = widgets.Image()

def thresholdHSV(img, lower, upper):
    img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)    
    mask = cv2.inRange(img_hsv, np.array(lower), np.array(upper))
    # draw on a copy of the full image
    # (use cv2.bitwise_and(img, img, mask = mask) to show only the thresholded pixels)
    masked = img.copy()
    
    contours, hierarchy = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    
    # contours is empty when no pixel passes the threshold; then there is nothing to draw
    if contours:
        # keep only the largest contour; smaller ones are assumed to be artifacts
        biggest = max(contours, key=cv2.contourArea)
        # drawContours expects a list of contours, so wrap the single contour in []
        cv2.drawContours(masked, [biggest], -1, (0, 0, 255), 2)
        M = cv2.moments(biggest)
        if M["m00"] > 0:  # m00 is the area; it is 0 for degenerate contours
            cX = int(M["m10"] / M["m00"])
            cY = int(M["m01"] / M["m00"])
            cv2.circle(masked, (cX, cY), 7, (255, 255, 255), -1)
            cv2.putText(masked, "center", (cX - 20, cY - 20),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)
    
    display_image = cv2.imencode('.png', masked)[1].tobytes()
    disp2.value = display_image

In [None]:
from ipywidgets import interact, IntSlider

def foo2(lower_H, lower_S, lower_V, upper_H, upper_S, upper_V):
    lower = [lower_H, lower_S, lower_V]
    upper = [upper_H, upper_S, upper_V]
    thresholdHSV(snapshot, lower, upper)
    display(lower, upper)

widgets.interact(foo2, 
                 # OpenCV stores 8-bit hue as 0-179 (degrees / 2)
                 lower_H=IntSlider(min=0, max=179, step=1, value=0), 
                 lower_S=IntSlider(min=0, max=255, step=1, value=0),
                 lower_V=IntSlider(min=0, max=255, step=1, value=0),
                 upper_H=IntSlider(min=0, max=179, step=1, value=179),
                 upper_S=IntSlider(min=0, max=255, step=1, value=255), 
                 upper_V=IntSlider(min=0, max=255, step=1, value=255))

In [None]:
disp2

### Exercise 8

Record a video in which you move the object you used for color thresholding around the scene. Then, using the threshold values you found while experimenting with the sliders above, write a snippet that reads in the video file and performs the following steps:

1. convert each frame into HSV
2. perform color thresholding on the HSV image
3. estimate the position of the object based on the thresholded image
4. draw a circle at the estimated center of mass of the object
5. stack the original image and the thresholded image horizontally
6. label each image
7. create a video from the frames

Your output should look somewhat similar to the example video below.

In [7]:
camera = CameraStream(constraints=
                      {'facing_mode': 'user',
                       'audio': False,
                       'video': { 'width': 640, 'height': 480 }
                       })
camera

In [3]:
recorder2 = VideoRecorder(stream=camera)
recorder2

In [8]:
camera.close()

In [9]:
with open('color_threshold.webm', 'wb') as out_file:
    out_file.write(recorder2.video.value)

In [10]:
foo = cv2.VideoCapture("color_threshold.webm")
writer = cv2.VideoWriter("exercise8.mp4", cv2.CAP_ANY, cv2.VideoWriter_fourcc(*"X264"), 12, (2*640,480))

# threshold values found with the HSV sliders above; adjust them to your object and lighting
lower_hsv = [15, 67, 95]
upper_hsv = [33, 200, 229]

while True:
    ret, frame = foo.read()
    if not ret:
        break
    
    # color threshold using HSV
    frame_hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)   
    mask = cv2.inRange(frame_hsv, np.array(lower_hsv), np.array(upper_hsv))
    masked_hsv = cv2.bitwise_and(frame, frame, mask = mask)
    
    # find the outer contours of the thresholded regions
    contours, hierarchy = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    
    # frames in which the object is not visible have no contours;
    # indexing into the empty list would raise an IndexError, so skip drawing then
    if contours:
        # keep the biggest contour; smaller ones are treated as artifacts
        biggest = max(contours, key=cv2.contourArea)
        # drawContours expects a list of contours, so wrap the single contour in []
        cv2.drawContours(masked_hsv, [biggest], -1, (255, 255, 255), 2)
        cv2.drawContours(frame, [biggest], -1, (0, 0, 255), 2)

        # compute the center of the biggest contour (m00 is its area)
        M = cv2.moments(biggest)
        if M["m00"] > 0:
            cX = int(M["m10"] / M["m00"])
            cY = int(M["m01"] / M["m00"])
            cv2.circle(masked_hsv, (cX, cY), 7, (255, 255, 255), -1)
            cv2.putText(masked_hsv, "center", (cX - 20, cY - 20),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)
            cv2.circle(frame, (cX, cY), 7, (0, 0, 255), -1)
            cv2.putText(frame, "center", (cX - 20, cY - 20),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)

    # label both halves of the output frame
    cv2.putText(masked_hsv, "HSV", (5, 30),
            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 4)
    cv2.putText(frame, "Original", (5, 30),
            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 4)
    
    frame_stacked = np.hstack([frame, masked_hsv])
    writer.write(frame_stacked)
foo.release()
writer.release()

In [11]:
widgets.Video.from_file("exercise8.mp4")

## Face Detection in OpenCV

As the final topic of this lab, you will look into face detection using Haar cascades, one of the classic methods to detect faces. It is computationally lightweight and often used as the first step in computer vision pipelines that process faces.

Although deep learning is definitely on the rise and is likely to replace Haar cascades at some point, there are still limitations when it comes to running deep networks in real time (25 FPS and more) or on embedded devices such as your smartphone. Hence it is good to know about both approaches.

![Some example Haar Features](https://i.pinimg.com/originals/1d/ce/fc/1dcefc0ea496c458cf73cc6721c055b4.jpg)

So what are Haar cascades? A Haar cascade is a combination of two things: (1) a set of [Haar-like features](https://en.wikipedia.org/wiki/Haar-like_feature), and (2) a cascade classifier. A Haar feature is a function that computes a real number from an image; you can see a visualization of some example Haar features in the image above. It is computed by summing all pixel values of the original image that are covered by the white region and subtracting from that all pixel values that are covered by the black region. This results in a single real number, and if you use a whole bunch of them, you can create a feature vector from an image. A cascade classifier, on the other hand, is a function that assigns a label to a set of features. It does so in stages, testing more and more features at each stage (hence the name cascade), and it tries to reject a non-face example as early as possible.
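Haar features are cheap to evaluate because of the integral image (summed-area table) used in the Viola-Jones detector: after one pass over the image, the sum of any rectangle needs only four lookups. The NumPy sketch below builds an integral image and evaluates a hypothetical two-rectangle feature (left half white, right half black); the feature layout and helper names are illustrative, not the ones stored in OpenCV's cascade files.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with an extra zero row/column: ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y), using four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    """Left half (white) minus right half (black) of a w x h window (w even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# 4x4 image: bright left half, dark right half -> a strong vertical-edge response
img = np.array([[9, 9, 1, 1]] * 4, dtype=np.uint8)
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 4, 4))          # 80  (sum of the whole image)
print(two_rect_feature(ii, 0, 0, 4, 4))  # 64  (72 - 8)
```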

A Haar cascade detects the position of a face using a sliding window. That is, it takes the input image, chops it up into many small, overlapping parts, and classifies each part as either a face or not a face; this is repeated at several image scales so that faces of different sizes can be found. The majority of these parts will not be faces, and this is what makes Haar cascades so fast: they reject these regions quickly without spending much computation on them.
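The sliding-window search can be sketched as two nested steps: slide a fixed-size window across the image, then shrink the image by a scale factor and repeat, so that the same window covers larger faces at coarser scales. The generator below only enumerates candidate boxes in original-image coordinates; it is a simplified, hypothetical illustration (the window size and step are assumptions, and 1.1 mirrors the default `scaleFactor` of `detectMultiScale`), not OpenCV's implementation, which also classifies each window and merges overlapping hits.

```python
def sliding_windows(img_w, img_h, win=24, step=4, scale=1.1):
    """Yield candidate boxes (x, y, w, h) in original-image coordinates."""
    factor = 1.0
    # stop once the downscaled image is smaller than the window
    while img_w / factor >= win and img_h / factor >= win:
        scaled_w, scaled_h = int(img_w / factor), int(img_h / factor)
        for y in range(0, scaled_h - win + 1, step):
            for x in range(0, scaled_w - win + 1, step):
                # map the window back to the original image size
                yield (int(x * factor), int(y * factor),
                       int(win * factor), int(win * factor))
        factor *= scale

boxes = list(sliding_windows(48, 48))
print(len(boxes), boxes[0], boxes[-1])
```

Even this tiny 48x48 image produces over a hundred candidate windows, which is why a cascade that rejects most of them after a few features matters.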

In OpenCV you can do all of this with a single function call: `cv2.CascadeClassifier.detectMultiScale`. You have to supply the classifier with a trained cascade (the set of features to use) and an image to classify. Fortunately, OpenCV provides pretrained cascades, and a sample one (`frontal_face_features.xml`) is located in the folder of this notebook. First, take a picture to detect a face in.

In [None]:
camera = CameraStream(constraints=
                      {'facing_mode': 'user',
                       'audio': False,
                       'video': { 'width': 640, 'height': 480 }
                       })
camera

In [None]:
recorder = ImageRecorder(stream=camera)
recorder

In [None]:
camera.close()

In [None]:
snapshot = recorder.image.value
snapshot = np.frombuffer(snapshot, dtype=np.uint8)
snapshot = cv2.imdecode(snapshot, cv2.IMREAD_COLOR)

Next, create the classifier object that will perform the face detection.

In [None]:
face_cascade = cv2.CascadeClassifier('frontal_face_features.xml')

Its `detectMultiScale` method takes a grayscale image as input and outputs a list of bounding boxes (rectangles) `(x, y, w, h)` that contain faces.

In [None]:
snapshot_gray = cv2.cvtColor(snapshot, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(snapshot_gray)

All that is left is to draw these rectangles onto the image to visualize the detections.

In [None]:
detection = snapshot.copy()
for (x, y, w, h) in faces:
    cv2.rectangle(detection, (x, y), (x+w, y+h), (255, 0, 0), 2)

In [None]:
display_image = cv2.imencode('.png', detection)[1].tobytes()
widgets.Image(value=display_image)

### Exercise 9

Write a piece of code that loads a video and, for each frame, (1) performs face detection and (2) draws the bounding boxes of the detections.

In [None]:
foo = cv2.VideoCapture("capture.webm")
writer = cv2.VideoWriter("exercise9.mp4", cv2.CAP_ANY, cv2.VideoWriter_fourcc(*"X264"), 12, (640,480))

face_cascade = cv2.CascadeClassifier('frontal_face_features.xml')

while True:
    ret, frame = foo.read()
    if not ret:
        break
    
    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    
    # Processing
    faces = face_cascade.detectMultiScale(frame_gray)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2)
            
    writer.write(frame)

foo.release()
writer.release()

### Exercise 10 (Bonus)

*Note: For this exercise I will only provide the keywords you need to search for, instead of example snippets for each keyword. Sooner or later you will have to consult online resources, e.g., the documentation, if you decide to do your own projects, so it is good to start now.*

Using the code you have written above, replace the rectangle that visualizes the detection with an image that is shown on top of a face (for example an emoji). For this, you will have to do the following:

1. Remove the line that creates the rectangle in the frame
2. Load the image that should be displayed on top of a frame before the main loop (Note: You want a .png image to allow for transparent pixels and use the right option of `cv2.imread` to load the alpha channel)
3. Split the loaded image into the BGR part (normal image) and the alpha channel (mask)
4. Resize the image to fit the region in which a face was detected (using `cv2.resize`, which expects the target size as (width, height))
5. Perform `alpha blending` between the image and the frame
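Alpha blending mixes two images pixel by pixel: out = (1 - alpha) * background + alpha * overlay, with alpha scaled to [0, 1]. The NumPy sketch below, with a hypothetical helper `alpha_blend`, shows the formula on tiny arrays; it assumes the overlay is BGRA (as loaded with `cv2.IMREAD_UNCHANGED`) and has already been resized to the size of the region it covers.

```python
import numpy as np

def alpha_blend(background, overlay_bgra):
    """Blend a BGRA overlay onto a same-sized BGR background; returns uint8 BGR."""
    bgr = overlay_bgra[..., :3].astype(np.float64)
    alpha = overlay_bgra[..., 3:4].astype(np.float64) / 255.0   # keep a channel axis
    out = (1.0 - alpha) * background.astype(np.float64) + alpha * bgr
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

background = np.full((1, 3, 3), 100, dtype=np.uint8)     # gray 1x3 region
overlay = np.zeros((1, 3, 4), dtype=np.uint8)
overlay[..., 2] = 255                                     # red overlay (BGR order)
overlay[..., 3] = [0, 128, 255]                           # transparent, half, opaque
print(alpha_blend(background, overlay)[0].tolist())
# [[100, 100, 100], [50, 50, 178], [0, 0, 255]]
```

Keeping the alpha channel as shape `(..., 1)` lets NumPy broadcast it over the three color channels.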

In [None]:
camera = CameraStream(constraints=
                      {'facing_mode': 'user',
                       'audio': False,
                       'video': { 'width': 640, 'height': 480 }
                       })
camera

In [None]:
recorder = ImageRecorder(stream=camera)
recorder

In [None]:
snapshot = recorder.image.value
snapshot = np.frombuffer(snapshot, dtype=np.uint8)
snapshot = cv2.imdecode(snapshot, cv2.IMREAD_COLOR)

In [None]:
foo = cv2.VideoCapture("capture.webm")
writer = cv2.VideoWriter("exercise10.mp4", cv2.CAP_ANY, cv2.VideoWriter_fourcc(*"X264"), 12, (640,480))

# IMREAD_UNCHANGED keeps the alpha channel (the .png must have one)
img = cv2.imread('viking.png',  cv2.IMREAD_UNCHANGED)
mask = img[:,:,3]   # alpha channel: 0 = transparent, 255 = opaque
img = img[:,:,:3]   # BGR part of the overlay

mask = mask.astype(float) / 255

face_cascade = cv2.CascadeClassifier('frontal_face_features.xml')

while True:
    ret, frame = foo.read()
    if not ret:
        break
    
    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    
    # Processing
    faces = face_cascade.detectMultiScale(frame_gray)
    for (x, y, w, h) in faces:
        # cv2.resize expects the target size as (width, height)
        resized_img = cv2.resize(img, (w, h)).astype(float)
        resized_mask = cv2.resize(mask, (w, h))
        frame_float = frame[y:y+h, x:x+w].astype(float)
        # alpha blending: (1 - alpha) * frame + alpha * overlay, per pixel
        blend = (1-resized_mask[..., None]) * frame_float + resized_mask[..., None] * resized_img
        frame[y:y+h, x:x+w] = blend.astype(np.uint8)
            
    writer.write(frame)

foo.release()
writer.release()