# Course 1: Getting Started with Computer Vision

In this notebook, we learn how to use OpenCV to perform some basic operations on images.

The session starts at 13:00 (1 p.m.).

## Before we start

Install OpenCV first by running the following command in your terminal:

```bash
pip install opencv-python
```

### Parts

1. Recognition: recognise what an object is, including classification;
2. Target Detection: count the objects and locate their positions;
3. Action Recognition: recognise actions, including gestures;
4. Generation: generate images from prompts.

### Application

Facial recognition, facial detection, object detection, image classification, image generation, etc. Note the difference between recognition (identifying who the person is) and detection (determining whether the object is a person).

Computer vision is the process by which a machine automatically processes images or video and extracts information from them.

### Coordinate System

The top-left corner is the origin. The x-axis runs from left to right and the y-axis from top to bottom, both starting from `0`.

We can regard an image as a matrix. Typically, a grayscale image is a 2D matrix (height × width), and an RGB image is a 3D matrix (height × width × channels).

`OpenCV` stores colors in `BGR` order, [Blue, Green, Red], instead of `RGB`.

We usually write a point as (x, y) in the coordinate system, but in the matrix the same point is indexed as (y, x), that is, [row, column].
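
As a small illustration of the (x, y) versus (y, x) convention, the sketch below builds a blank image with NumPy and colors one pixel. The image size and the chosen point are arbitrary values picked for the example.

```python
import numpy as np

# A blank color image: shape is (height, width, channels), i.e. (rows, columns, 3)
height, width = 4, 6
image = np.zeros((height, width, 3), dtype=np.uint8)

# The point (x=5, y=2) in image coordinates ...
x, y = 5, 2

# ... is indexed as [y, x] in the matrix. Channels are in BGR order,
# so (255, 0, 0) is pure blue, not red.
image[y, x] = (255, 0, 0)

blue, green, red = image[y, x]
print(image.shape)       # (4, 6, 3)
print(blue, green, red)  # 255 0 0
```

Indexing with `image[x, y]` instead would raise an `IndexError` here, because `x = 5` is outside the 4 rows.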

## OpenCV - A Brief Introduction

The basic operations of OpenCV include reading images, converting colors, etc. They are quite simple, so I won't go into detail here.

### Basic Image Operations

1. Read the image.
2. Show the image.
3. Save the image.
4. Close the window.
5. Convert the color.
6. Get the pixel value.
7. Set the pixel value.
8. Get the image information.
9. Crop the image.
10. Merge images.
11. Resize the image.
12. Rotate the image.
13. Flip the image.
14. Draw shapes.
15. Add text.
16. Add a border.

#### File I/O

In [16]:
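import cv2

# Read the image from disk (imread returns None if the file cannot be read)
image = cv2.imread('opencv/cats.jpeg')

# Show the image until a key is pressed, then close the window
cv2.imshow('Cats', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

# Save a copy; the output format is chosen from the file extension
cv2.imwrite('opencv/cats.png', image)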

True

#### Convert the Color

With `cv2.cvtColor`, we can convert an image from one color space to another. The first parameter is the image, and the second parameter is the color space conversion code.

In [17]:
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

cv2.imshow('Gray', gray)
cv2.waitKey(0)
cv2.destroyAllWindows()

#### Pixel Access

Because the image is a matrix, we can read or set a pixel value by indexing the matrix with [row, column], that is, [y, x].

In [18]:
import numpy as np

pixel_114_514 = image[114, 514]
print(pixel_114_514)
image[114, 514] = np.array([191, 98, 10])

cv2.imshow('Cats', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

[252 252 252]


#### Image Information

We can get the image dimensions via the `shape` attribute, which is ordered as (height, width, channels).

To read EXIF metadata, we can use the `exifread` library.

In [19]:
image.shape

(768, 1024, 3)

#### Image Cropping

We can crop the image by slicing the matrix: rows (y) first, then columns (x).

In [20]:
target = image[100:200, 200:300]

cv2.imshow('Target', target)
cv2.waitKey(0)
cv2.destroyAllWindows()
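
One detail worth knowing is that a NumPy slice is a view on the original image, not a copy. The sketch below uses a blank image with the same shape as the example photo (an assumption made so that no file is needed) to show the difference.

```python
import numpy as np

image = np.zeros((768, 1024, 3), dtype=np.uint8)

# Rows 100..199 and columns 200..299: slicing is [y1:y2, x1:x2]
view = image[100:200, 200:300]
copy = image[100:200, 200:300].copy()

# Writing into the view also changes the original image ...
view[:] = 255
# ... while writing into the copy does not
copy[:] = 0

print(view.shape)       # (100, 100, 3)
print(image[150, 250])  # [255 255 255]
print(np.shares_memory(image, view), np.shares_memory(image, copy))  # True False
```

Calling `.copy()` on the crop is the safer choice when the crop will be drawn on or modified.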

#### Image Enhancement

We can enhance an image with simple transformations such as rotating, flipping, and resizing.

In [21]:
rotated = cv2.rotate(image, cv2.ROTATE_90_CLOCKWISE)
flipped = cv2.flip(image, 1)
resized = cv2.resize(image, (300, 300))

cv2.imshow('Rotated', rotated)
cv2.waitKey(0)
cv2.destroyAllWindows()

cv2.imshow('Flipped', flipped)
cv2.waitKey(0)
cv2.destroyAllWindows()

cv2.imshow('Resized', resized)
cv2.waitKey(0)
cv2.destroyAllWindows()

#### Drawing

Three common drawing operations are lines, rectangles, and text.

- Line: `cv2.line`
- Rectangle: `cv2.rectangle`
- Text: `cv2.putText`

In [22]:
with_line = image.copy()
cv2.line(with_line, (0, 0), (200, 200), (255, 0, 0), 5)

cv2.imshow('With Line', with_line)
cv2.waitKey(0)
cv2.destroyAllWindows()

with_rectangle = image.copy()
cv2.rectangle(with_rectangle, (100, 100), (200, 200), (0, 255, 0), 3)

cv2.imshow('With Rectangle', with_rectangle)
cv2.waitKey(0)
cv2.destroyAllWindows()

with_text = image.copy()
cv2.putText(with_text, 'Hello, OpenCV!', (100, 100), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

cv2.imshow('With Text', with_text)
cv2.waitKey(0)
cv2.destroyAllWindows()

### Video Operations

With `cv2.VideoCapture`, we can read a video and process it frame by frame. If your laptop has a camera, you can open it with device id `0`.

We should release the camera after using it.

In [24]:
camera = cv2.VideoCapture(0)

while camera.isOpened():
    ret, frame = camera.read()
    if not ret:
        break
    cv2.imshow('Frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

camera.release()
cv2.destroyAllWindows()

#### Video Reading

We can open a video file via `cv2.VideoCapture` and fetch its frames one at a time with `read`.

In [25]:
video = cv2.VideoCapture('opencv/orange.mp4')

while video.isOpened():
    ret, frame = video.read()
    if not ret:
        break
    cv2.imshow('Frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

video.release()
cv2.destroyAllWindows()

### Color Detection

We can detect colors in the HSV color space. It is more suitable for color detection than BGR because the color itself (hue) is separated from its brightness.

The HSV color space includes three components:

- Hue: The color type, represented as an angle on the color wheel.
- Saturation: The purity of the color.
- Value: The brightness of the color.

In OpenCV, the ranges for 8-bit images are:

- Hue: [0, 179]
- Saturation: [0, 255]
- Value: [0, 255]

We can use trackbars (`cv2.createTrackbar`) to adjust the color bounds interactively.

In [None]:
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

#### Masks

We can use a mask to filter by color. A mask is a binary image: pixels inside the color range are `255`, and all other pixels are `0`.

The `cv2.inRange` function creates the mask. Its first parameter is the image, and the second and third parameters are the lower and upper bounds of the color.

The `cv2.bitwise_and` function then applies the mask to the image to get the result.

For example, we extract the green part of the OpenCV logo.

In [29]:
lower = np.array([40, 50, 50])
upper = np.array([80, 255, 255])

image = cv2.imread('opencv/opencvlogo.jpg')

hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

mask = cv2.inRange(hsv, lower, upper)

result = cv2.bitwise_and(image, image, mask=mask)

cv2.imshow('Mask', mask)
cv2.waitKey(0)
cv2.destroyAllWindows()

### Edge Detection

Using convolution, we can detect the borders of objects.

Edge detection is an important research area in image processing and computer vision, especially in feature extraction. Its purpose is to identify the points in a digital image where the brightness changes sharply. These points usually lie between the target and background regions and are an important basis for image segmentation.

Commonly used edge detection algorithms include the Sobel, Laplacian, and Canny algorithms.

#### Sobel algorithm

![sobel](./opencv/sobel.png)

The Sobel algorithm approximates the first-order derivative by weighting the gray levels of the neighboring points above, below, left, and right of a pixel. The larger the derivative, the more drastic the change, and the more likely the pixel lies on an edge.

Sobel edge detection is directional: it can detect horizontal edges, vertical edges, or both.

In [31]:
image = cv2.imread('opencv/cats.jpeg')

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

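# Sobel kernel for the horizontal gradient (responds to vertical edges)
kernel = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1]
])

# Use a float output so that negative gradients are not clipped to 0,
# then take the absolute value and convert back to 8-bit for display
gradient = cv2.filter2D(gray, cv2.CV_64F, kernel)
result = cv2.convertScaleAbs(gradient)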

cv2.imshow('Result', result)
cv2.waitKey(0)
cv2.destroyAllWindows()

The cat is so cute!

### Contour Detection

A contour is a curve that joins connected points along a boundary and represents the basic shape of an object.

Contours and edges are closely related. Simply put, contours are continuous, while edges are not necessarily continuous (see the figure below). Edges are mainly used as image features, while contours are mainly used to analyze the shape of an object, such as its perimeter and area. In that sense, edges include contours.

Contour finding generally operates on black-and-white images, so threshold segmentation or Canny edge detection is usually applied first to get a binary image.

![](./opencv/contour.png)

Before detection, you should binarize the image via `cv2.threshold` or a mask.

In [36]:
masked = cv2.imread('opencv/mask.png')
gray = cv2.cvtColor(masked, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 240, 255, cv2.THRESH_BINARY)

contours, hierarchy = cv2.findContours(
    thresh,
    cv2.RETR_EXTERNAL,
    cv2.CHAIN_APPROX_SIMPLE)

cv2.drawContours(masked, contours, -1, (0, 255, 0), 3)

# Mark the bounding box and its center for the first contour, if one was found
if contours:
    x, y, w, h = cv2.boundingRect(contours[0])
    cv2.rectangle(masked, (x, y), (x + w, y + h), (0, 255, 255), 2)
    cv2.circle(masked, (x + w // 2, y + h // 2), 5, (255, 0, 0), -1)

cv2.imshow('Contours', masked)
cv2.waitKey(0)
cv2.destroyAllWindows()

Then you can compute the area of each contour. Discarding contours with a small area helps to filter out noise.

In [37]:
cv2.contourArea(contours[0])

3630.5

## Tasks

### Task 1: Track the ping-pong ball

The task is to track the ping-pong ball in the video.

The ping-pong ball is orange, and we detect it in the `HSV` color space.

#### Color Detection

First of all, we handle mouse clicks to find the color range. Clicking on the ball prints the HSV value of the pixel under the cursor:

In [2]:
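import cv2
import numpy as np

hsv = None  # HSV version of the current frame, read by the mouse callback

def get_color(event, x, y, flags, param):
    # Print the HSV value of the clicked pixel (note the [y, x] order)
    if event == cv2.EVENT_LBUTTONDOWN and hsv is not None:
        print(hsv[y, x])

# Register the callback once, before the loop
cv2.namedWindow('mouse trace')
cv2.setMouseCallback('mouse trace', get_color)

video = cv2.VideoCapture('opencv/orange.mp4')

while video.isOpened():
    ret, frame = video.read()
    if not ret:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    cv2.imshow('mouse trace', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

video.release()
cv2.destroyAllWindows()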

[151  32 231]
[  5 175 179]


##### The Color Range

A broad color range for orange is:

- Lower Bound: [0, 50, 50]
- Upper Bound: [15, 255, 255]

The code below uses a narrower range around the sampled value `[5 175 179]`. With it, we can detect the orange ball via the mask.

In [8]:
lower = np.array([0, 160, 160])
upper = np.array([10, 200, 200])

video = cv2.VideoCapture('opencv/orange.mp4')

while video.isOpened():
    ret, frame = video.read()
    if not ret:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # The mask returned by inRange is already binary (0 or 255)
    mask = cv2.inRange(hsv, lower, upper)

    contours, hierarchy = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cv2.drawContours(frame, contours, -1, (0, 255, 0), 3)

    # Box the largest contour; skip frames where the ball is not visible
    if contours:
        ball = max(contours, key=cv2.contourArea)
        x, y, w, h = cv2.boundingRect(ball)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 255), 2)

    cv2.imshow('Result', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

video.release()
cv2.destroyAllWindows()

### Task 2: Traffic Light Detection

The task is to detect the traffic light in the image, which can be red, yellow, or green, and to report which light is on.

#### Color Detection

Through color detection, we can get the mask of the traffic light. First, we click on the lamps to sample their HSV values.

In [40]:
import cv2

image = cv2.imread('opencv/redlight.jpg')

hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

def onMouse(event, x, y, flags, param):
    if event == cv2.EVENT_LBUTTONDOWN:
        print(hsv[y, x])

cv2.namedWindow('Traffic Light')
cv2.setMouseCallback('Traffic Light', onMouse)

cv2.imshow('Traffic Light', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

[ 27 219 100]
[ 27 219 100]
[ 26 214 105]
[ 26 222 108]
[ 26 222 108]
[ 26 222 108]
[ 23 220 108]
[12 93 63]
[ 23 209  84]
[ 23 143  82]
[ 24 211 105]
[ 24 226 106]
[ 28 244 203]
[ 29 242 198]
[ 28 239 197]


#### Handler for detecting the traffic light

Because the active light looks almost white in the photo, we detect the inactive lights instead. The color with the smallest detected area is the one that is on.

In [114]:
import cv2
import numpy as np
from cv2.typing import MatLike

def determine_color(image: MatLike):
    # Work in HSV and keep only the region that contains the traffic light
    image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    image = image[50:200, 720:1083]

    # HSV (lower, upper) bounds for the inactive lights
    color_list = {
        'red1': ([170, 90, 30], [180, 200, 80]),
        'red2': ([0, 90, 30], [10, 200, 100]),
        'yellow': ([20, 150, 90], [30, 230, 100]),
        'green': ([50, 150, 20], [65, 230, 80])
    }

    def detect_light(image, color):
        lower = np.array(color_list[color][0])
        upper = np.array(color_list[color][1])

        # The mask returned by inRange is already binary (0 or 255)
        mask = cv2.inRange(image, lower, upper)
        contours, hierarchy = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        cv2.drawContours(image, contours, -1, (255, 255, 255), 3)

        if len(contours) == 0:
            return image, 0

        # Get the biggest contour and use its area as the score
        biggest = max(contours, key=cv2.contourArea)
        amount = cv2.contourArea(biggest)

        x, y, w, h = cv2.boundingRect(biggest)
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 255), 2)

        return image, amount

    result1 = detect_light(image, 'red1')
    result2 = detect_light(image, 'red2')
    result3 = detect_light(image, 'yellow')
    result4 = detect_light(image, 'green')

    # Only the 'red2' range is used for the red score; 'red1' is computed but ignored
    red = result2[1]
    yellow = result3[1]
    green = result4[1]

    return red, yellow, green

(r, y, g) = determine_color(cv2.imread('opencv/greenlight.jpg'))

# The active light matches none of the ranges, so it has the smallest area
if r == min(r, y, g):
    print('Red')
elif y == min(r, y, g):
    print('Yellow')
else:
    print('Green')

Green
