In the world of computers, everything becomes numbers. 
That includes images. 
A computer sees an image as a grid of numbers, or mathematically speaking, a matrix. 
Each cell or element of the matrix represents a pixel of the image. 
The value of each cell can be:
- binary (0 or 1) for black & white images
- an integer from 0 to 255 for 8-bit grayscale images
- a set of 3 numbers for RGB images
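
As a small illustration of these three cases, the sketch below builds tiny images by hand with NumPy. The array names and pixel values are made up for the example and are not taken from the image used later.

```python
import numpy as np

# 2x2 binary image: each pixel is 0 (black) or 1 (white)
binary = np.array([[0, 1],
                   [1, 0]], dtype=np.uint8)

# 2x2 8-bit grayscale image: each pixel is an intensity from 0 to 255
gray = np.array([[0, 128],
                 [200, 255]], dtype=np.uint8)

# 2x2 RGB image: each pixel holds three numbers (red, green, blue)
rgb = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

print(binary.shape, gray.shape, rgb.shape)  # (2, 2) (2, 2) (2, 2, 3)
print(rgb[0, 1])  # the pixel at row 0, column 1 (pure green)
```

Note that NumPy indexes an image as `[row, column]`, which corresponds to `[y, x]`.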

The location, or coordinate, of each pixel is also important. 
Each image is a 2-D plane, so the coordinate of a pixel is (x, y).
Note that the origin (0, 0) is located at the top-left corner, so y increases downward.
This is different from the usual mathematical number plane.
Additionally, the number 1 is appended as the third element of the coordinate (this form is known as homogeneous coordinates).
For example, the pixel (0, 1) becomes 
$$
\begin{pmatrix}
0\\
1 \\
1
\end{pmatrix}
$$



To move a pixel's location, we multiply its coordinate by a 3x3 matrix. 
For example,
$$
\begin{pmatrix}
1 \\
3 \\
1
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 1 \\
0 & 1 & 2 \\
0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
0 \\
1 \\
1
\end{pmatrix}
$$

The pixel is moved from (0, 1) to (1, 3). 
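
The same multiplication can be checked with NumPy. This is only a sketch of the worked example above; the names `T`, `p`, and `moved` are chosen for illustration.

```python
import numpy as np

# The 3x3 matrix from the example: it shifts x by 1 and y by 2
T = np.array([[1, 0, 1],
              [0, 1, 2],
              [0, 0, 1]])

# Pixel (0, 1) with 1 appended as the third element
p = np.array([0, 1, 1])

moved = T @ p  # matrix-vector product
print(moved)   # [1 3 1]
```

The first two elements of the result are the new (x, y) location, and the third element stays 1.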

That 3x3 matrix dictates how pixels move from one 2-D plane to another 2-D plane. Its bottom-right element is always fixed to 1, which leaves 8 unknown values to solve for. Therefore, we need 8 equations.

Our goal is to map a distorted view of the area onto a flat 2-D plane. Then, we can do object detection and locate the robots. Each pair of matching points gives 2 equations (one for x and one for y), so to get the 8 equations we need, we pick 4 points.
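
To make the "8 equations from 4 points" idea concrete, here is a minimal NumPy sketch that writes the equations out and solves them directly. It is an illustration under stated assumptions, not how OpenCV necessarily computes the matrix internally: the helper names `solve_homography` and `apply_homography` are hypothetical, and the point lists simply repeat the corner and target coordinates used in this tutorial.

```python
import numpy as np

def solve_homography(src, dst):
    """Solve for the 3x3 matrix (bottom-right fixed to 1) mapping src onto dst."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each point pair contributes one equation for u and one for v
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, point):
    """Move one (x, y) point with H, then divide by the third element."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return x / w, y / w

src = [(266, 51), (544, 144), (56, 250), (380, 405)]
dst = [(0, 0), (300, 0), (0, 300), (300, 300)]

H_manual = solve_homography(src, dst)
for s in src:
    print(s, "->", np.round(apply_homography(H_manual, s), 3))
```

Unlike the pure shift shown earlier, the third element of the result is generally not 1 here, so the point is divided by it to get back to an (x, y) pixel location.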

In [None]:
# import libraries 
import cv2 as cv # computer vision
import numpy as np # Numpy for matrix operation
from pathlib import Path # file exploration
from PIL import Image # image library

In [None]:
# define an image's path
path = Path("../img") / 'cv-example1.jpg'
# load the image
img = Image.open(path)
img # display the image

In [None]:
# find the 4 corners (from other software)
corners = [(266, 51), (544, 144), (56, 250), (380, 405)]

In [None]:
img = np.array(img) # convert image to numpy (matrix)
for point in corners:
    img = cv.circle(img, point, 5, (255, 0, 0)) # draw corner for verification
Image.fromarray(img) # to display, we need to convert back to PIL's Image format

In [None]:
# We will map the four selected corners to these new coordinates
target = [(0, 0), (300, 0), (0, 300), (300, 300)]

In [None]:
# open the image again (with no red circle)
img = Image.open(path)
img = np.array(img)

In [None]:
# the 4 point pairs give the 8 equations needed to find the transformation matrix
H = cv.getPerspectiveTransform(
    # this function requires float 32 bit data type
    np.array(corners, dtype="float32"), np.array(target, dtype="float32")
)

In [None]:
# apply the transformation
img_flat = cv.warpPerspective(img, H, (300, 300))
Image.fromarray(img_flat)

Next, we want to locate the object of interest, in this case a Raspberry Pi box. There are several ways to tackle this problem, mainly traditional image processing and deep learning. We will explore the former in this tutorial. 

Let's think about how our eyes and brain separate an object from the background. One obvious way is to look for color differences. Other factors, such as texture, come into play as well, but we will explore the color world first. 

In [None]:
# a computer sees an image as a combination of 3 layers: red, green, and blue (RGB).
# ** if you load an image with OpenCV, the order of the colors will be BGR.
# we loaded this one with PIL, so it is in RGB
Image.fromarray(img_flat[:, :, 0]) # red

In [None]:
Image.fromarray(img_flat[:, :, 1]) # green

In [None]:
Image.fromarray(img_flat[:, :, 2]) # blue

In [None]:
# RGB is the most common color space, but there are many more.
# For example, we will explore Hue, Saturation, and Value (HSV).
# img_flat is in RGB order (loaded with PIL), so we convert RGB -> HSV
img = cv.cvtColor(img_flat, cv.COLOR_RGB2HSV)
Image.fromarray(img)

In [None]:
Image.fromarray(img[:, :, 0])  # hue

In [None]:
Image.fromarray(img[:, :, 1])  # saturation

In [None]:
Image.fromarray(img[:, :, 2])  # value
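
To get a feel for the numbers in the three HSV layers, the sketch below converts a few single colors using Python's standard `colorsys` module and rescales them to the ranges OpenCV uses for 8-bit images (hue 0 to 179, saturation and value 0 to 255). The helper `rgb_to_opencv_hsv` is hypothetical and only approximates OpenCV's result; rounding may differ by one for some colors.

```python
import colorsys

def rgb_to_opencv_hsv(r, g, b):
    """Convert one 8-bit RGB color to HSV on OpenCV's 8-bit scale (approximate)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)  # each in [0, 1]
    return round(h * 180) % 180, round(s * 255), round(v * 255)

print(rgb_to_opencv_hsv(255, 0, 0))      # pure red   -> (0, 255, 255)
print(rgb_to_opencv_hsv(0, 255, 0))      # pure green -> (60, 255, 255)
print(rgb_to_opencv_hsv(128, 128, 128))  # mid gray   -> (0, 0, 128)
```

The gray pixel has a saturation of 0, which is why saturation is a useful layer for separating a colorful object from a dull background.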

In [None]:
# mark (255) every pixel whose saturation is at least 100; all other pixels become 0
mask = cv.inRange(img, np.array([0, 100, 0]), np.array([255, 255, 255]))
Image.fromarray(mask)
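
The range check itself can be sketched in plain NumPy to show what the mask contains. The tiny `hsv` array below is invented for illustration; the lower and upper bounds are the ones used above.

```python
import numpy as np

# Hypothetical 2x2 HSV image (values invented for illustration)
hsv = np.array([[[10, 50, 200], [20, 100, 200]],
                [[30, 180, 90], [40, 99, 255]]], dtype=np.uint8)

lower = np.array([0, 100, 0])
upper = np.array([255, 255, 255])

# A pixel passes only if every channel lies within [lower, upper] (inclusive)
inside = np.all((hsv >= lower) & (hsv <= upper), axis=-1)
example_mask = inside.astype(np.uint8) * 255  # 255 where the pixel passes, 0 elsewhere

print(example_mask)
```

Only the two pixels with saturation 100 and 180 pass; the pixels with saturation 50 and 99 are set to 0.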

In [None]:
contours, _ = cv.findContours(mask, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE) # find contours
Image.fromarray(cv.drawContours(img_flat.copy(), contours, -1, (0, 255, 0), 1))

In [None]:
contours = [
    cnt for cnt in contours if 20 < cv.contourArea(cnt) < 100
] # only keep contours whose area fits our criteria
Image.fromarray(cv.drawContours(img_flat, contours, -1, (0, 255, 0), 1))

In [None]:
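# find the center coordinate from the first remaining contour
# (this assumes at least one contour passed the area filter above)
cnt = contours[0]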
moment = cv.moments(cnt)
x = int(moment["m10"] / moment["m00"])
y = int(moment["m01"] / moment["m00"])
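
The ratios `m10 / m00` and `m01 / m00` are the x and y coordinates of the contour's centroid, where `m00` is its area. As a rough illustration of the same idea without OpenCV, the sketch below computes the centroid of a polygon from its vertices; the function `polygon_centroid` and the square used to try it are hypothetical.

```python
def polygon_centroid(points):
    """Centroid of a simple polygon given as a list of (x, y) vertices."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3 * area2), cy / (3 * area2)

# A 20x20 square whose center is at (20, 30)
square = [(10, 20), (30, 20), (30, 40), (10, 40)]
print(polygon_centroid(square))  # (20.0, 30.0)
```

The zero-area check mirrors a case worth guarding against in the cell above: if `m00` is 0, the division fails.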

In [None]:
# write the coordinate as text next to the object
cv.putText(
    img_flat,
    f"{x}, {y}",
    (x - 20, y - 20),
    cv.FONT_HERSHEY_SIMPLEX,
    0.5,
    (0, 0, 0),
    2,
);
Image.fromarray(img_flat)