## Setup notebook

Do all imports.

In [None]:
# For numerical methods
import numpy as np

# For image processing and visualization of results
import cv2
import matplotlib.pyplot as plt
from matplotlib.patches import ConnectionPatch

# For timing
import time

## Get images

Load images from files (example).

In [None]:
# Specify filenames
img1_filename = 'image01.PNG'
img2_filename = 'image02.PNG'

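# Read images (cv2.imread returns None, without raising, if a file cannot be read)
img1 = cv2.imread(img1_filename, cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread(img2_filename, cv2.IMREAD_GRAYSCALE)
assert img1 is not None, f'Could not read {img1_filename}'
assert img2 is not None, f'Could not read {img2_filename}'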

# Get width and height from first image
frame_width = img1.shape[1]
frame_height = img1.shape[0]

# Verify width and height of second image are the same
assert(img2.shape[1] == frame_width)
assert(img2.shape[0] == frame_height)

Load images from video (example).

In [None]:
# Specify filename
video_filename = 'video.MOV'

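# Create a video reader and check that the file could be opened
video_src = cv2.VideoCapture(video_filename)
assert video_src.isOpened(), f'Could not open {video_filename}'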

# Say what frames we want to read
# - index of first frame
i_frame_1 = 0
# - index of last frame
i_frame_2 = int(video_src.get(cv2.CAP_PROP_FRAME_COUNT)) - 1

# Read first frame
video_src.set(cv2.CAP_PROP_POS_FRAMES, i_frame_1)
success, frame = video_src.read()
assert(success)
img1 = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Read second frame
video_src.set(cv2.CAP_PROP_POS_FRAMES, i_frame_2)
success, frame = video_src.read()
assert(success)
img2 = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Release the video reader now that both frames have been read
video_src.release()

## Do detection and matching with SIFT

#### Detection

Detect features.

In [None]:
# Create a SIFT feature detector
sift = cv2.SIFT_create()

# Apply detector to find keypoints (pts) and descriptors (desc) in each image
start_time = time.time()
pts1, desc1 = sift.detectAndCompute(image=img1, mask=None)
pts2, desc2 = sift.detectAndCompute(image=img2, mask=None)
elapsed_time = time.time() - start_time
print(f'Elapsed time for detection (seconds): {elapsed_time}')

Keypoints are returned as a [tuple](https://docs.python.org/3/library/stdtypes.html#typesseq).

In [None]:
type(pts1)

Here is the first keypoint that was found in the first image:

In [None]:
pts1[0]

The important thing for us is where this keypoint is located (in image coordinates):

In [None]:
pts1[0].pt

You can count the number of elements in each tuple of keypoints just like you would in a list:

In [None]:
print(f'Found {len(pts1)} features in img1')
print(f'Found {len(pts2)} features in img2')

You can also iterate through each tuple of keypoints just like you would iterate through a list:

In [None]:
for p in pts1:
    print(p.pt)

Visualize all detected features.

In [None]:
# Create figure with two axes
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 10))

# Show each image in its respective axis
ax1.imshow(img1, cmap='gray')
ax2.imshow(img2, cmap='gray')

# FIXME: Plot all features detected in img1 on ax1 (e.g., as red dots)
for p in pts1:
    pass

# FIXME: Plot all features detected in img2 on ax2 (e.g., as red dots)
for p in pts2:
    pass

plt.show()

**FIXME.** Answer the following questions:
* Where are the features in each image?
* Where *aren't* the features in each image?

OpenCV has its own way of visualizing detected features.

In [None]:
# Create figure with two axes
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 10))

# Show each image with detected features in its respective axis
ax1.imshow(cv2.drawKeypoints(img1, pts1, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS))
ax2.imshow(cv2.drawKeypoints(img2, pts2, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS))

plt.show()

Each keypoint is associated with a descriptor. Descriptors are rows in a 2D numpy array. Let's look at the descriptor associated with the first keypoint.

In [None]:
desc1[0]

What is its length?

In [None]:
len(desc1[0])

#### Matching

##### Brute force matching

Do "brute force" matching.

In [None]:
# Create a brute-force matcher
bf = cv2.BFMatcher(
    normType=cv2.NORM_L2,
    crossCheck=True,
)

# Use brute-force matcher to find matching descriptors
start_time = time.time()
matches = bf.match(desc1, desc2)
elapsed_time = time.time() - start_time
print(f'Elapsed time for matching (seconds): {elapsed_time}')

Matches are returned as a tuple.

In [None]:
type(matches)

How many did we find?

In [None]:
print(f'found {len(matches)} matches')

Here is the first match that was found.

In [None]:
matches[0]

Each match has three things that are important:
* The index of a keypoint (and descriptor) in the first image
* The index of a keypoint (and descriptor) in the second image
* The distance between the descriptors of these two keypoints

Here are those three things for the first match found:

In [None]:
# Index of keypoint/descriptor in first image
idx1 = matches[0].queryIdx

# Index of keypoint/descriptor in second image
idx2 = matches[0].trainIdx

# Distance between descriptors
d = matches[0].distance

print(f'KP {idx1} in img1 matched KP {idx2} in img2 (distance = {d:.4f})')

Since we specified `normType=cv2.NORM_L2` when creating the matcher, the distance between two descriptors is simply the 2-norm (i.e., the standard Euclidean norm) of their difference.

In [None]:
# FIXME - compute the norm of the difference between the descriptors associated with the first match
d_check = 0.

# Check that it is the same as the distance associated with the first match
print(f'distance:\n {d:12.8f} (from match)\n {d_check:12.8f} (from 2-norm of difference between descriptors)')

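With `crossCheck=True`, two keypoints are matched only if each descriptor is the other's nearest neighbor, that is, the distance between them is smallest in both directions. The matcher does not apply a distance threshold, so some matches may still be poor. We usually want to sort the matches by their distance.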

In [None]:
# Sort matches by distance (smallest first)
matches = sorted(matches, key = lambda m: m.distance)
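
As a rough illustration of what "smallest in both directions" means, the sketch below reimplements mutual nearest-neighbor matching with NumPy on tiny made-up descriptors. The function name `mutual_nearest_matches` and the toy arrays are hypothetical, and this is not how OpenCV implements `BFMatcher` internally. It only mirrors the behavior of keeping a pair (i, j) when each descriptor is the other's nearest neighbor.

```python
import numpy as np

def mutual_nearest_matches(desc_a, desc_b):
    """Return (i, j, distance) for pairs that are each other's nearest neighbor."""
    # Pairwise Euclidean distances: dist[i, j] = ||desc_a[i] - desc_b[j]||
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist = np.linalg.norm(diff, axis=2)

    # Nearest neighbor in b for each row of a, and in a for each row of b
    nearest_in_b = np.argmin(dist, axis=1)
    nearest_in_a = np.argmin(dist, axis=0)

    # Keep (i, j) only if the choice is mutual
    pairs = [
        (i, int(j), float(dist[i, j]))
        for i, j in enumerate(nearest_in_b)
        if nearest_in_a[j] == i
    ]
    # Sort by distance, smallest first
    return sorted(pairs, key=lambda m: m[2])

# Toy 2D "descriptors" (made up for illustration)
toy_desc_a = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 5.0]])
toy_desc_b = np.array([[0.0, 1.0], [10.0, 0.5]])
toy_matches = mutual_nearest_matches(toy_desc_a, toy_desc_b)
print(toy_matches)
```

The third row of `toy_desc_a` has a nearest neighbor in `toy_desc_b`, but that neighbor prefers a different row, so the pair is dropped. This is why cross-checking can return fewer matches than there are keypoints.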

Visualize the best match.

In [None]:
# Create figure with two axes
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 10))

# Show each image in its respective axis
ax1.imshow(img1, cmap='gray')
ax2.imshow(img2, cmap='gray')

# Visualize match
# - Get first match
m = matches[0]
# - Get location of keypoints associated with first match
p1 = pts1[m.queryIdx].pt
p2 = pts2[m.trainIdx].pt
# - Plot location of each keypoint as a red dot
ax1.plot(p1[0], p1[1], 'r.', markersize=12)
ax2.plot(p2[0], p2[1], 'r.', markersize=12)
# - Zoom in on location of each keypoint
s = 10 # <-- FIXME: change if necessary
ax1.set_xlim(p1[0] - s, p1[0] + s)
ax1.set_ylim(p1[1] + s, p1[1] - s)
ax2.set_xlim(p2[0] - s, p2[0] + s)
ax2.set_ylim(p2[1] + s, p2[1] - s)

Visualize the worst match.

In [None]:
# FIXME

Visualize the $n$ best matches.

In [None]:
# Choose the number of matches to show
n = 50

# Create figure
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 10))

# Show images
ax1.imshow(img1, cmap='gray')
ax2.imshow(img2, cmap='gray')

# Show first match (FIXME: modify this code to show the n best matches)
# - Get first match
m = matches[0]
# - Get location of keypoints associated with first match
p1 = pts1[m.queryIdx].pt
p2 = pts2[m.trainIdx].pt
# - Draw line connecting keypoint in first image with keypoint in second image
fig.add_artist(
    ConnectionPatch(
        p1, p2,
        'data', 'data',
        axesA=ax1, axesB=ax2,
        color='red',
        connectionstyle='arc3, rad=0.',
        linewidth=0.5,
    )
)
# - Draw red dot at each keypoint
ax1.plot(p1[0], p1[1], 'r.', markersize=2)
ax2.plot(p2[0], p2[1], 'r.', markersize=2)

plt.show()

**FIXME.** Answer the following questions:
* Was the "best" match actually good?
* Was the "worst" match actually bad?
* How many of the $n$ best matches are actually good?
* What can you observe about the lines that correspond to the $n$ best matches? Describe them in words. What would you want this set of lines to look like if the $n$ best matches are actually good? Do they look like this in your case?

##### Matching with kNN ($k$ nearest neighbors)

Instead of finding only the "best" match for each descriptor, we can find the $k=2$ best matches.

In [None]:
# Create a brute-force matcher
bf = cv2.BFMatcher(
    normType=cv2.NORM_L2,
    crossCheck=False,       # <-- IMPORTANT - must be False for kNN matching
)

# Find the two best matches in img2 for each descriptor in img1 (no distance threshold is applied)
start_time = time.time()
matches = bf.knnMatch(desc1, desc2, k=2)
elapsed_time = time.time() - start_time
print(f'Elapsed time for matching (seconds): {elapsed_time}')

`matches` is now a tuple of tuples.

In [None]:
matches

The first element is actually a pair of matches.

In [None]:
matches[0]

Let's look at these two matches.

In [None]:
idx1 = matches[0][0].queryIdx
idx2 = matches[0][0].trainIdx
d = matches[0][0].distance
print(f'(idx1 = {idx1}, idx2 = {idx2}) : distance = {d}')

idx1 = matches[0][1].queryIdx
idx2 = matches[0][1].trainIdx
d = matches[0][1].distance
print(f'(idx1 = {idx1}, idx2 = {idx2}) : distance = {d}')

Note that `idx1` is the same in both cases. You can think of these as two different candidate matches for the keypoint with index `idx1`. We would prefer to be certain about matches. Notice that the first distance is less than the second distance. That is, the first candidate is the "first best" match in the second image for the keypoint with index `idx1` in the first image, and the second candidate is the "second best" match.
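
To make the idea of "first best" and "second best" candidates concrete, here is a minimal NumPy sketch on made-up descriptors. The arrays `toy_query` and `toy_train` are hypothetical stand-ins for `desc1` and `desc2`, and the code is only meant to show the ordering of the two candidates, not to reproduce `knnMatch` exactly.

```python
import numpy as np

# Toy descriptors (made up): 2 query rows, 3 candidate rows
toy_query = np.array([[0.0, 0.0], [4.0, 4.0]])
toy_train = np.array([[0.0, 1.0], [3.0, 4.0], [0.0, 3.0]])

# Pairwise Euclidean distances: toy_dist[i, j] = ||toy_query[i] - toy_train[j]||
toy_dist = np.linalg.norm(toy_query[:, None, :] - toy_train[None, :, :], axis=2)

# For each query row, indices of the two closest candidates (closest first)
two_best = np.argsort(toy_dist, axis=1)[:, :2]

for i, (j_first, j_second) in enumerate(two_best):
    print(f'query {i}: first best = {j_first} (distance {toy_dist[i, j_first]:.3f}), '
          f'second best = {j_second} (distance {toy_dist[i, j_second]:.3f})')
```

Each row of `two_best` plays the role of one element of `matches`: the query index is shared, and the first distance is never larger than the second.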

**FIXME.** Answer the following question:
* What relationship between the distances would indicate greater certainty that the "first best" candidate match is actually good?

Implement your condition (hint - often called the "ratio test") to create a subset of "good matches."

In [None]:
good_matches = []
for m, n in matches:
    # m is the "first best" match
    # n is the "second best" match
    if True: # <-- FIXME: replace with your condition on m.distance and n.distance
        good_matches.append(m)

print(f'found {len(good_matches)} good matches')

**FIXME.** Answer the following question:
* How does the number of good matches vary with the ratio in your ratio test?

Sort `good_matches` by distance and rename as `matches` for convenience.

In [None]:
# Sort matches by distance (smallest first)
matches = sorted(good_matches, key = lambda m: m.distance)

Visualize the best good match.

In [None]:
# FIXME

Visualize the worst good match.

In [None]:
# FIXME

Visualize all good matches.

In [None]:
# FIXME

**FIXME.** Answer the following questions:
* Was the "best good match" actually good?
* Was the "worst good match" actually bad?
* How many of the good matches are actually good?
* What can you observe about the lines that correspond to the good matches? Do they look different than the lines you saw before? Do they look more (or less) like what you want?
* Which of your answers would change if you changed the threshold ratio in your ratio test?

#### Homography

Get the points in the source image (`img1`) and the target image (`img2`) that correspond to all the good matches.

In [None]:
pts_src = []
pts_dst = []
for m in matches:
    idx1 = m.queryIdx
    idx2 = m.trainIdx
    pts_src.append(pts1[idx1].pt)
    pts_dst.append(pts2[idx2].pt)
pts_src = np.array(pts_src)
pts_dst = np.array(pts_dst)

These points ($p$ from `pts_src` and $q$ from `pts_dst`), when expressed in homogeneous coordinates, are related by a homography:

$$\begin{bmatrix} q \\ 1 \end{bmatrix} \sim H \begin{bmatrix} p \\ 1 \end{bmatrix}$$
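
The symbol $\sim$ means "equal up to a nonzero scale factor." The sketch below shows what that implies in practice, using a made-up matrix `H_toy` and point `p_toy` (both hypothetical, not estimated from any image): after multiplying by the homography, the result must be divided by its last element to recover pixel coordinates.

```python
import numpy as np

# Hypothetical homography (made up for illustration, not estimated from data)
H_toy = np.array([
    [1.0, 0.0, 5.0],
    [0.0, 1.0, -2.0],
    [0.0, 0.001, 1.0],
])

# A point p in the source image, in pixel coordinates
p_toy = np.array([100.0, 50.0])

# Lift to homogeneous coordinates, apply H, then divide by the last element
q_h = H_toy @ np.append(p_toy, 1.0)
q_toy = q_h[:2] / q_h[2]

print(q_h)    # homogeneous result; last element is generally not 1
print(q_toy)  # pixel coordinates in the target image

# Scaling H by any nonzero constant gives the same pixel coordinates
q_h_scaled = (3.0 * H_toy) @ np.append(p_toy, 1.0)
q_toy_scaled = q_h_scaled[:2] / q_h_scaled[2]
print(np.allclose(q_toy, q_toy_scaled))
```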

Use your code from HW1 to estimate this homography.

In [None]:
# FIXME

Visualize the results.

In [None]:
# Create figure
fig, ax = plt.subplots(1, 1, figsize=(15, 10))

# Show target image
ax.imshow(img2, cmap='gray')

# Compare predicted and actual location of matched points in the target image
for p, q in zip(pts_src, pts_dst):
    # FIXME - Use homography to predict q from p
    q_pred = q.copy()

    # Plot the actual q and the predicted q
    ax.plot(q[0], q[1], 'b.', markersize=18)
    ax.plot(q_pred[0], q_pred[1], 'r.', markersize=9)

plt.show()

## Do detection and matching with ORB

#### Detection

Detect features.

In [None]:
# Create an ORB feature detector
orb = cv2.ORB_create()

# Apply detector to find keypoints (pts) and descriptors (desc) in each image
start_time = time.time()
pts1, desc1 = orb.detectAndCompute(image=img1, mask=None)
pts2, desc2 = orb.detectAndCompute(image=img2, mask=None)
elapsed_time = time.time() - start_time
print(f'Elapsed time for detection (seconds): {elapsed_time}')

Say how many keypoints were found.

In [None]:
# FIXME

Visualize all detected features.

In [None]:
# FIXME

#### Matching

##### Brute force matching with kNN ($k$ nearest neighbors)

Find the $k=2$ best matches for each keypoint.

In [None]:
# Create a brute-force matcher
bf = cv2.BFMatcher(
    normType=cv2.NORM_HAMMING,   # <-- IMPORTANT - the ORB descriptor is binary, so we use hamming distance rather than L2 distance
    crossCheck=False,            # <-- IMPORTANT - must be False for kNN matching
)

# Find the two best matches in img2 for each descriptor in img1 (no distance threshold is applied)
start_time = time.time()
matches = bf.knnMatch(desc1, desc2, k=2)
elapsed_time = time.time() - start_time
print(f'Elapsed time for matching (seconds): {elapsed_time}')
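
For intuition about the Hamming distance used above, here is a small NumPy sketch on two made-up binary descriptors. The arrays `toy_a` and `toy_b` are hypothetical and much shorter than real ORB descriptors. The distance is simply the number of bit positions in which the two descriptors differ.

```python
import numpy as np

# Two made-up 4-byte binary descriptors (real ORB descriptors are longer)
toy_a = np.array([0b10101010, 0b11110000, 0b00000000, 0b11111111], dtype=np.uint8)
toy_b = np.array([0b10101011, 0b00001111, 0b00000000, 0b11111110], dtype=np.uint8)

# XOR marks the bits that differ; unpackbits expands each byte into 8 bits
differing_bits = np.unpackbits(np.bitwise_xor(toy_a, toy_b))
hamming = int(differing_bits.sum())
print(hamming)
```

Because the descriptor entries are packed bits rather than real-valued measurements, counting differing bits is a more meaningful comparison than taking the Euclidean norm of the byte values.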

Find the subset of good matches.

In [None]:
# FIXME

Sort `good_matches` by distance and rename as `matches` for convenience.

In [None]:
# FIXME

Visualize the best good match.

In [None]:
# FIXME

Visualize the worst good match.

In [None]:
# FIXME

Visualize all good matches.

In [None]:
# FIXME

#### Homography

Get the points in the source image (`img1`) and the target image (`img2`) that correspond to all the good matches.

In [None]:
# FIXME

Estimate the homography between source and target images.

In [None]:
# FIXME

Visualize the results.

In [None]:
# FIXME

#### Discussion

**FIXME.** Compare results with ORB to results with SIFT, for example in terms of the following things:
* Computation time?
* Number of matches found?
* Extent to which good matches were actually good?
* Homography estimate?

## Get more information

**FIXME.** Do the following:
* Search for the 2022 edition (important!) of "Computer Vision: Algorithms and Applications" by Szeliski on the [university library website](https://library.illinois.edu)
* Download the complete PDF of this book
* Read Section 7.1