# SIFT (Scale-Invariant Feature Transform) #

Feature detection and description.

**+ Accurate**

**- Slow**

**- Non-free**

__1. Scale-space extrema detection:__

Scale-space filtering is used to detect keypoints at different scales.

The Laplacian of Gaussian (LoG) acts as a blob detector whose detected blob size depends on $\sigma$, but it is costly to compute.

![](img/sift_dog.jpg)

Thus, the Difference of Gaussians (DoG) is used instead as an approximation of LoG. It is the difference between two Gaussian blurs of the image, one with $\sigma$ and one with $k\sigma$. This process is repeated for each octave of the image in the Gaussian pyramid.
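As a rough illustration of the DoG step, the sketch below blurs an image at $\sigma$ and $k\sigma$ with a separable Gaussian and subtracts the two results. It uses only NumPy. The helper names (`gaussian_kernel`, `gaussian_blur`, `difference_of_gaussians`), the 3σ kernel cut-off and the reflect padding are assumptions made for this example, not part of any particular SIFT implementation.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3*sigma (assumed cut-off), normalised to sum to 1."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur: filter the rows, then the columns (borders are reflect-padded)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(img.astype(float), radius, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def difference_of_gaussians(img, sigma=1.6, k=np.sqrt(2)):
    """DoG image: blur at k*sigma minus blur at sigma."""
    return gaussian_blur(img, k * sigma) - gaussian_blur(img, sigma)
```

A full pyramid would repeat this for several values of $\sigma$ per octave and then downsample the image for the next octave.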

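Once the DoG images are computed for each resolution (octave/pyramid level) and scale ($\sigma$), they are searched for local extrema over scale and space. Each pixel is compared with its 8 neighbours in the same image, as well as the 9 pixels in the next scale and the 9 pixels in the previous scale (26 neighbours in total). A pixel that is larger or smaller than all of them is a potential keypoint.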
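The 26-neighbour comparison can be sketched as follows, assuming the DoG images of one octave are stacked into a NumPy array of shape `(scales, height, width)`. The function name `is_local_extremum` and the array layout are choices made for this example only.

```python
import numpy as np

def is_local_extremum(dog_stack, s, y, x):
    """True if dog_stack[s, y, x] is strictly above or below all 26 neighbours.

    (s, y, x) must not lie on the border of the stack.
    """
    cube = dog_stack[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    centre = dog_stack[s, y, x]
    # Index 13 is the centre of the flattened 3x3x3 cube; drop it to keep the 26 neighbours.
    neighbours = np.delete(cube.ravel(), 13)
    return bool(centre > neighbours.max() or centre < neighbours.min())
```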

Empirically chosen parameters, reported as optimal [1]:
- number of octaves: 4
- number of scale levels: 5
- initial $\sigma$: 1.6
- k: $\sqrt{2}$
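To make these numbers concrete, the sketch below lists the blur applied at each scale level of one octave, under the assumption that consecutive levels differ by a factor $k$ (as in the $\sigma$ / $k\sigma$ pair described above). The function name `scale_space_sigmas` is hypothetical.

```python
import math

def scale_space_sigmas(sigma0=1.6, k=math.sqrt(2), levels=5):
    """Blur of each scale level in one octave: sigma0, k*sigma0, k^2*sigma0, ..."""
    return [sigma0 * k ** i for i in range(levels)]

for level, sigma in enumerate(scale_space_sigmas()):
    print(f"level {level}: sigma = {sigma:.3f}")
```

With $k = \sqrt{2}$, the blur doubles every two levels.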

![](img/sift_local_extrema.jpg)

__2. Keypoint Localization:__

Potential keypoints are refined to eliminate low-contrast and edge keypoints. A Taylor series expansion of the scale space is used to get a more accurate location of each extremum. If the intensity at this extremum is less than a threshold, it is rejected.
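For reference, this refinement is usually written as a second-order expansion of the DoG function $D$ around the sampled point, with offset $\mathbf{x} = (x, y, \sigma)^T$. The notation follows Lowe's formulation and is an addition to these notes:

$$D(\mathbf{x}) \approx D + \frac{\partial D}{\partial \mathbf{x}}^T \mathbf{x} + \frac{1}{2} \mathbf{x}^T \frac{\partial^2 D}{\partial \mathbf{x}^2} \mathbf{x}, \qquad \hat{\mathbf{x}} = -\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$$

The contrast check is then applied to the interpolated value $D(\hat{\mathbf{x}}) = D + \frac{1}{2} \frac{\partial D}{\partial \mathbf{x}}^T \hat{\mathbf{x}}$.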

DoG has a higher response for edges, so edge keypoints also need to be removed. A 2x2 Hessian matrix is used to compute the principal curvatures. The eigenvalues are then checked, following the observations behind the Harris corner detector: the ratio of the eigenvalues is compared to a threshold, and if it is greater, the keypoint lies on an edge and is discarded.

Thresholds:
- contrast: 0.03
- edge: 10
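The two rejection tests can be sketched as below. The edge test uses the common trace/determinant shortcut: for eigenvalue ratio $r$, $\mathrm{Tr}(H)^2 / \mathrm{Det}(H) = (r+1)^2 / r$, so the eigenvalues never need to be computed explicitly. The function names are hypothetical, the second derivatives `dxx`, `dyy`, `dxy` are assumed to come from finite differences on the DoG image, and the contrast threshold assumes intensities scaled to [0, 1].

```python
def passes_contrast_test(value, contrast_threshold=0.03):
    """Keep the keypoint only if its (interpolated) DoG response is strong enough."""
    return abs(value) >= contrast_threshold

def passes_edge_test(dxx, dyy, dxy, edge_threshold=10.0):
    """Keep the keypoint only if the ratio of principal curvatures is below edge_threshold."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign (or a degenerate Hessian): not a stable extremum.
        return False
    return trace * trace / det < (edge_threshold + 1) ** 2 / edge_threshold
```

For example, `passes_edge_test(1.0, 1.0, 0.0)` keeps a blob-like point with equal curvatures, while `passes_edge_test(20.0, 1.0, 0.0)` rejects a point whose curvature ratio is 20.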

__3. Orientation assignment:__

A neighbourhood (whose size depends on the scale) is taken around the keypoint location, then the gradient magnitude and direction are calculated in that region.

An orientation histogram of 36 bins covering 360° is created. Each sample is weighted by its gradient magnitude and by a Gaussian-weighted circular window ($\sigma = 1.5 \times \text{keypoint scale}$).

The highest peak in the histogram gives the keypoint orientation. Any other peak above 80% of it is also used, which creates additional keypoints with the same location and scale but a different orientation.
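A simplified version of this step is sketched below with NumPy. It builds the weighted 36-bin histogram for a square patch centred on the keypoint and returns every bin that reaches 80% of the maximum. The function name `dominant_orientations` is hypothetical, and the sketch returns the start angle of each bin; a fuller implementation would keep only local peaks and interpolate their position.

```python
import numpy as np

def dominant_orientations(patch, sigma, num_bins=36, peak_ratio=0.8):
    """Return (histogram, orientations in degrees) for a patch centred on the keypoint.

    sigma is the width of the Gaussian window (1.5 x keypoint scale in the text above).
    """
    dy, dx = np.gradient(patch.astype(float))
    magnitude = np.hypot(dx, dy)
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0

    # Gaussian window centred on the patch
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    weight = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))

    bin_width = 360.0 / num_bins
    bins = (angle // bin_width).astype(int) % num_bins
    hist = np.bincount(bins.ravel(), weights=(magnitude * weight).ravel(),
                       minlength=num_bins)
    if hist.max() == 0:
        return hist, []  # flat patch: no gradient, no orientation
    peaks = np.flatnonzero(hist >= peak_ratio * hist.max())
    return hist, [float(p * bin_width) for p in peaks]
```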

![](img/sift_orientation_hist.png)

__4. Keypoint descriptor:__

A 16x16 neighbourhood around the keypoint is taken and divided into 16 sub-blocks of 4x4 pixels. For each sub-block, an 8-bin orientation histogram is created, for a total of 128 bins.

These 128 values are then concatenated into a vector that forms the keypoint descriptor.
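The layout of the descriptor can be sketched as follows. This is a deliberately stripped-down version: it omits the rotation of the patch to the keypoint orientation, the Gaussian weighting and the interpolation between bins that a real implementation uses, and the final L2 normalisation is an assumption added for this example. The function name `sift_like_descriptor` is hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch, num_bins=8):
    """128-value descriptor from a 16x16 patch: 16 sub-blocks x 8 orientation bins."""
    assert patch.shape == (16, 16)
    dy, dx = np.gradient(patch.astype(float))
    magnitude = np.hypot(dx, dy)
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0
    bins = (angle // (360.0 / num_bins)).astype(int) % num_bins

    histograms = []
    for by in range(0, 16, 4):        # 4 rows of sub-blocks
        for bx in range(0, 16, 4):    # 4 columns of sub-blocks
            b = bins[by:by + 4, bx:bx + 4].ravel()
            m = magnitude[by:by + 4, bx:bx + 4].ravel()
            histograms.append(np.bincount(b, weights=m, minlength=num_bins))

    descriptor = np.concatenate(histograms)   # 16 * 8 = 128 values
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor
```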

__5. Keypoint matching:__

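Keypoints are matched by finding their nearest neighbours in descriptor space. In some cases, the second-closest match may be very near to the first one. So the ratio of the closest distance to the second-closest distance is computed, and every match verifying the following condition is rejected: $$\frac{\text{distance to } 1^{st} \text{ closest match}}{\text{distance to } 2^{nd} \text{ closest match}} > 0.8$$

This eliminates around 90% of false matches while discarding only about 5% of correct matches.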
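A brute-force version of this ratio test might look like the sketch below. It keeps a match only when the closest distance is below 0.8 times the second-closest distance. The function name `match_with_ratio_test` is hypothetical, descriptors are assumed to be rows of NumPy arrays compared with the Euclidean distance, and `des_b` must contain at least two descriptors.

```python
import numpy as np

def match_with_ratio_test(des_a, des_b, ratio=0.8):
    """Return (index_in_a, index_in_b) pairs that pass the ratio test."""
    matches = []
    for i, d in enumerate(des_a):
        dists = np.linalg.norm(des_b - d, axis=1)   # distance to every descriptor in des_b
        first, second = np.argsort(dists)[:2]       # closest and second-closest
        if dists[first] < ratio * dists[second]:    # unambiguous match: keep it
            matches.append((i, int(first)))
    return matches
```

A descriptor with two nearly equidistant candidates has a ratio close to 1 and is dropped, since there is no reliable way to choose between them.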

In [2]:
# import cv2
# import numpy as np
# import matplotlib.pyplot as plt

# # image_file = "lena.png"
# image_file = "checkerboard.png"

# color_img = cv2.imread(image_file)
# img = cv2.cvtColor(color_img, cv2.COLOR_BGR2GRAY)
# color_img = cv2.cvtColor(color_img, cv2.COLOR_BGR2RGB)

# # SIFT is in the main module from OpenCV 4.4; older builds need the non-free contrib module
# sift = cv2.SIFT_create()
# kp, des = sift.detectAndCompute(img, None)

# # outImage is passed as None so that a new image is returned
# color_img = cv2.drawKeypoints(color_img, kp, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

# plt.imshow(color_img)
# print(len(kp), "keypoints")

## References ##

[1] "Introduction to SIFT (Scale-Invariant Feature Transform)", https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_sift_intro/py_sift_intro.html#sift-intro

