# SIFT
- Local feature descriptors work in two phases.
- The first phase identifies interesting, salient regions of an image that should be described and quantified. These regions are called keypoints and may correspond to edges or "blob"-like structures in an image.
- The second phase extracts and quantifies the local region surrounding each keypoint. The feature vector associated with a keypoint is called a local feature.
- SIFT is described in the same [paper](https://gurus.pyimagesearch.com/wp-content/uploads/2015/06/lowe_1999.pdf) as DoG.

## How it works
- The SIFT feature descriptor requires a set of input keypoints.
- Step 1:
    - For each of the input keypoints, SIFT takes the 16x16 pixel region surrounding the center pixel of the keypoint region.
    - From there, we divide the 16x16 region into sixteen 4x4 pixel windows.
- Step 2: 
    - For each of the 16 windows, we compute the gradient magnitude and orientation at every pixel, just like we did for the HOG descriptor.
    - Given the gradient magnitude and orientation, we construct an 8-bin histogram for each of the 4x4 pixel windows.
    - Basically, we can take it this way:
        - In a particular 4x4 neighborhood, each pixel has its own gradient magnitude $G$ and orientation $\theta$.
        - We construct a histogram where the number of orientations, i.e., bins, is 8. The bins cover the full 360 degrees, so each bin spans $360 / 8 = 45$ degrees.
        - We loop through each pixel in the region and add its gradient magnitude to the bin that corresponds to the pixel's orientation.
        - However, instead of the raw magnitude, the algorithm uses Gaussian weighting, i.e., the farther a pixel is from the keypoint center, the less it contributes to the overall histogram.
- Step 3:
    - Collect all 16 of these 8-bin orientation histograms and concatenate them together.
    - Given that we have 16 of these histograms for each keypoint, we end up with a feature vector of $16 \times 8 = 128$ dimensions.
    - After concatenation, we L2-normalize the entire feature vector.
    - The SIFT feature vector is now finished and ready to be compared to other SIFT feature vectors.
- Local descriptors return N feature vectors per image, where N is the number of detected keypoints.
- This implies that, with one 128-dim feature vector per keypoint, we end up with an $N \times 128$ array of feature vectors per image.
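
The three steps above can be sketched in plain Python for a single 16x16 patch. This is a simplified illustration, not OpenCV's implementation: the function name `sift_like_descriptor`, the central-difference gradients, the hard bin assignment, and the Gaussian `sigma=8.0` (half the patch width) are all assumptions made for the example. It also leaves out refinements of the full algorithm, such as rotating the patch to the keypoint's dominant orientation and interpolating between neighboring bins.

```python
import math

def sift_like_descriptor(patch, num_bins=8, sigma=8.0):
    """Simplified SIFT-style descriptor for a 16x16 grayscale patch.

    `patch` is a list of 16 rows, each holding 16 numeric intensities.
    Returns a 128-element list (16 windows x 8 bins), L2-normalized.
    """
    size = 16
    assert len(patch) == size and all(len(row) == size for row in patch)
    bin_width = 360.0 / num_bins   # 45 degrees per bin when num_bins == 8
    center = (size - 1) / 2.0      # patch center stands in for the keypoint

    # Step 1: sixteen 4x4 windows, one histogram each (row-major order)
    hists = [[0.0] * num_bins for _ in range(16)]

    for y in range(size):
        for x in range(size):
            # Step 2: gradient via central differences, clamped at the border
            gx = patch[y][min(x + 1, size - 1)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, size - 1)][x] - patch[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            orientation = math.degrees(math.atan2(gy, gx)) % 360.0

            # Gaussian weight: pixels far from the center contribute less
            dist_sq = (x - center) ** 2 + (y - center) ** 2
            weight = math.exp(-dist_sq / (2.0 * sigma ** 2))

            window = (y // 4) * 4 + (x // 4)
            bin_index = int(orientation // bin_width) % num_bins
            hists[window][bin_index] += weight * magnitude

    # Step 3: concatenate the 16 histograms and L2-normalize
    descriptor = [value for hist in hists for value in hist]
    norm = math.sqrt(sum(value * value for value in descriptor))
    if norm > 0:
        descriptor = [value / norm for value in descriptor]
    return descriptor

# a horizontal intensity ramp: every gradient points along +x (0 degrees)
ramp = [[float(x) for x in range(16)] for _ in range(16)]
desc = sift_like_descriptor(ramp)
print(len(desc))  # 128
```

For the ramp patch, every pixel's orientation is 0 degrees, so only the first bin of each window receives any weight.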

In [1]:
import sys
sys.path.append("../../")

In [2]:
import numpy as np
import cv2
import imutils
from cv_imshow import display_image, create_subplot

In [3]:
args = {
    "image1":"../../images/fast_book_cover.png",
    "pen":"../../images/keypoint_detect/pen.jpg"
}

In [7]:
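def sift(imagepath):
    # load the image and convert it to grayscale
    image = cv2.imread(imagepath)
    if image is None:
        raise FileNotFoundError(imagepath)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # init keypoint detector; OpenCV >= 4.4 exposes cv2.SIFT_create,
    # older contrib builds expose cv2.xfeatures2d.SIFT_create
    if hasattr(cv2, "SIFT_create"):
        detector = cv2.SIFT_create()
    else:
        detector = cv2.xfeatures2d.SIFT_create()

    # detect keypoints and extract local invariant descriptors
    (kps, descs) = detector.detectAndCompute(gray, None)

    # show the number of keypoints and the shape of the descriptor array
    print("[INFO] # of keypoints detected: {}".format(len(kps)))
    print("[INFO] feature vector shape: {}".format(descs.shape))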

In [8]:
sift(args["image1"])

[INFO] # of keypoints detected: 660
[INFO] feature vector shape: (660, 128)


In [9]:
sift(args["pen"])

[INFO] # of keypoints detected: 175
[INFO] feature vector shape: (175, 128)
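
The two runs above produce descriptor arrays of shape (660, 128) and (175, 128), so comparing two images comes down to comparing rows of those arrays. Below is a minimal sketch of one way to do that, assuming Euclidean distance and a brute-force nearest-neighbor search. The function name `match_descriptors` and the 4-dim toy vectors are hypothetical stand-ins for real 128-dim SIFT descriptors.

```python
import math

def match_descriptors(descs_a, descs_b):
    """For each descriptor in descs_a, find the closest descriptor in descs_b.

    Returns a list of (index_a, index_b, distance) tuples.
    """
    matches = []
    for i, a in enumerate(descs_a):
        best_j, best_dist = -1, float("inf")
        for j, b in enumerate(descs_b):
            dist = math.dist(a, b)  # Euclidean (L2) distance
            if dist < best_dist:
                best_j, best_dist = j, dist
        matches.append((i, best_j, best_dist))
    return matches

# toy 4-dim vectors standing in for 128-dim SIFT descriptors
image_a = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
image_b = [[0.0, 0.9, 0.1, 0.0], [0.9, 0.1, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

for index_a, index_b, distance in match_descriptors(image_a, image_b):
    print("a[{}] -> b[{}] (distance {:.3f})".format(index_a, index_b, distance))
```

With real data, the rows of each `descs` array would take the place of the toy vectors. OpenCV's `cv2.BFMatcher` provides this kind of brute-force matching.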
