# SIFT Features and RANSAC

These notes are based on a [video](https://youtu.be/oT9c_LlFBqs) by Stachniss.

## Keypoint Detection and Feature Description

A **keypoint** is an image location at which a descriptor is computed.

The feature **descriptor** summarizes the local structure around the keypoint.

## Popular Feature Extractors
- **SIFT**: scale invariant feature transform
- **SURF**: speeded-up robust features
- **HOG**: histogram of oriented gradients
- **GLOH**: gradient location and orientation histogram

## Keypoints

The purpose of keypoint detection is to find "locally distinct" points. The procedure for finding the **keypoints** is
1. Gaussian smoothing
2. Difference of Gaussians: find extrema (over smoothing scales)
3. Suppression of extrema that lie on edges

## Difference of Gaussians

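We can subtract two copies of the same image that were blurred with Gaussian kernels of different widths. The result keeps only the frequencies between the two blur levels: the high frequencies are removed by both blurs, and the low frequencies are present in both images and cancel in the subtraction. The difference therefore acts like a "band-pass" filter.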
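The band-pass behaviour can be checked on a 1-D signal. The following is only a minimal sketch in plain Python: the helper names (`gaussian_kernel`, `blur`, `difference_of_gaussians`), the kernel truncation at about three standard deviations, the clamped borders, and the blur levels 1.0 and 2.0 are all assumptions chosen for illustration, not values from the lecture.

```python
import math

def gaussian_kernel(sigma):
    # Truncate the kernel at roughly 3 sigma and normalise it to sum to 1.
    radius = int(3 * sigma) + 1
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, sigma):
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp at the borders
            acc += w * signal[j]
        out.append(acc)
    return out

def difference_of_gaussians(signal, sigma_small, sigma_large):
    fine = blur(signal, sigma_small)
    coarse = blur(signal, sigma_large)
    return [a - b for a, b in zip(fine, coarse)]

# Lowest frequency: a constant signal cancels in the subtraction.
flat = [5.0] * 64
dog_flat = difference_of_gaussians(flat, 1.0, 2.0)

# Highest frequency: an alternating signal is removed by both blurs.
alternating = [(-1.0) ** i for i in range(64)]
dog_alternating = difference_of_gaussians(alternating, 1.0, 2.0)

# A single bright sample gives the strongest response at its own position.
impulse = [0.0] * 64
impulse[32] = 1.0
dog_impulse = difference_of_gaussians(impulse, 1.0, 2.0)
peak_index = max(range(64), key=lambda i: dog_impulse[i])
```

In a real pipeline the same subtraction is applied to 2-D images at several neighbouring blur levels, and the extrema are searched across position and scale.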

## Extrema Suppression

The **Difference of Gaussians** finds blob-like and corner-like image structures, but it also responds strongly along edges. A point on an edge is poorly localized: it can slide along the edge without changing its appearance, so its correspondence has one remaining degree of freedom. The location is only fixed where edges of different orientations meet, for example two orthogonal edges at a corner. To reject the edge responses, **SIFT** uses a criterion based on the ratio between the eigenvalues of the Hessian.
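A minimal sketch of such an eigenvalue-ratio test is shown below. It relies on the fact that the ratio of the two eigenvalues can be bounded using only the trace and determinant of the 2x2 Hessian, so the eigenvalues never need to be computed. The function name `is_edge_like` is hypothetical, and the default threshold `r = 10` is the value suggested in Lowe's SIFT paper rather than something stated in these notes.

```python
def is_edge_like(dxx, dyy, dxy, r=10.0):
    """Return True if the keypoint should be rejected as an edge response.

    dxx, dyy, dxy are the second derivatives of the Difference-of-Gaussians
    image at the keypoint (the entries of the 2x2 Hessian).
    """
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign (or a zero curvature): not a stable extremum.
        return True
    # trace^2 / det grows with the ratio of the two eigenvalues,
    # so a large value means one dominant curvature direction (an edge).
    return trace * trace / det >= (r + 1) ** 2 / r

blob = is_edge_like(dxx=-2.0, dyy=-2.0, dxy=0.0)    # similar curvature in both directions
edge = is_edge_like(dxx=-20.0, dyy=-0.5, dxy=0.0)   # strong curvature in one direction only
```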

## SIFT Descriptor

The **SIFT** feature extractor transforms image content into features that are invariant to image translation, rotation, and scale. The features are also partially invariant to illumination changes and to affine or 3D projection. This makes SIFT a good choice for mobile robots that need to detect visual landmarks from different angles, at different distances, and under different illumination.

## SIFT Features

A **SIFT** feature is given by a vector computed at a local extreme point in the scale space:
$$\langle p, s, r, f \rangle$$
where $p$ is the location in the image, $s$ is the scale, $r$ is the orientation, and $f$ is a 128-dimensional descriptor generated from the local image gradients.

The procedure is
1. Compute the image gradients in a local $16 \times 16$ area at the selected scale
2. Create a $4 \times 4$ array of orientation histograms, one per $4 \times 4$ pixel cell
3. Use 8 orientation bins in each of the $4 \times 4$ histograms to produce a descriptor with $4 \cdot 4 \cdot 8 = 128$ dimensions
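The three steps above can be sketched in plain Python as follows. This is a deliberately simplified illustration, not the full SIFT descriptor: it assumes the $16 \times 16$ patch has already been extracted at the keypoint's scale, and it leaves out the rotation to the keypoint orientation, the Gaussian weighting, and the interpolation between bins. The function name `sift_like_descriptor` is hypothetical.

```python
import math

def sift_like_descriptor(patch):
    """patch: 16x16 nested list of grey values. Returns a list of 128 floats."""
    assert len(patch) == 16 and all(len(row) == 16 for row in patch)
    # 4x4 array of histograms with 8 orientation bins each.
    hist = [[[0.0] * 8 for _ in range(4)] for _ in range(4)]
    for y in range(16):
        for x in range(16):
            # Central differences, clamped at the patch border.
            gx = patch[y][min(x + 1, 15)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, 15)][x] - patch[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2 * math.pi)
            bin_index = int(angle / (2 * math.pi) * 8) % 8
            # Each pixel votes into the histogram of its 4x4 cell.
            hist[y // 4][x // 4][bin_index] += magnitude
    descriptor = [v for row in hist for cell in row for v in cell]
    norm = math.sqrt(sum(v * v for v in descriptor))
    # Normalise to unit length to reduce the effect of contrast changes.
    return [v / norm for v in descriptor] if norm > 0 else descriptor

# A horizontal brightness ramp: every gradient points in the +x direction,
# so only the first orientation bin of each cell receives votes.
ramp = [[float(x) for x in range(16)] for _ in range(16)]
descriptor = sift_like_descriptor(ramp)
```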

## Correspondence Problem

Choosing correspondences based only on descriptor differences will lead to some wrong matches.
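A small sketch shows how such wrong matches arise. It assumes that descriptors are compared with the Euclidean distance and that every feature is simply assigned its nearest neighbour; the function name `match_nearest` and the toy 3-dimensional descriptors are made up for illustration (real SIFT descriptors have 128 dimensions).

```python
import math

def match_nearest(descriptors_a, descriptors_b):
    """Match each descriptor in A to its nearest descriptor in B."""
    matches = []
    for i, da in enumerate(descriptors_a):
        distances = [math.dist(da, db) for db in descriptors_b]
        j = min(range(len(distances)), key=distances.__getitem__)
        matches.append((i, j, distances[j]))
    return matches

image_a = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.6, 0.4, 0.7)]
image_b = [(0.9, 0.1, 0.0), (0.1, 0.9, 0.0)]

matches = match_nearest(image_a, image_b)
pairs = [(i, j) for i, j, _ in matches]
```

The third feature of `image_a` has no counterpart in `image_b`, but it is still assigned to its nearest neighbour. Matches like this one are the outliers that a later robust estimation step has to cope with.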

There is a break in the lecture [here](https://youtu.be/oT9c_LlFBqs?t=2621).

## RANdom SAmple Consensus

**RANSAC** is an approach for dealing with the outliers in the correspondence problem. It searches for the best partition of the points into an inlier set and an outlier set and estimates a model from the inlier set. It has become the standard approach for dealing with outliers.

## RANSAC Algorithm

The **RANSAC** algorithm is fairly simple:
1. **Sample** the number of data points required to fit the model
2. **Compute** the model parameters using the sampled data points
3. **Score** by the fraction of inliers within a preset threshold of the model

We then repeat this process for a number of trials and keep the model with the highest score.
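The sample, compute, score loop can be sketched for a simple model. The example below fits a 2-D line, which needs two points per sample, rather than the image correspondences discussed in the lecture; the function names, the threshold, the number of trials, and the synthetic data are all assumptions made for illustration. Counting inliers is used as the score, which is equivalent to the inlier fraction for a fixed data set.

```python
import random

def fit_line(p, q):
    """Line a*x + b*y + c = 0 through two points, with (a, b) of unit length."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0:
        return None  # identical points do not define a line
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)

def ransac_line(points, threshold, trials, rng):
    best_model, best_inliers = None, []
    for _ in range(trials):
        p, q = rng.sample(points, 2)           # 1. sample the minimal set
        model = fit_line(p, q)                 # 2. compute the model
        if model is None:
            continue
        a, b, c = model
        inliers = [pt for pt in points         # 3. score by counting inliers
                   if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# 20 points on the line y = 2x + 1 plus 10 points far away from it.
line_points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
outliers = [(1.0, 30.0), (3.0, -10.0), (5.0, 40.0), (7.0, -5.0), (9.0, 35.0),
            (11.0, 0.0), (13.0, 45.0), (15.0, 5.0), (17.0, -15.0), (19.0, 10.0)]
points = line_points + outliers

model, inliers = ransac_line(points, threshold=0.1, trials=100,
                             rng=random.Random(0))
slope = -model[0] / model[1]
```

In practice the final model is usually re-estimated from all inliers of the best trial, since the sampled minimal set alone gives a noisy estimate.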

## How to Choose the Parameters?

Given the number of sampled points, $s$, and the outlier ratio, $e=\frac{\text{#outliers}}{\text{#datapoints}}$, the probability that a sample of $s$ points is free of outliers is:
$$(1-e)^s\text{.}$$

From this we get the probability of failing $T$ times in a row, that is, of drawing a sample with at least one outlier in every one of $T$ trials:
$$(1-(1-e)^{s})^{T}\text{.}$$

In other words, this is the probability of never drawing an outlier-free sample and therefore of not finding the correct corresponding points.

To find the number of trials, $T$, required to draw at least one outlier-free sample with probability $p$, we set the failure probability equal to $1-p$, take the logarithm of both sides, and solve for $T$:
$$T =\frac{\log(1-p)}{\log(1-(1-e)^{s})} \text{.}$$
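The formula is easy to evaluate numerically. The sketch below rounds the result up to the next whole trial; the function name `ransac_trials` and the example values ($p = 0.99$, $e = 0.5$) are assumptions chosen for illustration.

```python
import math

def ransac_trials(p, e, s):
    """Trials needed so that, with probability p, at least one
    sample of s points contains no outliers (outlier ratio e)."""
    if not (0.0 < p < 1.0 and 0.0 < e < 1.0 and s >= 1):
        raise ValueError("expected 0 < p < 1, 0 < e < 1 and s >= 1")
    outlier_free = (1.0 - e) ** s   # probability that one sample is clean
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - outlier_free))

trials_line = ransac_trials(p=0.99, e=0.5, s=2)          # 17
trials_four_points = ransac_trials(p=0.99, e=0.5, s=4)   # 72
trials_eight_points = ransac_trials(p=0.99, e=0.5, s=8)  # 1177
```

The number of trials grows quickly with the sample size $s$, which is why models that can be estimated from fewer points are preferred inside the RANSAC loop.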

    TODO
    - Draw pictures and notes for "Odometry Model"