### Assignment number = 2
### Name = Varun Vashishtha
### Roll number = 2022201061

### Implement the SIFT detector and descriptor. Compute cluster centers for the Bag-of-Visual-Words approach. Represent the images as histograms (of visual words) and train a linear SVM model for 10-way classification.

In [84]:
import torch
import torchvision
import torchvision.transforms as transforms
import cv2
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from torchvision import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.utils import gen_batches

In [30]:
# Loading MNIST dataset
mnist_train = datasets.MNIST(root='./data', train=True, download=True)
mnist_test = datasets.MNIST(root='./data', train=False, download=True)
training_image_dataset = mnist_train.data.numpy()
training_labels = mnist_train.targets.numpy()
testing_images = mnist_test.data.numpy()
testing_labels = mnist_test.targets.numpy()

In [None]:
# SIFT detector
sift = cv2.SIFT_create()

# Extracting SIFT descriptors
def extract_sift_descriptors(image):
    keypoints, descriptors = sift.detectAndCompute(image, None)
    return keypoints,descriptors

# Collect descriptors from all training images (images with no keypoints are skipped)
training_descriptors = []
for image in training_image_dataset:
    keypoints,descriptors = extract_sift_descriptors(image.astype(np.uint8))
    if descriptors is not None:
        training_descriptors.append(descriptors)

# Stack all descriptors into a single array
training_descriptors = np.vstack(training_descriptors)

In [31]:
training_descriptors.shape

(469496, 128)

In [35]:
training_descriptors[0]

array([ 33.,   0.,   0.,   0.,   0.,   0.,   0.,  10., 144.,   0.,   0.,
         0.,   0.,   0.,   0.,  79.,  94.,   1.,   3.,  20.,  82.,  70.,
         3.,  28.,   0.,   0.,   6.,  22.,  60.,  97.,  79.,  20.,  70.,
         3.,   0.,   0.,   0.,   0.,   0.,  10., 144.,  40.,  19.,  13.,
         0.,   0.,   0.,  43.,  66.,  13.,  89., 144.,  13.,  15.,   1.,
         7.,   0.,   0., 144., 124.,   7.,  24.,  21.,   2.,  33.,  21.,
         1.,   0.,   0.,   0.,   0.,   1.,  92., 144., 112.,  24.,   0.,
         0.,   0.,   1.,   5.,  19., 125., 144.,   4.,   0.,   0.,   0.,
         0.,   0.,  64.,  71.,   1.,   0.,   0.,   0.,   0.,   4.,   1.,
         0.,   0.,   0.,   0.,   0.,   0.,  15.,  30.,   2.,   0.,   0.,
         0.,   0.,   0.,   1.,  26.,  15.,   0.,   0.,   0.,   0.,   0.,
         0.,   1.,   4.,   0.,   0.,   0.,   0.], dtype=float32)

In [32]:
# Cluster descriptors to get visual words
kmeans_classifier = KMeans(n_clusters=10, random_state=42)
kmeans_classifier.fit(training_descriptors)

KMeans(n_clusters=10, random_state=42)

In [33]:
kmeans_classifier.cluster_centers_.shape[0]

10

In [101]:

def compute_visual_word_histogram(image, kmeans_classifier):
    keypoints,descriptors = extract_sift_descriptors(image.astype(np.uint8))
    if descriptors is None:
        return np.zeros(len(kmeans_classifier.cluster_centers_))
    labels = kmeans_classifier.predict(descriptors)
    hist, _ = np.histogram(labels, bins=range(len(kmeans_classifier.cluster_centers_)+1),density=True)
    return hist
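The histogram step above can be illustrated on toy data. This is a minimal sketch, assuming that `KMeans.predict` assigns each descriptor to its nearest cluster center in Euclidean distance; the helper names `assign_to_nearest_center` and `normalized_word_histogram` and the 2-D toy values are hypothetical and not part of the notebook.

```python
import numpy as np

def assign_to_nearest_center(descriptors, centers):
    """Return the index of the closest center (Euclidean) for each descriptor."""
    # Pairwise squared distances, shape (num_descriptors, num_centers)
    diffs = descriptors[:, None, :] - centers[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    return np.argmin(sq_dists, axis=1)

def normalized_word_histogram(labels, num_words):
    """Count each visual word and divide by the number of descriptors."""
    counts = np.bincount(labels, minlength=num_words).astype(float)
    total = counts.sum()
    return counts / total if total > 0 else counts

# Toy data (assumption): 3 "visual words" in 2-D and 5 descriptors
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
descriptors = np.array(
    [[1.0, 1.0], [9.0, 1.0], [8.0, -1.0], [0.0, 9.0], [11.0, 0.5]]
)

labels = assign_to_nearest_center(descriptors, centers)
hist = normalized_word_histogram(labels, num_words=len(centers))

# Same result as the np.histogram call with unit-width bins and density=True
hist_np, _ = np.histogram(labels, bins=range(len(centers) + 1), density=True)
print(labels, hist, hist_np)
```

Because every bin has width 1, `density=True` reduces to dividing the counts by the number of descriptors, so each image's histogram sums to 1 regardless of how many keypoints SIFT found.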


In [73]:
def compute_visual_word_histogram(image, kmeans_classifier):
    keypoints, descriptors = extract_sift_descriptors(image.astype(np.uint8))

    if descriptors is None:
        return np.zeros(len(kmeans_classifier.cluster_centers_))

    batch_size = 1000
    batches = gen_batches(len(descriptors), batch_size)
    hist = np.zeros(len(kmeans_classifier.cluster_centers_))

    for batch in batches:
        batch_descriptors = descriptors[batch]
        labels = kmeans_classifier.predict(batch_descriptors)
        hist_batch, _ = np.histogram(labels, bins=range(len(kmeans_classifier.cluster_centers_) + 1), density=True)
        hist += hist_batch * (batch.stop - batch.start) / len(descriptors)  

    return hist


training_histogram = []
for image in training_image_dataset:
    hist = compute_visual_word_histogram(image, kmeans_classifier)
    training_histogram.append(hist)

In [76]:
training_histogram = np.array(training_histogram)

training_histogram.shape

(60000, 10)

In [2]:
training_histogram[100]

array([0. , 0. , 0.2 , 0. , 0.2 , 0. , 0. , 0.6 , 0. , 0. ])


In [77]:
training_labels = np.array(training_labels)

training_labels.shape

(60000,)

In [80]:

support_vector_machine_classifier = SVC(kernel='linear', random_state=42)
support_vector_machine_classifier.fit(training_histogram, training_labels)

SVC(kernel='linear', random_state=42)

In [126]:
def evaluate_model(testing_images, testing_labels, kmeans_classifier, svm_classifier):
    test_histograms = [compute_visual_word_histogram(image, kmeans_classifier) for image in testing_images]
    predictions = svm_classifier.predict(test_histograms)
    accuracy = accuracy_score(testing_labels, predictions)
    return accuracy

test_accuracy = evaluate_model(testing_images, testing_labels, kmeans_classifier, support_vector_machine_classifier)
print("Test Accuracy:", test_accuracy)


Test Accuracy: 0.3996


### Keeping everything else constant, plot how classification accuracy changes as you sweep across 6 different values for the number of clusters. Please decide what numbers are meaningful for this question. Explain the trends in classification accuracy that you observe.


In the SIFT-BoVW-SVM approach, the number of clusters (visual words) in the Bag of Visual Words (BoVW) model directly affects accuracy. In the sweep below, test accuracy rises from 0.1848 with 2 clusters to 0.6692 with 100 clusters. Increasing the number of clusters improves accuracy up to a certain point for the following reasons:

Better Representation: With more clusters, the BoVW model can represent the visual features of the images more accurately. This can lead to better discrimination between different classes.

Reduced Ambiguity: A higher number of clusters can reduce ambiguity in feature representation. This means that each visual word is more specific to certain types of features, making it easier for the SVM classifier to distinguish between classes.

Improved Generalization: A larger vocabulary gives a more detailed representation of the visual features, which can help the model generalize to unseen data as long as each visual word is still supported by enough descriptors.

However, increasing the number of clusters beyond a certain point can lead to overfitting: the vocabulary becomes too specialized to the training descriptors, the histograms become sparse, and performance on unseen data drops. The number of clusters is therefore a trade-off that has to be tuned to get the best accuracy. In the range tested here (2 to 100 clusters) accuracy was still increasing, although each added cluster bought less: about +0.12 from 2 to 5 clusters versus +0.058 from 50 to 100 clusters.
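One way to act on this trade-off is to pick the number of clusters on a held-out validation split instead of the test set, and to prefer the smallest vocabulary that is close to the best score. The sketch below only illustrates that selection rule: `select_num_clusters`, `score_fn`, and `tolerance` are hypothetical names, and `synthetic_score` is a made-up curve, not a measurement from this assignment.

```python
import math

def select_num_clusters(candidates, score_fn, tolerance=0.01):
    """Pick the smallest candidate whose score is within `tolerance` of the best.

    score_fn(k) is assumed to return a validation accuracy for k clusters,
    for example by fitting KMeans and the SVM on a training split and
    scoring on a separate validation split.
    """
    scores = {k: score_fn(k) for k in candidates}
    best = max(scores.values())
    chosen = min(k for k, s in scores.items() if s >= best - tolerance)
    return chosen, scores

def synthetic_score(k):
    # Made-up curve that peaks at k = 64; it stands in for a real validation run
    return 1.0 - abs(math.log2(k) - 6) / 10

chosen_k, all_scores = select_num_clusters([8, 16, 32, 64, 128, 256], synthetic_score)
print(chosen_k, all_scores)
```

In the notebook, `score_fn` would wrap the clustering, histogram, and SVM steps, with part of the 60,000 training images set aside for validation.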

In [105]:
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def sift_accuracy(num_clusters, training_image_dataset, testing_images, training_labels, testing_labels):
    sift = cv2.SIFT_create()

    def extract_sift_descriptor(image, sift):
        keypoints, descriptors = sift.detectAndCompute(image, None)
        return keypoints, descriptors

    training_descriptors = []
    for image in training_image_dataset:
        keypoints, descriptors = extract_sift_descriptor(image.astype(np.uint8), sift)
        if descriptors is not None:
            training_descriptors.append(descriptors)

    training_descriptors = np.vstack(training_descriptors)

    kmeans_classifier = KMeans(n_clusters=num_clusters, random_state=42)
    kmeans_classifier.fit(training_descriptors)

    num_samples = 12000
    try:
        selected_indices = np.random.choice(len(training_image_dataset), num_samples, replace=False)
    except ValueError:
        selected_indices = np.arange(len(training_image_dataset))
    
    selected_images = training_image_dataset[selected_indices]
    selected_labels = training_labels[selected_indices]

    # Build one visual-word histogram per sampled training image
    training_histograms = []
    for image in selected_images:
        hist = compute_visual_word_histogram(image, kmeans_classifier)
        training_histograms.append(hist)

    training_histograms = np.array(training_histograms)
    selected_labels = np.array(selected_labels)

    support_vector_machine_classifier = SVC(kernel='linear', random_state=42)
    support_vector_machine_classifier.fit(training_histograms, selected_labels)
    # Evaluate the classifier trained in this function on the first 5000 test images
    test_accuracy = evaluate_model(testing_images[:5000], testing_labels[:5000], kmeans_classifier, support_vector_machine_classifier)
    print("Test Accuracy:", test_accuracy)

#### 2 CLUSTERS

In [106]:
sift_accuracy(num_clusters = 2, training_image_dataset = training_image_dataset, testing_images = testing_images, training_labels = training_labels, testing_labels = testing_labels)

Test Accuracy: 0.1848


#### 5 CLUSTERS

In [107]:
sift_accuracy(num_clusters = 5, training_image_dataset = training_image_dataset, testing_images = testing_images, training_labels = training_labels, testing_labels = testing_labels)

Test Accuracy: 0.3086


#### 10 CLUSTERS

In [108]:
sift_accuracy(num_clusters = 10, training_image_dataset = training_image_dataset, testing_images = testing_images, training_labels = training_labels, testing_labels = testing_labels)

Test Accuracy: 0.376


#### 20 CLUSTERS

In [109]:
sift_accuracy(num_clusters = 20,training_image_dataset = training_image_dataset, testing_images = testing_images, training_labels = training_labels, testing_labels = testing_labels)

Test Accuracy: 0.473


#### 50 CLUSTERS

In [110]:
sift_accuracy(num_clusters = 50,training_image_dataset = training_image_dataset, testing_images = testing_images, training_labels = training_labels, testing_labels = testing_labels)

Test Accuracy: 0.6112


#### 100 CLUSTERS

In [111]:
sift_accuracy(num_clusters = 100, training_image_dataset = training_image_dataset, testing_images = testing_images, training_labels = training_labels, testing_labels = testing_labels)

Test Accuracy: 0.6692
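The six accuracies reported above can be collected and plotted against the number of clusters. The sketch below copies the values from the outputs above; `gain_per_doubling` and `plot_sweep` are hypothetical helpers, and the plotting part assumes matplotlib is installed (it is not imported elsewhere in this notebook).

```python
import math

# Values copied from the six runs above
cluster_counts = [2, 5, 10, 20, 50, 100]
accuracies = [0.1848, 0.3086, 0.376, 0.473, 0.6112, 0.6692]

def gain_per_doubling(ks, accs):
    """Accuracy gained per doubling of the cluster count between consecutive runs."""
    gains = []
    for (k0, a0), (k1, a1) in zip(zip(ks, accs), zip(ks[1:], accs[1:])):
        gains.append((a1 - a0) / math.log2(k1 / k0))
    return gains

def plot_sweep(ks, accs, path="cluster_sweep.png"):
    """Save an accuracy-vs-clusters plot with a log-scaled x axis (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(ks, accs, marker="o")
    ax.set_xscale("log")
    ax.set_xticks(ks)
    ax.set_xticklabels([str(k) for k in ks])
    ax.set_xlabel("Number of clusters (visual words)")
    ax.set_ylabel("Test accuracy")
    ax.set_title("SIFT + BoVW + linear SVM on MNIST")
    fig.savefig(path)
    plt.close(fig)

gains = gain_per_doubling(cluster_counts, accuracies)
for (k0, k1), g in zip(zip(cluster_counts, cluster_counts[1:]), gains):
    print(f"{k0:>3} -> {k1:<3}: {g:+.4f} accuracy per doubling")
```

Calling `plot_sweep(cluster_counts, accuracies)` writes the figure to `cluster_sweep.png`. The log-scaled x axis is a design choice: the tested values are spaced roughly geometrically, so a linear axis would crowd the first four points together.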


### Show the results for 6 different hyperparameter settings. You may play with the SIFT detector or descriptor and the linear SVM. Keep the number of clusters constant based on the answer to the previous question. Explain the trends in classification accuracy that you observe

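Raising contrast_threshold (0.04 to 0.06), edge_threshold (10 to 15) and sigma (1.5 to 2.0) increased accuracy slightly for the two kernels that were run under both settings: from 0.6656 to 0.6755 for the polynomial kernel and from 0.6843 to 0.6925 for the RBF kernel. Across all six runs, accuracy went from about 65% to almost 70%.

Using a more complex (non-linear) kernel also helped capture more complex structure in the histograms: with the first SIFT setting, accuracy rose from 0.6565 (linear) to 0.6656 (polynomial) to 0.6843 (RBF).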

In [117]:
def sift_parameter_accuracy(num_clusters, contrast_threshold, edge_threshold, sigma, svm_kernel, training_images, testing_images, training_labels, testing_labels):
    sift = cv2.SIFT_create(contrastThreshold=contrast_threshold, edgeThreshold=edge_threshold, sigma=sigma)

    def extract_sift(image, sift_instance):
        keypoints, descriptors = sift_instance.detectAndCompute(image, None)
        return keypoints, descriptors

    training_descriptors = []
    for image in training_images:
        keypoints, descriptors = extract_sift(image.astype(np.uint8), sift)
        if descriptors is not None:
            training_descriptors.append(descriptors)

    training_descriptors = np.vstack(training_descriptors)

    kmeans = KMeans(n_clusters=num_clusters, random_state=42)
    kmeans.fit(training_descriptors)

    num_samples = 12000
    selected_indices = np.random.choice(len(training_images), num_samples, replace=False)
    selected_images = training_images[selected_indices]
    selected_labels = training_labels[selected_indices]

    # Build one visual-word histogram per sampled training image
    training_histograms = []
    for image in selected_images:
        hist = compute_visual_word_histogram(image, kmeans)
        training_histograms.append(hist)

    training_histograms = np.array(training_histograms)
    selected_labels = np.array(selected_labels)

    svm_classifier = SVC(kernel=svm_kernel, random_state=42)
    svm_classifier.fit(training_histograms, selected_labels)
    # Note: the histogram helper extracts descriptors with the global default `sift`,
    # not the tuned instance created above
    test_accuracy = evaluate_model(testing_images, testing_labels, kmeans, svm_classifier)
    print("Test Accuracy:", test_accuracy)

#### Linear kernel (contrast_threshold = 0.04, edge_threshold = 10, sigma = 1.5)

In [119]:
sift_parameter_accuracy(num_clusters=50, contrast_threshold = 0.04, edge_threshold = 10, sigma=1.5, svm_kernel = 'linear', training_images = training_image_dataset, testing_images = testing_images, training_labels = training_labels, testing_labels = testing_labels)

Test Accuracy: 0.6565


#### Polynomial kernel (contrast_threshold = 0.04, edge_threshold = 10, sigma = 1.5)

In [120]:
sift_parameter_accuracy(num_clusters=50, contrast_threshold = 0.04, edge_threshold = 10, sigma=1.5, svm_kernel = 'poly', training_images = training_image_dataset, testing_images = testing_images, training_labels = training_labels, testing_labels = testing_labels)

Test Accuracy: 0.6656


#### Radial basis function (RBF) kernel (contrast_threshold = 0.04, edge_threshold = 10, sigma = 1.5)

In [121]:
sift_parameter_accuracy(num_clusters=50, contrast_threshold = 0.04, edge_threshold = 10, sigma=1.5, svm_kernel = 'rbf', training_images = training_image_dataset, testing_images = testing_images, training_labels = training_labels, testing_labels = testing_labels)

Test Accuracy: 0.6843


#### Sigmoid kernel (contrast_threshold = 0.06, edge_threshold = 15, sigma = 2.0)


In [122]:
sift_parameter_accuracy(num_clusters=50, contrast_threshold = 0.06, edge_threshold = 15, sigma=2.0, svm_kernel = 'sigmoid', training_images = training_image_dataset, testing_images = testing_images, training_labels = training_labels, testing_labels = testing_labels)

Test Accuracy: 0.6617


#### Polynomial kernel (contrast_threshold = 0.06, edge_threshold = 15, sigma = 2.0)


In [123]:
sift_parameter_accuracy(num_clusters=50,  contrast_threshold = 0.06, edge_threshold = 15, sigma=2.0, svm_kernel = 'poly', training_images = training_image_dataset, testing_images = testing_images, training_labels = training_labels, testing_labels = testing_labels)

Test Accuracy: 0.6755


#### RBF kernel (contrast_threshold = 0.06, edge_threshold = 15, sigma = 2.0)


In [124]:
sift_parameter_accuracy(num_clusters=50, contrast_threshold = 0.06, edge_threshold = 15, sigma=2.0, svm_kernel = 'rbf', training_images = training_image_dataset, testing_images = testing_images, training_labels = training_labels, testing_labels = testing_labels)

Test Accuracy: 0.6925
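The six runs above can be compared like for like with a few lines of arithmetic. The sketch below copies the accuracies from the outputs above; `SETTING_A`, `SETTING_B`, and `kernel_deltas` are hypothetical names introduced only for this summary. Note that the linear and sigmoid kernels were each run under a single SIFT setting, so only the polynomial and RBF kernels isolate the effect of the SIFT parameters.

```python
# SIFT settings used in the runs above: (contrast_threshold, edge_threshold, sigma)
SETTING_A = (0.04, 10, 1.5)
SETTING_B = (0.06, 15, 2.0)

# Test accuracies copied from the six runs above (50 clusters in every run)
results = {
    (SETTING_A, "linear"): 0.6565,
    (SETTING_A, "poly"): 0.6656,
    (SETTING_A, "rbf"): 0.6843,
    (SETTING_B, "sigmoid"): 0.6617,
    (SETTING_B, "poly"): 0.6755,
    (SETTING_B, "rbf"): 0.6925,
}

def kernel_deltas(results, before, after):
    """Accuracy change per kernel, for kernels evaluated under both SIFT settings."""
    deltas = {}
    for (setting, kernel), acc in results.items():
        if setting == before and (after, kernel) in results:
            deltas[kernel] = results[(after, kernel)] - acc
    return deltas

deltas = kernel_deltas(results, SETTING_A, SETTING_B)
best_key = max(results, key=results.get)

for kernel, delta in sorted(deltas.items()):
    print(f"{kernel}: {delta:+.4f}")
print("best run:", best_key, results[best_key])
```

Both deltas are below one percentage point, and each configuration was run only once on a random 12,000-image training subset, so differences of this size should be read with caution.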


Challenges:

1) Understanding the nitty-gritty of SIFT, and then separating the extracted descriptors into clusters.


2) Experimenting with the hyperparameters and understanding the reasoning behind the resulting changes.