CS576 Assignment #1: Image Classification using Bag of Visual Words (BoVW) 
====
Primary TA : Jaehoon Yoo

TA's E-mail : wogns98@kaist.ac.kr, whieya@kaist.ac.kr

QnA Channel: https://join.slack.com/t/kaistcs576/shared_invite/zt-o3gqak0y-yj3NCb_SQFxVkqO0U6PWYw
## Instruction
- In this assignment, we will classify images into five object categories (aeroplane, car, horse, motorbike, person) using Bag of Visual Words (BoVW) features and a Support Vector Machine (SVM). Background images are used only as negative examples.
 
- We will extract SIFT descriptors from the images and construct a codebook. We will then encode each image as a histogram feature using the codebook and train the classifiers on those features.

- As you follow the given steps, fill in the section marked ***Problem*** with the appropriate code. There are **7 problems** in total.
    - For **Problem 1 ~ Problem 4**, you will get full credit (10pt each) if your implementation is correct.  
    - For **Problem 5 ~ Problem 7**, you **have to write a discussion of the results** in addition to implementing the code. Each problem is worth 5pt for a correct implementation and 5pt for a proper discussion. In other words, a correct implementation without a proper discussion earns only 5pt. To get full credit for the discussion, please follow the **Discussion Guidelines**.

## Discussion Guidelines
- You should write a discussion about **Problem 5 ~ Problem 7** in the **Discussion and Analysis** section. 
- Simply reporting the scores (e.g. classification accuracy) does not count as a discussion.
- For each problem's discussion, you should explain and compare how each method improves the results. 

## Submission guidelines
- Your code and report will all be in Colab. Copy this example to your Google Drive and edit it to complete your assignment. 
- <font color="red"> You will get the full credit **only if** you complete the code **and** write a discussion of the results in the discussion section at the bottom of this page. </font>
- We should be able to reproduce your results using your code. Please double-check that your code runs without errors and reproduces your results. Submissions that fail to run or to reproduce the results will receive a substantial penalty. 
- <font color="red"> **DO NOT modify any of the skeleton code when you submit.** Please write your code only in the designated areas. </font>
- As proof that you have run this code yourself, **make sure your notebook contains the output of each code block.**

## Deliverables
- Download your Colab notebook and submit it as [StudentID].ipynb.
- Your assignment should be submitted through KLMS. Submissions made any other way (e.g., via email) will not be considered valid. 

## Due date
- **23:59:59 April 7th.**
- Late submissions are accepted until 23:59:59 April 9th.
- Late submissions incur a 20% penalty.



## Questions
- Please use the Slack channel (https://join.slack.com/t/kaistcs576/shared_invite/zt-o3gqak0y-yj3NCb_SQFxVkqO0U6PWYw) as the main communication channel. 
When you post questions, please make them public so that all students can share the information. Please use the prefix "[Assignment 1]" in the subject for all questions regarding this assignment (e.g., [Assignment 1] Regarding the grading policy).



## Step 0: Set up the environment
For this assignment, you need two libraries for extracting features and training classifiers (cyvlfeat and sklearn).
This step takes about 5~15 minutes.

###  0-1: Download cyvlfeat library & conda

In [None]:
# install conda on colab
!wget -c https://repo.continuum.io/archive/Anaconda3-5.3.1-Linux-x86_64.sh
!chmod +x Anaconda3-5.3.1-Linux-x86_64.sh
!bash ./Anaconda3-5.3.1-Linux-x86_64.sh -b -f -p /usr/local

# install cyvlfeat
# Reference : https://anaconda.org/menpo/cyvlfeat
# Update URL (2021/03/22)
!conda install -c menpo cyvlfeat python==3.7 -y
!conda install cython numpy scipy -y

import sys
sys.path.append('/cyvlfeat')
sys.path.append('/usr/local/lib/python3.7/site-packages/')

!git clone https://github.com/menpo/cyvlfeat.git /cyvlfeat
!cd /cyvlfeat && CFLAGS="-I$CONDA_PREFIX/include" LDFLAGS="-L$CONDA_PREFIX/lib" pip install -e ./

--2021-04-04 07:26:23--  https://repo.continuum.io/archive/Anaconda3-5.3.1-Linux-x86_64.sh
Resolving repo.continuum.io (repo.continuum.io)... 104.18.200.79, 104.18.201.79, 2606:4700::6812:c84f, ...
Connecting to repo.continuum.io (repo.continuum.io)|104.18.200.79|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://repo.anaconda.com/archive/Anaconda3-5.3.1-Linux-x86_64.sh [following]
--2021-04-04 07:26:23--  https://repo.anaconda.com/archive/Anaconda3-5.3.1-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.130.3, 104.16.131.3, 2606:4700::6810:8203, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.130.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 667976437 (637M) [application/x-sh]
Saving to: ‘Anaconda3-5.3.1-Linux-x86_64.sh’


2021-04-04 07:26:26 (225 MB/s) - ‘Anaconda3-5.3.1-Linux-x86_64.sh’ saved [667976437/667976437]

PREFIX=/usr/local
reinstalling: python-3.7.0-hc3d631a

###  0-2: Connect to your Google Drive.

It is required for loading the data.

Enter your authorization code to access your drive.


In [None]:
# mount drive https://datascience.stackexchange.com/questions/29480/uploading-images-folder-from-my-system-into-google-colab
import os
from google.colab import drive
drive.mount('/gdrive')

Mounted at /gdrive


### 0-3: Import modules

In [None]:
# Import libraries
import os
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import glob
import cyvlfeat
import time
import scipy
import multiprocessing

## Helper functions

In [None]:
def euclidean_dist(x, y):
    """
    :param x: [m, d]
    :param y: [n, d]
    :return:[m, n]
    """
    m, n = x.shape[0], y.shape[0]    
    eps = 1e-6 

    xx = np.tile(np.power(x, 2).sum(axis=1), (n,1)) #[n, m]
    xx = np.transpose(xx) # [m, n]
    yy = np.tile(np.power(y, 2).sum(axis=1), (m,1)) #[m, n]
    xy = np.matmul(x, np.transpose(y)) # [m, n]
    dist = np.sqrt(xx + yy - 2*xy + eps)

    return dist

def read_img(image_path):
    img = Image.open(image_path).convert('L')
    img = img.resize((480, 480))
    return np.float32(np.array(img)/255.)

def read_txt(file_path):
    with open(file_path, "r") as f:
        data = f.read()
    return data.split()
    
def dataset_setup(data_dir):
    train_file_list = []
    val_file_list = []

    for class_name in ['aeroplane','background','car','horse','motorbike','person']:
        train_txt_path = os.path.join(data_dir, class_name+'_train.txt')
        train_file_list.append(np.array(read_txt(train_txt_path)))
        val_txt_path = os.path.join(data_dir, class_name+'_val.txt')
        val_file_list.append(np.array(read_txt(val_txt_path)))

    train_file_list = np.unique(np.concatenate(train_file_list))
    val_file_list = np.unique(np.concatenate(val_file_list))

    f = open(os.path.join(data_dir, "train.txt"), 'w')
    for i in range(train_file_list.shape[0]):
        data = "%s\n" % train_file_list[i]
        f.write(data)
    f.close()

    f = open(os.path.join(data_dir, "val.txt"), 'w')
    for i in range(val_file_list.shape[0]):
        data = "%s\n" % val_file_list[i]
        f.write(data)
    f.close()

def load_train_data(data_dir):
    dataset_setup(data_dir)
    num_proc = 12 # num_process

    txt_path = os.path.join(data_dir, 'train.txt')
    file_list = read_txt(txt_path)
    image_paths = [os.path.join(data_dir+'/images', file_name+'.jpg') for file_name in file_list]
    with multiprocessing.Pool(num_proc) as pool:
      imgs = pool.map(read_img, image_paths)
      imgs = np.array(imgs)
      idxs = np.array(file_list)

    return imgs, idxs

def load_val_data(data_dir):
    dataset_setup(data_dir)
    num_proc = 12 # num_process

    txt_path = os.path.join(data_dir, 'val.txt')
    file_list = read_txt(txt_path)
    image_paths = [os.path.join(data_dir+'/images', file_name+'.jpg') for file_name in file_list]
    with multiprocessing.Pool(num_proc) as pool:
      imgs = pool.map(read_img, image_paths)
      imgs = np.array(imgs)
      idxs = np.array(file_list)
    
    return imgs, idxs

def get_labels(idxs, target_idxs):
    """
    Get the labels from file index(name).

    :param idxs(numpy.array): file index(name). shape:[num_images, ]
    :param target_idxs(numpy.array): target index(name). shape:[num_target,]
    :return(numpy.array): Target label(Binary label consisting of True and False). shape:[num_images,]
    """
    return np.isin(idxs, target_idxs)

def load_train_idxs(data_dir):
    txt_path = os.path.join(data_dir, 'train.txt')
    train_idxs = np.array(read_txt(txt_path))
    return train_idxs

def load_val_idxs(data_dir):
    txt_path = os.path.join(data_dir, 'val.txt')
    val_idxs = np.array(read_txt(txt_path))
    return val_idxs
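
The `euclidean_dist` helper above avoids an explicit double loop by expanding the squared distance as `||x - y||^2 = ||x||^2 + ||y||^2 - 2 x·y`. The sketch below re-implements that expansion with broadcasting and checks it against a brute-force loop on random data. The function name `pairwise_dist`, the clipping of small negative values, and the test sizes are illustrative additions, not part of the skeleton.

```python
import numpy as np

def pairwise_dist(x, y, eps=1e-6):
    """Pairwise Euclidean distances between rows of x [m, d] and y [n, d]."""
    xx = np.sum(x ** 2, axis=1)[:, None]   # [m, 1]
    yy = np.sum(y ** 2, axis=1)[None, :]   # [1, n]
    xy = x @ y.T                           # [m, n]
    # Clip tiny negatives caused by floating-point cancellation before the sqrt.
    return np.sqrt(np.maximum(xx + yy - 2.0 * xy, 0.0) + eps)

rng = np.random.default_rng(0)
x = rng.random((5, 128))
y = rng.random((3, 128))

fast = pairwise_dist(x, y)
# Brute-force reference: one norm per (row of x, row of y) pair.
slow = np.array([[np.linalg.norm(a - b) for b in y] for a in x])

max_abs_error = np.abs(fast - slow).max()
print(fast.shape, max_abs_error)
```

The small `eps` inside the square root shifts every distance slightly, which is why the two results agree only up to a small tolerance rather than exactly.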

## Step 1: Load the data

In [None]:
''' 
Set your data path for loading images & labels.
Example) CS_DATA_DIR = '/gdrive/My Drive/data'
'''

%env CS_DATA_DIR=/gdrive/My Drive/data
!mkdir -p "$CS_DATA_DIR"
!cd "$CS_DATA_DIR" && wget http://www.di.ens.fr/willow/events/cvml2013/materials/practicals/category-level/practical-category-recognition-2013a-data-only.tar.gz && tar -zxf practical-category-recognition-2013a-data-only.tar.gz

env: CS_DATA_DIR=/gdrive/My Drive/data
--2021-04-03 15:24:16--  http://www.di.ens.fr/willow/events/cvml2013/materials/practicals/category-level/practical-category-recognition-2013a-data-only.tar.gz
Resolving www.di.ens.fr (www.di.ens.fr)... 129.199.99.14
Connecting to www.di.ens.fr (www.di.ens.fr)|129.199.99.14|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.di.ens.fr/willow/events/cvml2013/materials/practicals/category-level/practical-category-recognition-2013a-data-only.tar.gz [following]
--2021-04-03 15:24:16--  https://www.di.ens.fr/willow/events/cvml2013/materials/practicals/category-level/practical-category-recognition-2013a-data-only.tar.gz
Connecting to www.di.ens.fr (www.di.ens.fr)|129.199.99.14|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘practical-category-recognition-2013a-data-only.tar.gz’

practical-category-     [ <=>                ] 964.15M  3.71MB/s    i

In [None]:
category = ['aeroplane', 'car', 'horse', 'motorbike', 'person'] # DON'T MODIFY THIS.
data_dir = os.path.join(os.environ["CS_DATA_DIR"], "practical-category-recognition-2013a", "data")

## Step 2: Bag of Visual Words (BoVW) Construction

### 2-1. (**Problem 1**): SIFT descriptor extraction & Save the descriptors (10pt)

In [None]:
def SIFT_extraction(imgs):
    """
    Extract Local SIFT descriptors from images using cyvlfeat.sift.sift().
    Refer to https://github.com/menpo/cyvlfeat
    You should set the parameters of cyvlfeat.sift.sift() as below.
    1.compute_descriptor = True  2.float_descriptors = True

    :param imgs(numpy.array): Gray-scale images in Numpy array format. shape:[num_images, width_size, height_size]
    :return(numpy.array): SIFT descriptors. shape:[num_images, ], ndarray with object(descriptors)
    """
    # YOUR CODE HERE
    # Each image yields a different number of keypoints, so the per-image
    # descriptor matrices are stored in a 1-D object array.
    descriptors = np.empty(imgs.shape[0], dtype=object)
    for i in range(imgs.shape[0]):
        _, d = cyvlfeat.sift.sift(imgs[i], compute_descriptor=True, float_descriptors=True)
        descriptors[i] = d  # [num_keypoints_i, 128]
    return descriptors
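
Sparse SIFT finds a different number of keypoints in each image, so the per-image descriptor matrices cannot be stacked into one regular array. Below is a minimal sketch of the `[num_images, ]` object-array layout, using random matrices as stand-ins for real descriptors (the keypoint counts are made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for per-image SIFT output: each image has its own number of keypoints.
fake_descriptors = [rng.random((n, 128)).astype(np.float32) for n in (7, 3, 12)]

# Allocate the container first, then fill it element by element. This keeps the
# outer array 1-D even if every image happened to have the same keypoint count.
des = np.empty(len(fake_descriptors), dtype=object)
for i, d in enumerate(fake_descriptors):
    des[i] = d

print(des.shape)                      # (3,)
print([d.shape for d in des])         # [(7, 128), (3, 128), (12, 128)]

# Flattening to one [total_descriptors, 128] matrix, e.g. as input to k-means.
all_des = np.concatenate(list(des), axis=0)
print(all_des.shape)                  # (22, 128)
```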

### 2-2. (**Problem 2**): Codebook(Bag of Visual Words) construction (10pt)
In this step, you will construct the codebook using K-means clustering.

In [None]:
def get_codebook(des , k):
  """
  Construct the codebook with visual codewords using k-means clustering.
  In this step, you should use cyvlfeat.kmeans.kmeans().
  Refer to https://github.com/menpo/cyvlfeat

  :param des(numpy.array): Descriptors.  shape:[num_images, num_des_of_each_img, 128]
  :param k(int): Number of visual words.
  :return(numpy.array): Bag of visual words shape:[k, 128]
  """
  # YOUR CODE HERE
  # Stack the per-image descriptor matrices into one [total_descriptors, 128] array.
  all_des = np.concatenate(list(des), axis=0)
  centers = cyvlfeat.kmeans.kmeans(data=all_des, num_centers=k)
  return centers  # numpy array, shape [k, 128]
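
To make the clustering step concrete, here is a minimal NumPy sketch of Lloyd's algorithm, the standard k-means iteration. It only illustrates the idea and is not how `cyvlfeat.kmeans.kmeans()` is implemented; the function name `toy_kmeans`, the initialisation, and the fixed iteration count are assumptions chosen for brevity.

```python
import numpy as np

def toy_kmeans(data, k, num_iters=20, seed=0):
    """Plain Lloyd iterations: assign each point to its nearest centre, then re-average."""
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct data points.
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(num_iters):
        # Squared distances [num_points, k] via broadcasting.
        d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for c in range(k):
            members = data[assign == c]
            if len(members) > 0:          # leave an empty cluster's centre unchanged
                centers[c] = members.mean(axis=0)
    return centers

# Two well-separated 2-D blobs stand in for 128-D descriptors.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=0.0, scale=0.1, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.1, size=(50, 2))
codebook = toy_kmeans(np.vstack([blob_a, blob_b]), k=2)
print(codebook.shape)   # (2, 2)
```

Each row of the returned array plays the role of one visual word; in the assignment the rows are 128-dimensional and there are `k` of them.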

### 2-3. (**Problem 3**): Encode images to histogram feature based on codewords (10pt)

In [None]:
def extract_features(des, codebook):
  """
  Construct the Bag-of-visual-Words histogram features for images using the codebook.
  HINT: Refer to helper functions.

  :param des(numpy.array): Descriptors.  shape:[num_images, num_des_of_each_img, 128]
  :param codebook(numpy.array): Bag of visual words. shape:[k, 128]
  :return(numpy.array): BoVW histogram features. shape:[num_images, k]

  """
  # YOUR CODE HERE
  k = codebook.shape[0]
  BOW = np.zeros(shape=(des.shape[0], k), dtype='float32')
  for i in range(des.shape[0]):
    # distance_matrix.shape = (num_des_of_each_img, k);
    # num_des_of_each_img can differ from image to image.
    distance_matrix = euclidean_dist(des[i], codebook)
    min_indexs = np.argmin(distance_matrix, axis=1)  # nearest codeword per descriptor, values in [0, k-1]
    BOW[i] = np.bincount(min_indexs, minlength=k)    # count how often each codeword is used
  return BOW     # BOW has type float32
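
As a small worked example of the encoding step, the sketch below assigns five 2-D "descriptors" to a three-word codebook and counts the assignments with `np.bincount`. The toy values are invented for illustration; real descriptors are 128-dimensional.

```python
import numpy as np

codebook = np.array([[0.0, 0.0],
                     [1.0, 0.0],
                     [0.0, 1.0]])            # k = 3 visual words
descriptors = np.array([[0.1, 0.0],          # nearest word: 0
                        [0.9, 0.1],          # nearest word: 1
                        [1.1, -0.1],         # nearest word: 1
                        [0.0, 0.8],          # nearest word: 2
                        [0.2, 0.1]])         # nearest word: 0

# Squared distance from every descriptor to every word, shape [5, 3].
d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
nearest = d2.argmin(axis=1)

# One bin per visual word; minlength keeps unused words as zero counts.
histogram = np.bincount(nearest, minlength=len(codebook)).astype(np.float32)
print(nearest)     # [0 1 1 2 0]
print(histogram)   # [2. 2. 1.]
```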

## Step 3. (**Problem 4**): Train the classifiers (10pt)
Train a classifier using the sklearn library (SVC) 

In [None]:
from sklearn.svm import SVC

In [None]:
def train_classifier(features, labels, svm_params):
  """
  Train the SVM classifier using sklearn.svm.SVC
  Refer to https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

  :param features(numpy.array): Histogram representation. shape:[num_images, dim_feature]
  :param labels(numpy.array): Target label(binary). shape:[num_images,]
  :param svm_params(dict): Keyword arguments passed to SVC (e.g. 'C', 'kernel').
  :return(sklearn.svm.SVC): Trained classifier
  """
  # Your code here
  clf = SVC(**svm_params)
  clf.fit(features, labels)
  return clf


In [None]:
def Trainer(feat_params, svm_params):
    
    """
    Train the SVM classifier.

    :param feat_params(dict): parameters for feature extraction.
        ['extractor'](function pointer): function for extracting local descriptors. (e.g. SIFT_extraction, DenseSIFT_extraction, etc)
        ['num_codewords'](int): Number of visual words (k) in the codebook.
        ['result_dir'](str): Directory to save codebooks & results.
        
    :param svm_params(dict): parameters for classifier training.
        ['C'](float): Regularization parameter.
        ['kernel'](str): Specifies the kernel type to be used in the algorithm.
   
    :return(dict): trained classifiers (sklearn.svm.SVC), one per class name
    """
    
    extractor = feat_params['extractor']
    k = feat_params['num_codewords']
    result_dir = feat_params['result_dir']
    
    if not os.path.isdir(result_dir):
        os.mkdir(result_dir)
    
    print("Load the training data...")
    start_time = time.time()
    train_imgs, train_idxs = load_train_data(data_dir)
    print("{:.4f} seconds".format(time.time()-start_time))
    
    print("Extract the local descriptors...")
    start_time = time.time()
    train_des = extractor(train_imgs)
    np.save(os.path.join(result_dir, 'train_des.npy'), train_des)
    print("{:.4f} seconds".format(time.time()-start_time))
    
    del train_imgs
    
    print("Construct the bag of visual words...")
    start_time = time.time()
    codebook = get_codebook(train_des, k)
    np.save(os.path.join(result_dir, 'codebook.npy'), codebook)
    print("{:.4f} seconds".format(time.time()-start_time))

    print("Extract the image features...")
    start_time = time.time()    
    train_features = extract_features(train_des, codebook)
    np.save(os.path.join(result_dir, 'train_features.npy'), train_features)
    print("{:.4f} seconds".format(time.time()-start_time))

    del train_des, codebook
    
    print('Train the classifiers...')
    accuracy = 0
    models = {}
    
    for class_name in category:
        target_idxs = np.array([read_txt(os.path.join(data_dir, '{}_train.txt'.format(class_name)))])
        target_labels = get_labels(train_idxs, target_idxs)
        
        models[class_name] = train_classifier(train_features, target_labels, svm_params)
        train_accuracy = models[class_name].score(train_features, target_labels) 
        print('{} Classifier train accuracy:  {:.4f}'.format(class_name ,train_accuracy))
        accuracy += train_accuracy
    
    print('Average train accuracy: {:.4f}'.format(accuracy/len(category)))
    del train_features, target_labels, target_idxs

    return models
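
`Trainer` fits one binary classifier per category: an image is a positive example for a class if its file name appears in that class's list, and a negative example otherwise (this includes the background images). The sketch below shows how `np.isin`, which `get_labels` wraps, produces these labels. The file names are made up for illustration.

```python
import numpy as np

# Hypothetical file indices; the real ones come from train.txt and <class>_train.txt.
all_idxs = np.array(["img_001", "img_002", "img_003", "img_004", "img_005"])
class_lists = {
    "car":   np.array(["img_002", "img_005"]),
    "horse": np.array(["img_003"]),
}

# One boolean label vector per class (True = positive example for that class).
labels = {name: np.isin(all_idxs, idxs) for name, idxs in class_lists.items()}

for name, y in labels.items():
    print(name, y.astype(int))
# car [0 1 0 0 1]
# horse [0 0 1 0 0]
```

Images such as `img_001` and `img_004` are negatives for every classifier, which is the role background images play in the real data.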

In [None]:
feat_params = {'extractor': SIFT_extraction, 'num_codewords':1024, 'result_dir':os.path.join(data_dir,'sift_1024')}
svm_params = {'C': 1, 'kernel': 'linear'}

- The code below takes about 30~70 minutes.

In [None]:
models = Trainer(feat_params, svm_params)

Load the training data...
24.3603 seconds
Extract the local descriptors...


335.6140 seconds
Construct the bag of visual words...
3492.3808 seconds
Extract the image features...
49.1563 seconds
Train the classifiers...
aeroplane Classifier train accuracy:  1.0000
car Classifier train accuracy:  1.0000
horse Classifier train accuracy:  1.0000
motorbike Classifier train accuracy:  1.0000
person Classifier train accuracy:  0.9454
Average train accuracy: 0.9891


## Step 4: Test the classifier on validation set



In [None]:
def Test(feat_params, models):
    """
    Test the SVM classifier.

    :param feat_params(dict): parameters for feature extraction.
        ['extractor'](function pointer): function for extracting local descriptors. (e.g. SIFT_extraction, DenseSIFT_extraction, etc)
        ['num_codewords'](int): Number of visual words (k) in the codebook.
        ['result_dir'](str): Directory to load codebooks & save results.
        
    :param models(dict): dict of classifiers(sklearn.svm.SVC)
    """
    
    extractor = feat_params['extractor']
    k = feat_params['num_codewords']
    result_dir = feat_params['result_dir']
    
    print("Load the validation data...")
    start_time = time.time()
    val_imgs, val_idxs = load_val_data(data_dir)
    print("{:.4f} seconds".format(time.time()-start_time))
    
    print("Extract the local descriptors...")
    start_time = time.time()
    val_des = extractor(val_imgs)
    np.save(os.path.join(result_dir, 'val_des.npy'), val_des)
    print("{:.4f} seconds".format(time.time()-start_time))
    
    
    del val_imgs
    codebook = np.load(os.path.join(result_dir, 'codebook.npy'))
    
    print("Extract the image features...")
    start_time = time.time()    
    val_features = extract_features(val_des, codebook)
    np.save(os.path.join(result_dir, 'val_features.npy'), val_features)
    print("{:.4f} seconds".format(time.time()-start_time))

    del val_des, codebook

    print('Test the classifiers...')
    accuracy = 0
    for class_name in category:
        target_idxs = np.array([read_txt(os.path.join(data_dir, '{}_val.txt'.format(class_name)))])
        target_labels = get_labels(val_idxs, target_idxs)
        
        val_accuracy = models[class_name].score(val_features, target_labels)
        print('{} Classifier validation accuracy:  {:.4f}'.format(class_name ,val_accuracy))
        accuracy += val_accuracy
    
    del val_features, target_idxs, target_labels

    print('Average validation accuracy: {:.4f}'.format(accuracy/len(category)))

In [None]:
Test(feat_params ,models)

Load the validation data...
26.1141 seconds
Extract the local descriptors...


312.0194 seconds
Extract the image features...
49.8063 seconds
Test the classifiers...
aeroplane Classifier validation accuracy:  0.9406
car Classifier validation accuracy:  0.7425
horse Classifier validation accuracy:  0.9002
motorbike Classifier validation accuracy:  0.9163
person Classifier validation accuracy:  0.5829
Average validation accuracy: 0.8165


## **Problem 5**: Implement Dense SIFT (10pt)
Modify the feature extractor using the dense SIFT and evaluate the performance.

In [None]:
def DenseSIFT_extraction(imgs):
  """
  Extract Dense SIFT descriptors from images using cyvlfeat.sift.dsift().
  Refer to https://github.com/menpo/cyvlfeat
  You should set the parameters of cyvlfeat.sift.dsift() as below.
    1.step = 12  2.float_descriptors = True

  :param imgs(numpy.array): Gray-scale images in Numpy array format. shape:[num_images, width_size, height_size]
  :return(numpy.array): Dense SIFT descriptors. shape:[num_images, num_des_of_each_img, 128]
  """
  # YOUR CODE HERE
  collect_descriptor = []
  for i in range(imgs.shape[0]):
    f, d = cyvlfeat.sift.dsift(image=imgs[i], step=12, float_descriptors = True)
    collect_descriptor.append(d) 
  return np.array(collect_descriptor)
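
With `step = 12` on the 480×480 images produced by `read_img`, each image yields a 40×40 grid of descriptors (1600 in total), which is the figure the `SpatialPyramid` code in Problem 6 relies on. The sketch below shows one way to arrive at that count. It assumes a descriptor geometry of 4 spatial bins per axis with bin size 3 and a frame-count rule of `usable // step + 1`; both are assumptions about the library's behaviour, so check `d.shape` on a real image before depending on the number. The helper name `dense_grid_size` is hypothetical.

```python
def dense_grid_size(image_size=480, step=12, num_bins=4, bin_size=3):
    """Estimated number of dense SIFT sampling positions along one image axis."""
    # Usable span once the descriptor's own spatial footprint is subtracted (assumed rule).
    usable = (image_size - 1) - (num_bins - 1) * bin_size
    return usable // step + 1 if usable >= 0 else 0

per_axis = dense_grid_size()
print(per_axis, per_axis ** 2)   # 40 1600
```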

In [None]:
feat_params = {'extractor': DenseSIFT_extraction, 'num_codewords':1024, 'result_dir':os.path.join(data_dir,'dsift_1024')}
svm_params = {'C': 1, 'kernel': 'linear'}

In [None]:
models = Trainer(feat_params, svm_params)

Load the training data...
68.6236 seconds
Extract the local descriptors...
422.3650 seconds
Construct the bag of visual words...
7357.9805 seconds
Extract the image features...
100.3175 seconds
Train the classifiers...
aeroplane Classifier train accuracy:  1.0000
car Classifier train accuracy:  1.0000
horse Classifier train accuracy:  1.0000
motorbike Classifier train accuracy:  1.0000
person Classifier train accuracy:  0.9765
Average train accuracy: 0.9953


In [None]:
Test(feat_params ,models)

Load the validation data...
51.7456 seconds
Extract the local descriptors...
412.7966 seconds
Extract the image features...
95.2232 seconds
Test the classifiers...
aeroplane Classifier validation accuracy:  0.9543
car Classifier validation accuracy:  0.7963
horse Classifier validation accuracy:  0.9163
motorbike Classifier validation accuracy:  0.9131
person Classifier validation accuracy:  0.5732
Average validation accuracy: 0.8306


## **Problem 6**: Implement the Spatial Pyramid (10pt)
Modify the feature extractor using the spatial pyramid matching and evaluate the performance.


In [None]:
def SpatialPyramid(des, codebook):
  """
  Extract image representation with Spatial Pyramid Matching using your DenseSIFT descriptors & codebook.

  :param des(numpy.array): DenseSIFT Descriptors.  shape:[num_images, num_des_of_each_img, 128]
  :param codebook(numpy.array): Bag of visual words. shape:[k, 128]

  :return(numpy.array): Image feature using SpatialPyramid [num_images, features_dim]
  """
  # YOUR CODE HERE
  k = codebook.shape[0]
  num_des_each_image = des.shape[1] # 1600 for 480x480 images with step=12
  num_des_each_edge = int(np.sqrt(num_des_each_image)) # 40 (descriptors per grid row/column)
  all_ = []
  for index_img in range(des.shape[0]):
    his_img = []
    # level 0
    bow0 = np.zeros(shape=k, dtype='float32')
    img_all_des = des[index_img]
    distance_matrix = euclidean_dist(img_all_des, codebook) # return (num_des_of_each_img, k)
    min_indexs = np.argmin(distance_matrix,axis=1)
    for j in range(len(min_indexs)):
      bow0[min_indexs[j]] += 1
    his_img.extend(bow0)
    
    # level 1
    win_size1 = int(num_des_each_edge//2) # win_size1 =  20
    bow1s = []
    for j in range(4):
      bow = np.zeros(shape=k, dtype='float32')
      bow1s.append(bow)
    for j in range(num_des_each_image):
      closest_center_index = min_indexs[j]
      x_cor = (j%num_des_each_edge)//win_size1
      y_cor = (j//num_des_each_edge)//win_size1
      index_bow = y_cor*2 + x_cor 
      bow1s[index_bow][closest_center_index] += 1 
    for j in range(4):
      his_img.extend(bow1s[j])

    # level 2 
    win_size2 = int(num_des_each_edge//4) # win_size2 =  10
    bow2s = []
    for j in range(16):
      bow = np.zeros(shape=k, dtype='float32')
      bow2s.append(bow)
    for j in range(num_des_each_image):
      closest_center_index = min_indexs[j]
      x_cor = (j%num_des_each_edge)//win_size2
      y_cor = (j//num_des_each_edge)//win_size2
      index_bow = y_cor*4 + x_cor 
      bow2s[index_bow][closest_center_index] +=1
    for j in range(16):
      his_img.extend(bow2s[j])
    all_.append(his_img)
  return np.array(all_)
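
The pooling in `SpatialPyramid` can be checked on a small grid. The sketch below takes a G×G grid of visual-word indices and builds histograms over 1×1, 2×2 and 4×4 cells, giving a feature of length k·(1 + 4 + 16) = 21k. It assumes the descriptors are laid out on a square grid in a consistent order, as the function above also assumes. The helper name `pyramid_histogram` and the toy sizes are illustrative. Like the function above, it concatenates raw counts without per-level weights.

```python
import numpy as np

def pyramid_histogram(word_grid, k, levels=(1, 2, 4)):
    """Concatenate per-cell word histograms for each pyramid level."""
    g = word_grid.shape[0]
    feats = []
    for cells in levels:                     # cells per axis at this level
        size = g // cells                    # grid points per cell along one axis
        for cy in range(cells):
            for cx in range(cells):
                block = word_grid[cy * size:(cy + 1) * size,
                                  cx * size:(cx + 1) * size]
                feats.append(np.bincount(block.ravel(), minlength=k))
    return np.concatenate(feats).astype(np.float32)

k, g = 5, 8
rng = np.random.default_rng(0)
word_grid = rng.integers(0, k, size=(g, g))  # nearest-word index per grid point

feature = pyramid_histogram(word_grid, k)
print(feature.shape)                         # (105,)  == 21 * k
# Every level counts each grid point exactly once.
print(feature[:k].sum(), feature[k:5 * k].sum(), feature[5 * k:].sum())   # 64.0 64.0 64.0
```

Because each level counts every descriptor once, the three segments of the feature vector have the same total, which makes a convenient sanity check for an implementation.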


In [None]:
def SP_Trainer(des_path, codebook_path, result_dir, svm_params):
    
    """
    Train the SVM classifier using SpatialPyramid representations.

    :param des_path(str): path for loading training dataset DenseSIFT descriptors.
    :param codebook_path(str): path for loading codebook for DenseSIFT descriptors.
    :param result_dir(str): directory to save features.
        
    :param svm_params(dict): parameters for classifier training.
        ['C'](float): Regularization parameter.
        ['kernel'](str): Specifies the kernel type to be used in the algorithm.
   
    :return(dict): trained classifiers (sklearn.svm.SVC), one per class name
    """
    train_idxs = load_train_idxs(data_dir)
    train_des = np.load(des_path)
    codebook = np.load(codebook_path)
    train_features = SpatialPyramid(train_des, codebook)
    np.save(os.path.join(result_dir, 'train_sp_features.npy'), train_features)

    del train_des, codebook
    
    print('Train the classifiers...')
    accuracy = 0
    models = {}
    
    for class_name in category:
        target_idxs = np.array([read_txt(os.path.join(data_dir, '{}_train.txt'.format(class_name)))])
        target_labels = get_labels(train_idxs, target_idxs)
        
        models[class_name] = train_classifier(train_features, target_labels, svm_params)
        train_accuracy = models[class_name].score(train_features, target_labels) 
        print('{} Classifier train accuracy:  {:.4f}'.format(class_name ,train_accuracy))
        accuracy += train_accuracy
    
    print('Average train accuracy: {:.4f}'.format(accuracy/len(category)))
    del train_features, target_labels, target_idxs

    return models

In [None]:
def SP_Test(des_path, codebook_path, result_dir, models):
    """
    Test the SVM classifier.

    :param des_path(str): path for loading validation dataset DenseSIFT descriptors.
    :param codebook_path(str): path for loading codebook for DenseSIFT descriptors.
    :param result_dir(str): directory to save features.      
    :param models(dict): dict of classifiers(sklearn.svm.SVC)

    """ 
    val_idxs = load_val_idxs(data_dir)
    val_des = np.load(des_path)
    codebook = np.load(codebook_path)
    val_features = SpatialPyramid(val_des, codebook)
    np.save(os.path.join(result_dir, 'val_sp_features.npy'), val_features)


    del val_des, codebook

    print('Test the classifiers...')
    accuracy = 0
    for class_name in category:
        target_idxs = np.array([read_txt(os.path.join(data_dir, '{}_val.txt'.format(class_name)))])
        target_labels = get_labels(val_idxs, target_idxs)
        
        val_accuracy = models[class_name].score(val_features, target_labels)
        print('{} Classifier validation accuracy:  {:.4f}'.format(class_name ,val_accuracy))
        accuracy += val_accuracy

    del val_features, target_idxs, target_labels

    print('Average validation accuracy: {:.4f}'.format(accuracy/len(category)))

In [None]:
#YOUR CODE HERE for training & testing with Spatial Pyramid
dense_descriptors_result_dir = os.path.join(data_dir,'dsift_1024')
train_dense_descriptors_path = os.path.join(dense_descriptors_result_dir, 'train_des.npy')
codebook_dense_descriptors_path = os.path.join(dense_descriptors_result_dir,'codebook.npy')

sp_result_dir = os.path.join(data_dir, 'sp')
if not os.path.isdir(sp_result_dir):
    os.mkdir(sp_result_dir)

sp_svm_params = {'C': 1, 'kernel': 'linear'}
sp_models = SP_Trainer(train_dense_descriptors_path, codebook_dense_descriptors_path , sp_result_dir, sp_svm_params)
val_dense_descriptors_path = os.path.join(dense_descriptors_result_dir,'val_des.npy')
SP_Test(val_dense_descriptors_path, codebook_dense_descriptors_path, sp_result_dir, sp_models)

Train the classifiers...
aeroplane Classifier train accuracy:  1.0000
car Classifier train accuracy:  1.0000
horse Classifier train accuracy:  1.0000
motorbike Classifier train accuracy:  1.0000
person Classifier train accuracy:  1.0000
Average train accuracy: 1.0000
Test the classifiers...
aeroplane Classifier validation accuracy:  0.9620
car Classifier validation accuracy:  0.8581
horse Classifier validation accuracy:  0.9507
motorbike Classifier validation accuracy:  0.9466
person Classifier validation accuracy:  0.6649
Average validation accuracy: 0.8765


## **Problem 7**: Improve classification using non-linear SVM (10pt)
Modify the classifier using the non-linear SVM and evaluate the performance. 


In [None]:
# YOUR CODE HERE to improve classification using non-linear SVM
# YOUR CODE should include training & testing with non-linear SVM.

# use rbf kernel to achieve non-linear SVM 
rbf_feat_params= {'extractor': SIFT_extraction, 'num_codewords':1024, 'result_dir':os.path.join(data_dir,'non_linear')}
rbf_svm_params = {'C': 1, 'kernel': 'rbf'}
rbf_models = Trainer(rbf_feat_params, rbf_svm_params)
Test(rbf_feat_params, rbf_models)

Load the training data...
79.0991 seconds
Extract the local descriptors...


341.9228 seconds
Construct the bag of visual words...
3499.6530 seconds
Extract the image features...
45.8527 seconds
Train the classifiers...
aeroplane Classifier train accuracy:  0.9664
car Classifier train accuracy:  0.8641
horse Classifier train accuracy:  0.9539
motorbike Classifier train accuracy:  0.9531
person Classifier train accuracy:  0.9066
Average train accuracy: 0.9288
Load the validation data...
59.8539 seconds
Extract the local descriptors...


305.2727 seconds
Extract the image features...
47.2216 seconds
Test the classifiers...
aeroplane Classifier validation accuracy:  0.9491
car Classifier validation accuracy:  0.8638
horse Classifier validation accuracy:  0.9402
motorbike Classifier validation accuracy:  0.9495
person Classifier validation accuracy:  0.6576
Average validation accuracy: 0.8720
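
For reference, the RBF kernel replaces the inner product of a linear SVM with `K(x, y) = exp(-gamma * ||x - y||^2)`, so the decision boundary no longer has to be a hyperplane in histogram space. A minimal NumPy sketch of the kernel matrix follows. The helper name `rbf_kernel_matrix` is hypothetical, and the value of `gamma` is arbitrary; it is not the value `SVC` used in the run above.

```python
import numpy as np

def rbf_kernel_matrix(x, y, gamma):
    """K[i, j] = exp(-gamma * ||x[i] - y[j]||^2) for rows of x [m, d] and y [n, d]."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

x = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [3.0, 4.0]])
K = rbf_kernel_matrix(x, x, gamma=0.5)
print(np.round(K, 4))
```

Nearby points get a kernel value close to 1 and distant points a value close to 0, so `gamma` controls how local each training example's influence is.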


# <font color="blue"> Discussion and Analysis </font>
## Discussion Guidelines
- You should write a discussion about **Problem 5 ~ Problem 7**.
- Simply reporting the results (e.g. classification accuracy) does not count as a discussion.
- For each problem's discussion, you should explain and compare how each method improves the results.


Please write discussions on the results above.

In [None]:
Discussion for problem 5:
Compare training accuracy and testing accuracy of the model that uses SIFT_extraction(from problem 4) vs the model that uses DenseSift_extraction(from problem 5):
Training accuracy:
                      Model uses SIFT_extraction            Model uses Densesift_extraction
Aeroplane                   1.0                        |                 1.0
Car                         1.0                        |                 1.0
Horse                       1.0                        |                 1.0
Motorbike                   1.0                        |                 1.0
Person                      0.9454                     |                 0.9765
Average accuracy            0.9891                     |                 0.9953


Testing accuracy: 
                      Model uses SIFT_extraction            Model uses Densesift_extraction
Aeroplane                   0.9406                     |                 0.9543
Car                         0.7425                     |                 0.7963
Horse                       0.9002                     |                 0.9163
Motorbike                   0.9163                     |                 0.9131
Person                      0.5829                     |                 0.5732
Average accuracy            0.8165                     |                 0.8306

Discuss: From the statistics above, replacing the SIFT descriptor with the DenseSIFT descriptor improves both the average training accuracy and the average testing accuracy.
For training accuracy, class 'Person' gains about 3 percentage points (0.9454 to 0.9765) and the average over the 5 classes gains about 0.6 points (0.9891 to 0.9953).
For testing accuracy, three of the 5 classes improve ('Aeroplane', 'Car', 'Horse'), with the largest gain of about 5 points in class 'Car', while 'Motorbike' and 'Person'
drop slightly (by about 0.3 and 1 point respectively). Overall, the average testing accuracy improves by about 1.4 points. This result makes sense because, in general, the
DenseSIFT descriptor performs better than the SIFT descriptor in this kind of pipeline, for two reasons. Firstly, DenseSIFT computes descriptors on a dense grid, so the
features cover the whole image, including regions where the SIFT detector finds no keypoints. Secondly, it avoids the sensitivity of the blob detection step in the SIFT
descriptor. Therefore, I believe that my implementations for Problems 4 and 5 work well and give reasonable results.
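
The per-class changes in the testing table above can be checked with a few lines of Python. This is only a sketch that hard-codes the reported numbers; the dictionary and function names are illustrative and are not part of the assignment code.

```python
# Testing accuracies copied from the table above (SIFT vs. DenseSIFT).
sift_test = {"Aeroplane": 0.9406, "Car": 0.7425, "Horse": 0.9002,
             "Motorbike": 0.9163, "Person": 0.5829}
dense_test = {"Aeroplane": 0.9543, "Car": 0.7963, "Horse": 0.9163,
              "Motorbike": 0.9131, "Person": 0.5732}


def accuracy_deltas(baseline, candidate):
    """Return candidate minus baseline per class, in percentage points."""
    return {name: round((candidate[name] - baseline[name]) * 100, 2)
            for name in baseline}


deltas = accuracy_deltas(sift_test, dense_test)
for name, delta in deltas.items():
    print(f"{name:<10} {delta:+.2f} points")

# Mean of the per-class changes, which equals the change in average accuracy.
mean_delta = round(sum(deltas.values()) / len(deltas), 2)
print(f"{'Average':<10} {mean_delta:+.2f} points")
```

A positive value means the DenseSIFT model is better for that class, and a negative value means it is worse.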


Discussion for problem 6:
Compare training accuracy and testing accuracy of the normal implementation (from Problem 4) vs. the implementation that uses the spatial pyramid (from Problem 6):
Training accuracy: 
                       Normal implementation           |      Implementation uses spatial pyramid
Aeroplane                   1.0                        |                1.0
Car                         1.0                        |                1.0
Horse                       1.0                        |                1.0
Motorbike                   1.0                        |                1.0
Person                      0.9454                     |                1.0
Average accuracy            0.9891                     |                1.0

Testing accuracy: 
                       Normal implementation           |      Implementation uses spatial pyramid
Aeroplane                   0.9406                     |                0.9620
Car                         0.7425                     |                0.8581
Horse                       0.9002                     |                0.9507
Motorbike                   0.9163                     |                0.9466
Person                      0.5829                     |                0.6649
Average accuracy            0.8165                     |                0.8765

Discuss: From the statistics above, applying the spatial pyramid brings a clear improvement in both training and testing accuracy.
For training accuracy, the spatial pyramid raises all 5 classes to the maximum accuracy of 1.0. The only class that was not already at 1.0 is 'Person', which gains
about 5.5 percentage points, and the average training accuracy increases by about 1 point. For testing accuracy, all 5 classes improve, each by at least 2 points.
The biggest improvement is in class 'Car', at about 11.6 points. Overall, the average testing accuracy improves by 6 points (0.8165 to 0.8765). I believe these
improvements are reasonable because spatial pyramid matching brings several benefits. Firstly, it produces a histogram that encodes coarse spatial information
(where in the image each visual word occurs), which is useful for the classification task. Secondly, it combines multiple resolutions in a principled fashion, so it
is robust to failures at individual levels. Furthermore, my implementation of spatial pyramid matching (Problem 6) extracts features with the DenseSIFT descriptor,
which helps to overcome the blob detection sensitivity of the SIFT descriptor, so part of the gain over Problem 4 may come from the descriptor change. Because of
these benefits, the spatial pyramid matching technique brings a big improvement to the performance of our model.
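
To make the idea of a histogram that encodes spatial information concrete, here is a minimal sketch of an unweighted spatial pyramid histogram in plain Python. It is not the code from Problem 6: the function name, its arguments, and the toy data are assumptions for illustration, and it leaves out the per-level weights that spatial pyramid matching normally applies.

```python
def spatial_pyramid_histogram(points, words, width, height, vocab_size, levels=2):
    """Concatenate per-cell visual-word histograms for pyramid levels 0..levels.

    points: list of (x, y) descriptor locations in pixels
    words:  list of visual-word indices, one per point
    Level l splits the image into 2**l x 2**l cells.
    """
    feature = []
    for level in range(levels + 1):
        cells = 2 ** level
        # One histogram of length vocab_size per grid cell at this level.
        hists = [[0] * vocab_size for _ in range(cells * cells)]
        for (x, y), word in zip(points, words):
            # Clamp so points on the right or bottom border fall in the last cell.
            col = min(int(x * cells / width), cells - 1)
            row = min(int(y * cells / height), cells - 1)
            hists[row * cells + col][word] += 1
        for hist in hists:
            feature.extend(hist)
    return feature


# Toy example: four descriptors, one near each corner of a 100x100 image.
points = [(10, 10), (90, 10), (10, 90), (90, 90)]
words = [0, 1, 1, 0]
feature = spatial_pyramid_histogram(points, words, 100, 100, vocab_size=2, levels=1)
print(feature)  # [2, 2, 1, 0, 0, 1, 0, 1, 1, 0]
```

The first two entries are the ordinary global histogram (level 0). The remaining eight entries record which visual word occurred in which quadrant, which is information that the global histogram alone cannot express.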


Discussion for problem 7: 
Compare training accuracy and testing accuracy of the linear SVM (from Problem 4) vs. the non-linear SVM (from Problem 7):
Training accuracy: 
                        Linear SVM                     |                Non-linear SVM
Aeroplane                  1.0                         |                0.9664
Car                        1.0                         |                0.8641
Horse                      1.0                         |                0.9539
Motorbike                  1.0                         |                0.9531
Person                     0.9454                      |                0.9066
Average accuracy           0.9891                      |                0.9288

Testing accuracy: 
                        Linear SVM                     |                Non-linear SVM
Aeroplane                  0.9406                      |                0.9491
Car                        0.7425                      |                0.8638
Horse                      0.9002                      |                0.9402
Motorbike                  0.9163                      |                0.9495
Person                     0.5829                      |                0.6576
Average accuracy           0.8165                      |                0.8720

Discuss: From the statistics above, the non-linear SVM has lower training accuracy than the linear SVM for all classes. In my opinion, this happens because my
non-linear SVM uses the 'rbf' kernel, which is strongly affected by the magnitude of the feature vectors (the 'rbf' kernel is exp(-gamma * ||xi - xj||^2), which
depends on the squared distance between xi and xj and therefore on their scale). Furthermore, in my implementation I do not normalize the histogram before feeding
it into the SVM, so the input features have a very large magnitude, which strongly affects the behaviour of the 'rbf' kernel. To overcome this issue, we could
normalize the histogram before feeding it into the non-linear SVM, or use a different hyperparameter setting, because the current setting may not be optimal for
the non-linear SVM. For testing accuracy, the non-linear kernel improves the accuracy in all classes. The biggest improvements are in class 'Car' (about 12 points)
and class 'Person' (about 7.5 points). Overall, the non-linear kernel raises the average testing accuracy by about 5.6 points (0.8165 to 0.8720). In my opinion, this
improvement happens because our image features are complex and not linearly separable, so a linear SVM is not sufficient to separate images of different categories.
With a non-linear kernel, the SVM implicitly maps the data from the original input space into a higher-dimensional space where separating the data is much easier.
The gap between training and testing accuracy is also smaller for the non-linear SVM (about 6 points versus about 17 points for the linear SVM), which suggests that
it overfits less. This is why, in my implementation, the non-linear SVM achieves a higher testing accuracy than the linear SVM.
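
The sensitivity of the 'rbf' kernel to feature magnitude can be illustrated with a small sketch in plain Python. The two histograms and the gamma value below are made up for illustration and do not come from the assignment data or from my SVM settings; the helper names are likewise hypothetical.

```python
import math


def rbf_kernel(a, b, gamma):
    """RBF kernel exp(-gamma * ||a - b||^2) for two equal-length vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def l2_normalize(hist):
    """Scale a histogram to unit Euclidean length (unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else list(hist)


# Two hypothetical raw visual-word count histograms with similar shape.
h1 = [120, 30, 45, 5]
h2 = [100, 50, 40, 10]
gamma = 0.25  # illustrative value only

raw = rbf_kernel(h1, h2, gamma)
normalized = rbf_kernel(l2_normalize(h1), l2_normalize(h2), gamma)
print(f"raw counts: {raw:.3e}, L2-normalized: {normalized:.3f}")
```

With raw counts the squared distance is in the hundreds, so the kernel value is practically zero even though the two histograms have a similar shape. After L2 normalization the same pair gets a kernel value close to 1, so the kernel can actually express how similar two images are.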

