This dataset is part of the Office dataset provided by Saenko et al. [1].
https://people.eecs.berkeley.edu/~jhoffman/domainadapt/
[1] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, "Adapting Visual Category Models to New Domains," in Proc. ECCV, September 2010, Heraklion, Greece.

In [None]:
import os
import numpy as np
import scipy.io as sio
import sklearn.svm as sksvm

There are 31 categories in this dataset.

In [None]:
root_dir='./webcam/interest_points'
classes=sorted(os.listdir(root_dir))  #os.listdir order is arbitrary; sort so that label indices are reproducible
print(classes)

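Load the features (a bag-of-words histogram) of each image and split them into training and validation sets: the first ntrain files of each category are used for training and the rest for validation.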

In [None]:
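ntrain=10  #number of training images per category
train_feat,train_labels,val_feat,val_labels=[],[],[],[]
for c,cls in enumerate(classes):
    files=sorted(os.listdir(os.path.join(root_dir,cls)))  #sort so that the split is reproducible
    for i,f in enumerate(files):
        x=sio.loadmat(os.path.join(root_dir,cls,f))['histogram']
        if i<ntrain:  #first ntrain files go to the training set
            train_feat.append(x)
            train_labels.append(c)
        else:  #remaining files go to the validation set
            val_feat.append(x)
            val_labels.append(c)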

In [None]:
train_feat=np.vstack(train_feat)
val_feat=np.vstack(val_feat)
train_feat=train_feat/np.sum(train_feat,axis=1).reshape(-1,1)  #normalize by the number of words
val_feat=val_feat/np.sum(val_feat,axis=1).reshape(-1,1)

Note that each feature vector is now a normalized histogram, so its elements should always sum to one.

In [None]:
print(np.sum(val_feat,axis=1))
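
The same check can be made explicit with `np.allclose` rather than by reading the printed sums. The sketch below is only an illustration on a small synthetic matrix (`counts` and its values are made up, not taken from the dataset), and it assumes every histogram has at least one non-zero bin, since an all-zero row would cause a division by zero.

```python
import numpy as np

# Synthetic stand-in for raw bag-of-words counts (3 images, 4 visual words).
counts = np.array([[2.0, 0.0, 1.0, 1.0],
                   [5.0, 5.0, 0.0, 0.0],
                   [1.0, 1.0, 1.0, 1.0]])

# keepdims=True keeps the row sums as a column vector,
# so broadcasting divides each row by its own total.
hist = counts / counts.sum(axis=1, keepdims=True)

# Every row of a normalized histogram should sum to one (up to rounding).
row_sums = hist.sum(axis=1)
print(np.allclose(row_sums, 1.0))
```

On the real data, the equivalent check would be `np.allclose(np.sum(val_feat,axis=1), 1.0)`.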

Now, let's train a linear SVM classifier on raw features.

In [None]:
clf = sksvm.LinearSVC(C=1000, random_state=0)
clf.fit(train_feat, train_labels)

In [None]:
pred=clf.predict(val_feat)
acc=(np.sum(pred==val_labels)/len(val_labels))*100
print('Accuracy: %f' % acc)

Confirm that a kernel SVM with a linear kernel produces a similar result.

In [None]:
clf = sksvm.SVC(C=1000, kernel='linear',random_state=0)
#clf = sksvm.SVC(C=1000, kernel='rbf',random_state=0)   #What about RBF kernel?
clf.fit(train_feat, train_labels)
pred=clf.predict(val_feat)
acc=(np.sum(pred==val_labels)/len(val_labels))*100
print('Accuracy: %f' % acc)
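
The exercises that follow pass `kernel='precomputed'` to `SVC`, so it may help to see that interface with a kernel whose answer is already known. The sketch below uses synthetic two-class data (the arrays `X_train`, `y_train`, and `X_val` are hypothetical stand-ins, not the Office features) and compares the built-in linear kernel with the same kernel supplied as a Gram matrix. The two classifiers should agree on essentially all predictions.

```python
import numpy as np
import sklearn.svm as sksvm

rng = np.random.default_rng(0)

# Synthetic, well-separated two-class data (hypothetical stand-in for the histograms).
X_train = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(4.0, 1.0, (20, 5))])
y_train = np.array([0] * 20 + [1] * 20)
X_val = np.vstack([rng.normal(0.0, 1.0, (10, 5)), rng.normal(4.0, 1.0, (10, 5))])

# Built-in linear kernel.
clf_lin = sksvm.SVC(C=1.0, kernel='linear')
clf_lin.fit(X_train, y_train)
pred_lin = clf_lin.predict(X_val)

# The same kernel supplied as a precomputed Gram matrix.
# fit expects shape (n_train, n_train); predict expects (n_val, n_train).
clf_pre = sksvm.SVC(C=1.0, kernel='precomputed')
clf_pre.fit(X_train @ X_train.T, y_train)
pred_pre = clf_pre.predict(X_val @ X_train.T)

# Fraction of validation points on which the two classifiers agree.
agreement = np.mean(pred_lin == pred_pre)
print('Agreement: %f' % agreement)
```

Note the argument order at prediction time: rows index validation samples and columns index training samples, which is why the later cells call the Gram function as `(val_feat,train_feat)`.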

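Now, let's build a kernel better suited to this problem. The features are normalized histograms, so kernels designed for comparing histograms are a natural choice.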

Exercise 1: Implement the Bhattacharyya kernel

In [None]:
def bc_gram(X1,X2):  #returns the Gram matrix 
    gramMat = np.zeros((X1.shape[0],X2.shape[0]))
    for i in range(gramMat.shape[0]):
        for j in range(gramMat.shape[1]):
            ###TODO:
            #gramMat[i,j] = 
    return gramMat

In [None]:
clf = sksvm.SVC(C=1000, kernel='precomputed',random_state=0)
clf.fit(bc_gram(train_feat,train_feat), train_labels)
pred=clf.predict(bc_gram(val_feat,train_feat))
acc=(np.sum(pred==val_labels)/len(val_labels))*100
print('Accuracy: %f' % acc)
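
For reference, here is one possible vectorized way to compute the same Gram matrix, assuming the usual definition of the Bhattacharyya kernel, k(x, y) = sum_k sqrt(x_k * y_k), for non-negative histograms. The function name `bc_gram_vectorized` and the toy matrix `A` are made up for this sketch; it is not meant as the official solution to the exercise.

```python
import numpy as np

def bc_gram_vectorized(X1, X2):
    """Bhattacharyya Gram matrix: K[i, j] = sum_k sqrt(X1[i, k] * X2[j, k])."""
    # For non-negative a and b, sqrt(a * b) = sqrt(a) * sqrt(b), so the double
    # loop collapses to one matrix product of the element-wise square roots.
    return np.sqrt(X1) @ np.sqrt(X2).T

# Tiny synthetic normalized histograms (made-up values, each row sums to one).
A = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.25, 0.5]])
K = bc_gram_vectorized(A, A)
print(K)
```

For normalized histograms the diagonal of `K` is one, because k(x, x) = sum_k x_k. This makes a quick sanity check for a loop-based implementation as well.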

Exercise 2: Implement the explicit feature map of the Bhattacharyya kernel

In [None]:
###TODO:
#train_sq=
#val_sq=
clf = sksvm.LinearSVC(C=1000, random_state=0)
clf.fit(train_sq, train_labels)
pred=clf.predict(val_sq)
acc=(np.sum(pred==val_labels)/len(val_labels))*100
print('Accuracy: %f' % acc)
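
The idea behind this exercise is that the Bhattacharyya kernel, assuming the definition k(x, y) = sum_k sqrt(x_k * y_k), is an ordinary dot product after mapping each histogram x to its element-wise square root. The sketch below checks that identity on synthetic histograms; `sqrt_feature_map`, `H`, and `K_ref` are names introduced only for this illustration.

```python
import numpy as np

def sqrt_feature_map(X):
    """Map each histogram x to sqrt(x), so dot products equal the Bhattacharyya kernel."""
    return np.sqrt(X)

rng = np.random.default_rng(0)

# Synthetic normalized histograms (hypothetical stand-in for train_feat).
H = rng.random((6, 8))
H = H / H.sum(axis=1, keepdims=True)

H_sq = sqrt_feature_map(H)

# Reference Gram matrix computed entry by entry from the kernel definition.
K_ref = np.array([[np.sum(np.sqrt(a * b)) for b in H] for a in H])

# Dot products of the mapped features reproduce the kernel values.
print(np.allclose(H_sq @ H_sq.T, K_ref))

# Each mapped row has unit L2 norm, because sum(sqrt(x)**2) = sum(x) = 1.
print(np.allclose(np.linalg.norm(H_sq, axis=1), 1.0))
```

Because the map is explicit and keeps the feature dimension unchanged, a linear classifier such as `LinearSVC` can be trained on the mapped features without ever forming a Gram matrix.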

Exercise 3: Implement the chi-square kernel

In [None]:
def chi_gram(X1,X2):
    gramMat = np.zeros((X1.shape[0],X2.shape[0]))
    for i in range(gramMat.shape[0]):
        for j in range(gramMat.shape[1]):
            ###TODO:
            #gramMat[i,j] = 
    return gramMat

In [None]:
clf = sksvm.SVC(C=1000, kernel='precomputed',random_state=0)
clf.fit(chi_gram(train_feat,train_feat), train_labels)
pred=clf.predict(chi_gram(val_feat,train_feat))
acc=(np.sum(pred==val_labels)/len(val_labels))*100
print('Accuracy: %f' % acc)
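
Several kernels go by the name "chi-square", so the sketch below fixes one common choice as an assumption: the additive form k(x, y) = sum_k 2 * x_k * y_k / (x_k + y_k). The exercise may intend a different variant, such as the exponentiated one. The function name `chi_gram_additive` and the toy matrix `B` are made up for this illustration.

```python
import numpy as np

def chi_gram_additive(X1, X2):
    """Additive chi-square Gram matrix: K[i, j] = sum_k 2*x_k*y_k / (x_k + y_k)."""
    K = np.zeros((X1.shape[0], X2.shape[0]))
    # Loop over rows of X1 only, which avoids an (n1, n2, d) intermediate array.
    for i, x in enumerate(X1):
        num = 2.0 * x * X2   # shape (n2, d), x is broadcast over the rows of X2
        den = x + X2
        # Bins that are empty in both histograms contribute 0 instead of 0/0.
        terms = np.divide(num, den, out=np.zeros_like(den), where=den > 0)
        K[i] = terms.sum(axis=1)
    return K

# Tiny synthetic normalized histograms (made-up values, each row sums to one).
B = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.25, 0.5]])
K_chi = chi_gram_additive(B, B)
print(K_chi)
```

Under this definition k(x, x) = sum_k x_k, so the diagonal is one for normalized histograms, which again gives a cheap check on a loop-based implementation.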