Linear SVM Classification with VGG16 Deep Net Features.

In [1]:
import pickle
from sklearn.svm import LinearSVC
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn import preprocessing
from scipy.stats.mstats import zscore
from sklearn.decomposition import PCA
import warnings
from sklearn.exceptions import ConvergenceWarning
warnings.filterwarnings(action='ignore', category=ConvergenceWarning)

Set Random Seed for Reproducible Results.

In [2]:
np.random.seed(31415926)

Load the features and labels saved by the extraction code.

In [3]:
with open('Pickle_Files/Descriptors001','rb') as fp:
    Descriptors = pickle.load(fp)

with open('Pickle_Files/Labels001','rb') as fp:
    Labels = pickle.load(fp)

Z-score each descriptor (column) so that all descriptors have zero mean and unit variance, and no single descriptor dominates the classifier because of its scale.

In [4]:
Descriptors = np.vstack(Descriptors)
Descriptors = zscore(Descriptors,axis = 0)
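For reference, here is a minimal sketch of what the column-wise z-score computes, on a small hypothetical matrix X rather than the real descriptors. It uses scipy.stats.zscore (assumed here to behave like the scipy.stats.mstats version for ordinary, unmasked arrays) and checks it against the manual formula and scikit-learn's StandardScaler. All three use the population standard deviation (ddof=0).

```python
import numpy as np
from scipy.stats import zscore
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the stacked descriptor matrix: 6 samples x 3 features
# on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=[1.0, 10.0, 100.0], size=(6, 3))

# Column-wise z-score: subtract each feature's mean and divide by its
# population standard deviation (ddof=0).
Z_scipy = zscore(X, axis=0)
Z_manual = (X - X.mean(axis=0)) / X.std(axis=0)
Z_sklearn = StandardScaler().fit_transform(X)

print(np.allclose(Z_scipy, Z_manual), np.allclose(Z_scipy, Z_sklearn))  # True True
print(Z_scipy.std(axis=0))  # each column now has unit standard deviation
```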

Convert labels from DMMP/H2O to 0/1 (DMMP to 0, H2O to 1).

In [5]:
lb = preprocessing.LabelBinarizer()
Labels = lb.fit_transform(Labels)
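A small sketch of the encoding, using hypothetical label strings 'DMMP' and 'H2O' (the exact strings stored in the pickled labels are an assumption). LabelBinarizer sorts the class names, so DMMP becomes 0 and H2O becomes 1, and for a two-class problem it returns an (n, 1) column, which is why the later cells call np.squeeze.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical raw labels standing in for the pickled Labels list.
raw_labels = ['H2O', 'DMMP', 'DMMP', 'H2O', 'DMMP']

lb = preprocessing.LabelBinarizer()
y = lb.fit_transform(raw_labels)

print(lb.classes_)     # ['DMMP' 'H2O']  (sorted, so DMMP -> 0 and H2O -> 1)
print(y.shape)         # (5, 1)  (a column vector for two classes)
print(np.squeeze(y))   # [1 0 0 1 0]  (flattened to the 1-D shape scikit-learn expects)
```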

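Specify a linear SVM with an L1 (sparsity-inducing) penalty and squared hinge loss. scikit-learn only supports this combination in the primal formulation, hence dual=False.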

In [6]:
clf = LinearSVC(penalty='l1',loss='squared_hinge',dual=False)
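A sketch on synthetic data from make_classification (not the VGG16 descriptors) illustrating two properties of this configuration: scikit-learn rejects penalty='l1' with squared hinge loss when dual=True, and the L1 penalty drives some weights to exactly zero. The value C=0.01 is an illustrative choice that makes the sparsity easy to see; the notebook uses the default C=1.0.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data: 50 features, only 5 of them informative.
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           n_redundant=0, random_state=0)
X = StandardScaler().fit_transform(X)

# The L1 penalty with squared hinge loss is only available in the primal,
# so asking for the dual problem raises a ValueError.
try:
    LinearSVC(penalty='l1', loss='squared_hinge', dual=True).fit(X, y)
    dual_rejected = False
except ValueError:
    dual_rejected = True

# Primal L1-penalised fit; a small C strengthens the penalty (illustrative value).
sparse_clf = LinearSVC(penalty='l1', loss='squared_hinge', dual=False, C=0.01)
sparse_clf.fit(X, y)
n_zero = int(np.sum(sparse_clf.coef_ == 0))
print(dual_rejected, n_zero, "of", sparse_clf.coef_.size, "weights are exactly zero")
```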

Estimate model performance (F1 score) with 10-fold cross-validation.

In [7]:
# F1 on each of the 10 held-out folds; the +/- value is one standard deviation
scores = cross_val_score(clf,Descriptors,np.squeeze(Labels),cv=10, scoring = 'f1')
print("f1: %0.2f (+/- %0.1f)" % (scores.mean(), scores.std()))

f1: 0.97 (+/- 0.0)
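With an integer cv, cross_val_score uses unshuffled StratifiedKFold splits for a classifier, so np.random.seed above does not change the folds. Also, because Descriptors was z-scored on the full dataset before this cell, each held-out fold has already contributed to the mean and standard deviation used for scaling. A minimal sketch of a variant that fits the scaler inside each training fold and uses shuffled, seeded folds, on synthetic data standing in for the descriptors:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the descriptor matrix and 0/1 labels.
X, y = make_classification(n_samples=200, n_features=60, n_informative=8,
                           random_state=0)

# The scaler is re-fitted on each training fold, so the held-out fold
# never influences the mean/std used to standardise it.
model = make_pipeline(
    StandardScaler(),
    LinearSVC(penalty='l1', loss='squared_hinge', dual=False),
)

# Shuffled, stratified folds with a fixed seed give reproducible splits.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring='f1')
print("f1: %0.2f (+/- %0.2f, 1 std)" % (scores.mean(), scores.std()))
```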


Fit the linear SVM on the entire dataset to obtain the weight assigned to each descriptor.

In [8]:
clf.fit(Descriptors,np.squeeze(Labels)) 
weights = clf.coef_

Identify the 12 descriptors with the largest absolute weights.

In [9]:
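# Descriptor indices sorted by |weight|, smallest to largest
ind = np.argsort(np.absolute(weights))
# The last 12 are the largest; the final entry has the largest |weight|
top12 = ind[0][-12:]
print('top 12 weights {0}'.format(top12))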

top 12 weights [11 52 18 32  4 38  8 17 37 10 43  6]
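np.argsort returns indices in ascending order, so ind[0][-12:] lists the 12 largest weights from smallest to largest, and the last entry (descriptor 6 above) has the largest magnitude. A toy illustration on a hypothetical 1 x 5 weight matrix:

```python
import numpy as np

# Hypothetical 1 x 5 weight matrix, shaped like LinearSVC.coef_ for a binary problem.
weights = np.array([[0.1, -0.5, 0.3, -0.05, 0.9]])

ind = np.argsort(np.absolute(weights))   # ascending |w|, row by row
top3 = ind[0][-3:]                        # last three = largest magnitudes
print(top3)          # [2 1 4]  (largest weight listed last)
print(top3[::-1])    # [4 1 2]  (same indices, largest first)
```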


Identify which descriptors are associated with water (positive) or DMMP (negative) by the sign of the weight.

In [10]:
signs = np.sign(weights[0][top12])
print('sign of weights{} with positive class Water and negative class DMMP'.format(signs))

sign of weights[ 1. -1. -1.  1.  1. -1.  1.  1.  1.  1. -1. -1.] with positive class Water and negative class DMMP
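For a binary LinearSVC the decision score is w · x + b, and a positive score predicts classes_[1]. Since LabelBinarizer mapped H2O to 1, a positive weight pushes a sample toward Water and a negative weight toward DMMP. A sketch on synthetic data with string labels standing in for the two analytes (the data are hypothetical):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic data with string labels standing in for the two analytes.
X, y01 = make_classification(n_samples=100, n_features=10, random_state=1)
y = np.where(y01 == 1, 'H2O', 'DMMP')

clf = LinearSVC(penalty='l1', loss='squared_hinge', dual=False).fit(X, y)
scores = clf.decision_function(X)     # w . x + b for each sample

# A positive score predicts classes_[1], a negative score classes_[0].
manual = clf.classes_[(scores > 0).astype(int)]
print(clf.classes_)                            # ['DMMP' 'H2O']
print(np.array_equal(manual, clf.predict(X)))  # True
```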


Compute the percentage of the total absolute weight carried by each of the top 12 descriptors.

In [11]:
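# Total absolute weight across all descriptors
Sum = np.sum(np.absolute(weights))
dev = []
for i in top12:
    # Share of the total carried by descriptor i, in percent
    j = 100*np.absolute(weights[0][i])/Sum
    dev.append(j)
print('Percent of Total Assigned Weight {}'.format(dev))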

Percent of Total Assigned Weight [2.861189936862044, 2.9990805985644937, 3.009031309895833, 3.135788259994103, 3.1968553879215182, 3.442387218867918, 3.459960281284844, 3.551141614581032, 3.8998089793160284, 4.9258011501351024, 5.103992486143733, 5.209156346471031]
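The loop above can also be written in vectorized form; percent_of_total is a hypothetical helper, not part of the notebook. Summing the printed percentages shows that the top 12 descriptors together carry about 44.8% of the full model's total absolute weight.

```python
import numpy as np

def percent_of_total(weights, idx):
    """Share (in %) of the total absolute weight carried by the columns in idx."""
    abs_w = np.absolute(weights[0])
    return 100 * abs_w[idx] / abs_w.sum()

# Toy check: |w| = [1, 1, 2], total 4.
toy = percent_of_total(np.array([[1.0, -1.0, 2.0]]), [0, 1, 2])
print(toy)   # [25. 25. 50.]

# Percentages copied from the output above (full model, top 12 descriptors).
dev = [2.861189936862044, 2.9990805985644937, 3.009031309895833,
       3.135788259994103, 3.1968553879215182, 3.442387218867918,
       3.459960281284844, 3.551141614581032, 3.8998089793160284,
       4.9258011501351024, 5.103992486143733, 5.209156346471031]
print(round(sum(dev), 2))   # 44.79
```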


Develop a second linear SVM model that uses only the 12 descriptors identified above, and test its performance with 5-fold cross-validation.

This is done so that the weights are re-optimized when the model is constrained to only these 12 descriptors.

In [12]:
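# Keep only the 12 selected descriptor columns
Descriptors1 = Descriptors[:,top12]
# Note: 5 folds here (10 above), and the +/- value is two standard deviations
scores = cross_val_score(clf,Descriptors1,np.squeeze(Labels),cv = 5, scoring = 'f1')
print("f1: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))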

f1: 0.92 (+/- 0.06)
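Because top12 was chosen from a model fitted on all samples, every held-out fold in this cross-validation has already influenced which descriptors were kept, so the score may be somewhat optimistic. A minimal sketch of one way to repeat the selection inside each training fold, using scikit-learn's SelectFromModel on synthetic data (the data and the helper l1_svc are illustrative assumptions, not the notebook's code):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=60, n_informative=8,
                           random_state=0)

def l1_svc():
    # Same configuration as the notebook's classifier.
    return LinearSVC(penalty='l1', loss='squared_hinge', dual=False)

# threshold=-np.inf makes SelectFromModel keep exactly max_features columns,
# ranked by |coef_| of an L1 SVM fitted on the training fold only.
model = make_pipeline(
    StandardScaler(),
    SelectFromModel(l1_svc(), max_features=12, threshold=-np.inf),
    l1_svc(),
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring='f1')
print("f1: %0.2f (+/- %0.2f, 2 std)" % (scores.mean(), 2 * scores.std()))
```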


Fit the reduced linear SVM on the entire dataset to obtain the weight of each of the 12 descriptors.

In [13]:
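clf.fit(Descriptors1,np.squeeze(Labels))
weights1 = clf.coef_
# Positions (0-11) within Descriptors1, sorted by |weight|, smallest to largest
ind = np.argsort(np.absolute(weights1))
# Map those positions back to the original descriptor indices
top12value = top12[ind]
top121 = ind[0][-12:]
print('top 12 weights {0}'.format(top12value))

top 12 weights [[18 17 10 38 52 11 37  8 43  6  4 32]]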
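Here ind holds positions 0 to 11 within Descriptors1, and top12[ind] translates them back to the original descriptor indices. Because ind is 2-D (one row for the single coef_ row), the result is also 2-D, hence the double brackets in the output. In the sketch below, ind is reconstructed from the printed outputs so that it reproduces them; it is not read from the notebook.

```python
import numpy as np

top12 = np.array([11, 52, 18, 32, 4, 38, 8, 17, 37, 10, 43, 6])   # output of In [9]

# Reconstructed argsort result for the reduced model's 1 x 12 coef_ matrix
# (chosen so that it reproduces the printed output of In [13]).
ind = np.array([[2, 7, 9, 5, 1, 0, 8, 6, 10, 11, 4, 3]])

print(top12[ind])      # 2-D fancy index gives a 2-D result of shape (1, 12)
print(top12[ind[0]])   # indexing with the row gives a flat array of descriptor IDs
```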


Identify which descriptors are associated with water (positive) or DMMP (negative) by the sign of the weight.

In [14]:
signs1 = np.sign(weights1[0][top121])
print('sign of weights{} with positive class Water'.format(signs1))

sign of weights[-1.  1.  1. -1. -1.  1.  1.  1. -1. -1.  1.  1.] with positive class Water
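The two sign printouts list the descriptors in different orders, so comparing them directly is awkward. A short check that lines them up by descriptor index, using values copied from the outputs of In [9], In [10], In [13] and In [14]:

```python
# Values copied from the notebook outputs above.
full_ids      = [11, 52, 18, 32, 4, 38, 8, 17, 37, 10, 43, 6]
full_signs    = [1, -1, -1, 1, 1, -1, 1, 1, 1, 1, -1, -1]
reduced_ids   = [18, 17, 10, 38, 52, 11, 37, 8, 43, 6, 4, 32]
reduced_signs = [-1, 1, 1, -1, -1, 1, 1, 1, -1, -1, 1, 1]

full = dict(zip(full_ids, full_signs))
reduced = dict(zip(reduced_ids, reduced_signs))

# Descriptors whose weight changes sign between the two models.
flipped = [d for d in full if full[d] != reduced[d]]
print(flipped)   # []
```

No descriptor flips sign, so each of the 12 descriptors points toward the same class (Water for positive, DMMP for negative) in both models.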


Compute the percentage of the total absolute weight carried by each descriptor in the reduced model.

In [15]:
Sum = np.sum(np.absolute(weights1))
dev = []
for i in top121:
    j = 100*np.absolute(weights1[0][i])/Sum
    dev.append(j)
print('Percent of Total Assigned Weight {}'.format(dev))

Percent of Total Assigned Weight [2.7948844599467173, 4.983588129209622, 5.117212207331654, 5.5286598753995495, 6.75972994812766, 6.92130749507327, 8.152736255320038, 8.856376134740291, 9.46455947245293, 12.10009469558193, 13.21900055442952, 16.101850772386825]
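Because the reduced model contains only these 12 descriptors, the percentages sum to 100%. A short sketch that pairs each descriptor with its class association and weight share, ranked from largest to smallest, using values copied from the outputs of In [13], In [14] and In [15]:

```python
# Values copied from the notebook outputs (reduced model).
ids   = [18, 17, 10, 38, 52, 11, 37, 8, 43, 6, 4, 32]
signs = [-1, 1, 1, -1, -1, 1, 1, 1, -1, -1, 1, 1]
pct   = [2.7948844599467173, 4.983588129209622, 5.117212207331654,
         5.5286598753995495, 6.75972994812766, 6.92130749507327,
         8.152736255320038, 8.856376134740291, 9.46455947245293,
         12.10009469558193, 13.21900055442952, 16.101850772386825]

# Rank descriptors by their share of the total absolute weight.
rows = sorted(zip(ids, signs, pct), key=lambda r: r[2], reverse=True)
for rank, (desc, s, p) in enumerate(rows, start=1):
    label = 'Water' if s > 0 else 'DMMP'
    print(f"{rank:2d}  descriptor {desc:2d}  {label:5s}  {p:5.2f}%")

print(f"total: {sum(pct):.2f}%")   # 100.00%
```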
