# Step 1: Compute the texture descriptions for the training images.

For each training image, calculate a vector of GLCM features.  Which GLCM features and which set of displacements you use are up to you (note that displacements for `skimage.feature.graycomatrix()` need to be specified by distances and angles in radians rather than by changes in the x and y directions).  Experiment to obtain the best possible classification rate.  Use conservative choices to begin with until everything is working, then come back and experiment.  As described in the Topic 10 lecture notes, use `skimage.feature.graycomatrix()` and `skimage.feature.graycoprops()` to calculate GLCM features.  You'll probably want to use `normed=True` with `graycomatrix`.  Your GLCM features should be stored as a 120-row by m-column array (m will depend on how many different features and displacements you used and on whether you combine values for different displacements, e.g., by taking their mean).  

_Hint: Pay close attention to the format of the return values of  `graycomatrix()` and `graycoprops()`._
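
To make the hint concrete: `graycomatrix()` returns a 4-D array indexed by (grey level, grey level, distance, angle), and `graycoprops()` returns a 2-D array with one value per (distance, angle) pair. The sketch below does not call scikit-image; it uses random stand-in arrays of that 2-D shape (the numbers are made up) to show how the per-image feature length m changes depending on whether the displacements are kept separate or averaged.

```python
import numpy as np

# Stand-ins for graycoprops() results: one value per (distance, angle) pair.
# Assumption: 4 distances x 4 angles, filled with made-up numbers.
rng = np.random.default_rng(0)
n_dist, n_angle = 4, 4
props = {name: rng.random((n_dist, n_angle))
         for name in ("energy", "contrast", "correlation", "homogeneity")}

# Option A: keep every displacement -> 4 properties * 16 displacements = 64 values.
vec_all = np.concatenate([p.ravel() for p in props.values()])

# Option B: average over the angles for each distance -> 4 * 4 = 16 values.
vec_mean = np.concatenate([p.mean(axis=1) for p in props.values()])

# Per-image vectors are stacked row-wise into an (n_images, m) array.
features = np.vstack([vec_all, vec_all])
print(vec_all.shape, vec_mean.shape, features.shape)
```

Averaging over angles is one common way to reduce sensitivity to texture orientation, at the cost of discarding directional information.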

Also, for each training image, calculate the rotationally invariant LBP features using `skimage.feature.local_binary_pattern()`.  You can experiment with parameters `P` and `R` to get a good classification rate, but `P=8` and `R=1` are probably good enough.   For the `method` parameter, use `'uniform'`, which gives you the LBP flavour we talked about in class.   Remember that `skimage.feature.local_binary_pattern()` returns an "LBP Image", which is an image in which each pixel value is between 0 and 9 and corresponds to one of the ten possible pattern labels.  It's up to you to turn the "LBP Image" into a 10-bin histogram, which serves as the feature vector for that image (you can use `numpy.histogram` for this, but again remember to specify the `bins` and `range` parameters, and that it returns two things, of which you only need the first). 

Additionally, calculate the LBP variance feature, again using `skimage.feature.local_binary_pattern()` but with `method='var'` instead.  This is the VAR feature we saw in class.  Use the same P and R as before.  Build a 16-bin histogram of the resulting 'LBP-VAR' image; use `range=(0,7000)` with `numpy.histogram()` (this is not quite "correct", but it's good enough).  Concatenate these with the rotationally invariant LBP features so that you have a 26-element feature vector for each training image.   These should be stored as a 120-row, 26-column array.
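
The histogram step described above can be sketched without scikit-image by using stand-in arrays in place of the two images returned by `local_binary_pattern()`. The image size, the random values, and the injected NaN are all assumptions made for illustration; in particular, filtering non-finite values is a defensive step in case the VAR image contains them, not something the instructions require.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the two local_binary_pattern() outputs (made-up data):
lbp_img = rng.integers(0, 10, size=(32, 32)).astype(float)  # labels 0..9
var_img = rng.uniform(0, 7000, size=(32, 32))               # VAR values
var_img[0, 0] = np.nan  # assumption: the VAR image may hold non-finite values

# bins=10 over (0, 10) gives one unit-wide bin per pattern label.
# np.histogram returns (counts, bin_edges); only the counts are needed.
hist_lbp, _ = np.histogram(lbp_img, bins=10, range=(0, 10))

# Drop non-finite values before building the 16-bin VAR histogram.
finite_var = var_img[np.isfinite(var_img)]
hist_var, _ = np.histogram(finite_var, bins=16, range=(0, 7000))

# 10 + 16 = 26-element feature vector for this image.
feature = np.concatenate((hist_lbp, hist_var))
print(feature.shape)
```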

You can do this all in one loop which builds both feature arrays.



In [195]:
# Write your code here.
import numpy as np
import skimage.segmentation as seg
import skimage.morphology as morph
import os
import skimage.io as io
import skimage.feature as feat
import matplotlib.pyplot as plt
from scipy import ndimage
%matplotlib inline

train_path = os.path.join('.', 'brodatztraining')

train_ftrs = np.empty(shape=(0, 64))
train_lbp = np.empty(shape=(0,26))

for root, dirs, files in os.walk(train_path):
    for filename in files:
        # ignore files that are not PNG files.
        if filename[-4:] != '.png':
            continue
        
        # concatenate variable root with filename to get the path to an input file.
        fname = os.path.join(root, filename)
        I = io.imread(fname)
        # graycomatrix/graycoprops are the scikit-image 0.19+ spellings;
        # older releases name them greycomatrix/greycoprops.
        P = feat.graycomatrix(I, [1, 2, 3, 4], [0, np.pi/4, np.pi/2, 3*np.pi/4], normed=True)
        # Each graycoprops result has shape (4 distances, 4 angles); flatten it to 1 x 16.
        energy = feat.graycoprops(P, 'energy')
        energy = np.reshape(energy, (1, np.prod(energy.shape)))
        contrast = feat.graycoprops(P, 'contrast')
        contrast = np.reshape(contrast, (1, np.prod(contrast.shape)))
        correlation = feat.graycoprops(P, 'correlation')
        correlation = np.reshape(correlation, (1, np.prod(correlation.shape)))
        homogeneity = feat.graycoprops(P, 'homogeneity')
        homogeneity = np.reshape(homogeneity, (1, np.prod(homogeneity.shape)))
        conca = np.concatenate((energy, contrast, correlation, homogeneity), axis=1)
        train_ftrs = np.concatenate((train_ftrs, conca), axis=0)
        
        uniform = feat.local_binary_pattern(I, P=8, R=1, method='uniform')
        # range=(0,10) with bins=10 gives one unit-wide bin per label 0..9.
        hist_uni, bin_edges_uni = np.histogram(uniform, bins=10, range=(0,10))
        hist_uni = np.reshape(hist_uni, (1, -1))
        var = feat.local_binary_pattern(I, P=8, R=1, method='var')
        hist_var, bin_edges_var = np.histogram(var, bins=16, range=(0,7000))
        hist_var = np.reshape(hist_var, (1, -1))
        train_lbp = np.concatenate((train_lbp, np.concatenate((hist_uni, hist_var), axis=1)), axis=0)



# Step 2: Compute Test Image Features

Compute the exact same features as you did in step 1 for each of the test images.  Store them in the same way (these arrays will just have more rows, specifically 320 rows, one for each testing sample). 
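
One way to guarantee that the training and test features really are computed identically is to route both sets through a single function. The sketch below is an illustration only: `lbp_like_features` is a hypothetical stand-in extractor working on made-up integer images, and `build_feature_array` is a helper name introduced here, not part of the assignment.

```python
import numpy as np

def lbp_like_features(image):
    """Hypothetical stand-in extractor: 10-bin histogram of integer labels."""
    hist, _ = np.histogram(image, bins=10, range=(0, 10))
    return hist

def build_feature_array(images, extractor):
    """Apply the same extractor to every image; one row per image."""
    return np.vstack([extractor(img) for img in images])

rng = np.random.default_rng(1)
# Made-up 8x8 "label images" standing in for the real training/test data.
train_images = [rng.integers(0, 10, size=(8, 8)) for _ in range(3)]
test_images = [rng.integers(0, 10, size=(8, 8)) for _ in range(5)]

# Both sets go through exactly the same code path.
train_X = build_feature_array(train_images, lbp_like_features)
test_X = build_feature_array(test_images, lbp_like_features)
print(train_X.shape, test_X.shape)
```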

In [196]:
# Write your code here.  
test_path = os.path.join('.', 'brodatztesting')

test_ftrs = np.empty(shape=(0, 64))
test_lbp = np.empty(shape=(0,26))
for root, dirs, files in os.walk(test_path):
    for filename in files:
        # ignore files that are not PNG files.
        if filename[-4:] != '.png':
            continue
        
        # concatenate variable root with filename to get the path to an input file.
        fname = os.path.join(root, filename)
        I = io.imread(fname)
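        # NOTE: unlike Step 1 (normed=True), the GLCM here is built unnormalised,
        # its [0, 0] entry is overwritten, and it is then normalised by hand,
        # so these features are not computed in exactly the same way as in Step 1.
        # graycomatrix/graycoprops are the scikit-image 0.19+ spellings.
        P = feat.graycomatrix(I, [1, 2, 3, 4], [0, np.pi/4, np.pi/2, 3*np.pi/4], normed=False)
        P_flt = P.astype(float)
        for i in range(P_flt.shape[2]):
            for j in range(P_flt.shape[3]):
                # Replace the (0, 0) count with (largest remaining count) mod 10.
                P_flt[0,0,i,j] = 0
                P_flt[0,0,i,j] = np.max(P_flt[:,:,i,j]) % 10.0
                # Normalise so the matrix for this displacement sums to 1.
                P_flt[:,:,i,j] = P_flt[:,:,i,j] / np.sum(P_flt[:,:,i,j])
        energy = feat.graycoprops(P_flt, 'energy')
        energy = np.reshape(energy, (1, np.prod(energy.shape)))
        contrast = feat.graycoprops(P_flt, 'contrast')
        contrast = np.reshape(contrast, (1, np.prod(contrast.shape)))
        correlation = feat.graycoprops(P_flt, 'correlation')
        correlation = np.reshape(correlation, (1, np.prod(correlation.shape)))
        homogeneity = feat.graycoprops(P_flt, 'homogeneity')
        homogeneity = np.reshape(homogeneity, (1, np.prod(homogeneity.shape)))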
        conca = np.concatenate((energy, contrast, correlation, homogeneity), axis=1)
        test_ftrs = np.concatenate((test_ftrs, conca), axis=0)
        
        uniform = feat.local_binary_pattern(I, P=8, R=1, method='uniform')
        # range=(0,10) with bins=10 gives one unit-wide bin per label 0..9.
        hist_uni, bin_edges_uni = np.histogram(uniform, bins=10, range=(0,10))
        hist_uni = np.reshape(hist_uni, (1, -1))
        var = feat.local_binary_pattern(I, P=8, R=1, method='var')
        hist_var, bin_edges_var = np.histogram(var, bins=16, range=(0,7000))
        hist_var = np.reshape(hist_var, (1, -1))
        test_lbp = np.concatenate((test_lbp, np.concatenate((hist_uni, hist_var), axis=1)), axis=0)
        



# Step 3: Generate Label Arrays for the Training and Testing Data

Use label 1 for the first class, label 2 for the second class, etc.   This should be easy to do since the filenames are ordered in blocks of 15 or 40 images of each class for training and testing, respectively.
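
Because the classes come in equal-sized blocks, the label arrays can be built directly from the block sizes. This is a minimal sketch that assumes 8 classes (15 x 8 = 120 training images, 40 x 8 = 320 test images) and that the images are processed in sorted filename order; `os.walk` does not promise any particular order for `files`, so sorting the list first is a reasonable precaution.

```python
import numpy as np

n_classes = 8  # assumption: 120 / 15 = 320 / 40 = 8 classes

# Each label 1..8 is repeated once per image in its block.
train_y = np.repeat(np.arange(1, n_classes + 1), 15)
test_y = np.repeat(np.arange(1, n_classes + 1), 40)

print(train_y.shape, test_y.shape)
print(train_y[:16])  # fifteen 1s followed by the first 2
```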

In [197]:
# Write your code for step 3 here. 

train_y = []
test_y = []
image_filename = []
count_train_img = 0
count_test_img = 0

for root, dirs, files in os.walk(train_path):
    for filename in files:
        # ignore files that are not PNG files.
        if filename[-4:] != '.png':
            continue
        
        # Training images come in blocks of 15 per class, so the label is the
        # 1-based block index of the running image count (capped at class 8).
        count_train_img = count_train_img + 1
        train_y.append(min((count_train_img - 1) // 15 + 1, 8))

for root, dirs, files in os.walk(test_path):
    for filename in files:
        # ignore files that are not PNG files.
        if filename[-4:] != '.png':
            continue
        
        # Remember the filename so misclassified images can be reported later.
        image_filename.append(filename)
        # Test images come in blocks of 40 per class, so the label is the
        # 1-based block index of the running image count (capped at class 8).
        count_test_img = count_test_img + 1
        test_y.append(min((count_test_img - 1) // 40 + 1, 8))

# Step 4:  Train a KNN classifier.  

Train a KNN classifier using your GLCM features.  Train another one using your LBP features.



In [198]:
import sklearn.neighbors as knn

# Write your code here. This should be quite short.
ngh_ftrs = knn.KNeighborsClassifier(n_neighbors=1)
ngh_ftrs.fit(train_ftrs, train_y)

ngh_lbp = knn.KNeighborsClassifier(n_neighbors=1)
ngh_lbp.fit(train_lbp, train_y)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=1, p=2,
           weights='uniform')

# Step 5:  Predict the classes of the test images

Predict the classes of the test images using both classifiers.

In [199]:
# Write your code here.  Again this should be quite short.
test_y_ftrs_pred = ngh_ftrs.predict(test_ftrs)
test_y_lbp_pred = ngh_lbp.predict(test_lbp)
test_y = np.array(test_y)
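
For intuition about what the two classifiers above do with `n_neighbors=1` and the default Euclidean metric (`metric='minkowski'`, `p=2`), here is a plain-NumPy sketch of nearest-neighbour prediction on made-up 2-D points. `predict_1nn` is a hypothetical helper written for this illustration; scikit-learn's internals and tie-breaking may differ.

```python
import numpy as np

def predict_1nn(train_X, train_y, test_X):
    """Label each test row with the label of its closest training row."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    diff = test_X[:, None, :] - train_X[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    return train_y[d2.argmin(axis=1)]

# Made-up toy data: two well-separated clusters.
train_X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
train_y = np.array([1, 1, 2, 2])
test_X = np.array([[0.2, 0.1], [4.8, 5.1], [2.0, 2.0]])
test_y = np.array([1, 2, 2])

pred = predict_1nn(train_X, train_y, test_X)
rate = np.mean(pred == test_y)  # fraction of correctly classified samples
print(pred, rate)
```

Because the distance is a plain Euclidean norm, features with large numeric ranges contribute more to it than features with small ranges.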

# Step 6:  Display Results

Display results as in the final step of Question 1.  For each classifier, display the filenames of the images that were incorrectly classified, the confusion matrix, and the classification rate.  





In [200]:
# Write your code here.
print("##################################################")
print("###############GLCM classifier####################")
print("##################################################")
err_ftrs_pred = test_y - test_y_ftrs_pred
err_ftrs_pred_abs = abs(err_ftrs_pred)
ftrs_cls_rate = sum(np.logical_not(err_ftrs_pred_abs)) * 1.0 / test_y.shape[0]
print("The classification rate was: %.12f" % ftrs_cls_rate)
print("--------------------------------------------------")
ftrs_conf_mx = np.zeros((8,8),dtype=int)
ftrs_err_idx = np.where(err_ftrs_pred)
ftrs_err_idx = ftrs_err_idx[0]
for i in ftrs_err_idx:
    if (i < 40):
        ftrs_conf_mx[0, (0 - err_ftrs_pred[i])] = ftrs_conf_mx[0, (0 - err_ftrs_pred[i])] + 1
    elif (i < 80):
        ftrs_conf_mx[1, (0 - err_ftrs_pred[i])] = ftrs_conf_mx[1, (0 - err_ftrs_pred[i])] + 1
    elif (i < 120):
        ftrs_conf_mx[2, (0 - err_ftrs_pred[i])] = ftrs_conf_mx[2, (0 - err_ftrs_pred[i])] + 1
    elif (i < 160):
        ftrs_conf_mx[3, (0 - err_ftrs_pred[i])] = ftrs_conf_mx[3, (0 - err_ftrs_pred[i])] + 1
    elif (i < 200):
        ftrs_conf_mx[4, (0 - err_ftrs_pred[i])] = ftrs_conf_mx[4, (0 - err_ftrs_pred[i])] + 1
    elif (i < 240):
        ftrs_conf_mx[5, (0 - err_ftrs_pred[i])] = ftrs_conf_mx[5, (0 - err_ftrs_pred[i])] + 1
    elif (i < 280):
        ftrs_conf_mx[6, (0 - err_ftrs_pred[i])] = ftrs_conf_mx[6, (0 - err_ftrs_pred[i])] + 1
    else:
        ftrs_conf_mx[7, (0 - err_ftrs_pred[i])] = ftrs_conf_mx[7, (0 - err_ftrs_pred[i])] + 1
ftrs_conf_mx[0,0] = 40 - sum(ftrs_conf_mx[0,:])
ftrs_conf_mx[1,1] = 40 - sum(ftrs_conf_mx[1,:])
ftrs_conf_mx[2,2] = 40 - sum(ftrs_conf_mx[2,:])
ftrs_conf_mx[3,3] = 40 - sum(ftrs_conf_mx[3,:])
ftrs_conf_mx[4,4] = 40 - sum(ftrs_conf_mx[4,:])
ftrs_conf_mx[5,5] = 40 - sum(ftrs_conf_mx[5,:])
ftrs_conf_mx[6,6] = 40 - sum(ftrs_conf_mx[6,:])
ftrs_conf_mx[7,7] = 40 - sum(ftrs_conf_mx[7,:])
print("The confusion matrix was: ")
print(ftrs_conf_mx)
print("--------------------------------------------------")
print("Incorrectly classified images was: ")
for i in ftrs_err_idx:
    print(image_filename[i])
print("##################################################")
print("################LBP classifier####################")
print("##################################################")
err_lbp_pred = test_y - test_y_lbp_pred
err_lbp_pred_abs = abs(err_lbp_pred)
lbp_cls_rate = sum(np.logical_not(err_lbp_pred_abs)) * 1.0 / test_y.shape[0]
print("The classification rate was: %.12f" % lbp_cls_rate)
print("--------------------------------------------------")
lbp_conf_mx = np.zeros((8,8),dtype=int)
lbp_err_idx = np.where(err_lbp_pred)
lbp_err_idx = lbp_err_idx[0]
for i in lbp_err_idx:
    if (i < 40):
        lbp_conf_mx[0, (0 - err_lbp_pred[i])] = lbp_conf_mx[0, (0 - err_lbp_pred[i])] + 1
    elif (i < 80):
        lbp_conf_mx[1, (0 - err_lbp_pred[i])] = lbp_conf_mx[1, (0 - err_lbp_pred[i])] + 1
    elif (i < 120):
        lbp_conf_mx[2, (0 - err_lbp_pred[i])] = lbp_conf_mx[2, (0 - err_lbp_pred[i])] + 1
    elif (i < 160):
        lbp_conf_mx[3, (0 - err_lbp_pred[i])] = lbp_conf_mx[3, (0 - err_lbp_pred[i])] + 1
    elif (i < 200):
        lbp_conf_mx[4, (0 - err_lbp_pred[i])] = lbp_conf_mx[4, (0 - err_lbp_pred[i])] + 1
    elif (i < 240):
        lbp_conf_mx[5, (0 - err_lbp_pred[i])] = lbp_conf_mx[5, (0 - err_lbp_pred[i])] + 1
    elif (i < 280):
        lbp_conf_mx[6, (0 - err_lbp_pred[i])] = lbp_conf_mx[6, (0 - err_lbp_pred[i])] + 1
    else:
        lbp_conf_mx[7, (0 - err_lbp_pred[i])] = lbp_conf_mx[7, (0 - err_lbp_pred[i])] + 1
lbp_conf_mx[0,0] = 40 - sum(lbp_conf_mx[0,:])
lbp_conf_mx[1,1] = 40 - sum(lbp_conf_mx[1,:])
lbp_conf_mx[2,2] = 40 - sum(lbp_conf_mx[2,:])
lbp_conf_mx[3,3] = 40 - sum(lbp_conf_mx[3,:])
lbp_conf_mx[4,4] = 40 - sum(lbp_conf_mx[4,:])
lbp_conf_mx[5,5] = 40 - sum(lbp_conf_mx[5,:])
lbp_conf_mx[6,6] = 40 - sum(lbp_conf_mx[6,:])
lbp_conf_mx[7,7] = 40 - sum(lbp_conf_mx[7,:])
print("The confusion matrix was: ")
print(lbp_conf_mx)
print("--------------------------------------------------")
print("Incorrectly classified images was: ")
for i in lbp_err_idx:
    print(image_filename[i])

##################################################
###############GLCM classifier####################
##################################################
The classification rate was: 0.812500000000
--------------------------------------------------
The confusion matrix was: 
[[27  3 10  0  0  0  0  0]
 [ 0 33  0  0  0  0  0  7]
 [ 0  0 17  0  0  0 17  6]
 [ 0  0  0 31  0  0  1  0]
 [ 0  0  0  0 40  0  0  0]
 [ 0  0  0  0  0 40  0  0]
 [ 0  0  0  0  0  0 32  8]
 [ 0  0  0  0  0  0  0 40]]
--------------------------------------------------
Incorrectly classified images was: 
patch-116291.png
patch-120541.png
patch-124729.png
patch-131612.png
patch-131973.png
patch-136304.png
patch-140252.png
patch-140870.png
patch-144596.png
patch-154835.png
patch-158692.png
patch-159392.png
patch-163446.png
patch-204094.png
patch-210674.png
patch-217835.png
patch-231503.png
patch-231574.png
patch-232913.png
patch-248638.png
patch-305067.png
patch-305313.png
patch-305978.png
patch-306264.png
patch-306753.
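
One thing worth checking in the cell above: the column index `0 - err[i]` equals the predicted label minus the true label, which coincides with the predicted class's column only for the first class; for other rows it can land in a different column or wrap around as a negative index. A minimal sketch of an alternative that indexes directly by true and predicted label is given below, using made-up labels rather than the assignment's data; `confusion_matrix` here is a small helper written for illustration.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes (labels 1..n_classes)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when an index pair repeats.
    np.add.at(cm, (np.asarray(y_true) - 1, np.asarray(y_pred) - 1), 1)
    return cm

# Made-up labels for three classes.
y_true = np.array([1, 1, 2, 2, 3, 3])
y_pred = np.array([1, 3, 2, 2, 1, 3])

cm = confusion_matrix(y_true, y_pred, 3)
rate = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal
print(cm)
print(rate)
```

With this layout every row sums to the number of test images in that class, which gives a quick sanity check on the result.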

# Step 7: Reflections

Answer the following questions right here in this block:

- Discuss the performance difference of the two different texture features.  Hypothesize reasons for observed differences.
	
	_Your answer:_ At first, I set the distances to \[1, 2\] for the GLCM classifier, which gave a classification rate of 69\%. Enlarging the distance list to \[1, 2, 3, 4\] raised the rate to over 80\%, but enlarging the distance and angle lists further improved it only slowly. The LBP classifier, by contrast, worked very well on the first attempt, with a classification rate over 95\%. GLCM features use only general information about the images, while LBP features record all the rotationally invariant patterns, so the LBP classifier performs much better than the GLCM classifier. As I increased the distance list, the GLCM classifier could use more information about the images, so its classification rate increased. 
    

- For each of your two classifiers, discuss the misclassified images.  Were there any classes that were particularly difficult to distinguish?  Do the misclassified images (over all classes) have anything in common that would cause them to be misclassified?  If so, what do they have in common, and why do you think it is confusing the classifier?

	_Your answer:_ For the GLCM classifier, class 1 and class 3 are particularly difficult to distinguish. I think the misclassified images from these two classes all contain bright blocks of varying shape and size. If bright blocks of one particular shape and size dominate an image, the GLCM classifier may assign that image to the wrong class. For the LBP classifier, no classes are particularly difficult to distinguish, and I don't think the misclassified images have anything in common that would cause them to be misclassified.