<H1>Extract features</H1>

<h3>Image features</h3>
<p>An image feature is a simple image pattern that we can use to describe what we see in the image. The main role of features in computer vision (and elsewhere) is to transform visual information into a vector space. This lets us perform mathematical operations on the resulting vectors, for example finding a similar vector, which leads us to a similar image or to a similar object in the image.</p>
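<p>To make the idea of "finding a similar vector" concrete, here is a minimal sketch that picks the closest feature vector by cosine similarity. The helper name <b>most_similar</b> and the toy vectors are our own assumptions for illustration, and the sketch assumes that no vector is all zeros.</p>

```python
import numpy as np

def most_similar(query, vectors):
    """Return the index of the row in `vectors` closest to `query` (cosine similarity)."""
    vectors = np.asarray(vectors, dtype=float)
    query = np.asarray(query, dtype=float)
    # cosine similarity = dot product divided by the product of the vector norms
    sims = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    return int(np.argmax(sims))

# three toy "feature vectors"; the second one points almost in the query direction
features = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.1],
                     [0.0, 0.0, 1.0]])
best = most_similar([0.0, 1.0, 0.0], features)
```

<p>With real data, each row would be the feature vector of one image, and <b>best</b> would be the index of the most similar image.</p>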

<div class="imgHolder">
    <img src = "Kazekeypoints.png"></img>
    <span><h2 style="text-align:center;color:red;"> Kaze keypoints </h2></span>
    <img src = "HOG.png"></img>
    <span><h2 style="text-align:center;color:red;"> Original and image with HOG features </h2></span>
</div>

<h3>How do we get features from images</h3>
<p>There are two ways of getting features from an image:
    <ul>
        <li>image descriptors (white-box algorithms)</li>
        <li>neural networks (black-box algorithms)</li>
        </ul>
We will work with the first one, using the OpenCV library.</p>

<h3>Import required libraries</h3>

In [1]:
import numpy as np # linear algebra
import os # reading data
import cv2 # reading images
import pickle as cpickle # store data for fast processing

<h3>Set up the locations of the dataset folders</h3>
<p>The dataset can be found <a href="https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia">here</a>.</p>

In [2]:
trainDataDir = "C:\\Users\\thodo\\Downloads\\archive\\chest_xray\\train"
testDataDir = "C:\\Users\\thodo\\Downloads\\archive\\chest_xray\\test"
validateDataDir = "C:\\Users\\thodo\\Downloads\\archive\\chest_xray\\val"

<h3>Initialize the dictionaries in which we will store the data of each dataset split</h3>

In [3]:
training_data = {}
testing_data = {}
validate_data = {}

In [4]:
categories = ["NORMAL", "PNEUMONIA"]

<h3>KAZE descriptor</h3>
<p>The KAZE detector is based on the scale-normalized determinant of the Hessian matrix, which is computed at multiple scale levels. The maxima of the detector response are picked as feature points using a moving window. Feature description introduces rotation invariance by finding the dominant orientation in a circular neighborhood around each detected feature. KAZE features are invariant to rotation and scale, have limited affine invariance, and are more distinctive at varying scales, at the cost of a moderate increase in computation time.</p>
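<p>The detector response described above can be illustrated with a small NumPy sketch. It is not the KAZE implementation (KAZE builds a nonlinear scale space, which is omitted here). It only computes a scale-normalized determinant of the Hessian with finite differences at a single scale and locates its maximum on a synthetic blob. The function name <b>hessian_determinant</b> and the test image are assumptions made for illustration.</p>

```python
import numpy as np

def hessian_determinant(image, sigma=1.0):
    """Scale-normalised determinant of the Hessian, using finite differences."""
    image = np.asarray(image, dtype=float)
    gy, gx = np.gradient(image)   # first derivatives along rows (y) and columns (x)
    gyy, gyx = np.gradient(gy)    # second derivatives of the y-gradient
    gxy, gxx = np.gradient(gx)    # second derivatives of the x-gradient
    # sigma**4 is the usual scale normalisation for a determinant of second derivatives
    return sigma ** 4 * (gxx * gyy - gxy * gyx)

# synthetic image: one Gaussian blob centred at row 20, column 20
ys, xs = np.mgrid[0:41, 0:41]
blob = np.exp(-((xs - 20) ** 2 + (ys - 20) ** 2) / (2 * 4.0 ** 2))

response = hessian_determinant(blob, sigma=4.0)
# the strongest response marks the candidate feature point
peak = np.unravel_index(np.argmax(response), response.shape)
```

<p>Here the strongest response falls on the blob centre, which is the kind of point a determinant-of-Hessian detector picks as a keypoint.</p>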

<h3>Extract data using KAZE descriptor function</h3>

In [5]:
def Get_Kaze_features(image):
    try:
        alg = cv2.KAZE_create()
        # Find the image keypoints
        kps = alg.detect(image)
        # Keep the 32 strongest keypoints.
        # The number of keypoints varies with image size and content,
        # so sort them by keypoint response (bigger is better) and truncate
        vector_size = 32
        kps = sorted(kps, key=lambda x: -x.response)[:vector_size]
        # Compute one descriptor per keypoint
        kps, dsc = alg.compute(image, kps)
        # compute() can return no descriptors (None) when no keypoints were found
        if dsc is None:
            dsc = np.zeros(0, dtype=np.float32)
        # Flatten all of them into one big vector - our feature vector
        dsc = dsc.flatten()
        # Make every feature vector the same size.
        # Each KAZE descriptor has 64 values
        needed_size = (vector_size * 64)
        if dsc.size < needed_size:
            # With fewer than 32 descriptors, pad the end of the
            # feature vector with zeros
            dsc = np.concatenate([dsc, np.zeros(needed_size - dsc.size)])
        return dsc
    except cv2.error as e:
        # e is an exception object, so convert it before concatenating
        print('Error: ' + str(e))
        return None
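
<p>The fixed-length step of the function above can be isolated and checked on its own. The following is a minimal sketch with a hypothetical helper, <b>pad_or_truncate</b>, which is not part of the notebook. Unlike the function above, it also truncates vectors that are too long.</p>

```python
import numpy as np

def pad_or_truncate(vec, length):
    """Return `vec` as a flat float32 array of exactly `length` values."""
    vec = np.asarray(vec, dtype=np.float32).ravel()
    if vec.size >= length:
        # too long (or exact): keep only the first `length` values
        return vec[:length]
    # too short: append zeros at the end
    padding = np.zeros(length - vec.size, dtype=np.float32)
    return np.concatenate([vec, padding])

# 5 descriptors of 64 values each, padded to the size of 32 descriptors
padded = pad_or_truncate(np.ones((5, 64)), 32 * 64)
```

<p>Keeping every vector at the same length matters because most classifiers expect a fixed number of input features.</p>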

<h3>HOG descriptor</h3>

<p>The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientation in localized portions of an image. This method is similar to that of edge orientation histograms, scale-invariant feature transform descriptors, and shape contexts, but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy. </p>
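<p>As a rough illustration of "counting occurrences of gradient orientation", the sketch below builds the orientation histogram of a single cell with NumPy. It is a simplification and not OpenCV's implementation: each pixel votes for exactly one bin, with no interpolation between bins and no block normalization. The name <b>cell_histogram</b> is our own.</p>

```python
import numpy as np

def cell_histogram(cell, nbins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    cell = np.asarray(cell, dtype=float)
    gy, gx = np.gradient(cell)            # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    # unsigned orientation: fold the angle into the range [0, 180) degrees
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=nbins, range=(0.0, 180.0), weights=magnitude)
    return hist

# a 10x10 cell whose intensity grows from left to right:
# every gradient points along the x axis (orientation 0 degrees)
ramp = np.tile(np.arange(10, dtype=float), (10, 1))
hist = cell_histogram(ramp)
```

<p>For this ramp, all of the gradient magnitude lands in the first bin, which is what we expect for purely horizontal intensity changes.</p>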

<h3>Extract data using HOG descriptor function</h3>

In [6]:
def Get_Hog_Features(image):
    try:
        cell_size = (10, 10)  # h x w in pixels
        block_size = (2, 2)  # h x w in cells
        nbins = 9  # number of orientation bins

        # winSize is the size of the image cropped to a multiple of the cell size
        hog = cv2.HOGDescriptor(_winSize=(image.shape[1] // cell_size[1] * cell_size[1],
                                          image.shape[0] // cell_size[0] * cell_size[0]),
                                _blockSize=(block_size[1] * cell_size[1],
                                            block_size[0] * cell_size[0]),
                                _blockStride=(cell_size[1], cell_size[0]),
                                _cellSize=(cell_size[1], cell_size[0]),
                                _nbins=nbins)

        n_cells = (image.shape[0] // cell_size[0], image.shape[1] // cell_size[1])
        dsc = hog.compute(image) \
            .reshape(n_cells[1] - block_size[1] + 1,
                     n_cells[0] - block_size[0] + 1,
                     block_size[0], block_size[1], nbins) \
            .transpose((1, 0, 2, 3, 4))
        return dsc.flatten()
    except cv2.error as e:
        print('Error: ' + str(e))
        return None
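
<p>The length of the vector returned above follows directly from the cell size, block size and number of bins. A small sketch of that arithmetic, using a hypothetical helper <b>hog_length</b> whose defaults mirror the values in the function above:</p>

```python
def hog_length(height, width, cell=(10, 10), block=(2, 2), nbins=9):
    """Number of values in a HOG descriptor when the block stride is one cell."""
    cells_y, cells_x = height // cell[0], width // cell[1]
    # blocks slide one cell at a time, so they overlap
    blocks_y = cells_y - block[0] + 1
    blocks_x = cells_x - block[1] + 1
    # every block holds block[0] * block[1] cells, each with `nbins` values
    return blocks_y * blocks_x * block[0] * block[1] * nbins

length = hog_length(100, 100)  # 9 * 9 blocks * 4 cells * 9 bins
```

<p>For the 100x100 images used in this notebook, this gives 2916 values per image.</p>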

<h3>Generic function for feature extraction</h3>
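<p>The function takes the following parameters:
<ol>
  <li>Image path</li>
  <li>Descriptor name: an empty string (raw pixels), KAZE or HOG</li>
</ol>
It returns the feature vector as an array, or None if the descriptor name is not recognized or the extraction fails.</p>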

In [7]:
def extract_features(image_path, extractFeaturesUsing = ''):
    # read the image as grayscale
    image_array = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # resize the image to 100x100
    image = cv2.resize(image_array, (100, 100))
    if extractFeaturesUsing == '':
        return image.flatten()
    elif extractFeaturesUsing == 'KAZE':
        return Get_Kaze_features(image)
    elif extractFeaturesUsing == 'HOG':
        return Get_Hog_Features(image)
    return None

<h3>Load data function</h3>
<p>The function takes the following parameters:
<ol>
  <li>Dictionary that we are trying to construct (one of the three we initialized at the start).</li>
  <li>Directory location.</li>
  <li>Name of the pickle file in which we will store the data for fast access.</li>
  <li>Descriptor used to extract the features (empty string, KAZE or HOG).</li>
</ol>
When loading finishes, the function stores the dictionary in a pickle file, so the data can be reloaded quickly later without extracting the features again.</p>

In [8]:
def LoadData(dictionaryToStoreData, dataDir, pickleName, extractFeaturesUsing = ''):
    # load the images of every category found under dataDir
    for category in categories:
        path = os.path.join(dataDir, category)
        class_num = categories.index(category)
        for img in os.listdir(path):
            # os.listdir already returns bare file names
            name = img.lower()
            try:
                imageLocation = os.path.join(path, img)
                features = extract_features(imageLocation, extractFeaturesUsing)
                dictionaryToStoreData[imageLocation] = [features, class_num]
            except Exception as e:
                print("An exception occurred while extracting features from image " + name + ": " + str(e))
    # save all our feature vectors in a pickle file
    with open(pickleName + '.pickle', 'wb') as fp:
        cpickle.dump(dictionaryToStoreData, fp)

<h3> Example of LoadData use </h3>
<p> Load data using HOG descriptor</p>

In [9]:
LoadData(training_data, trainDataDir, 'trainingDataUsingHog', 'HOG')
LoadData(testing_data, testDataDir, 'testingDataUsingHog', 'HOG')
LoadData(validate_data, validateDataDir, 'validateDataUsingHog', 'HOG')
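
<p>Once the pickle files exist, they can be read back without touching the images again. The following is a minimal sketch, assuming the file holds the dictionary written by LoadData (image path mapped to [features, class_num]). The name <b>load_features</b> is hypothetical, and entries whose features are None (failed extraction) are skipped.</p>

```python
import pickle
import numpy as np

def load_features(pickle_path):
    """Read a pickle written by LoadData and return (X, y) as NumPy arrays."""
    with open(pickle_path, 'rb') as fp:
        data = pickle.load(fp)
    # each value is [features, class_num]; drop entries whose extraction failed
    rows = [(features, label) for features, label in data.values()
            if features is not None]
    X = np.array([features for features, _ in rows], dtype=np.float32)
    y = np.array([label for _, label in rows], dtype=np.int64)
    return X, y
```

<p>For example, <b>X_train, y_train = load_features('trainingDataUsingHog.pickle')</b> would give one row per image and one label per row. Only load pickle files that you created yourself, since unpickling untrusted data can execute arbitrary code.</p>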