# Preprocessors  
This notebook builds on the findings of the exploration notebooks. Using the `tools/save4later.py` module, it creates and stores the preprocessed datasets ("preprods") so that the models can load them seamlessly.  

First, let's import the modules we need from `tools/`.

In [1]:
# General imports
import numpy as np
import pandas as pd

# Import our tools
import sys
sys.path.append('/Users/will/Github/facial-keypoint-detection/scripts')

# Import the getdata and save4later modules from our tools subfolder
from tools import getdata, save4later

And let's import all our data

In [2]:
# Load data (no dev) WITH NONAs set to False --> ~7k
_loaded = getdata.load_data(0, test=False, nonas=False)

FEATURES = _loaded['features']
print 'Number of features:', len(FEATURES)

train_data = _loaded['training']['data']
train_labels = _loaded['training']['labels']
print 'Training dataset size: ', train_data.shape

Number of features: 30
Training dataset size:  (7049,)


In [4]:
# Load data WITH NONAs set to True --> ~2k
_loaded_nonas = getdata.load_data(0, test=False, nonas=True)

train_nonas_data = _loaded_nonas['training']['data']
train_nonas_labels = _loaded_nonas['training']['labels']
print 'Training dataset size: ', train_nonas_data.shape

Training dataset size:  (2140,)


## Contents  

* **Noise Cancelling**:
   * Mask Faces  


* **Edge Detection**:
   * Sobel filter  
   * Laplace filter  


* **Noise Reduction**:
   * Gaussian blur  
   * Region Adjacency Graph (RAG)  
 

* **Finding Contours**:  
   * Histogram of Oriented Gradients (HOG)  
   * Watershed algorithm  

## 1. Noise Cancelling: Mask Faces
Using the approach from `explorations/face-detection.ipynb`, this section creates the masked-face preprod.  

Process:  
* All the images are run through OpenCV's face-detector (*see the exploration notebook for more detail and references on this algorithm*)  
* Images for which just one face is detected are masked (set all pixels outside the detected face to 0)  

*IMPORTANT NOTE* - the base `environment.yml` does not install OpenCV2, since it is an optional branch in this project. To get a build that includes OpenCV, use `/adv-envs/environment-ocv.yml`.  


In [5]:
import cv2

In [6]:
CASC_PATH = '../haarcascade_frontalface_default.xml'

# Create the haar cascade classifier
faceCascade = cv2.CascadeClassifier(CASC_PATH)

def detect_face(img, min_face_size):
    ''' Detect faces in the image; returns one (x, y, w, h) box per face '''
    face = faceCascade.detectMultiScale(
        img,
        scaleFactor=1.1,
        minNeighbors=5,
        minSize=(min_face_size, min_face_size),
        # OpenCV 2.x constant; OpenCV 3+ exposes it as cv2.CASCADE_SCALE_IMAGE
        flags = cv2.cv.CV_HAAR_SCALE_IMAGE)
    return face

In [9]:
def mask_faces(dataset):
    ''' Zero out everything outside the detected face (only when exactly one face is found) '''
    masked = []
    
    for i in xrange(len(dataset)):
        old_img = dataset[i].reshape(96,96)

        # detect face
        _face = detect_face(old_img, min_face_size=10)
                        
        # only mask if we detect exactly one face; otherwise keep the image as-is
        if len(_face) == 1:
            x, y, w, h = _face[0]
            new_img = np.zeros_like(old_img)
            new_img[y:y+h, x:x+w] = old_img[y:y+h, x:x+w]
            
            masked.append(new_img.flatten())
            
        else:
            masked.append(old_img.flatten())
        
    return masked
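
To see what the masking step does without needing OpenCV, here is a minimal sketch on a toy array. The helper name `mask_outside_box` and the hand-picked bounding box are assumptions for illustration only; in the notebook the box comes from `detect_face`.

```python
import numpy as np

def mask_outside_box(img, box):
    """Keep pixels inside the (x, y, w, h) box and set everything else to 0."""
    x, y, w, h = box
    masked = np.zeros_like(img)
    masked[y:y + h, x:x + w] = img[y:y + h, x:x + w]
    return masked

# Toy 6x6 "image" with no zero pixels, and a hypothetical face box
img = np.arange(1, 37).reshape(6, 6)
masked = mask_outside_box(img, (1, 2, 3, 2))  # x=1, y=2, w=3, h=2
print(masked)
```

Note that `x` indexes columns and `y` indexes rows, which is why the slice is written `[y:y+h, x:x+w]`.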

Let's **mask** all the images now!

In [6]:
masked_train_data = mask_faces( train_data )

And now save using the `save4later` tool ...

In [7]:
save4later.save_preprod(masked_train_data, 'masked',
                        description='Masked data (removed background of all faces)')

In [8]:
save4later.list_preprods()


    PREPROC'D DATASETS
 Total: 2

  + masked.pk : Masked data (removed background of all faces)
  + test.pk : This is a test


Great, there it is! Now let's do the same for the NoNAs dataset (only ~2k faces).

In [10]:
masked_train_nonas_data = mask_faces(train_nonas_data)

In [12]:
save4later.save_preprod(masked_train_nonas_data, 'masked_nonas',
                        description='Masked data (only faces with no NA features)')

In [13]:
save4later.list_preprods()


    PREPROC'D DATASETS
 Total: 3

  + masked.pk : Masked data (removed background of all faces)
  + test.pk : This is a test
  + masked_nonas.pk : Masked data (only faces with no NA features)


**Now we will try skimage transformations**  

We can either replace the original image with the transformed one, or append the transformed image (for example, the Sobel-filtered one) to the original. In the second case each sample has $96 \times 96 \times 2 = 18432$ features.

In [44]:
from skimage.filters import sobel, laplace, gaussian
from skimage.feature import hog

In [31]:
def preprocess_set(transform, dataset, keep_orig=False):
    ''' Applies the transform to every image; if keep_orig, appends the result to the original pixels '''
    transformed = []
    
    for i in xrange(len(dataset)):
        old_img = dataset[i].reshape(96,96)
        new_img = transform(old_img)
        
        if keep_orig:
            new_entry = np.hstack( (old_img.flatten(),new_img.flatten()) )
        else:
            new_entry = new_img.flatten()
        
        transformed.append(new_entry)

    return transformed
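
As a quick check of the two modes, the sketch below applies a stand-in transform to one fake image and compares the resulting feature lengths. The helper `apply_transform`, the random image, and the `invert` transform are all hypothetical and used only to illustrate the shapes involved.

```python
import numpy as np

def apply_transform(flat_img, transform, keep_orig=False, side=96):
    """Single-image version of the replace/append logic (illustrative helper)."""
    old_img = np.asarray(flat_img).reshape(side, side)
    new_img = transform(old_img)
    if keep_orig:
        # original pixels first, transformed pixels second
        return np.hstack((old_img.flatten(), new_img.flatten()))
    return new_img.flatten()

rng = np.random.RandomState(0)
fake_img = rng.randint(0, 256, size=96 * 96)  # stand-in for one flattened face
invert = lambda img: 255 - img                # stand-in for sobel, gaussian, ...

replaced = apply_transform(fake_img, invert)
stacked = apply_transform(fake_img, invert, keep_orig=True)
print(replaced.shape, stacked.shape)
```

With `keep_orig=True`, the first 9216 values are the untouched original pixels, so a model that expects the raw image layout can still slice them out.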


## Noise reduction  
Apply a Gaussian blur to the training images (NoNAs subset), keeping the original pixels alongside the blurred ones.

In [51]:
blur_img = lambda x: gaussian(x, sigma=1, mode='mirror')

In [52]:
gauss_train_nonas_data = preprocess_set(blur_img, train_nonas_data, keep_orig=True)

In [53]:
save4later.save_preprod(gauss_train_nonas_data, 'gauss_nonas',
                        description='Blurred images (only faces with no NA features)',
                        overwrite=True)
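
For intuition about what the blur does, here is a rough numpy-only sketch of a separable Gaussian filter. It is not skimage's implementation: the helper names are hypothetical, the kernel is truncated at 4 standard deviations by assumption, and `np.pad(..., mode='reflect')` is used as the assumed counterpart of `mode='mirror'`.

```python
import numpy as np

def gaussian_kernel_1d(sigma, truncate=4.0):
    """Normalized 1-D Gaussian kernel, cut off at `truncate` standard deviations."""
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / float(sigma)) ** 2)
    return kernel / kernel.sum()

def blur_sketch(img, sigma=1.0):
    """Blur a 2-D image by filtering rows, then columns, with the same kernel."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    padded = np.pad(img.astype(float), radius, mode='reflect')
    smooth = lambda v: np.convolve(v, kernel, mode='valid')
    rows = np.apply_along_axis(smooth, 1, padded)  # filter along each row
    return np.apply_along_axis(smooth, 0, rows)    # then along each column

rng = np.random.RandomState(0)
noisy = rng.rand(96, 96)                       # stand-in for a noisy face image
blurred = blur_sketch(noisy, sigma=1.0)
flat = blur_sketch(np.full((96, 96), 7.0), sigma=1.0)
print(noisy.std(), blurred.std())
```

Because the kernel sums to 1, a constant image is left unchanged, while pixel-to-pixel noise is averaged out and its spread shrinks.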

## Edge detection

For more detail, refer to the `explorations/image-processing.ipynb` notebook.  

### Sobel Filter  

In [28]:
sobel_train_nonas_data = preprocess_set(sobel, train_nonas_data, keep_orig=True)

In [32]:
save4later.save_preprod(sobel_train_nonas_data, 'orig_sobel_nonas',
                        description='Original + Sobel filter (only faces with no NA features)')

In [36]:
sobel_noorig_train_data = preprocess_set(sobel, train_nonas_data)
save4later.save_preprod(sobel_noorig_train_data, 'sobel_nonas',
                        description='Sobel filter (only faces with no NA features)',
                        overwrite=True)

In [37]:
save4later.list_preprods()


    PREPROC'D DATASETS
 Total: 5

  + test.pk : This is a test
  + masked.pk : Masked data (removed background of all faces)
  + orig_sobel_nonas.pk : Original + Sobel filter (only faces with no NA features)
  + sobel_nonas.pk : Sobel filter (only faces with no NA features)
  + masked_nonas.pk : Masked data (only faces with no NA features)
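
To illustrate what a Sobel filter responds to, the sketch below computes a gradient magnitude with the standard 3x3 Sobel weights in plain numpy. The function name is hypothetical, and skimage's `sobel` may normalize the result and treat the borders differently, so the absolute values here are not meant to match it.

```python
import numpy as np

def sobel_magnitude_sketch(img):
    """Gradient magnitude from the two 3x3 Sobel kernels (unnormalized)."""
    p = np.pad(img.astype(float), 1, mode='reflect')
    # horizontal derivative: right column minus left column, weighted 1-2-1
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # vertical derivative: bottom row minus top row, weighted 1-2-1
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)

# Toy image: dark left half, bright right half (one vertical edge)
step = np.zeros((8, 8))
step[:, 4:] = 1.0
edges = sobel_magnitude_sketch(step)
print(edges[0])
```

The response is zero in the flat regions and non-zero only in the two columns that straddle the edge, which is the property that makes the filtered image useful as an extra feature channel.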


### Laplace Filter on Gaussian-Blurred Images

In [40]:
lap_gauss = lambda x: laplace(gaussian(x, sigma=2, mode='mirror'))

In [41]:
lapgauss_noorig_train_data = preprocess_set(lap_gauss, train_nonas_data)

In [42]:
save4later.save_preprod(lapgauss_noorig_train_data, 'lapgauss_nonas',
                        description='Laplace+Gaussian transform (only faces with no NA features)',
                        overwrite=True)

In [43]:
save4later.list_preprods()


    PREPROC'D DATASETS
 Total: 6

  + sobel_nonas.pk : Sobel filter (only faces with no NA features)
  + masked_nonas.pk : Masked data (only faces with no NA features)
  + orig_sobel_nonas.pk : Original + Sobel filter (only faces with no NA features)
  + lapgauss_nonas.pk : Laplace+Gaussian transform (only faces with no NA features)
  + test.pk : This is a test
  + masked.pk : Masked data (removed background of all faces)
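
The Laplace filter is a second-derivative operator, so it reacts strongly to isolated pixels; this is the usual reason for blurring first, as `lap_gauss` does. Below is a minimal numpy sketch of a 4-neighbour discrete Laplacian. The function name and the sign convention (positive centre) are assumptions and may differ from skimage's `laplace`.

```python
import numpy as np

def laplace_sketch(img):
    """4-neighbour discrete Laplacian: 4 * centre minus the four direct neighbours."""
    p = np.pad(img.astype(float), 1, mode='reflect')
    return (4 * p[1:-1, 1:-1]
            - p[:-2, 1:-1] - p[2:, 1:-1]    # pixels above and below
            - p[1:-1, :-2] - p[1:-1, 2:])   # pixels left and right

# A single bright pixel, the kind of speck a prior blur would smooth away
speck = np.zeros((5, 5))
speck[2, 2] = 1.0
response = laplace_sketch(speck)
constant = laplace_sketch(np.full((5, 5), 3.0))
print(response)
```

A constant region gives zero response, while the single speck produces a strong positive centre surrounded by negative neighbours.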


## Finding Contours + Noise reduction

### Blurred Histogram of Oriented Gradients (HOG)

In [45]:
def get_hog(img):
    ''' Perform Histogram of Oriented Gradients '''
    # Histogram of Oriented Gradients (HOG)
    _, _hog = hog(img, orientations=8, pixels_per_cell=(10, 10),
                    cells_per_block=(1, 1), visualise=True)
    return _hog
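
Note that `get_hog` keeps the second return value of `hog`, the rendered HOG image, rather than the feature vector, which is why the result can be blurred and flattened like any other image. For intuition about what HOG accumulates per cell, here is a simplified sketch of a single-cell orientation histogram. The helper is hypothetical and omits details of skimage's implementation such as block normalization.

```python
import numpy as np

def cell_orientation_histogram(cell, orientations=8):
    """Histogram of unsigned gradient directions, weighted by gradient magnitude."""
    gy, gx = np.gradient(cell.astype(float))            # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0      # fold directions into [0, 180)
    hist, _ = np.histogram(angle, bins=orientations,
                           range=(0.0, 180.0), weights=magnitude)
    return hist

# 10x10 cell whose brightness increases left to right (gradient along x only)
cell = np.tile(np.arange(10), (10, 1))
hist = cell_orientation_histogram(cell)
print(hist)
```

For this ramp, every pixel has the same gradient direction, so all of the weight lands in the first orientation bin.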

In [47]:
blur_hog = lambda x: gaussian(get_hog(x), sigma=2.5, mode='mirror')

In [48]:
bhog_noorig_train_data = preprocess_set(blur_hog, train_nonas_data)

In [49]:
save4later.save_preprod(bhog_noorig_train_data, 'bhog_nonas',
                        description='Blurred HOG (only faces with no NA features)',
                        overwrite=True)

In [54]:
save4later.list_preprods()


    PREPROC'D DATASETS
 Total: 8

  + bhog_nonas.pk : Blurred HOG (only faces with no NA features)
  + sobel_nonas.pk : Sobel filter (only faces with no NA features)
  + masked_nonas.pk : Masked data (only faces with no NA features)
  + orig_sobel_nonas.pk : Original + Sobel filter (only faces with no NA features)
  + lapgauss_nonas.pk : Laplace+Gaussian transform (only faces with no NA features)
  + test.pk : This is a test
  + masked.pk : Masked data (removed background of all faces)
  + gauss_nonas.pk : Blurred images (only faces with no NA features)
