# Nina's first experiments with facial keypoint detection

In [1]:
# Make our scripts directory importable
import sys
sys.path.append('/Users/ninakuklisova/facial-keypoint-detection/scripts')

# Import the submit and getdata modules from our tools subfolder
from tools import submit, getdata

Now we can use any function in our `submit.py` module. Since we documented the module appropriately, we can even pull up its "documentation":

In [2]:
submit.create_submission?

## Baseline submission

Now that we have our package loaded, let's recreate the baseline model for the first feature.  

But first, let's do our normal imports:

In [3]:
import numpy as np
import pandas as pd

from sklearn.neighbors import KNeighborsRegressor

Let us load the data using the `tools.getdata` module:  

In [4]:
# Load data (no dev)
_loaded = getdata.load_data(0, test=True, nonas=True)

FEATURES = _loaded['features']
print 'Number of features:', len(FEATURES)

train_data = _loaded['training']['data']
train_labels = _loaded['training']['labels']
print 'Training dataset size: ', train_data.shape

test_data = _loaded['test']['data']
print 'Test dataset size: ', test_data.shape

Number of features: 30
Training dataset size:  (2140,)
Test dataset size:  (1783,)
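
The baseline model itself is not fitted in this notebook. As a rough illustration only, here is a minimal sketch of a k-nearest-neighbours baseline for a single keypoint coordinate such as `left_eye_center_x`. The data is synthetic, and the image size (`n_pixels`), the target range, and `n_neighbors=5` are all assumptions made for the example, not values taken from our dataset or from `tools`.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (NOT the competition data).
# n_pixels and the 0-100 target range are hypothetical.
rng = np.random.RandomState(0)
n_pixels = 64
X_train = rng.randint(0, 256, size=(200, n_pixels)).astype(np.float64)
y_train = rng.uniform(0, 100, size=200)   # one made-up coordinate per image
X_test = rng.randint(0, 256, size=(10, n_pixels)).astype(np.float64)

# One regressor per keypoint coordinate; n_neighbors=5 is an arbitrary choice.
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(X_train, y_train)

# Each prediction is the mean target of the 5 nearest training images.
y_pred = knn.predict(X_test)
print(y_pred.shape)
```

How such a fitted model is handed to the `submit` functions is not shown here, so that step is left out of the sketch.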


Now, let's try Principal Component Analysis (PCA), as in our homework project 3.

In [5]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.mixture import GMM
from matplotlib.colors import LogNorm

We first look at the structure of the dataset.

In [12]:
train_data

array([array([238, 236, 237, ...,  70,  75,  90], dtype=uint8),
       array([219, 215, 204, ...,   1,   1,   1], dtype=uint8),
       array([144, 142, 159, ...,  78,  78,  77], dtype=uint8), ...,
       array([31, 40, 47, ..., 39, 51, 75], dtype=uint8),
       array([  7,   1,   5, ..., 179, 177,  57], dtype=uint8),
       array([ 68,  19,  19, ..., 125, 124, 119], dtype=uint8)], dtype=object)

We start by transforming the data into the appropriate shape for PCA.

In [13]:
# Convert each per-image pixel array into a plain list (one row per image)
train_data = [list(t) for t in train_data]
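
A list of lists works, but the same reshaping can also be done in NumPy. The sketch below uses a tiny made-up object array (three rows of four values, purely illustrative) to show how `np.vstack` could turn an array of shape `(n,)` holding 1-D arrays into a regular 2-D matrix.

```python
import numpy as np

# Toy stand-in for train_data: an object array holding three 1-D uint8 arrays.
rows = [np.array([1, 2, 3, 4], dtype=np.uint8),
        np.array([5, 6, 7, 8], dtype=np.uint8),
        np.array([9, 10, 11, 12], dtype=np.uint8)]
ragged = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    ragged[i] = row
print(ragged.shape)   # (3,)

# Stack the rows into one (n_samples, n_pixels) matrix, the 2-D layout
# that scikit-learn estimators expect.
X = np.vstack(list(ragged))
print(X.shape)        # (3, 4)
```

This only works when every row has the same length, which holds when every image has the same number of pixels.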

In [16]:
train_data[0]

[238, 236, 237, 238, 240, 240, 239, 241, 241, 243,
 ...]

We start with a PCA with 2 components.

In [18]:
pca_mod = PCA(n_components=2)
pca_mod.fit(train_data)

# Note: both lines print the *sum* of the ratios over all fitted components,
# so they show the same number
print 'Explained variance ratio: \n', np.sum(pca_mod.explained_variance_ratio_)
print 'Cumulative explained variance: \n', np.sum(pca_mod.explained_variance_ratio_)

Explained variance ratio: 
0.454453273597
Cumulative explained variance: 
0.454453273597
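
Both printed numbers above are the same sum. As a small sketch on synthetic data (not our images; the feature scales are invented for the example), this is one way the per-component ratios and their running total could be reported separately:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in data: 100 samples, 5 features with
# deliberately different variances.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

pca = PCA(n_components=2)
pca.fit(X)

per_component = pca.explained_variance_ratio_   # one value per component
cumulative = np.cumsum(per_component)           # running total

print('Explained variance ratio:', per_component)
print('Cumulative explained variance:', cumulative)
```

The last entry of `cumulative` equals the single summed value that the notebook cell prints.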


Now, let's see what we could get with a higher number of components.

In [19]:
# Refit PCA for every number of components from 1 to 50
for n_comp in range(1, 51):
    print n_comp, ' components'
    pca_mod = PCA(n_components=n_comp)
    pca_mod.fit(train_data)

    # Both lines print the summed ratio, i.e. the cumulative explained variance
    print 'Explained variance ratio: \n', np.sum(pca_mod.explained_variance_ratio_)
    print 'Cumulative explained variance: \n', np.sum(pca_mod.explained_variance_ratio_)
    print '\n'

1  components
Explained variance ratio: 
0.307988470855
Cumulative explained variance: 
0.307988470855


2  components
Explained variance ratio: 
0.454453273597
Cumulative explained variance: 
0.454453273597


3  components
Explained variance ratio: 
0.539972827308
Cumulative explained variance: 
0.539972827308


4  components
Explained variance ratio: 
0.596422333511
Cumulative explained variance: 
0.596422333511


5  components
Explained variance ratio: 
0.630014520552
Cumulative explained variance: 
0.630014520552


6  components
Explained variance ratio: 
0.653776910588
Cumulative explained variance: 
0.653776910588


7  components
Explained variance ratio: 
0.676868263989
Cumulative explained variance: 
0.676868263989


8  components
Explained variance ratio: 
0.693959235959
Cumulative explained variance: 
0.693959235959


9  components
Explained variance ratio: 
0.707903330651
Cumulative explained variance: 
0.707903330651


10  components
Explained variance ratio: 
0.72014568414
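
Refitting PCA once per component count is not strictly necessary: the leading components of a larger fit match those of a smaller fit (exactly with the full SVD solver, approximately with the randomized one). A hedged sketch on synthetic data, where the data shape and the 0.90 threshold are assumptions chosen for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in data (not the image data): 200 samples, 60 features
# whose scales shrink from 10.0 down to 0.1.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 60)) * np.linspace(10.0, 0.1, 60)

# Fit once with the largest number of components of interest.
max_components = 50
pca = PCA(n_components=max_components)
pca.fit(X)

# cumulative[k - 1] is the variance explained by the first k components.
cumulative = np.cumsum(pca.explained_variance_ratio_)

target = 0.90  # arbitrary threshold for this example
reached = cumulative >= target
if reached.any():
    n_needed = int(np.argmax(reached)) + 1
    print(n_needed, 'components reach', target)
else:
    print('Only', cumulative[-1], 'explained with', max_components, 'components')
```

The whole curve comes from a single fit, so it can also be plotted directly against the component count.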

Now, let's try Gaussian mixture models.

In [5]:
# GMM



### coming soon
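
Until that cell is filled in, here is one possible sketch of the idea on synthetic data. It is not our actual model: the two-blob data, the choice of two PCA components, and the two mixture components are assumptions for illustration. It uses `GaussianMixture`, the class that newer scikit-learn versions provide in place of the older `GMM` imported above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

# Synthetic stand-in data (not the image data): two well-separated
# blobs in 20 dimensions.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(loc=0.0, size=(100, 20)),
               rng.normal(loc=6.0, size=(100, 20))])

# Reduce dimensionality first, then fit the mixture in the reduced space.
X_reduced = PCA(n_components=2).fit_transform(X)

gmm = GaussianMixture(n_components=2, covariance_type='full', random_state=0)
gmm.fit(X_reduced)

labels = gmm.predict(X_reduced)              # hard cluster assignment per sample
log_density = gmm.score_samples(X_reduced)   # per-sample log-likelihood
print(np.bincount(labels))
```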

And now we can enjoy the `tools.submit` module!

In [6]:
submit.create_generate(test_data, models, 'tools_example', verbose=True)

Predicting "left_eye_center_x"... done! (136.8s)

... Created the csv file: ../../data/submissions/tools_example_submission.csv


In [7]:
%ls ../../data/submissions/

IdLookupTable.csv                       full_knregressor_submission_nonull.csv
README.md                               tools_example_submission.csv


This function is just a wrapper; we could have called `submit.create_submission` and then `submit.generate_csv` instead:

```python
# Create predictions
kn_predictions = submit.create_submission(test_data, models, 'tools_example')

# Generate submission csv from predictions
submit.generate_csv(kn_predictions, 'tools_example')
```