# Facial Feature - Baseline Submission  

The following notebook contains the baseline submission by the team composed of Alex, Ankit, Annalaissa, Nina and Guillermo for the [Facial Keypoints Detection](https://www.kaggle.com/c/facial-keypoints-detection) Kaggle competition.  

Several Machine Learning techniques were tested initially, but the selected approach was a *default* **K-Nearest Neighbors Regressor** (scikit-learn's `KNeighborsRegressor` with its default parameters), with accuracies on development data of around 90%-95%.

A separate KNN regression model was trained for each of the 30 keypoint coordinates.
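To give some intuition for what each per-coordinate model does, here is a minimal from-scratch sketch of KNN regression for a single coordinate. It assumes scikit-learn's defaults (`n_neighbors=5`, uniform weights, Euclidean distance), under which the prediction is the mean label of the 5 training images closest in pixel space. `knn_predict` and the random toy data are hypothetical. The comparison against `KNeighborsRegressor` is only a consistency check.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def knn_predict(X_train, y_train, X_query, k=5):
    """Hypothetical helper: predict one coordinate as the mean label of the
    k nearest training rows (brute force, Euclidean, uniform weights)."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # Squared Euclidean distance between every query row and every training row
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest training rows for each query row
    nearest = np.argsort(d2, axis=1)[:, :k]
    # Uniform weights: the prediction is the plain mean of the neighbours' labels
    return y_train[nearest].mean(axis=1)

rng = np.random.RandomState(0)
X_train = rng.uniform(0, 255, size=(50, 16))  # 50 toy "images" of 16 pixels
y_train = rng.uniform(0, 96, size=50)         # one keypoint coordinate per image
X_query = rng.uniform(0, 255, size=(5, 16))

ours = knn_predict(X_train, y_train, X_query)
ref = KNeighborsRegressor().fit(X_train, y_train).predict(X_query)
print(np.allclose(ours, ref))
```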

In [1]:
import numpy as np
import pandas as pd
import time

from sklearn.neighbors import KNeighborsRegressor

## Load and preprocess data  
Preprocessing drops every image that has a missing (null) keypoint label.  
No dev set was created for the baseline submission.  
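Because no dev set is held out here, this run cannot report a validation score. Below is a minimal sketch of how one could be added. It assumes `sklearn.model_selection.train_test_split` is available (scikit-learn 0.18+) and uses random toy arrays in place of the real pixel matrix and labels. It scores with RMSE, the metric used by the Kaggle competition, and with R², which is what `KNeighborsRegressor.score` returns. For brevity it fits one multi-output regressor instead of 30 separate ones.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Toy stand-ins: X would be the (n_images, 9216) pixel matrix and
# Y the (n_images, 30) keypoint coordinates.
rng = np.random.RandomState(42)
X = rng.uniform(0, 255, size=(200, 64))
Y = rng.uniform(0, 96, size=(200, 30))

# Hold out 20% of the images as a dev set
X_tr, X_dev, Y_tr, Y_dev = train_test_split(X, Y, test_size=0.2, random_state=0)

model = KNeighborsRegressor().fit(X_tr, Y_tr)
Y_pred = model.predict(X_dev)

rmse = np.sqrt(np.mean((Y_pred - Y_dev) ** 2))  # RMSE over all coordinates
r2 = model.score(X_dev, Y_dev)                   # R^2, averaged over outputs
print("dev RMSE: {:.2f}  dev R^2: {:.3f}".format(rmse, r2))
```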

In [2]:
# Load data
full_data = pd.read_csv("data/training.csv")

# Parse each space-separated pixel string into an unsigned 8-bit integer array
def get_img(x):
    return np.array([int(p) for p in x.split()], dtype=np.uint8)

full_data['img_processed'] = full_data['Image'].apply(get_img)

# Drop rows with any missing keypoint label
data_nonas = full_data.dropna()
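The parsing and filtering steps can be illustrated on a hypothetical toy frame. This sketch assumes every `Image` string holds the same number of space-separated pixel values (9216 for the competition's 96×96 images, 4 here). It uses a hypothetical `parse_images` helper that stacks them into one `uint8` matrix. `dropna(subset=...)` restricts the missing-value check to the keypoint columns. Assuming `Image` itself is never empty, this behaves like the plain `dropna()` above.

```python
import numpy as np
import pandas as pd

def parse_images(image_strings):
    """Hypothetical helper: pixel strings -> (n_images, n_pixels) uint8 matrix."""
    rows = [[int(p) for p in s.split()] for s in image_strings]
    return np.array(rows, dtype=np.uint8)

toy = pd.DataFrame({
    'left_eye_center_x': [66.0, np.nan, 65.1, 64.2],
    'left_eye_center_y': [39.0, 35.2, np.nan, 36.8],
    'Image': ['0 128 255 7', '1 2 3 4', '9 9 9 9', '10 20 30 40'],
})
keypoint_cols = ['left_eye_center_x', 'left_eye_center_y']

complete = toy.dropna(subset=keypoint_cols)  # keeps only rows 0 and 3
X = parse_images(complete['Image'])
y = complete[keypoint_cols].values
print(X.shape)
```

With real data, `X.reshape(-1, 96, 96)` would recover the images as 2-D arrays.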

In [3]:
# Get training data
train_data = data_nonas['img_processed'].values
# 30 keypoint coordinates per image, rounded to the nearest pixel
train_labels = np.round(data_nonas.iloc[:, :30].values)

# No dev split for the baseline submission

In [4]:
# Load test_data
full_test = pd.read_csv("data/test.csv")

# Parse the test images with the same get_img helper used for training
full_test['img_processed'] = full_test['Image'].apply(get_img)

# Get test data
test_data = full_test[['img_processed']].copy()

## Train a KN-Regressor model for each feature coordinate

In [5]:
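# List of (facial_feature, fitted model) tuples
kn_regressors = []

# Stack the per-image pixel arrays into an (n_images, n_pixels) matrix once
X_train = np.vstack(train_data)

# Fit one default KNN regressor per keypoint coordinate
for i, facial_feature in enumerate(data_nonas.columns[:30]):
    
    knn = KNeighborsRegressor()
    knn.fit(X_train, train_labels[:, i])
    
    kn_regressors.append( (facial_feature, knn) )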
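The loop above fits 30 models on the same pixel matrix. The neighbours found by `KNeighborsRegressor` depend only on the images and not on the labels. A single regressor fitted on the full (n_images, 30) label matrix should therefore give the same predictions while searching for neighbours once instead of 30 times. In the run below, each per-coordinate prediction took about 55 s. A minimal sketch on random toy data checks this equivalence. It is an alternative, not what this notebook does.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.RandomState(1)
X_train = rng.uniform(0, 255, size=(300, 100))  # toy pixel matrix
Y_train = rng.uniform(0, 96, size=(300, 30))    # toy labels, 30 coordinates
X_test = rng.uniform(0, 255, size=(20, 100))

# One model per coordinate, as in the notebook
per_coord = np.column_stack([
    KNeighborsRegressor().fit(X_train, Y_train[:, i]).predict(X_test)
    for i in range(Y_train.shape[1])
])

# A single multi-output model: each output column is the same neighbour
# mean, but the neighbour search runs only once per query image
joint = KNeighborsRegressor().fit(X_train, Y_train).predict(X_test)

print(np.allclose(per_coord, joint))
```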

## Predict on test data and create the submission file

In [82]:
def create_submission(test, models, label='baseline', verbose=False):
    ''' Predict and generate submission file 
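    Inputs: 
        test - test dataset on which to predict (first column holds the pixel arrays)
        models - list of tuples: [ (feature, model), ... ]
        label - label for identification of the submission file
        verbose - if True, print progress and timing for each feature
    
    Usage: >> create_submission( <test_data>, <list_of_models> [, <label> [, <verbose> ] ] )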
    '''
    
    predicted_df = pd.DataFrame()
    
    # get predictions on test dataset
    for (f, mod) in models:  # 'models' is a list of tuples ('facial_feature', model)
        
        if verbose:
            print 'Predicting "{}"...'.format(f),
        
        _start = time.time()  # start timer
        predicted_df[f] = mod.predict(test.iloc[:,0].tolist())
        _elapsed = time.time() - _start
        
        if verbose:
            print 'done! ({:.1f}s)'.format(_elapsed)
    
    # create the csv file
    generate_csv(predicted_df, label)
    
    return predicted_df

def generate_csv(df, label, lookup_path='data/IdLookupTable.csv'):
    ''' Generate csv file with the submission format
    Inputs:
        df - dataframe with predictions: one row per test image (in test.csv
             order), one column per facial feature
        label - label to identify the submission file
        lookup_path - path to Kaggle's IdLookupTable.csv (assumed location)
    
    Usage: >> generate_csv(<data_frame_with_predictions>, <label>)
    '''
    
    # Lookup table listing the (ImageId, FeatureName) pairs to submit
    id_t = pd.read_csv(lookup_path)
    
    # Unpivot the predictions passed in to one row per (image, feature).
    # Assumes ImageId in the lookup table is 1-based, so the 0-based row
    # index is shifted by one.
    long_df = df.copy()
    long_df.index = np.arange(1, len(df) + 1)
    unpivot = pd.melt(long_df.reset_index(), id_vars='index')
    unpivot.columns = ['ImageId', 'FeatureName', 'Location']
    
    # Keep only the requested pairs, in RowId order
    scored_sub = pd.merge(id_t[['RowId', 'ImageId', 'FeatureName']], unpivot,
                          on=['ImageId', 'FeatureName'], how='left')
    
    # Export only RowId and Location columns
    final = scored_sub[['RowId', 'Location']]
    final.to_csv('data/{}_submission.csv'.format(label), index=False)
    
    print '... Created the csv file: data/{}_submission.csv'.format(label)
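The unpivot-and-merge step in `generate_csv` can be seen on a tiny hypothetical example. `pd.melt` turns the wide prediction frame into one row per (ImageId, FeatureName). A left merge against the lookup table `id_t` then keeps only the requested pairs in `RowId` order. The toy lookup table mimics the layout assumed for `IdLookupTable.csv`, including the 1-based `ImageId`.

```python
import numpy as np
import pandas as pd

# Toy predictions: 2 test images x 2 features, rows in test.csv order
preds = pd.DataFrame({'left_eye_center_x': [66.1, 64.9],
                      'left_eye_center_y': [38.2, 37.5]})

# Toy lookup table: only some (image, feature) pairs are requested
lookup = pd.DataFrame({'RowId': [1, 2, 3],
                       'ImageId': [1, 1, 2],
                       'FeatureName': ['left_eye_center_x', 'left_eye_center_y',
                                       'left_eye_center_y']})

long_df = preds.copy()
long_df.index = np.arange(1, len(preds) + 1)  # align with 1-based ImageId
unpivot = pd.melt(long_df.reset_index(), id_vars='index')
unpivot.columns = ['ImageId', 'FeatureName', 'Location']

submission = pd.merge(lookup, unpivot, on=['ImageId', 'FeatureName'], how='left')
submission = submission[['RowId', 'Location']]
print(submission)
```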


In [7]:
# Create submission!! :)
kn_predictions = create_submission(test_data, kn_regressors, 'full_knregressor', True)

Predicting "left_eye_center_x"... done! (55.6s)
Predicting "left_eye_center_y"... done! (54.8s)
Predicting "right_eye_center_x"... done! (53.9s)
Predicting "right_eye_center_y"... done! (54.1s)
Predicting "left_eye_inner_corner_x"... done! (54.3s)
Predicting "left_eye_inner_corner_y"... done! (54.1s)
Predicting "left_eye_outer_corner_x"... done! (55.1s)
Predicting "left_eye_outer_corner_y"... done! (57.6s)
Predicting "right_eye_inner_corner_x"... done! (54.5s)
Predicting "right_eye_inner_corner_y"... done! (55.0s)
Predicting "right_eye_outer_corner_x"... done! (54.2s)
Predicting "right_eye_outer_corner_y"... done! (54.0s)
Predicting "left_eyebrow_inner_end_x"... done! (54.0s)
Predicting "left_eyebrow_inner_end_y"... done! (54.2s)
Predicting "left_eyebrow_outer_end_x"... done! (55.2s)
Predicting "left_eyebrow_outer_end_y"... done! (54.2s)
Predicting "right_eyebrow_inner_end_x"... done! (55.3s)
Predicting "right_eyebrow_inner_end_y"... done! (58.9s)
Predicting "right_eyebrow_outer_end_x"

Regenerate the CSV, because I changed the function definition while debugging (this overwrites the incorrect version of `full_knregressor_submission.csv`).

In [79]:
generate_csv(kn_predictions, 'full_knregressor')

... Created the csv file: data/full_knregressor_submission.csv


Check the submission CSV file

In [80]:
_df = pd.read_csv('data/full_knregressor_submission.csv')

print _df.shape
_df.head()
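Beyond eyeballing `shape` and `head()`, a few automated checks can catch problems such as misaligned rows or missing values. Below is a minimal sketch with a hypothetical `check_submission` helper and toy frames. The 0-96 range check assumes the competition's 96×96 images. In the notebook, `lookup` would be the loaded lookup table.

```python
import numpy as np
import pandas as pd

def check_submission(sub, lookup):
    """Hypothetical helper: return a list of problems (empty if none)."""
    if list(sub.columns) != ['RowId', 'Location']:
        return ['unexpected columns: {}'.format(list(sub.columns))]
    problems = []
    if sub['RowId'].tolist() != lookup['RowId'].tolist():
        problems.append('RowId values do not match the lookup table')
    n_missing = int(sub['Location'].isnull().sum())
    if n_missing:
        problems.append('{} missing Location values'.format(n_missing))
    # Assumes 96x96 images, so valid coordinates lie in [0, 96]
    out_of_range = (sub['Location'] < 0) | (sub['Location'] > 96)
    if out_of_range.any():
        problems.append('{} Location values outside [0, 96]'.format(int(out_of_range.sum())))
    return problems

lookup = pd.DataFrame({'RowId': [1, 2, 3]})
good = pd.DataFrame({'RowId': [1, 2, 3], 'Location': [66.1, 38.2, 37.5]})
bad = pd.DataFrame({'RowId': [1, 2, 3], 'Location': [66.1, np.nan, 120.0]})

print(check_submission(good, lookup))  # → []
print(check_submission(bad, lookup))
```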