### Contents
1. A simple Multi-Layer Perceptron (MLP) model
2. Perform PCA dimensionality reduction -> A simple MLP model
3. Reshape features into 4D (No. samples, 8, 8, channels: margin, texture, shape) -> A simple CNN model
4. Performance summary

### Load libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
%matplotlib inline
import tensorflow as tf



### A simple MLP model

In [2]:
csv_data = pd.read_csv("../train.csv")

In [3]:
# extract species for numeric encoding 
species = sorted(csv_data.species.unique())
species_dict = {specie:index for index, specie in enumerate(species)}

In [4]:
all_data = csv_data.replace({'species':species_dict})
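
The encoding step above can be illustrated without the CSV file. The following is a minimal sketch that uses a handful of made-up species names (they are placeholders, not values read from `train.csv`) to show why the names are sorted before being numbered.

```python
# Toy species names standing in for the `species` column (placeholder values).
labels = ["Quercus_Rubra", "Acer_Opalus", "Quercus_Rubra", "Betula_Pendula"]

# Sorting the unique names makes the name -> index mapping reproducible across runs.
classes = sorted(set(labels))
class_to_index = {name: index for index, name in enumerate(classes)}

# Replace every name with its integer code.
encoded = [class_to_index[name] for name in labels]

print(class_to_index)
print(encoded)  # [2, 0, 2, 1]
```

Keeping `class_to_index` around also lets you map predicted indices back to species names later.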

***
**Build a simple Neural Network using shape, margin, texture features to get a sense of performance**

**Split training set and testing set**

In [8]:
train_X, test_X, train_y, test_y = train_test_split(all_data.drop(['species'],axis=1), all_data['species'],
                                                    test_size=330, stratify=all_data['species'], random_state=916)
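
Passing `stratify` asks `train_test_split` to keep each species' share the same in the training and testing sets. The sketch below shows the idea in plain NumPy on toy labels; it is a simplified per-class split written for illustration (the helper `stratified_split_indices` is hypothetical), not scikit-learn's actual algorithm.

```python
import numpy as np

def stratified_split_indices(y, n_test_per_class, seed=0):
    """Return (train_idx, test_idx) with the same number of test rows per class."""
    rng = np.random.RandomState(seed)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)   # rows that belong to this class
        rng.shuffle(idx)                   # pick the test rows at random
        test_idx.extend(idx[:n_test_per_class])
        train_idx.extend(idx[n_test_per_class:])
    return np.array(train_idx), np.array(test_idx)

# Toy labels: 3 classes with 6 samples each.
y = np.repeat(np.arange(3), 6)
train_idx, test_idx = stratified_split_indices(y, n_test_per_class=2)

print(np.bincount(y[train_idx]))  # samples per class in the training split
print(np.bincount(y[test_idx]))   # samples per class in the testing split
```

With many classes and few samples per class, as here, an unstratified split could leave some species almost absent from one of the two sets.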

**Preprocess data**
 - center and whiten X (optional; not applied to the plain MLP, whitening is only used in the PCA section)
 - one-hot-encode y
 - change data type

In [9]:
num_labels = len(species)

In [10]:
train_y_encode = np.arange(num_labels)==np.array(train_y)[:,None]
test_y_encode = np.arange(num_labels)==np.array(test_y)[:,None]

In [11]:
train_y_encode = train_y_encode.astype(np.float32)
test_y_encode = test_y_encode.astype(np.float32)
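
The one-hot encoding above relies on NumPy broadcasting rather than an explicit loop. A small sketch with toy labels (the values are made up for illustration) shows what the comparison produces.

```python
import numpy as np

num_labels = 4
y = np.array([2, 0, 3])

# y[:, None] has shape (3, 1) and np.arange(num_labels) has shape (4,).
# Broadcasting the comparison yields a (3, 4) boolean matrix with one True per row.
one_hot = (np.arange(num_labels) == y[:, None]).astype(np.float32)

print(one_hot)
```

Each row contains a single 1.0 in the column of its class, and `one_hot.argmax(axis=1)` recovers the original integer labels.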

In [12]:
train_X_numpy = train_X.drop(['id'],axis=1).values.astype(np.float32)
test_X_numpy = test_X.drop(['id'],axis=1).values.astype(np.float32)

In [13]:
print("training set size:\t%s\t%s" % (train_X_numpy.shape, train_y_encode.shape))
print("testing set size:\t%s\t%s" % (test_X_numpy.shape, test_y_encode.shape))

training set size:	(660, 192) 	(660, 99)
testing set size:	(330, 192) 	(330, 99)


**Build the MLP model**

In [51]:
batch_size = 64
num_features = train_X_numpy.shape[1]
hidden_layer_size = 1024

In [52]:
def initialize(scope, shape, wt_initializer, center=True, scale=True):
    with tf.variable_scope(scope, reuse=None) as sp:
        wt = tf.get_variable("weights", shape, initializer=wt_initializer)
        bi = tf.get_variable("biases", shape[-1], initializer=tf.constant_initializer(1.))
        if center:
            beta = tf.get_variable("beta", shape[-1], initializer=tf.constant_initializer(0.0))
        if scale:
            gamma = tf.get_variable("gamma", shape[-1], initializer=tf.constant_initializer(1.0))
        moving_avg = tf.get_variable("moving_mean", shape[-1], initializer=tf.constant_initializer(0.0), \
                                     trainable=False)
        moving_var = tf.get_variable("moving_variance", shape[-1], initializer=tf.constant_initializer(1.0), \
                                     trainable=False)
        sp.reuse_variables()
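
The `beta`, `gamma`, `moving_mean` and `moving_variance` variables created above are the parameters a batch-normalization layer works with. The following NumPy sketch shows what such a layer computes under the standard formulation. The function `batch_norm` and the `eps` value are assumptions made for illustration, not the TensorFlow implementation.

```python
import numpy as np

def batch_norm(x, gamma, beta, moving_mean, moving_var,
               is_training, decay=0.9, eps=1e-3):
    """Normalize each feature column, then scale by gamma and shift by beta."""
    if is_training:
        mean, var = x.mean(axis=0), x.var(axis=0)
        # Exponential moving averages, kept for use at inference time.
        moving_mean = decay * moving_mean + (1.0 - decay) * mean
        moving_var = decay * moving_var + (1.0 - decay) * var
    else:
        # At inference time the stored statistics replace the batch statistics.
        mean, var = moving_mean, moving_var
    out = gamma * (x - mean) / np.sqrt(var + eps) + beta
    return out, moving_mean, moving_var

rng = np.random.RandomState(0)
x = rng.normal(loc=5.0, scale=2.0, size=(64, 3))

# Same starting values as in `initialize`: gamma = 1, beta = 0, mean = 0, variance = 1.
gamma, beta = np.ones(3), np.zeros(3)
moving_mean, moving_var = np.zeros(3), np.ones(3)

out, moving_mean, moving_var = batch_norm(
    x, gamma, beta, moving_mean, moving_var, is_training=True)
print(out.mean(axis=0))  # per-feature mean after normalization
print(out.std(axis=0))   # per-feature standard deviation after normalization
```

During training the output has roughly zero mean and unit variance per feature, while the moving statistics drift slowly toward the data statistics at a rate set by `decay`.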

In [61]:
init_lr = 0.001
graph = tf.Graph()
with graph.as_default():
    # prepare input 
    train_X_tf = tf.placeholder(tf.float32, shape=[batch_size, num_features])
    train_y_tf = tf.placeholder(tf.float32, shape=[batch_size, num_labels])
    valid_X_tf, valid_y_tf = tf.constant(test_X_numpy), tf.constant(test_y_encode)
    
    # initialize multi-layer parameters
    layers = [{'scope':'hidden_layer', 'shape':[num_features, hidden_layer_size], 
               'initializer':tf.contrib.layers.xavier_initializer()},
              {'scope':'output_layer', 'shape':[hidden_layer_size, num_labels],
               'initializer':tf.contrib.layers.xavier_initializer()}]
    for layer in layers:
        initialize(layer['scope'], layer['shape'], layer['initializer'])
    
    # build model
    def model(X, layer_scopes, is_training, keep_prob, decay=0.9):
        for scope in layer_scopes:
            with tf.variable_scope(scope, reuse=True):
                wt = tf.get_variable("weights")
                bi = tf.get_variable("biases")
            X = tf.matmul(X, wt) + bi
            # Batch Normalization
            X = tf.contrib.layers.batch_norm(X, decay=decay, center=True, scale=True, is_training=is_training,
                                            updates_collections=None, scope=scope, reuse=True)
            # ReLU Activation
            X = tf.nn.relu(X)
            # Dropout for non-output layers
            if scope!=layer_scopes[-1]:
                X = tf.nn.dropout(X, keep_prob)
        return X
    
    # setup a few parameters
    layer_scopes = [l['scope'] for l in layers]
    keep_prob = tf.placeholder(tf.float32)
    
    # log loss = -1/N * sum_i sum_j y_ij * log(p_ij), where y_ij is 1 if observation i is in class j and 0 otherwise
    train_logits = model(train_X_tf, layer_scopes, True, keep_prob)
    train_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(train_logits, train_y_tf))
    valid_logits = model(valid_X_tf, layer_scopes, False, keep_prob)
    valid_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(valid_logits, valid_y_tf))
    
    # setup optimizer
    global_step = tf.Variable(0)
    learning_rate = init_lr
    #tf.train.exponential_decay(init_lr, global_step, decay_steps=5000, decay_rate=0.5, staircase=True)
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(train_loss, global_step=global_step)
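
The log loss used in the graph above can be checked by hand. Below is a minimal NumPy sketch of softmax cross-entropy averaged over a batch. The function name `softmax_cross_entropy` is chosen for illustration and the code is not the TensorFlow op itself.

```python
import numpy as np

def softmax_cross_entropy(logits, one_hot_labels):
    """Mean over rows of -sum_j y_ij * log(p_ij), with p = softmax(logits)."""
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(one_hot_labels * log_probs).sum(axis=1).mean()

# All-zero logits over 99 classes give a uniform prediction, so the loss is log(99).
logits = np.zeros((5, 99))
labels = np.eye(99)[[0, 1, 2, 3, 4]]

print(softmax_cross_entropy(logits, labels))
print(np.log(99))
```

A model that guesses uniformly over 99 species scores log(99), about 4.6, which is a useful reference point when reading the loss at the first training step.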

In [62]:
num_steps = 10000
with tf.Session(graph=graph) as sess:
    tf.initialize_all_variables().run()
    print("Initialized")
    for step in range(num_steps):
        offset = (step * batch_size) % (train_y_encode.shape[0] - batch_size)
        batch_X = train_X_numpy[offset: (offset+batch_size), :]
        batch_y = train_y_encode[offset: (offset+batch_size), :]
        feed_dict = {train_X_tf: batch_X, train_y_tf: batch_y, keep_prob: 0.5}
        _, tloss = sess.run([optimizer, train_loss], feed_dict=feed_dict)
        if step%500==0:
            vloss = sess.run(valid_loss, feed_dict={keep_prob: 1.0})
            print('Epoch: %d:\tTrain Loss: %.6f\tValid Loss: %.6f' \
                %(step, tloss, vloss))
    print("Finished training")
    vloss = sess.run(valid_loss, feed_dict={keep_prob: 1.0})
    print("Final valid loss: %.6f" %(vloss))

Initialized
Epoch: 0:	Train Loss: 4.772790	Valid Loss: 5.229158
Epoch: 500:	Train Loss: 0.212472	Valid Loss: 0.368460
Epoch: 1000:	Train Loss: 0.163448	Valid Loss: 0.274983
Epoch: 1500:	Train Loss: 0.055808	Valid Loss: 0.239082
Epoch: 2000:	Train Loss: 0.034699	Valid Loss: 0.209699
Epoch: 2500:	Train Loss: 0.031398	Valid Loss: 0.188297
Epoch: 3000:	Train Loss: 0.031013	Valid Loss: 0.178511
Epoch: 3500:	Train Loss: 0.013220	Valid Loss: 0.176729
Epoch: 4000:	Train Loss: 0.009799	Valid Loss: 0.172518
Epoch: 4500:	Train Loss: 0.006094	Valid Loss: 0.163564
Epoch: 5000:	Train Loss: 0.015565	Valid Loss: 0.160167
Epoch: 5500:	Train Loss: 0.004902	Valid Loss: 0.145837
Epoch: 6000:	Train Loss: 0.004115	Valid Loss: 0.154634
Epoch: 6500:	Train Loss: 0.008227	Valid Loss: 0.146779
Epoch: 7000:	Train Loss: 0.002953	Valid Loss: 0.147245
Epoch: 7500:	Train Loss: 0.003330	Valid Loss: 0.146537
Epoch: 8000:	Train Loss: 0.001930	Valid Loss: 0.141243
Epoch: 8500:	Train Loss: 0.001662	Valid Loss: 0.128760
Ep
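
The training loop above picks each mini-batch by sliding an offset through the training set and wrapping around with a modulo. The sketch below reproduces only that arithmetic for the 660 training rows and a batch size of 64. The helper `batch_offset` is introduced here for illustration.

```python
num_samples, batch_size = 660, 64

def batch_offset(step):
    # Same arithmetic as the training loop: wrap before a batch would run past the end.
    return (step * batch_size) % (num_samples - batch_size)

offsets = [batch_offset(step) for step in range(12)]
print(offsets)  # [0, 64, 128, 192, 256, 320, 384, 448, 512, 576, 44, 108]
```

Every slice `[offset, offset + batch_size)` is a full batch, which matters because the placeholders have a fixed batch dimension. Since 64 and 596 share the factor 4, the offsets are always multiples of 4 and never exceed 592, so with this scheme rows 656 to 659 would never be drawn. The rows are also visited in a fixed order rather than reshuffled.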

***
### Apply PCA to reduce input feature sizes and then build a simple MLP 

In [27]:
def pca_transform(feature, n_components):
    cols = [col for col in list(all_data.columns) if feature in col]
    selected_cols = all_data[cols]
    pca_obj = PCA(n_components=n_components, whiten=True)
    pca_obj.fit(selected_cols)
    print "total explained variance by ",n_components,'principle components: \t',sum(pca_obj.explained_variance_ratio_)
    transformed = pca_obj.transform(selected_cols)
    return transformed
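
To make the `whiten=True` option and the explained-variance figure concrete, here is a small NumPy sketch of PCA with whitening via the singular value decomposition. It runs on random toy data, and the function `pca_whiten` is an illustrative stand-in, not scikit-learn's implementation.

```python
import numpy as np

def pca_whiten(X, n_components):
    """Project X onto its top principal components and rescale them to unit variance."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions; S holds the singular values.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = S ** 2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    projected = X_centered.dot(Vt[:n_components].T)
    # Whitening divides each component by its standard deviation.
    whitened = projected / np.sqrt(explained_variance[:n_components])
    return whitened, ratio[:n_components]

rng = np.random.RandomState(0)
# Toy data: five features with very different scales.
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
Z, ratio = pca_whiten(X, n_components=2)

print(Z.shape)
print(ratio.sum())                 # share of variance kept by the two components
print(np.cov(Z, rowvar=False))     # approximately the identity matrix
```

After whitening, the retained components are uncorrelated and have unit variance, so no component dominates the input of the network purely because of its scale.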

In [19]:
margin = pca_transform("margin", 8)

total explained variance by  8 principle components: 	0.809681416276


In [21]:
texture = pca_transform("texture", 10)

total explained variance by  10 principle components: 	0.744586306496


In [22]:
shape = pca_transform("shape",3)

total explained variance by  3 principle components: 	0.963772808809


In [24]:
data_transformed = np.concatenate((margin, texture, shape), axis=1)

**Split data**

In [63]:
train_X, test_X, train_y, test_y = train_test_split(data_transformed, all_data['species'],
                                                    test_size=330, stratify=all_data['species'], random_state=916)

In [64]:
train_y_encode = np.arange(num_labels)==np.array(train_y)[:,None]
test_y_encode = np.arange(num_labels)==np.array(test_y)[:,None]

In [65]:
train_y_encode = train_y_encode.astype(np.float32)
test_y_encode = test_y_encode.astype(np.float32)

In [66]:
train_X_numpy = train_X.astype(np.float32)
test_X_numpy = test_X.astype(np.float32)

In [67]:
print("training set size:\t%s\t%s" % (train_X_numpy.shape, train_y_encode.shape))
print("testing set size:\t%s\t%s" % (test_X_numpy.shape, test_y_encode.shape))

training set size:	(660, 21) 	(660, 99)
testing set size:	(330, 21) 	(330, 99)


**Simple MLP on PCA-transformed features**

In [68]:
batch_size = 64
num_features = train_X_numpy.shape[1]
hidden_layer_size = 1024

In [69]:
init_lr = 0.001
graph = tf.Graph()
with graph.as_default():
    # prepare input 
    train_X_tf = tf.placeholder(tf.float32, shape=[batch_size, num_features])
    train_y_tf = tf.placeholder(tf.float32, shape=[batch_size, num_labels])
    valid_X_tf, valid_y_tf = tf.constant(test_X_numpy), tf.constant(test_y_encode)
    
    # initialize multi-layer parameters
    layers = [{'scope':'hidden_layer', 'shape':[num_features, hidden_layer_size], 
               'initializer':tf.contrib.layers.xavier_initializer()},
              {'scope':'output_layer', 'shape':[hidden_layer_size, num_labels],
               'initializer':tf.contrib.layers.xavier_initializer()}]
    for layer in layers:
        initialize(layer['scope'], layer['shape'], layer['initializer'])
    
    # build model
    def model(X, layer_scopes, is_training, keep_prob, decay=0.9):
        for scope in layer_scopes:
            with tf.variable_scope(scope, reuse=True):
                wt = tf.get_variable("weights")
                bi = tf.get_variable("biases")
            X = tf.matmul(X, wt) + bi
            # Batch Normalization
            X = tf.contrib.layers.batch_norm(X, decay=decay, center=True, scale=True, is_training=is_training,
                                            updates_collections=None, scope=scope, reuse=True)
            # ReLU Activation
            X = tf.nn.relu(X)
            # Dropout for non-output layers
            if scope!=layer_scopes[-1]:
                X = tf.nn.dropout(X, keep_prob)
        return X
    
    # setup a few parameters
    layer_scopes = [l['scope'] for l in layers]
    keep_prob = tf.placeholder(tf.float32)
    
    # log loss = -1/N * sum_i sum_j y_ij * log(p_ij), where y_ij is 1 if observation i is in class j and 0 otherwise
    train_logits = model(train_X_tf, layer_scopes, True, keep_prob)
    train_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(train_logits, train_y_tf))
    valid_logits = model(valid_X_tf, layer_scopes, False, keep_prob)
    valid_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(valid_logits, valid_y_tf))
    
    # setup optimizer
    global_step = tf.Variable(0)
    learning_rate = init_lr
    #tf.train.exponential_decay(init_lr, global_step, decay_steps=5000, decay_rate=0.5, staircase=True)
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(train_loss, global_step=global_step)

In [70]:
num_steps = 10000
with tf.Session(graph=graph) as sess:
    tf.initialize_all_variables().run()
    print("Initialized")
    for step in range(num_steps):
        offset = (step * batch_size) % (train_y_encode.shape[0] - batch_size)
        batch_X = train_X_numpy[offset: (offset+batch_size), :]
        batch_y = train_y_encode[offset: (offset+batch_size), :]
        feed_dict = {train_X_tf: batch_X, train_y_tf: batch_y, keep_prob: 0.3}
        _, tloss = sess.run([optimizer, train_loss], feed_dict=feed_dict)
        if step%500==0:
            vloss = sess.run(valid_loss, feed_dict={keep_prob: 1.0})
            print('Epoch: %d:\tTrain Loss: %.6f\tValid Loss: %.6f' \
                %(step, tloss, vloss))
    print("Finished training")
    vloss = sess.run(valid_loss, feed_dict={keep_prob: 1.0})
    print("Final valid loss: %.6f" %(vloss))

Initialized
Epoch: 0:	Train Loss: 4.746964	Valid Loss: 5.081502
Epoch: 500:	Train Loss: 0.590641	Valid Loss: 0.575387
Epoch: 1000:	Train Loss: 0.434571	Valid Loss: 0.340101
Epoch: 1500:	Train Loss: 0.217021	Valid Loss: 0.257954
Epoch: 2000:	Train Loss: 0.135749	Valid Loss: 0.224440
Epoch: 2500:	Train Loss: 0.115030	Valid Loss: 0.192081
Epoch: 3000:	Train Loss: 0.116827	Valid Loss: 0.175697
Epoch: 3500:	Train Loss: 0.058373	Valid Loss: 0.163170
Epoch: 4000:	Train Loss: 0.049189	Valid Loss: 0.150002
Epoch: 4500:	Train Loss: 0.038955	Valid Loss: 0.149582
Epoch: 5000:	Train Loss: 0.060937	Valid Loss: 0.147339
Epoch: 5500:	Train Loss: 0.032217	Valid Loss: 0.137791
Epoch: 6000:	Train Loss: 0.018771	Valid Loss: 0.140775
Epoch: 6500:	Train Loss: 0.042064	Valid Loss: 0.137352
Epoch: 7000:	Train Loss: 0.052881	Valid Loss: 0.146210
Epoch: 7500:	Train Loss: 0.018415	Valid Loss: 0.135323
Epoch: 8000:	Train Loss: 0.009094	Valid Loss: 0.125074
Epoch: 8500:	Train Loss: 0.010724	Valid Loss: 0.141062
Ep
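
The training loop above feeds `keep_prob` of 0.3 during training and 1.0 when evaluating the validation loss. The NumPy sketch below shows the usual "inverted dropout" behaviour behind that placeholder, where surviving units are scaled by `1 / keep_prob`. The `dropout` function is written for illustration only.

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Zero each unit with probability 1 - keep_prob and rescale the survivors."""
    mask = rng.uniform(size=x.shape) < keep_prob
    return x * mask / keep_prob

rng = np.random.RandomState(0)
x = np.ones((1000, 100))

dropped = dropout(x, keep_prob=0.3, rng=rng)
print(dropped.mean())          # stays close to the original mean of 1.0
print((dropped == 0).mean())   # fraction of zeroed units, close to 0.7

# keep_prob = 1.0 leaves the activations untouched, as in the validation runs.
unchanged = dropout(x, keep_prob=1.0, rng=rng)
print(np.array_equal(unchanged, x))
```

Because of the rescaling, the expected activation is the same with and without dropout, so no extra correction is needed at validation time.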

***
### Simple CNN

In [14]:
import sys
sys.path.append("/Users/Zhongyu/Documents/projects/CNNplayground/")
from cnn import *

In [15]:
train_X, test_X, train_y, test_y = train_test_split(all_data.drop(['species'],axis=1), all_data['species'],
                                                    test_size=330, stratify=all_data['species'], random_state=916)

**Prepare input data for CNN**

In [16]:
# One-Hot-Encode y
train_y_encode = np.arange(num_labels)==np.array(train_y)[:,None]
test_y_encode = np.arange(num_labels)==np.array(test_y)[:,None]
train_y_encode = train_y_encode.astype(np.float32)
test_y_encode = test_y_encode.astype(np.float32)

In [17]:
def reshape_feature(X_df, feature, order='F'):
    cols = [col for col in list(X_df.columns) if feature in col]
    selected_cols = X_df[cols].values
    reshaped_cols = selected_cols.reshape((X_df.shape[0], 8,8), order=order)
    return reshaped_cols.astype(np.float32)
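
The `order='F'` argument decides where each of the 64 values of a feature group ends up in its 8x8 grid. A short NumPy sketch with numbered toy features (0 to 63 for the first sample, 64 to 127 for the second) makes the layout visible.

```python
import numpy as np

# Two samples with 64 features each; the values are just running indices.
X = np.arange(128).reshape(2, 64)

c_order = X.reshape((2, 8, 8), order='C')
f_order = X.reshape((2, 8, 8), order='F')

print(c_order[0, 0])   # first grid row in C order:       [0 1 2 3 4 5 6 7]
print(f_order[0, 0])   # first grid row in Fortran order: [ 0  8 16 24 32 40 48 56]
```

In Fortran order consecutive features run down the columns of each grid, so every sample's grid is the transpose of the C-order one, and the sample axis is left untouched.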

In [18]:
train_margin = reshape_feature(train_X, 'margin')
train_texture = reshape_feature(train_X, 'texture')
train_shape = reshape_feature(train_X, 'shape')

In [19]:
test_margin = reshape_feature(test_X, 'margin')
test_texture = reshape_feature(test_X, 'texture')
test_shape = reshape_feature(test_X, 'shape')

In [20]:
train_X_numpy = np.concatenate((train_margin[:,:,:,None], train_texture[:,:,:,None], train_shape[:,:,:,None]), axis=3)
test_X_numpy = np.concatenate((test_margin[:,:,:,None], test_texture[:,:,:,None], test_shape[:,:,:,None]), axis=3)
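
Building the channel axis above with `[:, :, :, None]` plus `np.concatenate` is equivalent to stacking the three arrays along a new last axis. The sketch below checks this on small constant arrays, which are toy stand-ins for the margin, texture and shape grids.

```python
import numpy as np

n = 5
margin = np.zeros((n, 8, 8), dtype=np.float32)
texture = np.ones((n, 8, 8), dtype=np.float32)
shape = np.full((n, 8, 8), 2.0, dtype=np.float32)

# Adding a trailing axis to each array and concatenating on it...
concatenated = np.concatenate(
    (margin[:, :, :, None], texture[:, :, :, None], shape[:, :, :, None]), axis=3)
# ...gives the same result as stacking the arrays along a new last axis.
stacked = np.stack((margin, texture, shape), axis=-1)

print(stacked.shape)                           # (5, 8, 8, 3)
print(np.array_equal(concatenated, stacked))   # True
```

Either form yields the (samples, height, width, channels) layout that the CNN input expects, with channel 0 holding margin, channel 1 texture and channel 2 shape.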

In [21]:
print("training set size:\t%s\t%s" % (train_X_numpy.shape, train_y_encode.shape))
print("testing set size:\t%s\t%s" % (test_X_numpy.shape, test_y_encode.shape))

training set size:	(660, 8, 8, 3) 	(660, 99)
testing set size:	(330, 8, 8, 3) 	(330, 99)


**Simple CNN**

In [22]:
batch_size = 64
input_shape = [batch_size, 8, 8, 3]
conv_depth = 3

In [26]:
model = cnn_graph(input_shape, num_labels)
model.setup_data(batch_size, valid_X=test_X_numpy, valid_y=test_y_encode)
#conv_wt_initializer = tf.truncated_normal_initializer(stddev=0.10)
conv_wt_initializer = tf.contrib.layers.xavier_initializer_conv2d()
conv_layer = ('conv1', 5)

# Convolutional layer
layer_name, filter_size = conv_layer[0], conv_layer[1]
model.add_convIncept_layer(layer_name, conv_depth, conv_wt_initializer, add_output_summary=False)
model.add_batchnorm_layer(layer_name+'/batchnorm', add_output_summary=False)
model.add_act_layer(layer_name+'/activation')
#model.add_pool_layer(layer_name+'/pool')

# Fully Connected layer
fc_layers = [('fc1', 3096), ('fc2', num_labels)]
fc_wt_initializer = tf.contrib.layers.variance_scaling_initializer()
for fc_layer in fc_layers:
    layer_name, num_neuron = fc_layer[0], fc_layer[1]
    model.add_fc_layer(layer_name, num_neuron, fc_wt_initializer, add_output_summary=False)
    model.add_batchnorm_layer(layer_name+'/batchnorm', add_output_summary=False)
    model.add_act_layer(layer_name+'/activation')
    if layer_name!=fc_layers[-1][0]:
        model.add_dropout_layer(layer_name+'/dropout')

# Setup learning rate and optimizer
model.setup_learning_rate(0.0005, exp_decay=False, add_output_summary=False)
train_loss = model.compute_train_loss(add_output_summary=False)
valid_loss = model.compute_valid_loss(add_output_summary=False)
optimizer = model.setup_optimizer(tf.train.AdamOptimizer, add_output_summary=False)

In [159]:
# Start a session
graph = model.get_graph()
training_steps = 20000
with tf.Session(graph=graph) as sess:
    tf.initialize_all_variables().run()
    print("initialized")
    model.set_kp_value('fc1/dropout', 0.3)
    train_feed_dict = model.get_kp_collection_dict()
    for step in range(training_steps):
        offset = (step*batch_size)%(train_X.shape[0]-batch_size)
        batch_X = train_X_numpy[offset:(offset+batch_size),:,:,:]
        batch_y = train_y_encode[offset:(offset+batch_size),:]
        train_feed_dict.update({model.train_X : batch_X,
                                model.train_y : batch_y})
        _, tloss = sess.run([optimizer, train_loss], feed_dict=train_feed_dict)
        if step%500 == 0:
            vloss = sess.run(valid_loss, feed_dict=model.kp_reference_feed_dict)
            print('Epoch: %d:\tTrain Loss: %.6f\tValid Loss: %.6f' \
                %(step, tloss, vloss))
    print("Finished training")
    vloss = sess.run(valid_loss, feed_dict=model.kp_reference_feed_dict)
    print("Final valid loss: %.6f" %(vloss))

initialized
Epoch: 0:	Train Loss: 4.823782	Valid Loss: 7.345297
Epoch: 500:	Train Loss: 0.287523	Valid Loss: 0.484034
Epoch: 1000:	Train Loss: 0.255505	Valid Loss: 0.351168
Epoch: 1500:	Train Loss: 0.103999	Valid Loss: 0.301318
Epoch: 2000:	Train Loss: 0.074699	Valid Loss: 0.269202
Epoch: 2500:	Train Loss: 0.073452	Valid Loss: 0.244439
Epoch: 3000:	Train Loss: 0.062099	Valid Loss: 0.226238
Epoch: 3500:	Train Loss: 0.026829	Valid Loss: 0.213649
Epoch: 4000:	Train Loss: 0.030227	Valid Loss: 0.204544
Epoch: 4500:	Train Loss: 0.015463	Valid Loss: 0.194076
Epoch: 5000:	Train Loss: 0.035782	Valid Loss: 0.176879
Epoch: 5500:	Train Loss: 0.015866	Valid Loss: 0.181853
Epoch: 6000:	Train Loss: 0.010127	Valid Loss: 0.173707
Epoch: 6500:	Train Loss: 0.019437	Valid Loss: 0.162272
Epoch: 7000:	Train Loss: 0.007182	Valid Loss: 0.160137
Epoch: 7500:	Train Loss: 0.010288	Valid Loss: 0.156322
Epoch: 8000:	Train Loss: 0.006460	Valid Loss: 0.149806
Epoch: 8500:	Train Loss: 0.004348	Valid Loss: 0.153534
Ep

***
### Performance Summary

| Models     | Preprocess     | No. Features | Architecture | Training Steps | Learning Rate | Dropout (keep_prob) | Performance (valid log loss) |
| :--------- | :------------- | :----------: | :----------- | :------------: | :-----------: | :-----------------: | :--------------------------: |
| Simple MLP | None           | 192   | FC -> BN -> ReLU -> Dropout -> FC -> BN -> ReLU -> Softmax | 10000 | 0.001  | 0.5 | 0.1093 |
| PCA MLP    | PCA (whitened) | 21    | FC -> BN -> ReLU -> Dropout -> FC -> BN -> ReLU -> Softmax | 10000 | 0.001  | 0.3 | 0.1246 |
| Simple CNN | None           | 8x8x3 | ConvIncept -> FC -> BN -> ReLU -> Dropout -> FC -> BN -> ReLU -> Softmax | 20000 | 0.0005 | 0.3 | 0.1169 |
