# Multilayer Perceptron
## Introduction and importing data
This is the main file for the multilayer perceptron. Here we call all the necessary functions and train the neural network. Let's get started by first importing the data.

In [1]:
import data_utils as du
import learning_utils as lu

  from ._conv import register_converters as _register_converters
Using TensorFlow backend.


In [2]:
data_file = 'kaggle_data/train_data.csv'
labels_file = 'kaggle_data/train_labels.csv'

data, labels = du.Import_Data(data_file, labels_file)

Imported data (4363, 264) and labels (4363, 1).


## Preprocessing data
Here we remove all features with zero variance, since they're not useful and also because they will cause a division by zero during normalization. Normalization is done here using min-max method. After normalization we shuffle the data, in order for Keras to pick a decent training set and validation set. We do not need to split the data into test and train sets, since Keras has a build in method for doing that during training.

In [3]:
clean_data = du.Remove_Zero_Variance(data)

Zero variance features removed from data. Input shape: (4363, 264). Output shape: (4363, 260).


In [4]:
normalized_features = du.Normalize(clean_data, 'min-max')

Data normalized using min-max method. Range: [-0.007439011072311145, 0.992560988927689].


In [5]:
pca = du.PCA_fit(normalized_features,0.999995)
selected_train_features = du.PCA_transform(pca, normalized_features)

(4363, 151)


In [6]:
shuffled_features, shuffled_labels = du.Shuffle(selected_train_features, labels)


Data successfully shuffled


In [7]:
resampled_features, resampled_labels = du.Resample(shuffled_features, shuffled_labels)
train_features, test_features, train_labels, test_labels = du.Split_Data(resampled_features, resampled_labels, 0.33)

  y = column_or_1d(y, warn=True)


Data successfully split. Test data ratio = 0.33


## Growing the random forest


In [8]:
rf = lu.Learn_Random_Forest(train_features, train_labels)



Best score: 0.9401961456690213, using parameters: {'estimator__max_depth': 13}


In [9]:
predictions = lu.Predict(rf, test_features)
accuracy = lu.Accuracy_Score(test_labels, predictions)

Accuracy: 0.9576778504803007


## Predict labels for submission

In [10]:
import pandas as pd
validation_data_file = 'kaggle_data/test_data.csv'
validation_data = pd.read_csv(data_file,header=None)

clean_valid_data = du.Remove_Zero_Variance(validation_data)
normalized_valid_data = du.Normalize(clean_valid_data, 'min-max')
selected_valid_features = du.PCA_transform(pca, normalized_valid_data)

Zero variance features removed from data. Input shape: (4363, 264). Output shape: (4363, 260).
Data normalized using min-max method. Range: [-0.007439011072311145, 0.992560988927689].


In [None]:
predictions_to_submit = lu.Predict(rf, test_features)