PA9: Neural Networks with TensorFlow

In this assignment, you will:

1. Implement neural networks as a powerful approach to supervised machine learning,
2. Practice using state-of-the-art software tools and programming paradigms for machine learning,
3. Investigate how learning parameters affect neural network performance, as evaluated on an empirical data set.

For this assignment, we will use a well-known dataset:

[Higgs](https://archive.ics.uci.edu/ml/datasets/HIGGS). Some information regarding this dataset: The data has been produced using Monte Carlo simulations. The first 21 features (columns 2-22) are kinematic properties measured by the particle detectors in the accelerator. The last seven features are functions of the first 21 features; these are high-level features derived by physicists to help discriminate between the two classes.

For local testing you will use the sample dataset provided to you with this notebook.
When submitting on EdX, your code will be evaluated on a much larger sample of this dataset.

The file format for this data set is as follows:

• The first row contains a comma-separated list of the names of the label and attributes

• Each successive row represents a single instance

• The first entry of each instance is the label to be learned, and all other entries (following the commas) are attribute values.

• All attributes are numerical i.e. real numbers.
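
As a minimal sketch of this layout, assuming a header row is present as described, the snippet below parses a tiny in-memory CSV with pandas and separates the label column from the attributes. The column names (`label`, `f1`, `f2`) and the values are made up for illustration. If a file has no header row, as the `header=None` call later in this notebook suggests for the local sample, `header=None` would be passed to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical two-attribute file in the format described above.
csv_text = "label,f1,f2\n1,0.5,-1.2\n0,1.7,0.3\n"

data = pd.read_csv(io.StringIO(csv_text))
labels = data.iloc[:, 0].to_numpy()       # first column: the label to be learned
attributes = data.iloc[:, 1:].to_numpy()  # remaining columns: attribute values

print(labels.shape, attributes.shape)
```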

Exercise 1: 

Your goal is to complete the below function named train_nn that behaves as follows:

1) It should take as input six parameters:
    
    a. The path to a file containing a data set (e.g., higgs_sample.csv)
    
    b. The number of neurons to use in the hidden layer
    
    c. The learning rate to use during backpropagation
    
    d. The number of iterations to use during training
    
    e. The fraction of instances to use for a training set (e.g., 0.75 for 75%)
    
    f. A random seed as an integer
    
    
For example, the call train_nn("higgs_sample.csv", 20, 0.001, 1000, 0.75, 12345) creates a neural network with 20 neurons in the hidden layer and trains it with a learning rate of 0.001 for 1000 iterations on higgs_sample.csv, using a random seed of 12345. Here 75% of the data is used for training and the remaining 25% for testing.
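
The seeded train/test split in this example could be sketched as follows. This is only one reasonable way to do it: the helper name `split_train_test` is hypothetical, and the choice to shuffle before splitting is an assumption that the assignment text does not spell out.

```python
import numpy as np

def split_train_test(data, train_fraction, seed):
    """Shuffle rows with a seeded generator, then split by the given fraction."""
    rng = np.random.RandomState(seed)
    indices = rng.permutation(data.shape[0])
    n_train = int(train_fraction * data.shape[0])
    return data[indices[:n_train]], data[indices[n_train:]]

# Toy data: 8 instances, 3 columns (label + 2 attributes).
toy = np.arange(24, dtype=float).reshape(8, 3)
train, test = split_train_test(toy, 0.75, 12345)
print(train.shape, test.shape)
```

Because the generator is seeded, calling the helper again with the same seed returns the same split.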

2) You should create a neural network in TensorFlow that will be learned from the training data. The key parameters of the network's architecture are determined by the input parameters and the size of your data set:
    
    a. The number of attributes in the input layer is the length of each instance’s
    attribute list (which is the same for all instances)
    
    b. The number of neurons in a hidden layer will be inputted to the program as a
    parameter. Each hidden neuron should use tf.sigmoid as its activation function.
    
    c. The number of output neurons will be 1, since this is a binary classification task; the output neuron should use tf.sigmoid as its activation function.
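
To make the shapes concrete, here is a plain NumPy sketch of the forward pass for this architecture (one sigmoid hidden layer, one sigmoid output neuron). It is not the TensorFlow solution. The bias terms and the random initialization are assumptions added for illustration, and the sizes (28 attributes, 20 hidden neurons, 5 instances) are example values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    hidden = sigmoid(x @ w_hidden + b_hidden)  # shape: (n_instances, n_hidden)
    return sigmoid(hidden @ w_out + b_out)     # shape: (n_instances, 1)

rng = np.random.RandomState(0)
n_attributes, n_hidden = 28, 20                # example sizes only

x = rng.normal(size=(5, n_attributes))         # 5 made-up instances
w_hidden = rng.normal(size=(n_attributes, n_hidden))
b_hidden = np.zeros(n_hidden)                  # biases are an assumption here
w_out = rng.normal(size=(n_hidden, 1))
b_out = np.zeros(1)

probs = forward(x, w_hidden, b_hidden, w_out, b_out)
print(probs.shape)
```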
    
3) You should use different cost/loss functions that the network tries to minimize depending on the number of labels:
    
    a. For binary classification we will use the cross entropy loss function:
    
    TODO: get latex version from here: https://stackoverflow.com/questions/46291253/tensorflow-sigmoid-and-cross-entropy-vs-sigmoid-cross-entropy-with-logits
    
    b. You will use full batch gradient descent (no mini-batching is required, but you may optionally do it). TODO: change this if grading scheme does not conform to this.

## TODO: edit this to use cross entropy loss

$$SSE(X) = \sum_{j=1}^{n}({y_j - \hat{y}_j})^2$$


    The function tf.reduce_sum will allow you to sum across all instances.
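
For reference, the standard binary cross-entropy over $n$ instances can be written with the same symbols as the SSE expression above, where $y_j$ is the true label and $\hat{y}_j$ is the network's sigmoid output. Whether the grading scheme expects the sum or the mean over instances is not stated here, so the sum below is an assumption.

$$CE(X) = -\sum_{j=1}^{n}\left[y_j \log \hat{y}_j + (1 - y_j)\log(1 - \hat{y}_j)\right]$$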
    

4) For the implementation of Backpropagation, you should use tf.train.AdamOptimizer

For more on optimizers, you may follow this link: TODO
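
As a rough illustration of what an Adam step does, the sketch below implements the textbook Adam update in NumPy and uses it to minimize a one-dimensional quadratic. The helper name `adam_step` is hypothetical, and TensorFlow's internal formulation may differ in detail. The default values shown match the documented defaults of `tf.train.AdamOptimizer`.

```python
import numpy as np

def adam_step(param, grad, m, v, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - learning_rate * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, learning_rate=0.05)

print(w)
```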

5) You should train your network using your inputted learning rate and for the inputted number of iterations. The iterations are simply a loop that calls Backpropagation a fixed number of times.

TODOs:

- Biases?
- Mean normalize?
- How to evaluate?


In [1]:
import pandas as pd
file_name = "higgs_small.csv"

In [2]:
## Looking at the data
input = pd.read_csv(file_name,header=None)

In [3]:
input.describe()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,19,20,21,22,23,24,25,26,27,28
count,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,...,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
mean,0.5295,0.997924,-0.016681,-0.003486,0.991385,-0.009822,0.992058,-0.001468,0.003751,1.004939,...,0.001222,-0.014789,0.999142,1.029148,1.021455,1.050877,1.012534,0.967713,1.031224,0.957864
std,0.499154,0.574965,1.002943,1.010838,0.59546,1.004828,0.477408,1.004115,1.015907,1.026965,...,1.005459,1.000998,1.396992,0.637225,0.369623,0.165939,0.404927,0.523195,0.36622,0.313337
min,0.0,0.275063,-2.425236,-1.742508,0.012355,-1.743755,0.159488,-2.941008,-1.741237,0.0,...,-2.496432,-1.742136,0.0,0.172241,0.342467,0.461183,0.384411,0.080986,0.388779,0.444956
25%,0.0,0.596061,-0.744166,-0.872486,0.57125,-0.88574,0.679817,-0.682789,-0.892627,0.0,...,-0.70836,-0.885352,0.0,0.790251,0.847034,0.985821,0.768254,0.675036,0.822794,0.770775
50%,1.0,0.85877,-0.028786,0.000643,0.886284,-0.01994,0.897522,-0.009928,0.020396,1.086538,...,-0.001294,-0.014137,0.0,0.894663,0.949071,0.98979,0.917218,0.869291,0.945466,0.871492
75%,1.0,1.248305,0.714839,0.881675,1.294626,0.856703,1.170305,0.680263,0.878984,2.173076,...,0.719931,0.846634,3.101961,1.024671,1.07956,1.021323,1.145685,1.123704,1.132049,1.055831
max,1.0,6.695388,2.429998,1.743236,5.824007,1.742818,7.064657,2.969674,1.741454,2.173076,...,2.495511,1.742817,3.101961,13.098125,7.391968,3.68226,6.583121,8.255083,4.749469,4.316365


In [4]:
import tensorflow as tf



In [5]:
## Bogus as of now
working_locally = True

In [6]:
import tensorflow as tf

filename_queue = tf.train.string_input_producer([file_name])


line_reader = tf.TextLineReader()
key, csv_row = line_reader.read(filename_queue)

In [7]:
record_defaults = [[0.0]]*29
all_columns = tf.decode_csv(csv_row, record_defaults=record_defaults)

In [8]:
# Turn the features back into a tensor.
features = tf.stack(all_columns[1:])
labels = tf.stack(all_columns[0])

In [9]:
# Parameters
learning_rate = 0.05
training_epochs = 400
batch_size = 10000
display_step = 1
num_examples= 10000

# Network Parameters
n_hidden_1 = 32 # 1st layer number of features
n_hidden_2 = 8 # 2nd layer number of features
n_input = 28 
n_classes = 1 

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, 1])



In [10]:
# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

In [11]:
def normalize(train):
    mean, std = train.mean(), train.std()
    train = (train - mean) / std
    return train
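
Note that `train.mean()` and `train.std()` on a 2-D NumPy array return single scalars computed over all entries, so the function above rescales every column by the same global statistics. A per-feature variant would compute the statistics along axis 0. It is shown here only as a sketch of a common alternative, not as what the cell above does. `normalize_per_feature` is a hypothetical name.

```python
import numpy as np

def normalize_per_feature(train):
    """Standardize each column to zero mean and unit standard deviation."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # leave constant columns unscaled
    return (train - mean) / std

# Two made-up features on very different scales.
toy = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
scaled = normalize_per_feature(toy)
print(scaled.mean(axis=0), scaled.std(axis=0))
```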

In [12]:

# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with SIGMOID activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.sigmoid(layer_1)
    # Hidden layer with SIGMOID activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.sigmoid(layer_2)
    # Output layer: returns raw logits (the sigmoid is applied inside the loss)
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
#     out_layer_sigmoid = tf.nn.sigmoid(out_layer)
#     return out_layer_sigmoid
    return out_layer



In [13]:

# Construct model
pred = multilayer_perceptron(x, weights, biases)

# Define loss and optimizer
# cost = tf.reduce_sum((y-pred)**2)

# pred_onehot = tf.round(pred)
# optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
# optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=pred)
cost = tf.reduce_mean(cross_entropy)
pred_onehot = tf.round(tf.nn.sigmoid(pred))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()
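
The cell above passes raw logits to `tf.nn.sigmoid_cross_entropy_with_logits` instead of applying the sigmoid first. The NumPy sketch below shows the numerically stable form that the TensorFlow documentation gives for this loss, `max(x, 0) - x * z + log(1 + exp(-|x|))`, and checks it against the naive formula on a few made-up values. The function name `stable_sigmoid_cross_entropy` is hypothetical.

```python
import numpy as np

def stable_sigmoid_cross_entropy(labels, logits):
    """Per-instance cross-entropy computed directly from logits."""
    return (np.maximum(logits, 0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))

logits = np.array([[2.0], [-1.0], [0.0]])   # made-up network outputs
labels = np.array([[1.0], [0.0], [1.0]])

stable = stable_sigmoid_cross_entropy(labels, logits)

# Naive version: apply the sigmoid, then the cross-entropy formula.
probs = 1.0 / (1.0 + np.exp(-logits))
naive = -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

print(np.allclose(stable, naive))
```

The stable form avoids taking the log of values that round to 0 when the logits are large in magnitude.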

In [14]:
import numpy as np

In [17]:
%%time
with tf.Session() as sess:
    #tf.initialize_all_variables().run()
    sess.run(init)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    x_full = []
    y_full = []
    for i in range(num_examples):
        example, label = sess.run([features, labels])
        x_full.append(example)
        
        y_full.append(label)
    x_full = normalize(np.array(x_full))
    for epoch in range(training_epochs):
        avg_cost = 0.
#         total_batch = int(num_examples/batch_size)
        # Loop
        
        y_batch = np.array(y_full)
        full = np.hstack([x_full, np.reshape(y_batch, (y_batch.shape[0], 1))])
        if epoch == 1:
            print(full.shape)
        permuted_indices = np.random.choice(full.shape[0], full.shape[0], replace=False)
        permuted = full[permuted_indices]
        x_batch = permuted[:,:-1]
        #print(np.vstack(x_batch[:1,:]))
        y_batch = np.reshape(permuted[:,-1],(permuted[:,-1].shape[0], 1))
        # Run optimization op (backprop) and cost op (to get loss value)
        _, c, batch_preds = sess.run([optimizer, cost, pred_onehot], feed_dict={x: x_batch,
                                                      y: y_batch})
        # Full-batch training: the cost of the single batch is the epoch's loss
        avg_cost = c
        acc = sum(np.array(batch_preds)==np.array(y_batch))/(num_examples)
        # Display logs per epoch step
        if epoch % display_step == 0:
            print ("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost), "acc=", acc)
    print ("Optimization Finished!")
    coord.request_stop()
    coord.join(threads)

Epoch: 0001 cost= 1.383473992 acc= [0.5295]
(10000, 29)
Epoch: 0002 cost= 1.087249756 acc= [0.5294]
Epoch: 0003 cost= 0.827981651 acc= [0.5194]
Epoch: 0004 cost= 0.794910431 acc= [0.4905]
Epoch: 0005 cost= 0.877075970 acc= [0.4866]
Epoch: 0006 cost= 0.874924004 acc= [0.4854]
Epoch: 0007 cost= 0.814799666 acc= [0.4909]
Epoch: 0008 cost= 0.752783000 acc= [0.5006]
Epoch: 0009 cost= 0.720396578 acc= [0.5091]
Epoch: 0010 cost= 0.715383708 acc= [0.5208]
Epoch: 0011 cost= 0.720889330 acc= [0.5236]
Epoch: 0012 cost= 0.725045204 acc= [0.5298]
Epoch: 0013 cost= 0.723843157 acc= [0.5312]
Epoch: 0014 cost= 0.717828214 acc= [0.5335]
Epoch: 0015 cost= 0.709551454 acc= [0.5378]
Epoch: 0016 cost= 0.701934695 acc= [0.5363]
Epoch: 0017 cost= 0.697039485 acc= [0.5344]
Epoch: 0018 cost= 0.695339739 acc= [0.5354]
Epoch: 0019 cost= 0.695858002 acc= [0.5297]
Epoch: 0020 cost= 0.696946621 acc= [0.5229]
Epoch: 0021 cost= 0.697177768 acc= [0.5184]
Epoch: 0022 cost= 0.695911229 acc= [0.519]
Epoch: 0023 cost= 0.6

Epoch: 0193 cost= 0.545914650 acc= [0.7284]
Epoch: 0194 cost= 0.545391202 acc= [0.7289]
Epoch: 0195 cost= 0.544872284 acc= [0.7283]
Epoch: 0196 cost= 0.544358492 acc= [0.73]
Epoch: 0197 cost= 0.543856561 acc= [0.7292]
Epoch: 0198 cost= 0.543368280 acc= [0.7313]
Epoch: 0199 cost= 0.542913973 acc= [0.731]
Epoch: 0200 cost= 0.542507410 acc= [0.7319]
Epoch: 0201 cost= 0.542226732 acc= [0.7321]
Epoch: 0202 cost= 0.542076826 acc= [0.7329]
Epoch: 0203 cost= 0.542250931 acc= [0.7312]
Epoch: 0204 cost= 0.542149305 acc= [0.7329]
Epoch: 0205 cost= 0.541797042 acc= [0.7313]
Epoch: 0206 cost= 0.540237546 acc= [0.7335]
Epoch: 0207 cost= 0.538941026 acc= [0.7355]
Epoch: 0208 cost= 0.538586795 acc= [0.7366]
Epoch: 0209 cost= 0.538789570 acc= [0.7358]
Epoch: 0210 cost= 0.538709104 acc= [0.7352]
Epoch: 0211 cost= 0.537672222 acc= [0.7374]
Epoch: 0212 cost= 0.536627650 acc= [0.7377]
Epoch: 0213 cost= 0.536200643 acc= [0.7381]
Epoch: 0214 cost= 0.536193013 acc= [0.7395]
Epoch: 0215 cost= 0.535989463 acc= 

Epoch: 0385 cost= 0.484762996 acc= [0.7752]
Epoch: 0386 cost= 0.484613776 acc= [0.7737]
Epoch: 0387 cost= 0.484467775 acc= [0.7757]
Epoch: 0388 cost= 0.484340221 acc= [0.7737]
Epoch: 0389 cost= 0.484219134 acc= [0.7753]
Epoch: 0390 cost= 0.484133095 acc= [0.7739]
Epoch: 0391 cost= 0.484053463 acc= [0.7753]
Epoch: 0392 cost= 0.484027624 acc= [0.7738]
Epoch: 0393 cost= 0.483984709 acc= [0.7757]
Epoch: 0394 cost= 0.483995318 acc= [0.7739]
Epoch: 0395 cost= 0.483905822 acc= [0.7765]
Epoch: 0396 cost= 0.483815730 acc= [0.7747]
Epoch: 0397 cost= 0.483515710 acc= [0.7766]
Epoch: 0398 cost= 0.483158350 acc= [0.7748]
Epoch: 0399 cost= 0.482643664 acc= [0.7773]
Epoch: 0400 cost= 0.482151121 acc= [0.7766]
Optimization Finished!
CPU times: user 14.9 s, sys: 865 ms, total: 15.8 s
Wall time: 11 s
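
The accuracy in the log above is computed on the same instances used for training. It is printed as a one-element array because Python's built-in `sum` adds up the rows of an `(n, 1)` boolean array. A scalar accuracy that could also be applied to a held-out test set might be sketched as follows. The helper name `accuracy` is hypothetical, and the probabilities and labels are made up.

```python
import numpy as np

def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of instances whose thresholded prediction matches the label."""
    predictions = (probabilities > threshold).astype(float)
    return float(np.mean(predictions == labels))

probs = np.array([[0.9], [0.2], [0.6], [0.4]])    # made-up sigmoid outputs
labels = np.array([[1.0], [0.0], [0.0], [0.0]])

print(accuracy(probs, labels))  # 3 of 4 predictions match, so 0.75
```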
