<img src="https://ucfai.org//course/sp19/neural-nets/banner.jpg">


<div class="col-12">
    <h1> Beyond the Buzzwords: Getting Started with Neural Networks  </h1>
    <hr>
</div>

<div style="line-height: 2em;">
    <p>by: 
        <strong> Alec Kerrigan</strong>
        (<a href="https://github.com/AHKerrigan">@AHKerrigan</a>)
    
        <strong> John Muchovej</strong>
        (<a href="https://github.com/ionlights">@ionlights</a>)
     on 2019-02-13</p>
</div>

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


Let's set a random seed so our results are reproducible later on.

In [None]:
seed = 155
np.random.seed(seed)
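
To see what the seed buys us, here is a small sketch (the helper name `draw_three` is made up for illustration). Reseeding with the same value replays the same "random" numbers. Keep in mind this only seeds NumPy; TensorFlow keeps its own random state, so network weights may still differ from run to run.

```python
import numpy as np

def draw_three(seed):
    """Seed NumPy's global generator, then draw three numbers from it."""
    np.random.seed(seed)
    return np.random.rand(3)

first = draw_three(155)
second = draw_three(155)   # same seed, so the same sequence is replayed
other = draw_three(156)    # different seed, different sequence

print(np.array_equal(first, second))
print(np.array_equal(first, other))
```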

Next, we're going to download our dataset and start taking a look.

In [None]:
dataset = pd.read_csv("https://raw.githubusercontent.com/AHKerrigan/Epidemic-Control/master/diabetes.csv", header=None).values

In [None]:
dataset

So here we can see we've converted our entire dataset to an array. The first eight columns are the features, which include number of pregnancies, glucose levels, blood pressure, genetic information, and age. The final column is a "0" or "1", indicating whether the subject in that sample is diabetic or not. If we want a nicer-looking table, we could do this:

In [None]:
pd.DataFrame(dataset)

Oof, that's a lot of data. Let's condense that a little:

In [None]:
pd.DataFrame(dataset).head()

We don't need those column labels (the first row) in our data when we shove it into the neural network, so let's get rid of them.

In [None]:
# Drop row 0 (the column names), then convert the remaining strings to floats
dataset = np.delete(dataset, 0, axis=0).astype(float)

In [None]:
dataset
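
As an aside, here is a sketch of why the column names landed in the array in the first place. It uses a tiny made-up CSV (the `Outcome` column name and the two data rows are illustrative, not taken from the real file). With `header=None`, pandas treats the name row as data, so every value comes back as a string. Letting pandas read the header gives numeric columns right away.

```python
import io
import pandas as pd

# A made-up, two-row stand-in for the real file (hypothetical values)
csv_text = "Pregnancies,Glucose,Outcome\n6,148,1\n1,85,0\n"

# header=None: the name row is treated as data, so everything is a string
raw = pd.read_csv(io.StringIO(csv_text), header=None).values
print(raw.shape, type(raw[1, 0]))

# Default header handling: names become column labels, values stay numeric
clean = pd.read_csv(io.StringIO(csv_text)).values
print(clean.shape, clean.dtype)

# Dropping row 0 and casting to float recovers the same numbers
fixed = raw[1:].astype(float)
print((fixed == clean).all())
```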

Cool, so now we don't have those pesky strings ruining our beautiful array. Next step, we're going to want to separate the features of the data (Pregnancies, Glucose, etc.) from the class labels (0 or 1, representing diabetic status). Also, we're going to want to separate "training" data from "test" data. Can you think of why?

Fortunately, this is Python, so we can just have a library do it for us. sklearn might not get to have much fun with us in the world of Deep Learning™, but it has an extremely valuable function for us to use.

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(dataset[:,0:8], dataset[:,8], test_size=0.25, random_state=87)

Let's parse what we just did there.

In the train_test_split function, the first parameter is the "features" (or X) of the data, and the second parameter is the "labels" (or y) of the data.

dataset[:,0:8] can be interpreted as two different ranges, separated by a comma. The first range is my "row" range. [:] means I want from the first entry to the last entry. Pretty simple. The second range [0:8] tells the function I want the columns starting with column 0, and ending at BUT NOT INCLUDING column 8. Column 8 is our class label, and thus we don't want it in our "X".

dataset[:,8] is similar. We want all of the rows, but this time you can see we don't even specify a range after the comma. We just want column 8, our class labels (y).

test_size is the fraction of the data you want held out for testing (0.25 here, so 25%).

random_state can be thought of as a random seed: the same value gives you the same split every time.
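
If the slicing still feels abstract, this small sketch runs the same two slices on a toy array (the 4x9 array is made up, standing in for the real dataset).

```python
import numpy as np

# A toy 4x9 array standing in for the dataset: 8 feature columns + 1 label column
toy = np.arange(36).reshape(4, 9)

X = toy[:, 0:8]   # all rows, columns 0 through 7 (column 8 is excluded)
y = toy[:, 8]     # all rows, column 8 only

print(X.shape)  # (4, 8)
print(y.shape)  # (4,)
print(y)        # [ 8 17 26 35]
```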
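
Here is a rough sketch of test_size and random_state in action on made-up data (12 toy samples; the variable names `X_tr`, `X_te`, and so on are just for illustration). A quarter of the rows end up in the test set, and repeating the call with the same random_state reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 12 samples with 8 features each, plus a 0/1 label per sample
X = np.arange(96).reshape(12, 8)
y = np.array([0, 1] * 6)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=87)
print(X_tr.shape, X_te.shape)   # (9, 8) (3, 8)

# Same random_state -> identical split on a second call
X_tr2, X_te2, _, _ = train_test_split(X, y, test_size=0.25, random_state=87)
print(np.array_equal(X_te, X_te2))   # True
```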

In [None]:
X_test

In [None]:
y_test

ALRIGHT, we're done preparing our data

Note: On real projects, you want to inspect your data more closely to see if there are any problems that may cause you to receive suboptimal results (did you spot any when I created those tables?).

In a later lecture, you will learn more on data curation and preprocessing.

But for now, it's time for D E E P L E A R N I N G

In [None]:
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

Alright, as you might have surmised from the slides, creating a neural network is painstaking, messy, difficult, and tiring. Are you ready?

In [None]:
my_nn = keras.Sequential()
my_nn.add(Dense(units=12, input_dim=8, activation='relu'))
my_nn.add(Dense(units=8, activation='relu'))
my_nn.add(Dense(units=1, activation='sigmoid'))
my_nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

Okay, I lied to you. Turns out, there are countless APIs out there that will let you create a full neural network model in just 5 lines of code. My personal favorite is Keras, which is what we'll use throughout the course. Let's go over each line.

1: my_nn = keras.Sequential()

This instantiates our Keras model. A Sequential model can really just be thought of as what you traditionally picture as a neural network (one layer after the other). This will probably be what you work with most.

2: my_nn.add(Dense(units=12, input_dim=8, activation='relu'))

This creates the first hidden layer. NOTE: In Keras, the "input" layer is implicitly defined. Once we've typed this line, we actually have TWO layers in our neural network: one of size 8 (remember we have 8 features?), then a hidden layer of size 12. "Dense" simply refers to the fact that the nodes are fully connected, like in our slide show.

activation='relu' means that the activation function for this layer is the "rectified linear unit". Think of this as the line "y=x", but if x is less than 0, the function just equals 0. In experiments, it turns out that this function often works better for hidden layers than a function like sigmoid, which squashes its input to between 0 and 1, but the jury really is still out.
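
For the curious, here is a minimal NumPy sketch of the two activation functions we use in this model. These are hand-written stand-ins to show the math, not the code Keras runs internally.

```python
import numpy as np

def relu(x):
    """y = x for positive inputs, 0 otherwise."""
    return np.maximum(0, x)

def sigmoid(x):
    """Squash any real number into the interval between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))      # [0. 0. 3.]
print(sigmoid(x))   # the middle value is exactly 0.5
```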

3 and 4:

These layers do the same thing as our previous layer, except this time we don't need to explicitly state the input size, since the Sequential model just assumes it is the size of the last layer you added.
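
To make the "each layer takes the previous layer's size as input" idea concrete, here is a rough NumPy sketch of the same 8 → 12 → 8 → 1 stack. The random weights and the batch of 5 samples are made up, and this is only an illustration of the shapes involved, not how Keras implements Dense. The parameter count at the end should match the total that `my_nn.summary()` reports.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [8, 12, 8, 1]   # input features, then the units of each Dense layer

# One weight matrix and one bias vector per Dense layer
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

batch = rng.normal(size=(5, 8))   # 5 made-up samples with 8 features each
out = batch
for i, (W, b) in enumerate(zip(weights, biases)):
    out = out @ W + b
    # relu on the hidden layers, sigmoid on the last one
    out = np.maximum(0, out) if i < len(weights) - 1 else 1 / (1 + np.exp(-out))
    print(out.shape)   # (5, 12), then (5, 8), then (5, 1)

n_params = sum(W.size + b.size for W, b in zip(weights, biases))
print(n_params)   # 221
```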

5: my_nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

After we've defined our model, we have to compile it. (Keras runs on top of a lower-level backend for speed, and compile is the step that sets the model up for training with the loss, optimizer, and metrics we choose.)

Our "loss" function is that "Error" function we talked about in the lecture. Binary crossentropy is a standard choice of error function for binary (0 or 1) classification.

"Optimizer adam" is a tough one. In this and the previous lecture, we talked about gradient descent as a means of training the network. Adam is a particular variant of gradient descent that adapts its step sizes as it goes, and in practice it often trains faster than traditional gradient descent. If you want to read more on it, go here:
https://arxiv.org/abs/1412.6980
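
Here is a small sketch of the two ideas from line 5, written by hand in NumPy. The `binary_crossentropy` helper and the example probabilities are made up for illustration (Keras has its own implementation), and the loop at the end is plain gradient descent on a one-parameter toy problem, not Adam.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    p = np.clip(y_prob, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
confident_right = np.array([0.9, 0.1, 0.8, 0.2])
confident_wrong = np.array([0.1, 0.9, 0.2, 0.8])

loss_right = binary_crossentropy(y_true, confident_right)
loss_wrong = binary_crossentropy(y_true, confident_wrong)
print(loss_right, loss_wrong)   # the wrong predictions get a much larger loss

# Plain gradient descent: minimize (w - 3)^2 by repeatedly stepping downhill
w, learning_rate = 0.0, 0.1
for _ in range(100):
    grad = 2 * (w - 3)          # derivative of (w - 3)^2 with respect to w
    w -= learning_rate * grad
print(round(w, 4))   # 3.0
```

Training the network is the same loop in spirit: compute the loss, compute its gradient with respect to every weight, and nudge the weights downhill.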

Okay, let's actually fit the model to the data and see what happens!

In [None]:
my_nn_fitted = my_nn.fit(X_train, y_train, epochs=150, batch_size=10)

In [None]:
my_nn.evaluate(X_test, y_test, verbose=0)

The two numbers that come back are the loss and the accuracy, in that order. Unless I got really unlucky when I ran this during the lecture, this should get somewhere between 70% and 75% accuracy.
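
As a sketch of where an accuracy number like that comes from: the sigmoid output is a probability, we call anything above 0.5 a "1", and accuracy is the fraction of predictions that match the true labels. The probabilities and labels below are hypothetical, not output from our model. The baseline at the end (always guessing the most common label) is a handy sanity check to compare any accuracy against.

```python
import numpy as np

# Hypothetical sigmoid outputs for six test samples, with their true labels
probs = np.array([0.91, 0.35, 0.62, 0.08, 0.47, 0.77])
labels = np.array([1, 0, 0, 0, 1, 1])

predictions = (probs > 0.5).astype(int)    # threshold at 0.5
accuracy = np.mean(predictions == labels)  # fraction predicted correctly

# Baseline: always guess the most common label
majority = np.bincount(labels).argmax()
baseline = np.mean(labels == majority)

print(predictions)
print(accuracy, baseline)
```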

This isn't amazing, but it isn't bad for such a simple model. Neural networks are great and fun, but they aren't magic. The data needs to be clean, hyperparameters need to be tuned, and sometimes you might just want to use a completely different model altogether. The best way to get a feel for what types of problems neural networks can best solve is to put some together yourself and see what results you get. In upcoming lectures, you'll learn about more sophisticated types of neural networks, as well as how to tune them to achieve better results.