
<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Perceptron on AND Gates](#Q3=2)
3. [Multilayer Perceptron](#Q3)
4. [Keras MLP](#Q4)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron: A single node in a neural network--could be in an input, hidden, or output layer.**

- **Input Layer: The first layer of a neural network. It receives the raw input features directly and passes them on to the next layer.**

- **Hidden Layer: Any layer between the input layer and the output layer. Hidden layers neither receive the raw inputs directly nor produce the network's final output; they are "hidden" in the sense that they sit in the middle of the network rather than at its front or its end.**

- **Output Layer: The last layer of the network, responsible for producing an output suited to the problem. For example, an output layer for the MNIST dataset might produce a point prediction (the digit from 0 to 9 that the image most likely shows) or a vector of ten probabilities, one for each digit.**

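- **Activation: The output of a neuron, obtained by passing the weighted sum of its inputs plus a bias through an activation function such as sigmoid or ReLU. Sigmoid activations lie between 0 and 1 and are sometimes read as probabilities, but activations in general (ReLU, for example) are not probabilities.**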

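- **Backpropagation: An algorithm for computing the gradient of the prediction error with respect to every weight in a neural network. After a forward pass (for example, over one batch of data), the error at the output is propagated backward through the network (hence "backpropagation") by repeated application of the chain rule, implemented with matrix multiplications; an optimizer such as gradient descent then uses those gradients to update the weights.**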
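The neuron, layer, and activation definitions can be made concrete with a few lines of NumPy. This is a minimal sketch, assuming sigmoid activations and arbitrary random weights; `neuron` and `forward` are illustrative helper names, not part of the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # One neuron: weighted sum of its inputs plus a bias, passed through an activation
    return sigmoid(np.dot(x, w) + b)

def forward(x, W_hidden, b_hidden, W_out, b_out):
    # The input layer is just x itself (raw features, no computation)
    hidden = sigmoid(x @ W_hidden + b_hidden)  # hidden-layer activations
    return sigmoid(hidden @ W_out + b_out)     # output-layer activation

x = np.array([0.5, -1.0, 2.0])
print(neuron(x, np.zeros(3), 0.0))  # 0.5, because sigmoid(0) = 0.5

rng = np.random.default_rng(0)
out = forward(x, rng.normal(size=(3, 4)), np.zeros(4),
              rng.normal(size=(4, 1)), np.zeros(1))
print(out.shape)  # (1,): a single output neuron
```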


## 2. Perceptron on AND Gates <a id="Q3=2"></a>

Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

| x1 | x2 | x3 | y |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 |
| 0 | 0 | 1 | 0 |

In [0]:
import numpy as np
class Perceptron(object):
  def __init__(self, rate = 0.01, niter = 10):
    self.rate = rate
    self.niter = niter

  def fit(self, X, y):
    """Fit training data
    X : Training vectors, X.shape : [#samples, #features]
    y : Target values, y.shape : [#samples]
    """

    # Weights: index 0 is the bias, indices 1.. match the features
    self.weight = np.zeros(1 + X.shape[1])

    # Number of misclassifications (rows that triggered an update) per epoch
    self.errors = []

    for i in range(self.niter):
      err = 0
      for xi, target in zip(X, y):
        delta_w = self.rate * (target - self.predict(xi))
        self.weight[1:] += delta_w * xi
        self.weight[0] += delta_w
        err += int(delta_w != 0.0)
      self.errors.append(err)
    return self

  def net_input(self, X):
    """Calculate net input"""
    return np.dot(X, self.weight[1:]) + self.weight[0]

  def predict(self, X):
    """Return class label after unit step"""
    return np.where(self.net_input(X) >= 0, 1, 0)

In [2]:
percept = Perceptron(niter=5000)
X = np.array([[1, 1, 1],
              [1, 0, 1],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
# 1-D targets, so each `target` in fit() is a scalar rather than a length-1 array
y = np.array([1, 0, 0, 0], dtype=float)
percept.fit(X, y)
print(percept.predict(X))

[1 0 0 0]


In [3]:
percept.predict([
    [1, 1, 0], # Should return 1
    [1, 0, 1], # Should return 0
    [1, 1, 1], # Should return 1
    [0, 0, 0], # Should return 0
])

array([1, 0, 1, 0])

Cool.
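The update inside `fit` is the classic perceptron learning rule. Writing $\eta$ for `rate`, $w_0$ for `weight[0]` (the bias) and $\mathbf{w}$ for `weight[1:]`, each training row $(\mathbf{x}, y)$ is handled as:

$$
\hat{y} = \begin{cases} 1 & \text{if } w_0 + \mathbf{w}^\top \mathbf{x} \ge 0 \\ 0 & \text{otherwise} \end{cases}
\qquad
\mathbf{w} \leftarrow \mathbf{w} + \eta\,(y - \hat{y})\,\mathbf{x}
\qquad
w_0 \leftarrow w_0 + \eta\,(y - \hat{y})
$$

Because $y - \hat{y} = 0$ on correctly classified rows, the weights only move on mistakes, which is why `errors` counts rows with a nonzero update. AND is linearly separable, so the rule is guaranteed to converge on this table.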

## 3. Multilayer Perceptron <a id="Q3"></a>

Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights.
Your network must have one hidden layer.
You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
Train your model on the Heart Disease dataset from UCI:



In [0]:
heart_url = 'https://raw.githubusercontent.com/will-cotton4/DS-Unit-4-Sprint-2-Neural-Networks/master/processed.cleveland.csv'

import numpy as np
import pandas as pd

cols = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs',
        'restecg', 'thalach', 'exang', 'oldpeak', 'slope',
        'ca', 'thal', 'num']
heart = pd.read_csv(heart_url, names=cols)

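def clean_heart(df):
    df = df.copy()

    # Binary target: num == 0 means no disease, 1-4 mean disease is present
    df['disease'] = (df['num'] != 0).astype(int)

    # '?' marks missing values; forward-fill them from the previous row
    df = df.replace({'?': np.nan})
    df = df.ffill()
    # Columns that contained '?' were read as strings; convert everything to numbers
    df = df.apply(pd.to_numeric)
    return df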
heart = clean_heart(heart)
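The cleaning steps can be checked on a tiny hypothetical frame that mimics the Cleveland file, where a `?` marks a missing value. The sketch applies the same replace-then-forward-fill idea and then converts the remaining strings to numbers with `pd.to_numeric`, since `read_csv` reads any column containing `?` as strings. The `toy` data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for the real data ('ca' had a '?' in the raw file)
toy = pd.DataFrame({'ca': ['0.0', '?', '2.0'], 'num': [0, 3, 1]})

toy['disease'] = (toy['num'] != 0).astype(int)  # any of 1-4 becomes 1
toy = toy.replace({'?': np.nan}).ffill()        # '?' -> NaN -> previous row's value
toy = toy.apply(pd.to_numeric)                  # '0.0' strings -> floats
print(toy)
```

Forward filling copies the previous patient's value into the gap, which is a simple imputation choice; dropping the few incomplete rows or filling with a column median are common alternatives.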

In [0]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

feature_cols = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs',
                'restecg', 'thalach', 'exang', 'oldpeak', 'slope',
                'ca', 'thal']

X = heart[feature_cols].values
y = heart['disease'].values.reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y)

# Fit the scaler on the training split only, so no test-set statistics leak into training
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

In [0]:
class NeuralNetwork:
    def __init__(self, inputs=13, hiddenNodes=13, outputNodes=1):
        # Set up architecture
        self.inputs = inputs
        self.hiddenNodes = hiddenNodes
        self.outputNodes = outputNodes

        # Initial weights (no bias terms): weights1 is inputs x hiddenNodes (standard normal),
        # weights2 is hiddenNodes x outputNodes (uniform on [0, 1))
        self.weights1 = np.random.randn(self.inputs, self.hiddenNodes)
        self.weights2 = np.random.rand(self.hiddenNodes, self.outputNodes)

    def sigmoid(self, s):
        return 1 / (1+np.exp(-s))

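    def sigmoidPrime(self, s):
        # Derivative of the sigmoid written in terms of its output:
        # if s = sigmoid(z), then d sigmoid / dz = s * (1 - s)
        return s * (1 - s)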

    def feed_forward(self, X):
        """
        Calculate the NN inference using feed forward.
        """

        # Weighted sum of the inputs into the hidden layer
        self.hidden_sum = np.dot(X, self.weights1)

        # Activations of the hidden layer
        self.activated_hidden = self.sigmoid(self.hidden_sum)

        # Weighted sum from the hidden layer into the output layer
        self.output_sum = np.dot(self.activated_hidden, self.weights2)

        # Final activation of the output
        self.activated_output = self.sigmoid(self.output_sum)

        return self.activated_output

    def backward(self, X, y, o):
        """
        Backward propagate through the network
        """
        self.o_error = y - o  # error in output
        self.o_delta = self.o_error * self.sigmoidPrime(o)  # scale the error by the sigmoid derivative at the output

        self.z2_error = self.o_delta.dot(self.weights2.T)  # error propagated back to the hidden layer
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.activated_hidden)

        # Weight updates (step size 1; there is no separate learning rate)
        self.weights1 += X.T.dot(self.z2_delta)  # input => hidden weights
        self.weights2 += self.activated_hidden.T.dot(self.o_delta)  # hidden => output weights

    def train(self, X, y):
        o = self.feed_forward(X)
        self.backward(X, y, o)
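As a sanity check on `backward`, the sketch below re-derives the same updates as plain functions and compares them with finite-difference gradients. It assumes the loss being minimized is half the sum of squared errors, $L = \tfrac12 \sum (y - o)^2$; under that assumption the in-place updates in `backward` are exactly a gradient-descent step of size 1, $W \leftarrow W - \partial L / \partial W$. The derivation uses $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, which is why `sigmoidPrime` is applied to activations rather than to weighted sums. The helper names (`loss`, `backprop_grads`, `numerical_grad`) and the random toy data are illustrative, not part of the original class.

```python
import numpy as np

def sigmoid(s):
    return 1 / (1 + np.exp(-s))

def loss(W1, W2, X, y):
    # Assumed loss: half the sum of squared errors, no bias terms (like NeuralNetwork)
    o = sigmoid(sigmoid(X @ W1) @ W2)
    return 0.5 * np.sum((y - o) ** 2)

def backprop_grads(W1, W2, X, y):
    # Same algebra as NeuralNetwork.backward, but returns dL/dW instead of
    # applying the update (backward adds the negative of these)
    h = sigmoid(X @ W1)
    o = sigmoid(h @ W2)
    o_delta = (y - o) * o * (1 - o)
    h_delta = o_delta @ W2.T * h * (1 - h)
    return -(X.T @ h_delta), -(h.T @ o_delta)

def numerical_grad(f, W, eps=1e-6):
    # Central finite differences, perturbing one weight of W (in place) at a time
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        old = W[idx]
        W[idx] = old + eps
        plus = f()
        W[idx] = old - eps
        minus = f()
        W[idx] = old
        g[idx] = (plus - minus) / (2 * eps)
    return g

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
y = rng.integers(0, 2, size=(8, 1)).astype(float)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(3, 1))

gW1, gW2 = backprop_grads(W1, W2, X, y)
nW1 = numerical_grad(lambda: loss(W1, W2, X, y), W1)
nW2 = numerical_grad(lambda: loss(W1, W2, X, y), W2)
print(np.max(np.abs(gW1 - nW1)), np.max(np.abs(gW2 - nW2)))
```

Both printed differences should be tiny compared with the gradients themselves; a sign or transpose mistake in the backward pass would show up as a large mismatch.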

In [7]:
from sklearn.metrics import accuracy_score as a_s
nn = NeuralNetwork()

for i in range(1000):
    if (i+1 in [1,2,3,4,5]) or ((i+1) % 200 == 0):
        print('+' + '---' * 3 + f'EPOCH {i+1}' + '---'*3 + '+')
        print("Accuracy: \n", str(a_s(y_train, np.round(nn.feed_forward(X_train)))))
    nn.train(X_train, y_train)

+---------EPOCH 1---------+
Accuracy: 
 0.4581497797356828
+---------EPOCH 2---------+
Accuracy: 
 0.5418502202643172
+---------EPOCH 3---------+
Accuracy: 
 0.5418502202643172
+---------EPOCH 4---------+
Accuracy: 
 0.5418502202643172
+---------EPOCH 5---------+
Accuracy: 
 0.5418502202643172
+---------EPOCH 200---------+
Accuracy: 
 0.9647577092511013
+---------EPOCH 400---------+
Accuracy: 
 0.9647577092511013
+---------EPOCH 600---------+
Accuracy: 
 0.9647577092511013
+---------EPOCH 800---------+
Accuracy: 
 0.9647577092511013
+---------EPOCH 1000---------+
Accuracy: 
 0.9691629955947136


In [8]:
print("Test Accuracy: \n", str(a_s(y_test, np.round(nn.feed_forward(X_test)))))

Test Accuracy: 
 0.8157894736842105


## 4. Keras MLP <a id="Q4"></a>
 
 - Implement a Multilayer Perceptron architecture of your choosing using the Keras library.
 - Train your model and report its baseline accuracy. 


In [9]:
import tensorflow as tf
print(tf.__version__)


In [0]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(30, input_dim=13, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(25, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=150, verbose=0)

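# evaluate() returns [loss, accuracy] because metrics=['accuracy']
scores = model.evaluate(X_train, y_train)
print("train [loss, accuracy]:", scores)
scores = model.evaluate(X_test, y_test)
print("test [loss, accuracy]:", scores)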
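To see how large this baseline model is, its trainable parameters can be counted by hand: each `Dense` layer has one weight per (input, unit) pair plus one bias per unit, and `Dropout` layers add none. A minimal sketch; the `dense_params` helper is hypothetical, and `model.summary()` should report the same total.

```python
def dense_params(n_in, n_units):
    # Weight matrix (n_in x n_units) plus one bias per unit
    return n_in * n_units + n_units

# (inputs, units) for the three Dense layers above; Dropout has no weights
layers = [(13, 30), (30, 25), (25, 1)]
counts = [dense_params(n_in, n_units) for n_in, n_units in layers]
print(counts, sum(counts))  # [420, 775, 26] 1221
```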

 - Then hyperparameter tune at least two parameters and report your model's accuracy.
 - Use an appropriate loss function for a binary classification task
 - Use an appropriate activation function on the final layer of your network.
 - Train your model using verbose output for ease of grading.
 - Use GridSearchCV to hyperparameter tune your model. (for at least two hyperparameters)
 - When hyperparameter tuning, show your work by adding code cells for each new experiment.
 - Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
 - You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.

In [0]:
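from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import SGD, Adam

def create_model(optimizer='adam', activation='relu', init_mode='uniform',
                 dropout_rate=0.2, neurons=15, lr=0.01, momentum=0):
  # Build the optimizer object here so that lr and momentum actually take effect
  # (passing a plain string such as 'adam' uses the default learning rate)
  if optimizer == 'SGD':
    opt = SGD(learning_rate=lr, momentum=momentum)
  else:
    opt = Adam(learning_rate=lr)

  model = Sequential()
  model.add(Dense(neurons, input_dim=13, activation=activation,
                  kernel_initializer=init_mode))
  model.add(Dense(25, activation='relu', kernel_initializer=init_mode))
  model.add(Dropout(dropout_rate))
  model.add(Dense(1, activation='sigmoid'))

  model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
  return model

# Take the wrapper from tf.keras so it matches the models above (this wrapper was
# removed in recent TensorFlow releases; the scikeras package replaces it there)
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

model = KerasClassifier(build_fn=create_model, verbose=1)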

param_grid = {"batch_size":[10],
#               "epochs": [20],
#               "optimizer": ['adam', 'SGD'],
              "lr": [0.001, 0.1],
#               "momentum": [0, 0.5],
              "activation": ['relu', 'sigmoid'],
              "init_mode":['uniform', 'glorot_normal'],
              "dropout_rate": [0, 0.2, 0.6],
              "neurons": [30]
}

model.check_params(param_grid)

In [16]:
from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X_train, y_train, epochs=100)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

```
Best: 0.823789 using {'activation': 'sigmoid', 'batch_size': 10, 'dropout_rate': 0, 'init_mode': 'uniform', 'lr': 0.001, 'neurons': 30}
```



In [0]:
model = KerasClassifier(build_fn=create_model, verbose=1)

param_grid = {"batch_size":[10, 30, 50],
#               "epochs": [20],
              "optimizer": ['adam', 'SGD'],
              "lr": [0.001, 0.1],
              "momentum": [0, 0.5],
              "activation": ['relu', 'sigmoid'],
              "init_mode":['uniform', 'glorot_normal'],
              "dropout_rate": [0, 0.2, 0.6],
              "neurons": [15, 25, 30]
}

model.check_params(param_grid)

In [0]:
from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X_train, y_train, epochs=100)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
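Before running the second grid, it helps to count how many models it will train. `sklearn.model_selection.ParameterGrid` enumerates the same combinations that `GridSearchCV` tries; the fold count below is an assumption based on the scikit-learn default of 5-fold cross-validation (releases before 0.22 defaulted to 3).

```python
from sklearn.model_selection import ParameterGrid

# Same grid as the cell above
param_grid = {"batch_size": [10, 30, 50],
              "optimizer": ['adam', 'SGD'],
              "lr": [0.001, 0.1],
              "momentum": [0, 0.5],
              "activation": ['relu', 'sigmoid'],
              "init_mode": ['uniform', 'glorot_normal'],
              "dropout_rate": [0, 0.2, 0.6],
              "neurons": [15, 25, 30]}

n_candidates = len(ParameterGrid(param_grid))
n_folds = 5  # assumption: GridSearchCV's default cv in scikit-learn >= 0.22
print(n_candidates, n_candidates * n_folds)  # 864 4320
```

With `epochs=100`, that is 432,000 training epochs before the final refit. If that is too slow, `RandomizedSearchCV` with a fixed `n_iter` samples a subset of the same grid instead.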

