<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Neural Networks

## *Data Science Unit 4 Sprint 2 Assignment 1*

## Define the Following:
You can add images, diagrams, or whatever you need to ensure that you understand the concepts below.

### Input Layer: 
The input layer, also known as the visible layer, receives input from our dataset. It's aptly named, as it's the only layer that interacts with our data directly. It's worth
noting that node maps are typically drawn with one input node for each input/feature of the dataset that will be passed to the network.

### Hidden Layer: 
After the input layer come the hidden layer(s). We don't interact with these layers directly; they can only be reached through the input layer, and there can be more than one of them. They take data from the
input layer (or from other hidden layers) and apply their transformations. The hidden layers are what really give the neural network its power: the more neurons in the hidden layers, the more complex the model can be.

### Output Layer:
The final part of the network is the output layer. This layer produces the network's results after the transformations applied by the rest of the network. Regression problems usually have a single output node, since we want an unbounded continuous value. Binary classification problems also usually have a single output node, with 1 being the target class and 0 the negative class. With multi-class classification problems you're predicting one of several classes, which typically leads to one output node per class.

### Neuron:
A neuron is a type of cell found in the human brain. In machine learning it corresponds to a 'node', which tries to replicate the job of a neuron. Neural networks (loosely modeled on the brain) are made of nodes (neurons) that give the network its function and structure.

### Weight:
Weights connect each neuron in one layer to every neuron in the next layer (the lines connecting nodes in the visual below). A weight determines the strength of the connection between two neurons: if we increase the input, how much influence does that have on the output? A weight near zero means that changing this input will barely change the output. Some training methods will push such weights to zero in order to simplify the network.
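
As a rough illustration of how a weight scales an input's influence, the sketch below nudges each input in turn and measures how far the weighted sum moves. The values of `x` and `w` are made up for this example and are not from the assignment.

```python
import numpy as np

# Hypothetical values chosen only for illustration
x = np.array([1.0, 1.0])      # two inputs
w = np.array([2.0, 0.001])    # one strong weight, one near-zero weight

baseline = np.dot(x, w)

# Increase each input by 1 in turn and see how far the weighted sum moves
change_from_x0 = np.dot(x + np.array([1.0, 0.0]), w) - baseline
change_from_x1 = np.dot(x + np.array([0.0, 1.0]), w) - baseline

print(change_from_x0)  # moves by roughly the first weight
print(change_from_x1)  # barely moves at all
```

The input attached to the near-zero weight could change a great deal without the output noticing, which is why such connections are candidates for removal.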

### Activation Function:
An activation function transforms a neuron's weighted sum into its output. The 'sigmoid' activation function we've been using, for example, "squishifies" that value to be between 0 and 1 so it can be read as a probability. Different activation functions can be used in different layers, but for simplicity the same activation function is usually used throughout. Very rarely do you use different activation functions within the same layer, because that would complicate backpropagation.
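
To make the "squishing" concrete, here is a small sketch comparing sigmoid with two other common activation functions. ReLU and tanh are included only for comparison and are not part of the assignment; the example weighted sums in `z` are made up.

```python
import numpy as np

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1 / (1 + np.exp(-x))

def relu(x):
    """Pass positive values through unchanged and clip negatives to 0."""
    return np.maximum(0, x)

z = np.array([-5.0, 0.0, 5.0])  # example weighted sums (made up)

sig_out = sigmoid(z)    # values strictly between 0 and 1
tanh_out = np.tanh(z)   # values between -1 and 1
relu_out = relu(z)      # negatives become 0, positives unchanged

print(sig_out)
print(tanh_out)
print(relu_out)
```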

### Node Map:
A visual diagram, hopefully color-coded, outlining the structure of a neural network: input (yellow) -> hidden (green) -> output (orange). Think of it like a flowchart showing all of the paths from inputs to outputs. Examples below:
![Neural Network Zoo](http://www.asimovinstitute.org/wp-content/uploads/2016/09/neuralnetworks.png)

### Perceptron:
A very basic neural network in which input nodes (two in the pictured example) feed directly into a single output node, with no hidden layer. The first example network in the picture above provides a visual.

## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

#### Your Answer Here

Information is fed first into the input layer, then through the hidden layer(s) (if any exist), and finally produces a result in the output layer. At each layer, every neuron multiplies its inputs by their weights, sums the results, adds a bias, and passes that total through an activation function; the output then becomes an input to the next layer. One full pass of the training data through the network is called an epoch. After each pass, the model compares its outputs with the known targets and updates its weights and biases to reduce the error, repeating the process to improve accuracy.
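
The forward flow described above can be sketched in a few lines of numpy. This is a minimal illustration, not a trained model: the layer sizes, the random weights, and the names `W_hidden`, `b_hidden`, `W_out`, and `b_out` are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical network: 3 inputs -> 4 hidden neurons -> 1 output
rng = np.random.default_rng(0)            # seed chosen arbitrarily
x = np.array([[0.5, 0.1, 0.9]])           # one observation, shape (1, 3)

W_hidden = rng.normal(size=(3, 4))        # weights: input -> hidden
b_hidden = np.zeros((1, 4))               # one bias per hidden neuron
W_out = rng.normal(size=(4, 1))           # weights: hidden -> output
b_out = np.zeros((1, 1))                  # bias for the output neuron

# Hidden layer: weighted sum, plus bias, then activation
hidden = sigmoid(x @ W_hidden + b_hidden)  # shape (1, 4)

# Output layer repeats the same three steps on the hidden activations
output = sigmoid(hidden @ W_out + b_out)   # shape (1, 1)

print(output)
```

Training would wrap this forward pass in a loop that measures the error and adjusts the weights and biases after each pass.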

## Write your own perceptron code that can correctly classify (99.0% accuracy) a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [7]:
# Establish training data
import pandas as pd
data = { 'x1': [0,1,0,1],
         'x2': [0,0,1,1],
         'y':  [1,1,1,0]
       }

# Instantiate df with data 
df = pd.DataFrame.from_dict(data).astype('int')

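# Target outputs (the y column) as a column vector
correct_outputs = [[1], [1], [1], [0]]
 
# Convert the DataFrame to a numpy array for the Perceptron.
# Note: this keeps all three columns (x1, x2, y), so the target y is also fed in as the third input.
inputs = df.to_numpy()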
inputs

array([[0, 0, 1],
       [1, 0, 1],
       [0, 1, 1],
       [1, 1, 0]])

In [8]:
# Sigmoid activation function and its derivative for updating weights
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivate(x):
    sx = sigmoid(x)
    return sx * (1 - sx)

In [9]:
import numpy as np
# Assign random weights to be used in model (3 inputs)
weights = 2 * np.random.random((3,1)) - 1
weights

array([[-0.55803332],
       [-0.00519374],
       [ 0.02584065]])

In [10]:
# Borrow some code from class
# Update our weights 10,000 times - (fingers crossed that this process reduces error)
for iteration in range(10000):
    
    # Weighted sum of inputs and weights
    weighted_sum = np.dot(inputs, weights)
    
    # Activate!
    activated_output = sigmoid(weighted_sum)
    
    # Calculate error
    error = correct_outputs - activated_output
    
    # Scale the error by the sigmoid's slope (a learning rate could be added here for better performance)
    adjustments = error * sigmoid_derivate(weighted_sum)
    
    # Update the Weights
    weights += np.dot(inputs.T, adjustments)
    
print("Weights after training")
print(weights)

print("Output after training")
print(activated_output)

Weights after training
[[-2.41140995]
 [-2.40913951]
 [ 7.4903618 ]]
Output after training
[[0.99944183]
 [0.99381178]
 [0.99382573]
 [0.00799828]]
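
One caveat about the run above: `df.to_numpy()` carries the `y` column along as the third input, so the network sees the answer it is trying to predict. A hedged alternative is sketched below, where the third column is a constant 1 so that its weight acts as the bias. The names `nand_inputs`, `nand_targets`, and `nand_weights`, and the seed, are assumptions made for this sketch.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)

# x1, x2, plus a constant 1 column so the third weight acts as the bias
nand_inputs = np.array([[0, 0, 1],
                        [1, 0, 1],
                        [0, 1, 1],
                        [1, 1, 1]])
nand_targets = np.array([[1], [1], [1], [0]])

rng = np.random.default_rng(42)             # seed is an arbitrary choice
nand_weights = 2 * rng.random((3, 1)) - 1   # random weights in [-1, 1)

for _ in range(10000):
    weighted_sum = nand_inputs @ nand_weights
    activated = sigmoid(weighted_sum)
    error = nand_targets - activated
    # Same update rule as the loop above, without the leaked target column
    nand_weights += nand_inputs.T @ (error * sigmoid_derivative(weighted_sum))

# Threshold the sigmoid output at 0.5 to get class labels
predictions = (sigmoid(nand_inputs @ nand_weights) >= 0.5).astype(int)
print(predictions.ravel())
```

Because NAND is linearly separable, a single perceptron with a bias term is enough to learn it from `x1` and `x2` alone.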


## Implement your own Perceptron Class and use it to classify a binary dataset: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 

You may need to search for others' implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [11]:
diabetes = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')
diabetes.head()

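|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI  | DiabetesPedigreeFunction | Age | Outcome |
|---|-------------|---------|---------------|---------------|---------|------|--------------------------|-----|---------|
| 0 | 6           | 148     | 72            | 35            | 0       | 33.6 | 0.627                    | 50  | 1       |
| 1 | 1           | 85      | 66            | 29            | 0       | 26.6 | 0.351                    | 31  | 0       |
| 2 | 8           | 183     | 64            | 0             | 0       | 23.3 | 0.672                    | 32  | 1       |
| 3 | 1           | 89      | 66            | 23            | 94      | 28.1 | 0.167                    | 21  | 0       |
| 4 | 0           | 137     | 40            | 35            | 168     | 43.1 | 2.288                    | 33  | 1       |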


Although neural networks can handle non-normalized data, scaling or normalizing your data will improve your neural network's learning speed. Try to apply the sklearn `MinMaxScaler` or `Normalizer` to your diabetes dataset. 

In [16]:
from sklearn.preprocessing import MinMaxScaler, Normalizer

feats = list(diabetes)[:-1]

# Instantiate features
X = diabetes[feats]

# Instantiate scaler
scaler = MinMaxScaler()

# Fit scaler on features and transform
X_scaled = scaler.fit_transform(X)

# Put features in np array for Perceptron
inputs = np.array(X_scaled)

# Turn target into numpy array for model 
y = np.array([[r] for r in diabetes['Outcome']])

print(X.shape, y.shape)

(768, 8) (768, 1)
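
To make the scaling step concrete, the sketch below reproduces by hand the per-column formula that `MinMaxScaler` applies with its default range of 0 to 1. The small matrix `X_demo` is made up for illustration and is not the diabetes data.

```python
import numpy as np

# Made-up feature matrix: 3 rows, 2 columns on very different scales
X_demo = np.array([[1.0, 100.0],
                   [5.0, 300.0],
                   [9.0, 500.0]])

col_min = X_demo.min(axis=0)   # smallest value in each column
col_max = X_demo.max(axis=0)   # largest value in each column

# Each column is shifted and stretched so it spans exactly 0 to 1
X_demo_scaled = (X_demo - col_min) / (col_max - col_min)

print(X_demo_scaled)
```

After scaling, both columns cover the same range, so neither dominates the weighted sum simply because of its units.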


In [19]:
##### Update this Class #####

class Perceptron:
    
    def __init__(self, rate = 0.1, niter = 10):
        self.rate = rate
        self.niter = niter
    
    def __sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
    
    def __sigmoid_derivative(self, x):
        sx = sigmoid(x)
        return sx * (1-sx)

    def fit(self, X, y):
        """Fit training data
        X : Training vectors, X.shape : [#samples, #features]
        y : Target values, y.shape : [#samples]
        """
        
        # Initialize weights to zero: one per feature, plus one for the bias
        self.weights = np.zeros(1 + X.shape[1])
        
        for i in range(self.niter):
            # Weighted sum of inputs / weights
            self.weighted_sum = np.dot(inputs, self.weights)
            # Activate!
            self.activated_output = self.sigmoid(weighted_sum)
            # Calc error
            self.error = y - self.activated_output
            # Update the Weights
            self.adjustments = error * self.sigmoid_derivate(self.weighted_sum)
            weights += np.dot(X_scaled.T, self.adjustments)
            
        return self
    
    def predict(self, X):
        """Return class label after unit step"""
        return np.where((np.dot(X, weights[1:]) + weights[0]) >= 0.0, 1, -1)

In [20]:
p = Perceptron()
p.fit(X_scaled, y)
print(p.predict(X_scaled)[:10])
print(y[:10])

ValueError: shapes (768,8) and (9,) not aligned: 8 (dim 1) != 9 (dim 0)

---
I've rewritten this code so many times and for the life of me cannot get this class to work. For the sake of time, and my sanity, I'm going to move on to the next assignment, but I'm extremely frustrated that I cannot solve this problem.

---
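
The `ValueError` above comes from a shape mismatch: `self.weights` has 9 entries (8 features plus a bias), while the matrix passed to `np.dot` has only 8 columns. One possible way to resolve it, sketched below, is to prepend a column of ones to `X` so the bias weight has something to multiply. This is a hedged sketch rather than the intended solution: the class name `SigmoidPerceptron`, the helpers `_add_bias` and `predict_proba`, the averaging of the update over the sample count, the 0/1 labels in `predict`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

class SigmoidPerceptron:
    """Single-layer perceptron with a sigmoid activation (illustrative sketch)."""

    def __init__(self, rate=0.1, niter=1000):
        self.rate = rate      # learning rate
        self.niter = niter    # number of passes over the training data

    @staticmethod
    def _sigmoid(x):
        return 1 / (1 + np.exp(-x))

    @staticmethod
    def _add_bias(X):
        # Prepend a column of ones so weights[0] plays the role of the bias
        return np.hstack([np.ones((X.shape[0], 1)), X])

    def fit(self, X, y):
        Xb = self._add_bias(np.asarray(X, dtype=float))
        y = np.asarray(y, dtype=float).reshape(-1, 1)   # force a column vector
        self.weights = np.zeros((Xb.shape[1], 1))       # bias + one per feature

        for _ in range(self.niter):
            activated = self._sigmoid(Xb @ self.weights)
            error = y - activated
            # The sigmoid's derivative is s * (1 - s), where s is its output
            adjustments = error * activated * (1 - activated)
            self.weights += self.rate * (Xb.T @ adjustments) / len(Xb)
        return self

    def predict_proba(self, X):
        Xb = self._add_bias(np.asarray(X, dtype=float))
        return self._sigmoid(Xb @ self.weights)

    def predict(self, X):
        # Label 1 when the sigmoid output reaches 0.5, otherwise 0
        return (self.predict_proba(X) >= 0.5).astype(int)


# Usage on synthetic, linearly separable data (not the diabetes dataset)
rng = np.random.default_rng(0)
X_toy = rng.random((200, 2))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 1).astype(int)

model = SigmoidPerceptron(rate=0.5, niter=5000).fit(X_toy, y_toy)
accuracy = (model.predict(X_toy).ravel() == y_toy).mean()
print(accuracy)
```

The same class should accept the scaled diabetes features, for example `SigmoidPerceptron().fit(X_scaled, y)`, since `fit` reshapes `y` into a column vector itself. How well a single perceptron separates that dataset is a separate question this sketch does not answer.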

## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try and implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?