# About

Notes following [Sentdex's Neural Networks from Scratch in Python video series](https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3) in an attempt to properly learn/reinforce the basics of ML and neural net structure.

# Conceptual Intro

## A Singular Neuron (Node)
[Video](https://www.youtube.com/watch?v=Wo5dMEP_BbI)

This is a simple demonstration of a neuron.

### Context
A neural network is made up of many layers, each of which is made up of neurons.

### Neuron Components
- 3 Inputs (From previous layer of neurons)
- 3 Weights (From synapses connecting to the previous layer; each one scales its input)
- Bias (A single value added to the weighted sum of the inputs)

In [8]:
inputs = [4.4, 3.2, 2.8]
weights = [5.8, 2.7, 3.1]
bias = 2
output = inputs[0]*weights[0] + inputs[1]*weights[1] + inputs[2]*weights[2] + bias
print(output)

44.839999999999996


## A Layer of Neurons
[Video](https://www.youtube.com/watch?v=lGLto9Xd7bU)

This is a rudimentary demonstration of an output layer. 

### Components
- 3 neurons, each of which has 4 inputs and 1 output.

### More Info
- The inputs are the same for all 3 nodes, as each node would theoretically take inputs from all nodes of the previous layer.
- Since this is a layer of 3 neurons, the output consists of 3 numbers.

In [58]:
inputs = [4.4, 3.2, -2.8, 5.1]

node1_weights = [5.8, 2.7, 3.1, 4.2]
node1_bias = 2

node2_weights = [-0.3, 2.3, -3.0, 4.2]
node2_bias = 8

node3_weights = [0.1, 6.1, 4.8, -3.2]
node3_bias = 3

layer_outputs = [
    inputs[0]*node1_weights[0] + inputs[1]*node1_weights[1] + inputs[2]*node1_weights[2] + inputs[3]*node1_weights[3] + node1_bias,
    inputs[0]*node2_weights[0] + inputs[1]*node2_weights[1] + inputs[2]*node2_weights[2] + inputs[3]*node2_weights[3] + node2_bias,
    inputs[0]*node3_weights[0] + inputs[1]*node3_weights[1] + inputs[2]*node3_weights[2] + inputs[3]*node3_weights[3] + node3_bias
]
print(layer_outputs)

[48.89999999999999, 43.86, -6.799999999999999]


### Code Example
Reimplement the layer code above by grouping weights/biases into arrays and calculating output by iterating through them:

In [8]:
inputs = [4.4, 3.2, -2.8, 5.1]            # same for all nodes

layer_weights = [[5.8, 2.7, 3.1, 4.2],    # node 1
                 [-0.3, 2.3, -3.0, 4.2],  # node 2
                 [0.1, 6.1, 4.8, -3.2]]   # node 3

biases = [2, 8, 3]

layer_outputs = []

# zip pairs each neuron's list of weights with its bias, so we can loop over both together
for neuron_weights, neuron_bias in zip(layer_weights, biases):
    neuron_output = 0
    for neuron_input, neuron_weight in zip(inputs, neuron_weights):
        neuron_output += neuron_input*neuron_weight
    neuron_output += neuron_bias
    layer_outputs.append(neuron_output)
    
print(layer_outputs)

[48.89999999999999, 43.86, -6.799999999999999]


# Math Intro
[Video](https://www.youtube.com/watch?v=TEWy9vZcxW4)

### DEFINITION: Shape 
The size of the array along each dimension, listed from the outermost dimension inward.

| Array                          | Shape    | Type             |
|:-------------------------------|:---------|:-----------------|
| `[1, 2, 3, 4]`                 | `(4,)`   | 1D Array, Vector |
| `[[1, 2, 3, 4], [2, 5, 3, 1]]` | `(2, 4)` | 2D Array, Matrix |
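
To check a shape rather than count brackets by hand, NumPy exposes it as the `.shape` attribute. A minimal sketch using the two arrays from the table (the variable names `vector` and `matrix` are just for illustration):

```python
import numpy as np

vector = np.array([1, 2, 3, 4])
matrix = np.array([[1, 2, 3, 4],
                   [2, 5, 3, 1]])

print(vector.shape)  # (4,)   -> one dimension holding 4 values
print(matrix.shape)  # (2, 4) -> 2 rows, each holding 4 values
print(matrix.ndim)   # 2      -> number of dimensions
```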

### Code Example
Use vector dot products to further simplify the process from the Neuron demonstration:

In [28]:
import numpy as np

inputs = [1, 3, 5, 2]
weights = [0.2, 0.8, 0.6, -0.1]
bias = 1.8

# Dot product order doesn't matter here since inputs & weights are both 1D vectors of the same length
output = np.dot(inputs, weights) + bias

print(output)

7.2


### Code Example
Use vector dot products to further simplify the process from the Layers demonstration:

In [27]:
import numpy as np

inputs = [1, 3, 5, 2]
layer_weights = [[5.8, 2.7, 3.1, 4.2],
                 [-0.3, 2.3, -3.0, 4.2], 
                 [0.1, 6.1, 4.8, -3.2]]
biases = [2, 8, 3]

# Dot product order of layer weights & inputs matters because the two have different shapes: (3, 4) and (4,).
# With the order swapped, the inner dimensions no longer match and NumPy raises a shape error.
output = np.dot(layer_weights, inputs) + biases

print(output)

[39.8  8.  39. ]
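
To see why the order matters, here is a small sketch that tries both orders with the same values. The `swapped_ok` flag is only there for illustration; the point is that NumPy raises a `ValueError` when the inner dimensions don't line up.

```python
import numpy as np

inputs = [1, 3, 5, 2]                       # shape (4,)
layer_weights = [[5.8, 2.7, 3.1, 4.2],
                 [-0.3, 2.3, -3.0, 4.2],
                 [0.1, 6.1, 4.8, -3.2]]     # shape (3, 4)

# (3, 4) . (4,) works: one weighted sum per neuron
print(np.dot(layer_weights, inputs).shape)  # (3,)

# (4,) . (3, 4) fails: the 4 inputs can't be matched against the 3 rows
try:
    np.dot(inputs, layer_weights)
    swapped_ok = True
except ValueError as err:
    swapped_ok = False
    print("ValueError:", err)
```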


# Batches/Layers/Objects
[Video](https://www.youtube.com/watch?v=TEWy9vZcxW4)

### DEFINITION: Batch
A batch is a collection of inputs (samples). Batching data lets us process multiple samples in parallel and show the network several samples at a time, which helps it generalize more effectively.

- When batch size is **too small**, it takes a long time to work through the entire dataset.
- When batch size is **too big**, the model's ability to generalize can suffer.
- A typical batch size is somewhere between 32 and 64.
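
As a rough sketch of what batching looks like in practice, the snippet below slices a made-up dataset into batches of 32. The dataset, its size, and the names `dataset` and `batch_size` are assumptions for illustration and don't come from the video.

```python
import numpy as np

# Hypothetical dataset: 100 samples with 4 features each
dataset = np.arange(400).reshape(100, 4)
batch_size = 32

# Step through the dataset batch_size rows at a time
batches = [dataset[start:start + batch_size]
           for start in range(0, len(dataset), batch_size)]

print(len(batches))               # 4
print([len(b) for b in batches])  # [32, 32, 32, 4]
```

The last batch is smaller because 100 isn't a multiple of 32.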


### Code Example
Modify the previous example with batches.

In [15]:
import numpy as np

inputs = [[1, 3, 5, 2],
          [2, 0.7, -4, 1],
          [-3, 6, 3, -9]]

# Weights & Biases don't change, as the neuron layer stays the same
layer_weights = [[5.8, 2.7, 3.1, 4.2],
                 [-0.3, 2.3, -3.0, 4.2], 
                 [0.1, 6.1, 4.8, -3.2]]
biases = [2, 8, 3]

# inputs and layer_weights are both 3x4, so one of them has to be transposed for the dot product to work.
# Transposing the weights gives (3 samples, 4 inputs) . (4 inputs, 3 neurons) = (3 samples, 3 neurons):
# each row is one sample's output, so the biases line up with the neuron columns when added.
output = np.dot(inputs, np.array(layer_weights).T) + biases

print(output)

[[ 39.8    8.    39.  ]
 [  7.29  25.21 -14.93]
 [-27.7  -24.1   82.5 ]]
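
A side note on the `+ biases` step: NumPy broadcasts a 1D bias vector across every row of a 2D array, so bias `j` is added to column `j`. The sketch below uses made-up zero-valued sums (`weighted_sums` is a hypothetical name) so the effect is easy to see.

```python
import numpy as np

# Hypothetical values: 2 samples (rows) by 3 neurons (columns), all zero
weighted_sums = np.zeros((2, 3))
biases = np.array([2, 8, 3])       # one bias per neuron

# The bias vector is repeated for every row
result = weighted_sums + biases
print(result)
# [[2. 8. 3.]
#  [2. 8. 3.]]
```

This only matches "one bias per neuron" when the neurons run along the columns, that is, when the array has shape (samples, neurons).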


### CONCEPT/EXAMPLE: Multi-Layer Network
Neural networks are made up of multiple layers, consisting of an *input layer* (X), a series of *hidden layers*, and an *output layer*.

### More Info
In this new neural network:
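- Weights are initialized randomly with small values: `0.10 * np.random.randn(...)` draws from a normal distribution with mean 0 and standard deviation 0.1, so most weights land well inside -1 to 1. Keeping these values small means that we're not making the data values bigger and bigger as they pass through the neural network, which could otherwise lead to very large numbers and numeric overflow.
- We should avoid weights of 0, since a layer with all-zero weights and zero biases outputs only zeros, and those zeros carry through every later layer (a "dead network"). The biases here start at 0, which is a common default; if a network does turn out dead, initializing the biases as non-zero values is one thing to try.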
- This network is made up of *dense layers*, or fully-connected layers.

In [1]:
import numpy as np
np.random.seed(12345)

# Input data consists of 3 samples.
X = [[1, 3, 5, 2],
     [2, 0.7, -4, 1],
     [-3, 6, 3, -9]]

class DenseLayer:
    
    def __init__(self, num_inputs, num_neurons):
        # Randomly generate 2D array of weights (rows, columns). 
        # The dimensions are already flipped, removing the need to transpose.
        self.weights = 0.10 * np.random.randn(num_inputs, num_neurons)
        # Biases are a vector of zeroes, `num_neurons` in length
        self.biases = np.zeros((1, num_neurons))
        
    def forward(self, inputs):
        self.output = np.dot(inputs, self.weights) + self.biases
    
layer1 = DenseLayer(4, 5)
layer2 = DenseLayer(5, 2)

layer1.forward(X)
# layer 1's output becomes layer 2's input
layer2.forward(layer1.output)
print(layer2.output)

[[-0.36077223  0.34039262]
 [ 0.16444803 -0.07122948]
 [ 0.34158758 -0.02024933]]
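
To illustrate the initialization notes above, this sketch compares small random weights with all-zero weights on two of the samples. The seed, the use of `np.random.default_rng`, and the variable names are arbitrary choices for the example, not part of the original network.

```python
import numpy as np

rng = np.random.default_rng(0)     # arbitrary seed, for repeatability only

X = np.array([[1, 3, 5, 2],
              [2, 0.7, -4, 1]])

small_weights = 0.10 * rng.standard_normal((4, 5))  # same scaling as DenseLayer
zero_weights = np.zeros((4, 5))
biases = np.zeros((1, 5))

small_out = np.dot(X, small_weights) + biases
dead_out = np.dot(X, zero_weights) + biases

print(small_out.shape)   # (2, 5): 2 samples, 5 neurons
print(small_out)         # small values that differ per neuron
print(dead_out)          # all zeros, whatever the inputs are
```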


# Activation Functions
[Video](https://www.youtube.com/watch?v=gmjzbpSVY1A)

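An activation function is applied to a neuron's raw output (the weighted sum of its inputs plus the bias) and determines the value that gets passed on. On the output layer, this also shapes the network's final output, which has to be simple enough to actually act upon. For example, outputting an image rating from 0 to 1 is simpler than outputting said image rating as a vector or 2D array.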


### Unit Step Function
The unit step function outputs 0 for x-values below 0 and 1 for x-values above 0. This allows complex neural networks to output very simple binary data.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d9/Dirac_distribution_CDF.svg/1280px-Dirac_distribution_CDF.svg.png" width=400>
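
A minimal sketch of the unit step function with NumPy. The helper name `unit_step` is made up for this example, and treating an input of exactly 0 as "off" is a convention chosen here, not something the notes specify.

```python
import numpy as np

def unit_step(x):
    # 1 where x is above 0, otherwise 0 (x == 0 maps to 0 by the convention chosen here)
    return np.where(np.asarray(x) > 0, 1, 0)

print(unit_step([-2.0, -0.5, 0.0, 0.5, 2.0]))  # [0 0 0 1 1]
```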

### Sigmoid Function
Using a sigmoid function allows for more granular output than the unit step function, as it outputs a decimal that can be anywhere between 0 and 1, depending on x.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/88/Logistic-curve.svg/1200px-Logistic-curve.svg.png" width=400>
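
A small sketch of the logistic sigmoid, `1 / (1 + e^-x)`, which is the curve pictured above. The helper name `sigmoid` and the sample inputs are illustrative; printing the values shows them staying between 0 and 1, with an input of 0 landing exactly in the middle.

```python
import numpy as np

def sigmoid(x):
    # Logistic function: squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

values = sigmoid([-10, -1, 0, 1, 10])
print(values)      # rises smoothly from near 0 to near 1
print(values[2])   # 0.5
```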

### Rectified Linear Unit (ReLU)
When the input is 0 or less, the output is 0, and when the input is greater than 0, the output is equal to the input. Here, the output can still be granular, but unlike with the sigmoid function, it has no upper limit.

<img src="https://sebastianraschka.com/images/faq/relu-derivative/relu_3.png" width=400>

#### Why use ReLU over Sigmoid?
- **It's fast.** ReLU is just `max(0, x)`, which is much cheaper to calculate than Sigmoid's exponential.
- **It's simple.** Being piecewise linear (flat below 0, a straight line above 0) makes ReLU easier to incorporate into optimization algorithms.

#### Why not use *y=x* as an activation function instead?
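*y=x* is linear everywhere, and stacking linear layers only ever produces another linear function. A network using it can therefore only fit linear data (or rather, it approximates all data with a line). The clipping at 0 is what makes ReLU non-linear.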
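
The point about *y=x* can be checked numerically. In this sketch, two dense layers with no activation (equivalent to using *y=x*) are collapsed into one layer that gives the same outputs. All values are random and the names `W1`, `b1`, `W2`, `b2` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)     # arbitrary seed

X = rng.standard_normal((5, 4))    # 5 samples, 4 inputs each
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(3)
W2, b2 = rng.standard_normal((3, 2)), rng.standard_normal(2)

# Two layers with y = x as the "activation"
two_layers = (X @ W1 + b1) @ W2 + b2

# One layer built by multiplying the weights together
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = X @ W + b

print(np.allclose(two_layers, one_layer))  # True
```

However many such layers are stacked, the result is still a single linear map, which is why a non-linear activation is needed between them.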

#### Features & Usage
- Changing the weights (multiplication) changes the slope.
- Changing the biases (addition) changes the activation point, translating the graph left/right.
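
Both effects show up on a single ReLU neuron with one input. This is a sketch with made-up weights and biases; `relu_neuron` is a hypothetical helper, not something from the video.

```python
import numpy as np

def relu_neuron(x, weight, bias):
    # One input, one weight, one bias, followed by ReLU
    return np.maximum(0, weight * x + bias)

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

print(relu_neuron(x, weight=1.0, bias=0.0))  # [0. 0. 0. 1. 2.]  baseline
print(relu_neuron(x, weight=2.0, bias=0.0))  # [0. 0. 0. 2. 4.]  steeper slope
print(relu_neuron(x, weight=1.0, bias=1.0))  # [0. 0. 1. 2. 3.]  activates once x > -1
```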

### Code Example
Implement a simple ReLU example. The fundamental result is that every negative value gets clipped to 0.

In [18]:
import numpy as np

inputs = [0, 1, -5, 3, 6, 2, -1, 7]
output = []

for i in inputs:
    output.append(max(0, i))
#   NOTE: SAME RESULT AS...
#   if i > 0:
#       output.append(i)
#   else:
#       output.append(0)
    
print(output)

[0, 1, 0, 3, 6, 2, 0, 7]


### Code Example
Use the ReLU activation function to modify the output of a neuron layer.

In [17]:
import numpy as np
import matplotlib.pyplot as plt

# Sentdex's custom dataset.
# Based on https://cs231n.github.io/neural-networks-case-study/
from nnfs.datasets import spiral_data 

np.random.seed(12345)

# X denotes inputs, y denotes targets/classification
X, y = spiral_data(100, 3) # 3 classes with 100 samples each

class DenseLayer:
    
    def __init__(self, num_inputs, num_neurons):
        # Randomly generate 2D array of weights (rows, columns)
        self.weights = 0.10 * np.random.randn(num_inputs, num_neurons)
        # Biases are a vector of zeroes, `num_neurons` in length
        self.biases = np.zeros((1, num_neurons))
        
    def forward(self, inputs):
        self.output = np.dot(inputs, self.weights) + self.biases

class ReLU():
    def forward(self, inputs):
        self.output = np.maximum(0, inputs) # clip negative values to 0

layer1 = DenseLayer(2, 5)
activation1 = ReLU()

layer1.forward(X)
activation1.forward(layer1.output) # transform output with activation function

print("LAYER 1 OUTPUT\n{}".format(layer1.output))
print("ACTIVATION FUNCTION OUTPUT\n{}".format(activation1.output))

LAYER 1 OUTPUT
[[ 0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
   0.00000000e+00]
 [ 7.69311704e-04  7.84235811e-04 -2.28373517e-03 -9.87815185e-04
  -3.27828448e-04]
 [ 1.77620579e-04  2.83317359e-03 -4.52324829e-03 -1.80507772e-03
  -1.01027115e-03]
 ...
 [-1.27651001e-01  1.87989044e-01 -1.00373679e-01 -2.52528307e-02
  -5.77054642e-02]
 [-1.73879868e-01  1.20034589e-01  6.82417861e-02  4.64914810e-02
  -3.06659630e-02]
 [-1.34910633e-01  1.90738547e-01 -9.41162688e-02 -2.19667053e-02
  -5.81886685e-02]]
ACTIVATION FUNCTION OUTPUT
[[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00]
 [7.69311704e-04 7.84235811e-04 0.00000000e+00 0.00000000e+00
  0.00000000e+00]
 [1.77620579e-04 2.83317359e-03 0.00000000e+00 0.00000000e+00
  0.00000000e+00]
 ...
 [0.00000000e+00 1.87989044e-01 0.00000000e+00 0.00000000e+00
  0.00000000e+00]
 [0.00000000e+00 1.20034589e-01 6.82417861e-02 4.64914810e-02
  0.00000000e+00]
 [0.00000000e+00 1.90738547e-01 0.000