### Dependencies

In [82]:
import random  # used by random.shuffle in mb_SGD

import numpy as np

## Design `Network` class

#### Notes:
- under Python 2.x the `Network` class must explicitly inherit from `object` to be a new-style class; in Python 3 every class does so implicitly [[source](https://stackoverflow.com/questions/4015417/python-class-inherits-object)]
  - replace `class Network:` with `class Network(object):` if using Python 2.x

In [75]:
class Network:
    def __init__(self, sizes):
        self.sizes = sizes
        self.num_layers = len(sizes)
        self.weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        
    def sigmoid(self, z):
        return 1.0 / (1.0 + np.exp(-z))
    
    def feed_forward(self, a):
        for w, b in zip(self.weights, self.biases):
            a = self.sigmoid(np.dot(w, a) + b)
        return a
    
    # mb_SGD = mini-batch stochastic gradient descent
    def mb_SGD(self, training_data, epochs, mini_batch_size, eta, test_data=None):
        """
        Train the neural network using mini-batch stochastic
        gradient descent.  
        
        The "training_data" is a list of tuples
        "(x, y)" representing the training inputs and the desired
        outputs.
        
        "epochs" is the number of passes over the training data,
        "mini_batch_size" is the number of examples per mini-batch,
        and "eta" is the learning rate.
        
        If "test_data" is provided then the
        network will be evaluated against the test data after each
        epoch, and partial progress printed out. This is useful for
        tracking progress, but slows things down substantially.
        """
        if test_data:
            n_test = len(test_data)
        n = len(training_data)
        for j in range(epochs):
            # reshuffle every epoch so each mini-batch is a fresh random sample
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in range(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                # note: update_mini_batch is not defined in this class yet
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                # note: evaluate is not defined in this class yet
                print("Epoch {0}: {1} / {2}".format(
                    j, self.evaluate(test_data), n_test))
            else:
                print("Epoch {0} complete".format(j))

## Instantiate the Network - example network with `sizes = [2, 3, 1]`
- **input layer**: 2 neurons
- **hidden layer**: 3 neurons
- **output layer**: 1 neuron

In [76]:
nn = Network([2, 3, 1])
print(nn.feed_forward)

<bound method Network.feed_forward of <__main__.Network object at 0x1087ef3c8>>


### `sizes`
- a list with one integer per layer: the number of neurons in that layer
- **`sizes[:-1]`**: every layer size except the last, i.e. the layer each set of connections starts from (here the **input** and **hidden** layers)
- **`sizes[1:]`**: every layer size except the first, i.e. the layer each set of connections feeds into (here the **hidden** and **output** layers)
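
The two slices are easiest to read side by side. Zipping them yields one `(x, y)` pair per connection between adjacent layers, which is how `__init__` picks the shape of each weight matrix. A small sketch using the same `sizes = [2, 3, 1]` (the names `layer_pairs` and `weight_shapes` are illustrative and not part of the class):

```python
sizes = [2, 3, 1]

# Pair every layer with the layer that follows it.
layer_pairs = list(zip(sizes[:-1], sizes[1:]))
print(layer_pairs)      # [(2, 3), (3, 1)]

# __init__ builds one weight matrix of shape (y, x) per pair.
weight_shapes = [(y, x) for x, y in layer_pairs]
print(weight_shapes)    # [(3, 2), (1, 3)]
```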

In [77]:
print('sizes [:-1]: ', nn.sizes[:-1])
print('sizes  [1:]: ', nn.sizes[1:])

sizes [:-1]:  [2, 3]
sizes  [1:]:  [3, 1]


### `num_layers`
- the number of layers in the neural network

In [78]:
print('number of layers in the neural network: {}'.format(nn.num_layers))
print('input layer: {} neurons'.format(nn.sizes[0]))
print('hidden layer(s): {} total neurons'.format(sum(nn.sizes[1:-1])))
print('output layer: {} neurons'.format(nn.sizes[-1]))

number of layers in the neural network: 3
input layer: 2 neurons
hidden layer(s): 3 total neurons
output layer: 1 neurons


### `weights`
- **note:** the input layer has no weights of its own; its neurons simply hold the input values
- each element is a NumPy `ndarray` holding the weights on the connections from one layer into the next; row `j` holds the weights feeding neuron `j` of the receiving layer
- initialized with random values drawn from a standard Gaussian distribution (`np.random.randn`) as the starting point for stochastic gradient descent
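
As a quick check of the points above, the sketch below rebuilds the weight list the same way `__init__` does, prints each matrix's shape, and then draws a large sample to confirm that `np.random.randn` produces values with mean near 0 and standard deviation near 1. The fixed seed and the `sample` array are assumptions added only for this illustration.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed, only to make the run repeatable
sizes = [2, 3, 1]
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

for i, w in enumerate(weights):
    # Each matrix has one row per receiving neuron, one column per sending neuron.
    print('weights[{}] shape: {}'.format(i, w.shape))

# A large sample makes the Gaussian initialization visible in its statistics.
sample = np.random.randn(100000)
print('mean: {:.3f}, std: {:.3f}'.format(sample.mean(), sample.std()))
```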

In [79]:
print('input layer [2 neurons] -> hidden layer [3 neurons] weights - 3x2 matrix:\n{}\n'.format(nn.weights[0]))
print('input layer neurons [2 neurons] weights feeding hidden layer neuron 1: {}'.format(nn.weights[0][0]))
print('input layer neurons [2 neurons] weights feeding hidden layer neuron 2: {}'.format(nn.weights[0][1]))
print('input layer neurons [2 neurons] weights feeding hidden layer neuron 3: {}\n'.format(nn.weights[0][2]))
print('hidden layer neurons [3 neurons] -> output layer [1 neuron] weights - 1x3 matrix:\n{}'.format(nn.weights[1]))

input layer [2 neurons] -> hidden layer [3 neurons] weights - 3x2 matrix:
[[-0.11581554  0.94516819]
 [ 0.12787295  0.6695134 ]
 [ 2.13255741 -0.68516319]]

input layer neurons [2 neurons] weights feeding hidden layer neuron 1: [-0.11581554  0.94516819]
input layer neurons [2 neurons] weights feeding hidden layer neuron 2: [ 0.12787295  0.6695134 ]
input layer neurons [2 neurons] weights feeding hidden layer neuron 3: [ 2.13255741 -0.68516319]

hidden layer neurons [3 neurons] -> output layer [1 neuron] weights - 1x3 matrix:
[[ 0.39929956  0.6893374  -0.28006016]]


### `biases`
- **note:** the input layer has no biases; its neurons simply hold the input values
- each element is a NumPy `ndarray` (a column vector of shape `(y, 1)`) holding one bias per neuron in that layer
- initialized with random values drawn from a standard Gaussian distribution (`np.random.randn`) as the starting point for stochastic gradient descent
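
Because each bias array is a column of shape `(y, 1)`, the input to the network should also be a column vector. The sketch below, with made-up stand-in values for `w` and `b`, shows what NumPy broadcasting does when a flat 1-D input is used instead:

```python
import numpy as np

w = np.ones((3, 2))          # stand-in weights: 2 inputs -> 3 neurons
b = np.zeros((3, 1))         # one bias per neuron, stored as a column

a_column = np.ones((2, 1))   # input as a column vector
a_flat = np.ones(2)          # input as a flat 1-D array

column_result = np.dot(w, a_column) + b
flat_result = np.dot(w, a_flat) + b

print(column_result.shape)   # (3, 1): one value per neuron
print(flat_result.shape)     # (3, 3): broadcasting silently builds a matrix
```

Reshaping inputs with something like `x.reshape(-1, 1)` before calling `feed_forward` is one way to avoid the second case.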

In [80]:
print('list of bias arrays [hidden and output layers]:\n{}\n'.format(nn.biases))
print('hidden layer [3 neurons] biases:\n{}\n'.format(nn.biases[0]))
print('bias of neuron 1 in the hidden layer:{}\n'.format(nn.biases[0][0]))
print('output layer [1 neuron] bias:{}\n'.format(nn.biases[-1]))

list of bias arrays [hidden and output layers]:
[array([[-0.14699631],
       [ 0.23296194],
       [ 0.67244771]]), array([[-0.71364298]])]

hidden layer [3 neurons] biases:
[[-0.14699631]
 [ 0.23296194]
 [ 0.67244771]]

bias of neuron 1 in the hidden layer:[-0.14699631]

output layer [1 neuron] bias:[[-0.71364298]]



## The `sigmoid` [σ] method
- **note**: this method is defined in the class and explored in this section
- accepts a numpy vector `z` and applies the `sigmoid function` element-wise
- the `sigmoid function` is used to calculate the `activation vector` output of each neuron

```
def sigmoid(self, z):
    return 1.0 / (1.0 + np.exp(-z))
```
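
To get a feel for the function, here is a standalone copy of `sigmoid` (without `self`) applied to a few hand-picked values. It squashes large negative inputs toward 0 and large positive inputs toward 1, and maps 0 to exactly 0.5.

```python
import numpy as np

def sigmoid(z):
    # Element-wise logistic function, same formula as the class method.
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, 0.0, 10.0])
out = sigmoid(z)

print(out)             # values near 0, exactly 0.5, and near 1
print(out.shape)       # (3,): same shape as the input
```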

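## Activation vector: **`a′ = σ(w·a + b)`**
- **`a′`** --> activation vector of the current layer (its neurons' outputs, which become the inputs to the next layer)
- **`σ`** --> sigmoid function, applied element-wise
- **`w`** --> matrix of weights connecting the previous layer to the current layer
- **`a`** --> activation vector of the previous layer
- **`b`** --> vector of biases of the current layer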
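
A worked example of the formula for a single layer with 2 inputs and 3 neurons. The numbers are made up and chosen so that every weighted sum `w·a + b` comes out to 0, which makes the result easy to verify by hand since σ(0) = 0.5.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([[1.0, -1.0],
              [2.0,  0.0],
              [0.0, -2.0]])          # 3 neurons, 2 inputs each
a = np.array([[1.0], [1.0]])         # activations of the previous layer
b = np.array([[0.0], [-2.0], [2.0]]) # one bias per neuron

z = np.dot(w, a) + b                 # weighted sums: [[0], [0], [0]]
a_next = sigmoid(z)                  # activations: [[0.5], [0.5], [0.5]]
print(a_next)
```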

## The `feed_forward` method
- **note**: this method is defined in the class and explored in this section
```
def feed_forward(self, a):
    for w, b in zip(self.weights, self.biases):
        a = self.sigmoid(np.dot(w, a) + b)
    return a
```

### QUESTIONS
- **input**: `a` --> activation vector from the previous layer
- **output**: `a'` --> activation vector output of the current layer
- isn't the argument `a` being overwritten in each iteration of the loop?
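
On the last question: yes, the name `a` is rebound on every pass, and that appears to be intentional, since each layer's output is the next layer's input. Rebinding a local name does not modify the array the caller passed in. A minimal sketch with a standalone copy of the loop (the all-zero `weights` and `biases` are made-up values, chosen so the final output is σ(0) = 0.5):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(a, weights, biases):
    for w, b in zip(weights, biases):
        # `a` now refers to a new array: this layer's output.
        a = sigmoid(np.dot(w, a) + b)
        print('layer output shape:', a.shape)
    return a

weights = [np.zeros((3, 2)), np.zeros((1, 3))]
biases = [np.zeros((3, 1)), np.zeros((1, 1))]

x = np.array([[0.2], [0.7]])
x_before = x.copy()
out = feed_forward(x, weights, biases)

print(np.array_equal(x, x_before))  # True: the caller's input is untouched
```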


### Mini-batch Stochastic Gradient Descent

<img src="https://media.giphy.com/media/aZ3LDBs1ExsE8/giphy.gif">
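
The shuffle-and-slice step at the start of each epoch in `mb_SGD` can be tried on its own. This is only a sketch of the batching, not of the gradient update: the toy `training_data` and the fixed seed are assumptions for illustration.

```python
import random

random.seed(0)  # assumption: fixed seed, only to make the run repeatable
training_data = [(i, i * i) for i in range(10)]  # made-up (x, y) pairs
original = list(training_data)
mini_batch_size = 4

random.shuffle(training_data)  # shuffles the list in place
n = len(training_data)
mini_batches = [
    training_data[k:k + mini_batch_size]
    for k in range(0, n, mini_batch_size)]

# The last batch is smaller when n is not a multiple of the batch size.
print([len(mb) for mb in mini_batches])  # [4, 4, 2]
```

Every example lands in exactly one mini-batch per epoch, so one epoch still visits the whole training set once.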