## Random Initialization & Weights folding/unfolding

### In this notebook, we'll cover two tips for NN optimization
1. Random Initialization
2. Weights folding/unfolding

Weights in a fully connected neural net are usually stored as 2D matrices. This project trains the network with `scipy.optimize.minimize`,
which accepts the parameters only as a 1D array, so the weight matrices need to be folded into a single vector before optimization and unfolded back into matrices afterwards.


**NumPy documentation for the functions and attributes used here:**
- [numpy.reshape](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html#numpy.reshape)
- [numpy.concatenate](https://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html)
- [numpy.ndarray.size](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.size.html)
- [numpy.ndarray.shape](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html)

**Let's start with Random Initialization!**

### 1. Random Initialization

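- $\Theta$ has to be initialized before the optimization algorithm can run. But what if we initialize it as a matrix of 0s?
> Every hidden unit would start from the same weights, compute the same activation, and receive the same gradient update. The units would stay identical to each other, so the network could never learn distinct features!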

- So we initialize each weight to a random value in $[-\epsilon, \epsilon]$.
Mathematically,
$$ -\epsilon\le\Theta_{ij}^{(l)}\le\epsilon $$
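
The problem with all-zero initialization can be checked numerically. The sketch below is not part of the original notebook: it assumes a tiny one-hidden-layer sigmoid network trained with plain gradient descent on made-up data, and the helper names (`sigmoid`, `gradients`) and the toy sizes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients(theta1, theta2, X, Y):
    """Backprop gradients of the cross-entropy cost (assumed setup, no regularization)."""
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])        # inputs plus bias column
    a2 = sigmoid(a1 @ theta1.T)                 # hidden activations
    a2b = np.hstack([np.ones((m, 1)), a2])      # hidden activations plus bias
    a3 = sigmoid(a2b @ theta2.T)                # output activations
    d3 = a3 - Y                                 # output-layer error
    d2 = (d3 @ theta2[:, 1:]) * a2 * (1 - a2)   # hidden-layer error
    return d2.T @ a1 / m, d3.T @ a2b / m

rng = np.random.default_rng(0)
X = rng.random((20, 4))                         # 20 toy examples, 4 features
Y = np.eye(3)[rng.integers(0, 3, size=20)]      # one-hot labels, 3 classes

theta1 = np.zeros((5, 4 + 1))                   # zero-initialized weights
theta2 = np.zeros((3, 5 + 1))
for _ in range(50):
    g1, g2 = gradients(theta1, theta2, X, Y)
    theta1 -= 0.5 * g1
    theta2 -= 0.5 * g2

# Every row of theta1 (one row per hidden unit) is still the same vector.
rows_identical = np.allclose(theta1, theta1[0])
print("hidden units identical after training:", rows_identical)
```

The weights do move away from zero, but all hidden units move together, so the hidden layer behaves like a single unit. Random initialization breaks this symmetry.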

In [31]:
import numpy as np

## params
eps = .5
n = 784
hidden_layer_size = 100
num_labels = 10

r_theta1 = np.random.random((hidden_layer_size, n+1))*2*eps-eps
r_theta2 = np.random.random((num_labels,
                            hidden_layer_size+1))*2*eps-eps

print('initial theta1: \n', r_theta1)
print("\ntheta1: Max, Min: ", r_theta1.max(), r_theta1.min())
print("Theta1, shape: ", r_theta1.shape)

initial theta1: 
 [[-0.36509614  0.19920317  0.38754691 ...,  0.45924069  0.11615955
  -0.35184345]
 [-0.46771081 -0.23989066  0.37161541 ...,  0.0685328  -0.37220195
   0.08807854]
 [ 0.39716566  0.45892626 -0.08816415 ...,  0.09367224  0.28341768
   0.3219025 ]
 ..., 
 [-0.04134174 -0.21449344  0.45503016 ..., -0.29866822  0.43778739
   0.27134899]
 [-0.28086922 -0.32664497  0.23147162 ..., -0.44852144 -0.19954163
   0.20961565]
 [-0.24701615  0.34807752  0.38773301 ..., -0.0452624   0.29700791
   0.06417084]]

theta1: Max, Min:  0.499995723955 -0.499992764652
Theta1, shape:  (100, 785)


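The printed maximum and minimum confirm that every entry of `r_theta1` lies in the range

$$-\epsilon \le \Theta_{ij}^{(1)} \le \epsilon$$

where $\epsilon = 0.5$ here. Because `np.random.random` samples from $[0, 1)$, the upper bound $\epsilon$ itself is never reached.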


## Weights Folding

**Here, we'll go over a technique called 'folding/unfolding'.**

Weight folding 'flattens' each $\Theta$ matrix and concatenates the results into one flat vector (a 1D array), so that we can feed it into the optimization algorithm.

First, recall that $\Theta_1$ has shape `(100, 785)`, as we saw in the previous output.
1. _Flattening_
> Here we use `np.reshape`, which changes the shape of an array without changing its data. Note that `ndarray.size` returns the total number of elements in the array.
We can pass it as the new shape, since 'flattening' means...
    - (shape of the matrix) => (total number of elements,)

> The code below also passes `order='F'`, so the elements are read column by column.

2. _Concatenating the $\Theta$ matrices_
> After flattening all the thetas, we concatenate them into one vector using `np.concatenate()`. It takes a tuple of the arrays as its argument.
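
To make the effect of the `order` argument concrete, here is a small illustration on a 2 x 3 matrix. It is a sketch for intuition only; the array `a` is a made-up example and not one of the notebook's weight matrices.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
# a is:
# [[0 1 2]
#  [3 4 5]]

flat_f = a.reshape(a.size, order='F')   # column-major: walk down each column
flat_c = a.reshape(a.size, order='C')   # row-major (NumPy default): walk along each row

print(flat_f)   # [0 3 1 4 2 5]
print(flat_c)   # [0 1 2 3 4 5]

# Concatenating two flattened arrays gives one 1D vector.
b = np.array([[10, 20]])
folded = np.concatenate((flat_f, b.reshape(b.size, order='F')))
print(folded.shape)   # (8,)
```

Either order works for folding, as long as the same order is used again when the vector is reshaped back into matrices.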

In [25]:
print('Size of r_theta1 before folding...\n', r_theta1.shape)
print("np.size returns the # of elements...\n", r_theta1.size)

# Let's flatten it!
flattened_theta1 = r_theta1.reshape(r_theta1.size, order='F')
print('flattened theta1...\n', flattened_theta1)
flattened_theta2 = r_theta2.reshape(r_theta2.size, order='F')
print('\nflattened theta2...\n', flattened_theta2)

print('\nflattened theta1, shape...\n', flattened_theta1.shape)
print('flattened theta2, shape:\n', flattened_theta2.shape)

Size of r_theta1 before folding...
 (100, 785)
np.size returns the # of elements...
 78500
flattened theta1...
 [ 0.05302608  0.42610533 -0.19749085 ...,  0.26062377  0.37601568
 -0.45652982]

flattened theta2...
 [-0.15014029 -0.31277123 -0.35601266 ...,  0.10094713  0.43902183
 -0.49759141]

flattened theta1, shape...
 (78500,)
flattened theta2, shape:
 (1010,)


In [26]:
## Let's concatenate them
r_nn_params = np.concatenate((flattened_theta1, flattened_theta2))
print('\nconcatenated theta\n', r_nn_params)
print('\nconcatenated theta, shape:\n', r_nn_params.shape)


concatenated theta
 [ 0.05302608  0.42610533 -0.19749085 ...,  0.10094713  0.43902183
 -0.49759141]

concatenated theta, shape:
 (79510,)


As you can see, the shapes of the flattened $\Theta$ matrices are:
- $\Theta_1$: `(78500,)`
- $\Theta_2$: `(1010,)`

Therefore, the length of the concatenated vector is the sum of the two lengths, $78500 + 1010 = 79510$, which gives the shape `(79510,)`.

This vector can now be fed into the optimization algorithm.
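
As a rough sketch of how a folded vector is used, the example below passes one to `scipy.optimize.minimize`. Everything in it is an assumption made for illustration: `toy_cost` is a placeholder quadratic cost, not the neural network cost function, and the shapes, targets, and choice of `method='L-BFGS-B'` are hypothetical. The point is only the pattern: the cost function receives a 1D vector, unfolds it, and returns the gradient folded in the same way.

```python
import numpy as np
from scipy.optimize import minimize

shape1, shape2 = (3, 4), (2, 4)          # toy shapes, much smaller than the notebook's
size1 = shape1[0] * shape1[1]
size2 = shape2[0] * shape2[1]

rng = np.random.default_rng(0)
target1 = rng.random(shape1)             # hypothetical "ideal" weights
target2 = rng.random(shape2)

def toy_cost(flat_params):
    """Placeholder cost: half the squared distance to the targets, with its gradient."""
    # Unfold the 1D vector back into the two weight matrices.
    theta1 = flat_params[:size1].reshape(shape1, order='F')
    theta2 = flat_params[size1:].reshape(shape2, order='F')
    diff1, diff2 = theta1 - target1, theta2 - target2
    cost = 0.5 * (np.sum(diff1 ** 2) + np.sum(diff2 ** 2))
    # Fold the gradient exactly the way the parameters were folded.
    grad = np.concatenate((diff1.reshape(diff1.size, order='F'),
                           diff2.reshape(diff2.size, order='F')))
    return cost, grad

eps = 0.5
x0 = rng.random(size1 + size2) * 2 * eps - eps   # random flat starting point

# jac=True tells minimize that toy_cost returns (cost, gradient).
result = minimize(toy_cost, x0, jac=True, method='L-BFGS-B')

fitted_theta1 = result.x[:size1].reshape(shape1, order='F')
print(result.x.shape)
print(np.allclose(fitted_theta1, target1, atol=1e-4))
```

The optimizer only ever sees `x0` and `result.x` as flat vectors; the matrix structure lives entirely inside the cost function.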



## Weights Unfolding

We use `np.reshape()` again to unfold `r_nn_params`.

This function is called as `np.reshape(a, newshape, order)`, where
- `a`: array-like object (a NumPy array here)
- `newshape`: the shape to reshape into
- `order`: 
    > Read the elements of a using this index order, and place the elements into the reshaped array using this index order. _(NumPy documentation)_
    
We'll first slice out the portion of the `r_nn_params` vector that belongs to each of theta1 and theta2.
Then we'll reshape each slice back into its original shape.
Here, let's take a look at theta1 as our example.
1. indexing:
> Theta1's original shape is `(hidden_layer_size, n+1)`, and it was placed first in the concatenation. Therefore, in the `r_nn_params` vector, theta1 occupies indices 0 up to (but not including) the size of theta1, which is <br> hidden_layer_size $\times$ (n+1).

2. reshaping:
> We know the original shape of theta1, so the `newshape` argument is simply<br> `(hidden_layer_size, n+1)`. We pass the same `order='F'` that we used for flattening, so that every value returns to its original position.
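
The same two steps generalize to any number of weight matrices. The helpers below are a hypothetical sketch (`fold_params` and `unfold_params` are names invented here, not functions from the notebook or from NumPy), checked on small stand-in shapes rather than the real `(100, 785)` and `(10, 101)` matrices.

```python
import numpy as np

def fold_params(thetas):
    """Flatten each matrix column-major and join the results into one 1D vector."""
    return np.concatenate([t.reshape(t.size, order='F') for t in thetas])

def unfold_params(flat, shapes):
    """Slice `flat` at running offsets and reshape each slice column-major."""
    thetas, start = [], 0
    for rows, cols in shapes:
        stop = start + rows * cols
        thetas.append(flat[start:stop].reshape((rows, cols), order='F'))
        start = stop                      # the next matrix begins where this one ended
    return thetas

# Stand-ins for (hidden_layer_size, n+1) and (num_labels, hidden_layer_size+1).
shapes = [(4, 6), (3, 5)]
rng = np.random.default_rng(0)
originals = [rng.random(s) for s in shapes]

flat = fold_params(originals)
restored = unfold_params(flat, shapes)

round_trip_ok = all(np.array_equal(a, b) for a, b in zip(originals, restored))
print(flat.shape, round_trip_ok)

# Unfolding with a different order than the one used for folding scrambles the entries.
wrong = flat[:24].reshape((4, 6), order='C')
print(np.array_equal(wrong, originals[0]))
```

Note that the second matrix starts at offset `rows * cols` of the first one, which is why theta2 in this notebook would begin at index `hidden_layer_size * (n+1)`.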
       

In [28]:
## 1. indexing
# t1 as indexed flat theta1
t1 = r_nn_params[:hidden_layer_size * (n+1)]
print('flat theta1\n', t1)
print('\nflat theta1 shape: \n', t1.shape)

flat theta1
 [ 0.05302608  0.42610533 -0.19749085 ...,  0.26062377  0.37601568
 -0.45652982]

flat theta1 shape: 
 (78500,)


In [29]:
## 2. Reshape it
theta1 = np.reshape(t1, (hidden_layer_size, n+1), order='F')

print('Reshaped theta1:\n', theta1)
print('\ntheta1 shape: ', theta1.shape)

Reshaped theta1:
 [[ 0.05302608 -0.44979394  0.2860511  ..., -0.46321588  0.14193639
  -0.414402  ]
 [ 0.42610533  0.49394322 -0.47261149 ..., -0.22663834  0.19509747
   0.11949658]
 [-0.19749085  0.08226684 -0.45153108 ...,  0.39725172 -0.18171361
  -0.29573581]
 ..., 
 [ 0.22880737  0.10516532  0.03017331 ...,  0.38462516  0.42151461
   0.26062377]
 [ 0.11692454 -0.20717831 -0.41355435 ...,  0.21901405  0.07787854
   0.37601568]
 [ 0.26405228 -0.18460549 -0.18506799 ...,  0.46598531 -0.40163424
  -0.45652982]]

theta1 shape:  (100, 785)
