# V Unrolling parameters

An implementation detail: unrolling our parameters from matrices into <code>**vectors**</code>, which we need in order to use advanced optimization routines.

<code>**MATLAB/OCTAVE**</code>

In [None]:
function [jVal, gradient] = costFunction(theta)
...

optTheta = fminunc(@costFunction, initialTheta, options)

The cost function is passed to an advanced optimization algorithm such as <code>**fminunc**</code>. fminunc is not the only one; there are other advanced optimization algorithms. What they all have in common is that they take as input a <code>**defined cost function**</code> and some <code>**initial parameter values**</code>, and they assume that the theta values are <code>**parameter vectors**</code>:

\begin{multline*}
\Theta \in \mathbb{R}^{n} \ \text{or} \ \Theta \in \mathbb{R}^{n+1}
\end{multline*}
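As a rough illustration of that calling convention, here is a minimal Octave sketch. The quadratic cost `toyCost` and its minimum at (5, 5) are made up for the example; only `fminunc` and `optimset` come from Octave itself. The leading `1;` is an Octave idiom for defining functions inside a script; in MATLAB the function would instead go at the end of the script or in its own file.

```octave
1;  % Octave idiom: marks this file as a script so functions can be defined inline

% Hypothetical toy cost: J(theta) = (theta(1) - 5)^2 + (theta(2) - 5)^2
function [jVal, gradient] = toyCost(theta)
  jVal = sum((theta - 5) .^ 2);   % scalar cost
  gradient = 2 * (theta - 5);     % column vector with the same shape as theta
end

% Tell fminunc that the cost function also returns the gradient
options = optimset('GradObj', 'on', 'MaxIter', 100);

initialTheta = zeros(2, 1);       % parameters as a vector in R^2
[optTheta, functionVal, exitFlag] = fminunc(@toyCost, initialTheta, options);
```

Both the input `theta` and the returned `gradient` are plain column vectors, which is the format the optimizer expects.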


These routines also assume that the cost function returns the gradient, which is again a vector of the same length as the parameter vector.
This worked fine when we used logistic regression, but now that we are using a neural network (NN) our <code>**parameters are no longer vectors but instead matrices**</code>. For a 4-layer network we would have:

<code>**NN (L=4)**</code>

\begin{multline*}
\Theta^{(1)}, \Theta^{(2)}, \Theta^{(3)} \ - \text{matrices: Θ1, Θ2, Θ3}
\end{multline*}

\begin{multline*}
D^{(1)}, D^{(2)}, D^{(3)} \ - \text{matrices: D1, D2, D3}
\end{multline*}

We need an easy way to <code>**unroll these**</code> matrices into vectors so that they end up in a format the optimization routines accept.

Let's say we have an NN with this architecture:

\begin{multline*}
s_1 = 10, s_2 = 10, s_3 = 10, s_4 = 1
\end{multline*}

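In this case the dimensions of our matrices <code>**Θ, D**</code> are determined by the layer sizes above: the matrices for layer l have as many rows as there are units in layer l+1, and one more column than there are units in layer l (the extra column is for the bias unit).<br>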

<code>**Unroll into vectors**</code>

\begin{multline*}
\Theta^{(1)} \in \mathbb{R}^{10 \ \times 11}, \ \Theta^{(2)} \in \mathbb{R}^{10 \ \times 11}, \Theta^{(3)} \in \mathbb{R}^{1 \ \times 11}
\end{multline*}

\begin{multline*}
D^{(1)} \in \mathbb{R}^{10 \ \times 11}, \ D^{(2)} \in \mathbb{R}^{10 \ \times 11}, D^{(3)} \in \mathbb{R}^{1 \ \times 11}
\end{multline*}
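Counting the entries gives the length of each unrolled block, which is where the index boundaries 110, 220 and 231 in the reshape step come from:

\begin{multline*}
10 \times 11 + 10 \times 11 + 1 \times 11 = 110 + 110 + 11 = 231
\end{multline*}

So the unrolled parameter vector and the unrolled gradient vector each have 231 elements.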

<code>**MATLAB/OCTAVE**</code>

One way to combine all the matrices into one long vector is to unroll each of them and stack the results.

In [None]:
% (:) flattens each matrix column by column; ';' stacks the pieces vertically
thetaVec = [Theta1(:); Theta2(:); Theta3(:)];
DVec     = [D1(:); D2(:); D3(:)];

To go back from the vector to the matrices, pull out the corresponding range of elements for each matrix and reshape it.

In [None]:
Theta1 = reshape(thetaVec(1:110),   10, 11)  % first 110 elements
Theta2 = reshape(thetaVec(111:220), 10, 11)  % next 110 elements
Theta3 = reshape(thetaVec(221:231), 1, 11)   % last 11 elements
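A quick way to convince yourself that unrolling and reshaping are inverses is a round-trip check. The sketch below fills the three matrices with arbitrary consecutive numbers (an assumption for illustration only) and compares them with the matrices recovered from the vector; the `*Back` names and `roundTripOk` are hypothetical.

```octave
% Example matrices with the shapes used in this section (values are arbitrary)
Theta1 = reshape(1:110,   10, 11);
Theta2 = reshape(111:220, 10, 11);
Theta3 = reshape(221:231, 1, 11);

% Unroll: (:) turns each matrix into a column, ';' stacks the columns vertically
thetaVec = [Theta1(:); Theta2(:); Theta3(:)];

% Recover each matrix by slicing out its block and reshaping it
Theta1Back = reshape(thetaVec(1:110),   10, 11);
Theta2Back = reshape(thetaVec(111:220), 10, 11);
Theta3Back = reshape(thetaVec(221:231), 1, 11);

% True only if every recovered matrix matches the original exactly
roundTripOk = isequal(Theta1, Theta1Back) && ...
              isequal(Theta2, Theta2Back) && ...
              isequal(Theta3, Theta3Back);
```

Note that the slices do not overlap: each block starts one element after the previous block ends. The blocks also have different lengths, so they must be stacked vertically with `;` rather than placed side by side with `,`.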

<code>**Implementing learning algorithm**</code>
  * We have some initial values of parameters Θ(1), Θ(2), Θ(3)
  * Unroll them into one long vector <code>**initialTheta**</code> and pass it to <code>**fminunc**</code>

In [None]:
fminunc(@costFunction, initialTheta, options)

<code>**Implementing the cost function**</code>
  * From <code>**thetaVec**</code> get <code>**Θ(1), Θ(2), Θ(3)**</code>
  * Use forward prop/back prop to compute  <code>**D(1), D(2), D(3) and J(Θ)**</code> 
  * Unroll <code>**D(1), D(2), D(3)**</code> to get <code>**gradientVec**</code>

In [None]:
function [jVal, gradientVec] = costFunction(thetaVec)
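The three steps above can be sketched as follows. This is only an outline of the reshape, compute, unroll pattern: the cost used here (half the sum of squared parameters) is a stand-in chosen to keep the example runnable and is not a real neural-network cost, so the real forward prop/back prop code would replace step 2. The leading `1;` is an Octave idiom for defining a function inside a script.

```octave
1;  % Octave idiom so functions can be defined inside a script file

function [jVal, gradientVec] = costFunction(thetaVec)
  % 1. Recover the parameter matrices from the unrolled vector
  Theta1 = reshape(thetaVec(1:110),   10, 11);
  Theta2 = reshape(thetaVec(111:220), 10, 11);
  Theta3 = reshape(thetaVec(221:231), 1, 11);

  % 2. Placeholder for forward prop / back prop (assumption, not a real NN cost):
  %    J = 0.5 * sum of squared parameters, so each D equals the matching Theta
  jVal = 0.5 * (sum(Theta1(:) .^ 2) + sum(Theta2(:) .^ 2) + sum(Theta3(:) .^ 2));
  D1 = Theta1;
  D2 = Theta2;
  D3 = Theta3;

  % 3. Unroll the gradient matrices into one column vector
  gradientVec = [D1(:); D2(:); D3(:)];
end

% Call it once with an unrolled parameter vector of all ones
initialTheta = ones(231, 1);
[jVal, gradientVec] = costFunction(initialTheta);
```

The optimizer only ever sees `thetaVec` and `gradientVec`; the matrices exist only inside the function, where the vectorized computations are convenient.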

To summarize the conversion between the matrix representation and the vector representation of the parameters: the advantage of the matrix representation is that it is more convenient for <code>**forward propagation and back propagation**</code>, because with the parameters stored as matrices we can take advantage of <code>**vectorized implementations**</code>. In contrast, the advantage of the vector representation is that advanced optimization algorithms tend to assume that all of our parameters are <code>**unrolled into a big long vector**</code>. Convert between the two as needed.