### Matrix Multiplication
To take the dot product of two vectors, multiply their first elements together, then their second elements, and so on, and then add up the products. For example:  

$\begin{bmatrix} 1 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} 3 \\ 4 \end{bmatrix} = (1*3)+(2*4) = 3+8 = 11$  
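
As a minimal sketch in plain Python, the multiply-then-add recipe could look like the following. The helper name `dot` is an illustrative choice, not a library function, and the inputs are the vectors $[1, 2]$ and $[3, 4]$ that match the arithmetic $(1*3)+(2*4)$:

```python
def dot(u, v):
    """Dot product of two equal-length sequences of numbers."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    # Multiply matching elements, then add the products up
    return sum(x * y for x, y in zip(u, v))


result = dot([1, 2], [3, 4])  # (1*3) + (2*4)
print(result)  # 11
```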
**Transpose**: takes a column vector and turns it into a row vector.

$\vec{a} = \begin{bmatrix} 1\\2 \end{bmatrix} \quad\Rightarrow\quad \vec{a}^T = \begin{bmatrix} 1&&2 \end{bmatrix}$

For example, multiplying the row vector $\vec{a}^T =\begin{bmatrix} 1&&2 \end{bmatrix}$ by the column vector $\vec{w} = \begin{bmatrix} 2 \\ 4 \end{bmatrix}$ gives $(1*2)+(2*4) = 10$.  

***This is the same as taking the dot product of $\vec{a}$ and $\vec{w}$.***

**Vector Matrix multiplication**:  

$\vec{a}^T = \begin{bmatrix} 1&&2 \end{bmatrix}$  

$W = \begin{bmatrix} 3&&5 \\ 4&&6 \end{bmatrix}$  

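Then $Z = \vec{a}^T W$ is calculated by taking the dot product of $\vec{a}^T$ with each column of $W$, where $\vec{w_1}$ and $\vec{w_2}$ are the first and second columns:

$Z = \begin{bmatrix}\vec{a}^T \vec{w_1} && \vec{a}^T \vec{w_2}\end{bmatrix}$ which is $Z = [(1*3)+(2*4)\; (1*5)+(2*6)] = [11 \; 17]$
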
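
The column-by-column computation can be sketched in plain Python. The helpers `dot` and `vec_mat` are hypothetical names chosen for illustration, and the matrix is assumed to be stored as a list of rows:

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def vec_mat(a, W):
    """Row vector a times matrix W (W is a list of rows)."""
    columns = list(zip(*W))  # regroup the rows of W into its columns
    # One dot product per column of W
    return [dot(a, col) for col in columns]


a_T = [1, 2]
W = [[3, 5],
     [4, 6]]
Z = vec_mat(a_T, W)
print(Z)  # [11, 17]
```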
**Matrix Matrix Multiplication**:
$A = \begin{bmatrix}1&&-1 \\ 2&&-2\end{bmatrix}$  

$A^T = \begin{bmatrix}1&&2 \\ -1&&-2\end{bmatrix}$  
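**How to make the transpose?** The first column becomes the first row, and the second column becomes the second row.

We also have: $W = \begin{bmatrix} 3&&5 \\ 4&&6 \end{bmatrix}$  

$Z = A^TW = \begin{bmatrix} \text{Row}(a_1)\cdot\text{Col}(w_1)&&\text{Row}(a_1)\cdot\text{Col}(w_2) \\ \text{Row}(a_2)\cdot\text{Col}(w_1)&&\text{Row}(a_2)\cdot\text{Col}(w_2) \end{bmatrix}$, where $\text{Row}(a_i)$ is the $i$-th row of $A^T$ and $\text{Col}(w_j)$ is the $j$-th column of $W$.  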

$Z = \begin{bmatrix} 11&&17 \\ -11&&-17 \end{bmatrix}$
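
A minimal plain-Python sketch of the same calculation, assuming matrices are stored as lists of rows. The helper names `transpose` and `matmul` are illustrative, not taken from a library:

```python
def transpose(M):
    """Turn the columns of M into rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Entry (i, j) is the dot product of row i of A and column j of B."""
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in A]


A = [[1, -1],
     [2, -2]]
W = [[3, 5],
     [4, 6]]

A_T = transpose(A)
print(A_T)  # [[1, 2], [-1, -2]]

Z = matmul(A_T, W)
print(Z)  # [[11, 17], [-11, -17]]
```

Each row of `A_T` produces one row of `Z`, and each column of `W` produces one column of `Z`.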

### Matrix Multiplication Rules
Think of each column of a matrix as a vector.  

Take $Z = A^TW$.  

Each row of $A^T$ corresponds to a row of the result, and each column of $W$ corresponds to a column of the result.  

**A requirement**: a 3x2 matrix can only be multiplied with a 2xN matrix. The result will be a 3xN matrix.  

**Why?** Because you can only take dot products of vectors that are the same length! Therefore the rows of $A^T$ must be the same length as the columns of $W$, that is, $A^T$ must have as many columns as $W$ has rows.
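
The shape requirement can be written as a small check. This is a sketch only: `matmul_shape` is a hypothetical helper, and shapes are assumed to be `(rows, columns)` tuples:

```python
def matmul_shape(shape_a, shape_b):
    """Shape of the product of an (m x k) and a (k x n) matrix."""
    rows_a, cols_a = shape_a
    rows_b, cols_b = shape_b
    # Each row of the left matrix (length cols_a) is dotted with each
    # column of the right matrix (length rows_b), so the lengths must match.
    if cols_a != rows_b:
        raise ValueError(
            f"cannot multiply {rows_a}x{cols_a} with {rows_b}x{cols_b}"
        )
    return (rows_a, cols_b)


print(matmul_shape((3, 2), (2, 5)))  # (3, 5)
```

Calling `matmul_shape((3, 2), (3, 2))` raises a `ValueError`, because a row of length 2 cannot be dotted with a column of length 3.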

### What is a Derivative (Backpropagation)

A simple cost function: $J(w) = w^2$  

If $w = 3$ then $J(w) = 9$. If we increase $w$ by 0.001, the answer becomes $J(w) = 3.001^2 = 9.006001$.

It turns out that if you increase $w$ by an infinitesimally small amount, $J(w)$ increases by **6 times** as much. Therefore we can say that the derivative of $J(w)$ at $w = 3$ is 6.

**The informal definition of a derivative**: *If $w$ going up by $\epsilon$ causes $J(w)$ to go up by $k \cdot \epsilon$, then the derivative of $J(w)$ is $k$.*
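
The informal definition can be tried out numerically. The sketch below uses $J(w) = w^2$; the helper name `slope` and the step sizes are assumptions made for illustration, and the results are only approximate because of the finite step and floating-point rounding:

```python
def J(w):
    """The simple cost function J(w) = w^2."""
    return w ** 2


def slope(f, w, eps=0.001):
    """Estimate k: how much f goes up per unit increase in w."""
    return (f(w + eps) - f(w)) / eps


k = slope(J, 3)              # roughly 6.001
k_small = slope(J, 3, 1e-6)  # closer to 6 as eps shrinks
print(k, k_small)
```

As the step `eps` shrinks, the estimate approaches 6, the derivative of $J(w) = w^2$ at $w = 3$.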

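**The formal definition**: $\frac{\partial}{\partial w}J(w) = \lim_{\epsilon \to 0} \frac{J(w+\epsilon) - J(w)}{\epsilon}$. For $J(w) = w^2$ this gives $\frac{\partial}{\partial w}J(w) = 2*w$, which equals 6 at $w = 3$.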

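Derivatives can also be computed symbolically with sympy, shown here for a different cost function, $J(w) = 1/w$:

```python
import sympy

# w is a symbolic variable; J is an expression in w
w = sympy.symbols('w')
J = 1 / w
J
```

```
1/w
```

```python
# Differentiate J with respect to w
dJ_dw = sympy.diff(J, w)
dJ_dw
```

```
-1/w**2
```

```python
# Evaluate the derivative at w = 2
dJ_dw.subs([(w, 2)])
```

```
-1/4
```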