# Matrix representations

## Matrix representations - input to the network

Suppose an input has $d_{i}$ dimensions. (Remember that the input has been normalized to range between 0 and 1.)

Then each input would be:

$$X_{1{\times}d_{i}} \; \text{(without bias)} = \left[ \begin{array}{cccc} x_{0} & x_{1} & \cdots & x_{(d_{i}-1)} \end{array} \right] _{1{\times}d_{i}}$$

After adding the bias term,

$$X_{1{\times}(d_{i}+1)} = \left[ \begin{array}{c} 1 & X_{1{\times}d_{i}} \end{array} \right] _{1{\times}(d_{i}+1)}$$

For example, one of the data points given above to make a logic gate was $(0,1)$. Here, $X = \left[ \begin{array}{c} 1 & 0 & 1 \end{array} \right]_{1{\times}(2+1)}$

Suppose we provide $n$ data points, each with $d_{i}$ dimensions. For the first layer of neurons, we can stack them, each with its bias term, into an input matrix of dimension $n{\times}(d_{i}+1)$:

$$X^{(1)}_{n{\times}(d_{i}+1)} = 
\left[ \begin{array}{c} 1 & _{(0)}X \\ 1 & _{(1)}X \\ \vdots & \vdots \\ 1 & _{(n-1)}X \end{array} \right] _{n{\times}(d_{i}+1)}
=
\left[ \begin{array}{c} 
1 & _{(0)}x_{0} & _{(0)}x_{1} & _{(0)}x_{2} & \cdots & _{(0)}x_{(d_{i}-1)} \\ 
1 & _{(1)}x_{0} & _{(1)}x_{1} & _{(1)}x_{2} & \cdots & _{(1)}x_{(d_{i}-1)} \\ 
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 
1 & _{(n-1)}x_{0} & _{(n-1)}x_{1} & _{(n-1)}x_{2} & \cdots & _{(n-1)}x_{(d_{i}-1)} 
\end{array} \right] _{n{\times}(d_{i}+1)}$$

For example, for logic gates, the input matrix was $X = \left[ \begin{array}{c} 1 & 0 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{array} \right] _{4{\times}3} $
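As a minimal sketch of the construction above, the bias column can be prepended with NumPy. The helper name `add_bias_column` is hypothetical and chosen only for this illustration.

```python
import numpy as np

# Raw inputs for a two-input logic gate: n = 4 data points, d_i = 2 dimensions.
raw_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)


def add_bias_column(points):
    """Prepend a column of ones: an n x d matrix becomes n x (d + 1)."""
    n = points.shape[0]
    return np.hstack([np.ones((n, 1)), points])


X = add_bias_column(raw_inputs)
print(X.shape)  # (4, 3)
print(X)
```

Each row of `X` is one data point with its leading bias term, matching the $4{\times}3$ matrix shown for the logic gates.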

## Matrix representations - output of a layer

Suppose the output of the $l^{th}$ layer has $o_{l}$ dimensions, meaning there are $o_{l}$ neurons in the layer.

In the above example, the 1st layer has 2 neurons, so $o_{1} = 2$, and the 2nd layer has 1 neuron, so $o_{2} = 1$.

For each input, the output is an $o_{l}$-dimensional vector:

$$Y^{(l)} = \left[ \begin{array}{c} y_{[0]}^{(l)} & y_{[1]}^{(l)} & \cdots & y_{[o_{l}-1]}^{(l)} \end{array} \right] _{1{\times}o_{l}}$$


For example, for an AND gate, the output for the input $(0,1)$ is $Y = \left[ \begin{array}{c} 0 \end{array} \right] _{1{\times}1}$

Thus, for $n$ data points, the output is:

$$Y^{(l)} = \left[ \begin{array}{c} 
{_{(0)}}Y^{(l)} \\ {_{(1)}}Y^{(l)} \\ \vdots \\ _{(n-1)}Y^{(l)} \end{array} \right] _{n{\times}o_{l}} 
= \left[ \begin{array}{c} 
{_{(0)}}y_{[0]}^{(l)} & \cdots & {_{(0)}}y_{[o_{l}-1]}^{(l)} \\ 
{_{(1)}}y_{[0]}^{(l)} & \cdots & {_{(1)}}y_{[o_{l}-1]}^{(l)} \\ 
\vdots & \ddots & \vdots \\ 
_{(n-1)}y_{[0]}^{(l)} & \cdots & _{(n-1)}y_{[o_{l}-1]}^{(l)} 
\end{array} \right] _{n{\times}o_{l}}$$

For example, for an AND gate, the output matrix is $Y = \left[ \begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \end{array} \right] _{4{\times}1}$

## Matrix representations - input to a layer

Suppose at the $l^{th}$ layer, the input has $i_{l}$ dimensions.

(The number of inputs to the layer) = (1 bias term) + (the number of outputs from the previous layer):
$$i_{l} = 1 + o_{(l-1)}$$

In the above example, the input to the first layer of 2 neurons has $i_{1} = d_{i}+1 = 3$ dimensions, and the input to the second layer of 1 neuron has $i_{2} = o_{1} + 1 = 3$ dimensions.
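The dimension bookkeeping can be checked with a few lines of Python. This is only a sketch: the function name `layer_input_dims` is hypothetical, and it assumes the raw input dimension $d_{i}$ plays the role of $o_{0}$ for the first layer.

```python
def layer_input_dims(d_i, neurons_per_layer):
    """Return i_l for every layer, using i_l = 1 + o_(l-1).

    Assumption for this sketch: the raw input dimension d_i acts as o_0.
    """
    previous_outputs = [d_i] + list(neurons_per_layer[:-1])
    return [1 + o_prev for o_prev in previous_outputs]


# The network from the example: 2 inputs, a layer of 2 neurons, a layer of 1 neuron.
dims = layer_input_dims(d_i=2, neurons_per_layer=[2, 1])
print(dims)  # [3, 3]
```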

If there are $n$ data points given, the input to the $l^{th}$ layer would be an $n{\times}i_{l} = n{\times}(o_{(l-1)}+1)$ matrix:

$$X^{(l)}_{n{\times}i_{l}} 
= \left[ \begin{array}{c} 
1 & _{(0)}Y^{(l-1)} \\ 
1 & _{(1)}Y^{(l-1)} \\ 
\vdots & \vdots \\  
1 & _{(n-1)}Y^{(l-1)} 
\end{array} \right] _{n{\times}i_{l}}
= \left[ \begin{array}{c} 
1 & _{(0)}y^{(l-1)}_{[0]} & \cdots & _{(0)}y^{(l-1)}_{[o_{l-1}-1]} \\ 
1 & _{(1)}y^{(l-1)}_{[0]} & \cdots & _{(1)}y^{(l-1)}_{[o_{l-1}-1]} \\ 
\vdots & \vdots & \ddots & \vdots \\ 
1 & _{(n-1)}y^{(l-1)}_{[0]} & \cdots & _{(n-1)}y^{(l-1)}_{[o_{l-1}-1]} 
\end{array} \right] _{n{\times}i_{l}}$$

For example, in the 3-neuron neural network above, the input to the first layer for a single data point is $\left[ \begin{array}{ccc} 1 & x_0 & x_1 \end{array} \right] _{1{\times}3}$, and the input to the second layer is $\left[ \begin{array}{ccc} 1 & y_0 & y_1 \end{array} \right] _{1{\times}3}$

## Matrix representations - weight matrix of one neuron

A neuron multiplies each dimension of its input by the corresponding weight and sums the results. This can be represented by a dot product.

Assuming the input to the $k^{th}$ neuron in the $l^{th}$ layer has $i_{l}$ dimensions,

$$W^{(l)}_{[k]} {_{1{\times}i_{l}}} = \left[ \begin{array}{c} w^{(l)}_{[k],0} & w^{(l)}_{[k],1} & \cdots & w^{(l)}_{[k],i_{l}-1} \end{array} \right] _{1{\times}i_{l}}$$

(Remember $i_{l} = 1 + o_{(l-1)}$)

Then the output of that neuron for one data point is the sigmoid of the dot product of the input $x$ and the weights:

$$y^{(l)}_{[k]} {_{1{\times}1}} = Sigmoid( x^{(l)} {_{1{\times}i_{l}}} \; .* \; W^{(l)}_{[k]}{^T}{_{i_{l}{\times}1}} )$$

$$
=
Sigmoid \left(
\underbrace{
\left[ \begin{array}{cccc} 1 & y^{(l-1)}_{0} & \cdots & y^{(l-1)}_{(o_{l-1}-1)}
\end{array} \right] _{1{\times}i_{l}}
}_{x^{(l)}}
\;\;\; .* \;\;\;
\underbrace{
\left[ \begin{array}{c} w^{(l)}_{[k],0} \\ w^{(l)}_{[k],1} \\ \vdots \\ w^{(l)}_{[k],i_{l}-1} \end{array} \right] _{i_{l}{\times}1}
}_{W^{(l)}_{[k]} {^{T}}}
\right)
$$

$$
= Sigmoid(1*w^{(l)}_{[k],0} \;\;+\;\; y^{(l-1)}_{0}*w^{(l)}_{[k],1} \;\;+\;\; ... \;\;+\;\; y^{(l-1)}_{o_{l-1}-1}*w^{(l)}_{[k],i_{l}-1})
$$

(We can see that the sigmoid of the dot product of $x$ and $W$ does indeed give the output of the neuron.)
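The single-neuron computation can be sketched in NumPy as follows. The weights `w_and` are hypothetical values picked by hand so that the neuron behaves roughly like an AND gate; they are an assumption of this example, not values taken from the text.

```python
import numpy as np


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def neuron_output(x, w):
    """Output of one neuron for one data point.

    x and w are both vectors of length i_l, and x[0] is the bias input 1.
    """
    return sigmoid(np.dot(x, w))


x = np.array([1.0, 0.0, 1.0])           # the data point (0, 1) with its bias term
w_and = np.array([-30.0, 20.0, 20.0])   # hypothetical AND-like weights (assumption)

y = neuron_output(x, w_and)
print(y)  # a value very close to 0, as expected for AND(0, 1)
```

With large-magnitude weights like these, the sigmoid saturates, so the output sits very close to 0 or 1.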

For $n$ data points, the output of the $k^{th}$ neuron in the $l^{th}$ layer is:
$$Y^{(l)}_{[k]} {_{n{\times}1}}
=
Sigmoid \left(
\underbrace{
\left[ \begin{array}{cccc} 
1 & {_{(0)}}y^{(l-1)}_{0} & \cdots & {_{(0)}}y^{(l-1)}_{(o_{l-1}-1)} \\
1 & {_{(1)}}y^{(l-1)}_{0} & \cdots & {_{(1)}}y^{(l-1)}_{(o_{l-1}-1)} \\
\vdots & \vdots & \ddots & \vdots \\
1 & {_{(n-1)}}y^{(l-1)}_{0} & \cdots & {_{(n-1)}}y^{(l-1)}_{(o_{l-1}-1)}
\end{array} \right] _{n{\times}i_{l}}
}_{X^{(l)}}
\; .* \;
\underbrace{
\left[ \begin{array}{c} w^{(l)}_{[k],0} \\ w^{(l)}_{[k],1} \\ \vdots \\ w^{(l)}_{[k],i_{l}-1} \end{array} \right] _{i_{l}{\times}1}
}_{W^{(l)}_{[k]} {^{T}}}
\right)
$$

$$
=
Sigmoid \left(
\left[ \begin{array}{c} 
1*w^{(l)}_{[k],0} \;\;+\;\; {_{(0)}}y^{(l-1)}_{0}*w^{(l)}_{[k],1} \;\;+\;\; ... \;\;+\;\; {_{(0)}}y^{(l-1)}_{o_{l-1}-1}*w^{(l)}_{[k],i_{l}-1} \\
1*w^{(l)}_{[k],0} \;\;+\;\; {_{(1)}}y^{(l-1)}_{0}*w^{(l)}_{[k],1} \;\;+\;\; ... \;\;+\;\; {_{(1)}}y^{(l-1)}_{o_{l-1}-1}*w^{(l)}_{[k],i_{l}-1} \\
\vdots \\
1*w^{(l)}_{[k],0} \;\;+\;\; {_{(n-1)}}y^{(l-1)}_{0}*w^{(l)}_{[k],1} \;\;+\;\; ... \;\;+\;\; {_{(n-1)}}y^{(l-1)}_{o_{l-1}-1}*w^{(l)}_{[k],i_{l}-1}
\end{array} \right] _{n{\times}1}
\right)
$$
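For $n$ data points, the same neuron can be evaluated in one matrix product. In the sketch below, the product written as `.*` in the formulas corresponds to NumPy's matrix multiplication operator `@`, not to element-wise multiplication. The weights are again hypothetical AND-like values chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


# Input matrix for the logic gates, bias column included: n x i_l = 4 x 3.
X = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]], dtype=float)

# Weight row of one neuron: 1 x i_l. Hypothetical AND-like weights (assumption).
W_k = np.array([[-30.0, 20.0, 20.0]])

Y_k = sigmoid(X @ W_k.T)       # (n x i_l) @ (i_l x 1) -> n x 1
print(Y_k.shape)               # (4, 1)
print(np.round(Y_k).ravel())   # [0. 0. 0. 1.]
```

After rounding, the column agrees with the AND gate output matrix given earlier.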

## Matrix representations - weight of a layer of neurons

Suppose the $l^{th}$ layer in a neural network has $o_{l}$ neurons.

Each neuron produces one number as its output: the sigmoid of the dot product of its weights and its inputs.

In matrix form, the weight matrix of the layer is:

$$
W^{(l)}_{o_{l}{\times}i_{l}} = \left[ \begin{array}{c} W^{(l)}_{[0]} \\ W^{(l)}_{[1]} \\ \vdots \\ W^{(l)}_{[o_{l}-1]} \end{array} \right] _{o_{l}{\times}i_{l}} 
= 
\left[ \begin{array}{c} 
w^{(l)}_{[0],0} & w^{(l)}_{[0],1} & w^{(l)}_{[0],2} & \cdots & w^{(l)}_{[0],i_{l}-1} \\ 
w^{(l)}_{[1],0} & w^{(l)}_{[1],1} & w^{(l)}_{[1],2} & \cdots & w^{(l)}_{[1],i_{l}-1} \\ 
\vdots & \vdots & \vdots & \ddots & \vdots \\ 
w^{(l)}_{[o_{l}-1],0} & w^{(l)}_{[o_{l}-1],1} & w^{(l)}_{[o_{l}-1],2} & \cdots & w^{(l)}_{[o_{l}-1],i_{l}-1} 
\end{array} \right] _{o_{l}{\times}i_{l}}
$$

The output of this layer of neurons is:

$$ Y^{(l)}_{n{\times}o_{l}} = Sigmoid\;(\;X^{(l)}_{n{\times}i_{l}} \; .* \; W^{(l)}{^{T}}_{i_{l}{\times}o_{l}} \;)\; $$

$$
\underbrace{
\left[ \begin{array}{ccc} 
{_{(0)}}y_{0}^{(l)} & \cdots & {_{(0)}}y_{o_{l}-1}^{(l)} \\ 
{_{(1)}}y_{0}^{(l)} & \cdots & {_{(1)}}y_{o_{l}-1}^{(l)} \\ 
\vdots & \ddots & \vdots \\ 
{_{(n-1)}}y_{0}^{(l)} & \cdots & {_{(n-1)}}y_{o_{l}-1}^{(l)} 
\end{array} \right] _{n{\times}o_{l}}
}_{Y^{(l)}}
=
Sigmoid \left(
\underbrace{
\left[ \begin{array}{cccc} 
1 & {_{(0)}}y^{(l-1)}_{0} & \cdots & {_{(0)}}y^{(l-1)}_{(o_{l-1}-1)} \\ 
1 & {_{(1)}}y^{(l-1)}_{0} & \cdots & {_{(1)}}y^{(l-1)}_{(o_{l-1}-1)} \\ 
\vdots & \vdots & \ddots & \vdots \\ 
1 & {_{(n-1)}}y^{(l-1)}_{0} & \cdots & {_{(n-1)}}y^{(l-1)}_{(o_{l-1}-1)} 
\end{array} \right] _{n{\times}i_{l}}
}_{X^{(l)}}
\; .* \;
\underbrace{
\left[ \begin{array}{cccc} 
w^{(l)}_{[0],0} & w^{(l)}_{[1],0} & \cdots & w^{(l)}_{[o_{l}-1],0} \\ 
w^{(l)}_{[0],1} & w^{(l)}_{[1],1} & \cdots & w^{(l)}_{[o_{l}-1],1} \\ 
\vdots & \vdots & \ddots & \vdots \\ 
w^{(l)}_{[0],i_{l}-1} & w^{(l)}_{[1],i_{l}-1} & \cdots & w^{(l)}_{[o_{l}-1],i_{l}-1} 
\end{array} \right] _{i_{l}{\times}o_{l}}
}_{W^{(l)}{^{T}}}
\right)
$$

$$
=
Sigmoid \left(
\left[ \begin{array}{c} 
1*w^{(l)}_{[0],0} + \cdots + {_{(0)}}y^{(l-1)}_{(o_{l-1}-1)}*w^{(l)}_{[0],i_{l}-1}
&
\cdots
&
1*w^{(l)}_{[o_{l}-1],0} + \cdots + {_{(0)}}y^{(l-1)}_{(o_{l-1}-1)}*w^{(l)}_{[o_{l}-1],i_{l}-1}
\\
\vdots & \ddots & \vdots
\\
1*w^{(l)}_{[0],0} + \cdots + {_{(n-1)}}y^{(l-1)}_{(o_{l-1}-1)}*w^{(l)}_{[0],i_{l}-1}
&
\cdots
&
1*w^{(l)}_{[o_{l}-1],0} + \cdots + {_{(n-1)}}y^{(l-1)}_{(o_{l-1}-1)}*w^{(l)}_{[o_{l}-1],i_{l}-1}
\end{array} \right] _{n{\times}o_{l}}
\right)
$$
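The layer formula can be checked numerically with a short NumPy sketch. The two weight rows below are hypothetical hand-picked values (an AND-like and a NOR-like neuron) used only to make the shapes concrete. The code also verifies that column $k$ of the layer output equals the output of neuron $k$ computed on its own.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


# Layer input with bias column: n x i_l = 4 x 3.
X = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]], dtype=float)

# Layer weights, one row per neuron: o_l x i_l = 2 x 3 (hypothetical values).
W = np.array([
    [-30.0,  20.0,  20.0],   # neuron 0: AND-like weights (assumption)
    [ 10.0, -20.0, -20.0],   # neuron 1: NOR-like weights (assumption)
])

Y = sigmoid(X @ W.T)         # (n x i_l) @ (i_l x o_l) -> n x o_l
print(Y.shape)               # (4, 2)

# Column k of Y is the output of neuron k evaluated separately.
for k in range(W.shape[0]):
    single = sigmoid(X @ W[[k]].T)        # n x 1
    assert np.allclose(Y[:, [k]], single)
```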

## Conclusion

We have seen that the action of a layer of a neural network can be written as the following matrix operation:

$$ Y^{(l)}_{n{\times}o_{l}} = Sigmoid\;(\;X^{(l)}_{n{\times}i_{l}} \; .* \; W^{(l)}{^{T}}_{i_{l}{\times}o_{l}} \;)\; $$

So, a neural network can be defined as the set of weight matrices $W^{(l)}_{o_{l}{\times}i_{l}}$ for all its layers, where $l$ is the index of the layer under consideration, and $i_{l}$ and $o_{l}$ are its input and output dimensions.

Also, because a bias term is added at every layer,

$$i_{l} = 1 + o_{(l-1)}$$

The utility of neural networks can be exploited only once the weight matrices $W^{(l)}_{o_{l}{\times}i_{l}}$ for all $l$ have been set according to need.
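To tie the pieces together, here is a minimal sketch of a forward pass through a network stored as a list of weight matrices, one $o_{l}{\times}i_{l}$ matrix per layer. The function name `forward` and the weights are assumptions made for this illustration: the weights are hand-picked so that a 2-neuron layer followed by a 1-neuron layer behaves like an XNOR gate, and they are not taken from the text.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(raw_inputs, weights):
    """Apply Y = Sigmoid(X @ W.T) layer by layer.

    raw_inputs: n x d_i array without bias column.
    weights: list of arrays, the l-th one of shape o_l x i_l.
    """
    Y = raw_inputs
    for W in weights:
        n = Y.shape[0]
        X = np.hstack([np.ones((n, 1)), Y])   # n x i_l, with i_l = 1 + o_(l-1)
        Y = sigmoid(X @ W.T)                  # n x o_l
    return Y


# Hypothetical hand-picked weights (assumption): layer 1 holds an AND-like and
# a NOR-like neuron, layer 2 holds an OR-like neuron, giving XNOR overall.
weights = [
    np.array([[-30.0, 20.0, 20.0],
              [10.0, -20.0, -20.0]]),   # o_1 x i_1 = 2 x 3
    np.array([[-10.0, 20.0, 20.0]]),    # o_2 x i_2 = 1 x 3
]

raw_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = forward(raw_inputs, weights)
print(np.round(Y).ravel())  # [1. 0. 0. 1.]
```

Storing one matrix per layer keeps the dimension rule $i_{l} = 1 + o_{(l-1)}$ visible in the shapes: each matrix has one more column than the previous matrix has rows.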