# Two-Layer RNN

$h$ = hidden size for 1st layer.

- Input $X_t \in \mathbb{R}^{n \times d}$
- First Layer Weights $W^{(1)} \in \mathbb{R}^{d \times h}$ 
- First Layer Hidden State Weights $W_{h}^{(1)} \in \mathbb{R}^{h \times h}$
- First Layer Bias $b^{(1)} \in \mathbb{R}^{1 \times h}$
- First Layer Output, $H_t^{(1)} \in \mathbb{R}^{n \times h}$
- First Layer Hidden State, $H_{t-1}^{(1)} \in \mathbb{R}^{n \times h}$

```math
H_t^{(1)} = \phi(X_tW^{(1)} + H_{t-1}^{(1)}W_h^{(1)} + b^{(1)})
```
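The first-layer update can be checked with a few lines of NumPy. This is a minimal sketch, not the `npRNNv2` implementation: the variable names (`W1`, `Wh1`, `b1`, `H_prev`) and the small random initialization are assumptions made for illustration, and $\phi$ is taken to be `tanh`.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, h = 1, 50, 10  # batch size, input dimension, first-layer hidden size

X_t = rng.standard_normal((n, d))          # input at time t
W1 = rng.standard_normal((d, h)) * 0.1     # input-to-hidden weights (illustrative init)
Wh1 = rng.standard_normal((h, h)) * 0.1    # hidden-to-hidden weights (illustrative init)
b1 = np.zeros((1, h))                      # bias, broadcast across the batch
H_prev = np.zeros((n, h))                  # previous hidden state, zero at t = 0

# (n, d) @ (d, h) + (n, h) @ (h, h) + (1, h) -> (n, h)
H_t = np.tanh(X_t @ W1 + H_prev @ Wh1 + b1)
print(H_t.shape)
```

Both matrix products produce an $n \times h$ result, so the sum is well defined and the bias broadcasts over the batch axis.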

$h_2$ = hidden size for 2nd layer.

- Input $H_t^{(1)} \in \mathbb{R}^{n \times h}$
- Second Layer Weights $W^{(2)} \in \mathbb{R}^{h \times h_2}$ 
- Second Layer Hidden State Weights $W_{h}^{(2)} \in \mathbb{R}^{h_2 \times h_2}$
- Second Layer Bias $b^{(2)} \in \mathbb{R}^{1 \times h_2}$
- Second Layer Output, $H_t^{(2)} \in \mathbb{R}^{n \times h_2}$
- Second Layer Hidden State, $H_{t-1}^{(2)} \in \mathbb{R}^{n \times h_2}$

The second layer takes the first layer's output at the same time step as its input:

```math
H_t^{(2)} = \phi(H_t^{(1)}W^{(2)} + H_{t-1}^{(2)} W_h^{(2)} + b^{(2)})
```
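Stacking the two updates gives one time step of the full network. The sketch below assumes `tanh` for both layers; `two_layer_step` and the `params` tuple are hypothetical names used only here and are not part of `npRNNv2`.

```python
import numpy as np

def two_layer_step(X_t, H1_prev, H2_prev, params):
    """Run one time step through both layers and return the new hidden states."""
    W1, Wh1, b1, W2, Wh2, b2 = params
    H1 = np.tanh(X_t @ W1 + H1_prev @ Wh1 + b1)   # first layer: (n, h)
    H2 = np.tanh(H1 @ W2 + H2_prev @ Wh2 + b2)    # second layer: (n, h2)
    return H1, H2

rng = np.random.default_rng(0)
n, d, h, h2 = 1, 50, 10, 20

# Illustrative parameters with the shapes listed above.
params = (
    rng.standard_normal((d, h)) * 0.1,    # W^(1)
    rng.standard_normal((h, h)) * 0.1,    # W_h^(1)
    np.zeros((1, h)),                     # b^(1)
    rng.standard_normal((h, h2)) * 0.1,   # W^(2)
    rng.standard_normal((h2, h2)) * 0.1,  # W_h^(2)
    np.zeros((1, h2)),                    # b^(2)
)

X_t = rng.standard_normal((n, d))
H1, H2 = two_layer_step(X_t, np.zeros((n, h)), np.zeros((n, h2)), params)
print(H1.shape, H2.shape)
```

The only coupling between the layers is that the second layer's input width must equal the first layer's hidden size $h$.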

In [2]:
import numpy as np
from npRNNv2 import RNN

np.random.seed(0)

## Initializing RNN.

In [2]:
h_units = (10, 20)
activation_funcs = ('tanh', 'tanh')
in_dim = 50
batch_size = 1

rnn = RNN(h_units, activation_funcs, in_dim, batch_size)

(10, 20)
[[array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])], [array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0.]])]]


## Verifying Internal Dims / Correct Implementation

In [3]:
weights = rnn.weight
h_weight = rnn.h_weight
bias = rnn.bias

In [4]:
print(len(weights))
print(len(h_weight))
print(len(bias))

2
2
2


$d = 50$<br/>
$h = 10$<br/>
$h_2 = 20$<br/>

In [5]:
print(f"Layer 1 Weights: {weights[0].shape}")
print(f"Layer 1 Hidden State Weights: {h_weight[0].shape}")
print(f"Layer 1 Bias: {bias[0].shape}")
print()
print(f"Layer 2 Weights: {weights[1].shape}")
print(f"Layer 2 Hidden State Weights: {h_weight[1].shape}")
print(f"Layer 2 Bias: {bias[1].shape}")

Layer 1 Weights: (50, 10)
Layer 1 Hidden State Weights: (10, 10)
Layer 1 Bias: (1, 10)

Layer 2 Weights: (10, 20)
Layer 2 Hidden State Weights: (20, 20)
Layer 2 Bias: (1, 20)
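The printed shapes follow a simple rule: each layer's input width is the previous layer's hidden size, with the first layer fed by `in_dim`. A small helper can derive the expected shapes for comparison. `expected_shapes` is a hypothetical function written for this check, not something `npRNNv2` provides.

```python
def expected_shapes(in_dim, h_units):
    """Return (weight, hidden-weight, bias) shapes for each stacked layer."""
    shapes = []
    fan_in = in_dim
    for units in h_units:
        shapes.append(((fan_in, units), (units, units), (1, units)))
        fan_in = units  # the next layer consumes this layer's output
    return shapes

for i, (w, wh, b) in enumerate(expected_shapes(50, (10, 20)), start=1):
    print(f"Layer {i}: weights {w}, hidden state weights {wh}, bias {b}")
```

The same tuples could be compared against `weights[i].shape`, `h_weight[i].shape`, and `bias[i].shape` in an `assert` instead of reading the printout by eye.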


Batch Size, $n = 1$<br/>
Input, $X \in \mathbb{R}^{n \times d} = \mathbb{R}^{1 \times 50}$

## Initializing Random Data and Testing Forward Pass

In [6]:
X = np.random.randn(1, 50)
rnn._forward(X)

In [7]:
print(f"Input: {X.shape}") 
print(f"Layer 1 Weight: {rnn.weight[0].shape}") 
print(f"Layer 1 Hidden Weight: {rnn.h_weight[0].shape}")
print(f"Layer 1 Output: {rnn.a[0].shape}")
print(f"")
print(f"Layer 2 Weight: {rnn.weight[1].shape}") 
print(f"Layer 2 Hidden Weight: {rnn.h_weight[1].shape}")
print(f"Layer 2 Output: {rnn.a[1].shape}")

Input: (1, 50)
Layer 1 Weight: (50, 10)
Layer 1 Hidden Weight: (10, 10)
Layer 1 Output: (1, 50, 10)

Layer 2 Weight: (10, 20)
Layer 2 Hidden Weight: (20, 20)
Layer 2 Output: (1, 50, 20)


### Now for multiple time steps

```math
\mathcal{X} = \begin{bmatrix} X_1 & X_2 & \dots & X_S \end{bmatrix}

\\[3mm]

\mathcal{X} = \begin{bmatrix} 
\begin{bmatrix} .2 & .3 & \dots & x_{1,d} \end{bmatrix} & 
\begin{bmatrix} .5 & .2 & \dots & x_{2,d} \end{bmatrix} & 
\cdots & 
\begin{bmatrix} .9 & .1 & \dots & x_{S,d} \end{bmatrix}
\end{bmatrix} \in \mathbb{R}^{1 \times d \times S}

```

where $d$ is the dimensionality of the embedding space and $S$ is the fixed sequence length (the number of time steps).

We'll use:

$d = 50$ <br/>
$S = 5$
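With a sequence stored as $\mathbb{R}^{1 \times d \times S}$, a forward pass slices one time step at a time along the last axis and carries both hidden states forward. The loop below is a sketch under that assumption, with `tanh` activations and illustrative parameter names. It is not a description of how `npRNNv2` unrolls internally.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, S = 1, 50, 5   # batch size, embedding dimension, sequence length
h, h2 = 10, 20       # hidden sizes of the two layers

X = rng.standard_normal((n, d, S))  # time is the last axis

# Illustrative parameters (names and init scale are assumptions).
W1, Wh1, b1 = rng.standard_normal((d, h)) * 0.1, rng.standard_normal((h, h)) * 0.1, np.zeros((1, h))
W2, Wh2, b2 = rng.standard_normal((h, h2)) * 0.1, rng.standard_normal((h2, h2)) * 0.1, np.zeros((1, h2))

H1 = np.zeros((n, h))    # first-layer hidden state at t = 0
H2 = np.zeros((n, h2))   # second-layer hidden state at t = 0
outputs = []

for t in range(S):
    X_t = X[:, :, t]                              # slice one step: (n, d)
    H1 = np.tanh(X_t @ W1 + H1 @ Wh1 + b1)        # first layer uses its own previous state
    H2 = np.tanh(H1 @ W2 + H2 @ Wh2 + b2)         # second layer reads H1 from the same step
    outputs.append(H2)

outputs = np.stack(outputs, axis=-1)  # collect per-step outputs: (n, h2, S)
print(outputs.shape)
```

Keeping time on the last axis of `outputs` mirrors the layout of `X`, so input and output sequences can be indexed the same way.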

In [52]:
np.random.seed(0)

X = np.random.randn(1, 50, 5) # sequence length of 5 or time steps = 5

h_units = (10, 20)
activation_funcs = ('tanh', 'tanh')
in_dim = 50
seq_len = 5  # must match the last axis of X (S = 5)
batch_size = 1

rnn = RNN(h_units, activation_funcs, in_dim, seq_len, batch_size)
rnn._forward(X)

(2, 3)
[[array([[0., 0.]]), array([[0., 0.]])], [array([[0., 0., 0.]]), array([[0., 0., 0.]])]]


In [4]:
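# np.empty allocates without initializing: the contents are arbitrary,
# so the zeros printed below are incidental, not guaranteed.
array = np.empty(shape = (2, 2))
array_2 = np.random.randn(2, 2)

print(array)
print(array_2)

# `*` is the element-wise product, not a matrix product.
print(array * array_2)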

[[0. 0.]
 [0. 0.]]
[[ 1.86755799 -0.97727788]
 [ 0.95008842 -0.15135721]]
[[ 0. -0.]
 [ 0. -0.]]
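If a zero-filled buffer is what is wanted, `np.zeros` (or an explicit fill after `np.empty`) makes that intent deterministic. The sketch below also shows where the `-0.` entries come from: under IEEE 754, zero times a negative number is negative zero. The `factors` values are made up for illustration.

```python
import numpy as np

zeros = np.zeros((2, 2))     # guaranteed to be all zeros

scratch = np.empty((2, 2))   # arbitrary contents until written
scratch[:] = 0.0             # explicit fill makes it safe to read

factors = np.array([[1.5, -2.0],
                    [0.5, -0.25]])  # illustrative values with mixed signs

product = zeros * factors    # element-wise product
print(product)
print(np.signbit(product))   # True where the result is negative zero
```

Negative zero compares equal to zero, so the sign only shows up in the printed representation and in `np.signbit`.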
