# Figuring Out How Bidirectional RNNs Work in PyTorch

In [1]:
import numpy as np
import torch, torch.nn as nn
from torch.autograd import Variable

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

### Initialize Input Sequence Randomly
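For demonstration purposes, we feed the RNNs a single sequence of length 5 with one feature per timestep. Since the layers below use `batch_first=False`, the tensor shape is `(seq_len, batch, input_size)`, i.e. `(5, 1, 1)`.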

In [10]:
# Plain tensors replace the deprecated Variable wrapper (PyTorch >= 0.4).
random_input = torch.randn(5, 1, 1)  # (seq_len, batch, input_size)
random_input[:, 0, 0]

tensor([-0.4637,  0.4782,  0.3996, -0.0229, -1.9840])

### Initialize a Bidirectional GRU Layer

In [11]:
bi_grus = torch.nn.GRU(input_size=1, hidden_size=1, num_layers=1, batch_first=False, bidirectional=True)

### Initialize a GRU Layer (for Feeding the Sequence in Reverse)

In [12]:
reverse_gru = torch.nn.GRU(input_size=1, hidden_size=1, num_layers=1, batch_first=False, bidirectional=False)

Now make sure the weights of the reverse GRU layer match those of the bidirectional layer's reverse direction:

In [13]:
reverse_gru.weight_ih_l0 = bi_grus.weight_ih_l0_reverse
reverse_gru.weight_hh_l0 = bi_grus.weight_hh_l0_reverse
reverse_gru.bias_ih_l0 = bi_grus.bias_ih_l0_reverse
reverse_gru.bias_hh_l0 = bi_grus.bias_hh_l0_reverse
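Note that the assignments above make both modules share the same `Parameter` objects. If independent copies are preferred, one possible alternative is to copy the values through the state dict. This is a minimal sketch, not part of the original notebook; the names `bi_gru`, `rev_gru`, and `reverse_state` are made up for illustration, and it assumes the reverse-direction parameters carry the `_reverse` suffix as in the attribute names used above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bi_gru = nn.GRU(input_size=1, hidden_size=1, bidirectional=True)
rev_gru = nn.GRU(input_size=1, hidden_size=1, bidirectional=False)

# Keep only the reverse-direction entries and strip the "_reverse" suffix,
# e.g. "weight_ih_l0_reverse" -> "weight_ih_l0". clone() gives independent copies.
suffix = "_reverse"
reverse_state = {
    name[: -len(suffix)]: tensor.clone()
    for name, tensor in bi_gru.state_dict().items()
    if name.endswith(suffix)
}
rev_gru.load_state_dict(reverse_state)

same_values = torch.equal(rev_gru.weight_ih_l0, bi_gru.weight_ih_l0_reverse)
shared_object = rev_gru.weight_ih_l0 is bi_gru.weight_ih_l0_reverse
```

Here the two layers hold equal values but separate tensors, so training one would not silently change the other.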

### Feed Input Sequence into Both Networks

In [14]:
bi_output, bi_hidden = bi_grus(random_input)

In [15]:
# Reverse the sequence along the time dimension (dim 0) before feeding it in.
reverse_output, reverse_hidden = reverse_gru(torch.flip(random_input, dims=[0]))

### Check Outputs

In [16]:
reverse_output[:, 0, 0]

tensor([ 0.4459, -0.2399, -0.5074, -0.5941, -0.2412], grad_fn=<SelectBackward>)

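The outputs of the reverse direction sit in the [latter half of the output](https://discuss.pytorch.org/t/get-forward-and-backward-output-seperately-from-bidirectional-rnn/2523) along the last dimension. They are aligned with the input timesteps, so they appear in the opposite order from `reverse_output` above: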

In [17]:
bi_output[:, 0, 1]

tensor([-0.2412, -0.5941, -0.5074, -0.2399,  0.4459], grad_fn=<SelectBackward>)
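Rather than comparing the printed numbers by eye, the match can be checked programmatically. The following is a self-contained sketch under the same setup (one sequence, one feature, hidden size 1); the variable names and the fixed seed are assumptions added for reproducibility, and the weights are copied in place instead of reassigned.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(5, 1, 1)  # (seq_len, batch, input_size)

bi_gru = nn.GRU(input_size=1, hidden_size=1, bidirectional=True)
rev_gru = nn.GRU(input_size=1, hidden_size=1, bidirectional=False)

with torch.no_grad():
    # Copy the reverse-direction weights of the bidirectional layer.
    for name in ("weight_ih_l0", "weight_hh_l0", "bias_ih_l0", "bias_hh_l0"):
        getattr(rev_gru, name).copy_(getattr(bi_gru, name + "_reverse"))
    bi_out, _ = bi_gru(x)
    rev_out, _ = rev_gru(torch.flip(x, dims=[0]))  # feed the sequence backwards

# Flip the standalone outputs back so that index t refers to input timestep t.
realigned = torch.flip(rev_out[:, 0, 0], dims=[0])
outputs_match = torch.allclose(realigned, bi_out[:, 0, 1], atol=1e-6)
```

`outputs_match` being true confirms that the second half of the bidirectional output is the reverse GRU's output, re-ordered to line up with the input timesteps.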

### Check Hidden States

In [18]:
reverse_hidden

tensor([[[-0.2412]]], grad_fn=<StackBackward>)

The hidden states of the reverse direction sit at [the odd indices in the first dimension](https://discuss.pytorch.org/t/how-can-i-know-which-part-of-h-n-of-bidirectional-rnn-is-for-backward-process/3883/4), which for a single layer is index 1:

In [19]:
bi_hidden[1]

tensor([[-0.2412]], grad_fn=<SelectBackward>)
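The same relationship can be read off the bidirectional layer alone, without a second GRU: for a single layer, the final hidden state of each direction should equal a slice of the output tensor. The sketch below uses a larger hidden size and batch to make the slicing visible; `seq_len`, `batch`, `hidden_size`, and the other names are illustrative choices, not from the original notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, hidden_size = 5, 2, 3
gru = nn.GRU(input_size=1, hidden_size=hidden_size, num_layers=1, bidirectional=True)
x = torch.randn(seq_len, batch, 1)

with torch.no_grad():
    # output: (seq_len, batch, 2 * hidden_size); h_n: (2, batch, hidden_size)
    output, h_n = gru(x)

# Forward direction finishes at the last timestep and fills the first half of the features.
forward_final = output[-1, :, :hidden_size]
# Reverse direction finishes at the first timestep and fills the second half.
reverse_final = output[0, :, hidden_size:]

forward_matches = torch.allclose(h_n[0], forward_final)
reverse_matches = torch.allclose(h_n[1], reverse_final)
```

This only holds as written for `num_layers=1`; with stacked layers, `output` comes from the top layer while `h_n` contains one pair of entries per layer.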

## Conclusion

1. The output of a bidirectional RNN at timestep t is the concatenation of the outputs of the normal (forward) and reverse RNN units at timestep t, where the normal RNN has seen inputs 1...t and the reverse RNN has seen inputs t...n (n being the length of the sequence).
2. The returned hidden state of a bidirectional RNN is the hidden state of each direction after the whole sequence has been consumed. For the normal RNN that is after timestep n; for the reverse RNN it is after timestep 1.
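The first conclusion can also be probed with a small perturbation experiment: change the input at one timestep `k` and see which output positions move. This is a hedged sketch, not from the original notebook; `k`, the perturbation size, and the `1e-7` threshold are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, hidden_size = 5, 3
gru = nn.GRU(input_size=1, hidden_size=hidden_size, bidirectional=True)

x = torch.randn(seq_len, 1, 1)
k = 2                      # 0-based timestep to perturb
x_perturbed = x.clone()
x_perturbed[k] += 1.0

with torch.no_grad():
    out, _ = gru(x)
    out_perturbed, _ = gru(x_perturbed)

# Mark every output element that moved by more than a tiny threshold.
changed = (out - out_perturbed).abs() > 1e-7          # (seq_len, batch, 2 * hidden_size)
forward_changed = changed[:, 0, :hidden_size].any(dim=1)  # per timestep, forward half
reverse_changed = changed[:, 0, hidden_size:].any(dim=1)  # per timestep, reverse half
```

Because the forward direction at timestep t depends only on inputs up to t, `forward_changed` should be false before `k`; because the reverse direction depends only on inputs from t onward, `reverse_changed` should be false after `k`.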