# PyTorch Basics - RNN

### By [Akshaj Verma](https://akshajverma.com)

This notebook walks you through the basics of RNNs in PyTorch, along with a small demo model.

In [1]:
import numpy as np

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

torch.manual_seed(0)

<torch._C.Generator at 0x7f8128032f70>

## Types of RNNs

RNNs are mainly used for sequential data such as time series or text (NLP). There are several types of RNN architectures, each suited to different applications.  

<img src="../../assets/rnn_karpathy.jpeg" />

For time series, we might use a many-to-many or a many-to-one network, depending upon how many time-steps we are predicting. We can either predict 1 time-step into the future based on previous data, or predict multiple time-steps into the future based on previous data.  

In NLP, typical applications include:  

* Text Classification: many-to-one
* Text Generation: many-to-many
* Machine Translation: many-to-many
* Named Entity Recognition: many-to-many
* Image Captioning: one-to-many






### many-to-many

In case of many-to-many networks, we care about all the outputs in the network.

<img src="../../assets/rnn_many_to_many.jpg" />


### many-to-one

In case of many-to-one networks, we only care about the final output and not the intermediate outputs in the network.

<img src="../../assets/rnn_many_to_one.jpg" />
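In tensor terms, the difference between the two setups comes down to which time steps of the RNN output we keep. The following is a minimal sketch, assuming a single-layer `nn.RNN` with `batch_first=True`; the sizes (4 sequences, 5 time steps, 1 feature, hidden size 2) and the variable names are illustrative choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes: 4 sequences, 5 time steps, 1 feature per step.
rnn = nn.RNN(input_size=1, hidden_size=2, batch_first=True)
x = torch.arange(1.0, 21.0).view(4, 5, 1)

out, hidden = rnn(x)

# many-to-many: keep the output at every time step.
many_to_many = out
# many-to-one: keep only the output at the final time step.
many_to_one = out[:, -1, :]

print(many_to_many.shape)  # torch.Size([4, 5, 2])
print(many_to_one.shape)   # torch.Size([4, 2])
```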



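### Stacked RNNs

We often stack RNN layers on top of each other, so that each layer takes the previous layer's outputs as its input. This can help the model learn more complex patterns.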

<img src="../../assets/stacked_lstm.png" />


### Bidirectional RNN

<img src="../../assets/birnn.png" />


This notebook explains RNNs in PyTorch using a watered-down version of time series forecasting. We will build a many-to-many model to predict numbers. This can be converted to a many-to-one model by simply changing the hidden size. 

## Input Data

Here the data is:   
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]  

We divide it into 4 batches of sequence length = 5.  
[[1, 2, 3, 4, 5],  
[6, 7, 8, 9, 10],  
[11, 12, 13, 14, 15],  
[16, 17, 18, 19, 20]]


Batch Size = 4  
Sequence length = 5  
Input size = 1 (since there is only one feature)  

`hidden_size` is the number of features in the hidden state, which is also the size of the RNN's output at each time step.  

In our case, we're looking at the 5 (seq_len) previous values to predict the next 2 (hidden_size) values.

In [2]:
# data = torch.Tensor([[1, 2, 3, 4, 5],
#                     [6, 7, 8, 9, 10],
#                     [11, 12, 13, 14, 15],
#                     [16, 17, 18, 19, 20]])

data = torch.Tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

print("Data:\n", data, "\n")
print("Data Shape: \n", data.shape)

Data:
 tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13., 14.,
        15., 16., 17., 18., 19., 20.]) 

Data Shape: 
 torch.Size([20])


In [3]:
# Number of features used as input. (Number of columns)
INPUT_SIZE = 1

# Number of previous time stamps taken into account.
SEQ_LENGTH = 5

# Number of output units.
HIDDEN_SIZE = 2

# Number of stacked rnn layers.
NUM_LAYERS = 1

# We have a total of 20 rows in our input.
# We divide them into 4 batches of 5 rows each.
BATCH_SIZE = 4

## RNN

The RNN has two outputs - `out` and `hidden`.   

* `out` is the output of the last RNN layer at every time step.
* `hidden` is the hidden state of each layer at the final time step.  


If we don't initialize the hidden state, PyTorch automatically initialises it to all zeros.
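The default zero initialisation can be checked directly. The sketch below, assuming the same sizes as this notebook (batch of 4, sequence length 5, hidden size 2), compares a call without an initial state against a call with an explicit all-zeros state; the names `h0`, `out_default` and `out_zeros` are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.RNN(input_size=1, hidden_size=2, num_layers=1, batch_first=True)
x = torch.arange(1.0, 21.0).view(4, 5, 1)

# Explicit zero state with shape (num_layers * num_directions, batch, hidden_size).
h0 = torch.zeros(1, 4, 2)

out_default, hidden_default = rnn(x)    # no initial state passed
out_zeros, hidden_zeros = rnn(x, h0)    # explicit zeros passed

print(torch.allclose(out_default, out_zeros))        # True
print(torch.allclose(hidden_default, hidden_zeros))  # True
```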

In [4]:
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, num_layers = 1, batch_first=True)

# input size : (batch_size , seq_length, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)

# hidden state size : (num_layers * num_directions, batch, hidden_size)
# hidden = torch.randn(NUM_LAYERS, BATCH_SIZE, HIDDEN_SIZE)

out, hidden = rnn(inputs)

In the output below, notice that the last row of each batch in `out` is present in `hidden`. `out` contains the outputs at every time step.  

We have 4 batches in our output because we set `BATCH_SIZE=4`. Each batch contains 5 rows because our `SEQ_LENGTH = 5`. Each row has 2 columns because `HIDDEN_SIZE=2`, i.e. the number of future time-step predictions.

In [5]:
print('Input:\n', inputs)
print('\nOutput:\n', out)
print('\nHidden:\n', hidden)

Input:
 tensor([[[ 1.],
         [ 2.],
         [ 3.],
         [ 4.],
         [ 5.]],

        [[ 6.],
         [ 7.],
         [ 8.],
         [ 9.],
         [10.]],

        [[11.],
         [12.],
         [13.],
         [14.],
         [15.]],

        [[16.],
         [17.],
         [18.],
         [19.],
         [20.]]])

Output:
 tensor([[[-0.0819,  0.8100],
         [-0.4311,  0.9332],
         [-0.3162,  0.9748],
         [-0.3979,  0.9875],
         [-0.3675,  0.9944]],

        [[-0.1081,  0.9953],
         [-0.5145,  0.9986],
         [-0.3269,  0.9995],
         [-0.4254,  0.9997],
         [-0.3820,  0.9999]],

        [[-0.1342,  0.9999],
         [-0.5245,  1.0000],
         [-0.3458,  1.0000],
         [-0.4382,  1.0000],
         [-0.3982,  1.0000]],

        [[-0.1601,  1.0000],
         [-0.5328,  1.0000],
         [-0.3648,  1.0000],
         [-0.4506,  1.0000],
         [-0.4143,  1.0000]]], grad_fn=<TransposeBackward1>)

Hidden:
 tensor([[[-0.3675,  0.9944

## Stacked RNN

If we set `num_layers = 3`, we will have 3 RNN layers stacked on top of each other. See how the `hidden` and `out` tensors change.   

The `hidden` tensor now has 3 batches, one per layer. The last one contains the end-rows of each batch in the `out` tensor.
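The relationship between the two tensors can be verified with a quick shape and equality check. This is a small sketch assuming the same sizes as above; only the last entry of `hidden` is expected to match the final time step of the output, since the output comes from the top layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.RNN(input_size=1, hidden_size=2, num_layers=3, batch_first=True)
x = torch.arange(1.0, 21.0).view(4, 5, 1)

out, hidden = rnn(x)

print(out.shape)     # torch.Size([4, 5, 2]) -> outputs of the top layer only
print(hidden.shape)  # torch.Size([3, 4, 2]) -> one final state per layer

# The final state of the top layer equals the output at the last time step.
print(torch.allclose(hidden[-1], out[:, -1, :]))  # True
```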

In [6]:
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, num_layers = 3, batch_first=True)

# input size : (batch_size , seq_length, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)

# hidden state size : (num_layers * num_directions, batch, hidden_size)
# hidden = torch.randn(3, BATCH_SIZE, HIDDEN_SIZE)

out, hidden = rnn(inputs)

In [7]:
print('Input:\n', inputs)
print('\nOutput:\n', out)
print('\nHidden:\n', hidden)

Input:
 tensor([[[ 1.],
         [ 2.],
         [ 3.],
         [ 4.],
         [ 5.]],

        [[ 6.],
         [ 7.],
         [ 8.],
         [ 9.],
         [10.]],

        [[11.],
         [12.],
         [13.],
         [14.],
         [15.]],

        [[16.],
         [17.],
         [18.],
         [19.],
         [20.]]])

Output:
 tensor([[[ 0.3144, -0.7527],
         [-0.0597, -0.6038],
         [ 0.0896, -0.7646],
         [ 0.0608, -0.6358],
         [ 0.1084, -0.6783]],

        [[ 0.4442, -0.6350],
         [ 0.0949, -0.3948],
         [ 0.2715, -0.5962],
         [ 0.1819, -0.4580],
         [ 0.2529, -0.5213]],

        [[ 0.4907, -0.5688],
         [ 0.1671, -0.2976],
         [ 0.3462, -0.4922],
         [ 0.2388, -0.3768],
         [ 0.3078, -0.4418]],

        [[ 0.5041, -0.5466],
         [ 0.1883, -0.2675],
         [ 0.3684, -0.4576],
         [ 0.2572, -0.3502],
         [ 0.3238, -0.4167]]], grad_fn=<TransposeBackward1>)

Hidden:
 tensor([[[-0.6480, -0.4044

## Bidirectional RNN

In [8]:
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, num_layers = 1, batch_first=True, bidirectional = True)

# input size : (batch_size , seq_length, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)

# hidden state size : (num_layers * num_directions, batch, hidden_size)
# hidden = torch.randn(2, BATCH_SIZE, HIDDEN_SIZE)

out, hidden = rnn(inputs)
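For a bidirectional RNN, the output concatenates the forward and backward directions along the last dimension, while the final hidden state keeps one entry per direction along the first dimension. The sketch below, assuming a single layer and the same sizes as above, shows where each direction's final state appears in the output; `H`, `forward_final` and `backward_final` are illustrative names.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

H = 2  # hidden size
rnn = nn.RNN(input_size=1, hidden_size=H, num_layers=1,
             batch_first=True, bidirectional=True)
x = torch.arange(1.0, 21.0).view(4, 5, 1)

out, hidden = rnn(x)

print(out.shape)     # torch.Size([4, 5, 4]) -> forward and backward concatenated
print(hidden.shape)  # torch.Size([2, 4, 2]) -> one final state per direction

# The forward direction finishes at the last time step.
forward_final = out[:, -1, :H]
# The backward direction finishes at the first time step.
backward_final = out[:, 0, H:]

print(torch.allclose(hidden[0], forward_final))   # True
print(torch.allclose(hidden[1], backward_final))  # True
```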

In [9]:
print('Input:\n', inputs)
print('\nOutput:\n', out)
# print(out.contiguous().view(SEQ_LENGTH, BATCH_SIZE, 2, HIDDEN_SIZE))
print("\nForward Outputs: \n", out[:, :, :HIDDEN_SIZE])
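print("\nBackward Outputs: \n", out[:, :, HIDDEN_SIZE:])
print('\nHidden:\n', hidden)
# hidden has shape (num_layers * num_directions, batch, hidden_size),
# so the direction is selected on the first dimension.
print("\nForward Hidden: \n", hidden[0])
print("\nBackward Hidden: \n", hidden[1])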

Input:
 tensor([[[ 1.],
         [ 2.],
         [ 3.],
         [ 4.],
         [ 5.]],

        [[ 6.],
         [ 7.],
         [ 8.],
         [ 9.],
         [10.]],

        [[11.],
         [12.],
         [13.],
         [14.],
         [15.]],

        [[16.],
         [17.],
         [18.],
         [19.],
         [20.]]])

Output:
 tensor([[[ 0.2184,  0.4086,  0.6418, -0.1677],
         [-0.0222, -0.0095,  0.8794, -0.4927],
         [-0.6716, -0.2802,  0.9585, -0.7248],
         [-0.9387, -0.4152,  0.9846, -0.8646],
         [-0.9841, -0.6164,  0.9789, -0.9192]],

        [[-0.9813, -0.8829,  0.9979, -0.9721],
         [-0.9986, -0.8902,  0.9992, -0.9877],
         [-0.9995, -0.9449,  0.9997, -0.9946],
         [-0.9998, -0.9729,  0.9999, -0.9977],
         [-0.9999, -0.9868,  0.9998, -0.9987]],

        [[-0.9999, -0.9968,  1.0000, -0.9996],
         [-1.0000, -0.9969,  1.0000, -0.9998],
         [-1.0000, -0.9985,  1.0000, -0.9999],
         [-1.0000, -0.9993,  1.0000, -1

## Simple LSTM

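The LSTM has two outputs - `out` and `(h_n, c_n)` (the hidden and cell states).   

* `out` is the output of the last LSTM layer at every time step.
* `h_n` and `c_n` are the hidden state and cell state of each layer at the final time step.  


If we don't initialize the hidden and cell states, PyTorch automatically initialises them to all zeros. Here, we initialise them randomly instead.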

In [10]:
lstm = nn.LSTM(input_size = INPUT_SIZE, hidden_size = HIDDEN_SIZE, num_layers = NUM_LAYERS, batch_first = True)

# input size = (batch_size , seq_length, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)

# hidden state size = (num_layers * num_directions, batch, hidden_size)
h_0 = torch.randn(NUM_LAYERS, BATCH_SIZE, HIDDEN_SIZE)
c_0 = torch.randn(NUM_LAYERS, BATCH_SIZE, HIDDEN_SIZE)

out, (h_n, c_n) = lstm(inputs, (h_0, c_0))
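One way to see what the `(h_n, c_n)` pair represents is to process a sequence in two chunks and carry the state across. The sketch below assumes the same sizes as above and an arbitrary split after the third time step; feeding the returned state into the second call should reproduce the result of a single pass over the full sequence.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm = nn.LSTM(input_size=1, hidden_size=2, num_layers=1, batch_first=True)
x = torch.arange(1.0, 21.0).view(4, 5, 1)

# Single pass over all 5 time steps.
out_full, (h_full, c_full) = lstm(x)

# Two passes: the first 3 steps, then the remaining 2 with the carried state.
out_a, state = lstm(x[:, :3, :].contiguous())
out_b, (h_b, c_b) = lstm(x[:, 3:, :].contiguous(), state)
out_chunked = torch.cat([out_a, out_b], dim=1)

print(torch.allclose(out_full, out_chunked, atol=1e-5))
print(torch.allclose(h_full, h_b, atol=1e-5))
print(torch.allclose(c_full, c_b, atol=1e-5))
```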

In [11]:
print('Input:\n', inputs)
print('\nOutput:\n', out)
print('\nh_n:\n', h_n)
print('\nc_n:\n', c_n)

Input:
 tensor([[[ 1.],
         [ 2.],
         [ 3.],
         [ 4.],
         [ 5.]],

        [[ 6.],
         [ 7.],
         [ 8.],
         [ 9.],
         [10.]],

        [[11.],
         [12.],
         [13.],
         [14.],
         [15.]],

        [[16.],
         [17.],
         [18.],
         [19.],
         [20.]]])

Output:
 tensor([[[ 1.0726e-01,  3.0418e-03],
         [ 9.1427e-02,  9.1830e-02],
         [ 5.8402e-02,  1.0662e-01],
         [ 3.5712e-02,  9.9316e-02],
         [ 2.1396e-02,  8.5975e-02]],

        [[ 1.5140e-02,  1.2776e-01],
         [ 7.5433e-03,  7.4776e-02],
         [ 4.3969e-03,  5.9002e-02],
         [ 2.5599e-03,  4.5910e-02],
         [ 1.4913e-03,  3.5681e-02]],

        [[ 5.0291e-04, -9.5297e-03],
         [ 4.6364e-04,  2.4555e-04],
         [ 2.8614e-04,  3.5727e-03],
         [ 1.6931e-04,  4.8141e-03],
         [ 9.9062e-05,  4.9508e-03]],

        [[ 6.7911e-06,  6.3028e-03],
         [ 2.7046e-05,  5.8015e-03],
         [ 1.8402e-

## Simple GRU

The GRU has two outputs - `out` and `hidden`.   

* `out` is the output of the last GRU layer at every time step.
* `hidden` is the hidden state of each layer at the final time step.  


If we don't initialize the hidden state, PyTorch automatically initialises it to all zeros. Here, we initialise it randomly instead.
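Although `nn.RNN`, `nn.LSTM` and `nn.GRU` share the same calling convention, they differ in how many weights they hold, because the LSTM and GRU add gates. As a rough sketch, assuming `input_size=1` and `hidden_size=2` as in this notebook, the helper `count_params` (a hypothetical name) counts the trainable parameters of each layer type.

```python
import torch.nn as nn

def count_params(module):
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

rnn = nn.RNN(input_size=1, hidden_size=2, batch_first=True)
lstm = nn.LSTM(input_size=1, hidden_size=2, batch_first=True)
gru = nn.GRU(input_size=1, hidden_size=2, batch_first=True)

rnn_params = count_params(rnn)
lstm_params = count_params(lstm)
gru_params = count_params(gru)

print(rnn_params)   # 10
print(lstm_params)  # 40 -> four gate blocks, 4x the plain RNN
print(gru_params)   # 30 -> three gate blocks, 3x the plain RNN
```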

In [12]:
gru = nn.GRU(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, num_layers = NUM_LAYERS, batch_first=True)

# input size = (batch_size , seq_length, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)

# hidden state size = (num_layers * num_directions, batch, hidden_size)
hidden = torch.randn(NUM_LAYERS, BATCH_SIZE, HIDDEN_SIZE)

out, hidden = gru(inputs, hidden)

In [13]:
print('Input:\n', inputs)
print('\nOutput:\n', out)
print('\nHidden:\n', hidden)

Input:
 tensor([[[ 1.],
         [ 2.],
         [ 3.],
         [ 4.],
         [ 5.]],

        [[ 6.],
         [ 7.],
         [ 8.],
         [ 9.],
         [10.]],

        [[11.],
         [12.],
         [13.],
         [14.],
         [15.]],

        [[16.],
         [17.],
         [18.],
         [19.],
         [20.]]])

Output:
 tensor([[[ 0.1936,  0.3611],
         [ 0.4350,  0.4892],
         [ 0.6126,  0.6602],
         [ 0.7295,  0.7821],
         [ 0.8052,  0.8545]],

        [[-0.1884, -0.0280],
         [ 0.2012,  0.0870],
         [ 0.4086,  0.1688],
         [ 0.5362,  0.2237],
         [ 0.6212,  0.2597]],

        [[-0.3989,  0.7967],
         [-0.1058,  0.8006],
         [ 0.0780,  0.8033],
         [ 0.2040,  0.8052],
         [ 0.2949,  0.8064]],

        [[ 0.3991,  1.0043],
         [ 0.4479,  1.0043],
         [ 0.4863,  1.0042],
         [ 0.5170,  1.0042],
         [ 0.5419,  1.0042]]], grad_fn=<TransposeBackward1>)

Hidden:
 tensor([[[0.8052, 0.8545],

## End-to-End RNN Example [Time Series]

This example is just to give you an idea of what training an RNN in PyTorch looks like.

### Train/Test Data

In [14]:
train_data = []
test_data = []

for i in range(101):
    train_data.append(i)
    
for i in range(101, 131):
    test_data.append(i)

In [15]:
print("Train data: \n", train_data)
print("\nTest data: \n", test_data)

Train data: 
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]

Test data: 
 [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130]


#### Create Train Data Structure  

Here, we will look at 10 previous time-steps to predict 1 time-step in the future.

In [16]:
X_train = []
y_train = []

look_back = 10
predict_ahead = 1
num_features = 1

for i in range(len(train_data) - look_back - predict_ahead):
    X_train.append(train_data[i: i + look_back])
    y_train.append(train_data[i + look_back: i + look_back + predict_ahead])

X_train, y_train = np.array(X_train), np.array(y_train)

In [17]:
print("First 10 rows of X_train: \n", X_train[0:10, ])

print("\nFirst 10 rows of y_train: \n", y_train[0:10, ])

First 10 rows of X_train: 
 [[ 0  1  2  3  4  5  6  7  8  9]
 [ 1  2  3  4  5  6  7  8  9 10]
 [ 2  3  4  5  6  7  8  9 10 11]
 [ 3  4  5  6  7  8  9 10 11 12]
 [ 4  5  6  7  8  9 10 11 12 13]
 [ 5  6  7  8  9 10 11 12 13 14]
 [ 6  7  8  9 10 11 12 13 14 15]
 [ 7  8  9 10 11 12 13 14 15 16]
 [ 8  9 10 11 12 13 14 15 16 17]
 [ 9 10 11 12 13 14 15 16 17 18]]

First 10 rows of y_train: 
 [[10]
 [11]
 [12]
 [13]
 [14]
 [15]
 [16]
 [17]
 [18]
 [19]]


In [18]:
print("X_train.shape: ", X_train.shape)
print("y_train.shape: ", y_train.shape)

X_train.shape:  (90, 10)
y_train.shape:  (90, 1)
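The same windowing logic is needed for both the train and the test split, so it can be wrapped in a small helper. The function below is a sketch; `make_windows` is a hypothetical name, and its loop bound deliberately mirrors the loop used in this notebook, which stops one window short of the end of the series.

```python
import numpy as np

def make_windows(series, look_back=10, predict_ahead=1):
    """Slice a 1-D series into (input window, target) pairs."""
    X, y = [], []
    # Same bound as the notebook's loop.
    for i in range(len(series) - look_back - predict_ahead):
        X.append(series[i: i + look_back])
        y.append(series[i + look_back: i + look_back + predict_ahead])
    return np.array(X), np.array(y)

X_demo, y_demo = make_windows(list(range(101)))

print(X_demo.shape)  # (90, 10)
print(y_demo.shape)  # (90, 1)
print(X_demo[0], y_demo[0])  # [0 1 2 3 4 5 6 7 8 9] [10]
```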


#### Create Test Data Structure

In [19]:
X_test = []
y_test = []

look_back = 10
predict_ahead = 1
num_features = 1

for i in range(len(test_data) - look_back - predict_ahead):
    X_test.append(test_data[i: i + look_back])
    y_test.append(test_data[i + look_back: i + look_back + predict_ahead])

X_test, y_test = np.array(X_test), np.array(y_test)

In [20]:
print("First 10 rows of X_test: \n", X_test[0:10, ])

print("\nFirst 10 rows of y_test: \n", y_test[0:10, ])

First 10 rows of X_test: 
 [[101 102 103 104 105 106 107 108 109 110]
 [102 103 104 105 106 107 108 109 110 111]
 [103 104 105 106 107 108 109 110 111 112]
 [104 105 106 107 108 109 110 111 112 113]
 [105 106 107 108 109 110 111 112 113 114]
 [106 107 108 109 110 111 112 113 114 115]
 [107 108 109 110 111 112 113 114 115 116]
 [108 109 110 111 112 113 114 115 116 117]
 [109 110 111 112 113 114 115 116 117 118]
 [110 111 112 113 114 115 116 117 118 119]]

First 10 rows of y_test: 
 [[111]
 [112]
 [113]
 [114]
 [115]
 [116]
 [117]
 [118]
 [119]
 [120]]


In [21]:
print("X_test.shape: ", X_test.shape)
print("y_test.shape: ", y_test.shape)

X_test.shape:  (19, 10)
y_test.shape:  (19, 1)


### Neural Net Tuning Parameters

In [22]:
batch_size = 5
input_size = 1
sequence_length = 10
hidden_size = 1
num_layer = 3
epochs = 10

### Data Loaders

In [23]:
class trainData(Dataset):
    
    def __init__(self, X_data, y_data):
        self.X_data = X_data
        self.y_data = y_data
        
    def __getitem__(self, index):
        return self.X_data[index], self.y_data[index]
        
    def __len__ (self):
        return len(self.X_data)

    
class testData(Dataset):
    
    def __init__(self, X_data):
        self.X_data = X_data
        
    def __getitem__(self, index):
        return self.X_data[index]
        
    def __len__ (self):
        return len(self.X_data)

In [24]:
train_data = trainData(torch.FloatTensor(X_train), torch.FloatTensor(y_train))
test_data = testData(torch.FloatTensor(X_test))

In [25]:
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=False, num_workers=2)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False, num_workers=2)
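Before training, it helps to look at what a single batch from the loader looks like. The sketch below uses placeholder tensors with the same shapes as `X_train` and `y_train`, and uses PyTorch's built-in `TensorDataset` in place of the custom dataset classes above; the placeholder values and variable names are assumptions made for illustration.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Placeholder data with the same shapes as X_train (90, 10) and y_train (90, 1).
X = torch.arange(900, dtype=torch.float32).view(90, 10)
y = torch.arange(90, dtype=torch.float32).view(90, 1)

loader = DataLoader(TensorDataset(X, y), batch_size=5, shuffle=False)

x_batch, y_batch = next(iter(loader))

print(x_batch.shape)  # torch.Size([5, 10])
print(y_batch.shape)  # torch.Size([5, 1])
print(len(loader))    # 18 -> 90 samples / 5 per batch
```

Since 90 is divisible by 5, every batch is full, which matters here because the models below reshape their input with a fixed `batch_size`.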

### RNN

#### RNN Class

Here, we're not initialising the hidden layers. If you want to initialise hidden layers, follow the examples above and modify the code below.

In [26]:
class ModelRNN(nn.Module):
    def __init__(self):
        super(ModelRNN, self).__init__()
        self.rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size, batch_first=True, num_layers=num_layer)
        
    def forward(self, x):
#         hidden = torch.randn(num_layer, batch_size, hidden_size)

        x = x.view(batch_size, sequence_length, input_size)
        out, hidden = self.rnn(x)
        
        return out, hidden

In [27]:
model_rnn = ModelRNN()
print(model_rnn)

ModelRNN(
  (rnn): RNN(1, 1, num_layers=3, batch_first=True)
)


#### Optimizer and Loss function

In [28]:
optimizer = torch.optim.Adam(model_rnn.parameters(), lr=0.01)
criterion = nn.MSELoss()

#### RNN - Train Loop

In [29]:
for e in range(epochs):
    epoch_loss = 0
    for x_batch, y_batch in train_loader:
        optimizer.zero_grad()
        output, hidden = model_rnn(x_batch)
        # hidden[-1] is the final hidden state of the last layer, i.e. our output
        # Print out the shapes of 'y_batch' and 'hidden' yourselves here to explore
        loss = criterion(hidden[-1],  y_batch) 
        loss.backward()
        optimizer.step()
        
        epoch_loss += loss.item()
        
    print(f'Epoch {e+1:02}: | Loss: {epoch_loss/len(train_loader):.5f}')

Epoch 01: | Loss: 3724.94150
Epoch 02: | Loss: 3629.42670
Epoch 03: | Loss: 3556.37166
Epoch 04: | Loss: 3543.76887
Epoch 05: | Loss: 3541.13346
Epoch 06: | Loss: 3540.09723
Epoch 07: | Loss: 3539.50432
Epoch 08: | Loss: 3539.10224
Epoch 09: | Loss: 3538.80756
Epoch 10: | Loss: 3538.58223
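The loss above settles near 3538 and stops improving. A likely reason is that `nn.RNN` uses a `tanh` non-linearity by default, so `hidden[-1]` stays within [-1, 1], while the targets here range from 10 to 99. A common remedy is to pass the final hidden state through an `nn.Linear` layer, which is not bounded. The sketch below illustrates this; the class name `RNNWithHead`, the `hidden_size=8` choice and the placeholder input are assumptions, not part of the original notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class RNNWithHead(nn.Module):  # hypothetical name
    def __init__(self, input_size=1, hidden_size=8, num_layers=1):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size,
                          num_layers=num_layers, batch_first=True)
        # Linear layer maps the bounded hidden state to an unbounded prediction.
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        x = x.unsqueeze(-1)            # (batch, seq_len) -> (batch, seq_len, 1)
        _, hidden = self.rnn(x)
        return self.head(hidden[-1])   # (batch, 1)

# Placeholder batch: 5 sequences of 10 values each.
x = torch.arange(50, dtype=torch.float32).view(5, 10)

# A bare RNN's hidden state never leaves [-1, 1] because of tanh.
bare = nn.RNN(1, 1, batch_first=True)
_, h = bare(x.unsqueeze(-1))
print(bool(h.abs().max() <= 1))  # True

model = RNNWithHead()
pred = model(x)
print(pred.shape)  # torch.Size([5, 1])
```

Scaling the inputs and targets to a small range before training is another option that is often combined with this.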


### LSTM

#### LSTM Class

Here, we're not initialising the hidden layers. If you want to initialise hidden layers, follow the examples above and modify the code below.

In [30]:
class ModelLSTM(nn.Module):
    def __init__(self):
        super(ModelLSTM, self).__init__()
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, batch_first=True, num_layers=num_layer)
        
    def forward(self, x):
#         h_0 = torch.randn(num_layer, batch_size, hidden_size)
#         c_0 = torch.randn(num_layer, batch_size, hidden_size)
        
        x = x.view(batch_size, sequence_length, input_size)
        
        out, (h_n, c_n) = self.lstm(x)

        return out, (h_n, c_n)

In [31]:
model_lstm = ModelLSTM()
print(model_lstm)

ModelLSTM(
  (lstm): LSTM(1, 1, num_layers=3, batch_first=True)
)


#### Optimizer and Loss function

In [32]:
optimizer = torch.optim.Adam(model_lstm.parameters(), lr=0.01)
criterion = nn.MSELoss()

#### LSTM - Train Loop

In [33]:
for e in range(epochs):
    epoch_loss = 0
    for x_batch, y_batch in train_loader:
        optimizer.zero_grad()
        # Call the LSTM model; it returns the hidden and cell states as a tuple
        output, (h_n, c_n) = model_lstm(x_batch)
        # h_n[-1] is the final hidden state of the last layer, i.e. our output
        # Print out the shapes of 'y_batch' and 'h_n' yourselves here to explore
        loss = criterion(h_n[-1], y_batch)
        loss.backward()
        optimizer.step()
        
        epoch_loss += loss.item()
        
    print(f'Epoch {e+1:02}: | Loss: {epoch_loss/len(train_loader):.5f}')


### GRU

#### GRU Class  


Here, we're not initialising the hidden layers. If you want to initialise hidden layers, follow the examples above and modify the code below.

In [34]:
class ModelGRU(nn.Module):
    def __init__(self):
        super(ModelGRU, self).__init__()
        self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_size, batch_first=True, num_layers=num_layer)
        
        
    def forward(self, x):
#         hidden = torch.randn(num_layer, batch_size, hidden_size)

        x = x.view(batch_size, sequence_length, input_size)
        out, hidden = self.gru(x)
        
        return out, hidden

In [35]:
model_gru = ModelGRU()
print(model_gru)

ModelGRU(
  (gru): GRU(1, 1, num_layers=3, batch_first=True)
)


#### Optimizer and Loss function

In [36]:
optimizer = torch.optim.Adam(model_gru.parameters(), lr=0.01)
criterion = nn.MSELoss()

#### GRU - Train Loop

In [37]:
for e in range(epochs):
    epoch_loss = 0
    for x_batch, y_batch in train_loader:
        optimizer.zero_grad()
        # Call the GRU model, whose parameters the optimizer above updates
        output, hidden = model_gru(x_batch)
        # hidden[-1] is the final hidden state of the last layer, i.e. our output
        # Print out the shapes of 'y_batch' and 'hidden' yourselves here to explore
        loss = criterion(hidden[-1], y_batch)
        loss.backward()
        optimizer.step()
        
        epoch_loss += loss.item()
        
    print(f'Epoch {e+1:02}: | Loss: {epoch_loss/len(train_loader):.5f}')
