
# RNNs for Time Series Prediction


In this tutorial  we will deploy a simple probabilistic model using a Recurrent Neural Network (RNN) to infer the probability distribution of a time-series. Given a set of signals $\mathcal{D}=\{X^1,X^2,\ldots,X^N\}$ where 
$$X^i = [X_0^i,X_1^i,\ldots,X_T^i]$$
is the $i$-th signal, we will train a probabilistic model of the form
$$p(X|X_0) = \prod_{t=1}^T p(X_t|X_{t-1},X_{:t-1})$$
where $X_{:t-1}$ is the signal up to time $t-1$ and $X_0$ is the first step of the signal, which we assume to be $X_0 \sim \mathcal{N}\left( 0, \sigma^2\right)$. Additionally, we assume each factor is a Gaussian distribution with fixed variance $\sigma^2$ and mean given by **the output of an RNN** with input $X_{t-1}$. For any time $t$, the signal up to time $t-1$, that is $X_{:t-1}$, is embedded through the **RNN** state $\mathbf{h}_{t-1}$. Hence, the conditional probability $p(X_t|X_{t-1},X_{:t-1})$ is given by:


$$p(X_t|X_{t-1},X_{:t-1}) = \mathcal{N}\left(f_{RNN}(X_{t-1},\mathbf{h}_{t-1}),\sigma^2\right)$$

During training, for $t=1,\ldots,T$, we will sample $\hat{X}_t$ from $p(X_t|X_{t-1},X_{:t-1})$ and minimize the average squared loss $\frac{1}{T}\sum_{t=1}^T(X_t-\hat{X}_t)^2$. Then we average again for all signals in the training set. Note that during training we **feed the RNN with the true values of the signal** $X^i = [X_0^i,X_1^i,\ldots,X_T^i]$. 
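
The link between the Gaussian model and the squared loss can be made explicit. As a short sketch, assume the variance $\sigma^2$ is fixed and write $\mu_t = f_{RNN}(X_{t-1},\mathbf{h}_{t-1})$ (a shorthand introduced only for this derivation). The negative log-likelihood of one signal is then

$$-\log p(X|X_0) = \sum_{t=1}^T \left[\frac{(X_t-\mu_t)^2}{2\sigma^2} + \frac{1}{2}\log\left(2\pi\sigma^2\right)\right] = \frac{1}{2\sigma^2}\sum_{t=1}^T (X_t-\mu_t)^2 + \frac{T}{2}\log\left(2\pi\sigma^2\right)$$

so, up to constants that do not depend on the network, maximizing the likelihood amounts to minimizing the squared error between $X_t$ and the mean $\mu_t$. Since the sample is $\hat{X}_t = \mu_t + \epsilon_t$ with zero-mean noise $\epsilon_t$ of variance $\sigma^2$ independent of everything else, we also have

$$\mathbb{E}\left[(X_t-\hat{X}_t)^2\right] = (X_t-\mu_t)^2 + \sigma^2$$

which means that the sampled loss differs, in expectation, from the squared error to the mean only by the constant $\sigma^2$.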

We would like to give credit to Prof. Pablo M. Olmos (Universidad Carlos III) whose notebook inspired ours.

**IMPORTANT NOTE:** The exercises throughout the notebook are indicated as follows:
> **Exercise**:

These exercises ask you to complete missing parts in the code. Missing code can be located whenever you find the following comment:
```
# YOUR CODE HERE
```

In [None]:
import torch
from torch import nn
from torch import optim
import numpy as np
import time
import matplotlib.pyplot as plt

%matplotlib inline
%config InlineBackend.figure_format = 'retina'  #To get figures with high quality!


## Part I. Create a synthetic database

We will generate $N$ target signals of length $T$ time steps. We generate each signal as one realization of the following autoregressive model
\begin{align}
X_{t}=c+\sum_{i=1}^{p} \varphi_{i} X_{t-i}+\varepsilon_{t}
\end{align}
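
As a quick side check, the long-run behaviour of such a model is governed by the roots of the polynomial $1-\varphi_1 z-\ldots-\varphi_p z^p$. The sketch below assumes the coefficients $\varphi=(1,-1,1)$ set in the data-generation cell and uses only `np.roots`; the names `phi`, `coeffs` and `ar_roots` are ours.

```python
import numpy as np

# AR(3) coefficients, assumed to match the values set in the data-generation cell
phi = np.array([1.0, -1.0, 1.0])

# Characteristic polynomial 1 - phi_1 z - phi_2 z^2 - phi_3 z^3.
# np.roots expects the coefficients ordered from the highest power to the constant term.
coeffs = np.concatenate([-phi[::-1], [1.0]])
ar_roots = np.roots(coeffs)

print(np.abs(ar_roots))  # moduli of the three roots
```

For these coefficients the polynomial factors as $(1-z)(1+z^2)$, so all roots lie on the unit circle. The process is therefore not stationary and its variance grows with $t$, which is worth keeping in mind when judging long forecasts.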



In [None]:
N = 1000 # Number of signals

T = 200

c = 0
# weights of the previous p=3 time steps
phi_1 = 1

phi_2 = -1

phi_3 = 1

sigma = 1

X = np.zeros([N,T])

np.random.seed(23)

# initialize the first 4 time steps randomly with contributions from previous time steps as applicable
X[:,0] = c + np.random.randn(N,)*np.sqrt(sigma)

X[:,1] = c + phi_1 * X[:,0] + np.random.randn(N,)*np.sqrt(sigma)

X[:,2] = c + phi_1 * X[:,1] + phi_2 * X[:,0] + np.random.randn(N,)*np.sqrt(sigma)

X[:,3] = c + phi_1 * X[:,2] + phi_2 * X[:,1] + phi_3 * X[:,0] + np.random.randn(N,)*np.sqrt(sigma)

t = 4

while (t<T):

    X[:,t] = c + phi_1 * X[:,t-1] + phi_2 * X[:,t-2] + phi_3 * X[:,t-3] + np.random.randn(N,)*np.sqrt(sigma)
    
    t +=1
    

# Create targets
# for 0 <= i <= T-1, X[:,i] is the currently observed value and Y[:,i] is the value we want to predict

Y = X[:,1:] # all time steps but the first (targets to predict)
X = X[:,:-1] # all time steps but the last

T -=1

The goal of the RNN will be to predict the value of the signal in the next time step $t$ given the current observation at time $t-1$. Note that the noise in the model

$$p(X_t|X_{t-1},X_{:t-1}) = \mathcal{N}\left(f_{RNN}(X_{t-1},\mathbf{h}_{t-1}),\sigma^2\right)$$

will simply introduce an error that will prevent the model from overfitting. 
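
The alignment between inputs and targets can be sketched on a toy array. The values and the names `toy`, `inputs` and `targets` are chosen only for illustration; the slicing mirrors how `X` and `Y` are built above.

```python
import numpy as np

# Toy signal with 6 time steps (values chosen only for illustration)
toy = np.arange(6, dtype=float).reshape(1, -1)   # shape (1, 6)

inputs = toy[:, :-1]    # x_0 ... x_4: what the network observes
targets = toy[:, 1:]    # x_1 ... x_5: what it should predict

# At every index i the target is the next value of the input
for i in range(inputs.shape[1]):
    print(f"observe {inputs[0, i]:.0f} -> predict {targets[0, i]:.0f}")
```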

Let's plot one of the signals versus the *target*, which is the same signal advanced by one time step (so it appears shifted one step to the left) ...

In [None]:
# Plot the signal 
plt.figure(figsize=(8,5))
plt.title("whole series")
plt.plot(np.arange(T), X[1,:T], 'r.--', label='input, x',ms=10) # x
plt.plot(np.arange(T), Y[1,:T], 'b.-', label='target, y',ms=10) # y

plt.legend(loc='best')

# Plot the signal (20 first steps)
plt.figure(figsize=(8,5))
plt.title("first 20 steps")
plt.plot(np.arange(20), X[1,:20], 'r.--', label='input, x',ms=10) # x
plt.plot(np.arange(20), Y[1,:20], 'b.-', label='target, y',ms=10) # y

plt.legend(loc='best')

## Part II. RNN

Next, we define an RNN in PyTorch. We'll use `nn.RNN` to create an RNN layer, which takes in a number of parameters:
* **input_size** - the number of expected features in the input $x$
* **hidden_size** - the number of features in the hidden state $h$ (called `hidden_dim` in our class below)
* **num_layers** - the number of recurrent layers (called `n_layers` in our class below). If you stack multiple recurrent layers one after the other (i.e., `num_layers>1`), we are talking about **stacked RNNs**. For example, setting `num_layers=2` means stacking two RNNs, with the second RNN taking in the outputs of the first RNN and computing the final results.

This is an example of a stacked RNN

<img src="https://yiyibooks.cn/__src__/wizard/nmt-tut-neubig-2017_20180721165003_deleted/img/6-5.jpg" width="40%"> 


If you take a look at the [RNN documentation](https://pytorch.org/docs/stable/nn.html#rnn), you will see that `nn.RNN` only provides the actual computation of the hidden states along time
\begin{align}
h_{t}=g \left(W_{i h} x_{t}+b_{i h}+W_{h h} h_{(t-1)}+b_{h h}\right)
\end{align}
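
To make the recurrence concrete, here is a minimal NumPy sketch that unrolls it by hand for one toy sequence. It assumes $g=\tanh$ (the default nonlinearity of `nn.RNN`), random weights and small dimensions picked only for illustration; it is not how PyTorch implements the layer internally.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, seq_length = 1, 4, 5

# Randomly initialized parameters, named after the symbols in the equation
W_ih = rng.normal(size=(hidden_size, input_size))
W_hh = rng.normal(size=(hidden_size, hidden_size))
b_ih = np.zeros(hidden_size)
b_hh = np.zeros(hidden_size)

x = rng.normal(size=(seq_length, input_size))  # one toy input sequence
h = np.zeros(hidden_size)                      # h_0 initialized to zeros
states = []

for t in range(seq_length):
    # h_t = g(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh), here with g = tanh
    h = np.tanh(W_ih @ x[t] + b_ih + W_hh @ h + b_hh)
    states.append(h)

states = np.stack(states)  # shape (seq_length, hidden_size)
print(states.shape)
```

The stacked array plays the role of the sequence of states returned by the layer, while the final `h` plays the role of the last state.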

If we want the output of our RNN (in our case $x_{t + 1}$) to have a different size than the state $h_t$, then we'll add a last, fully-connected layer to get the output size that we want. For simplicity, **the input to this dense layer is the state $h_t$ of the RNN, i.e. $x_{t + 1} = f(h_t)$**.

You have to pay special attention to the dimensions of the input/output tensors of the RNN. **Check the [RNN documentation](https://pytorch.org/docs/stable/nn.html#rnn)**.
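
As a small shape check, the following sketch runs `nn.RNN` on a random batch. The sizes are arbitrary illustration values, and two layers are used only to make the layer dimension of the state visible.

```python
import torch
from torch import nn

batch_size, seq_length, input_size, hidden_size, num_layers = 4, 10, 1, 32, 2

rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size,
             num_layers=num_layers, batch_first=True)

x = torch.randn(batch_size, seq_length, input_size)
out, h_n = rnn(x)  # the initial state defaults to zeros when omitted

print(out.shape)   # states of the last layer at every time step
print(h_n.shape)   # final state of every layer
```

Note that `batch_first=True` only affects the input and output tensors; the state keeps the layout `(num_layers, batch_size, hidden_size)`.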




The following class implements the network architecture where 
- An input signal with features of dimension `input_size` is processed by an RNN. As a result, we obtain a sequence of states $\mathbf{h}_{t}$, from $t=1$ to $t=T$.
- We process each state with a linear layer to estimate the output signal (of dimension `output_size`) at time $t$ from $\mathbf{h}_{t}$.
- We add Gaussian noise with standard deviation `sigma` (i.e., variance $\sigma^2$) to the output of the linear layer.
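
The reshape around the linear layer in `forward` is the step that most often causes confusion. The NumPy sketch below mimics it with stand-in arrays (`r_out`, `W` and `b` are random placeholders, not the actual layer weights) and checks which row of the flattened array corresponds to which signal and time step.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, seq_length, hidden_dim, output_size = 3, 5, 4, 1

r_out = rng.normal(size=(batch_size, seq_length, hidden_dim))  # stand-in for the RNN states
W = rng.normal(size=(output_size, hidden_dim))                 # stand-in for the linear weights
b = rng.normal(size=output_size)                               # stand-in for the linear bias

# 1) merge batch and time so that every row is one state h_t
flat = r_out.reshape(-1, hidden_dim)             # (batch_size * seq_length, hidden_dim)
# 2) apply the same linear map to every row
flat_out = flat @ W.T + b                        # (batch_size * seq_length, output_size)
# 3) restore the temporal structure
output = flat_out.reshape(batch_size, seq_length, output_size)

# Row k of `flat` is the state of signal k // seq_length at time k % seq_length
assert np.allclose(flat[1 * seq_length + 2], r_out[1, 2])
print(output.shape)
```

Because the same linear map is applied to every state, the result is identical to applying it directly to the three-dimensional array; the flattening only makes the row-wise operation explicit.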


**Remark:** Do not confuse the PyTorch implementation of an RNN `nn.RNN` with our implementation `RNN`.

> **Exercise**: complete the following code. Understand all steps, particularly those in the `forward` method.

In [None]:
class RNN(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers,sigma):
        
        # input size -> Dimension of the input signal
        # output_size -> Dimension of the output signal
        # hidden_dim -> Dimension of the rnn state
        # n_layers -> Number of recurrent layers. If >1, we are using a stacked RNN.
        
        super().__init__()
        
        self.hidden_dim = hidden_dim
        
        self.input_size = input_size
        
        self.sigma = sigma

        # define a RNN with specified parameters
        # batch_first=True means that the first dimension of the input (and output) tensors is the batch size
        self.rnn = nn.RNN(input_size=input_size, hidden_size=hidden_dim, num_layers=n_layers, 
                          nonlinearity='relu',batch_first=True)
        
        # last, fully-connected layer
        self.fc1 = # YOUR CODE HERE

    def forward(self, x, h0=None):
        
        '''
        About the shape of the different tensors ...:
        
        - Input signal x has shape (batch_size, seq_length, input_size)
        - The initialization of the RNN hidden state h0 has shape (n_layers, batch_size, hidden_dim).
          If None value is used, internally it is initialized to zeros.
        - The RNN output has shape (batch_size, seq_length, hidden_size). This output is the RNN's state unfolded in time  

        '''
        batch_size = x.size(0) # Number of signals N
        seq_length = x.size(1) # T
        
        # get RNN outputs
        # r_out is the sequence of states
        # hidden is just the last state (we will use it for forecasting)
        r_out, hidden = # YOUR CODE GOES HERE 
        
        # shape r_out to be (batch_size*seq_length, hidden_dim) #UNDERSTANDING THIS POINT IS IMPORTANT!!
        # passing -1 as parameter means that the first dimension (batch_size*seq_length) is inferred from r_out
        r_out = r_out.reshape(-1, self.hidden_dim) 
        
        output = self.fc1(r_out)
        
        noise = torch.randn_like(output)*self.sigma  # use the value stored in the object, not a global variable
        
        output += noise
        
        # reshape back to temporal structure
        output = output.reshape([-1,seq_length,1])
        
        return output, hidden


> **Exercise:** Instantiate the object RNN implemented above with the right parameters for our problem. Use `hidden_dim=32`, `n_layers=1` and `sigma=1`. 

In [None]:
test_rnn = # YOUR CODE HERE

In the following code, we compute the model output using the `forward` method. Note that we use an all zero initial state.

> **Exercise**: Complete the following code. What are the dimensions of variables `h` and `o`? How are these dimensions related to the number of signals, hidden state of the RNN and signal duration?

In [None]:
X_in = torch.Tensor(X).view([-1,T,1])

o,h = # YOUR CODE HERE


dim_h = h.shape 
dim_o = o.shape 

print(f"dim_h={dim_h}")

print(f"dim_o={dim_o}")

> **Exercise:** Complete the code for the following class, which extends `RNN` to include a training method.  Store the values of the loss for every training iteration.

Note that there is no mini-batch: we process all signals in every SGD iteration. You are free to employ the mini-batch training functionality.
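
If you decide to try mini-batches, one simple option is to shuffle the signal indices and slice them. The helper below is a hypothetical sketch (the name `iterate_minibatches` and the toy arrays are ours, not part of the notebook); PyTorch's `DataLoader` would be an alternative.

```python
import numpy as np

def iterate_minibatches(x, y, batch_size, rng):
    """Yield shuffled (x, y) mini-batches that together cover every signal once."""
    order = rng.permutation(x.shape[0])
    for start in range(0, x.shape[0], batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

rng = np.random.default_rng(0)
x_toy = rng.normal(size=(10, 6))   # 10 toy signals with 6 time steps
y_toy = x_toy + 1.0                # placeholder targets, for illustration only

batches = list(iterate_minibatches(x_toy, y_toy, batch_size=4, rng=rng))
print([xb.shape for xb, _ in batches])
```

Inside a training loop, each mini-batch would get its own gradient reset, forward pass, backward pass and optimizer step.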

In [None]:
class RNN_extended(RNN):
    
    
    def __init__(self, num_data_train, num_iter, sequence_length,
                 input_size, output_size, hidden_dim, n_layers, sigma, lr=0.001):
        
        super().__init__(input_size, output_size, hidden_dim, n_layers,sigma) 
        
        self.hidden_dim = hidden_dim
        
        self.sequence_length = sequence_length
        
        self.num_layers = n_layers # number of recurrent layers
        
        self.lr = lr # learning rate
        
        self.num_train = num_data_train # number of training signals
        
        self.optim = optim.Adam(self.parameters(), self.lr) # optimizer
        
        self.num_iter = num_iter # number of training iterations
        
        self.criterion = nn.MSELoss() # loss function
                
        # A list to store the loss evolution along training
        
        self.loss_during_training = [] 
        
           
    def trainloop(self,x,y):
        '''
        x: input signals shaped (n_samples, n_tsteps, n_features)
        y: target signal shaped (n_samples, n_tsteps, 1)
        '''
        # SGD Loop
        
        for e in range(int(self.num_iter)):
            # reset the gradient
            self.optim.zero_grad() 
                
            x = torch.Tensor(x).view()  # YOUR CODE HERE: arrange x to the required shape

            y = torch.Tensor(y).view()  # YOUR CODE HERE: arrange y to the required shape

                            
            # YOUR CODE HERE
            # 1. Forward the signal
            # 2. compute the loss and append its value to self.loss_during_training
            # 3. Perform backward pass

            
            # Gradient clipping (`clip_grad_norm_`) helps prevent the exploding gradient problem in RNNs / LSTMs.
            nn.utils.clip_grad_norm_(self.parameters(), 2.0)
                
            self.optim.step()
            
            if(e % 50 == 0): # Every 50 iterations

                print("Iteration %d. Training loss: %f" %(e,self.loss_during_training[-1]))                

> **Exercise:** Using only the first 100 time steps of every signal, train the RNN for 100 SGD iterations. Use `hidden_dim=32`, `n_layers=1` and `sigma=1`. Recall that the target signal is stored in the variable `Y`.

In [None]:
T_train = 100
my_rnn = # YOUR CODE HERE: instantiate extended class

In [None]:
# YOUR CODE HERE: run the training loop on the first T_train time steps
my_rnn.trainloop(X[:,:T_train],Y[:,:T_train])

> **Exercise:** Plot the loss for every training iteration.

In [None]:
# YOUR CODE HERE
plt.plot(my_rnn.loss_during_training,label='Training Loss')
plt.legend()

> **Exercise:** For $50 \leq t \leq 100$ predict the target signal using the network. Plot the input signal, target signal and predicted signal

In [None]:
# We first evaluate the model for the N signals up to time T_train = 100
X_in = torch.Tensor(X[:,:T_train]).view([N,T_train,1]) 

o,h =  # YOUR CODE HERE

output_rnn = o.detach().numpy().reshape([N,-1])

offset = 50

signal = 0 # index of the signal to plot from 0 to N-1 (you can play with this)

# Plot the selected training signal, its target and the RNN output
plt.figure(figsize=(8,5))
plt.plot(np.arange(T_train-offset,T_train,1), X[signal,T_train-offset:T_train], 'r.--', label='input, x',ms=10) # input
plt.plot(np.arange(T_train-offset,T_train,1), Y[signal,T_train-offset:T_train], 'b.-', label='target, y',ms=10) # target
plt.plot(np.arange(T_train-offset,T_train,1), output_rnn[signal,T_train-offset:T_train], 'k.-', label='RNN output',ms=10) # prediction


plt.legend(loc='best')



Observe that the prediction is pretty good! The RNN model has clearly learnt the dynamics of the dataset. In the previous experiment, we fed the RNN model with the **true** values of the signal. In other words, we used the true value $X_t$ at time $t$ to predict $\hat{X}_{t+1}$, and we looped over all the values of the signal.

Using the model we have just trained, let's now do **forecasting**. Namely, we feed the RNN the output that we predicted and we do this recursively for as long as we want. This represents **sampling** from the probabilistic model 



$$p(X|X_{:T_{train}}) = \prod_{t=T_{train}}^T \mathcal{N}\left(f_{RNN}(X_{t-1},\mathbf{h}_{t-1}),\sigma^2\right)$$


To forecast, we have to repeatedly call the `forward` method, feeding the output and state from the last call as the signal and state to the next  `forward` call. The following code would do the job:

In [None]:
# We take the last RNN output
current_input = o[:,-1,:].view([N,1,1]) #Note that current input only contains one observation for each of the N signals
# We take the last RNN state
current_state = h
# Will hold the forecasted signals
forecast_rnn = np.zeros([N,T-T_train])

for t in range(T-T_train):
    
    # ... and feed them as input and initial state
    
    current_input,current_state = my_rnn.forward(current_input,current_state)
    
    forecast_rnn[:,t] = current_input.detach().numpy().reshape([-1,])
    
# stack predictions from true data and forecasts
final_rnn_reconstruct = np.hstack([output_rnn,forecast_rnn])

# We plot the signal and the target before and after forecasting

plt.plot(np.arange(0,T-1,1), Y[signal,:-1].reshape([-1]), 'b.-', label='target, y',ms=10) 
plt.plot(np.arange(0,T-1,1), final_rnn_reconstruct[signal,:-1], 'g.-', label='RNN output',ms=10) 
plt.plot([T_train,T_train],[np.min(Y[signal,:]),np.max(Y[signal,:])],'k--')
plt.legend()

print('Between t=0 and t=100, we feed the real values')
print('From t=100, we feed the estimated values (forecasting)')

Observe that, during forecasting, the model sometimes diverges quickly from the target. That is **expected**: we are sampling from the generative model, so it is unlikely that we obtain exactly the same sample as the target. This is particularly true for RNNs, since they have short memory.
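
The same effect appears even with the true generative model. The sketch below continues identical histories twice with independent noise and measures how far the two continuations drift apart. It assumes the AR(3) coefficients of Part I and unit noise standard deviation; the helper `simulate_ar3` is hypothetical and written only for this illustration.

```python
import numpy as np

def simulate_ar3(x_init, n_steps, phi, sigma, rng):
    """Continue AR(3) signals from their last three values, drawing fresh noise."""
    x = [x_init[:, 0], x_init[:, 1], x_init[:, 2]]
    for _ in range(n_steps):
        noise = rng.normal(scale=sigma, size=x_init.shape[0])
        x.append(phi[0] * x[-1] + phi[1] * x[-2] + phi[2] * x[-3] + noise)
    return np.stack(x[3:], axis=1)   # shape (n_signals, n_steps)

phi, sigma = (1.0, -1.0, 1.0), 1.0   # assumed to match Part I (sigma used as a standard deviation)
x_init = np.zeros((1000, 3))         # identical history for both continuations

path_a = simulate_ar3(x_init, 100, phi, sigma, np.random.default_rng(1))
path_b = simulate_ar3(x_init, 100, phi, sigma, np.random.default_rng(2))

gap = np.abs(path_a - path_b).mean(axis=0)   # average gap at each horizon
print(gap[0], gap[-1])
```

The average gap after 100 steps is several times larger than after one step, so a forecast that leaves the target is not, by itself, evidence of a poorly trained model.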

## Part III. LSTM

LSTMs were designed to mitigate the short lifetime of memory in a traditional RNN. Intuitively, they capture long-term associations by adding more flexibility to the state. An LSTM cell can decide, depending on the current signal, to what extent the current signal changes the current state and how much of the current state is forgotten.
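
The gating mechanism can be sketched in a few lines of NumPy. This is the standard textbook LSTM cell with random weights and a gate ordering (input, forget, candidate, output) chosen for the illustration; the function `lstm_cell_step` and the stacked matrices `W`, `U`, `b` are our own names and this is not meant to reproduce `nn.LSTM` exactly.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell; W, U, b stack the four gates row-wise."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0 * H:1 * H])    # input gate: how much new content enters the cell
    f = sigmoid(z[1 * H:2 * H])    # forget gate: how much of the old cell state is kept
    g = np.tanh(z[2 * H:3 * H])    # candidate content computed from the current signal
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the cell is exposed as h_t
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
input_size, hidden_dim = 1, 4
W = rng.normal(size=(4 * hidden_dim, input_size))
U = rng.normal(size=(4 * hidden_dim, hidden_dim))
b = np.zeros(4 * hidden_dim)

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_size)):
    h, c = lstm_cell_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

The forget gate `f` scales the previous cell state and the input gate `i` scales the new candidate, which is exactly the "how much is forgotten" and "how much the current signal changes the state" trade-off described above.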


You can create a basic [LSTM layer](https://pytorch.org/docs/stable/nn.html#lstm) as follows

```python
lstm = nn.LSTM(input_size, n_hidden, n_layers,
               dropout=drop_prob, batch_first=True)
```

where `input_size` is the number of features this layer expects at each step of the input sequence, and `n_hidden` is the number of units in the hidden state of each layer. If **stacked LSTMs (`n_layers>1`) are used**, we can automatically add dropout between LSTM layers with the parameter `dropout`, which sets the dropout probability.
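
As a quick shape check, the sketch below runs a two-layer `nn.LSTM` on a random batch. All sizes and the dropout value are arbitrary illustration choices.

```python
import torch
from torch import nn

batch_size, seq_length, input_size, n_hidden, n_layers = 4, 10, 1, 32, 2

lstm = nn.LSTM(input_size, n_hidden, n_layers, dropout=0.3, batch_first=True)

x = torch.randn(batch_size, seq_length, input_size)
out, (h_n, c_n) = lstm(x)   # the state is a tuple: (hidden state, cell state)

print(out.shape, h_n.shape, c_n.shape)
```

Unlike `nn.RNN`, the second return value is a tuple, so code that feeds the state back into the layer has to pass both tensors together.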

In this task we'll use a modified version of the previous architecture, obtained by replacing the RNN layers with LSTM layers. The RNN from the previous section had difficulties forecasting the signal; we hope to address this issue with the more elaborate memory provided by LSTMs.


**Remark:** Do not confuse the PyTorch implementation of an LSTM `nn.LSTM` with our implementation `LSTM`.

> **Exercise:** Complete the code for the following two classes 

In [None]:
class LSTM(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers,sigma,drop_prob):
        
        # input size -> Dimension of the input signal
        # output_size -> Dimension of the output signal
        # hidden_dim -> Dimension of the LSTM state
        # n_layers -> If >1, we are using a stacked LSTM
        
        super().__init__()
        
        self.hidden_dim = hidden_dim
        
        self.input_size = input_size
        
        self.sigma = sigma

        # define an LSTM with specified parameters
        # batch_first=True means that the first dimension of the input will be the batch_size
        self.lstm = nn.LSTM(input_size, hidden_dim, n_layers, dropout=drop_prob, batch_first=True)
        
        # add a last, fully-connected layer
        self.fc1 = # YOUR CODE HERE

    def forward(self, x, h0=None, valid=False):
        
        '''
        About the shape of the different tensors ...:
        
        - Input signal x has shape (batch_size, seq_length, input_size)
        - The initialization of the LSTM state is a tuple containing two tensors of dimensions
          (n_layers, batch_size, hidden_dim) each. The first tensor is the hidden state and the second
          is the cell state. If None is used, both are internally initialized to zeros.
        - The LSTM output shape is (batch_size, seq_length, hidden_size) 

        valid: flag; if true perform forward pass in evaluation mode if false use training mode (e.g. use Dropout)
        '''
        
        if(valid):
            self.eval()
        else:
            self.train()
        
        batch_size = x.size(0) # Number of signals N
        seq_length = x.size(1) # T
        
        # get LSTM outputs
        # r_out is the sequence of states
        # hidden is just the last state (we will use it for forecasting)
    
        r_out, hidden = # YOUR CODE HERE
        
        # shape r_out to be (batch_size*seq_length, hidden_dim) # UNDERSTANDING THIS POINT IS IMPORTANT!!
        r_out = r_out.reshape(-1, self.hidden_dim) 
        
        output = self.fc1(r_out)
        
        noise = torch.randn_like(output)*self.sigma  # use the value stored in the object, not a global variable
        
        output += noise
        
        # reshape back to temporal structure
        output = output.reshape([-1,seq_length,1])
        
        return output, hidden


In [None]:
class LSTM_extended(LSTM):
        
    def __init__(self, num_data_train, num_iter, sequence_length,
                 input_size, output_size, hidden_dim, n_layers, sigma, drop_prob=0.3, lr=0.001):
        
        super().__init__(input_size, output_size, hidden_dim, n_layers,sigma,drop_prob) 
        
        self.hidden_dim = hidden_dim
        
        self.sequence_length = sequence_length
        
        self.num_layers = n_layers
        
        self.lr = lr #Learning Rate
        
        self.num_train = num_data_train #Number of training signals
        
        self.optim = optim.Adam(self.parameters(), self.lr)
        
        self.num_iter = num_iter
        
        self.criterion =    # YOUR CODE HERE     
        
        # A list to store the loss evolution along training
        
        self.loss_during_training = [] 
        
           
    def trainloop(self,x,y):
        '''
        x: signals shaped (n_samples, n_time_steps, n_features)
        y: targets shaped (n_samples, n_time_steps)
        '''
        # SGD Loop
        
        for e in range(int(self.num_iter)):
        
            self.optim.zero_grad() 

            # adjust shape of x and y    
            x = torch.Tensor(x).view()  #YOUR CODE HERE 

            y = torch.Tensor(y).view()  #YOUR CODE HERE 

            # YOUR CODE HERE
            # 1. Forward the signal
            # 2. compute the loss and append its value to self.loss_during_training
            # 3. Perform backward pass

            
            # Gradient clipping (`clip_grad_norm_`) helps prevent the exploding gradient problem in RNNs / LSTMs.
            nn.utils.clip_grad_norm_(self.parameters(), 2.0)
                
            self.optim.step()
            
            if(e % 50 == 0): # Every 50 iterations

                print("Iteration %d. Training loss: %f" %(e,self.loss_during_training[-1]))                

> **Exercise:** Train the LSTM model for 500 iterations using the first 100 values of each signal. Use `hidden_dim=32`, `n_layers=1` and `sigma=1`. Recall that the target signal is stored in the variable `Y`.
Note that with only one layer, the dropout probability parameter does not play any role (you will get a warning actually).


In [None]:

my_lstm = # YOUR CODE HERE 

In [None]:
my_lstm.trainloop(X[:,:T_train],Y[:,:T_train]) 

In [None]:
plt.plot(my_lstm.loss_during_training,label='Training Loss')
plt.legend()

> **Exercise:**  For $0 \leq t \leq T_{train}$ predict the next value of the signal. Plot one input signal, predicted signal and target signal.


In [None]:
# We first evaluate the model for the N signals up to time T_train = 100
X_in = torch.Tensor(X[:,:T_train]).view([N,T_train,1]) 

o,h =  # YOUR CODE HERE

output_lstm = o.detach().numpy().reshape([N,-1])

offset = 50

signal = 6 # index of the signal to plot, from 0 to N-1 (you can play with this)

# Plot the selected training signal, its target and the LSTM output
plt.figure(figsize=(8,5))
plt.plot(np.arange(T_train-offset,T_train,1), X[signal,T_train-offset:T_train], 'r.--', label='input, x',ms=10) # input
plt.plot(np.arange(T_train-offset,T_train,1), Y[signal,T_train-offset:T_train], 'b.-', label='target, y',ms=10) # target
plt.plot(np.arange(T_train-offset,T_train,1), output_lstm[signal,T_train-offset:T_train], 'k.-', label='LSTM output',ms=10) # prediction


plt.legend(loc='best')

> **Exercise:**  For $T_{train} \leq t < T$ forecast the signals given the prediction at time $T_{train}$ and the state at time $T_{train}$. Plot the input signal, target signal, RNN prediction and forecast and the LSTM prediction and forecast. Discuss your findings.

In [None]:
# We take the last LSTM output
current_input = o[:,-1,:].view([N,1,1]) #Note that current input only contains one observation for each of the N signals
# We take the last LSTM state (a tuple with the hidden and cell states)
current_state = h

forecast_lstm = np.zeros([N,T-T_train])

for t in range(T-T_train):
    
    # ... and feed them as input and initial state
    
    current_input,current_state = # YOUR CODE HERE 
    
    forecast_lstm[:,t] = current_input.detach().numpy().reshape([-1,])
    
final_lstm_reconstruct = np.hstack([output_lstm,forecast_lstm])

# We plot the signal and the target before and after forecasting

signal = 6

plt.plot(np.arange(0,T-1,1), Y[signal,:-1].reshape([-1]), 'b.-', label='target, y',ms=10) 
plt.plot(np.arange(0,T-1,1), final_lstm_reconstruct[signal,:-1], 'g.-', label='LSTM output',ms=10) 
plt.plot(np.arange(0,T-1,1), final_rnn_reconstruct[signal,:-1], 'r-', label='RNN output',ms=10) 
plt.plot([T_train,T_train],[np.min(Y[signal,:]),np.max(Y[signal,:])],'k--')
plt.legend()

print('Between t=0 and t=100, we feed the real values')
print('From t=100, we feed the estimated values (forecasting)')

You should observe that the LSTM is able to keep track of the real signal during forecasting for a longer period of time.
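
To go beyond visual inspection of a single signal, one could compare the models by their error at each forecast step, averaged over all signals. The helper `rmse_per_horizon` below is a hypothetical addition, shown here on toy data; in the notebook it would presumably be called as `rmse_per_horizon(Y[:, T_train:], forecast_lstm)` and likewise with `forecast_rnn`.

```python
import numpy as np

def rmse_per_horizon(target, forecast):
    """Root mean squared error at each forecast step, averaged over all signals."""
    return np.sqrt(np.mean((target - forecast) ** 2, axis=0))

# Toy data for illustration only: a forecast whose error grows with the horizon
rng = np.random.default_rng(0)
target_toy = rng.normal(size=(100, 20))
forecast_toy = target_toy + rng.normal(size=(100, 20)) * np.linspace(0.1, 2.0, 20)

err = rmse_per_horizon(target_toy, forecast_toy)
print(err.shape)
```

Plotting this curve for both models on the same axes shows at which horizon each of them loses track of the signal.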

> **Exercise:** Repeat the last exercise with an LSTM with 3 LSTM layers.
1. Train the model with signals up to time $T_{train}$.
2. Predict the target for $0 \leq t < T_{train}$.
3. Forecast the target for $T_{train} \leq t < T$ given the last prediction and state.
4. Plot the input signal, target and predicted/forecasted signal.


In [None]:
my_lstm3 = # YOUR CODE HERE

In [None]:
my_lstm3.trainloop(X[:,:T_train],Y[:,:T_train])

In [None]:
# We first evaluate the model for the N signals up to time T_train = 100
X_in = torch.Tensor(X[:,:T_train]).view([N,T_train,1]) 

o,h = my_lstm3.forward(X_in) 

output_lstm3 = o.detach().numpy().reshape([N,-1])

offset = 50

signal = 6 # index of the signal to plot, from 0 to N-1 (you can play with this)

# Plot the selected training signal, its target and the 3-layer LSTM output
plt.figure(figsize=(8,5))
plt.plot(np.arange(T_train-offset,T_train,1), X[signal,T_train-offset:T_train], 'r.--', label='input, x',ms=10) # input
plt.plot(np.arange(T_train-offset,T_train,1), Y[signal,T_train-offset:T_train], 'b.-', label='target, y',ms=10) # target
plt.plot(np.arange(T_train-offset,T_train,1), output_lstm3[signal,T_train-offset:T_train], 'k.-', label='LSTM3 output',ms=10) # prediction


plt.legend(loc='best')

In [None]:
# We take the last LSTM output
current_input = o[:,-1,:].view([N,1,1]) #Note that current input only contains one observation for each of the N signals
# We take the last LSTM state (a tuple with the hidden and cell states)
current_state = h

forecast_lstm3 = np.zeros([N,T-T_train])

for t in range(T-T_train):
    
    # ... and feed them as input and initial state
    
    current_input,current_state = # YOUR CODE HERE 
    
    forecast_lstm3[:,t] = current_input.detach().numpy().reshape([-1,])
    
final_lstm3_reconstruct = np.hstack([output_lstm3,forecast_lstm3])

# We plot the signal and the target before and after forecasting

signal = 6

plt.plot(np.arange(0,T-1,1), Y[signal,:-1].reshape([-1]), 'b.-', label='target, y',ms=10) 
plt.plot(np.arange(0,T-1,1), final_lstm_reconstruct[signal,:-1], 'g.-', label='LSTM output',ms=10)
plt.plot(np.arange(0,T-1,1), final_lstm3_reconstruct[signal,:-1], 'k.-', label='LSTM3 output',ms=10)
plt.plot(np.arange(0,T-1,1), final_rnn_reconstruct[signal,:-1], 'r-', label='RNN output',ms=10) 
plt.plot([T_train,T_train],[np.min(Y[signal,:]),np.max(Y[signal,:])],'k--')
plt.legend()

print('Between t=0 and t=100, we feed the real values')
print('From t=100, we feed the estimated values (forecasting)')