# RNN for Time Series Forecasting

Recurrent neural networks (RNNs) are a type of neural network designed to handle sequences, i.e. data in which the order of the observations and their relation to each other matter.

RNNs can therefore, in principle, learn temporal context. There is also the hope that explicit structure (such as trend and seasonality) can be learned by the network from the data without being explicitly programmed.

## Time Series Forecasting

- Adds the complexity of sequential order (temporal dependence between observations)
- Requires specialized handling of the data (fitting and evaluating)
- Adds additional structure that the model could potentially exploit (i.e. patterns like seasonality and trends)
- Traditional time series analysis focuses on linear methods such as ARIMA models and exponential smoothing
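
The "specialized handling" of fitting and evaluating can be illustrated with a walk-forward split, in which the model is always trained on the past and tested on the next observation. The following is a minimal sketch; the function name `walk_forward_splits` and its parameters are assumptions made for illustration and do not come from any particular library.

```python
def walk_forward_splits(n_obs, n_test):
    """Yield (train_indices, test_index) pairs in temporal order.

    Every split trains on all observations before the test point,
    so no information from the future leaks into training.
    """
    first_test = n_obs - n_test
    if first_test < 1:
        raise ValueError("need at least one training observation")
    for t in range(first_test, n_obs):
        yield list(range(t)), t


series = [10, 12, 13, 15, 18, 21]
splits = list(walk_forward_splits(len(series), n_test=3))
for train_idx, test_idx in splits:
    print("train:", [series[i] for i in train_idx], "-> test:", series[test_idx])
```

A random shuffle, as commonly used for non-sequential data, would mix future observations into the training set and tends to give overly optimistic error estimates.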

## Neural Networks for Time Series

Neural networks approximate a mapping function from input variables to output variables. 
 
- Robust to noise
- Nonlinear relationships can be captured

## Recurrent Neural Networks for Time Series

- Add explicit handling of order between observations when learning a mapping function from inputs to outputs. 
- Sequence as a new dimension
- LSTM, a special kind of RNN, is able to solve many time series tasks unsolvable by feed-forward networks using fixed size time windows. 
- RNN can learn the temporal dependence from the data. 
- LSTM has the ability to learn long term correlations in a sequence

> Because of this ability to learn long term correlations in a sequence, LSTM networks obviate the need for a pre-specified time window and are capable of accurately modelling complex multivariate sequences. 

So there is the hope that LSTMs may learn complex relationships such as trend and seasonality. In practice, however, experience and some research suggest removing such systematic structure beforehand to simplify the problem space (e.g. Makridakis et al., 2018). 
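
One common way to remove trend and seasonality before modelling is differencing. The sketch below is an illustration under that assumption, not necessarily the procedure used in the cited work; the helper names `difference` and `invert_difference` are made up for this example.

```python
def difference(series, lag=1):
    """Subtract the value `lag` steps earlier from each observation."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def invert_difference(first_values, diffs, lag=1):
    """Rebuild the original series from its first `lag` values and the differences."""
    restored = list(first_values[:lag])
    for d in diffs:
        restored.append(restored[-lag] + d)
    return restored


trend = [5, 7, 9, 11, 13, 15]        # linear trend, slope 2
seasonal = [1, 3, 2] * 3             # repeating pattern with period 3

print(difference(trend))             # [2, 2, 2, 2, 2]
print(difference(seasonal, lag=3))   # [0, 0, 0, 0, 0, 0]
print(invert_difference(trend, difference(trend)) == trend)  # True
```

A lag of 1 removes a linear trend, and a lag equal to the seasonal period removes a stable seasonal pattern. Forecasts made on the differenced scale have to be transformed back, which is what `invert_difference` does.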

## Predictions with Sequences

In general, machine learning treats observations as interchangeable with respect to their order. This is different for sequences. 

- Sequences impose an explicit order on the observations.

### Sequence prediction

Input is an ordered sequence and the task is to predict the next value in the sequence. 

Examples: 

- Weather forecasting, Stock market prediction, Product Recommendation
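
To train a model on this task, a series is usually reframed as pairs of an input window and the value(s) that follow it. The following is a minimal sketch; the name `make_windows` and the parameters `n_in` and `n_out` are assumptions chosen for illustration.

```python
def make_windows(series, n_in, n_out=1):
    """Slide over the series and return (inputs, targets).

    Each input holds `n_in` consecutive observations and each target
    holds the `n_out` observations that immediately follow them.
    """
    inputs, targets = [], []
    for start in range(len(series) - n_in - n_out + 1):
        inputs.append(series[start:start + n_in])
        targets.append(series[start + n_in:start + n_in + n_out])
    return inputs, targets


X, y = make_windows([1, 2, 3, 4, 5, 6], n_in=3)
for window, target in zip(X, y):
    print(window, "->", target)
```

With `n_out` greater than 1 the same function yields targets that are themselves short sequences, which corresponds to a multi-step forecasting setup.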

### Sequence classification

Sequence is given, what is the class label of the sequence?

Examples for classes: 

- Trend / no trend, seasonality / no seasonality
- Anomaly detection, Sentiment analysis (text is a typical example of sequences that are dealt with in deep learning)

### Sequence Generation

Generating a new output sequence that has the same general characteristics as the input.

Examples: 

- Text generation, Handwriting prediction, Music generation

### Sequence-to-Sequence Prediction

Predicting an output sequence given an input sequence.

Examples: 

- Multi-step Time Series Forecasting
- Text summarization
- Program execution

After clarifying sequences, how can these problems be approached?

# Introduction to Long Short-Term Memory Networks

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network that respects the sequential order of a time series or of other sequentially ordered data.

- Recurrent networks have an internal *state* that can represent context information
- The *state* keeps information about past inputs for an amount of time that is not fixed but depends on its weights and the input data. 
- This network can be used to transform an input sequence into an output sequence

Requirements of an RNN:

- System can store information for an arbitrary duration
- System is resistant to noise
- System parameters can be trained (in a reasonable time frame)

Context in RNN: 

- RNNs contain cycles that feed the network activations from a previous time step back in as inputs, so that they influence predictions at the current time step.
- Activations are stored internally
 - In principle, they can hold long-term temporal contextual information. 
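
The cycle described above can be sketched with a single recurrent unit. This is a simplified illustration, assuming a tanh activation and scalar weights; the weight values are arbitrary and untrained, and `rnn_forward` is a name chosen for this example.

```python
import math


def rnn_forward(inputs, w_in, w_rec, bias=0.0, h0=0.0):
    """Run a one-unit recurrent network over a sequence of scalars.

    h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias)
    The state h carries information from earlier time steps forward.
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states


# Same values, different order: the final state differs
early = rnn_forward([1.0, 0.0, 0.0], w_in=1.0, w_rec=0.5)
late = rnn_forward([0.0, 0.0, 1.0], w_in=1.0, w_rec=0.5)
print(early[-1], late[-1])
```

In `early` the final state is still nonzero although the last two inputs are zero, so the state holds a fading trace of the first input. This is the context information that a feed-forward network without a state cannot represent.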
 
### LSTMs

- Standard RNNs fail to learn in the presence of time lags greater than 5-10 time steps (vanishing gradient).
- LSTMs were designed to overcome this problem and can learn to bridge time lags in excess of 1000 discrete time steps
- This is achieved by enforcing constant error flow through a "constant error carrousel" within special units (memory cells)

Problems the LSTM addresses: 

1. Vanishing gradient
2. Exploding gradients
 
Both are related to the training process of the network.
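
Why the error signal vanishes or explodes can be seen in a deliberately simplified setting: a single linear recurrent unit, where the error is multiplied by the recurrent weight once per time step on its way back. The function name `error_scale` and the chosen weights are assumptions for illustration. A real RNN also multiplies by the activation derivative at each step, which for tanh is at most 1.

```python
def error_scale(w_rec, n_steps):
    """Factor applied to an error signal after flowing back n_steps
    through the linear recurrence h_t = w_rec * h_{t-1} + x_t."""
    scale = 1.0
    for _ in range(n_steps):
        scale *= w_rec  # chain rule: d h_t / d h_{t-1} = w_rec
    return scale


for w in (0.9, 1.0, 1.1):
    print(f"w_rec={w}: error scaled by {error_scale(w, 50):.4g} after 50 steps")
```

Only a recurrent weight of exactly 1.0 passes the error back unchanged. This is the intuition behind keeping the error constant inside the LSTM memory cell.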

Key to the success of LSTM is a specific internal structure of the units used in the model

LSTM analogies: 

- Motivation is the error flow of existing RNNs
 - long time lags inaccessible, backpropagated error either blows up or decays exponentially
- An LSTM layer has a set of recurrently connected blocks (units which deliver information to themselves)
 - Each LSTM block contains one or more recurrently connected memory cells and three multiplicative units (input, output and forget gates) that provide continuous analogues of write, read and reset operations
 - The net can only interact with the cells via the gates. 
- Promise for any sequential processing task in which we suspect a hierarchical decomposition
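
The write, read and reset analogy can be made concrete with one time step of a one-unit LSTM cell. This sketch assumes the commonly used formulation with a forget gate and without peephole connections. The parameter layout and the values in `params` are hypothetical and hand-picked to show the gating effect, not trained.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a one-unit LSTM cell.

    p maps "i", "f", "o", "g" to (w_x, w_h, b) tuples of scalar weights.
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("i"))    # input gate: how much to write
    f = sigmoid(pre("f"))    # forget gate: how much old content to keep (reset)
    o = sigmoid(pre("o"))    # output gate: how much to read out
    g = math.tanh(pre("g"))  # candidate value to write
    c = f * c_prev + i * g   # memory cell update
    h = o * math.tanh(c)     # gated output
    return h, c


# Hypothetical parameters: input gate almost closed, forget gate almost open
params = {
    "i": (0.0, 0.0, -10.0),
    "f": (0.0, 0.0, 10.0),
    "o": (0.0, 0.0, 10.0),
    "g": (1.0, 0.0, 0.0),
}
h, c = 0.0, 0.8  # pretend 0.8 was written into the cell earlier
for x in [0.3, -0.7, 0.5] * 40:  # 120 varying inputs
    h, c = lstm_step(x, h, c, params)
print(round(c, 3))
```

With the input gate closed and the forget gate open, the cell content stays close to 0.8 for all 120 steps regardless of the inputs. In a trained network the gate activations depend on the data, so the network itself decides when to write, keep or reset.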

### Bidirectional LSTMs

- Present each training sequence forwards and backwards to two separate recurrent nets, both of which are connected to the same output layer
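
The idea can be sketched with two simple recurrent units standing in for the two LSTMs, which keeps the example short. The function names and weights are assumptions for illustration, and the step that feeds both states into a shared output layer is omitted.

```python
import math


def run_rnn(inputs, w_in, w_rec):
    """Return the state of a one-unit tanh RNN after each time step."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


def bidirectional_states(inputs, fwd_weights, bwd_weights):
    """Pair up, per time step, a state that has seen the past
    with a state that has seen the future."""
    forward = run_rnn(inputs, *fwd_weights)
    # Run on the reversed sequence, then re-align to the original time order
    backward = run_rnn(inputs[::-1], *bwd_weights)[::-1]
    return list(zip(forward, backward))


states = bidirectional_states([0.0, 0.0, 1.0], (1.0, 0.5), (1.0, 0.5))
for t, (f, b) in enumerate(states):
    print(t, round(f, 3), round(b, 3))
```

At the first time step the forward state is still zero, while the backward state already reflects the input at the end of the sequence. Note that this requires the whole sequence to be available, so it suits classification or labelling of complete sequences better than forecasting values that are not yet observed.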

### Seq2seq LSTMs or RNN Encoder-Decoders?

Idea (Sutskever et al., 2014):

- Use one LSTM to read the input sequence, one timestep at a time, to obtain a fixed-dimensional vector representation.
- Use another LSTM to extract the output sequence from that vector. 
 - This second LSTM is essentially an RNN language model that is conditioned on the input sequence. 
- Does well on long sentences 
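
The encoder-decoder data flow can be sketched as follows. Simple one-unit recurrent networks stand in for the two LSTMs, the weights are arbitrary and untrained, and the names `encode` and `decode` are assumptions for illustration. The point is only the structure: a variable-length input is compressed into a fixed-size state, from which an output sequence of a chosen length is unrolled.

```python
import math


def encode(inputs, w_in, w_rec):
    """Read the input one time step at a time and return the final state."""
    h = 0.0
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
    return h  # fixed-size summary of the whole input sequence


def decode(context, n_steps, w_in, w_rec, w_out):
    """Unroll an output sequence, starting from the encoder state and
    feeding each output back in as the next input."""
    h, y, outputs = context, 0.0, []
    for _ in range(n_steps):
        h = math.tanh(w_in * y + w_rec * h)
        y = w_out * h
        outputs.append(y)
    return outputs


context = encode([0.1, 0.2, 0.3], w_in=1.0, w_rec=0.5)
forecast = decode(context, n_steps=4, w_in=0.5, w_rec=0.9, w_out=1.0)
print(forecast)
```

The input has three time steps and the output has four, so input and output lengths are decoupled. For multi-step time series forecasting, `n_steps` would correspond to the forecast horizon.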


# Introduction to Models for Sequence Prediction with Recurrent Neural Networks

