Long Short-Term Memory networks, or LSTMs, are a popular and powerful type of Recurrent Neural Network, or RNN.

They can be quite difficult to configure and apply to arbitrary sequence prediction problems, even with well-defined and “easy to use” interfaces like those provided in the Keras deep learning library in Python.

One reason for this difficulty in Keras is the use of the TimeDistributed wrapper layer and the need for some LSTM layers to return sequences rather than single values.

In this tutorial, you will discover different ways to configure LSTM networks for sequence prediction, the role that the TimeDistributed layer plays, and exactly how to use it.

After completing this tutorial, you will know:

- How to design a one-to-one LSTM for sequence prediction.
- How to design a many-to-one LSTM for sequence prediction without the TimeDistributed Layer.
- How to design a many-to-many LSTM for sequence prediction with the TimeDistributed Layer.

## TimeDistributed Layer
LSTMs are powerful, but hard to use and hard to configure, especially for beginners.

An added complication is the TimeDistributed Layer (and the former TimeDistributedDense layer) that is cryptically described as a layer wrapper:

>This wrapper allows us to apply a layer to every temporal slice of an input.

How and when are you supposed to use this wrapper with LSTMs?

The confusion is compounded when you search through discussions about the wrapper layer on the Keras GitHub issues and StackOverflow.

For example, in the issue “When and How to use TimeDistributedDense,” fchollet (Keras’ author) explains:

>TimeDistributedDense applies a same Dense (fully-connected) operation to every timestep of a 3D tensor.

This makes perfect sense if you already understand what the TimeDistributed layer is for and when to use it, but is no help at all to a beginner.
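
To make the quoted description more concrete, here is a minimal NumPy sketch of what "applying the same Dense operation to every timestep of a 3D tensor" amounts to. It does not use Keras; the sizes, the random weights `W` and `b`, and the helper `dense` are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_samples, n_timesteps, n_features, n_outputs = 2, 5, 3, 1

# A 3D input: (samples, time steps, features).
x = rng.normal(size=(n_samples, n_timesteps, n_features))

# One shared set of fully-connected weights, reused at every time step.
W = rng.normal(size=(n_features, n_outputs))
b = np.zeros(n_outputs)


def dense(inputs):
    """A plain fully-connected layer with a linear activation."""
    return inputs @ W + b


# Apply the same Dense operation to each temporal slice, then restack
# the per-step outputs along the time axis.
per_step = [dense(x[:, t, :]) for t in range(n_timesteps)]
y = np.stack(per_step, axis=1)

print(y.shape)  # (2, 5, 1)
```

The point of the sketch is that there is one set of weights, not one per time step, and that the output keeps the time dimension of the input.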

This tutorial aims to clear up the confusion around using the TimeDistributed wrapper with LSTMs, with worked examples that you can inspect, run, and play with to make your understanding concrete.

## One-to-One LSTM for Sequence Prediction
Before we dive in, it is important to show that a simple sequence learning problem, echoing the sequence 0.0, 0.2, 0.4, 0.6, 0.8, can be learned piecewise.

That is, we can reframe the problem into a dataset of input-output pairs, one for each item in the sequence. Given 0, the network should output 0; given 0.2, the network should output 0.2; and so on.

This is the simplest formulation of the problem and requires the sequence to be split into input-output pairs and for the sequence to be predicted one step at a time and gathered outside of the network.

The input-output pairs are as follows:
```
X, 	y
0.0,	0.0
0.2,	0.2
0.4,	0.4
0.6,	0.6
0.8,	0.8
```
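
The piecewise framing above can be sketched in a few lines. This is only an illustration: `predict_one` is a hypothetical stand-in for a trained one-to-one model (here it simply echoes its input), used to show how the predictions are made one step at a time and gathered outside of the network.

```python
import numpy as np

length = 5
seq = np.array([i / float(length) for i in range(length)])

# Each item is paired with itself: the target echoes the input.
pairs = [(float(x), float(x)) for x in seq]


def predict_one(x):
    # Hypothetical stand-in for a trained one-to-one model.
    return x


# Predict one step at a time and gather the outputs outside the model.
predicted = [predict_one(x) for x, _ in pairs]
print(predicted)  # [0.0, 0.2, 0.4, 0.6, 0.8]
```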

The input for LSTMs must be three dimensional: samples, time steps, and features. We can reshape the 1D sequence into a 3D array with 5 samples, 1 time step, and 1 feature. We will define the output as a 2D array with 5 samples and 1 feature.

```Python
X = seq.reshape(5, 1, 1)
y = seq.reshape(5, 1)
```
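
As a quick sanity check, the shapes before and after reshaping can be printed. This sketch assumes the sequence is built with NumPy in the same way as in the full listing.

```python
import numpy as np

length = 5
seq = np.array([i / float(length) for i in range(length)])

X = seq.reshape(len(seq), 1, 1)  # (samples, time steps, features)
y = seq.reshape(len(seq), 1)     # (samples, features)

print(seq.shape, X.shape, y.shape)  # (5,) (5, 1, 1) (5, 1)
```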

We will define the network model as having 1 input with 1 time step. The first hidden layer will be an LSTM with 5 units. The output layer will be a fully-connected layer with 1 output.

The model will be fit with the efficient Adam optimization algorithm and the mean squared error loss function.

The batch size is set to the number of samples in the dataset (5), so each epoch is a single batch. This avoids having to make the LSTM stateful and manage state resets manually, although that could just as easily be done in order to update the weights after each sample is shown to the network.

The complete code listing is provided below:

```Python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM

# prepare sequence
length = 5
seq = np.array([i / float(length) for i in range(length)])
X = seq.reshape(len(seq), 1, 1)
y = seq.reshape(len(seq), 1)

# define LSTM configuration
n_neurons = length
n_batch = length
n_epoch = 1000

# create the LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(1, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

# print the layer output shapes and parameter counts
model.summary()
```

```
Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_2 (LSTM)                (None, 5)                 140       
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 6         
=================================================================
Total params: 146
Trainable params: 146
Non-trainable params: 0
_________________________________________________________________
```
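
The parameter counts in the summary can be checked by hand. The sketch below uses the standard LSTM layout of four gate blocks, each with input weights, recurrent weights, and a bias vector; the helper names `lstm_param_count` and `dense_param_count` are made up for this illustration.

```python
def lstm_param_count(n_units, n_features):
    # Four blocks (input, forget, cell candidate, output), each with
    # input weights, recurrent weights, and a bias vector.
    return 4 * (n_units * n_features + n_units * n_units + n_units)


def dense_param_count(n_inputs, n_outputs):
    # One weight per input-output connection plus one bias per output.
    return n_inputs * n_outputs + n_outputs


lstm_params = lstm_param_count(n_units=5, n_features=1)
dense_params = dense_param_count(n_inputs=5, n_outputs=1)

print(lstm_params, dense_params, lstm_params + dense_params)  # 140 6 146
```

With 5 units and 1 input feature, the LSTM has 4 × (5 + 25 + 5) = 140 parameters, and the Dense layer has 5 weights plus 1 bias, which matches the 146 total reported by Keras.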


```Python
# train LSTM
model.fit(X, y, epochs=n_epoch, batch_size=n_batch, verbose=2)
```

```
Epoch 1/1000
 - 1s - loss: 0.3277
Epoch 2/1000
 - 0s - loss: 0.3253
Epoch 3/1000
 - 0s - loss: 0.3233
Epoch 4/1000
 - 0s - loss: 0.3212
Epoch 5/1000
 - 0s - loss: 0.3190
...
Epoch 232/1000
 - 0s - loss: 0.0722
...
Epoch 258/1000
 - 0s - loss: 0.0657
...
Epoch 460/1000
 - 0s - loss: 0.0382
...
Epoch 486/1000
 - 0s - loss: 0.0351
...
Epoch 688/1000
 - 0s - loss: 0.0134
...
Epoch 714/1000
 - 0s - loss: 0.0113
...
Epoch 916/1000
 - 0s - loss: 0.0018
...
Epoch 942/1000
 - 0s - loss: 0.0013
...
```

```Python
# evaluate
result = model.predict(X, batch_size=n_batch)
result
```

```
array([[0.0501245 ],
       [0.21198079],
       [0.39504686],
       [0.5875248 ],
       [0.7778261 ]], dtype=float32)
```
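
To put a number on how close these predictions are to the targets, the mean squared error can be recomputed by hand. This is a small NumPy sketch: the predictions are copied from the output above, and the targets are the original sequence values.

```python
import numpy as np

# Targets: the original sequence.
y_true = np.array([0.0, 0.2, 0.4, 0.6, 0.8])

# Predictions copied from the model output shown above.
y_pred = np.array([0.0501245, 0.21198079, 0.39504686, 0.5875248, 0.7778261])

errors = y_pred - y_true
mse = float(np.mean(errors ** 2))
max_abs_error = float(np.max(np.abs(errors)))

print(f"MSE: {mse:.6f}, largest absolute error: {max_abs_error:.4f}")
```

The largest error is on the first item, where the network predicts roughly 0.05 for an input of 0.0. Results will vary from run to run because the weights are initialized randomly.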