https://www.analyticsvidhya.com/blog/2019/01/introduction-time-series-classification/

## Setting up the Problem Statement

We will be working on the ‘Indoor User Movement Prediction’ problem. In this challenge, multiple motion sensors are placed in different rooms, and the goal is to identify whether an individual has moved across rooms based on the received signal strength (RSS) data captured from these sensors.

There are four motion sensors (A1, A2, A3, A4) placed across two rooms. Have a look at the image below, which illustrates where the sensors are positioned in each room. This two-room setup was replicated in 3 different pairs of rooms (group1, group2, group3).

A person can move along any of the six predefined paths shown in the image above. A person who walks on path 2, 3, 4 or 6 stays within one room. A person who follows path 1 or path 5, on the other hand, moves between the rooms.
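The path rule described above can be sketched as a small lookup. The names `ROOM_CHANGE_PATHS`, `SAME_ROOM_PATHS`, and `moved_between_rooms` are made up for this illustration and are not part of the dataset or the notebook.

```python
# Paths that cross between the two rooms, as described in the text.
ROOM_CHANGE_PATHS = {1, 5}
# Paths that stay inside a single room.
SAME_ROOM_PATHS = {2, 3, 4, 6}


def moved_between_rooms(path_id: int) -> bool:
    """Return True if the given path crosses from one room to the other."""
    if path_id not in ROOM_CHANGE_PATHS | SAME_ROOM_PATHS:
        raise ValueError(f"unknown path id: {path_id}")
    return path_id in ROOM_CHANGE_PATHS


# Build the rule for all six paths.
labels = {p: moved_between_rooms(p) for p in range(1, 7)}
print(labels)
```

This is the binary question the classifier is asked to answer; the model itself never sees the path number, only the sensor readings.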

The sensor reading can be used to identify the position of a person at a given point in time. As the person moves in the room or across rooms, the reading in the sensor changes. This change can be used to identify the path of the person.

Now that the problem statement is clear, it’s time to get down to coding! In the next section, we will look at the dataset for the problem which should help clear up any lingering questions you might have on this statement. You can download the dataset from this link: https://archive.ics.uci.edu/ml/datasets/Indoor+User+Movement+Prediction+from+RSS+data.

Our dataset comprises 317 files:

- 314 MovementAAL csv files containing the readings from motion sensors placed in the environment
- One Target csv file that contains the target variable for each MovementAAL file
- One Group Data csv file that identifies which setup group each MovementAAL file belongs to
- One Path csv file that contains the path taken in each MovementAAL file

In [20]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline 
from os import listdir 

import tensorflow as tf 
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.optimizers import Adam
from keras.models import load_model
from keras.callbacks import ModelCheckpoint

from sklearn.metrics import accuracy_score



In [5]:
df1 = pd.read_csv('./MovementAAl/dataset/MovementAAL_RSS_1.csv')
df2 = pd.read_csv('./MovementAAl/dataset/MovementAAL_RSS_2.csv')

In [6]:
df1.head()

Unnamed: 0,#RSS_anchor1,RSS_anchor2,RSS_anchor3,RSS_anchor4
0,-0.90476,-0.48,0.28571,0.3
1,-0.57143,-0.32,0.14286,0.3
2,-0.38095,-0.28,-0.14286,0.35
3,-0.28571,-0.2,-0.47619,0.35
4,-0.14286,-0.2,0.14286,-0.2


In [7]:
df2.head()

Unnamed: 0,#RSS_anchor1,RSS_anchor2,RSS_anchor3,RSS_anchor4
0,-0.57143,-0.2,0.71429,0.5
1,-0.7619,-0.48,0.7619,-0.25
2,-0.85714,-0.6,0.85714,0.55
3,-0.7619,-0.4,0.71429,0.6
4,-0.7619,-0.84,0.85714,0.45


In [8]:
df1.shape, df2.shape

((27, 4), (26, 4))

The files contain normalized data from the four sensors – A1, A2, A3, A4. The length of the csv files (number of rows) varies, since each csv covers a recording of a different duration. To simplify things, let us suppose the sensor data is collected every second. The first recording then lasts 27 seconds (27 rows), while the second lasts 26 seconds (26 rows).

We will have to deal with this varying length before we build our model. For now, we will read and store the values from the sensors in a list using the following code block:

In [9]:
path = './MovementAAl/dataset/MovementAAL_RSS_'
sequences = []
for i in range(1, 315):
    file_path = path + str(i) + '.csv'
    print(file_path)
    df = pd.read_csv(file_path, header=0)
    values = df.values
    sequences.append(values)
    
targets = pd.read_csv('./MovementAAl/dataset/MovementAAL_target.csv')
targets = targets.values[:,1]

./MovementAAl/dataset/MovementAAL_RSS_1.csv
./MovementAAl/dataset/MovementAAL_RSS_2.csv
./MovementAAl/dataset/MovementAAL_RSS_3.csv
...

In [23]:
targets

array([ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1, -1, -1, -1, -1, -1, -1, -1, -1,
       -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
       -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
       -1, -1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1, -1, -1, -1, -1, -1, -1,
       -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
       -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1, -1, -1, -1, -1, -1, -1, -1, -1,
       -1, -1, -1, -1, -1, -1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1

In [24]:
groups

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,

We now have a list ‘sequences’ that contains the data from the motion sensors and ‘targets’ which holds the labels for the csv files. When we print sequences[0], we get the values of sensors from the first csv file:

In [10]:
sequences[0]

array([[-0.90476 , -0.48    ,  0.28571 ,  0.3     ],
       [-0.57143 , -0.32    ,  0.14286 ,  0.3     ],
       [-0.38095 , -0.28    , -0.14286 ,  0.35    ],
       [-0.28571 , -0.2     , -0.47619 ,  0.35    ],
       [-0.14286 , -0.2     ,  0.14286 , -0.2     ],
       [-0.14286 , -0.2     ,  0.047619,  0.      ],
       [-0.14286 , -0.16    , -0.38095 ,  0.2     ],
       [-0.14286 , -0.04    , -0.61905 , -0.2     ],
       [-0.095238, -0.08    ,  0.14286 , -0.55    ],
       [-0.047619,  0.04    , -0.095238,  0.05    ],
       [-0.19048 , -0.04    ,  0.095238,  0.4     ],
       [-0.095238, -0.04    , -0.14286 ,  0.35    ],
       [-0.33333 , -0.08    , -0.28571 , -0.2     ],
       [-0.2381  ,  0.04    ,  0.14286 ,  0.35    ],
       [ 0.      ,  0.08    ,  0.14286 ,  0.05    ],
       [-0.095238,  0.04    ,  0.095238,  0.1     ],
       [-0.14286 , -0.2     ,  0.14286 ,  0.5     ],
       [-0.19048 ,  0.04    , -0.42857 ,  0.3     ],
       [-0.14286 , -0.08    , -0.2381  ,  0.15

As mentioned previously, the dataset was collected in three different pairs of rooms, hence three groups. This information can be used to divide the dataset into train, validation and test sets. We will load the DatasetGroup csv file now:

In [11]:
groups = pd.read_csv('./MovementAAl/groups/MovementAAL_DatasetGroup.csv', header=0)
groups = groups.values[:,1]

## Preprocessing Steps

Since the time series data is of varying length, we cannot directly build a model on this dataset. So how can we decide the ideal length of a series? There are multiple ways in which we can deal with it and here are a few ideas (I would love to hear your suggestions in the comment section):

- Pad the shorter sequences with zeros to make the length of all the series equal. In this case, we will be feeding incorrect data to the model
- Find the maximum length of the series and pad the sequence with the data in the last row
- Identify the minimum length of the series in the dataset and truncate all the other series to that length. However, this will result in a huge loss of data
- Take the mean of all the lengths, truncate the longer series, and pad the series which are shorter than the mean length
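To make the options above concrete, here is a minimal sketch on two made-up recordings with two sensors each. The helper names (`pad_with_zeros`, `pad_with_last_row`, `truncate`) and the toy values are assumptions for illustration only, and the padding helpers assume the requested length is at least the current length.

```python
import numpy as np

# Two toy "recordings" with 2 sensors each and different lengths (made-up values).
short = np.array([[0.1, 0.2],
                  [0.3, 0.4]])
long = np.array([[1.0, 1.1],
                 [1.2, 1.3],
                 [1.4, 1.5],
                 [1.6, 1.7]])


def pad_with_zeros(seq, length):
    """Append rows of zeros until the sequence has `length` rows."""
    n_missing = length - len(seq)
    return np.vstack([seq, np.zeros((n_missing, seq.shape[1]))])


def pad_with_last_row(seq, length):
    """Append copies of the final row until the sequence has `length` rows."""
    n_missing = length - len(seq)
    return np.vstack([seq, np.tile(seq[-1], (n_missing, 1))])


def truncate(seq, length):
    """Keep only the first `length` rows."""
    return seq[:length]


max_len = max(len(short), len(long))  # longest recording
min_len = min(len(short), len(long))  # shortest recording

print(pad_with_zeros(short, max_len))     # zeros fill the missing rows
print(pad_with_last_row(short, max_len))  # last reading is held constant
print(truncate(long, min_len))            # rows beyond min_len are dropped
```

The fourth option combines these pieces: sequences longer than the chosen length go through `truncate`, and shorter ones through one of the padding helpers.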

Let’s find out the minimum, maximum and mean length:

In [12]:
len_sequences = []
for one_seq in sequences:
    len_sequences.append(len(one_seq))
    
pd.Series(len_sequences).describe()

count    314.000000
mean      42.028662
std       16.185303
min       19.000000
25%       26.000000
50%       41.000000
75%       56.000000
max      129.000000
dtype: float64

Half of the files have lengths between 26 and 56 (the interquartile range above), and just 3 files are longer than 100. Thus, taking the minimum or maximum length does not make much sense. The 90th percentile comes out to be 60, which is taken as the sequence length for the data. Let’s code it out:

In [13]:
# pad every sequence to the maximum length (129) by repeating its last row
to_pad = 129
new_seq = []
for one_seq in sequences:
    last_val = one_seq[-1]
    n = to_pad - len(one_seq)
    # n copies of the last row, shape (n, 4)
    to_concat = np.tile(last_val, (n, 1))
    new_one_seq = np.concatenate([one_seq, to_concat])
    new_seq.append(new_one_seq)

final_seq = np.stack(new_seq)

# truncate the sequence to length 60
seq_len = 60 
final_seq = sequence.pad_sequences(final_seq, maxlen=seq_len, padding='post', dtype='float', truncating='post')
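
The two steps above (pad to the maximum with the last row, then cut to a fixed length) can also be sketched in a single pass with plain NumPy. The data below is randomly generated, and `fit_length`, `toy_sequences`, and `target_len` are names made up for this illustration. The target length is taken from the 90th percentile of the toy lengths, so it will generally not be 60.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up recordings: random lengths between 19 and 129 rows, 4 sensors each.
toy_lengths = rng.integers(19, 130, size=50)
toy_sequences = [rng.uniform(-1, 1, size=(n, 4)) for n in toy_lengths]

# Pick the target length from the 90th percentile of the observed lengths.
target_len = int(np.percentile(toy_lengths, 90))


def fit_length(seq, length):
    """Truncate to `length` rows, or pad by repeating the last row."""
    if len(seq) >= length:
        return seq[:length]
    padding = np.tile(seq[-1], (length - len(seq), 1))
    return np.concatenate([seq, padding])


# Every recording now has the same shape, so they can be stacked into one array.
fixed = np.stack([fit_length(s, target_len) for s in toy_sequences])
print(fixed.shape)
```

Because truncation keeps the first rows, a recording shorter than the target ends up as its original rows followed by repeats of its last row, which is the same result as padding to the maximum first and truncating afterwards.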

In [18]:
train = [final_seq[i] for i in range(len(groups)) if (groups[i] == 2)]
validation = [final_seq[i] for i in range(len(groups)) if (groups[i] == 1)]
test = [final_seq[i] for i in range(len(groups)) if groups[i] == 3]
train_target = [targets[i] for i in range(len(groups)) if (groups[i] == 2)]
validation_target = [targets[i] for i in range(len(groups)) if (groups[i] == 1)]
test_target = [targets[i] for i in range(len(groups)) if (groups[i] == 3)]
train = np.array(train)
validation = np.array(validation)
test = np.array(test)
train_target = np.array(train_target)
train_target = (train_target + 1) / 2
validation_target = np.array(validation_target)
validation_target = (validation_target + 1) / 2
test_target = np.array(test_target)
test_target = (test_target + 1) / 2
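
The same split can be written with boolean masks instead of list comprehensions. The sketch below uses small made-up arrays (`toy_seq`, `toy_groups`, `toy_targets`) in place of the real data, and keeps the group assignment used above (group 2 for training, group 1 for validation, group 3 for testing).

```python
import numpy as np

# Made-up stand-ins for the real arrays: 6 recordings, 60 time steps, 4 sensors.
toy_seq = np.zeros((6, 60, 4))
toy_groups = np.array([1, 1, 2, 2, 3, 3])
toy_targets = np.array([1, -1, 1, 1, -1, 1])

# Map labels from {-1, 1} to {0, 1}, as (t + 1) / 2 does in the cell above.
toy_labels = (toy_targets + 1) / 2

# Select the rows of each split with a boolean mask over the group ids.
splits = {}
for name, group_id in [("train", 2), ("validation", 1), ("test", 3)]:
    mask = toy_groups == group_id
    splits[name] = (toy_seq[mask], toy_labels[mask])

for name, (x, y) in splits.items():
    print(name, x.shape, y)
```

The label mapping matters because the sigmoid output and binary cross-entropy loss used later expect targets of 0 and 1 rather than -1 and 1.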

## Building a Time Series Classification model

We have prepared the data to be used for an LSTM (Long Short Term Memory) model. We dealt with the variable length sequence and created the train, validation and test sets. Let’s build a single layer LSTM network.

In [19]:
model = Sequential()
model.add(LSTM(256, input_shape=(seq_len, 4)))
model.add(Dense(1, activation='sigmoid'))
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 256)               267264    
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 257       
=================================================================
Total params: 267,521
Trainable params: 267,521
Non-trainable params: 0
_________________________________________________________________
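
The parameter counts in the summary can be checked by hand. A standard LSTM layer has four gates, each with an input kernel, a recurrent kernel, and a bias; a dense layer has one weight per input-output pair plus one bias per output. The helper names below are made up for this check and are not Keras functions.

```python
def lstm_param_count(units: int, input_dim: int) -> int:
    """Parameters of a standard LSTM layer with biases.

    Each of the 4 gates has an input kernel (input_dim x units),
    a recurrent kernel (units x units), and a bias (units).
    """
    return 4 * (units * (input_dim + units + 1))


def dense_param_count(inputs: int, outputs: int) -> int:
    """Weights plus one bias per output unit."""
    return inputs * outputs + outputs


lstm_params = lstm_param_count(units=256, input_dim=4)
dense_params = dense_param_count(inputs=256, outputs=1)
print(lstm_params, dense_params, lstm_params + dense_params)
# 267264 257 267521
```

Note that the sequence length (60) does not appear in the count: the same weights are reused at every time step.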


In [22]:
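adam = Adam(lr=0.001)  # newer Keras releases name this argument learning_rate
# keep only the model with the best validation accuracy
# (newer Keras releases report this metric as 'val_accuracy')
chk = ModelCheckpoint('best_model.pkl', monitor='val_acc', save_best_only=True, mode='max', verbose=1)
model.compile(loss='binary_crossentropy', optimizer=adam, metrics=['accuracy'])
model.fit(train, train_target, epochs=200, batch_size=128, callbacks=[chk], validation_data=(validation, validation_target))

# loading the best saved model and checking accuracy on the test data
model = load_model('best_model.pkl')
# predict_classes thresholds the sigmoid output at 0.5; it was removed in later
# TensorFlow releases, where (model.predict(test) > 0.5).astype('int32') is the equivalent
test_preds = model.predict_classes(test)
accuracy_score(test_target, test_preds)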

Train on 106 samples, validate on 104 samples
Epoch 1/200

Epoch 00001: val_acc improved from -inf to 0.59615, saving model to best_model.pkl
Epoch 2/200

Epoch 00002: val_acc did not improve from 0.59615
Epoch 3/200

Epoch 00003: val_acc improved from 0.59615 to 0.61538, saving model to best_model.pkl
...
Epoch 00008: val_acc improved from 0.61538 to 0.67308, saving model to best_model.pkl
...
Epoch 00104: val_acc did not improve from 0.67308
...
Epoch 00132: val_acc did not improve from 0.76923
...
Epoch 00190: val_acc did not improve from 0.76923
...

0.7692307692307693
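
For a model with a single sigmoid output, the class prediction used above amounts to thresholding the predicted probability at 0.5, and the accuracy is the fraction of matching labels. The sketch below reproduces that logic in NumPy on made-up probabilities; `toy_probs` and `toy_true` are hypothetical values, not outputs of the trained model.

```python
import numpy as np

# Hypothetical sigmoid outputs for five test recordings (made-up numbers),
# shaped (n_samples, 1) like the output of a Dense(1) layer.
toy_probs = np.array([[0.91], [0.12], [0.55], [0.49], [0.73]])
toy_true = np.array([1, 0, 0, 0, 1])

# Threshold at 0.5 to turn probabilities into 0/1 class labels.
toy_preds = (toy_probs > 0.5).astype("int32").ravel()

# Accuracy is the fraction of predictions that match the true labels.
toy_accuracy = float(np.mean(toy_preds == toy_true))
print(toy_preds, toy_accuracy)
```

Writing the threshold explicitly also makes it easy to try a cut-off other than 0.5 if one class turns out to be more costly to miss.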