# STEP 4 - Dynamic Windowing

This step follows a TensorFlow tutorial on time series prediction, which discusses the advantages of sliding windows. Their applicability is tested below.
Sliding windows extract more sequences from the sequential data than the approach used so far.
Until now, the data was cut into whole, non-overlapping chunks. With sliding windows, the window is moved continuously over the dataset to obtain overlapping sequences.
When little data is available, this could lead to better prediction accuracy.
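
The difference between the two strategies can be sketched on a toy sequence. The helper names `chunk_windows` and `sliding_windows` and the window length of 4 are made up for illustration; they are not part of the notebook's `ModelHelper`.

```python
def chunk_windows(seq, size):
    """Non-overlapping chunks: the step equals the window size."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def sliding_windows(seq, size, stride=1):
    """Overlapping windows: the window advances by `stride` elements."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, stride)]


locations = list(range(10))              # toy stand-in for a location sequence
chunks = chunk_windows(locations, 4)     # [0..3], [4..7]
windows = sliding_windows(locations, 4)  # [0..3], [1..4], ..., [6..9]

print(len(chunks), len(windows))         # prints "2 7"
```

From the same ten elements, chunking yields two sequences while a stride-1 sliding window yields seven. The overlapping windows are strongly correlated, so the gain in effective training data is smaller than the raw count suggests.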

First, dynamic windowing was tested on the 4square dataset, where it proved useful and produced acceptable results.
Next, it is applied to the NYC taxi dataset to see whether it improves prediction accuracy.

In [1]:
import tensorflow as tf
import pandas as pd
import import_ipynb

In [2]:
from model_helper import ModelHelper

importing Jupyter notebook from model_helper.ipynb


In [3]:
df = pd.read_csv("./ma_results/trips_with_zones_final.csv")

Only use the 100 taxis (medallions) with the most trips.

In [4]:
# Count trips per medallion; value_counts() sorts in descending order.
count = df.medallion.value_counts()

# Keep the 100 medallions with the most trips.
medallions = count.index[:100]

df = df.loc[df.medallion.isin(medallions)]

df.head(10)

Unnamed: 0,medallion,pickup_week_day,pickup_hour,pickup_day,pickup_month,dropoff_week_day,dropoff_hour,dropoff_day,dropoff_month,pickup_location_id,dropoff_location_id
39179,009D3CCA83486B03FCE736A2F642CBA8,1,0,1,1,1,0,1,1,161.0,141.0
39180,009D3CCA83486B03FCE736A2F642CBA8,1,0,1,1,1,0,1,1,162.0,233.0
39181,009D3CCA83486B03FCE736A2F642CBA8,1,0,1,1,1,1,1,1,170.0,163.0
39182,009D3CCA83486B03FCE736A2F642CBA8,1,1,1,1,1,1,1,1,163.0,143.0
39183,009D3CCA83486B03FCE736A2F642CBA8,1,1,1,1,1,1,1,1,143.0,48.0
39184,009D3CCA83486B03FCE736A2F642CBA8,1,1,1,1,1,1,1,1,48.0,107.0
39185,009D3CCA83486B03FCE736A2F642CBA8,1,1,1,1,1,1,1,1,107.0,236.0
39186,009D3CCA83486B03FCE736A2F642CBA8,1,2,1,1,1,2,1,1,236.0,186.0
39187,009D3CCA83486B03FCE736A2F642CBA8,1,2,1,1,1,2,1,1,186.0,25.0
39188,009D3CCA83486B03FCE736A2F642CBA8,1,2,1,1,1,3,1,1,25.0,36.0


Instantiate the ModelHelper class to keep the following code concise.

In [5]:
mh = ModelHelper(df, 17)
mh.set_client_column_name('medallion')
mh.set_client_column_ids()

Generate location sequences.

In [6]:
mh.df_to_location_sequence()
mh.df

Unnamed: 0,index,location_id,day,month,hour_sin,hour_cos,week_day_sin,week_day_cos,weekend
0,39179,161.0,1,1,0.000000,1.000000,0.781831,0.62349,0
1,39179,161.0,1,1,0.000000,1.000000,0.781831,0.62349,0
2,39180,162.0,1,1,0.000000,1.000000,0.781831,0.62349,0
3,39186,236.0,1,1,0.500000,0.866025,0.781831,0.62349,0
4,39193,237.0,1,1,0.866025,0.500000,0.781831,0.62349,0
...,...,...,...,...,...,...,...,...,...
253794,10737635,138.0,13,1,0.866025,-0.500000,-0.781831,0.62349,1
253795,10737635,138.0,13,1,0.866025,-0.500000,-0.781831,0.62349,1
253796,10737636,74.0,13,1,0.707107,-0.707107,-0.781831,0.62349,1
253797,10737636,74.0,13,1,0.707107,-0.707107,-0.781831,0.62349,1
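
The `hour_sin`/`hour_cos` and `week_day_sin`/`week_day_cos` columns are consistent with a cyclical encoding that maps a periodic value onto the unit circle. The implementation inside `ModelHelper` is not shown here, so the following is only a reconstruction from the table values; the function name `cyclical_encode` is hypothetical.

```python
import math


def cyclical_encode(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# Assumption: hours use a period of 24 and weekdays a period of 7.
hour_sin, hour_cos = cyclical_encode(2, 24)
week_day_sin, week_day_cos = cyclical_encode(1, 7)

print(round(hour_sin, 6), round(hour_cos, 6))
print(round(week_day_sin, 6), round(week_day_cos, 6))
```

Hour 2 and weekday 1 reproduce the values 0.5 / 0.866025 and 0.781831 / 0.62349 seen in the table. The benefit of this encoding is that hour 23 and hour 0 end up close together, which a plain integer column would not express.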


Set the vocabulary size before creating the per-user location sequences.

In [7]:
mh.set_target_column_name('location_id')
mh.set_vocab_size()
mh.vocab_size

264

Reset the dataframe to the original dataframe to create the sequences for each user separately.

In [8]:
mh.reset_df(df)
mh.df

Unnamed: 0,medallion,pickup_week_day,pickup_hour,pickup_day,pickup_month,dropoff_week_day,dropoff_hour,dropoff_day,dropoff_month,pickup_location_id,dropoff_location_id
39179,009D3CCA83486B03FCE736A2F642CBA8,1,0,1,1,1,0,1,1,161.0,141.0
39180,009D3CCA83486B03FCE736A2F642CBA8,1,0,1,1,1,0,1,1,162.0,233.0
39181,009D3CCA83486B03FCE736A2F642CBA8,1,0,1,1,1,1,1,1,170.0,163.0
39182,009D3CCA83486B03FCE736A2F642CBA8,1,1,1,1,1,1,1,1,163.0,143.0
39183,009D3CCA83486B03FCE736A2F642CBA8,1,1,1,1,1,1,1,1,143.0,48.0
...,...,...,...,...,...,...,...,...,...,...,...
14931692,FF40FB8123940D9F96D33EDA1D92A83C,3,21,31,1,3,21,31,1,79.0,7.0
14931693,FF40FB8123940D9F96D33EDA1D92A83C,3,21,31,1,3,22,31,1,7.0,237.0
14931694,FF40FB8123940D9F96D33EDA1D92A83C,3,22,31,1,3,22,31,1,237.0,246.0
14931695,FF40FB8123940D9F96D33EDA1D92A83C,3,22,31,1,3,22,31,1,237.0,106.0


Create location sequences for each user.

In [9]:
mh.create_users_locations()

100%|██████████| 100/100 [00:01<00:00, 78.37it/s]


[      index  location_id  day  month  hour_sin  hour_cos  week_day_sin  \
 0     39179        161.0    1      1  0.000000  1.000000      0.781831   
 1     39179        161.0    1      1  0.000000  1.000000      0.781831   
 2     39180        162.0    1      1  0.000000  1.000000      0.781831   
 3     39186        236.0    1      1  0.500000  0.866025      0.781831   
 4     39193        237.0    1      1  0.866025  0.500000      0.781831   
 ...     ...          ...  ...    ...       ...       ...           ...   
 2526  40585         90.0   22      1 -0.500000  0.866025      0.781831   
 2527  40586        163.0   22      1 -0.500000  0.866025      0.781831   
 2528  40587        116.0   22      1 -0.258819  0.965926      0.781831   
 2529  40588        226.0   23      1  0.000000  1.000000      0.974928   
 2530  40588        226.0   23      1  0.000000  1.000000      0.974928   
 
       week_day_cos  weekend  
 0         0.623490        0  
 1         0.623490        0  
 2   

Concatenate the user sequences and perform a train/validation/test split.

In [10]:
mh.split_concat_user_df()
mh.df_train

Unnamed: 0,location_id,hour_sin,hour_cos,week_day_sin,week_day_cos,weekend
0,161,0.000000,1.000000,0.781831,0.62349,0
1,161,0.000000,1.000000,0.781831,0.62349,0
2,162,0.000000,1.000000,0.781831,0.62349,0
3,236,0.500000,0.866025,0.781831,0.62349,0
4,237,0.866025,0.500000,0.781831,0.62349,0
...,...,...,...,...,...,...
1727,48,0.000000,1.000000,-0.781831,0.62349,1
1728,66,0.258819,0.965926,-0.781831,0.62349,1
1729,66,0.258819,0.965926,-0.781831,0.62349,1
1730,144,0.258819,0.965926,-0.781831,0.62349,1
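
How `split_concat_user_df` splits the data is not visible in this notebook. A minimal sketch of one plausible approach, assuming each user's sequence is split chronologically and the pieces are concatenated afterwards, could look as follows. The 70/20/10 fractions, the toy data, and the function name `chronological_split` are assumptions.

```python
def chronological_split(seq, train_frac=0.7, val_frac=0.2):
    """Split one user's ordered sequence without shuffling."""
    n = len(seq)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return seq[:train_end], seq[train_end:val_end], seq[val_end:]


# Toy stand-in: two users with ordered location sequences.
users = {"taxi_a": list(range(10)), "taxi_b": list(range(100, 120))}

train, val, test = [], [], []
for seq in users.values():
    tr, va, te = chronological_split(seq)
    train += tr   # concatenate the per-user pieces
    val += va
    test += te

print(len(train), len(val), len(test))
```

Splitting per user before concatenating keeps every user represented in all three sets. Note that windows slid over the concatenated data can span the boundary between two users.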


Set the batch size and create a window generator for the target column "location_id".

In [None]:
mh.set_batch_size(128)

In [12]:
mh.set_window_generator(['location_id'])

Create the windowed datasets.

In [13]:
mh.make_windowed_dataset()
mh.train_dataset

<MapDataset shapes: ((None, 16, 6), (None, 1, 1)), types: (tf.float32, tf.float32)>

The TensorSpecs are as expected:
* a history of 16 time steps with 6 features each as input
* one location as output (prediction/label)

In [19]:
mh.train_dataset.element_spec

(TensorSpec(shape=(None, 16, 6), dtype=tf.float32, name=None),
 TensorSpec(shape=(None, 1, 1), dtype=tf.float32, name=None))
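
These shapes can be reproduced outside of TensorFlow with a small NumPy sketch. It is not the `ModelHelper` implementation. It assumes that the 17 passed to `ModelHelper` is the total window length (16 inputs plus 1 label) and that column 0 plays the role of `location_id`. The data is random.

```python
import numpy as np

INPUT_WIDTH = 16   # history length, taken from the element_spec above
LABEL_WIDTH = 1    # one step is predicted
N_FEATURES = 6     # number of columns in df_train

rng = np.random.default_rng(0)
data = rng.random((100, N_FEATURES)).astype(np.float32)  # toy stand-in

total = INPUT_WIDTH + LABEL_WIDTH
# sliding_window_view appends the window axis last: (n_windows, features, total)
windows = np.lib.stride_tricks.sliding_window_view(data, total, axis=0)
windows = windows.transpose(0, 2, 1)   # -> (n_windows, total, features)

inputs = windows[:, :INPUT_WIDTH, :]   # all features over 16 steps
labels = windows[:, INPUT_WIDTH:, :1]  # column 0 of the 17th step

print(inputs.shape, labels.shape)      # prints "(84, 16, 6) (84, 1, 1)"
```

With 100 rows and a total window length of 17, a stride of 1 gives 100 - 17 + 1 = 84 windows. The leading `None` in the TensorSpec is the batch dimension.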

Test with a dummy model (Dense layers only).

In [None]:
input_shape = (mh.train_dataset.element_spec[0].shape[1], mh.train_dataset.element_spec[0].shape[2])
input_shape

In [26]:
dense = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units=128, activation='relu'),
    tf.keras.layers.Dense(units=128, activation='relu'),
    tf.keras.layers.Dense(units=mh.vocab_size)
])
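
The last `Dense` layer has no activation, so the model emits one raw score (logit) per entry of the vocabulary. How such scores turn into a predicted location can be sketched with NumPy. The random logits stand in for real model output, and treating the class index as the location ID is an assumption.

```python
import numpy as np

VOCAB_SIZE = 264  # value of mh.vocab_size shown earlier

rng = np.random.default_rng(0)
logits = rng.normal(size=(3, VOCAB_SIZE))  # stand-in for a batch of 3 outputs

# Numerically stable softmax: subtract the row maximum before exponentiating.
shifted = logits - logits.max(axis=1, keepdims=True)
exp = np.exp(shifted)
probs = exp / exp.sum(axis=1, keepdims=True)

predicted_location = probs.argmax(axis=1)  # most likely class per sample
print(predicted_location.shape)
```

Because softmax preserves the ordering of the scores, taking the argmax of the logits directly gives the same prediction.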

In [32]:
mh.assign_model(dense)
mh.compile_model(optimizer_type=tf.keras.optimizers.Adam, learning_rate=0.002)
mh.fit_model(with_early_stopping=False)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [33]:
mh.evaluate_model()



Now, a basic GRU is used.

In [39]:
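gru = tf.keras.Sequential()
# First GRU returns the full sequence so that it can feed the second GRU.
gru.add(tf.keras.layers.GRU(128, return_sequences=True, input_shape=input_shape, activation='relu'))
# Only the first layer needs input_shape; later layers infer it.
gru.add(tf.keras.layers.GRU(64, activation='relu'))
gru.add(tf.keras.layers.Dropout(0.3))
# One output unit (logit) per location in the vocabulary.
gru.add(tf.keras.layers.Dense(mh.vocab_size))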

In [40]:
mh.assign_model(gru)
mh.compile_model(optimizer_type=tf.keras.optimizers.Adam, learning_rate=0.002)
mh.fit_model(with_early_stopping=False)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [36]:
mh.evaluate_model()



Why does the simple model perform so much worse than the original model?
The answer is not obvious, because on the 4square dataset both models (simple and complex architectures) perform relatively well.
On the NYC taxi dataset, the original model performs quite well.
A simple time series model like this one, on the other hand, does not even reach baseline prediction accuracy.
This question is analyzed in the following notebooks.
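
The notebook does not state which baseline is meant. To make the comparison concrete, here is a sketch of two common naive baselines for next-location prediction: always predicting the most frequent training location, and repeating the last visited location. The function names and the toy data are hypothetical.

```python
from collections import Counter


def most_frequent_baseline(train_labels, test_labels):
    """Always predict the most common training location."""
    guess = Counter(train_labels).most_common(1)[0][0]
    return sum(y == guess for y in test_labels) / len(test_labels)


def last_location_baseline(histories, test_labels):
    """Predict that the next location equals the last one in the history."""
    hits = sum(h[-1] == y for h, y in zip(histories, test_labels))
    return hits / len(test_labels)


train_labels = [161, 161, 162, 236, 161]
histories = [[161, 162], [162, 236], [236, 237], [237, 237]]
test_labels = [161, 236, 161, 237]

print(most_frequent_baseline(train_labels, test_labels))  # prints 0.5
print(last_location_baseline(histories, test_labels))     # prints 0.5
```

A learned model is only useful if its accuracy clearly exceeds such reference values on the same test windows.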