# Revisiting the NYC Taxi Dataset Model Architecture, Part 5

This notebook changes the model architecture to examine how the architecture affects prediction quality.
The model feeds sequences of time components into a GRU block.
That block consists of two GRU layers; its output is flattened and concatenated with the time components of the target.
The deep learning problem can thus be described as follows:
Predict the (N+1)th location from the time components of the target location and a history in the form of a sequence of N prior time components.
All cyclical time components (hour and week day) are sin-cos-transformed.
Like all later architecture variants, this model builds on the changed network architecture from Part 1.
That architecture predicts only the relevant (N+1)th location, rather than the next location for every sequence element as the original architecture did.
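
As a rough illustration of the sin-cos transformation mentioned above, the sketch below maps an hour of day and a day of week onto the unit circle. The function name `sin_cos_encode` and the periods (24 hours, 7 days) are assumptions made for this example; the notebook's `ModelHelper` may implement the transformation differently, although these periods are consistent with the `hour_sin`, `hour_cos`, `week_day_sin` and `week_day_cos` values printed further down.

```python
import numpy as np

def sin_cos_encode(values, period):
    """Map cyclical values (e.g. hours 0..23) onto the unit circle."""
    angle = 2.0 * np.pi * np.asarray(values, dtype=np.float64) / period
    return np.sin(angle), np.cos(angle)

# Hypothetical inputs: a few hours of the day and one week-day index
hour_sin, hour_cos = sin_cos_encode([0, 3, 6, 23], period=24)
week_day_sin, week_day_cos = sin_cos_encode([1], period=7)

print(np.round(hour_sin, 6), np.round(hour_cos, 6))
print(np.round(week_day_sin, 6), np.round(week_day_cos, 6))
```

With this encoding, hour 23 and hour 0 end up close together, which a plain integer encoding would not reflect.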


In [4]:
import numpy as np
np.random.seed(0)
import tensorflow as tf
import pandas as pd
from tensorflow import feature_column
from tensorflow.keras import layers
import import_ipynb

In [5]:
from model_helper import ModelHelper

# Dataset

In [6]:
df = pd.read_csv("./ma_results/trips_with_zones_final.csv")
df = df.head(10000000)
df.head(10)

Unnamed: 0,medallion,pickup_week_day,pickup_hour,pickup_day,pickup_month,dropoff_week_day,dropoff_hour,dropoff_day,dropoff_month,pickup_location_id,dropoff_location_id
0,00005007A9F30E289E760362F69E4EAD,1,0,1,1,1,0,1,1,162.0,262.0
1,00005007A9F30E289E760362F69E4EAD,1,0,1,1,1,0,1,1,262.0,239.0
2,00005007A9F30E289E760362F69E4EAD,1,0,1,1,1,1,1,1,239.0,236.0
3,00005007A9F30E289E760362F69E4EAD,1,1,1,1,1,1,1,1,236.0,41.0
4,00005007A9F30E289E760362F69E4EAD,1,1,1,1,1,1,1,1,41.0,211.0
5,00005007A9F30E289E760362F69E4EAD,1,1,1,1,1,2,1,1,211.0,238.0
6,00005007A9F30E289E760362F69E4EAD,1,2,1,1,1,2,1,1,238.0,142.0
7,00005007A9F30E289E760362F69E4EAD,1,2,1,1,1,2,1,1,142.0,263.0
8,00005007A9F30E289E760362F69E4EAD,1,2,1,1,1,3,1,1,263.0,48.0
9,00005007A9F30E289E760362F69E4EAD,1,3,1,1,1,3,1,1,48.0,246.0


In [7]:
# Check dtypes of the attributes
df.dtypes

medallion               object
pickup_week_day          int64
pickup_hour              int64
pickup_day               int64
pickup_month             int64
dropoff_week_day         int64
dropoff_hour             int64
dropoff_day              int64
dropoff_month            int64
pickup_location_id     float64
dropoff_location_id    float64
dtype: object

In [8]:
# Drop the medallion, it is not needed for this example
df.drop(['medallion'], axis=1, inplace=True)

Because there are too many taxis (over 9000), it is better to take the 100 taxis with the most records.
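
A minimal sketch of how such a selection could look in pandas, assuming the `medallion` column is still present when it runs. The toy frame and the names `top_n`, `top_taxis`, `df_top` and `df_rest` are made up for the example; the notebook itself does not show this step.

```python
import pandas as pd

# Toy stand-in for the trips table; the real data has over 9000 medallions
toy = pd.DataFrame({
    "medallion": ["A", "A", "A", "B", "B", "C"],
    "pickup_location_id": [162, 262, 239, 236, 41, 211],
})

top_n = 2  # the text above suggests 100 for the full dataset

# Count records per taxi and keep the identifiers of the busiest ones
top_taxis = toy["medallion"].value_counts().nlargest(top_n).index

# Rows of the selected taxis, and the rows of all remaining taxis
df_top = toy[toy["medallion"].isin(top_taxis)]
df_rest = toy[~toy["medallion"].isin(top_taxis)]

print(sorted(top_taxis))
print(len(df_top), len(df_rest))
```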

In [9]:
# Cast all columns to int32
int32_columns = [
    'pickup_week_day', 'pickup_hour', 'pickup_day', 'pickup_month',
    'dropoff_week_day', 'dropoff_hour', 'dropoff_day', 'dropoff_month',
    'pickup_location_id', 'dropoff_location_id',
]
df = df.astype({column: 'int32' for column in int32_columns}, copy=True)
df.dtypes

pickup_week_day        int32
pickup_hour            int32
pickup_day             int32
pickup_month           int32
dropoff_week_day       int32
dropoff_hour           int32
dropoff_day            int32
dropoff_month          int32
pickup_location_id     int32
dropoff_location_id    int32
dtype: object

We can use the other taxis to create local test and validation sets.

Now we need to create the location sequence for each user.

In [10]:
mh = ModelHelper(df, 129)

In [11]:
# Build the location sequence with sin-cos-transformed time features
mh.df_to_location_sequence()

print(mh.df)

            index  location_id  day  month  hour_sin      hour_cos  \
0               0          162    1      1  0.000000  1.000000e+00   
1              12          230    1      1  0.707107  7.071068e-01   
2              13          125    1      1  0.707107  7.071068e-01   
3              15           48    1      1  0.866025  5.000000e-01   
4              18          170    1      1  1.000000  6.123234e-17   
...           ...          ...  ...    ...       ...           ...   
13731996  7284341          161   26      1 -0.500000 -8.660254e-01   
13731997  7284341          161   26      1 -0.500000 -8.660254e-01   
13731998  7284342          132   26      1 -0.707107 -7.071068e-01   
13731999  7284343          141   26      1 -0.866025 -5.000000e-01   
13732000  7284344          141   26      1 -0.866025 -5.000000e-01   

          week_day_sin  week_day_cos  weekend  
0             0.781831      0.623490        0  
1             0.781831      0.623490        0  
2             0

In [12]:
mh.train_val_test_split()
print(len(mh.df_train), 'train examples')
print(len(mh.df_val), 'validation examples')
print(len(mh.df_test), 'test examples')

8788480 train examples
2197120 validation examples
2746401 test examples
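
The counts above are consistent with an unshuffled 80/20 split applied twice: first into train+validation and test, then into train and validation. The sketch below reproduces that arithmetic. It is an inference from the printed numbers, not the confirmed implementation of `mh.train_val_test_split()`, and the function name `sequential_split_sizes` is hypothetical.

```python
def sequential_split_sizes(n_rows, keep_percent=80):
    """Sizes of an ordered split: the tail is held out as test,
    then the tail of the remainder is held out as validation."""
    n_train_val = n_rows * keep_percent // 100
    n_train = n_train_val * keep_percent // 100
    n_val = n_train_val - n_train
    n_test = n_rows - n_train_val
    return n_train, n_val, n_test

# Total number of rows, taken from the three counts reported above
n_rows = 8788480 + 2197120 + 2746401
print(sequential_split_sizes(n_rows))
```

Keeping the rows in order would mean that the test set covers a later period than the training set, which fits the first test row shown below carrying the row label 10985600, the sum of the train and validation counts.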


In [13]:
mh.split_data()
mh.list_test[0]

Unnamed: 0,index,location_id,day,month,hour_sin,hour_cos,week_day_sin,week_day_cos,weekend
10985600,5283998,246,4,1,-0.866025,0.500000,-0.433884,-0.900969,0
10985601,5283999,107,4,1,-0.707107,0.707107,-0.433884,-0.900969,0
10985602,5284000,142,4,1,-0.707107,0.707107,-0.433884,-0.900969,0
10985603,5284001,48,4,1,-0.500000,0.866025,-0.433884,-0.900969,0
10985604,5284001,48,4,1,-0.500000,0.866025,-0.433884,-0.900969,0
...,...,...,...,...,...,...,...,...,...
10985724,5284091,234,7,1,0.000000,1.000000,0.000000,1.000000,0
10985725,5284092,162,7,1,0.258819,0.965926,0.000000,1.000000,0
10985726,5284093,142,7,1,0.500000,0.866025,0.000000,1.000000,0
10985727,5284093,142,7,1,0.500000,0.866025,0.000000,1.000000,0


In [14]:
mh.set_batch_size(128)
mh.create_batch_dataset()
mh.test_dataset

<BatchDataset shapes: ({start_place: (128, 128), start_hour_sin: (128, 128), start_hour_cos: (128, 128), weekend: (128, 128), week_day_sin: (128, 128), week_day_cos: (128, 128), end_hour_sin: (128,), end_hour_cos: (128,), end_weekend: (128,), end_week_day_sin: (128,), end_week_day_cos: (128,)}, (128,)), types: ({start_place: tf.int32, start_hour_sin: tf.float64, start_hour_cos: tf.float64, weekend: tf.int32, week_day_sin: tf.float64, week_day_cos: tf.float64, end_hour_sin: tf.float64, end_hour_cos: tf.float64, end_weekend: tf.int32, end_week_day_sin: tf.float64, end_week_day_cos: tf.float64}, tf.int32)>
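
The element spec above pairs sequences of length 128 (`start_place`, the `start_*` hour features, `weekend` and the `week_day_*` features) with scalar `end_*` features and a scalar label. A sketch of how one such example could be cut from a chunk of N+1 consecutive rows follows; the helper name `chunk_to_example` and the toy data are assumptions, and `ModelHelper` may build its examples differently.

```python
import numpy as np
import pandas as pd

def chunk_to_example(chunk):
    """Split N+1 consecutive rows into N history steps and one target step."""
    history, target = chunk.iloc[:-1], chunk.iloc[-1]
    features = {
        # Sequence inputs: one value per history step
        "start_place": history["location_id"].to_numpy(),
        "start_hour_sin": history["hour_sin"].to_numpy(),
        "start_hour_cos": history["hour_cos"].to_numpy(),
        "weekend": history["weekend"].to_numpy(),
        "week_day_sin": history["week_day_sin"].to_numpy(),
        "week_day_cos": history["week_day_cos"].to_numpy(),
        # Scalar inputs: time components of the step to predict
        "end_hour_sin": float(target["hour_sin"]),
        "end_hour_cos": float(target["hour_cos"]),
        "end_weekend": int(target["weekend"]),
        "end_week_day_sin": float(target["week_day_sin"]),
        "end_week_day_cos": float(target["week_day_cos"]),
    }
    # The label is the location of the target step
    label = int(target["location_id"])
    return features, label

# Toy chunk with N = 4 history steps (the notebook uses 128)
n_steps = 4
hours = np.arange(n_steps + 1)
toy_chunk = pd.DataFrame({
    "location_id": [162, 230, 125, 48, 170],
    "hour_sin": np.sin(2 * np.pi * hours / 24),
    "hour_cos": np.cos(2 * np.pi * hours / 24),
    "week_day_sin": np.full(n_steps + 1, np.sin(2 * np.pi / 7)),
    "week_day_cos": np.full(n_steps + 1, np.cos(2 * np.pi / 7)),
    "weekend": np.zeros(n_steps + 1, dtype=np.int32),
})

features, label = chunk_to_example(toy_chunk)
print(features["start_hour_sin"].shape, label)
```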

In [15]:
mh.set_target_column_name('location_id')
mh.set_vocab_size()
mh.set_numerical_column_names(['start_hour_sin', 'start_hour_cos', 'weekend', 'week_day'])

In [16]:
# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 256

# Create a model
def create_model():
  N = mh.total_window_length
  batch_size = mh.batch_size
  number_of_places = mh.vocab_size

  # Shortcut to the layers package
  l = tf.keras.layers

  # Define an input dictionary whose keys are the column names.
  # This model has multiple inputs, so we declare an input layer for each feature.
  feature_inputs = {
    'start_hour_sin': tf.keras.Input((N-1, ), batch_size=batch_size, name='start_hour_sin'),
    'start_hour_cos': tf.keras.Input((N-1, ), batch_size=batch_size, name='start_hour_cos'),
    'weekend': tf.keras.Input((N-1, ), batch_size=batch_size, name='weekend'),
    'week_day_sin': tf.keras.Input((N-1, ), batch_size=batch_size, name='week_day_sin'),
    'week_day_cos': tf.keras.Input((N-1, ), batch_size=batch_size, name='week_day_cos'),
  }

  other_feature_inputs = {
    'end_hour_sin': tf.keras.Input((1, ), batch_size=batch_size, name='end_hour_sin'),
    'end_hour_cos': tf.keras.Input((1, ), batch_size=batch_size, name='end_hour_cos'),
    'end_weekend': tf.keras.Input((1, ), batch_size=batch_size, name='end_weekend'),
    'end_week_day_sin': tf.keras.Input((1, ), batch_size=batch_size, name='end_week_day_sin'),
    'end_week_day_cos': tf.keras.Input((1, ), batch_size=batch_size, name='end_week_day_cos')
  }

  # Because the features are sequences, we cannot pass them as one list of
  # feature columns (the shapes would not match), so each feature is converted on its own
  start_hour_sin = feature_column.numeric_column("start_hour_sin", shape=(N-1))
  hour_sin_feature = l.DenseFeatures(start_hour_sin)(feature_inputs)

  start_hour_cos = feature_column.numeric_column("start_hour_cos", shape=(N-1))
  hour_cos_feature = l.DenseFeatures(start_hour_cos)(feature_inputs)

  weekend = feature_column.numeric_column("weekend", shape=(N-1))
  weekend_feature = l.DenseFeatures(weekend)(feature_inputs)

  week_day_sin = feature_column.numeric_column("week_day_sin", shape=(N-1))
  week_day_sin_feature = l.DenseFeatures(week_day_sin)(feature_inputs)

  week_day_cos = feature_column.numeric_column("week_day_cos", shape=(N-1))
  week_day_cos_feature = l.DenseFeatures(week_day_cos)(feature_inputs)

  end_hour_sin = feature_column.numeric_column("end_hour_sin", shape=(1))
  end_hour_sin_feature = l.DenseFeatures(end_hour_sin)(other_feature_inputs)

  end_hour_cos = feature_column.numeric_column("end_hour_cos", shape=(1))
  end_hour_cos_feature = l.DenseFeatures(end_hour_cos)(other_feature_inputs)

  end_weekend = feature_column.numeric_column("end_weekend", shape=(1))
  end_weekend_feature = l.DenseFeatures(end_weekend)(other_feature_inputs)

  end_week_day_sin = feature_column.numeric_column("end_week_day_sin", shape=(1))
  end_week_day_sin_feature = l.DenseFeatures(end_week_day_sin)(other_feature_inputs)

  end_week_day_cos = feature_column.numeric_column("end_week_day_cos", shape=(1))
  end_week_day_cos_feature = l.DenseFeatures(end_week_day_cos)(other_feature_inputs)

  # Add a trailing feature dimension so the sequences can be concatenated along axis 2
  hour_sin_feature = tf.expand_dims(hour_sin_feature, -1)
  hour_cos_feature = tf.expand_dims(hour_cos_feature, -1)
  weekend_feature = tf.expand_dims(weekend_feature, -1)
  week_day_sin_feature = tf.expand_dims(week_day_sin_feature, -1)
  week_day_cos_feature = tf.expand_dims(week_day_cos_feature, -1)

  input_sequence = l.Concatenate(axis=2)([hour_sin_feature, hour_cos_feature, weekend_feature, week_day_sin_feature, week_day_cos_feature])

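  # Two stacked GRU layers: the first returns the full sequence,
  # the second only its final output
  recurrent = l.GRU(rnn_units,
                    batch_size=batch_size,  # fixed batch size, needed for stateful=True
                    return_sequences=True,
                    stateful=True,
                    recurrent_initializer='glorot_uniform')(input_sequence)

  recurrent_2 = l.GRU(64,
                      batch_size=batch_size,  # fixed batch size, needed for stateful=True
                      stateful=True,
                      recurrent_initializer='glorot_uniform')(recurrent)

  # recurrent_2 already has shape (batch, 64), so Flatten leaves it unchanged
  flatten = l.Flatten()(recurrent_2)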

  concatenate_2 = l.Concatenate(axis=1)([flatten, end_hour_sin_feature, end_hour_cos_feature, end_weekend_feature, end_week_day_sin_feature, end_week_day_cos_feature])

  # Dense layer with one output per place
  dense_1 = l.Dense(number_of_places)(concatenate_2)

  # Softmax output layer
  output = l.Softmax()(dense_1)

  # To return the Model, we need to define its inputs and outputs.
  # In our case, we need to list all the input layers we have defined
  inputs = list(feature_inputs.values()) + list(other_feature_inputs.values())

  # Return the Model
  return tf.keras.Model(inputs=inputs, outputs=output)
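
To make the shape bookkeeping at the end of `create_model` concrete, the following NumPy sketch mimics the head of the network with random weights: a GRU output of width 64 is concatenated with the five `end_*` features, passed through a dense layer and normalized with a softmax. The value 264 for the number of places is taken from the prediction shape printed later in the notebook; all arrays are random placeholders, not trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, gru_units, n_end_features, n_places = 128, 64, 5, 264

# Placeholder for the output of the second GRU layer: one vector per sequence
gru_output = rng.normal(size=(batch_size, gru_units))
# Placeholder for end_hour_sin, end_hour_cos, end_weekend,
# end_week_day_sin and end_week_day_cos
end_features = rng.normal(size=(batch_size, n_end_features))

# Concatenate along the feature axis, as Concatenate(axis=1) does
head_input = np.concatenate([gru_output, end_features], axis=1)

# Dense layer with one output per place (random kernel, zero bias)
kernel = rng.normal(scale=0.1, size=(gru_units + n_end_features, n_places))
bias = np.zeros(n_places)
scores = head_input @ kernel + bias

# Numerically stable softmax over the place axis
shifted = scores - scores.max(axis=1, keepdims=True)
probabilities = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(head_input.shape, probabilities.shape)
print(kernel.size + bias.size, "parameters in the dense layer")
```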

In [17]:
# Get the model and compile it
mh.assign_model(create_model())
mh.compile_model()

# Training

In [18]:
mh.set_num_epochs(10)
mh.fit_model()

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 00010: early stopping


# Evaluation

In [19]:
mh.evaluate_model()



In [20]:
mh.model.summary()

Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
start_hour_cos (InputLayer)     [(128, 128)]         0                                            
__________________________________________________________________________________________________
start_hour_sin (InputLayer)     [(128, 128)]         0                                            
__________________________________________________________________________________________________
week_day_cos (InputLayer)       [(128, 128)]         0                                            
__________________________________________________________________________________________________
week_day_sin (InputLayer)       [(128, 128)]         0                                            
_______________________________________________________________________________________
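
Since the summary output is cut off, the parameter counts of the two GRU layers can be estimated by hand. The sketch assumes the TensorFlow 2 default `reset_after=True`, for which a GRU layer has 3 * (units * (input_dim + units) + 2 * units) weights. The helper name `gru_param_count` is made up for this example, and the numbers are computed here, not read from the notebook output.

```python
def gru_param_count(input_dim, units):
    """Weights of a Keras GRU layer with reset_after=True (the TF 2 default)."""
    kernel = input_dim * units          # input-to-hidden weights, per gate
    recurrent_kernel = units * units    # hidden-to-hidden weights, per gate
    biases = 2 * units                  # input and recurrent bias, per gate
    return 3 * (kernel + recurrent_kernel + biases)

# First GRU: five time features per step, 256 units
first_gru = gru_param_count(input_dim=5, units=256)
# Second GRU: fed by the 256-wide sequence of the first, 64 units
second_gru = gru_param_count(input_dim=256, units=64)

print(first_gru, second_gru)
```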

In [21]:
mh.print_test_prediction_info()

logits
Shape :  (21248, 264)
Example [0] :  [3.36034689e-03 4.69429360e-04 4.28359016e-07 2.14406464e-05
 6.61201263e-03 7.34902244e-07 2.11368133e-05 1.06455879e-02
 1.02275852e-04 1.42888850e-04 2.28962745e-04 1.49397194e-04
 3.25455854e-04 8.54633935e-03 2.73698615e-03 2.44518829e-04
 1.05612693e-04 3.28943273e-03 2.28187928e-04 2.35042262e-05
 4.03467820e-05 8.43185771e-05 6.90974237e-04 5.34979481e-05
 3.24554066e-03 5.22627868e-03 2.03456948e-04 1.09229347e-06
 2.66153947e-04 3.42656509e-04 6.71690316e-07 3.03401721e-05
 3.03126399e-05 4.74334788e-03 1.79150913e-04 9.34643831e-05
 2.28582323e-03 7.20618898e-03 3.78258701e-05 2.12406783e-04
 3.11835250e-03 5.94239729e-03 6.81134546e-03 3.61590181e-03
 1.73168446e-05 5.13413316e-03 5.36656553e-05 6.81316669e-05
 2.85392106e-02 6.46747975e-03 8.35854746e-03 3.82633516e-05
 2.15441966e-03 1.50351465e-04 3.42050858e-04 8.48723794e-05
 1.54884212e-04 3.97578674e-07 2.04560965e-05 5.04799573e-07
 1.20761033e-04 4.89324098e-03 1.03958650

As expected, adding the time components of the target (the end time) does not improve the poor prediction performance of a model based solely on time components.
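
One way to put a number on such a statement is to compare the predicted probabilities with the true labels. The rows printed above under `logits` appear to come out of the softmax layer, so each row can be read as a probability distribution over the 264 places. The sketch below computes top-1 and top-k accuracy for a toy probability matrix; `top_k_accuracy` and the toy arrays are assumptions for illustration and are not part of `ModelHelper`.

```python
import numpy as np

def top_k_accuracy(probabilities, labels, k=1):
    """Fraction of rows whose true label is among the k most probable classes."""
    # Indices of the k largest probabilities per row (order within the k is irrelevant)
    top_k = np.argpartition(probabilities, -k, axis=1)[:, -k:]
    hits = (top_k == np.asarray(labels)[:, None]).any(axis=1)
    return hits.mean()

# Toy predictions for four examples and three classes
toy_probabilities = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
    [0.2, 0.5, 0.3],
])
toy_labels = np.array([0, 1, 1, 2])

print(top_k_accuracy(toy_probabilities, toy_labels, k=1))
print(top_k_accuracy(toy_probabilities, toy_labels, k=2))
```

Reporting top-k accuracy next to top-1 accuracy shows whether the correct place at least ranks among the model's leading candidates.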