# Monitoring a Trip Time prediction model

In this example, we consider a regression task: predicting the duration of a trip from its pickup and dropoff coordinates, number of passengers, booking date, and similar features.

Input: Features such as vendor_id, pickup_datetime, passenger_count, and the pickup and dropoff coordinates.
Output: Trip duration (in seconds)

In this notebook, we will see how to use the UpTrain package to monitor model performance, run data integrity checks, and identify data drift.

In [1]:
import os
import time

import pandas as pd
from lightgbm import LGBMRegressor
from sklearn import model_selection, metrics

import uptrain
# Provides download_datasets, process_training_data and process_testing_data
from helper_funcs import *

In [2]:
base_dir = download_datasets()

In [3]:
df_train = pd.read_csv(os.path.join(base_dir, "train.csv"))
df_test = pd.read_csv(os.path.join(base_dir, "test.csv"))
print(df_train.head(3))

          id  vendor_id      pickup_datetime     dropoff_datetime  \
0  id2875421          2  2016-03-14 17:24:55  2016-03-14 17:32:30   
1  id2377394          1  2016-06-12 00:43:35  2016-06-12 00:54:38   
2  id3858529          2  2016-01-19 11:35:24  2016-01-19 12:10:48   

   passenger_count  pickup_longitude  pickup_latitude  dropoff_longitude  \
0                1        -73.982155        40.767937         -73.964630   
1                1        -73.980415        40.738564         -73.999481   
2                1        -73.979027        40.763939         -74.005333   

   dropoff_latitude store_and_fwd_flag  trip_duration  
0         40.765602                  N            455  
1         40.731152                  N            663  
2         40.710087                  N           2124  


In [4]:
df_train = process_training_data(df_train)
df_test = process_testing_data(df_test)
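
The bodies of `process_training_data` and `process_testing_data` live in `helper_funcs` and are not shown in this notebook. As a rough illustration of the kind of feature engineering such helpers might perform on the raw columns printed above, here is a minimal sketch. The function names `time_features` and `haversine_km` are hypothetical and are not taken from `helper_funcs`; the actual preprocessing may differ.

```python
import math
from datetime import datetime


def time_features(pickup_datetime):
    """Split a 'YYYY-MM-DD HH:MM:SS' string into simple calendar features.

    Hypothetical helper for illustration only.
    """
    ts = datetime.strptime(pickup_datetime, "%Y-%m-%d %H:%M:%S")
    return {
        "pickup_hour": ts.hour,
        "pickup_weekday": ts.weekday(),  # Monday is 0
        "pickup_month": ts.month,
    }


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# First row of the training data printed above
features = time_features("2016-03-14 17:24:55")
distance = haversine_km(40.767937, -73.982155, 40.765602, -73.964630)
print(features, round(distance, 2))
```

Numeric features like these are what a tree-based regressor can consume directly, whereas the raw timestamp strings are not.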

In [5]:
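# Separate the target (trip duration in seconds) from the input features
Y = df_train["trip_duration"]
X = df_train.drop(columns=["trip_duration"])

# Hold out 10% of the training data for validation
X_train, X_val, y_train, y_val = model_selection.train_test_split(X, Y, test_size=0.1)

# Gradient-boosted regressor with 500 boosting rounds
m = LGBMRegressor(n_estimators=500)
m.fit(X_train, y_train)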

In [6]:
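# mean_squared_log_error rejects negative values, so take the absolute value of the predictions
preds = abs(m.predict(X_val))
err_val = metrics.mean_squared_log_error(y_val, preds)
# Note: this is the mean squared log error scaled by 100, not a true percentage error
print(f"Error Percentage = {err_val * 100} %")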

Error Percentage = 33.98336490918488 %
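
The number printed above is the mean squared log error (MSLE) multiplied by 100, so the underlying MSLE is about 0.34. To make the metric concrete, here is a minimal pure-Python sketch of the standard MSLE formula, the mean of `(log(1 + y_true) - log(1 + y_pred))^2`. The function names `msle` and `rmsle` and the example predictions are made up for illustration; only the three true durations come from the rows printed earlier.

```python
import math


def msle(y_true, y_pred):
    """Mean squared log error: mean of (log1p(true) - log1p(pred))^2."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if any(v < 0 for v in list(y_true) + list(y_pred)):
        raise ValueError("MSLE is undefined for negative values")
    squared_log_diffs = [
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return sum(squared_log_diffs) / len(squared_log_diffs)


def rmsle(y_true, y_pred):
    """Root of the MSLE, on the same scale as a log-ratio error."""
    return math.sqrt(msle(y_true, y_pred))


true_durations = [455, 663, 2124]       # from the training rows shown earlier
example_preds = [500.0, 600.0, 2000.0]  # hypothetical predictions
print(msle(true_durations, example_preds), rmsle(true_durations, example_preds))
```

Because the error is computed on logarithms, it penalizes relative rather than absolute mistakes, which suits trip durations that range from minutes to hours. Taking the square root of the reported MSLE gives an RMSLE of roughly 0.58.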


In [None]:
cfg = {
    "checks": [{
        # Data integrity check on the passenger_count input feature
        "type": uptrain.Anomaly.DATA_INTEGRITY,
        "measurable_args": {
            "type": uptrain.MeasurableType.INPUT_FEATURE,
            "feature_name": "passenger_count"
        },
        "integrity_type": "greater_than",
        "threshold": 1
    }],
    # Send results to the Streamlit dashboard
    "st_logging": True
}
framework = uptrain.Framework(cfg_dict=cfg)

batch_size = 256
# Log the test set in batches; integer division means a trailing
# partial batch is skipped
for idx in range(len(df_test) // batch_size):
    this_elems = df_test[idx * batch_size: (idx + 1) * batch_size]
    this_preds = abs(m.predict(this_elems))
    framework.log(inputs=this_elems, outputs=this_preds)
    time.sleep(0.01)

Deleting the folder:  uptrain_smart_data
Deleting the folder:  uptrain_logs

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://192.168.6.92:8501

  Stopping...
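
To illustrate what a threshold-style integrity check over batched inputs amounts to, here is a minimal sketch in plain Python. It is not UpTrain's implementation: the helper names `iter_batches` and `count_violations` are hypothetical, and the sketch assumes that `"greater_than"` with a threshold of 1 treats values below 1 as violations (so a value equal to the threshold passes). Unlike the logging loop above, this batching helper also yields the final partial batch.

```python
def iter_batches(rows, batch_size):
    """Yield consecutive slices of rows, including a final partial batch."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def count_violations(values, threshold):
    """Count values below the threshold.

    Assumed semantics for a "greater_than" check; the boundary handling
    in UpTrain itself may differ.
    """
    return sum(1 for v in values if v < threshold)


passenger_counts = [1, 2, 0, 3, 1, 4, 0]  # made-up values for illustration
violations_per_batch = [
    count_violations(batch, threshold=1)
    for batch in iter_batches(passenger_counts, batch_size=3)
]
print(violations_per_batch)  # [1, 0, 1]
```

The last entry comes from the single-element trailing batch, which a loop over `len(rows) // batch_size` full batches would never see.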
