# Inference

## Import Libraries

In [3]:
import pickle
import json
import pandas as pd
import numpy as np

## Import Model

In [4]:
# Load the trained model and the fitted preprocessing objects
with open('model_lin_reg.pkl', 'rb') as file_1:
    model_lin_reg = pickle.load(file_1)

with open('model_scaler.pkl', 'rb') as file_2:
    model_scaler = pickle.load(file_2)

with open('model_encoder_ord.pkl', 'rb') as file_3:
    model_encoder_ord = pickle.load(file_3)

with open('model_encoder.pkl', 'rb') as file_4:
    model_encoder = pickle.load(file_4)

# Load the column lists (stored as JSON in .txt files)
with open('list_num_cols.txt', 'r') as file_5:
    list_num_cols = json.load(file_5)

with open('list_cat_cols.txt', 'r') as file_6:
    list_cat_cols = json.load(file_6)

with open('list_cat_cols_ord.txt', 'r') as file_7:
    list_cat_cols_ord = json.load(file_7)
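The seven loading blocks repeat one pattern: pickle for the model and preprocessors, JSON for the column lists. Below is a minimal sketch of a helper that loads everything in one pass, assuming all files sit in the same directory. The function name `load_artifacts` and its `directory` parameter are hypothetical, not part of the original notebook, and the demo writes placeholder objects to a temporary directory so it runs without the real files.

```python
import json
import pickle
import tempfile
from pathlib import Path

# File names as used in this notebook
PICKLE_FILES = {
    "model_lin_reg": "model_lin_reg.pkl",
    "model_scaler": "model_scaler.pkl",
    "model_encoder_ord": "model_encoder_ord.pkl",
    "model_encoder": "model_encoder.pkl",
}
JSON_FILES = {
    "list_num_cols": "list_num_cols.txt",
    "list_cat_cols": "list_cat_cols.txt",
    "list_cat_cols_ord": "list_cat_cols_ord.txt",
}


def load_artifacts(directory):
    """Hypothetical helper: load every pickled object and JSON column list from `directory`."""
    directory = Path(directory)
    artifacts = {}
    for key, name in PICKLE_FILES.items():
        with open(directory / name, "rb") as f:
            artifacts[key] = pickle.load(f)
    for key, name in JSON_FILES.items():
        with open(directory / name, "r") as f:
            artifacts[key] = json.load(f)
    return artifacts


# Demo with placeholder objects (NOT the real trained model or encoders)
with tempfile.TemporaryDirectory() as tmp:
    for key, name in PICKLE_FILES.items():
        with open(Path(tmp) / name, "wb") as f:
            pickle.dump({"placeholder_for": key}, f)
    for key, name in JSON_FILES.items():
        with open(Path(tmp) / name, "w") as f:
            json.dump(["col_a", "col_b"], f)
    artifacts = load_artifacts(tmp)

print(sorted(artifacts))
```

Note that `pickle.load` can execute arbitrary code, so only load pickle files from a trusted source.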

## Create Dummy Data 

In [5]:
data_inf = {
    'weather': 'Rain',
    'cab_type': 'Lyft',
    'name': 'Lux Black XL',
    'distance': 2.33,
    'time': 'pm',
    'distance_level': 'medium',
    'is_weekend': 1,
    'surge_multiplier': 2,
}

data_inf = pd.DataFrame([data_inf])
data_inf

|   | weather | cab_type | name | distance | time | distance_level | is_weekend | surge_multiplier |
|---|---|---|---|---|---|---|---|---|
| 0 | Rain | Lyft | Lux Black XL | 2.33 | pm | medium | 1 | 2 |
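Because the next cell indexes `data_inf` with the saved column lists, a missing or misspelled key only surfaces as a `KeyError` during the split. A minimal sketch of an up-front check, assuming the input arrives as a plain dict like `data_inf`; `missing_columns` is a hypothetical helper, and the three column lists in the demo are illustrative guesses (the real ones come from the `.txt` files loaded above).

```python
def missing_columns(record, *column_lists):
    """Return the expected column names that are absent from `record` (hypothetical helper)."""
    expected = [col for cols in column_lists for col in cols]
    return [col for col in expected if col not in record]


record = {
    "weather": "Rain",
    "cab_type": "Lyft",
    "name": "Lux Black XL",
    "distance": 2.33,
    "time": "pm",
    "distance_level": "medium",
    "is_weekend": 1,
    "surge_multiplier": 2,
}

# Illustrative column lists only; not the lists saved during training
num_cols = ["distance", "surge_multiplier"]
cat_cols = ["cab_type", "name"]
cat_cols_ord = ["distance_level"]

print(missing_columns(record, num_cols, cat_cols, cat_cols_ord))  # []
print(missing_columns({"cab_type": "Lyft"}, cat_cols))  # ['name']
```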


In [6]:
# Split the input into numerical, nominal categorical, and ordinal categorical columns
data_inf_cat = data_inf[list_cat_cols]
data_inf_num = data_inf[list_num_cols]
data_inf_cat_ord = data_inf[list_cat_cols_ord]
data_inf_cat


|   | cab_type | name |
|---|---|---|
| 0 | Lyft | Lux Black XL |


In [7]:
# Scale the numerical features and encode the categorical features with the fitted transformers
data_inf_num_scaled = model_scaler.transform(data_inf_num)
data_inf_cat_ord_encoded = model_encoder_ord.transform(data_inf_cat_ord)
data_inf_cat_encoded = model_encoder.transform(data_inf_cat)

# Combine into one feature matrix; the column order must match the one used during training
data_inf_final = np.concatenate(
    [data_inf_num_scaled, data_inf_cat_encoded, data_inf_cat_ord_encoded], axis=1
)
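To make the `transform` calls less of a black box, here is a NumPy sketch of what the three steps compute for a single row. It assumes `model_scaler` is a standard scaler (z = (x - mean) / std), `model_encoder` a one-hot encoder, and `model_encoder_ord` an ordinal encoder; the notebook does not show which classes were fitted, and every mean, standard deviation, category list, and ordering below is made up for illustration.

```python
import numpy as np

# Hypothetical fitted parameters (illustration only)
num_means = np.array([2.2, 1.0])   # distance, surge_multiplier
num_stds = np.array([1.1, 0.1])
cab_type_categories = ["Lyft", "Uber"]
name_categories = ["Black", "Lux Black XL", "UberX"]
distance_level_order = ["short", "medium", "long"]


def scale(values):
    # Standardization: subtract the training mean, divide by the training std
    return (np.asarray(values, dtype=float) - num_means) / num_stds


def one_hot(value, categories):
    # 1.0 at the position of the matching category, 0.0 elsewhere
    return np.array([1.0 if value == c else 0.0 for c in categories])


def ordinal(value, order):
    # Position of the value in the ordered category list
    return np.array([float(order.index(value))])


row_num = scale([2.33, 2])
row_cat = np.concatenate([one_hot("Lyft", cab_type_categories),
                          one_hot("Lux Black XL", name_categories)])
row_ord = ordinal("medium", distance_level_order)

# Same block order as the notebook: numerical, nominal, ordinal
row_final = np.concatenate([row_num, row_cat, row_ord]).reshape(1, -1)
print(row_final.shape)  # (1, 8)
```

The order of the blocks matters: a linear model's coefficients are tied to column positions, so the concatenation order at inference must be the same as when the model was trained.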

In [8]:
# Predict using Linear Regression
y_pred_inf = model_lin_reg.predict(data_inf_final)
y_pred_inf

array([44.34921265])
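Assuming `model_lin_reg` is scikit-learn's `LinearRegression` (the notebook does not show the class), `predict` reduces to a dot product of each feature row with the fitted `coef_` plus `intercept_`. A NumPy sketch with made-up coefficients for a three-feature model:

```python
import numpy as np

# Hypothetical fitted parameters for a model with 3 features
coef = np.array([2.0, -0.5, 1.5])
intercept = 10.0

X = np.array([[1.0, 4.0, 2.0]])  # one row, shaped like data_inf_final

# Linear regression prediction: X @ coef + intercept
y_pred = X @ coef + intercept
print(y_pred)  # [13.]
```

If the real model is a `LinearRegression`, inspecting `model_lin_reg.coef_` and `model_lin_reg.intercept_` would show how much each encoded feature contributes to this prediction.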

## Conclusion

The predicted fare is about `44.35 USD`. The model's error is `1.67 USD` on the training set and `1.82 USD` on the test set. Taking that error as a typical deviation, the actual fare would usually fall between `42.68 USD` and `46.02 USD` (44.35 ± 1.67) based on the training error, or between `42.53 USD` and `46.17 USD` (44.35 ± 1.82) based on the test error. These ranges are a rough indication, not a guaranteed bound.
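The ranges in this paragraph are meant to be the prediction plus or minus the reported training or test error. A short sketch of that arithmetic, treating the error as an absolute deviation in USD (an assumption based on how the text uses it):

```python
y_pred = 44.34921265  # model output from the prediction cell
errors = {"train": 1.67, "test": 1.82}  # reported errors, assumed to be in USD

ranges = {}
for split, err in errors.items():
    # Lower and upper bound: prediction minus / plus the error, rounded to cents
    ranges[split] = (round(y_pred - err, 2), round(y_pred + err, 2))
    print(f"{split}: {ranges[split][0]:.2f} to {ranges[split][1]:.2f} USD")
```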

The percentage error is `12.71%` on the training set and `13.02%` on the test set. In other words, the model's predictions deviate from the actual fares by roughly 13% on average; individual predictions can deviate by more.
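The percentage error is presumably the mean absolute percentage error (MAPE); the notebook does not show how it was computed. A sketch of the standard definition, using made-up fares:

```python
import numpy as np


def mape(y_true, y_pred):
    # Mean of |actual - predicted| / |actual|, expressed as a percentage
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred) / np.abs(y_true)) * 100


# Hypothetical fares (illustration only)
actual = [10.0, 20.0, 40.0]
predicted = [11.0, 18.0, 44.0]
print(round(mape(actual, predicted), 2))  # 10.0
```

Because MAPE divides by the actual fare, errors on cheap rides weigh more heavily than the same USD error on expensive ones, so it does not translate directly into a USD range for a single prediction.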

The R² score is `91.29%` on the training set and `92.81%` on the test set, meaning the model explains about 91% and 93% of the variance in fares on those sets, respectively.
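For reference, R² compares the model's squared errors with those of a baseline that always predicts the mean fare: R² = 1 - SS_res / SS_tot. A minimal NumPy sketch with made-up values; scikit-learn's `sklearn.metrics.r2_score` computes the same quantity for a single target.

```python
import numpy as np


def r2(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    return 1 - ss_res / ss_tot


# Hypothetical fares (illustration only)
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 31.0, 39.0]
print(round(r2(actual, predicted), 3))  # 0.98
```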

Overall, these metrics suggest that the model predicts fares reasonably accurately from the independent variables in the dataset, and the close agreement between training and test scores suggests it is not overfitting. However, an average percentage error of about 13% leaves room for improvement, and individual predictions may be noticeably off in some scenarios.