## Introduction

Effectively managing Last Mile Delivery routing means making sure packages are delivered at a convenient time or as quickly as possible.

If a delivery to a destination does not arrive within its expected time (for example, because the delivery partner got stuck somewhere), the model sends a notification, and only then does a human intervene. This intervention helps manage the routing of deliveries effectively.

In this project, I'll build a model that predicts, for a given route, whether a delivery meets its time constraint (a successful delivery) or misses it. It is trained on a dataset of historical successful and unsuccessful deliveries from the Dunzo app.

### The Dataset

The dataset contains 7 columns:
- Recipient_Location: Where the package is to be delivered.
- Sender_Location: The current location of the delivery partner.
- Orders: The number of orders per delivery.
- Day: Weekend or Weekday.
- Hour: The hour during which the delivery was ordered, from 1 to 24.
- Duration (mins): Time, in minutes, it took for the package to be delivered to the customer.
- Success: Indicator of whether a delivery was successful (1) or not (0).


Note: intra-city deliveries are mostly faster than inter-city deliveries.

In this project, I'll:

- Prepare the data for machine learning
- Train classification models (logistic regression and random forest)
- Evaluate the models using precision

```python
import pandas as pd

dunzo_app = pd.read_csv("Dunzo_App_data.csv")
dunzo_app.info()
dunzo_app.head(10)
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 232 entries, 0 to 231
Data columns (total 7 columns):
Recipient_Location    232 non-null object
Sender_Location       232 non-null object
Orders                232 non-null int64
Day                   232 non-null object
Hour                  232 non-null int64
Duration (mins)       232 non-null int64
Success               232 non-null int64
dtypes: int64(4), object(3)
memory usage: 12.8+ KB
```


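| | Recipient_Location | Sender_Location | Orders | Day | Hour | Duration (mins) | Success |
|---|---|---|---|---|---|---|---|
| 0 | Bengaluru | Bengaluru | 1 | Weekend | 6 | 5 | 1 |
| 1 | Delhi | Delhi | 2 | Weekday | 7 | 10 | 1 |
| 2 | Gurgaon | Gurgaon | 3 | Weekend | 8 | 20 | 0 |
| 3 | Noida | Noida | 4 | Weekday | 9 | 5 | 1 |
| 4 | Pune | Pune | 5 | Weekend | 11 | 10 | 1 |
| 5 | Chennai | Chennai | 6 | Weekday | 12 | 20 | 1 |
| 6 | Mumbai | Mumbai | 7 | Weekend | 15 | 35 | 1 |
| 7 | Noida | Bengaluru | 8 | Weekday | 16 | 70 | 1 |
| 8 | Pune | Delhi | 9 | Weekend | 14 | 120 | 0 |
| 9 | Chennai | Noida | 10 | Weekday | 18 | 40 | 1 |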
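The model section cites class imbalance in Success as the reason for the choice of metric. A quick way to check the balance is `value_counts(normalize=True)`. As a minimal sketch, the snippet below runs it only on the Success values of the ten rows shown above; on the real data you would call it on `dunzo_app["Success"]`, and the shares will differ.

```python
import pandas as pd

# Success values of the ten rows printed by dunzo_app.head(10) (a small sample, not the full data)
sample = pd.DataFrame({"Success": [1, 1, 0, 1, 1, 1, 1, 1, 0, 1]})

# Fraction of rows in each class: a large gap between the shares signals class imbalance
class_shares = sample["Success"].value_counts(normalize=True)
print(class_shares)
```

On this sample, 8 of the 10 deliveries are successful, so the printed shares are 0.8 for class 1 and 0.2 for class 0.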


## Finding Missing Values

```python
dunzo_app.isnull().sum()
```

```
Recipient_Location    0
Sender_Location       0
Orders                0
Day                   0
Hour                  0
Duration (mins)       0
Success               0
dtype: int64
```

## Feature Engineering

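- To make the data more nuanced, I'll concatenate the Recipient_Location and Sender_Location columns into a single route column (named Destination), then drop the two original columns along with the Day column, which this analysis does not use.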

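```python
# Route label = recipient city followed by sender city, e.g. "NoidaBengaluru"
dunzo_app["Destination"] = dunzo_app["Recipient_Location"] + dunzo_app["Sender_Location"]
# Drop the original location columns and the unused Day column
dunzo_app = dunzo_app.drop(["Recipient_Location", "Sender_Location", "Day"], axis=1)
```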

```python
dunzo_app["Destination"].value_counts()
```

```
BengaluruBengaluru    31
DelhiDelhi            16
GurgaonGurgaon        13
NoidaNoida             9
ChennaiPune            8
ChennaiDelhi           8
PuneBengaluru          7
PuneNoida              7
DelhiBengaluru         6
DelhiChennai           6
BengaluruChennai       6
ChennaiNoida           6
PunePune               6
MumbaiMumbai           6
BengaluruPune          5
GurgaonPune            5
NoidaBengaluru         5
MumbaiChennai          4
BengaluruDelhi         4
ChennaiChennai         4
NoidaChennai           4
ChennaiBengaluru       4
MumbaiNoida            4
GurgaonNoida           3
ChennaiGurgaon         3
PuneDelhi              3
NoidaPune              3
MumbaiPune             3
MumbaiGurgaon          3
DelhiPune              3
NoidaDelhi             3
GurgaonDelhi           3
ChennaiMumbai          2
MumbaiBengaluru        2
GuargonChennai         2
GurgaonChennai         2
PuneMumbai             2
NoidaGurgaon           2
DelhiNoida             2
PuneChennai            2
```
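The counts include both GuargonChennai and GurgaonChennai, and the dummy columns later contain Destination_BengaluruGuargon and Destination_PuneGuargon. This suggests Gurgaon is spelled two ways in both location columns, which splits one route into two categories. A minimal sketch of normalizing the spelling before building the route, assuming "Guargon" is always a misspelling of "Gurgaon" (the three rows below are made up for illustration):

```python
import pandas as pd

# Hypothetical rows showing the misspelling in either location column
locations = pd.DataFrame({
    "Recipient_Location": ["Guargon", "Gurgaon", "Pune"],
    "Sender_Location": ["Chennai", "Chennai", "Guargon"],
})

# Assumption: "Guargon" is always a typo for "Gurgaon"; replace exact cell values only
for col in ["Recipient_Location", "Sender_Location"]:
    locations[col] = locations[col].replace("Guargon", "Gurgaon")

# Build the route exactly as in the notebook: recipient followed by sender
locations["Destination"] = locations["Recipient_Location"] + locations["Sender_Location"]
print(locations["Destination"].value_counts())
```

After the fix, both Chennai rows fall into a single GurgaonChennai category.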


- The data dictionary notes that deliveries within the same city are mostly faster than inter-city deliveries, so I'll group all intra-city routes into a single "intra_city" category.

```python
# Collapse every same-city route into a single "intra_city" category
mapping_dict = {
    "Destination": {
        "BengaluruBengaluru": "intra_city",
        "NoidaNoida": "intra_city",
        "MumbaiMumbai": "intra_city",
        "DelhiDelhi": "intra_city",
        "ChennaiChennai": "intra_city",
        "PunePune": "intra_city",
        "GurgaonGurgaon": "intra_city",
    }
}

dunzo_app = dunzo_app.replace(mapping_dict)
```
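The mapping dictionary lists each same-city pair by hand. An alternative sketch (not what this notebook does) compares the two location columns directly, so a new city would not need a new dictionary entry. It has to run before Recipient_Location and Sender_Location are dropped; the rows below are hypothetical.

```python
import pandas as pd

# Hypothetical rows; this must run while both location columns still exist
routes = pd.DataFrame({
    "Recipient_Location": ["Bengaluru", "Noida", "Pune"],
    "Sender_Location": ["Bengaluru", "Bengaluru", "Delhi"],
})

same_city = routes["Recipient_Location"] == routes["Sender_Location"]
pair = routes["Recipient_Location"] + routes["Sender_Location"]

# Keep the recipient+sender pair for inter-city routes, label same-city routes "intra_city"
routes["Destination"] = pair.where(~same_city, "intra_city")
print(routes["Destination"].tolist())
```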

- The Hour column contains the hours during which deliveries were ordered, from 1 to 24. Used as-is, each hour is just a separate number to the model, with no notion that certain hours (for example, 9 and 10 in the morning) belong together.

- So I'll introduce some order by creating a new column with labels for morning (1, before 12:00), afternoon (2, 12:00 to 16:59) and evening (3, 17:00 onwards).

- This bundles similar times together, which should help the model make better decisions.

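```python
def assign_label(hour):
    """Map an hour of the day to 1 = morning, 2 = afternoon, 3 = evening."""
    if 0 <= hour < 12:
        return 1
    elif 12 <= hour < 17:
        return 2
    elif 17 <= hour <= 24:  # include 24, since the data dictionary lists hours up to 24
        return 3

dunzo_app["time_label"] = dunzo_app["Hour"].apply(assign_label)
```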

```python
dunzo_app = dunzo_app.drop(["Hour"], axis=1)
```
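The if/elif chain can also be written with pandas' `pd.cut`, which bins a column in one call. This sketch reproduces the same morning/afternoon/evening boundaries on the hours from the ten sample rows, treating hour 24 as evening (an assumption based on the 1 to 24 range in the data dictionary).

```python
import pandas as pd

# Hours from the ten sample rows shown earlier
hours = pd.Series([6, 7, 8, 9, 11, 12, 15, 16, 14, 18])

# Left-closed bins: [0, 12) morning, [12, 17) afternoon, [17, 25) evening (covers hour 24)
time_label = pd.cut(hours, bins=[0, 12, 17, 25], right=False, labels=[1, 2, 3]).astype(int)
print(time_label.tolist())
```

The labels match the scaled time_label values in the table further down (0.0 for rows 0 to 4, 0.5 for rows 5 to 8, 1.0 for row 9).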

- For the Orders column, I'll likewise bin similar order counts into four groups (0 to 3, 4 to 6, 7 to 9, and 10 upwards), so the model sees similar orders together.

```python
def assign_label_1(order):
    """Bin the number of orders into four groups labelled 1 to 4."""
    if 0 <= order < 4:
        return 1
    elif 4 <= order < 7:
        return 2
    elif 7 <= order < 10:
        return 3
    elif order >= 10:  # 10 or more orders
        return 4

dunzo_app["order_label"] = dunzo_app["Orders"].apply(assign_label_1)
```

```python
dunzo_app = dunzo_app.drop(["Orders"], axis=1)
```

#### Preprocessing the numeric features

```python
from sklearn.preprocessing import minmax_scale

# Rescale each numeric column to the [0, 1] range and store it in a new "_scaled" column
columns = ["Duration (mins)", "time_label", "order_label"]
for col in columns:
    dunzo_app[col + "_scaled"] = minmax_scale(dunzo_app[col])
```

```python
dunzo_app = dunzo_app.drop(columns, axis=1)
```
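Min-max scaling maps each value $x$ of a column to $[0, 1]$ using the column's minimum and maximum. Checked against the scaled table further down, a 5-minute delivery becomes 0 and a 10-minute delivery becomes 0.025641, which is consistent with the Duration column spanning 5 to 200 minutes (inferred from the printed values, not stated in the source):

$$
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}},
\qquad
\frac{10 - 5}{200 - 5} = \frac{5}{195} \approx 0.025641,
\qquad
\frac{120 - 5}{200 - 5} = \frac{115}{195} \approx 0.589744
$$

The same formula explains the label columns: an afternoon time label of 2 becomes $(2 - 1)/(3 - 1) = 0.5$, and an order label of 2 becomes $(2 - 1)/(4 - 1) \approx 0.333333$.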

#### Creating dummy variables for the categorical feature

```python
# One-hot encode the route column, then drop the original text column
cat_column = ["Destination"]
dummy_df = pd.get_dummies(dunzo_app[cat_column])
dunzo_app = pd.concat([dunzo_app, dummy_df], axis=1)
dunzo_app = dunzo_app.drop(cat_column, axis=1)
```

```python
dunzo_app
```

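| | Success | Duration (mins)_scaled | time_label_scaled | order_label_scaled | Destination_BengaluruChennai | Destination_BengaluruDelhi | Destination_BengaluruGuargon | Destination_BengaluruGurgaon | Destination_BengaluruNoida | Destination_BengaluruPune | ... | Destination_NoidaGurgaon | Destination_NoidaMumbai | Destination_NoidaPune | Destination_PuneBengaluru | Destination_PuneChennai | Destination_PuneDelhi | Destination_PuneGuargon | Destination_PuneMumbai | Destination_PuneNoida | Destination_intra_city |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.000000 | 0.0 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 1 | 0.025641 | 0.0 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 0 | 0.076923 | 0.0 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 1 | 0.000000 | 0.0 | 0.333333 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 1 | 0.025641 | 0.0 | 0.333333 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 5 | 1 | 0.076923 | 0.5 | 0.333333 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 6 | 1 | 0.153846 | 0.5 | 0.666667 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 7 | 1 | 0.333333 | 0.5 | 0.666667 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | 0 | 0.589744 | 0.5 | 0.666667 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 9 | 1 | 0.179487 | 1.0 | 1.000000 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |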


```python
from warnings import simplefilter

# Ignore all future warnings
simplefilter(action="ignore", category=FutureWarning)
```

## Training a Logistic Regression Model

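#### Picking an evaluation metric

Because the classes are imbalanced, plain accuracy can be misleading, so I'll evaluate the models with precision (scikit-learn's precision_score): the share of deliveries predicted as successful that actually were successful.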
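To make the definition concrete, this sketch computes precision by hand as TP / (TP + FP) on a handful of toy labels (not from the Dunzo data) and checks it against scikit-learn's `precision_score`, which by default scores the positive class, label 1.

```python
from sklearn.metrics import precision_score

# Toy labels, not from the Dunzo data: 1 = successful delivery, 0 = unsuccessful
y_true = [1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]

# True positives: predicted 1 and actually 1; false positives: predicted 1 but actually 0
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
manual = tp / (tp + fp)

print(manual, precision_score(y_true, y_pred))
```

Here three of the four predicted successes are real, so both values are 0.75. Note that precision ignores the successful deliveries the model missed (the second label).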

```python
dunzo_app_features = dunzo_app.drop(["Success"], axis=1)
```

```python
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Recursive feature elimination with 10-fold cross-validation chooses which features to keep
all_X = dunzo_app_features
all_y = dunzo_app["Success"]
lr = LogisticRegression()
selector = RFECV(lr, cv=10)
selector.fit(all_X, all_y)

optimized_columns = all_X.columns[selector.support_]
```


```python
from sklearn.model_selection import cross_val_predict

X = dunzo_app_features[optimized_columns]
Y = dunzo_app["Success"]

# class_weight="balanced" weights each class inversely to its frequency
lr = LogisticRegression(class_weight="balanced")
predictions = cross_val_predict(lr, X, Y, cv=10)
```
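In the two cells above, RFECV is fit on all 232 rows before cross_val_predict evaluates the model, so the feature selection has already seen the labels of every test fold. One way to keep the selection inside each fold is to wrap RFECV and the classifier in a scikit-learn `Pipeline`. The sketch below is not the notebook's method; it uses synthetic data from `make_classification` as a stand-in for the Dunzo features (an assumption, the CSV is not loaded), and the inner `cv=5` is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the Dunzo features (assumption: same row count, made-up columns)
X_demo, y_demo = make_classification(
    n_samples=232, n_features=12, n_informative=4, weights=[0.25, 0.75], random_state=1
)

# Feature selection is refit on the training part of every outer fold
pipeline = Pipeline([
    ("select", RFECV(LogisticRegression(max_iter=1000), cv=5)),
    ("model", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

preds_demo = cross_val_predict(pipeline, X_demo, y_demo, cv=10)
demo_precision = precision_score(y_demo, preds_demo)
print(demo_precision)
```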

```python
from sklearn.metrics import precision_score

precision_score(dunzo_app["Success"], predictions)
```

```
0.8904109589041096
```

The logistic regression model reaches a precision of about 0.89 (89%).

## Training a Random Forest Classifier

```python
from sklearn.ensemble import RandomForestClassifier

# Same selected features and 10-fold setup as the logistic regression, for a like-for-like comparison
rf = RandomForestClassifier(class_weight="balanced", random_state=1)
predictions_2 = cross_val_predict(rf, X, Y, cv=10)
```

```python
precision_score(dunzo_app["Success"], predictions_2)
```

```
0.9047619047619048
```

The random forest reaches a precision of about 0.90 (90%), slightly higher than the logistic regression model.
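Both scores pool the cross_val_predict predictions from all ten folds into one number, and the scikit-learn documentation for `cross_val_predict` cautions that this is not necessarily a valid measure of generalization. `cross_val_score` with `scoring="precision"` instead returns one score per fold, so the fold-to-fold spread is visible when comparing 0.89 with 0.90. The sketch below uses synthetic data from `make_classification` as a stand-in for the Dunzo features (an assumption, the CSV is not loaded).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the Dunzo features (assumption: same row count, made-up columns)
X_demo, y_demo = make_classification(
    n_samples=232, n_features=12, n_informative=4, weights=[0.25, 0.75], random_state=1
)

models = {
    "logistic_regression": LogisticRegression(class_weight="balanced", max_iter=1000),
    "random_forest": RandomForestClassifier(class_weight="balanced", random_state=1),
}

# One precision score per fold for each model
fold_scores = {
    name: cross_val_score(model, X_demo, y_demo, cv=10, scoring="precision")
    for name, model in models.items()
}

for name, scores in fold_scores.items():
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

If the difference between the two means is small compared to the standard deviations, the ranking of the models may not be stable.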