# Inventory example

You are the operations manager of a firm that sells widgets, and you have been tasked with reducing delivery and inventory costs. To do this, you have requested data on the historical monthly demand for widgets at various customer locations ($Y$), along with some additional features of those locations ($X$). You have also split the data into a training set and a test set.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Load data
df_train = pd.read_csv("inventory_training.csv")
df_test = pd.read_csv("inventory_test.csv")
df_train.head()

Unnamed: 0,Y,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10
0,0.0,0.314341,0.335933,0.455522,0.609706,0.776379,0.611998,0.225171,0.221796,0.109632,0.632819
1,0.0,0.49129,0.07305,0.880053,0.910943,0.050429,0.859846,0.165786,0.505842,0.773601,0.193455
2,0.0,0.736468,0.971234,0.031419,0.669304,0.110343,0.220268,0.846313,0.563787,0.850898,0.531761
3,1.0,0.142886,0.220886,0.291696,0.076489,0.982436,0.676249,0.580175,0.836969,0.481736,0.583566
4,1.0,0.877595,0.521938,0.08084,0.350176,0.985736,0.831571,0.820918,0.630583,0.702497,0.948716


Use the training data to fit a decision tree regressor.

In [2]:
from sklearn.tree import DecisionTreeRegressor 

target = "Y"
# Exclude the target and the saved row index ("Unnamed: 0"), which is not a real feature.
predictors = df_train.columns.drop([target, "Unnamed: 0"])
model = DecisionTreeRegressor()

# Your code to fit the model goes here! 
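
As a hint for the step above, here is a minimal sketch of fitting a tree and getting predictions from it. It runs on a small synthetic table rather than on `inventory_training.csv`; the `toy_` names and the random data are assumptions made purely for illustration, so the same calls would need to be adapted to `df_train`, `predictors`, and `target`.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the training file (assumption: not the real data).
rng = np.random.default_rng(0)
toy_train = pd.DataFrame(rng.random((50, 3)), columns=["X1", "X2", "X3"])
toy_train["Y"] = rng.poisson(2, size=50).astype(float)

toy_target = "Y"
toy_predictors = toy_train.columns.drop(toy_target)

# Fit on the features, then predict on the same rows.
toy_model = DecisionTreeRegressor(random_state=0)
toy_model.fit(toy_train[toy_predictors], toy_train[toy_target])

# predict() returns a NumPy array with one value per row.
toy_train_predictions = toy_model.predict(toy_train[toy_predictors])
print(toy_train_predictions.shape)
```

An array like `toy_train_predictions` is the kind of object that can be passed as the `predictions` argument of a cost function.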

Now suppose that at the beginning of every month you have to pre-stock each customer location with widgets. Every extra unit that you stock incurs a \$1 inventory cost, and every unit of demand that you cannot meet from stock incurs a \$1 delivery cost. Using the function below, what would the total cost have been if we had used the model to predict $Y$ in the training set?

In [3]:
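def total_cost(predictions, demand, inventory_cost=1, delivery_cost=1):
    # Positive error: demand exceeded stock (shortage); negative error: excess stock.
    errors = demand - predictions
    shortage_cost = (errors > 0) * errors * delivery_cost
    surplus_cost = (errors < 0) * -1 * errors * inventory_cost
    cost = (shortage_cost + surplus_cost).sum()
    # Return the rounded total as a formatted dollar string.
    return "${:,}".format(round(cost))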

train_predictions = 0  # Replace 0 with your model's predictions on the TRAINING set.

model_cost = total_cost(train_predictions, df_train[target])
print("Total cost of your model: ", model_cost)

Total cost of your model:  $1,950.0
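
To make the cost rule concrete, here is a small worked example with made-up numbers for four locations. The arrays and variable names are hypothetical and only illustrate the arithmetic: shortages are charged the delivery cost, surpluses the inventory cost.

```python
import numpy as np

# Made-up demand and stock levels for four locations (illustrative only).
demand = np.array([3.0, 5.0, 2.0, 0.0])
stocked = np.array([4.0, 3.0, 2.0, 1.0])

errors = demand - stocked            # positive: shortage, negative: surplus
shortage = np.clip(errors, 0, None)  # units that have to be delivered later
surplus = np.clip(-errors, 0, None)  # units left sitting in inventory

delivery_cost, inventory_cost = 1, 1
cost = (shortage * delivery_cost + surplus * inventory_cost).sum()
print(cost)  # 2 units short + 2 units over = 4.0
```

With both unit costs equal to \$1, the total is simply the sum of absolute errors between demand and stock.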


What if we had stocked every location with the training-set mean of $Y$ instead? Does it work better than the model?

In [4]:
mean = df_train[target].mean()
mean_cost = total_cost(mean, df_train[target])
print("Total cost of the mean: ", mean_cost)

Total cost of the mean:  $1,773.0


Let's repeat the same steps using the test set. Which works better, the mean or your model?

In [5]:
test_predictions = 0  # Replace 0 with your model's predictions on the TEST set.

model_cost = total_cost(test_predictions, df_test[target])
print("Total cost of your model: ", model_cost)
mean_cost = total_cost(mean, df_test[target])
print("Total cost of the mean: ", mean_cost)

Total cost of your model:  $1,820.0
Total cost of the mean:  $1,696.0
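
One possible reason for a gap between training and test cost is that an unrestricted decision tree can memorize the training rows. The sketch below illustrates this on synthetic data in which the features carry no information about demand; the data, the helper `abs_cost`, and the choice of `max_depth=2` are all assumptions for illustration, not results from the inventory files.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def abs_cost(predictions, demand):
    # Numeric version of the unit-cost rule: $1 per unit over or under.
    return float(np.abs(np.asarray(demand) - np.asarray(predictions)).sum())

# Synthetic data: random features, demand drawn independently of them.
rng = np.random.default_rng(0)
X_tr, X_te = rng.random((200, 5)), rng.random((200, 5))
y_tr = rng.poisson(2, 200).astype(float)
y_te = rng.poisson(2, 200).astype(float)

# A fully grown tree versus a deliberately shallow one.
full_tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
small_tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_tr, y_tr)

for name, tree in [("full tree", full_tree), ("depth-2 tree", small_tree)]:
    print(name,
          "| train cost:", abs_cost(tree.predict(X_tr), y_tr),
          "| test cost:", abs_cost(tree.predict(X_te), y_te))
```

The fully grown tree reproduces its training rows exactly, so its training cost says little about how it will do on new locations. Comparing both trees on the held-out rows is the more honest check.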


Can you think of any other way to do better than the mean or the model? Try it out here!

In [6]:
better_predictions = 0  # Replace 0 with the predictions of your new proposal on the TEST set.

new_cost = total_cost(better_predictions, df_test[target])
print("Total cost of your new proposal: ", new_cost)

Total cost of your new proposal:  $1,820.0
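
One idea worth trying, offered here as a sketch rather than as the intended answer: because over- and under-stocking both cost \$1 per unit, the total cost is a sum of absolute errors, and the constant that minimizes a sum of absolute errors is the median rather than the mean. The demand values and the helper `abs_cost` below are made up for illustration.

```python
import numpy as np

def abs_cost(predictions, demand):
    # $1 per unit of shortage or surplus, summed over locations.
    return float(np.abs(np.asarray(demand) - predictions).sum())

# Made-up, right-skewed demand history (illustrative only).
train_demand = np.array([0, 0, 0, 1, 1, 2, 9], dtype=float)

mean_guess = train_demand.mean()
median_guess = np.median(train_demand)

print("mean:  ", mean_guess, "cost:", abs_cost(mean_guess, train_demand))
print("median:", median_guess, "cost:", abs_cost(median_guess, train_demand))
```

On this toy sample the median costs less than the mean. If the two unit costs differed, the best constant would instead be the quantile of demand at level `delivery_cost / (delivery_cost + inventory_cost)`. Recent scikit-learn versions also accept `criterion="absolute_error"` in `DecisionTreeRegressor`, which makes the tree target absolute error directly.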
