# Day 9 Lab, IS 4487

What do you need to do for today's project?

1. Use the model to predict on a new dataset (one without the target), then use those predictions to build a contact list of the customers who should be called.
2. Make a recommendation to the Director of Sales based on all of your analytic work for this project.

Remember that for this example we'll be using the MegaTelCo data, where the target is `leave`, not `answer`.

Note that the first set of steps below is identical to what we did in the previous lab.




# Load Libraries


In [None]:
import pandas as pd
from sklearn.tree import plot_tree
from sklearn.preprocessing import LabelEncoder # for label encoding
from sklearn.tree import DecisionTreeClassifier # Import Decision Tree Classifier
from sklearn import tree


# Get Data

For this part of the project we will be using the model to predict whether current customers will churn.

Remember: we trained the model on historical data, which includes information about whether customers have *already* churned. The important use case, however, is predicting whether *existing* customers will churn.

In [None]:
# Training data
mtc = pd.read_csv("https://raw.githubusercontent.com/jefftwebb/is_4487_base/dd870389117d5b24eee7417d5378d80496555130/Labs/DataSets/megatelco_leave_survey.csv")

# Current customer data
current_customers = pd.read_csv("https://raw.githubusercontent.com/jefftwebb/is_4487_base/main/Labs/DataSets/megatelco_new_customer_data.csv")

We should double check that this new dataset is clean:

In [None]:
current_customers.describe()

In [None]:
current_customers.info()

Looks okay.
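Beyond eyeballing `describe()` and `info()`, it can help to count missing values explicitly and to confirm that the new data has every column the model expects. The following is a minimal sketch on small made-up frames; `train` and `new`, along with their values, are hypothetical stand-ins for `mtc` and `current_customers`.

```python
import pandas as pd

# Hypothetical stand-ins for the training data and the new customer data
train = pd.DataFrame({
    "id": [1, 2, 3],
    "income": [50000, 72000, 61000],
    "college": ["one", "zero", "one"],
    "leave": ["LEAVE", "STAY", "STAY"],
})
new = pd.DataFrame({
    "id": [10, 11],
    "income": [58000, None],
    "college": ["zero", "one"],
})

# Count missing values in each column of the new data
missing_per_column = new.isna().sum()

# Columns the model expects (everything except the target) that the new data lacks
expected = set(train.columns) - {"leave"}
missing_columns = expected - set(new.columns)

print(missing_per_column)
print("Missing columns:", missing_columns)
```

Any nonzero count or nonempty set here would need to be resolved before predicting.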

# Clean data


In [None]:
# filter rows
mtc_clean = mtc[(mtc['house'] > 0) & (mtc['income'] > 0) & (mtc['handset_price'] < 1000)]

# remove NAs
mtc_clean = mtc_clean.dropna()

# Fit full model

Again, we will set `max_depth = 5` to keep the tree simple and prevent overfitting.

In [None]:
# split the dataframe into independent (X) and dependent (predicted) attributes (y)
X = mtc_clean.drop(['id', 'leave'], axis=1)
y = mtc_clean['leave']

# Convert categorical variables to numeric
le = LabelEncoder()
for column in X.select_dtypes(include=['object']):
    X[column] = le.fit_transform(X[column])

# Initialize the decision tree classifier
full_tree = DecisionTreeClassifier(criterion="entropy", max_depth=5)

# Fit the tree to the training data
full_tree = full_tree.fit(X, y)
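
To see what limiting `max_depth` does, one option is to compare a depth-limited tree with an unrestricted one on held-out data. The sketch below uses synthetic data from `make_classification` rather than the MegaTelCo data, and every name in it (`X_syn`, `shallow`, `deep`, and so on) is hypothetical. The exact accuracies will depend on the data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy classification data (an assumption for illustration)
X_syn, y_syn = make_classification(
    n_samples=1000, n_features=10, flip_y=0.2, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=0
)

# One tree capped at depth 5, one allowed to grow until its leaves are pure
shallow = DecisionTreeClassifier(criterion="entropy", max_depth=5, random_state=0)
deep = DecisionTreeClassifier(criterion="entropy", random_state=0)
shallow.fit(X_tr, y_tr)
deep.fit(X_tr, y_tr)

# Compare depth, training accuracy, and held-out accuracy
for name, model in [("max_depth=5", shallow), ("no limit", deep)]:
    print(
        name,
        "| depth:", model.get_depth(),
        "| train acc:", round(model.score(X_tr, y_tr), 3),
        "| test acc:", round(model.score(X_te, y_te), 3),
    )
```

An unrestricted tree can memorize the training rows, so a large gap between its training and test accuracy is the usual sign of overfitting.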

# Predict

The next step is to use the model to predict churn for the current customers.

We need to make sure that the new dataset has the same columns, in the same order, and the same data types as the data used to fit the model.

1. Prepare the new data.  This will entail dropping the `id` column and reformatting the string variables with the label encoder.

In [None]:
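X_new = current_customers.drop(['id'], axis=1)

# Convert categorical variables to numeric.
# Fit each encoder on the training column so the integer codes match
# the ones the model was trained on (transform raises on unseen labels).
for column in X_new.select_dtypes(include=['object']):
    le = LabelEncoder().fit(mtc_clean[column])
    X_new[column] = le.transform(X_new[column])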
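
A `LabelEncoder` assigns integer codes from the sorted set of labels it was fitted on, so an encoder fitted on one dataset can assign different codes than one fitted on another. The small sketch below illustrates this with made-up category values; `train_values` and `new_values` are hypothetical and not taken from the MegaTelCo data.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical category values, for illustration only
train_values = ["low", "medium", "high"]
new_values = ["medium", "low"]  # "high" does not occur in the new data

# Encoder fitted on the new data alone
refit = LabelEncoder().fit(new_values)
refit_codes = dict(zip(refit.classes_, refit.transform(refit.classes_)))

# Encoder fitted on the training data, to be reused on the new data
reused = LabelEncoder().fit(train_values)
reused_codes = dict(zip(reused.classes_, reused.transform(reused.classes_)))

print("fitted on new data:     ", refit_codes)
print("fitted on training data:", reused_codes)
```

Here "low" gets a different code depending on which data the encoder saw, which is why the codes applied to new data should come from the training data.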

2. Predict using the new data.

In [None]:
pred = full_tree.predict(X = X_new)
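
`predict()` returns only a class label. If a ranking would be useful, for example to call the most at-risk customers first, scikit-learn classifiers also provide `predict_proba()`. The following is a minimal sketch on toy data; the frames, the column names, and the `"LEAVE"`/`"STAY"` labels are assumptions for illustration, not the lab's data.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy training data (not the MegaTelCo data)
X_toy = pd.DataFrame({
    "overage": [0, 10, 200, 250, 180, 5],
    "income": [40, 55, 90, 120, 100, 35],
})
y_toy = ["STAY", "STAY", "LEAVE", "LEAVE", "LEAVE", "STAY"]

toy_tree = DecisionTreeClassifier(criterion="entropy", max_depth=5, random_state=0)
toy_tree.fit(X_toy, y_toy)

# Hypothetical new rows to score
X_toy_new = pd.DataFrame({"overage": [220, 3], "income": [110, 38]})

# predict() returns the most likely class label for each row
labels = toy_tree.predict(X_toy_new)

# predict_proba() returns one column per class, ordered as in classes_
proba = toy_tree.predict_proba(X_toy_new)
leave_col = list(toy_tree.classes_).index("LEAVE")
p_leave = proba[:, leave_col]

print(labels)
print(p_leave)
```

For a decision tree, these probabilities are the class proportions in the leaf each row falls into, so many rows can share the same value.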

# Add predictions to the data

The next step is to append the predictions to the `current_customers` data so we can link the predictions to the customer ID.  



In [None]:
current_customers["predictions"] = pred

# Avoid naming the variable `list`, which would shadow Python's built-in
contact_list = current_customers[["id", "predictions"]]

contact_list

# Which customers to target for retention?

The list can be handed off to the marketing department to direct their retention efforts!
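
Because the goal is a list of customers to call, it may make sense to keep only the rows predicted to leave before handing the list off. A minimal sketch, assuming the predictions are the strings `"LEAVE"` and `"STAY"` (check the actual values in the `leave` column); the `scored` frame and its IDs are made up for illustration.

```python
import pandas as pd

# Hypothetical scored customers; the "LEAVE"/"STAY" labels are an assumption
scored = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "predictions": ["STAY", "LEAVE", "LEAVE", "STAY"],
})

# Keep only the IDs of customers predicted to leave
contact_list = scored.loc[scored["predictions"] == "LEAVE", ["id"]]

print(contact_list)
```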