# DS Automation Assignment

Using our prepared churn data from week 2:
- use pycaret to find an ML algorithm that performs best on the data
    - Choose a metric you think is best to use for finding the best model; by default, it is accuracy but it could be AUC, precision, recall, etc. The week 3 FTE has some information on these different metrics.
- save the model to disk
- create a Python script/file/module with a function that takes a pandas dataframe as an input and returns the probability of churn for each row in the dataframe
    - your Python file/function should print out the predictions for new data (new_churn_data.csv)
    - the true values for the new data are [1, 0, 0, 1, 0] if you're interested
- test your Python module and function with the new data, new_churn_data.csv
- write a short summary of the process and results at the end of this notebook
- upload this Jupyter Notebook and Python file to a GitHub repository, and turn in a link to the repository in the week 5 assignment dropbox

*Optional* challenges:
- return the probability of churn for each new prediction, and the percentile where that prediction is in the distribution of probability predictions from the training dataset (e.g. a high probability of churn like 0.78 might be at the 90th percentile)
- use other autoML packages, such as TPOT, H2O, MLBox, etc., and compare their performance and features with pycaret
- create a class in your Python module to hold the functions that you created
- accept user input to specify a file using a tool such as Python's `input()` function, the `click` package for command-line arguments, or a GUI
- Use the unmodified churn data (new_unmodified_churn_data.csv) in your Python script. This will require adding the same preprocessing steps from week 2 since this data is like the original unmodified dataset from week 1.
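
The percentile challenge in the list above can be sketched with the standard library alone. This is a minimal illustration, not the assignment's required solution: `percentile_rank` and the sample probabilities are hypothetical names and values, and the rank is defined here as the share of training probabilities less than or equal to the new one.

```python
from bisect import bisect_right


def percentile_rank(train_probs, new_prob):
    """Return the percentage of training probabilities <= new_prob."""
    ordered = sorted(train_probs)
    if not ordered:
        raise ValueError("train_probs must not be empty")
    # bisect_right counts how many sorted values are <= new_prob.
    return 100.0 * bisect_right(ordered, new_prob) / len(ordered)


# Illustrative values only; these are not outputs of the churn model.
example_train_probs = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90]
example_rank = percentile_rank(example_train_probs, 0.78)
print(f"0.78 is at percentile {example_rank:.0f}")
```

In practice the training probabilities would come from scoring the training data with the saved model, and the same function could then be applied to each new prediction.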

In [295]:
#!conda create -n msds python=3.10.14 -y
#!conda activate msds
#!pip install --upgrade pycaret

In [296]:
#Loading and using churn data from week 2
import pandas as pd
df = pd.read_csv(r'C:\Users\thelo\OneDrive\Documentos\School\MSDS_600\Week_4_Assignment\Prepped_Churn_data.csv')
df

Unnamed: 0.1,Unnamed: 0,customerID,tenure,PhoneService,Contract,PaymentMethod,MonthlyCharges,TotalCharges,Churn,Average_tenure_Charges,Average_Annual_Charges
0,0,7590-VHVEG,1,No,Month-to-month,Electronic check,29.85,29.85,0,29.850000,12.000000
1,1,5575-GNVDE,34,Yes,One year,Mailed check,56.95,1889.50,0,55.573529,398.138718
2,2,3668-QPYBK,2,Yes,Month-to-month,Mailed check,53.85,108.15,1,54.075000,24.100279
3,3,7795-CFOCW,45,No,One year,Bank transfer (automatic),42.30,1840.75,0,40.905556,522.198582
4,4,9237-HQITU,2,Yes,Month-to-month,Electronic check,70.70,151.65,1,75.825000,25.739745
...,...,...,...,...,...,...,...,...,...,...,...
7038,7038,6840-RESVB,24,Yes,One year,Mailed check,84.80,1990.50,0,82.937500,281.674528
7039,7039,2234-XADUH,72,Yes,One year,Credit card (automatic),103.20,7362.90,0,102.262500,856.151163
7040,7040,4801-JZAZL,11,No,Month-to-month,Electronic check,29.60,346.45,0,31.495455,140.452703
7041,7041,8361-LTMKD,4,Yes,Month-to-month,Mailed check,74.40,306.60,1,76.650000,49.451613


In [297]:
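# Drop the leftover index column and Average_Annual_Charges.
df = df.drop(columns=['Unnamed: 0', 'Average_Annual_Charges'])
# Give the per-tenure charge column a shorter name.
df = df.rename(columns={'Average_tenure_Charges': 'charge_per_tenure'})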

In [298]:
#!pip install pycaret

In [299]:
from pycaret.classification import setup, compare_models, predict_model, save_model, load_model

In [301]:
automl = setup(df, target='Churn')

Unnamed: 0,Description,Value
0,Session id,239
1,Target,Churn
2,Target type,Binary
3,Original data shape,"(7043, 9)"
4,Transformed data shape,"(7043, 14)"
5,Transformed train set shape,"(4930, 14)"
6,Transformed test set shape,"(2113, 14)"
7,Numeric features,4
8,Categorical features,4
9,Rows with missing values,0.2%


In [315]:
# Cross-validate the available classifiers and keep the one with the highest accuracy.
best_model = compare_models(sort='Accuracy')

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
knn,K Neighbors Classifier,0.7631,0.7399,0.429,0.5716,0.4889,0.3392,0.3457,0.024
lr,Logistic Regression,0.7363,0.821,0.0069,0.2833,0.0133,0.0095,0.0348,0.048
dt,Decision Tree Classifier,0.7347,0.5,0.0,0.0,0.0,0.0,0.0,0.016
ridge,Ridge Classifier,0.7347,0.8258,0.0,0.0,0.0,0.0,0.0,0.019
rf,Random Forest Classifier,0.7347,0.6676,0.0,0.0,0.0,0.0,0.0,0.057
qda,Quadratic Discriminant Analysis,0.7347,0.5543,0.0,0.0,0.0,0.0,0.0,0.018
ada,Ada Boost Classifier,0.7347,0.5,0.0,0.0,0.0,0.0,0.0,0.017
gbc,Gradient Boosting Classifier,0.7347,0.4613,0.0,0.0,0.0,0.0,0.0,0.05
lda,Linear Discriminant Analysis,0.7347,0.5,0.0,0.0,0.0,0.0,0.0,0.017
et,Extra Trees Classifier,0.7347,0.6392,0.0,0.0,0.0,0.0,0.0,0.043


In [303]:
best_model

In [304]:
# Select the second-to-last row as a one-row DataFrame for a quick prediction check.
df.iloc[-2:-1]


Unnamed: 0,customerID,tenure,PhoneService,Contract,PaymentMethod,MonthlyCharges,TotalCharges,Churn,charge_per_tenure
7041,8361-LTMKD,4,Yes,Month-to-month,Mailed check,74.4,306.6,1,76.65


In [305]:
predict_model(best_model, df.iloc[-2:-1])

Unnamed: 0,customerID,tenure,PhoneService,Contract,PaymentMethod,MonthlyCharges,TotalCharges,charge_per_tenure,Churn,prediction_label,prediction_score
7041,8361-LTMKD,4,Yes,Month-to-month,Mailed check,74.400002,306.600006,76.650002,1,1,0.8


In [306]:
save_model(best_model, 'K_Neighbors_Classifier')

Transformation Pipeline and Model Successfully Saved


(Pipeline(memory=Memory(location=None),
          steps=[('numerical_imputer',
                  TransformerWrapper(exclude=None,
                                     include=['tenure', 'MonthlyCharges',
                                              'TotalCharges',
                                              'charge_per_tenure'],
                                     transformer=SimpleImputer(add_indicator=False,
                                                               copy=True,
                                                               fill_value=None,
                                                               keep_empty_features=False,
                                                               missing_values=nan,
                                                               strategy='mean'))),
                 ('categorical_imputer',
                  TransformerWrapper(exc...
                                     transformer=TargetEncoder(cols=['customerID'],
   

In [307]:
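import pickle
# Pickle best_model under its own filename so it does not overwrite
# K_Neighbors_Classifier.pkl, the pipeline file written by save_model above.
with open('K_Neighbors_Classifier_estimator.pkl', 'wb') as f:
    pickle.dump(best_model, f)

In [308]:
with open('K_Neighbors_Classifier_estimator.pkl', 'rb') as f:
    loaded_model = pickle.load(f)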

In [309]:
loaded_knn = load_model('K_Neighbors_Classifier')

Transformation Pipeline and Model Successfully Loaded


In [310]:
new_data = df.iloc[-2:-1]

In [311]:
# Score the full dataset; this includes rows the model was trained on.
predict_model(loaded_knn, df)

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,K Neighbors Classifier,0.8154,0.8485,0.5265,0.7034,0.6022,0.4852,0.4939


Unnamed: 0,customerID,tenure,PhoneService,Contract,PaymentMethod,MonthlyCharges,TotalCharges,charge_per_tenure,Churn,prediction_label,prediction_score
0,7590-VHVEG,1,No,Month-to-month,Electronic check,29.850000,29.850000,29.850000,0,0,0.6
1,5575-GNVDE,34,Yes,One year,Mailed check,56.950001,1889.500000,55.573528,0,0,1.0
2,3668-QPYBK,2,Yes,Month-to-month,Mailed check,53.849998,108.150002,54.075001,1,0,0.8
3,7795-CFOCW,45,No,One year,Bank transfer (automatic),42.299999,1840.750000,40.905556,0,0,1.0
4,9237-HQITU,2,Yes,Month-to-month,Electronic check,70.699997,151.649994,75.824997,1,0,0.6
...,...,...,...,...,...,...,...,...,...,...,...
7038,6840-RESVB,24,Yes,One year,Mailed check,84.800003,1990.500000,82.937500,0,0,0.8
7039,2234-XADUH,72,Yes,One year,Credit card (automatic),103.199997,7362.899902,102.262497,0,0,1.0
7040,4801-JZAZL,11,No,Month-to-month,Electronic check,29.600000,346.450012,31.495455,0,0,0.8
7041,8361-LTMKD,4,Yes,Month-to-month,Mailed check,74.400002,306.600006,76.650002,1,1,0.8
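
The assignment asks for the probability of churn, while the table above reports a `prediction_label` and a `prediction_score`. The sketch below assumes that `prediction_score` is the probability of the predicted class, which is consistent with every score shown being at least 0.5, so a "no churn" prediction has to be flipped to get a churn probability. `churn_probability` is a hypothetical helper name.

```python
def churn_probability(labels, scores):
    """Convert (predicted label, score for that label) pairs into P(churn)."""
    probs = []
    for label, score in zip(labels, scores):
        # Assumption: score is the probability of the predicted class, so a
        # "no churn" (0) prediction with score 0.8 means P(churn) = 0.2.
        probs.append(score if label == 1 else 1.0 - score)
    return probs


# Labels and scores copied from the first five rows and the last row shown above.
labels = [0, 0, 0, 0, 0, 1]
scores = [0.6, 1.0, 0.8, 1.0, 0.6, 0.8]
churn_probs = [round(p, 2) for p in churn_probability(labels, scores)]
print(churn_probs)
```

The helper only iterates over its inputs, so the `prediction_label` and `prediction_score` columns of the returned DataFrame could be passed in directly. PyCaret's `predict_model` also appears to offer a `raw_score` option that returns a score per class; that is worth checking against the installed version.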


In [312]:
from IPython.display import Code

# Pass the path as filename; a bare positional string is displayed as literal code.
Code(filename='predict_churn.py')

In [313]:
%run predict_churn.py

Transformation Pipeline and Model Successfully Loaded


   customerID  tenure  PhoneService  Contract  PaymentMethod  MonthlyCharges  \
0  9305-CKSKC      22             1         0              2       97.400002   
1  1452-KNGVK       8             0         1              1       77.300003   
2  6723-OKKJM      28             1         0              0       28.250000   
3  7832-POPKP      62             1         0              2      101.699997   
4  6348-TACGU      10             0         0              1       51.150002   

   TotalCharges  charge_per_tenure  prediction_label  prediction_score  
0    811.700012          36.895454                 0               0.8  
1   1701.949951         212.743744                 0               0.6  
2    250.899994           8.960714                 0               1.0  
3   3106.560059          50.105808                 0               1.0  
4   3440.969971         344.096985                 1               0.6  
predictions:
  Churn_prediction
0         No churn
1         No churn
2         N
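
The run above lists `prediction_label` values of 0, 0, 0, 0, 1 for the five new rows, and the assignment gives the true values as [1, 0, 0, 1, 0]. As a rough check, the sketch below computes accuracy, precision, and recall from those two lists. The helper names are hypothetical, and a ratio with a zero denominator is reported as 0.0 by choice.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


y_true = [1, 0, 0, 1, 0]  # true values stated in the assignment
y_pred = [0, 0, 0, 0, 1]  # prediction_label column from the run above

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = safe_ratio(tp + tn, len(y_true))
precision = safe_ratio(tp, tp + fp)
recall = safe_ratio(tp, tp + fn)
print(f"accuracy={accuracy}, precision={precision}, recall={recall}")
```

With only five rows these figures are very noisy, but they show how a model can get some rows right while still missing every actual churner.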

# Summary

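The prepared churn data from week 2 was loaded, the `Unnamed: 0` and `Average_Annual_Charges` columns were dropped, and `Average_tenure_Charges` was renamed to `charge_per_tenure`. PyCaret's `setup` used `Churn` as the target and split the 7,043 rows into 4,930 training rows and 2,113 test rows.

`compare_models`, sorted by accuracy, ranked the K Neighbors Classifier first with an accuracy of 0.7631, an AUC of 0.7399, and a recall of 0.429. Most of the other models scored 0.7347 accuracy with a recall of 0, so they identified no churners in cross-validation. `customerID` was left in as a feature and target-encoded by the pipeline, which may have contributed to these results.

The model was saved with `save_model` and reloaded with `load_model`. Scoring the full dataset gave an accuracy of 0.8154, a figure that includes rows the model was trained on and is therefore optimistic. Running `predict_churn.py` on new_churn_data.csv produced predicted labels of 0, 0, 0, 0, 1 against true values of [1, 0, 0, 1, 0], so 2 of the 5 new rows were predicted correctly.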