In [1]:
import pandas as pd
import pickle
from IPython.display import Code
from pycaret.classification import setup, compare_models, predict_model, save_model, load_model
import numpy as np

## Load data

In [2]:
df = pd.read_csv("churn_data.csv")
df.tail(5)

Unnamed: 0,customerID,tenure,PhoneService,Contract,PaymentMethod,MonthlyCharges,TotalCharges,Churn
7038,6840-RESVB,24,Yes,One year,Mailed check,84.8,1990.5,No
7039,2234-XADUH,72,Yes,One year,Credit card (automatic),103.2,7362.9,No
7040,4801-JZAZL,11,No,Month-to-month,Electronic check,29.6,346.45,No
7041,8361-LTMKD,4,Yes,Month-to-month,Mailed check,74.4,306.6,Yes
7042,3186-AJIEK,66,Yes,Two year,Bank transfer (automatic),105.65,6844.5,No


## AutoML environment

In [3]:
automl = setup(df, target='Churn')

Unnamed: 0,Description,Value
0,Session id,4858
1,Target,Churn
2,Target type,Binary
3,Target mapping,"No: 0, Yes: 1"
4,Original data shape,"(7043, 8)"
5,Transformed data shape,"(7043, 13)"
6,Transformed train set shape,"(4930, 13)"
7,Transformed test set shape,"(2113, 13)"
8,Ordinal features,1
9,Numeric features,3


This output summarizes the PyCaret AutoML environment set up for the binary classification task of predicting `Churn`.

Key points:

The session ID is 4858, and the target variable is `Churn`, a binary target mapped as No: 0 and Yes: 1. The original dataset has shape (7043, 8); after transformation the shape is (7043, 13), since encoding the categorical features adds columns. The transformed training set has 4930 rows and the transformed test set has 2113 rows. PyCaret detected 3 numeric features and 1 ordinal feature, and the percentage of rows with missing values is 0.2%.
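The train and test sizes in the summary can be checked by hand. The sketch below assumes PyCaret's default `train_size` of 0.7 (the `setup` call above does not override it) and assumes the test share is rounded up with the remainder going to training. Both are assumptions about the split, not something the output states.

```python
import math

n_rows = 7043      # rows in the original data, from the setup summary
train_size = 0.7   # assumed: PyCaret's default training fraction

# Assumed rounding: the test share is rounded up, the rest goes to training.
n_test = math.ceil(n_rows * (1 - train_size))
n_train = n_rows - n_test

print(n_train, n_test)
```

Under these assumptions the result agrees with the 4930 training rows and 2113 test rows reported by `setup`.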

Preprocessing uses simple imputation: numeric features are imputed with the mean and categorical features with the mode. Cross-validation uses StratifiedKFold with 10 folds. The setup uses all available CPUs (CPU jobs set to -1) and no GPU acceleration. Experiment logging is turned off, and the experiment is named `clf-default-name` with a unique session identifier (USI) of 71ea.
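To make the imputation rule concrete, here is a minimal standard-library sketch of mean and mode filling. It is not PyCaret's implementation: the helper `simple_impute` is hypothetical, and the `None` entries in the toy columns are invented for illustration.

```python
from statistics import mean, mode


def simple_impute(values, numeric):
    """Fill None entries with the mean (numeric) or the mode (categorical)."""
    present = [v for v in values if v is not None]
    fill = mean(present) if numeric else mode(present)
    return [fill if v is None else v for v in values]


# Hypothetical toy columns; the missing entries are made up for this example.
total_charges = [1990.5, None, 346.45, 306.6]
contract = ["One year", None, "Month-to-month", "One year"]

filled_charges = simple_impute(total_charges, numeric=True)
filled_contract = simple_impute(contract, numeric=False)

print(filled_charges)
print(filled_contract)
```

The missing charge is replaced by the average of the three observed values, and the missing contract by the most frequent observed category.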

With the environment configured, the setup is ready for model comparison and selection.

## AutoML setup object elements

In [4]:
automl_element = automl.get_config("X_train")
automl_element


Unnamed: 0,customerID,tenure,PhoneService,Contract,PaymentMethod,MonthlyCharges,TotalCharges
5348,9495-REDIY,25,No,One year,Credit card (automatic),40.650002,970.549988
2441,9050-IKDZA,2,Yes,Month-to-month,Mailed check,81.500000,162.550003
2673,8429-XIBUM,22,Yes,Month-to-month,Bank transfer (automatic),101.349998,2317.100098
956,3261-CQXOL,71,Yes,Two year,Bank transfer (automatic),25.450001,1813.349976
5410,4049-ZPALD,64,Yes,Two year,Bank transfer (automatic),99.000000,6375.799805
...,...,...,...,...,...,...,...
2862,5027-QPKTE,7,Yes,Month-to-month,Electronic check,69.349998,451.100006
76,6416-JNVRK,46,Yes,One year,Credit card (automatic),55.650002,2688.850098
1020,1452-XRSJV,39,Yes,Month-to-month,Credit card (automatic),51.049999,2066.000000
4992,7740-KKCXF,51,No,Month-to-month,Credit card (automatic),30.049999,1529.449951


## Compare classification models

In [5]:
best_model = compare_models()

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
lr,Logistic Regression,0.7347,0.4767,0.7347,0.5398,0.6223,0.0,0.0,4.785
nb,Naive Bayes,0.7347,0.4893,0.7347,0.5398,0.6223,0.0,0.0,0.224
dt,Decision Tree Classifier,0.7347,0.5,0.7347,0.5398,0.6223,0.0,0.0,0.184
ridge,Ridge Classifier,0.7347,0.0,0.7347,0.5398,0.6223,0.0,0.0,0.223
rf,Random Forest Classifier,0.7347,0.497,0.7347,0.5398,0.6223,0.0,0.0,0.413
qda,Quadratic Discriminant Analysis,0.7347,0.5,0.7347,0.5398,0.6223,0.0,0.0,0.377
ada,Ada Boost Classifier,0.7347,0.5,0.7347,0.5398,0.6223,0.0,0.0,0.182
gbc,Gradient Boosting Classifier,0.7347,0.5,0.7347,0.5398,0.6223,0.0,0.0,0.303
lda,Linear Discriminant Analysis,0.7347,0.5,0.7347,0.5398,0.6223,0.0,0.0,0.251
et,Extra Trees Classifier,0.7347,0.4974,0.7347,0.5398,0.6223,0.0,0.0,0.319
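Every model in this table reports the same Accuracy, Recall, Prec. and F1, with Kappa and MCC at 0 and AUC at or near 0.5. That pattern is what a classifier produces when it predicts the majority class ("No") for every row. The sketch below checks this by computing the metrics of such a baseline, assuming the majority-class share equals the reported accuracy (0.7347) and that Recall, Prec. and F1 are class-weighted averages. Both are assumptions, and `majority_class_metrics` is a hypothetical helper, not part of PyCaret.

```python
def majority_class_metrics(p_major):
    """Class-weighted metrics for a model that always predicts the majority class.

    p_major is the fraction of rows that belong to the majority class.
    """
    p_minor = 1.0 - p_major
    accuracy = p_major

    # Majority class: every majority row is recovered (recall 1), and the
    # share of correct predictions equals p_major (precision p_major).
    prec_major, rec_major = p_major, 1.0
    f1_major = 2 * prec_major * rec_major / (prec_major + rec_major)

    # Minority class is never predicted, so its scores are taken as 0.
    recall = p_major * rec_major + p_minor * 0.0
    precision = p_major * prec_major + p_minor * 0.0
    f1 = p_major * f1_major + p_minor * 0.0

    # Cohen's kappa: observed agreement equals chance agreement here.
    p_chance = 1.0 * p_major + 0.0 * p_minor
    kappa = (accuracy - p_chance) / (1.0 - p_chance)

    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "kappa": kappa,
    }


metrics = majority_class_metrics(0.7347)
print({name: round(value, 4) for name, value in metrics.items()})
```

The baseline's rounded precision and F1 agree with the 0.5398 and 0.6223 in the table, so the comparison appears not to separate the models at all. The calculation does not say why the models collapse. One thing worth checking is that `customerID`, a per-customer identifier, is still among the features in `X_train`; `setup` accepts an `ignore_features` argument that could exclude it.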
