# Customer Churn Analysis



Churn is one of the biggest problems not only in the telecom industry but also in several other industries. Using the 'Telecom Churn Prediction' dataset we shall go over some of the main reasons why customer churn happens and also build a model to predict if a customer will churn or not.

The higher the churn rate, the more difficult it is to grow because you’ll constantly be chasing new customers to replace the ones that are churning. Customers in the telecom industry can choose from a variety of service providers and actively switch from one to the next. The telecommunications business has an annual churn rate of 15-25 percent in this highly competitive market.

Customer churn is a critical metric because it is much less expensive to retain existing customers than it is to acquire new customers.

### **Data Dictionary**

1. **`CustomerID`**: A unique ID that identifies each customer.

2. **`Gender`**: The customer’s gender: (Male, Female)

3. **`Age`**: The customer’s current age, in years, at the time the fiscal quarter ended.

4. **`Senior Citizen`**: Indicates if the customer is 65 or older: (Yes, No)

5. **`Married (Partner)`**: Indicates if the customer is married: (Yes, No)

6. **`Dependents`**: Indicates if the customer lives with any dependents: (Yes, No)

7. **`Number of Dependents`**: Indicates the number of dependents that live with the customer.

8. **`Phone Service`**: Indicates if the customer subscribes to home phone service with the company: (Yes, No)

9. **`Multiple Lines`**: Indicates if the customer subscribes to multiple telephone lines with the company: (Yes, No)

10. **`Internet Service`**: Indicates if the customer subscribes to Internet service with the company: (DSL, Fiber Optic)

11. **`Online Security`**: Indicates if the customer subscribes to an additional online security service provided by the company: (Yes, No)

12. **`Online Backup`**: Indicates if the customer subscribes to an additional online backup service provided by the company: (Yes, No)

13. **`Device Protection Plan`**: Indicates if the customer subscribes to an additional device protection plan for their Internet equipment provided by the company: (Yes, No)

14. **`Premium Tech Support`**: Indicates if the customer subscribes to an additional technical support plan from the company with reduced wait times: (Yes, No)

15. **`Streaming TV`**: Indicates if the customer uses their Internet service to stream television programming from a third party provider: (Yes, No) The company does not charge an additional fee for this service.

16. **`Streaming Movies`**: Indicates if the customer uses their Internet service to stream movies from a third party provider: (Yes, No) The company does not charge an additional fee for this service.

17. **`Contract`**: Indicates the customer’s current contract type: (Month-to-Month, One Year, Two Year)

18. **`Paperless Billing`**: Indicates if the customer has chosen paperless billing: (Yes, No)

19. **`Payment Method`**: Indicates how the customer pays their bill: (Bank Withdrawal, Credit Card, Mailed Check)

20. **`Monthly Charge`**: Indicates the customer’s current total monthly charge for all their services from the company.

21. **`Total Charges`**: Indicates the customer’s total charges, calculated to the end of the quarter specified above.

22. **`Tenure`**: Indicates the total number of months that the customer has been with the company.

23. **`Churn`**: Yes = the customer left the company this quarter. No = the customer remained with the company. Directly related to Churn Value.

Reference:
https://www.kaggle.com/blastchar/telco-customer-churn


## A) Dataset

Let's start by importing the libraries that we shall need

In [None]:
#EDA Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import rcParams

#Model Building Libraries
from sklearn import metrics

from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score, confusion_matrix, classification_report, accuracy_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from imblearn.combine import SMOTEENN 
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
#Parameters for plots
rcParams['figure.figsize'] = 8,6
sns.set_style('darkgrid')
rcParams['axes.titlepad']=15
RB = ["#123ea6", "#e63707"]
sns.set_palette(RB)

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

Loading the dataset

In [None]:
df_base= pd.read_csv("../input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv")

Checking the head and attributes of the dataset

In [None]:
df_base.head()

In [None]:
df_base.shape

In [None]:
df_base.columns.values

In [None]:
df_base.dtypes

In [None]:
df_base.describe()

We can conclude that:
1. 75% of the customers have tenure less than 55 months
2. The mean tenure is around 32 months
3. The mean monthly charge is around 64.76 USD, and 25% of customers pay more than 89.85 USD


In [None]:
sns.countplot(x=df_base['Churn'])
plt.title('No of Churned Customers',fontsize=15)

In [None]:
df_base['Churn'].value_counts()

As the class distribution is far from 50:50, we can say that this is an 'Imbalanced Dataset'
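
Imbalance matters because plain accuracy can look good while every churner is missed. The sketch below uses made-up labels (the 73/27 split is illustrative, not taken from the dataset) and a "model" that always predicts the majority class:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 73 non-churners (0) and 27 churners (1)
y_true = np.array([0] * 73 + [1] * 27)

# A "model" that always predicts the majority class
y_majority = np.zeros_like(y_true)

baseline_accuracy = accuracy_score(y_true, y_majority)  # share of correct predictions
baseline_recall = recall_score(y_true, y_majority)      # share of churners that were caught
print(baseline_accuracy, baseline_recall)
```

The accuracy equals the share of the majority class, while the recall on churners is zero, which is why the per-class scores are worth checking later on.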

Now let's check if any missing values exist in our dataset

In [None]:
df_base.isnull().sum()

In [None]:
df_base.info()

It seems like there are no missing values here... but wait, there's a catch!

We can see that the majority of the columns are of datatype object. A column that should be numeric, such as TotalCharges, may therefore be hiding blank strings that isnull() does not count. We should convert it to a numeric type to really know whether any null values exist. We can use the to_numeric method to do so.

In [None]:
df= df_base.copy()
df['TotalCharges']=pd.to_numeric(df_base['TotalCharges'],errors='coerce') #coerce puts NaN values if there are any parsing errors
df.isnull().sum()
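
Why `isnull()` reported nothing at first can be seen on a toy column (the values below are made up). A blank string is still a valid object, so it only becomes `NaN` once the column is coerced to numbers:

```python
import pandas as pd

# Hypothetical stand-in for the TotalCharges column: numbers stored as text,
# with one blank entry
toy = pd.Series(["29.85", " ", "108.15"])

missing_before = int(toy.isnull().sum())           # the blank string is not null
toy_numeric = pd.to_numeric(toy, errors="coerce")  # " " cannot be parsed, so it becomes NaN
missing_after = int(toy_numeric.isnull().sum())

print(missing_before, missing_after)
```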

As we can see, there are now 11 missing values in the TotalCharges column. Let's check these records

In [None]:
df[df.isnull().any(axis=1)]

As we can see from the above dataframe, the customers who have a tenure of 0 are the ones with blank values in the TotalCharges column. Let's quickly check the rows where the tenure is 0

In [None]:
df[df['tenure']==0]

As both dataframes match, we can conclude that the values are missing because these customers are new and have not yet paid anything to the company; none of them has churned yet.

Since these records make up a very small share of the dataset (~0.156%), it is safe to drop them as they won't have a large impact.

In [None]:
#Dropping the null values
df.dropna(inplace=True)

In [None]:
df.isnull().sum()

In [None]:
df.info()

In [None]:
df1=df.copy()

df1.drop(['customerID'],axis=1,inplace=True)
df1.head()

Tenure takes many distinct values, which makes it hard to visualize directly. Hence we will divide it into bins and assign each customer to a group. For example, a tenure of 1-12 months falls into the first-year group, 13-24 months into the second-year group, and so on...




In [None]:
df['tenure'].max() #The max tenure is 72 months or 6 years

In [None]:
# Group the tenures in bins of 12 months
labels = ["{0} - {1}".format(i,i+11) for i in range (1,72,12)]
df['tenure_group'] = pd.cut(df['tenure'], range(1,80,12), right=False, labels=labels)
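
How `pd.cut` assigns the groups can be checked on a few hand-picked tenures (the sample values are illustrative). With `right=False` each bin includes its left edge and excludes its right edge, so 12 falls in `1 - 12` and 13 starts the next group:

```python
import pandas as pd

labels = ["{0} - {1}".format(i, i + 11) for i in range(1, 72, 12)]
sample_tenure = pd.Series([1, 12, 13, 60, 61, 72])

# Bin edges 1, 13, 25, ..., 73 give the half-open intervals [1, 13), [13, 25), ...
groups = pd.cut(sample_tenure, range(1, 80, 12), right=False, labels=labels)
print(groups.astype(str).tolist())
```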

In [None]:
df['tenure_group'].value_counts()

In [None]:
sns.countplot(x=df['tenure_group'],hue='Churn',data=df)

We can see that customers with a tenure of less than a year are the most likely to churn, whereas customers who have been with the company the longest are very unlikely to churn. The first-year group is also larger than any other group.

Now that tenure has been transformed into tenure_group, let's drop the tenure column, along with customerID, as neither is of much use for data analysis and model training

In [None]:
df.drop(['customerID','tenure'],axis=1,inplace=True)
df.head()

## B) Exploratory Data Analysis

### Univariate Analysis

In [None]:
sns.pairplot(df,hue='Churn')

In [None]:
#Plotting the countplot of all important columns to gain insights
for i, predictor in enumerate(df.drop(columns=['Churn','TotalCharges','MonthlyCharges'])):
    plt.figure(i,figsize=(6,4))
    sns.countplot(data=df,x=predictor,hue='Churn')
    plt.tight_layout()

#### Observations: 

1. **`Gender`** - The churn ratio is almost the same for both genders, so gender is not an important feature on its own, though it may matter in combination with other features.
2. **`SeniorCitizen`** - The ratio of churners among senior citizens is very high. Senior citizens are more likely to churn.
3. **`Partner`** - Customers with partners are less likely to churn
4. **`Dependents`** - Customers with dependents are less likely to churn
5. **`PhoneService`** - Most churners have phone service, but so do most customers, so the effect on the churn ratio is small
6. **`MultipleLines`** - Not much of an effect
7. **`InternetService`** - Customers with fiber optic service are more likely to churn, probably due to high costs
8. **`OnlineSecurity`** - Customers with online security are much less likely to churn than those without it
9. **`OnlineBackup`** - Customers with online backup are less likely to churn
10. **`DeviceProtection/TechSupport`** - Customers with device protection or tech support are less likely to churn
11. **`StreamingTV/StreamingMovies`** - Not much of an effect, as the company doesn't charge an additional fee for these
12. **`Contract`** - Customers with a contract of one year or more are much less likely to churn
13. **`PaperlessBilling`** - Not much of an effect
14. **`PaymentMethod`** - Customers who pay by electronic check are very likely to churn, and those who pay by credit card are much less likely to churn
15. **`tenure_group`** - Customers with a tenure of less than a year are the most likely to churn, whereas customers who have been with the company the longest are very unlikely to churn
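
Count plots show absolute numbers, so a large category can look "churn-heavy" simply because it is large. A churn rate per category makes observations like the ones above easier to verify. A minimal sketch on made-up rows (the column names follow the dataset, the values do not):

```python
import pandas as pd

# Made-up rows, only to show the calculation
toy = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["Two year"] * 4,
    "Churn":    ["Yes", "Yes", "No", "No", "No", "No", "No", "Yes"],
})

# Share of churners within each contract type
churn_rate = (toy["Churn"] == "Yes").groupby(toy["Contract"]).mean()
print(churn_rate)
```

The same expression can be applied to any categorical column of `df` while `Churn` still holds "Yes"/"No" values.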

Now let's convert the target variable Churn into a binary numeric value, i.e. Yes=1, No=0

In [None]:
# Assign the result back instead of calling replace(..., inplace=True) on a column selection
df['Churn'] = df['Churn'].map({"Yes": 1, "No": 0})

Let's also convert all the categorical variables into dummy variables

In [None]:
#One Hot Encoding
df_dummies = pd.get_dummies(df)
df_dummies.head()

In [None]:
100*df[df['SeniorCitizen']==1].shape[0]/df.shape[0]

Only about 16% of the customers are senior citizens. Thus most of the customers in the data are younger people.

Now let's check out the pairplot

In [None]:
sns.pairplot(df,hue='Churn')

In [None]:
sns.lmplot(x='TotalCharges',y='MonthlyCharges',data=df,hue='Churn')

We can see that there is a relationship between MonthlyCharges and TotalCharges: TotalCharges increases as MonthlyCharges increases, as expected. Now let's check out the kdeplot of MonthlyCharges and TotalCharges

In [None]:
sns.kdeplot(data=df_dummies,x='MonthlyCharges',hue='Churn',fill=True)  # fill replaces the deprecated shade

In [None]:
sns.kdeplot(data=df_dummies,x='TotalCharges',hue='Churn',fill=True)

Customers appear more likely to churn at lower monthly and total charges... Why is that?


In [None]:
sns.kdeplot(data=df_dummies,x='MonthlyCharges',hue='tenure_group_1 - 12',fill=True)

In [None]:
sns.kdeplot(data=df_dummies,x='TotalCharges',hue='tenure_group_1 - 12',fill=True)

We can see that people with a shorter tenure are the ones who are likely to churn, and their total charges will always be lower because of their short tenures. Similarly, users with low monthly charges may tend to churn because they are still trying out the service and will leave if they find a better one elsewhere
***

Let's see if the internet service plays a role in churning or not

In [None]:
# height/aspect belong to figure-level functions such as catplot, so the size is set on the figure
plt.figure(figsize=(6.75, 4.5))
sns.violinplot(x="InternetService", y="MonthlyCharges", hue="Churn", split=True,
               palette="coolwarm", data=df);

Customers with fiber optic (a fast connection) are more likely to churn than those with DSL (a slower connection).<br>
We can also observe that DSL customers with higher charges are less likely to churn, whereas those with lower charges (~ 40-60 USD) are more likely to churn
***

Now let's check the additional services offered by the company

In [None]:
cols = ["OnlineSecurity", "OnlineBackup", "DeviceProtection", "TechSupport", "StreamingTV", "StreamingMovies"]
df_service = pd.melt(df[df["InternetService"] != "No"][cols]).rename({'value': 'Internet service'}, axis=1)
plt.figure(figsize=(12, 6))
ax = sns.countplot(data=df_service, x='variable', hue='Internet service')
ax.set(xlabel='Additional service', ylabel='No of customers')
plt.show()

Let's check the correlations of Churn with other features

In [None]:
#Get Correlation of "Churn" with other variables:
plt.figure(figsize=(18,8))
df_dummies.corr()['Churn'].sort_values(ascending = False)[:20].plot(kind='bar')
plt.title('Positive Correlation')

In [None]:
plt.figure(figsize=(18,8))
df_dummies.corr()['Churn'].sort_values(ascending = False)[20:].plot(kind='bar')
plt.title('Negative Correlation')

In [None]:
plt.figure(figsize=(12,12))
sns.heatmap(df_dummies.corr(),cmap='coolwarm')

#### Insights:
From these correlation diagrams we can observe the following things:


1. **High Churn**: Seen in cases of *Month-to-Month Contracts, No Online Security, No Tech Support, First Year of Subscription and Fiber Optic Internet service*

2. **Low Churn**: Seen in cases of *Long-term Contracts, Subscriptions without Internet Service, and tenure of 5+ years*

3. **Little to no Impact**: Some features like *Multiple Lines, Availability of Phone service and Gender* have little to no impact on Churn on their own

### Bivariate Analysis

To do this, let's separate our dataset into two categories: Churners and Non-Churners

In [None]:
churners_df = df.loc[df_dummies['Churn']==1]
non_churners_df = df.loc[df_dummies['Churn']==0]
churners_df.head()

In [None]:
#Plotting the countplot of all important columns to gain insights


for i, predictor in enumerate(churners_df.drop(columns=['Churn','TotalCharges','MonthlyCharges','gender'])):
    plt.figure(i,figsize=(6,4))  
    plt.title("Distribution of {} for Churned Customers".format(predictor),fontsize=16)
    sns.countplot(data=churners_df,x=predictor,hue='gender')
    plt.tight_layout()  # call the function; the bare name does nothing
    plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

#### Observations

1. Gender doesn't seem to have much of an effect on churn.
2. One insight we can gain is that female customers are a little more likely to churn than male customers if their tenure is less than a year, but less likely to churn than male customers for long tenures.
3. Female customers are more likely to churn than male customers if their payment method is credit card.


### Conclusion from EDA


1. Short-term contracts have higher churn rates.
2. Month-to-month is the contract type customers choose most often, yet it has the greatest impact on the churn rate (increases likelihood to churn by 6.31x).
3. Customers with a two-year contract have a very low churn rate.
4. People with a longer tenure are much less likely to churn than those with a short tenure (1 year).
5. Customers who pay by electronic check have a higher churn rate, whereas those who pay by credit card have a lower churn rate.
6. Customers without an internet service have a very low churn rate.
7. Customers with fiber optic internet service are more likely to churn.
8. Senior citizens are more likely to churn.
9. Additional features like Security, Backup, Device Protection and Tech Support make the customer less likely to churn.

## C) Training the Model to Predict Churning

Now let's train the model to predict churn. We will use tenure in months as a model input, since the tenure_group column was created only for visualization

The following algorithms are used in this notebook.

| Algorithms |
| ----------- |
| Logistic Regression   | 
| Decision Tree  | 
| Random Forest   | 
| PCA     | 
| SVM|
| AdaBoost  | 
| XGBoost  |
| Neural Network |

In [None]:
df_base =  pd.read_csv("../input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv")
replaceStruct = {"Churn":     {"No": 0, "Yes": 1 }  }
oneHotCols = ["gender","SeniorCitizen","Partner","Dependents","PhoneService","MultipleLines"
            ,"InternetService","OnlineSecurity","OnlineBackup",
            "DeviceProtection","TechSupport","StreamingTV","StreamingMovies",
            "Contract","PaperlessBilling","PaymentMethod"]

df_base = df_base.replace(replaceStruct)  # replace(..., inplace=True) returns None, so assign the result instead
df2=pd.get_dummies(df_base, columns=oneHotCols)
pd.set_option('display.max_columns',100)
df2.head()

In [None]:
df2['TotalCharges']=pd.to_numeric(df_base['TotalCharges'],errors='coerce') #coerce puts NaN values if there are any parsing errors
df2.dropna(inplace=True)
df2=df2.drop('customerID',axis=1)
df2.to_csv(r'Telco-Customer-Predictions.csv')

In [None]:
df2.info()

In [None]:
df_dummies= pd.get_dummies(df2)
df_dummies

In [None]:
X = df_dummies.drop('Churn',axis=1)
y = df_dummies['Churn']

### a) Decision Tree Classifier


In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=11)
dt_model = DecisionTreeClassifier(criterion='gini',max_depth=3, min_samples_leaf=8)
dt_model.fit(X_train, y_train)

In [None]:
y_pred = dt_model.predict(X_test)
y_pred

In [None]:
dt = round(accuracy_score(y_test, y_pred)*100, 2)
dt

In [None]:
def print_report(model,X_test,y_test,y_pred):
    """
    prints the model score, confusion matrix and classification report
    """
    print("Model Score : {} \n ".format(model.score(X_test, y_test)))
    print("Confusion Matrix : \n {}\n".format(confusion_matrix(y_test,y_pred)))
    print("Accuracy : {}".format( round(accuracy_score(y_test, y_pred)*100, 2)))
    print("Classification report : \n\n"+classification_report(y_test, y_pred, labels=[0,1]))
    

In [None]:
print_report(dt_model,X_test,y_test,y_pred)

We can see that although the f1-score for non-churners is high, for our minority class (Churn=1) it is low (0.58). The model is therefore not performing well on the class we care about most. Why, you may ask? Because our dataset is imbalanced, as we saw earlier: there are far fewer churned customers than non-churners.

"***The challenge of working with imbalanced datasets is that most machine learning techniques will ignore, and in turn have poor performance on, the minority class, although typically it is performance on the minority class that is most important***"<br>ref : machinelearningmastery.com

A technique for addressing imbalanced datasets is to oversample the minority class so that new examples can be synthesized from the existing samples.

We will be using 'SMOTEENN', which combines the Synthetic Minority Over-sampling Technique (SMOTE) with cleaning by Edited Nearest Neighbours (ENN). SMOTE balances the class distribution by synthesising new minority instances between existing minority instances, rather than by simply replicating them. ENN then removes samples whose class disagrees with that of most of their nearest neighbours.
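
The core idea behind SMOTE can be sketched in a few lines. This is a simplified illustration of the interpolation step only, not imbalanced-learn's implementation; the function name `smote_like_sample` and the two sample points are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_like_sample(x, neighbour, rng):
    """Return a point on the segment between x and one of its minority neighbours."""
    lam = rng.uniform(0.0, 1.0)  # random position along the segment
    return x + lam * (neighbour - x)

# Two hypothetical minority-class customers: (scaled tenure, scaled monthly charge)
x = np.array([0.10, 0.80])
neighbour = np.array([0.30, 0.60])

synthetic = smote_like_sample(x, neighbour, rng)
print(synthetic)
```

The synthetic customer always lies between two real minority customers, which is what distinguishes this approach from duplicating existing rows.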

In [None]:
print("Before OverSampling- counts of label '1': {}".format(sum(y_train==1)))
print("Before OverSampling- counts of label '0': {} \n".format(sum(y_train==0)))

sm = SMOTEENN()
X_resampled, y_resampled = sm.fit_resample(X_train,y_train)

print("After OverSampling with SMOTEENN - '1': {}".format(sum(y_resampled==1)))
print("After OverSampling with SMOTEENN - '0': {}".format(sum(y_resampled==0)))

As you can see after over sampling and cleaning with SMOTEENN, the distribution of churners and non-churners is now almost evenly distributed

In [None]:
Xr_train,Xr_test,yr_train,yr_test = train_test_split(X_resampled, y_resampled,test_size=0.2)
model_dt_smote=DecisionTreeClassifier(criterion = "gini",max_depth=3, min_samples_leaf=2)
model_dt_smote.fit(Xr_train,yr_train)  # fit on the training part only, so Xr_test stays unseen

yr_pred = model_dt_smote.predict(Xr_test)

print_report(model_dt_smote,Xr_test,yr_test,yr_pred)
dt_smote = round(accuracy_score(yr_test, yr_pred)*100, 2)

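As we can see, the scores are noticeably higher after resampling. Keep in mind, though, that they are measured on a held-out part of the resampled data, which contains synthetic samples and has had ambiguous samples removed by ENN, so they are likely optimistic compared with performance on the original test set.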
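
One way to get an estimate that is not flattered by resampling is to resample only the training split and score on the untouched test split. The sketch below uses synthetic data and plain duplication of minority rows as a stand-in for SMOTEENN; all names in it are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

# Synthetic imbalanced data standing in for the churn table (about 25% positives)
X_toy, y_toy = make_classification(n_samples=2000, n_features=10,
                                   weights=[0.75, 0.25], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2,
                                          stratify=y_toy, random_state=0)

# Stand-in for SMOTEENN: duplicate minority rows of the TRAINING split only
minority = X_tr[y_tr == 1]
n_extra = int((y_tr == 0).sum() - (y_tr == 1).sum())
extra = resample(minority, n_samples=n_extra, replace=True, random_state=0)
X_bal = np.vstack([X_tr, extra])
y_bal = np.concatenate([y_tr, np.ones(n_extra, dtype=y_tr.dtype)])

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_bal, y_bal)

# The test split keeps the original class balance, so this recall is an honest estimate
test_recall = recall_score(y_te, tree.predict(X_te))
print(len(y_te), round(test_recall, 3))
```

In the notebook, the equivalent would be to fit on the SMOTEENN output of `X_train, y_train` and call `print_report` with the original `X_test, y_test`.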

Now let's try some more models and choose the one with the best performance.

### b) Logistic regression


In [None]:
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression

# Scale every feature to the 0-1 range before splitting, so the model trains on scaled values.
# Note: fitting the scaler on all rows is a simplification; fitting it on the
# training rows only would avoid leaking test-set statistics.
scaler = MinMaxScaler(feature_range = (0,1))
X = pd.DataFrame(scaler.fit_transform(X), columns=X.columns, index=X.index)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=11)


#Model fitting
model_lr = LogisticRegression(solver='lbfgs',max_iter=1000)
model_lr.fit(X_train, y_train)

y_pred = model_lr.predict(X_test)
print_report(model_lr,X_test,y_test,y_pred)
lr = round(accuracy_score(y_test, y_pred)*100, 2)
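
An alternative that keeps the test rows out of the scaler entirely is to wrap the scaler and the model in a scikit-learn pipeline. A minimal sketch on synthetic data (the data and variable names are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data standing in for the encoded churn features
X_toy, y_toy = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=11)

# fit() learns the min/max from the training rows only; predict()/score() reuse them
scaled_lr = make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=1000))
scaled_lr.fit(X_tr, y_tr)

pipeline_accuracy = scaled_lr.score(X_te, y_te)
print(round(pipeline_accuracy, 3))
```

Because the pipeline behaves like a single estimator, it can be passed to helpers that expect a model with `score` and `predict`.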

In [None]:
sm = SMOTEENN()

X_resample, y_resample = sm.fit_resample(X_train,y_train)
Xr_train,Xr_test,yr_train,yr_test = train_test_split(X_resample, y_resample,test_size=0.2)

model_lr_smote = LogisticRegression(solver='lbfgs',max_iter=1000)
model_lr_smote.fit(Xr_train,yr_train)

yr_pred = model_lr_smote.predict(Xr_test)
print_report(model_lr_smote,Xr_test,yr_test,yr_pred)
lr_smote = round(accuracy_score(yr_test, yr_pred)*100, 2)

In terms of their relationship with Churn, tenure and TotalCharges are the most negatively associated features, and a month-to-month contract is the most positively associated one
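
For a logistic regression, the direction and relative strength of each feature can be read from the fitted coefficients. The sketch below shows the pattern on synthetic data with hypothetical feature names `f0` to `f4`; on the notebook's model the same idea would use `model_lr.coef_[0]` with `X.columns`:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data with hypothetical feature names
X_arr, y_toy = make_classification(n_samples=500, n_features=5, n_informative=3,
                                   n_redundant=0, random_state=0)
X_toy = pd.DataFrame(X_arr, columns=["f0", "f1", "f2", "f3", "f4"])

toy_lr = LogisticRegression(max_iter=1000).fit(X_toy, y_toy)

# One coefficient per feature; the sign gives the direction of the association
weights = pd.Series(toy_lr.coef_[0], index=X_toy.columns).sort_values(ascending=False)
print(weights)
```

Coefficients are only comparable across features when the inputs share a scale, which is one reason to scale the features first.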

### c) Random Forest

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=11)
model_rf=RandomForestClassifier(n_estimators=200, criterion='gini',max_depth=3, min_samples_leaf=2)
model_rf.fit(X_train,y_train)
y_pred = model_rf.predict(X_test)
print_report(model_rf,X_test,y_test,y_pred)
rf = round(accuracy_score(y_test, y_pred)*100, 2)

In [None]:
sm = SMOTEENN()

X_resample, y_resample = sm.fit_resample(X_train,y_train)
Xr_train,Xr_test,yr_train,yr_test = train_test_split(X_resample, y_resample,test_size=0.2)
model_rf_smote=RandomForestClassifier(n_estimators=200, criterion='gini', random_state = 100,max_depth=4, min_samples_leaf=8)

model_rf_smote.fit(Xr_train,yr_train)
yr_pred = model_rf_smote.predict(Xr_test)

print_report(model_rf_smote,Xr_test,yr_test,yr_pred)
rf_smote = round(accuracy_score(yr_test, yr_pred)*100, 2)

In [None]:
Xr_train

In [None]:
yr_pred= model_rf_smote.predict(Xr_test)
yr_pred[0:5]

In [None]:
yr_pred = model_rf_smote.predict_proba(Xr_test)
yr_pred[0:5]

predict_proba gives the estimated probability of each class for every row, whereas predict outputs the predicted class label itself
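
The relationship between the two methods, and how a custom cut-off can be applied to the probabilities, can be sketched on synthetic data (the 0.3 threshold is an arbitrary example, not a recommendation):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X_toy, y_toy = make_classification(n_samples=300, n_features=6, random_state=0)
toy_rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_toy, y_toy)

proba = toy_rf.predict_proba(X_toy[:5])  # shape (5, 2): P(class 0), P(class 1)
labels = toy_rf.predict(X_toy[:5])       # shape (5,): the more probable class

# A lower cut-off flags more customers as at risk than the default 0.5
at_risk = (proba[:, 1] >= 0.3).astype(int)
print(proba, labels, at_risk)
```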

### d) PCA

In [None]:
# Applying PCA
from sklearn.decomposition import PCA
pcas = PCA(0.9)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=11)

X_train_pca = pcas.fit_transform(X_train)
X_test_pca = pcas.transform(X_test)
model_pca=RandomForestClassifier(n_estimators=100, criterion='gini', random_state = 100, max_depth=4, min_samples_leaf=8)
model_pca.fit(X_train_pca,y_train)

y_pred_pca = model_pca.predict(X_test_pca)

print_report(model_pca,X_test_pca,y_test,y_pred_pca)
pca = round(accuracy_score(y_test, y_pred_pca)*100, 2)
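
Passing a float such as `0.9` to `PCA` asks scikit-learn to keep as many components as are needed to explain at least that fraction of the variance. A small sketch on synthetic data (the feature counts are arbitrary):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# Synthetic data with correlated (redundant) columns
X_toy, _ = make_classification(n_samples=400, n_features=20, n_informative=5,
                               n_redundant=10, random_state=0)

pca_90 = PCA(0.9)  # keep enough components for at least 90% of the variance
X_reduced = pca_90.fit_transform(X_toy)

kept = pca_90.n_components_
explained = pca_90.explained_variance_ratio_.sum()
print(kept, round(float(explained), 3))
```

Note that the transformed columns are combinations of the original features, so per-feature interpretations no longer apply directly to the reduced data.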

In [None]:
sm = SMOTEENN()
X_resample, y_resample = sm.fit_resample(X_train,y_train)
Xr_train, Xr_test, yr_train, yr_test=train_test_split(X_resample, y_resample,test_size=0.2)
pcas = PCA(0.9)

Xr_train_pca = pcas.fit_transform(Xr_train)
Xr_test_pca = pcas.transform(Xr_test)
model_pca_smote=RandomForestClassifier(n_estimators=100, criterion='gini', random_state = 100, max_depth=6, min_samples_leaf=8)
model_pca_smote.fit(Xr_train_pca,yr_train)

yr_pred_pca = model_pca_smote.predict(Xr_test_pca)

print_report(model_pca_smote,Xr_test_pca,yr_test,yr_pred_pca)
pca_smote = round(accuracy_score(yr_test, yr_pred_pca)*100, 2)

### e) AdaBoost

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# algorithm='SAMME.R' is deprecated in recent scikit-learn releases, so the library default is used
model_ada = AdaBoostClassifier(n_estimators=200, learning_rate=0.2)

model_ada.fit(X_train, y_train)
y_pred = model_ada.predict(X_test)

print_report(model_ada,X_test,y_test,y_pred)
ada = round(accuracy_score(y_test, y_pred)*100, 2)

In [None]:
sm = SMOTEENN()
X_resample, y_resample = sm.fit_resample(X_train,y_train)
Xr_train,Xr_test,yr_train,yr_test = train_test_split(X_resample, y_resample,test_size=0.2)

model_ada_smote = AdaBoostClassifier(n_estimators=200, learning_rate=0.2, algorithm='SAMME')
model_ada_smote.fit(Xr_train, yr_train)

yr_pred = model_ada_smote.predict(Xr_test)
print_report(model_ada_smote,Xr_test,yr_test,yr_pred)
ada_smote = round(accuracy_score(yr_test, yr_pred)*100, 2)

In [None]:
predt = model_ada_smote.predict(Xr_test)
predt[0:10]

### f) Support Vector Machines (SVM)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=11)
from sklearn.svm import SVC

model_svm = SVC(kernel='linear') 
model_svm.fit(X_train,y_train)
y_pred = model_svm.predict(X_test)

print_report(model_svm, X_test, y_test, y_pred)
svm = round(accuracy_score(y_test, y_pred)*100, 2)

In [None]:
sm = SMOTEENN()
X_resample, y_resample = sm.fit_resample(X_train,y_train)
Xr_train,Xr_test,yr_train,yr_test = train_test_split(X_resample, y_resample,test_size=0.2)

model_svm_smote = SVC(kernel='linear')
model_svm_smote.fit(Xr_train, yr_train)
yr_pred = model_svm_smote.predict(Xr_test)

print_report(model_svm_smote,Xr_test,yr_test,yr_pred)
svm_smote = round(accuracy_score(yr_test, yr_pred)*100, 2)

### g) XGBoost

In [None]:
from xgboost import XGBClassifier
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=11)
model_xgb = XGBClassifier(n_estimators=200,max_depth=3)
model_xgb.fit(X_train, y_train)
y_pred = model_xgb.predict(X_test)

print_report(model_xgb,X_test,y_test,y_pred)
xgb = round(accuracy_score(y_test, y_pred)*100, 2)

In [None]:
sm = SMOTEENN()
X_resample, y_resample = sm.fit_resample(X_train,y_train)
Xr_train,Xr_test,yr_train,yr_test = train_test_split(X_resample, y_resample,test_size=0.2)

model_xgb_smote= XGBClassifier(n_estimators=200,max_depth=3)
model_xgb_smote.fit(Xr_train, yr_train)
yr_pred = model_xgb_smote.predict(Xr_test)

print_report(model_xgb_smote,Xr_test,yr_test,yr_pred)
xgb_smote = round(accuracy_score(yr_test, yr_pred)*100, 2)

### h) Neural Network

In [None]:
import tensorflow as tf
from functools import partial
from tensorflow import keras

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=11)

def neural_net(X_train,y_train,X_test,y_test):
    model = keras.Sequential(
        [
            keras.layers.Dense(20,input_shape=(X_train.shape[1],),activation='relu'),  # input width taken from the data
            keras.layers.Dropout(0.3),
            keras.layers.Dense(10,activation='relu'),
            keras.layers.Dropout(0.3),
            keras.layers.Dense(5,activation='relu'),
            keras.layers.Dropout(0.3),
            keras.layers.Dense(1,activation='sigmoid'),
        ]
    )

    model.compile(optimizer ='adam',
                 loss='binary_crossentropy',
                 metrics=['accuracy'])

    # model.fit(x_train,y_train,epochs=5)
    model.fit(X_train,y_train,epochs=20,batch_size=16,verbose=0)


    loss, acc = model.evaluate(X_test, y_test,
                            batch_size=16)
    # Convert the predicted probabilities into class labels with a 0.5 cut-off
    y_pred = model.predict(X_test)
    y_pred_actual = (y_pred > 0.5).astype(int).ravel()


    print("Model Score : {} \n ".format(acc))
    print("Classification report : \n {}\n".format(classification_report(y_test,y_pred_actual)))
    print("Accuracy : {}".format(round(acc*100, 2)))
    nn_acc = round(acc*100, 2)
    return nn_acc 


nn = neural_net(X_train,y_train,X_test,y_test)

In [None]:
sm = SMOTEENN()
X_resample, y_resample = sm.fit_resample(X_train,y_train)
Xr_train,Xr_test,yr_train,yr_test = train_test_split(X_resample, y_resample,test_size=0.2)
nn_smote = neural_net(Xr_train,yr_train,Xr_test,yr_test)


In [None]:
models = pd.DataFrame({
    'Model':['Decision trees', 'Logistic Regression', 'Random Forest', 'PCA', 'AdaBoost', 'SVM',  'XGBoost','Neural Network'],
    'Accuracy_score' : [dt, lr, rf, pca, ada, svm, xgb,nn],
    'Smote_Accuracy_score' : [dt_smote, lr_smote, rf_smote, pca_smote, ada_smote, svm_smote, xgb_smote,nn_smote]
})
sns.barplot(x='Smote_Accuracy_score', y='Model', palette='icefire_r',data=models.sort_values(by='Smote_Accuracy_score',
                                                                         ascending=False, ignore_index=True))

models.sort_values(by='Smote_Accuracy_score', ascending=False, ignore_index=True)

All the models perform well and their accuracies are close to each other, with XGBoost leading. After applying SMOTEENN the models' scores jump significantly. Since XGBoost gives one of the top performances, we will use it as our model for predicting customer churn. Let's now create a pickle and store our model in it.
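
Because accuracy from a single split can be noisy, a cross-validated score on the minority class is a useful complement when ranking models. A minimal sketch on synthetic data with two of the simpler models (the data and the dictionary of candidates are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X_toy, y_toy = make_classification(n_samples=1000, n_features=10,
                                   weights=[0.75, 0.25], random_state=0)

candidates = {
    "Decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "Logistic regression": LogisticRegression(max_iter=1000),
}

# 5-fold cross-validated F1 on the minority class (label 1)
cv_f1 = {name: cross_val_score(model, X_toy, y_toy, cv=5, scoring="f1").mean()
         for name, model in candidates.items()}
print(cv_f1)
```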

In [None]:
import pickle
# Let's dump our XGBoost model
with open('model.pkl','wb') as f:
    pickle.dump(model_xgb_smote, f)

In [None]:
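with open('model.pkl','rb') as f:
    load_model2 = pickle.load(f)
# Xr_test now comes from the most recent resampling split, so this only checks
# that the model loads and predicts; it is not a fresh performance estimate
load_model2.score(Xr_test,yr_test)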

In [None]:
model_xgb_smote.predict_proba(Xr_test)

We can also predict on the entire dataset to calculate the probability of churning by using our model to predict the Churn. After that you can use Power BI to visualize beautiful graphs and bar charts containing all the information in a concise way.
<a href="https://github.com/VineetDabholkar2002/Customer-Churn-Predictor">Github Repo link containing Power BI Dashboard</a>

In [None]:
# For Power BI predictions
telco_pred= pd.read_csv("../input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv")
telco_pred['TotalCharges']=pd.to_numeric(telco_pred['TotalCharges'],errors='coerce') #coerce puts NaN values if there are any parsing errors
telco_pred.dropna(inplace=True)
pred=model_xgb_smote.predict_proba(X)[:,1]*100
telco_pred['Predictions']=pred
telco_pred.to_csv('Telco-Churn-Predictions.csv')

## PowerBI Dashboard
a) Churners Profile
  <br>
<img src="https://user-images.githubusercontent.com/93699671/179417974-e5d0011f-040c-424e-bca1-3e0697cb0953.png" width=70%>
  
  
b) Churn Risks for each customer (Predicted using XGBoost model)
  <br>
<img src="https://user-images.githubusercontent.com/93699671/183242005-e3168178-b1b0-47b5-ac87-8f4a49151358.png" width=70%>

#### We have now successfully built a model, choosing XGBoost, to predict customer churn. Using the predictor, you will find that the predictions made by our model match the EDA we did previously.

#### You can now easily host the predictor on Heroku by making use of the model.pkl file, and even create a beautiful dashboard using PowerBI.

### <i>Thanks for reading !!! I wish you the best of luck in your future endeavors !!!</i>
