# Predicting Customer Churn in the Telecom Industry

### Table of Contents

##### 1. INTRODUCTION
##### 2. DATA PRE-PROCESSING
##### 3. EXPLORATORY DATA ANALYSIS
##### 4. MODEL - LOGISTIC REGRESSION
##### 5. MODEL - DECISION TREE
##### 6. MODEL - RANDOM FOREST
##### 7. CONCLUSION
##### 8. DEPLOYMENT / FUTURE WORK
##### 9. BIBLIOGRAPHY

1. INTRODUCTION

Customer attrition is an important predictor of the long-term success of a company. Retaining customers is often the top priority of any service business, because keeping existing customers is far more economical than acquiring new ones (Myler, 2016). Given the competitive nature of the telecom industry, even long-standing customers may not remain loyal and can switch to any of several telecom companies operating in the same geographical location. Therefore, many telecom companies have proactively started to examine customer data to determine which customers are at risk of cancelling and to take preventative measures to keep them.

This is where data science can help, by building predictive models for customer churn. Such a model can illustrate the rate of attrition and provide valuable insight into how the churn rate varies over time and across product lines, customer segments and other factors. Furthermore, customer behavior and preferences vary immensely from one customer to another, so a simple linear regression analysis is unlikely to be adequate. Predicting who will churn and who will not is a binary classification problem, and supervised learning methods such as Logistic Regression, Decision Tree and Random Forest will be used to build models that identify customers who are likely to leave. The intended end use of the results of this project is to identify the key features that drive customer churn and to provide recommendations for retaining customers.

About the data:

The data was downloaded from IBM Watson Analytics (Stacker IV, 2015). It comes from a telecommunications company and is used here to predict customer churn. As explained above, customer churn occurs when subscribers stop doing business with a company or switch to another company.

•	The raw data has 7043 rows (customers) and 21 columns (features)

•	The data is divided into 20 attributes

    •	Services each customer has signed up for – phone, internet, multiple lines, online security, online backup, device protection, tech support, streaming TV and movies
    
    •	Customer account information – customer tenure, contract duration, paperless billing, payment method, monthly charges and total charges
    
    •	Demographic information about the customer – gender, age, partner and dependents
    
•	The target variable is Churn – customers who have left the platform within the last month


2. DATA PRE-PROCESSING

Figure 1 below illustrates the structure of the Telco customer dataset.

![image-2.png](attachment:image-2.png)
Figure 1: Structure of the Telco customer data set

Next, the number of missing values in the dataset was determined.

![image-3.png](attachment:image-3.png)
Figure 2: Missing values in the telco dataset; TotalCharges has 11 missing values

From Figure 2, only TotalCharges has missing values (11 in total). The rows containing these missing values were then deleted using the code below, so that every remaining row is complete.

![image-2.png](attachment:image-2.png)
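
The deletion code itself survives only as an image. As a minimal sketch of the same step in Python (the toy rows and the helper name `drop_missing` are illustrative assumptions, not the original code), rows with a blank TotalCharges value could be filtered out like this:

```python
# Toy stand-in for the Telco table: the column names mirror the dataset,
# but the values are invented for illustration.
customers = [
    {"customerID": "A1", "tenure": 1, "TotalCharges": "29.85"},
    {"customerID": "B2", "tenure": 0, "TotalCharges": " "},  # blank, i.e. missing
    {"customerID": "C3", "tenure": 34, "TotalCharges": "1889.50"},
]

def drop_missing(rows, column):
    """Keep only the rows where `column` holds a non-blank value."""
    return [
        row for row in rows
        if row.get(column) is not None and str(row[column]).strip() != ""
    ]

complete = drop_missing(customers, "TotalCharges")
print(len(customers) - len(complete), "row(s) dropped")
```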

From the structure of the data in Figure 1, some categorical variables have the levels ‘No’ and ‘No Internet Service’ or ‘No Phone Service’. Since these levels carry the same information for modelling purposes, ‘No Internet Service’ and ‘No Phone Service’ will be changed to ‘No’ using the following code:

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

Change “No phone service” to “No” for column “MultipleLines”
![image-3.png](attachment:image-3.png)
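
A rough Python equivalent of this recoding is sketched below. The mapping and the helper `collapse_levels` are assumptions made for illustration; the two column names are taken from the dataset, and the rows are invented.

```python
# Levels that carry the same information as a plain "No".
REDUNDANT_LEVELS = {"No internet service": "No", "No phone service": "No"}

def collapse_levels(rows, columns):
    """Return copies of the rows with redundant 'No ...' levels set to 'No'."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay untouched
        for col in columns:
            new_row[col] = REDUNDANT_LEVELS.get(new_row[col], new_row[col])
        cleaned.append(new_row)
    return cleaned

rows = [
    {"OnlineSecurity": "No internet service", "MultipleLines": "No phone service"},
    {"OnlineSecurity": "Yes", "MultipleLines": "No"},
]
cleaned = collapse_levels(rows, ["OnlineSecurity", "MultipleLines"])
```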

Customer tenures will be grouped into 6 bins from zero to seventy-two months.
![image.png](attachment:image.png)
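
The exact bin edges are only visible in the image. The sketch below assumes six equal 12-month groups (0-1 year up to 5-6 years), which matches the group labels used later in the report but is not confirmed by the text; the function name `tenure_group` is likewise illustrative.

```python
def tenure_group(months):
    """Map a tenure in months (0 to 72) to one of six one-year groups."""
    if not 0 <= months <= 72:
        raise ValueError("tenure must be between 0 and 72 months")
    # 0..12 -> group 0, 13..24 -> group 1, ..., 61..72 -> group 5
    index = max(0, (months - 1) // 12)
    return f"{index}-{index + 1} years"

groups = [tenure_group(m) for m in (0, 12, 13, 60, 72)]
```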

The columns which are not going to be used in building the model will be deleted.

![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)

3. EXPLORATORY DATA ANALYSIS

The categorical variables were plotted to analyze customer behavior.

![image.png](attachment:image.png)
Figure 3: Customer churn shown as a percent for the last month

Figure 3 is a bar plot of customer churn. Over 26% of customers have left the company within the last month.

![image.png](attachment:image.png)
Figure 4: Customer churn with respect to gender, Senior Citizen, Partner, Dependents, Phone Service and Multiple Lines

Figure 4 illustrates customer churn with respect to different demographics and phone services. Some of the takeaways from Figure 4 are:

•	The churn percent is approximately equal for males and females

•	The churn count is higher for Senior Citizens

•	Customers with Partners and Dependents have a lower churn count compared to customers who don’t have partners and dependents

![image.png](attachment:image.png)
Figure 5: Customer churn count with respect to different services the customer has signed up for

The learnings from Figure 5 are as follows:

•	Churn rate is higher for customers signed up for fiber optic internet service

•	Customers who are not signed up for services such as Online Security, Online Backup, Device Protection and Tech Support are more likely to leave the company than customers who are signed up for these services


![image.png](attachment:image.png)
Figure 6: Customer churn count with respect to streaming movies, contract, paperless billing and payment method

The learnings from Figure 6 are described below:

•	Customers who have a month-to-month contract are more likely to leave the platform compared to customers who are on a yearly or a two-year contract

•	Customers who are subscribed to paperless billing have a higher churn count than those without paperless billing

•	Customers who pay by electronic check are more likely to leave the company than customers using other payment methods

Next, the medians of tenure, monthly charges and total charges were plotted to analyze the effect of the numerical variables on customer churn.

![image.png](attachment:image.png)
Figure 7: Median tenure for customer churn

Figure 7 shows that the median tenure of customers who left the company is around 10 months.

![image-2.png](attachment:image-2.png)
Figure 8: Median of monthly charges for customer churn

Figure 8 shows that the median monthly charge for customers who left in the last month is above $75, higher than for those who stayed.

![image-3.png](attachment:image-3.png)
Figure 9: Median of total charges for customer churn

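Figure 9 shows that customers who left in the past month had a lower median total charge. Because total charges accumulate over the length of the subscription, this is consistent with the short tenure of these customers seen in Figure 7, and it does not by itself show that they were signed up for fewer services than customers who stayed.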
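
The grouped medians behind Figures 7 to 9 can be computed along the following lines. This is a sketch in Python with invented rows; `median_by_group` is a hypothetical helper, not the code used to produce the figures.

```python
from collections import defaultdict
from statistics import median

def median_by_group(rows, group_col, value_col):
    """Median of `value_col` for each distinct value of `group_col`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    return {group: median(values) for group, values in groups.items()}

# Invented tenures (in months), for illustration only.
rows = [
    {"Churn": "Yes", "tenure": 2},
    {"Churn": "Yes", "tenure": 10},
    {"Churn": "Yes", "tenure": 15},
    {"Churn": "No", "tenure": 30},
    {"Churn": "No", "tenure": 40},
]
medians = median_by_group(rows, "Churn", "tenure")
```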

![image-4.png](attachment:image-4.png)
Figure 10: Correlation plot of numerical variables

Analyzing Figure 10 shows a positive correlation between Total Charges, Monthly Charges and tenure.
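
For reference, each cell of such a correlation plot is a Pearson correlation coefficient. The sketch below computes one by hand in Python; the numbers are invented, and total charges are taken to be tenure times monthly charge purely for the sake of the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

tenure = [1, 12, 24, 48, 72]            # months (invented)
monthly = [30.0, 50.0, 60.0, 80.0, 100.0]  # monthly charges (invented)
total = [t * m for t, m in zip(tenure, monthly)]

r_tenure_total = pearson(tenure, total)
```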

![image-5.png](attachment:image-5.png)
Figure 11: Distribution of customer tenure from zero to six years

From Figure 11, the largest numbers of customers are in the 0-1 year and 5-6 year tenure groups.

4. MODEL - LOGISTIC REGRESSION

Logistic regression is a supervised learning method for fitting a regression curve, y=f(x), when y is a categorical variable (Alice, 2015). It is typically used to predict y given a set of predictors x, which can be continuous, categorical or both. In the simplest case, y is binary, meaning that it takes a value of either 0 or 1 (Alice, 2015). Logistic regression can also be extended to more than two classes (multinomial logistic regression), for example to classify cars as having 4, 6 or 8 cylinders.
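
In the binary case, the fitted curve is the logistic function applied to a linear combination of the predictors. A minimal Python sketch follows; the coefficient values are made up for illustration and are not estimates from this dataset.

```python
from math import exp

def churn_probability(features, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(coefficients[name] * value
                        for name, value in features.items())
    return 1.0 / (1.0 + exp(-z))

# Hypothetical coefficients: longer tenure lowers the churn probability,
# higher monthly charges raise it.
coefficients = {"tenure": -0.05, "MonthlyCharges": 0.02}
new_customer = churn_probability({"tenure": 2, "MonthlyCharges": 80.0},
                                 coefficients, intercept=-1.0)
loyal_customer = churn_probability({"tenure": 60, "MonthlyCharges": 80.0},
                                   coefficients, intercept=-1.0)
```

The output is always between 0 and 1, so it can be read as a churn probability and compared against a cutoff.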

The first model will be built with logistic regression using all the variables. The data was divided into 70% training and 30% testing.

![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)
![image-3.png](attachment:image-3.png)
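
A 70/30 split can be sketched in Python as below. The seed, the rounding rule and the helper name `train_test_split` are assumptions; the original split was produced by the code shown in the images above.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle the rows reproducibly and cut them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Ten placeholder row indices stand in for the customer records.
train, test = train_test_split(range(10))
```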

The first model has several variables with a high p-value, so a subset of variables needs to be found that gives a better model. A significance level of 0.05 is used. The p-value of a variable measures the evidence against the hypothesis that its coefficient is zero. A p-value below 0.05 denotes a statistically significant variable, while a p-value above 0.05 means the variable is not statistically significant, i.e. its coefficient cannot be distinguished from zero.

This will be done by stepwise selection using the Akaike Information Criterion (AIC). Variables are iteratively added and removed until the subset of variables giving the lowest AIC is found.

![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)
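
For context, the AIC of a fitted model is 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood; lower values are better. The Python sketch below compares two candidate models using hypothetical log-likelihoods that are not taken from the fitted models in this report.

```python
def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_parameters - 2 * log_likelihood

# Hypothetical fits: the reduced model loses a little likelihood
# but uses half as many parameters.
full_model = aic(log_likelihood=-2010.0, n_parameters=24)
reduced_model = aic(log_likelihood=-2015.0, n_parameters=12)
preferred = "reduced" if reduced_model < full_model else "full"
```

The penalty term 2k is what allows a smaller model to win even though its likelihood is slightly lower.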

There are still two variables with high p-value, online security and device protection. These two features will also be removed from the model.

![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)

Model 3 has all the significant variables and it will be used to make the prediction on the test dataset.

Using a 50% cutoff point, the following confusion matrix for Logistic Regression is obtained

![image.png](attachment:image.png)


The accuracy, sensitivity and specificity of the model will be determined

![image.png](attachment:image.png)

The model has good accuracy and specificity but poor sensitivity. Accuracy is the proportion of all customers that the model classifies correctly. Sensitivity measures the proportion of customers who leave the telecom company that are correctly identified as leaving (Baratloo, Hosseini, Negida, & Ashal, 2015). Specificity, on the other hand, measures the proportion of customers who stay that are correctly identified as not leaving the company (Baratloo, Hosseini, Negida, & Ashal, 2015).
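
These three measures follow directly from the four cells of a confusion matrix. The Python sketch below uses invented counts, with churn ("Yes") treated as the positive class; it does not reproduce the confusion matrix shown above.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # all correct / all customers
        "sensitivity": tp / (tp + fn),   # churners correctly flagged
        "specificity": tn / (tn + fp),   # stayers correctly cleared
    }

# Invented counts: 100 churners (60 caught), 200 stayers (180 cleared).
metrics = classification_metrics(tp=60, fp=20, tn=180, fn=40)
```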

We need to find the probability cutoff that gives the best balance between accuracy, sensitivity and specificity, since lowering the cutoff raises sensitivity at the expense of specificity.


![image.png](attachment:image.png)
Figure 15: Plot of sensitivity, specificity and accuracy for LogModel3. The crossover happens at a cutoff point of 0.305.

The sensitivity, specificity and accuracy curves cross at a cutoff of 0.305, which is where the three measures are best balanced. Using a cutoff point of 0.305, accuracy, sensitivity and specificity will be recalculated.
![image-2.png](attachment:image-2.png)
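
One way to locate such a crossover is to sweep a grid of cutoffs and keep the one where sensitivity and specificity are closest. The Python sketch below does this on invented predicted probabilities. The grid step of 0.01 and both function names are assumptions, and the code expects both classes to be present in the labels.

```python
def sensitivity_specificity(probs, labels, cutoff):
    """Sensitivity and specificity when predicting churn for p >= cutoff."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

def balanced_cutoff(probs, labels):
    """Cutoff on a 0.01 grid where sensitivity and specificity are closest."""
    def gap(cutoff):
        sens, spec = sensitivity_specificity(probs, labels, cutoff)
        return abs(sens - spec)
    return min((i / 100 for i in range(1, 100)), key=gap)

# Invented predicted probabilities; 1 = churned, 0 = stayed.
probs = [0.9, 0.6, 0.35, 0.2, 0.4, 0.3, 0.15, 0.1, 0.05, 0.25]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
best = balanced_cutoff(probs, labels)
```

With an imbalanced target such as churn, the balanced cutoff usually falls below 0.5, as it does for the logistic regression model here.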

The logistic regression model with a cutoff at 0.305 gives a much better sensitivity, with accuracy, sensitivity and specificity all close to 75%.

5. MODEL - DECISION TREE

A decision tree splits the data into branches and sub-branches to arrive at a prediction. The algorithm builds the tree by finding the variable that best separates the data into two groups (Roth, 2016). It uses homogeneity as the criterion for deciding on which attribute, and where, a split should happen (Roth, 2016). Splits that result in the most homogeneous subsets are considered better, and each subsequent split is chosen to maximize the homogeneity of the resulting subsets (Roth, 2016). The biggest advantage of a decision tree is that it is very intuitive and can be understood even by people with no experience in predictive modeling.
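
Homogeneity can be quantified in several ways. The Python sketch below uses Gini impurity, which is one common choice and is assumed here rather than taken from the report; the labels are invented. A split is better when the size-weighted impurity of the two child nodes is lower.

```python
def gini(labels):
    """Gini impurity: 0 for a pure node, 0.5 for an even two-class mix."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Invented churn labels before and after a split on contract type.
parent = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "No"]
month_to_month = ["Yes", "Yes", "Yes", "No"]
longer_contract = ["No", "No", "No", "No"]

impurity_before = gini(parent)
impurity_after = weighted_gini(month_to_month, longer_contract)
```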

The decision tree model will be built using all the variables. Below are the results using all the variables.

![image.png](attachment:image.png)

The decision tree gives a slightly better accuracy (76.9%) and specificity (85.3%) than logistic regression. However, it has a lower sensitivity of 56% compared to 76% for logistic regression.

For illustration purposes, only the top three important features will be used in building the decision tree. The top three most-relevant features include contract, tenure and monthly charges.

![image.png](attachment:image.png)

Some of the takeaways from Figure 16 are as follows:

1.	Contract is the most important variable for predicting whether a customer will stay or leave
2.	Customers who are on a one-year or two-year contract, regardless of the monthly charge, are less likely to churn
3.	Customers who are on a month-to-month contract, have a tenure of 0-1 year and have high monthly charges are more likely to churn


6. MODEL - RANDOM FOREST

A random forest is an ensemble of a large number of decision trees. It uses bootstrap aggregation (bagging) to draw random samples from the dataset to train each tree in the forest, and predictions are made by aggregating the predictions of the individual trees. One of the advantages of this technique is that averaging the trees reduces their variance, which improves performance on test sets and helps avoid overfitting (Walia, 2017). The observations left out of a tree's bootstrap sample can be used to estimate its prediction error, and this estimate is called the out-of-bag (OOB) error.
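
The bootstrap step can be sketched in a few lines of Python. The helper `bootstrap_indices` and the seed are illustrative assumptions. Each tree trains on rows drawn with replacement, and the rows that were never drawn form that tree's out-of-bag set.

```python
import random

def bootstrap_indices(n_rows, rng):
    """Draw n_rows indices with replacement; return in-bag and out-of-bag."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, out_of_bag

rng = random.Random(0)
in_bag, out_of_bag = bootstrap_indices(1000, rng)
oob_fraction = len(out_of_bag) / 1000
```

For large samples roughly 37% of the rows (about 1/e) end up out of bag for any given tree, which is why the OOB error can serve as a built-in estimate of test error.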

Random forest first model
![image-2.png](attachment:image-2.png)

The first model has a low error when predicting ‘No’ and a high error when predicting ‘Yes’. The out-of-bag (OOB) error is 20.37%, which means the model has an estimated out-of-sample accuracy of 79.63%. Next, the predictions and accuracy of the model will be evaluated using a confusion matrix.

![image.png](attachment:image.png)

Random Forest has a higher overall accuracy than both logistic regression and decision tree, at about 79%. Its sensitivity of 63.25% is better than the decision tree's 56% but lower than logistic regression's 75%. Lastly, its specificity of 83.4% is similar to the decision tree's and higher than that of logistic regression, which was 76%.

RANDOM FOREST ERROR RATE

The model accuracy and sensitivity will be improved by reducing the OOB error. This will be done by finding the optimal number of trees and the optimal mtry, the number of variables randomly sampled as candidates at each split.

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)
Figure 17: Random forest error rate with respect to the number of trees

From Figure 17, the OOB error decreases and becomes stable after 200 trees.

Next, the random forest will be tuned to find mtry which again gives the lowest OOB error rate.

![image.png](attachment:image.png)
Figure 18: OOB error rate plotted against different values of mtry

From Figure 17 and Figure 18 the random forest model will be tuned with number of trees = 200 and mtry = 2.

Fitting random forest model after tuning

![image-2.png](attachment:image-2.png)

The OOB error has dropped from 20.37% to 19.8%.

Random forest predictions and confusion matrix after tuning

![image.png](attachment:image.png)

After tuning, the overall accuracy and specificity have stayed the same, but sensitivity of the model has improved from 63% to 65%.

Random Forest Variable Importance

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)
Figure 19: The top ten important features in random forest model

Figure 19 is the variable importance plot. It ranks the attributes in decreasing order of importance, measured by mean decrease in accuracy and by mean decrease in Gini. MeanDecreaseAccuracy shows how much the model accuracy decreases when the values of that variable are randomly permuted. MeanDecreaseGini measures how much splits on that variable reduce node impurity, averaged over all trees. The higher the value of either measure, the more important the variable is in building the model. From Figure 19, tenure_group, Contract and MonthlyCharges are the most important variables in both MeanDecreaseAccuracy and MeanDecreaseGini.
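
The idea behind a permutation-based accuracy decrease can be shown with a toy example in Python. The rule-based `toy_model` stands in for a fitted forest and the rows are invented, so this is only a sketch of the principle and not the procedure that produced Figure 19.

```python
import random

def accuracy(model, rows, labels):
    """Share of rows for which the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, feature, seed=0):
    """Drop in accuracy after shuffling one feature's values across rows."""
    baseline = accuracy(model, rows, labels)
    values = [r[feature] for r in rows]
    random.Random(seed).shuffle(values)
    permuted = [dict(r, **{feature: v}) for r, v in zip(rows, values)]
    return baseline - accuracy(model, permuted, labels)

def toy_model(row):
    # Hypothetical rule: only the contract type drives the prediction.
    return "Yes" if row["Contract"] == "Month-to-month" else "No"

contracts = ["Month-to-month", "Two year"] * 4
genders = ["Female", "Female", "Male", "Male"] * 2
rows = [{"Contract": c, "gender": g} for c, g in zip(contracts, genders)]
labels = [toy_model(r) for r in rows]

contract_importance = permutation_importance(toy_model, rows, labels, "Contract")
gender_importance = permutation_importance(toy_model, rows, labels, "gender")
```

A feature the model never uses, such as gender in this toy rule, has an importance of exactly zero, while shuffling a feature the model relies on can only lower its accuracy.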

7. CONCLUSION

The summary of the three models is as follows:

Logistic Regression:

•	Accuracy 75.33%

•	Sensitivity 75.71%

•	Specificity 75.19%

Decision Trees:

•	Accuracy 76.90%

•	Sensitivity 55.60%

•	Specificity 85.39%

Random Forest:

•	Accuracy 79.60%

•	Sensitivity 65.93%

•	Specificity 82.88%

Checking the area under curve (AUC) for all three models

![image.png](attachment:image.png)
Figure 20: AUC plot for logistic regression, decision tree and random forest models

The AUC (area under the ROC curve) measures how well a model separates customers who churn from those who do not, across all possible cutoffs. Based on the AUC results from Figure 20, logistic regression is the best model for predicting churn, followed closely by decision tree and random forest.
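
Equivalently, the AUC is the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The Python sketch below computes it by direct pairwise comparison on invented scores. This is fine for small examples, although dedicated libraries use faster rank-based formulas.

```python
def auc(scores, labels):
    """AUC as the share of (churner, stayer) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0  # ties count as half
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Invented scores; 1 = churned, 0 = stayed.
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.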

Key findings from the analysis:

•	Features such as tenure_group, Contract, PaperlessBilling, MonthlyCharges and InternetService appear to play a role in customer churn.

•	Customers who are on a month-to-month contract, have high MonthlyCharges and are within the 0-1 year tenure group are more likely to churn. On the other hand, customers who are on a one- or two-year contract, have a tenure longer than 1 year and have lower MonthlyCharges are less likely to churn.


8. DEPLOYMENT / FUTURE WORK

The results from this project can be given to the business and marketing department of the telecom company to create customer retention programs. 

Some examples could be:

•	Lowering the monthly bill for customers in the 0-1 year tenure group

•	Giving customers incentives to sign up for a one- or two-year contract, as they will pay less in the long term

Furthermore, it is recommended to analyse data from previous months to make the model more robust, with a target of over 90% accuracy, sensitivity and specificity. With higher confidence in the predictions, the data science and business teams can design better customer retention programs.


9. BIBLIOGRAPHY

Alice, M. (2015, September 15). How to perform a Logistic Regression in R. Retrieved from R bloggers: https://www.r-bloggers.com/how-to-perform-a-logistic-regression-in-r/

Baratloo, A., Hosseini, M., Negida, A., & El Ashal, G. (2015). Part 1: Simple Definition and Calculation of Accuracy, Sensitivity and Specificity. Emergency (Tehran, Iran), 3(2), 48-9.

Kodali, T. (2015, December 15). Using Decision Trees to Predict Infant Birth Weights. Retrieved from R Bloggers: https://www.r-bloggers.com/using-decision-trees-to-predict-infant-birth-weights/

Myler, L. (2016, June 8). Acquiring New Customers Is Important, But Retaining Them Accelerates Profitable Growth. Retrieved from Forbes: https://www.forbes.com/sites/larrymyler/2016/06/08/acquiring-new-customers-is-important-but-retaining-them-accelerates-profitable-growth/#3f92b1ec6671

Roth, D. (2016, September 8). Decision Trees. Retrieved from CS 446 Machine Learning Fall 2016: http://l2r.cs.uiuc.edu/Teaching/CS446-17/LectureNotesNew/dtree/main.pdf

Stacker IV, M. (2015, April 2). Guide to Sample Data Sets. Retrieved from IBM Access Watson Analytics: https://www.ibm.com/communities/analytics/watson-analytics-blog/guide-to-sample-datasets/

Walia, A. S. (2017, July 24). Random Forests in R. Retrieved from R Bloggers: https://www.r-bloggers.com/random-forests-in-r/


