
### Python | Customer Churn Analysis Prediction
Last Updated : 23 Mar, 2020
Customer Churn
Customer churn occurs when an existing customer, user, subscriber, or any other kind of returning client stops doing business with a company or ends the relationship with it.

Types of Customer Churn –

Contractual Churn : When a customer is under a contract for a service and decides to cancel the service, e.g. cable TV, SaaS.
Voluntary Churn : When a user voluntarily cancels a service, e.g. a cellular connection.
Non-Contractual Churn : When a customer is not under a contract for a service and decides to cancel the service, e.g. consumer loyalty in retail stores.
Involuntary Churn : When churn occurs without any request from the customer, e.g. credit card expiration.

Reasons for Voluntary Churn
Lack of usage;
Poor service;
Better price.

In [None]:
# Import required libraries
import numpy as np
import pandas as pd

# Import the dataset
dataset = pd.read_csv('telcochurndata.csv')

# Glance at the first five records
# (printed explicitly, because a notebook cell only displays its last expression)
print(dataset.head())

# Print all the features of the data
print(dataset.columns)


### Exploratory Data Analysis on Telco Churn Dataset

Code : To find the number of churners and non-churners in the dataset:

In [None]:
# Churners vs Non-Churners
dataset['Churn'].value_counts()

Code : To group the data by Churn and compute the mean, to find out whether churners make more customer service calls than non-churners:

In [None]:
# Group data by 'Churn' and compute the mean
print(dataset.groupby('Churn')['Customer service calls'].mean())


Code : To find out whether one State has more churners than another:

In [None]:
# Count the number of churners and non-churners by State
print(dataset.groupby('State')['Churn'].value_counts())

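Raw counts per State are hard to compare when States have different numbers of customers. A minimal sketch of the same grouping, expressed as a churn rate per State instead, is given below. The small frame and its values are made up for illustration, and the sketch assumes `Churn` is stored as a boolean; if it is stored as strings, it would need to be mapped to 0/1 first.

```python
import pandas as pd

# Toy data standing in for the Telco dataset (values are made up)
toy = pd.DataFrame({
    "State": ["KS", "KS", "KS", "OH", "OH", "NJ", "NJ", "NJ", "NJ"],
    "Churn": [False, True, False, False, False, True, True, False, False],
})

# The mean of a boolean column is the fraction of True values,
# i.e. the churn rate within each State
churn_rate_by_state = (
    toy.groupby("State")["Churn"]
       .mean()
       .sort_values(ascending=False)
)
print(churn_rate_by_state)
```

Sorting the rates puts the States with the highest share of churners first, which answers the comparison question more directly than the counts do.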
### Exploring Data Visualizations : To understand how variables are distributed.

In [None]:
# Import matplotlib and seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# Visualize the distribution of 'Total day minutes'
plt.hist(dataset['Total day minutes'], bins = 100)

# Display the plot
plt.show()


Code : To visualize the difference in Customer service calls between churners and non-churners:

In [None]:
# Create the box plot
sns.boxplot(x = 'Churn',
            y = 'Customer service calls',
            data = dataset,
            showfliers = False,          # hide the outlier markers
            hue = "International plan")
# Display the plot
plt.show()

### Data Preprocessing for Telco Churn Dataset

Many Machine Learning models make certain assumptions about how the data is distributed. Some of the assumptions are as follows:

The features are normally distributed
The features are on the same scale
The datatypes of features are numeric

In the telco churn data, Churn, Voice mail plan, and International plan, in particular, are binary features that can easily be converted into 0s and 1s.

In [None]:
# Features and Labels
X = dataset.iloc[:, 0:19].values
y = dataset.iloc[:, 19].values # Churn

# Encoding categorical data in X
from sklearn.preprocessing import LabelEncoder

labelencoder_X_1 = LabelEncoder()
X[:, 3] = labelencoder_X_1.fit_transform(X[:, 3])

labelencoder_X_2 = LabelEncoder()
X[:, 4] = labelencoder_X_2.fit_transform(X[:, 4])

# Encoding categorical data in y
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)

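As an alternative sketch of the same binary conversion, the mapping can be written out explicitly in pandas so that the meaning of 0 and 1 is visible in the code. The "Yes"/"No" values and the toy frame below are assumptions for illustration; the actual values in the dataset should be checked first, for example with `dataset['International plan'].unique()`.

```python
import pandas as pd

# Toy frame with assumed values (not taken from the real dataset)
toy = pd.DataFrame({
    "International plan": ["No", "Yes", "No"],
    "Voice mail plan": ["Yes", "No", "No"],
    "Churn": [False, True, False],
})

# Explicit mapping: the meaning of each code is stated in one place
yes_no = {"No": 0, "Yes": 1}
toy["International plan"] = toy["International plan"].map(yes_no)
toy["Voice mail plan"] = toy["Voice mail plan"].map(yes_no)

# A boolean column converts directly to 0/1
toy["Churn"] = toy["Churn"].astype(int)
print(toy)
```

`LabelEncoder` assigns integer codes in sorted order of the class labels, so for "No"/"Yes" it yields the same 0/1 coding; the explicit dictionary simply makes that choice visible.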
### Encoding State feature using One hot encoding

In [None]:
# One-hot encode 'State'; drop_first = True removes one dummy column
# to avoid the dummy variable trap
X_State = pd.get_dummies(X[:, 0], drop_first = True)
  
# Converting X to a dataframe 
X = pd.DataFrame(X) 
  
# Dropping the 'State' column 
X = X.drop([0], axis = 1) 
  
# Merging two dataframes 
frames = [X_State, X] 
result = pd.concat(frames, axis = 1, ignore_index = True) 
  
# Final dataset with all numeric features 
X = result 

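The effect of `drop_first` can be seen on a tiny example. The sketch below uses a made-up Series of State codes rather than the real dataset, and compares the dummy columns produced with and without `drop_first = True`.

```python
import pandas as pd

# Made-up State codes for illustration
states = pd.Series(["KS", "OH", "NJ", "OH"], name="State")

# One column per category
full = pd.get_dummies(states)

# One column fewer: the first category (in sorted order) is dropped
reduced = pd.get_dummies(states, drop_first = True)

print(list(full.columns))     # ['KS', 'NJ', 'OH']
print(list(reduced.columns))  # ['NJ', 'OH']
```

With k categories, `drop_first = True` keeps k - 1 columns, and the dropped category is represented by a row of all zeros. This redundancy mainly matters for linear models; it is kept here to mirror the encoding step above.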
### To Create Training and Test sets


In [None]:
# Splitting the dataset into the Training and Test sets 
from sklearn.model_selection import train_test_split 
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size = 0.2, random_state = 0)

### To scale features of the training and test sets

In [None]:
# Feature Scaling 
from sklearn.preprocessing import StandardScaler 
sc = StandardScaler() 
# Fit the scaler on the training set only, then apply the same
# transformation to the test set
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

### Code: To train a Random Forest classifier model on the training set.

In [None]:
# Import RandomForestClassifier 
from sklearn.ensemble import RandomForestClassifier 
  
# Instantiate the classifier (fixed seed so that results are reproducible)
clf = RandomForestClassifier(random_state = 0)

# Fit to the training data
clf.fit(X_train, y_train)

### Code: Making Predictions

In [None]:
# Predict the labels for the test set 
y_pred = clf.predict(X_test)

### Code: Evaluating Model Performance

In [None]:
# Compute accuracy 
from sklearn.metrics import accuracy_score 
  
accuracy_score(y_test, y_pred)

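Accuracy is easier to judge against a trivial baseline, because churners are usually a minority class. The sketch below uses made-up labels (85 non-churners and 15 churners, an assumed split rather than the real one) and scikit-learn's `DummyClassifier` to show the accuracy obtained by always predicting the majority class.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Made-up labels: 85 non-churners (0) and 15 churners (1)
y_toy = np.array([0] * 85 + [1] * 15)
X_toy = np.zeros((len(y_toy), 1))  # features are ignored by this baseline

# Always predicts the most frequent class seen during fit
baseline = DummyClassifier(strategy = "most_frequent")
baseline.fit(X_toy, y_toy)

baseline_accuracy = accuracy_score(y_toy, baseline.predict(X_toy))
print(baseline_accuracy)  # 0.85
```

A model is only useful for churn prediction if its accuracy is clearly above this kind of baseline, computed on the same test labels.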
In [None]:
# Confusion matrix (rows: actual classes, columns: predicted classes)
from sklearn.metrics import confusion_matrix 
print(confusion_matrix(y_test, y_pred)) 

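The four cells of a binary confusion matrix can be turned into precision and recall for the churn class, which say more about churn detection than accuracy does. The labels below are made up for illustration; in the notebook, `y_test` and `y_pred` would take their place.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up true labels and predictions (1 = churner)
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_hat = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

# For binary labels the matrix is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

# Precision: share of predicted churners that really churned
precision = tp / (tp + fp)
# Recall: share of actual churners that the model caught
recall = tp / (tp + fn)

print(tn, fp, fn, tp)
print(precision, recall)

# The same values from scikit-learn's helper functions
print(precision_score(y_true, y_hat), recall_score(y_true, y_hat))
```

Low recall means many churners go undetected, which is usually the costlier error when the goal is to retain customers.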