EST Project

A telecom company wants to use its historical customer data to predict customer behaviour and retain customers. Analysing
all relevant customer data makes it possible to develop focused customer retention programs.

**DATA DESCRIPTION:** 

Each row represents a customer; each column contains a customer attribute described in the column metadata. The data set includes information about:

• Customers who left within the last month – the column is called Churn

• Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device
protection, tech support, and streaming TV and movies

• Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly
charges, and total charges

• Demographic info about customers – gender, age range, and if they have partners and dependents

**PROJECT OBJECTIVE:**

Build a model that helps identify customers who have a higher probability of churning.
This helps the company understand the pain points and patterns of customer churn and sharpens its focus on
customer retention strategy.
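As a rough illustration of the objective, the sketch below scores customers by estimated churn probability with a scaled k-nearest-neighbours classifier, the model family suggested by the notebook's imports. The data is synthetic, the labelling rule is invented for the example, and `n_neighbors=5` is an assumed setting; none of this is the notebook's actual model.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (NOT the telecom data set): two numeric features
# loosely named after tenure and monthly charges, plus a made-up churn label.
rng = np.random.default_rng(0)
n = 400
tenure = rng.integers(0, 73, size=n)
monthly = rng.uniform(18, 119, size=n)
# Hypothetical rule used only to generate labels for this sketch.
churn = ((tenure < 12) & (monthly > 70)).astype(int)

X = np.column_stack([tenure, monthly])
X_train, X_test, y_train, y_test = train_test_split(
    X, churn, test_size=0.25, random_state=0, stratify=churn
)

# Scale features to [0, 1] before fitting the distance-based classifier.
model = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

# Column 1 of predict_proba is the estimated probability of the positive class.
churn_proba = model.predict_proba(X_test)[:, 1]
# Test-set positions ordered from highest to lowest estimated churn probability.
ranked = np.argsort(-churn_proba)
print(churn_proba[ranked][:5])
```

Ranking by probability rather than by the hard 0/1 prediction lets a retention team work down the list from the riskiest customers.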

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# Note: plot_confusion_matrix was removed in scikit-learn 1.2; on newer versions
# use ConfusionMatrixDisplay.from_estimator instead.
from sklearn.metrics import plot_confusion_matrix

In [2]:
%matplotlib inline
sns.set(color_codes=True)

class color:
    # ANSI escape codes for styling printed output
    PURPLE = '\033[95m'
    CYAN = '\033[96m'
    DARKCYAN = '\033[36m'
    BLUE = '\033[94m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'
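The escape codes in the `color` class above take effect when they are wrapped around a string and followed by the reset code. A minimal sketch, where `bold` is a hypothetical helper that is not part of the notebook and the two constants are copied from the class so the snippet runs on its own:

```python
# Copies of two escape codes from the color class, so this snippet is standalone.
BOLD = '\033[1m'
END = '\033[0m'

def bold(text):
    """Hypothetical helper: wrap text in the bold code and the reset code."""
    return f"{BOLD}{text}{END}"

print(bold("Churn distribution"))
```

Ending with the reset code matters: without it, the styling carries over to everything printed afterwards.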

# 1. Import Data 

In [3]:
churnData = pd.read_csv('TelcomCustomer-Churn.csv')


In [4]:
churnData.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 


In [5]:
churnData.describe().transpose()

                 count       mean        std    min   25%    50%    75%     max
SeniorCitizen   7043.0   0.162147   0.368612    0.0   0.0    0.0    0.0     1.0
tenure          7043.0  32.371149  24.559481    0.0   9.0   29.0   55.0    72.0
MonthlyCharges  7043.0  64.761692  30.090047  18.25  35.5  70.35  89.85  118.75


In [6]:
churnData.head()

,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes
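The `describe()` summary lists only three numeric columns, although `TotalCharges` looks numeric in the rows shown by `head()`. One plausible explanation is that the column was read as strings because of some non-numeric entries; that is an assumption, not something the outputs here confirm. The sketch below shows how such a column could be converted, using a toy frame in which a blank string stands in for a hypothetical bad entry.

```python
import pandas as pd

# Toy frame, not the real file: the blank entry is a made-up example of a
# value that would stop pandas from inferring a numeric dtype.
toy = pd.DataFrame({"TotalCharges": ["29.85", "1889.5", " ", "108.15"]})

# errors="coerce" turns unparseable entries into NaN instead of raising.
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")

# Count the entries that could not be parsed, so they can be inspected.
n_missing = int(toy["TotalCharges"].isna().sum())
print(toy["TotalCharges"].dtype, n_missing)
```

Counting the coerced values before imputing or dropping them shows how many rows the conversion affects.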


In [7]:
churnData.gender.value_counts()

Male      3555
Female    3488
Name: gender, dtype: int64

In [8]:
churnData.Partner.value_counts()

No     3641
Yes    3402
Name: Partner, dtype: int64
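Raw counts are easier to compare across columns when expressed as shares. The sketch below rebuilds a gender column from the counts printed above (3555 Male, 3488 Female) and passes `normalize=True` to `value_counts`; the rebuilt Series is a stand-in so the snippet runs without the CSV file.

```python
import pandas as pd

# Stand-in column rebuilt from the counts printed above.
gender = pd.Series(["Male"] * 3555 + ["Female"] * 3488, name="gender")

# normalize=True returns each category's share of rows instead of a raw count.
share = gender.value_counts(normalize=True)
print(share.round(4))
```

On the real frame the equivalent call would be `churnData.gender.value_counts(normalize=True)`.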

In [13]:
churnData['Dependents'].unique()

array(['No', 'Yes'], dtype=object)

In [20]:
catCols

Index(['gender', 'SeniorCitizen', 'Partner', 'Dependents', 'PhoneService',
       'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup',
       'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies',
       'Contract', 'PaperlessBilling', 'PaymentMethod', 'Churn'],
      dtype='object')
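A list like the one above could also be derived from the data instead of being maintained by name. The sketch below treats a column as categorical when it has few distinct values; the toy frame, its values, and the `max_levels` cut-off are all assumptions made for illustration.

```python
import pandas as pd

# Toy frame with made-up values; only the column names echo the data set.
toy = pd.DataFrame({
    "customerID": ["a1", "b2", "c3", "d4", "e5", "f6"],
    "SeniorCitizen": [0, 1, 0, 0, 1, 0],
    "Contract": ["Month-to-month", "One year", "Month-to-month",
                 "One year", "Month-to-month", "One year"],
    "tenure": [1, 34, 2, 45, 8, 22],
})

# Assumed cut-off: a column with at most this many distinct values is categorical.
max_levels = 4
n_levels = toy.nunique()
cat_cols = n_levels[n_levels <= max_levels].index

print(list(cat_cols))
```

A cardinality check picks up `SeniorCitizen`, which is stored as `int64` but encodes a category, whereas selecting by `object` dtype alone would miss it.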

In [28]:
# Categorical columns: everything except the ID and the continuous features.
catCols = churnData.columns.drop(['customerID', 'tenure', 'MonthlyCharges', 'TotalCharges'])
for col in catCols:
    print(f"\n---------------{col}------------\n")
    print(churnData[col].value_counts())


---------------gender------------

Male      3555
Female    3488
Name: gender, dtype: int64

---------------SeniorCitizen------------

0    5901
1    1142
Name: SeniorCitizen, dtype: int64

---------------Partner------------

No     3641
Yes    3402
Name: Partner, dtype: int64

---------------Dependents------------

No     4933
Yes    2110
Name: Dependents, dtype: int64

---------------PhoneService------------

Yes    6361
No      682
Name: PhoneService, dtype: int64

---------------MultipleLines------------

No                  3390
Yes                 2971
No phone service     682
Name: MultipleLines, dtype: int64

---------------InternetService------------

Fiber optic    3096
DSL            2421
No             1526
Name: InternetService, dtype: int64

---------------OnlineSecurity------------

No                     3498
Yes                    2019
No internet service    1526
Name: OnlineSecurity, dtype: int64

---------------OnlineBackup------------

No                     3088
Y