# Problem Statement

### Business Problem Overview

To grow revenue, telecom businesses must both attract new clients and prevent contract terminations (churn). Clients terminate contracts for a variety of reasons, including better price offers, more enticing packages, negative service interactions, or changes in their circumstances.

Customers in the telecom sector can choose among a variety of service providers and actively switch from one operator to another. In this fiercely competitive market, the telecom industry has an average annual churn rate of 15 to 25 percent. Because acquiring a new customer is 5–10 times more expensive than keeping an existing one, customer retention has now surpassed customer acquisition in importance.

Churn analytics offers useful tools for estimating client churn and identifying its fundamental causes. The churn indicator is most commonly expressed as the percentage of customers who cancel a product or service within a specified time frame (usually a month).
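The percentage definition above can be written as a small helper. This is a minimal sketch: the function name `churn_rate` and the example figures are illustrative assumptions, not values from the dataset.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return churn over a period as a percentage of the starting customer base."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return 100.0 * customers_lost / customers_at_start


# Hypothetical month: 2,000 customers at the start, 50 cancellations.
monthly_churn = churn_rate(2000, 50)
print(f"Monthly churn: {monthly_churn:.1f}%")  # Monthly churn: 2.5%
```

The denominator here is the customer count at the start of the period; other conventions (for example, the average customer count over the period) exist and give slightly different figures.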


### Business Objective

To minimize the customer churn rate, I will perform exploratory data analysis on customer-level data to identify the key indicators of why customers are leaving the business.

In [3]:
# Import libraries

import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt 
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

In [4]:
# Importing Dataset 

data = pd.read_csv('Telecom_Churn.csv')

In [5]:
# Viewing the first 5 rows of the data

data.head()

Unnamed: 0,State,Account length,Area code,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,Total eve calls,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls,Churn
0,KS,128,415,No,Yes,25,265.1,110,45.07,197.4,99,16.78,244.7,91,11.01,10.0,3,2.7,1,False
1,OH,107,415,No,Yes,26,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.7,1,False
2,NJ,137,415,No,No,0,243.4,114,41.38,121.2,110,10.3,162.6,104,7.32,12.2,5,3.29,0,False
3,OH,84,408,Yes,No,0,299.4,71,50.9,61.9,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False
4,OK,75,415,Yes,No,0,166.7,113,28.34,148.3,122,12.61,186.9,121,8.41,10.1,3,2.73,3,False


In [6]:
# Checking the shape of the data

data.shape

(3333, 20)

In [9]:
# Checking data types

data.dtypes

State                      object
Account length              int64
Area code                   int64
International plan         object
Voice mail plan            object
Number vmail messages       int64
Total day minutes         float64
Total day calls             int64
Total day charge          float64
Total eve minutes         float64
Total eve calls             int64
Total eve charge          float64
Total night minutes       float64
Total night calls           int64
Total night charge        float64
Total intl minutes        float64
Total intl calls            int64
Total intl charge         float64
Customer service calls      int64
Churn                        bool
dtype: object

In [10]:
# Checking for null or missing values

data.isnull().sum()

State                     0
Account length            0
Area code                 0
International plan        0
Voice mail plan           0
Number vmail messages     0
Total day minutes         0
Total day calls           0
Total day charge          0
Total eve minutes         0
Total eve calls           0
Total eve charge          0
Total night minutes       0
Total night calls         0
Total night charge        0
Total intl minutes        0
Total intl calls          0
Total intl charge         0
Customer service calls    0
Churn                     0
dtype: int64

#### There are no null or missing values in our data

In [15]:
# Checking unique values in each feature

data.nunique()

State                       51
Account length             212
Area code                    3
International plan           2
Voice mail plan              2
Number vmail messages       46
Total day minutes         1667
Total day calls            119
Total day charge          1667
Total eve minutes         1611
Total eve calls            123
Total eve charge          1440
Total night minutes       1591
Total night calls          120
Total night charge         933
Total intl minutes         162
Total intl calls            21
Total intl charge          162
Customer service calls      10
Churn                        2
dtype: int64

In [11]:
# Exploring Target Labels

data['Churn'].value_counts()

False    2850
True      483
Name: Churn, dtype: int64

#### Our data is imbalanced
Any data with an unequal class distribution is technically imbalanced. Here, 483 of 3,333 customers (about 14.5%) churned, while 2,850 (about 85.5%) did not.
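To put a number on the imbalance, the class shares and the majority-to-minority ratio can be computed directly. The sketch below rebuilds the `Churn` column from the counts printed above so that it runs on its own; in the notebook, `data['Churn']` could be used instead. The variable names are illustrative.

```python
import pandas as pd

# Rebuild the target column from the printed counts (2850 False, 483 True).
churn = pd.Series([False] * 2850 + [True] * 483, name="Churn")

# The mean of a boolean column is the fraction of True values.
churned_share = churn.mean() * 100
retained_share = 100 - churned_share

# Majority-to-minority ratio: non-churners per churner.
imbalance_ratio = (~churn).sum() / churn.sum()

print(f"Churned:  {churned_share:.2f}%")
print(f"Retained: {retained_share:.2f}%")
print(f"Non-churners per churner: {imbalance_ratio:.2f}")
```

With roughly six non-churners for every churner, plain accuracy would be a misleading metric for any later model: always predicting "no churn" would already be right about 85% of the time.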

In [12]:
# Splitting variables into continuous and discrete lists
# (heuristic: more than 5 unique values counts as continuous)

cont=[]
disc=[]

for col in data.columns:
    if data[col].nunique()>5:
        cont.append(col)
        
    else:
        disc.append(col)
        
print ('Continuous variables are ',cont)
print()
print('-----------------------------------')
print()
print ('Discrete variables are ' ,disc)

Continuous variables are  ['State', 'Account length', 'Number vmail messages', 'Total day minutes', 'Total day calls', 'Total day charge', 'Total eve minutes', 'Total eve calls', 'Total eve charge', 'Total night minutes', 'Total night calls', 'Total night charge', 'Total intl minutes', 'Total intl calls', 'Total intl charge', 'Customer service calls']

-----------------------------------

Discrete variables are  ['Area code', 'International plan', 'Voice mail plan', 'Churn']
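Note that the unique-count threshold places `State`, a text column with 51 values, in the continuous list. One possible alternative, shown here only as a sketch, is to split by dtype and list code-like numeric columns by hand. The toy frame `sample` reuses a few values from the `data.head()` output, and the `code_like` list is an assumption about which numeric columns are labels rather than quantities.

```python
import pandas as pd

# Toy frame mirroring a few columns of the dataset (values from data.head()).
sample = pd.DataFrame({
    "State": ["KS", "OH", "NJ"],
    "Area code": [415, 415, 415],
    "International plan": ["No", "No", "No"],
    "Total day minutes": [265.1, 161.6, 243.4],
    "Customer service calls": [1, 1, 0],
    "Churn": [False, False, False],
})

# Numeric columns that are really labels, listed by hand (assumption).
code_like = ["Area code"]

# select_dtypes(include="number") keeps int/float columns and skips bool and text.
numeric_cols = [
    c for c in sample.select_dtypes(include="number").columns
    if c not in code_like
]
categorical_cols = [c for c in sample.columns if c not in numeric_cols]

print("Numeric variables are    ", numeric_cols)
print("Categorical variables are", categorical_cols)
```

This keeps `State` and `Area code` with the categorical variables, at the cost of maintaining the `code_like` list manually.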


In [13]:
# Printing the value counts of each discrete variable

for col in disc:
    print(data[col].value_counts())
    print('------------------------------------------')

415    1655
510     840
408     838
Name: Area code, dtype: int64
------------------------------------------
No     3010
Yes     323
Name: International plan, dtype: int64
------------------------------------------
No     2411
Yes     922
Name: Voice mail plan, dtype: int64
------------------------------------------
False    2850
True      483
Name: Churn, dtype: int64
------------------------------------------


In [16]:
for col in disc:
  print(data[col].value_counts())
  print('------------------------------------------')

415    1655
510     840
408     838
Name: Area code, dtype: int64
------------------------------------------
No     3010
Yes     323
Name: International plan, dtype: int64
------------------------------------------
No     2411
Yes     922
Name: Voice mail plan, dtype: int64
------------------------------------------
False    2850
True      483
Name: Churn, dtype: int64
------------------------------------------
