<a href="https://colab.research.google.com/github/visapthakur/telecom-churn-prediction-individual-eda/blob/main/individual_telecom_churn_eda.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Telecom Churn Prediction EDA



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **TEAM MEMBER**     - VISHAL TOMER 

# **Project Summary -**


Customer churn, also known as customer attrition, customer turnover, or customer defection, is the loss of clients or customers.

Telephone service companies, Internet service providers, pay TV companies, insurance firms, and alarm monitoring services often use customer attrition analysis and customer attrition rates as key business metrics, because retaining an existing customer costs far less than acquiring a new one. Companies in these sectors often have customer service branches that attempt to win back defecting clients, because recovered long-term customers can be worth much more to a company than newly recruited clients.

Companies usually distinguish between voluntary churn and involuntary churn. Voluntary churn occurs when a customer decides to switch to another company or service provider. Involuntary churn occurs due to circumstances such as a customer's relocation to a long-term care facility, death, or relocation to a distant location. In most applications, involuntary reasons for churn are excluded from the analytical models. Analysts tend to concentrate on voluntary churn, because it typically stems from aspects of the company-customer relationship that companies control, such as how billing interactions are handled or how after-sales help is provided.

Predictive analytics uses churn prediction models that estimate each customer's propensity to churn. Since these models generate a small prioritized list of potential defectors, they are effective at focusing customer retention marketing programs on the subset of the customer base that is most vulnerable to churn.



# **GitHub Link -**

https://github.com/visapthakur/telecom-churn-prediction-individual-eda.git

# **Problem Statement**



Exploratory Data Analysis (EDA) is an approach to analysing data. The first task of a data analyst is to look at the data and try to make sense of it. We then work out which questions to ask and how to use the available data to answer them.

EDA helps us to:

1. Delve into the data set
2. Examine the relationships among the variables
3. Identify any interesting observations
4. Develop an initial idea of possible associations among the predictors and the target variable
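
The steps listed above can be sketched on a tiny synthetic table. The values below are made up for illustration, and only the column names mirror those used later in this notebook.

```python
import pandas as pd

# Synthetic toy data (not the real churn dataset).
toy = pd.DataFrame({
    "Total day minutes": [120.0, 250.5, 180.2, 300.1, 95.4, 210.0],
    "Customer service calls": [1, 4, 0, 5, 2, 1],
    "Churn": [0, 1, 0, 1, 0, 0],
})

# Step 1: delve into the data set (size and summary statistics).
print(toy.shape)
print(toy.describe())

# Step 2: examine relationships among the variables (pairwise Pearson correlation).
corr = toy.corr()
print(corr)

# Step 4: a first idea of association with the target,
# here the mean of each predictor within each churn class.
by_churn = toy.groupby("Churn").mean()
print(by_churn)
```

Step 3, spotting interesting observations, usually comes from reading these outputs rather than from a single call.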

The telecom market in the US is saturated and customer growth rates are low. The key focus of market players is therefore on retention and churn control. This project explores the churn dataset to identify the key drivers of churn and extract key insights from the dataset.
In the telecom industry, customers can choose from multiple service providers and actively switch from one operator to another. In this highly competitive market, the telecommunications industry experiences an average annual churn rate of 15-25%. Given that it costs 5-10 times more to acquire a new customer than to retain an existing one, customer retention has now become even more important than customer acquisition.

For many incumbent operators, retaining highly profitable customers is the number one business goal.

To reduce customer churn, telecom companies need to predict which customers are at high risk of churn.

In this project, we will analyse customer-level data of a leading telecom firm, build predictive models to identify customers at high risk of churn and identify the main indicators of churn.

#### **Define Your Business Objective?**

This notebook is a demonstration in Python of analysing the relevant customer data to understand and predict customer churn.
The dataset contains customer-level information for a span of four consecutive months - June, July, August and September. The months are encoded as 6, 7, 8 and 9, respectively.

The business objective is to predict the churn in the last (i.e. the ninth) month using the data (features) from the first three months. To do this task well, understanding the typical customer behaviour during churn will be helpful.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits. 
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule. 

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive/')

In [None]:
pd.set_option("display.precision", 2)

In [None]:
path = '/content/drive/My Drive/'

### Dataset First View

In [None]:
# Dataset First Look
df = pd.read_csv(path + 'Telecom Churn.csv')
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print(df.shape)

### Dataset Information

In [None]:
# Dataset Info
print(df.info())

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
print("Duplicate entry in df:",len(df[df.duplicated()])) 

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# Display all plots inline in the notebook
%matplotlib inline
sns.set_style("whitegrid", {'grid.linestyle': '--'})
# Suppress library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")
plt.figure(figsize=(14, 5))
sns.heatmap(df.isnull(), cbar=True, yticklabels=False)
plt.xlabel("column_name", size=14, weight="bold")
plt.title("missing values in column",fontweight="bold",size=17)
plt.show()

### What did you know about your dataset?

The telecom churn dataset has no missing values, so no imputation or row removal is needed at this stage. If missing values appear in a later version of the data, they will have to be handled before analysis.
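
A per-column check like the one below makes such a claim explicit and would flag any column that gains gaps later. This is a minimal sketch on synthetic data with gaps planted on purpose. The real dataset is reported above to have none.

```python
import numpy as np
import pandas as pd

# Synthetic toy frame with two missing entries planted deliberately.
toy = pd.DataFrame({
    "State": ["KS", "OH", None, "NJ"],
    "Total day minutes": [265.1, np.nan, 243.4, 299.4],
    "Churn": [0, 0, 1, 0],
})

# Count of missing entries per column.
missing_per_column = toy.isnull().sum()

# Share of missing entries per column, as a percentage.
missing_share = toy.isnull().mean().mul(100).round(1)

# Names of the columns that contain at least one gap.
columns_with_gaps = missing_per_column[missing_per_column > 0].index.tolist()
print(columns_with_gaps)
print(missing_share)
```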


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
print(df.columns)

**These are all the columns in our dataset.**


In [None]:
# Dataset Describe
df["Churn"] = df["Churn"].astype("int64")
df.describe()


In [None]:
df.describe(include=["object", "bool"])

In [None]:
df["Churn"].value_counts()

In [None]:
df["Churn"].value_counts(normalize=True)

In [None]:
df.sort_values(by=["Churn", "Total day charge"], ascending=[True, False]).head()

In [None]:
df["Churn"].mean()

In [None]:
# numeric_only=True skips text columns such as State, which cannot be averaged
df[df["Churn"] == 1].mean(numeric_only=True)

In [None]:
df[df["Churn"] == 1]["Total day minutes"].mean()

In [None]:
df[(df["Churn"] == 0) & (df["International plan"] == "No")]["Total intl minutes"].max()

In [None]:
df.loc[0:5, "State":"Area code"]

In [None]:
df.iloc[0:5, 0:3]

In [None]:
df[-1:]

In [None]:

df.apply(np.max)

In [None]:
df[df["State"].apply(lambda state: state[0] == "W")].head()

In [None]:
d = {"No": False, "Yes": True}
df["International plan"] = df["International plan"].map(d)
df.head()

In [None]:
df = df.replace({"Voice mail plan": d})
df.head()

In [None]:
columns_to_show = ["Total day minutes", "Total eve minutes", "Total night minutes"]

df.groupby(["Churn"])[columns_to_show].describe(percentiles=[])

In [None]:
columns_to_show = ["Total day minutes", "Total eve minutes", "Total night minutes"]

# String names avoid the deprecation warning that newer pandas raises for NumPy callables
df.groupby(["Churn"])[columns_to_show].agg(["mean", "std", "min", "max"])

In [None]:
pd.crosstab(df["Churn"], df["International plan"])

In [None]:
pd.crosstab(df["Churn"], df["Voice mail plan"], normalize=True)

In [None]:
df.pivot_table(
    ["Total day calls", "Total eve calls", "Total night calls"],
    ["Area code"],
    aggfunc="mean",
)

In [None]:
total_calls = (
    df["Total day calls"]
    + df["Total eve calls"]
    + df["Total night calls"]
    + df["Total intl calls"]
)
df.insert(loc=len(df.columns), column="Total calls", value=total_calls)

The new Total calls column holds, for each customer, the sum of day, evening, night, and international calls.
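
An alternative way to build the same total is a row-wise sum over a list of columns. The sketch below uses synthetic values. One difference worth knowing is that `+` propagates `NaN`, while `sum(axis=1)` skips missing entries by default.

```python
import pandas as pd

# Synthetic toy rows using the same column names as the notebook.
toy = pd.DataFrame({
    "Total day calls": [110, 123],
    "Total eve calls": [99, 103],
    "Total night calls": [91, 103],
    "Total intl calls": [3, 3],
})

call_columns = [
    "Total day calls",
    "Total eve calls",
    "Total night calls",
    "Total intl calls",
]

# Row-wise sum across the four call-count columns.
toy["Total calls"] = toy[call_columns].sum(axis=1)
print(toy["Total calls"].tolist())
```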

In [None]:
df.head()

In [None]:
df["Total charge"] = (
    df["Total day charge"]
    + df["Total eve charge"]
    + df["Total night charge"]
    + df["Total intl charge"]
)
df.head()

In [None]:
df.drop(["Total charge", "Total calls"], axis=1, inplace=True)

In [None]:
df.drop([1, 2]).head()

### Variables Description 

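The derived Total charge column is the sum of the day, evening, night, and international charges. It and Total calls were created for illustration and then dropped again, so they are not part of the remaining analysis.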


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique().sort_values(ascending=True)

The output lists the number of distinct values in each column, sorted in ascending order.
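
Unique-value counts are often used to separate low-cardinality columns, which are candidates for count plots, from continuous ones. The sketch below uses synthetic data, and the threshold `MAX_CATEGORICAL_LEVELS` is a hypothetical choice, not a value taken from this project.

```python
import pandas as pd

# Synthetic toy data with a mix of low- and high-cardinality columns.
toy = pd.DataFrame({
    "Area code": [415, 408, 510, 415, 408, 510],
    "International plan": ["No", "Yes", "No", "No", "Yes", "No"],
    "Total day minutes": [265.1, 161.6, 243.4, 299.4, 166.7, 223.4],
    "Churn": [0, 0, 0, 1, 1, 0],
})

unique_counts = toy.nunique().sort_values()

# Hypothetical cut-off: columns with at most this many levels are treated as categorical.
MAX_CATEGORICAL_LEVELS = 3
low_cardinality = unique_counts[unique_counts <= MAX_CATEGORICAL_LEVELS].index.tolist()
print(unique_counts)
print(low_cardinality)
```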

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# Checking the shape of the dataset using a left join
df_left = pd.merge(df, df, on="Total day calls", how="left")
print(df_left.shape)
print(f"Total number of null values obtained from left join: {df_left.isna().sum().sum()}")

# Checking the shape of the dataset using a right join
df_right = pd.merge(df, df, on="Total night calls", how="right")
print(df_right.shape)
print(f"Total number of null values obtained from right join: {df_right.isna().sum().sum()}")

In [None]:
df.info()

### What all manipulations have you done and insights you found?

Merging datasets: We do not want to compromise the quality or quantity of the dataset, since both affect the accuracy of any later ML model. We therefore compared join types to see which gives the best result. In our experiments, every join gave a merged dataset of the same shape with 0 null values.
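
When a table is merged with itself on a column that is not unique, the number of output rows depends on how often each key repeats. The sketch below illustrates this on synthetic data. It is a toy example, not the notebook's actual result.

```python
import pandas as pd

# Synthetic toy data: the key 100 appears twice, the key 120 once.
toy = pd.DataFrame({
    "Total day calls": [100, 100, 120],
    "Churn": [0, 1, 0],
})

# Self-join on a non-unique key: each row pairs with every row sharing its key.
joined = pd.merge(toy, toy, on="Total day calls", how="left")

# Expected row count is the sum of squared key frequencies: 2*2 + 1*1 = 5.
expected_rows = int((toy["Total day calls"].value_counts() ** 2).sum())

# Every key has a match on the other side, so the join introduces no nulls.
null_count = int(joined.isna().sum().sum())
print(joined.shape, expected_rows, null_count)
```

Because both sides of a self-join hold the same keys, `how="left"` and `how="right"` give the same number of rows for a given key column.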

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
import missingno as msno
# labels=True displays the column names above the matrix
msno.matrix(df, labels=True, figsize=(30, 16), fontsize=12)

##### 1. Why did you pick the specific chart?

We picked the missingness matrix because it shows at a glance, row by row, whether any column of the telecom churn data has gaps.

##### 2. What is/are the insight(s) found from the chart?

The matrix covers all rows of the dataset (the row axis ends at about 3333) and every column appears fully populated, which agrees with the null-value count computed earlier.



##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The matrix plot confirms that the telecom churn data is complete, so the later analysis can use every record without imputing or dropping values.


#### Chart - 2

In [None]:
# Chart - 2 visualization code
import matplotlib.pyplot as plt
import seaborn as sns
%config InlineBackend.figure_format = 'retina'
sns.countplot(x="International plan", hue="Churn", data=df);

##### 1. Why did you pick the specific chart?

A count plot shows how many customers churned and how many stayed, split by whether they have an international plan.


##### 2. What is/are the insight(s) found from the chart?

The largest bar in the plot has a count above 2500. The relative heights of the bars within each group indicate how churn differs between customers with and without an international plan.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Customers with the international plan are a natural target for retention offers, and a higher churn share in that group is a negative signal that the plan's pricing or quality may need review.
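
Raw counts can hide differences in churn rate, since the two groups differ greatly in size. Normalising a cross-tabulation by row gives the churn share within each group. The sketch below uses synthetic data with string labels. In the notebook the column has already been mapped to booleans at this point.

```python
import pandas as pd

# Synthetic toy data: 6 customers without the plan, 4 with it.
toy = pd.DataFrame({
    "International plan": ["No"] * 6 + ["Yes"] * 4,
    "Churn": [0, 0, 0, 0, 0, 1, 0, 0, 1, 1],
})

# normalize="index" makes each row sum to 1, giving shares within each plan group.
rates = pd.crosstab(toy["International plan"], toy["Churn"], normalize="index")

# Column 1 holds the share of churners in each group.
churn_rate_by_plan = rates[1]
print(churn_rate_by_plan)
```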

#### Chart - 3

In [None]:
# Chart - 3 visualization code
pd.crosstab(df["Churn"], df["Customer service calls"], margins=True)

In [None]:
sns.countplot(x="Customer service calls", hue="Churn", data=df);


##### 1. Why did you pick the specific chart?

A count plot shows how many customers made each number of customer service calls, split by whether they churned.


##### 2. What is/are the insight(s) found from the chart?

The plot shows that the largest group of customers, by number of customer service calls, has a count above 1000.
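
To see how churn changes with the number of service calls, the counts can be turned into a churn rate per call count. This is a minimal sketch on synthetic data. The output column names `customers` and `churn_rate` are illustrative choices.

```python
import pandas as pd

# Synthetic toy data (not the real churn dataset).
toy = pd.DataFrame({
    "Customer service calls": [0, 0, 1, 1, 1, 2, 4, 4, 5, 5],
    "Churn":                  [0, 0, 0, 0, 1, 0, 1, 0, 1, 1],
})

# Group size and mean of the 0/1 churn flag (the churn rate) per call count.
summary = toy.groupby("Customer service calls")["Churn"].agg(
    customers="size",
    churn_rate="mean",
)
print(summary)
```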


##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

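Yes. Customers who make many customer service calls churn more often, so tracking service-call counts can help flag at-risk customers early.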

#### Chart - 4

In [None]:
# Chart - 4 visualization code
df["Many_service_calls"] = (df["Customer service calls"] > 3).astype("int")

pd.crosstab(df["Many_service_calls"], df["Churn"], margins=True)

In [None]:
sns.countplot(x="Many_service_calls", hue="Churn", data=df);

##### 1. Why did you pick the specific chart?

A count plot of the Many_service_calls flag compares churn between customers with more than three service calls and all other customers.

##### 2. What is/are the insight(s) found from the chart?

Customers flagged with many service calls (Many_service_calls = 1) are far fewer than the largest group in the plot, which has a count of roughly 2500.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Improving customer service, so that fewer customers need to call repeatedly, could help reduce churn among the customers most at risk.
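
The flag from Chart 4 can also be used to compare churn rates directly. In the sketch below the data is synthetic and its numbers were chosen only to make the arithmetic easy to follow. They are not the real dataset's rates.

```python
import pandas as pd

# Synthetic toy data: 8 customers with at most 3 calls, 2 customers with more.
toy = pd.DataFrame({
    "Customer service calls": [0, 1, 1, 2, 2, 3, 3, 3, 4, 5],
    "Churn":                  [0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
})

# Same flag definition as in the Chart 4 cell: more than three service calls.
toy["Many_service_calls"] = (toy["Customer service calls"] > 3).astype("int")

# Mean of the 0/1 churn flag within each group is that group's churn rate.
churn_rate = toy.groupby("Many_service_calls")["Churn"].mean()

# How many times more often the flagged group churns in this toy data.
ratio = churn_rate.loc[1] / churn_rate.loc[0]
print(churn_rate)
print(ratio)
```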

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ? 
Explain Briefly.

Examine each column for its relationship with churn, then focus on the customer groups where churn is concentrated. Providing better service to those customers, for example those who call customer service frequently, should help retain them and grow the business.


# **Conclusion**





Let us consider some of the insights we have gained into the churn data set through the use of exploratory data analysis.

The four charge fields are linear functions of the minute fields.

The area code field and/or the state field are anomalous, and can be omitted.

The correlations among the remaining predictor variables are weak, allowing us to retain them all for any data mining model.
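
Both structural observations above can be checked with a line fit and a correlation matrix. The sketch below uses randomly generated synthetic data. The per-minute rate of 0.17 and the 0.9 threshold are made-up values for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data: three independent predictors plus a charge column that is
# an exact multiple of day minutes (0.17 is a hypothetical rate).
rng = np.random.default_rng(0)
n_rows = 200
toy = pd.DataFrame({
    "Total day minutes": rng.normal(180, 50, n_rows),
    "Total eve minutes": rng.normal(200, 50, n_rows),
    "Account length": rng.normal(100, 40, n_rows),
})
toy["Total day charge"] = toy["Total day minutes"] * 0.17

# A straight-line fit recovers the rate as the slope, with an intercept near zero.
slope, intercept = np.polyfit(toy["Total day minutes"], toy["Total day charge"], deg=1)

# Absolute correlations, keeping only the upper triangle so each pair appears once.
corr = toy.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack()

# Pairs above the threshold are effectively redundant with each other.
strong_pairs = pairs[pairs > 0.9]
print(round(slope, 2))
print(strong_pairs)
```

A pair flagged this way carries the same information twice, which is why one column of each charge and minutes pair can be dropped before modelling.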

Insights with respect to churn:

Customers with the International Plan tend to churn more frequently.

Customers with four or more customer service calls churn more than four times as often as do the other customers.

Customers with high day minutes and evening minutes tend to churn at a higher rate than do the other customers.

There is no obvious association of churn with the variables day calls, evening calls, night calls, international calls, night minutes, international minutes, account length, or voice mail messages.
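
One way to check a claim such as the one about high day minutes is to band a continuous variable and compare churn rates across bands. This is a minimal sketch on synthetic data. The bin edges and the band labels are hypothetical choices.

```python
import pandas as pd

# Synthetic toy data (not the real churn dataset).
toy = pd.DataFrame({
    "Total day minutes": [90, 110, 130, 150, 170, 190, 260, 280, 300, 320],
    "Churn":             [0,  0,   0,   1,   0,   0,   1,   0,   1,   1],
})

# Hypothetical bands: (0, 120], (120, 240], (240, 360].
bins = [0, 120, 240, 360]
labels = ["low", "medium", "high"]
toy["Day minutes band"] = pd.cut(toy["Total day minutes"], bins=bins, labels=labels)

# Churn rate within each band.
churn_by_band = toy.groupby("Day minutes band", observed=True)["Churn"].mean()
print(churn_by_band)
```

The same pattern applies to evening minutes or any other continuous predictor.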

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***