<a href="https://colab.research.google.com/github/yogesh1199/Projects/blob/main/EDA_Submission.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - **TELECOM CHURN ANALYSIS**



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member** -*Yogesh Sharma*

# **Project Summary -**


**The Orange Telecom Churn Dataset** provides valuable insights into customer behavior and factors contributing to churn in the telecommunications industry. This analysis aims to uncover key patterns, identify potential reasons for customer churn, and propose recommendations to enhance customer retention strategies.

The dataset comprises several important features that offer a comprehensive view of customer activity and behavior:

**State and Area Code**: These columns represent the geographic location of the customers. While state information might indicate regional variations, area codes could provide insights into localized patterns.

**Account Length**: This feature represents the duration for which a customer has subscribed to the service. Longer account lengths might indicate customer loyalty and satisfaction.

**International Plan and Voice Mail Plan**: These categorical features highlight whether customers have subscribed to additional services. Customers with an international plan might be less likely to churn due to having specific needs that the plan addresses. On the other hand, the voice mail plan might indicate engagement and communication.

**Number of Voice Mail Messages**: This metric indicates how actively customers engage with voice mail services. Higher engagement might indicate better communication and satisfaction.

**Total Day, Evening, and Night Minutes, Calls, and Charges**: These metrics reflect the extent of customer engagement at different times of the day. Unusually high charges or call volumes might indicate dissatisfaction.

**Total International Minutes, Calls, and Charges**: Similar to domestic calls, international calls and charges can offer insights into specialized customer needs and their satisfaction level.

**Customer Service Calls**: The number of calls made to customer service is an important indicator. Frequent calls might signify unresolved issues, which could contribute to churn.

**Churn**: This binary label indicates whether a customer has churned or not. It is the target variable, and the analysis aims to understand the factors that contribute to it.


Analyzing this data requires a multi-faceted approach, involving exploratory data analysis and predictive modeling:

**Exploratory Data Analysis (EDA)**:
By visually exploring distributions, correlations, and trends within the data, we can uncover valuable insights. For instance, we could visualize the distribution of customer service calls and their relationship with churn. EDA might reveal that customers with more service calls are more likely to churn.
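
As a rough illustration of the service-call idea above, the sketch below groups customers by their number of service calls and computes the share who churned. The small DataFrame is invented toy data, not the Orange Telecom dataset, and the column names `Customer service calls` and `Churn` are assumed to match the CSV; `churn_rate_by` is a hypothetical helper.

```python
import pandas as pd

# Toy data for illustration only; the values are invented, not taken from the real dataset.
toy_df = pd.DataFrame({
    "Customer service calls": [0, 1, 1, 2, 4, 4, 5, 5],
    "Churn": [False, False, False, False, True, False, True, True],
})


def churn_rate_by(df: pd.DataFrame, column: str, target: str = "Churn") -> pd.Series:
    """Return the fraction of churned customers for each value of `column`."""
    # The mean of a boolean column is the share of True values in each group.
    return df.groupby(column)[target].mean()


rates = churn_rate_by(toy_df, "Customer service calls")
print(rates)
```

On the real data, the same helper could be called with any categorical or low-cardinality column to compare churn rates across groups.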

# **GitHub Link -**


https://github.com/yogesh1199/Projects

# **Problem Statement**


**The Orange Telecom Churn Dataset** provides valuable insights into customer behavior and factors contributing to churn in the telecommunications industry. Churn, in this context, refers to the phenomenon where customers cancel their subscriptions to the telecom service. This analysis aims to uncover key patterns, identify potential reasons for customer churn, and propose recommendations to enhance customer retention strategies.

The dataset comprises a variety of columns that offer insights into customer activity and behavior, including state, account length, area code, international plan, voice mail plan, usage metrics during different times of the day and night, customer service calls, and the churn status.

The goal of this analysis is to understand the factors that contribute to customer churn and provide actionable insights to the telecom company. By analyzing the dataset, we will explore correlations between different features and the likelihood of churn.



#### **Define Your Business Objective?**

We will perform exploratory data analysis (EDA) to understand the distribution of features, identify patterns, and visualize relationships between variables. In particular, we will look for correlations and patterns associated with churn: which features are strongly correlated with it?

We will provide the telecom company with insights that can guide its business decisions, customer communication strategies, and overall customer experience.

This analysis will not only contribute to reducing customer churn but also help the company tailor its services to meet the needs of its customers more effectively.
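
One possible way to approach the "which features are correlated with churn" question is to rank numeric columns by their Pearson correlation with the churn flag. The sketch below uses invented toy values and assumed column names; `rank_by_churn_correlation` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Toy data for illustration only; the values are invented.
toy_df = pd.DataFrame({
    "Total day minutes": [120.0, 150.0, 180.0, 260.0, 300.0, 310.0],
    "Customer service calls": [1, 0, 2, 4, 5, 3],
    "Account length": [100, 80, 120, 90, 110, 95],
    "Churn": [False, False, False, True, True, True],
})


def rank_by_churn_correlation(df: pd.DataFrame, target: str = "Churn") -> pd.Series:
    """Rank numeric features by the absolute Pearson correlation with the target."""
    # Keep numeric columns only (boolean columns are not selected by "number").
    numeric = df.select_dtypes(include="number").copy()
    # Encode the boolean target as 0/1 so it can take part in the correlation.
    numeric[target] = df[target].astype(int)
    corr = numeric.corr()[target].drop(target)
    # Order by strength of association, strongest first, keeping the sign.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


ranking = rank_by_churn_correlation(toy_df)
print(ranking)
```

A high correlation only flags a feature for closer inspection; it does not show that the feature causes churn.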

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]
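
The UBM rule above can be sketched with pandas summaries before any chart is drawn. The example uses invented toy data and assumed column names; each resulting table is the kind of aggregate a univariate, bivariate, or multivariate chart would then plot.

```python
import pandas as pd

# Toy data for illustration only; the values are invented.
toy_df = pd.DataFrame({
    "International plan": ["No", "Yes", "No", "Yes", "No", "No"],
    "Voice mail plan": ["Yes", "No", "No", "No", "Yes", "No"],
    "Total day minutes": [120.0, 280.0, 150.0, 300.0, 110.0, 200.0],
    "Churn": [False, True, False, True, False, False],
})

# U: univariate, the distribution of a single variable.
univariate = toy_df["Churn"].value_counts(normalize=True)

# B: bivariate (numerical vs. categorical), mean day minutes per churn group.
bivariate = toy_df.groupby("Churn")["Total day minutes"].mean()

# M: multivariate, churn rate across two categorical features at once.
# Churn is cast to float so that the mean is the share of churned customers.
multivariate = toy_df.assign(Churn=toy_df["Churn"].astype(float)).pivot_table(
    index="International plan",
    columns="Voice mail plan",
    values="Churn",
    aggfunc="mean",
)

print(univariate, bivariate, multivariate, sep="\n\n")
```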





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
## Load from the GitHub raw link so that no access permission is required (unlike Google Drive)
url = "https://raw.githubusercontent.com/yogesh1199/Projects/main/Telecom%20Churn.csv"
telecome_df = pd.read_csv(url)

### Dataset First View

In [None]:
# Dataset First Look
## Fetching the top 10 rows of the dataset
telecome_df.head(10)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
data_info = telecome_df.shape
# shape is a (rows, columns) tuple
rows, cols = data_info
print("Data shape:", data_info)
print("No. of rows:", rows)
print("No. of cols:", cols)

### Dataset Information

In [None]:
# Dataset Info
## Here, we are retrieving information related to checking the existence of null values and obtaining information about the data types of each and every column.
telecome_df.info()

#### Duplicate Values

In [None]:
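# Dataset Duplicate Value Count
duplicate_rows = telecome_df[telecome_df.duplicated(keep=False)]
if len(duplicate_rows) > 0:
  # A bare expression inside an if-block is not displayed by the notebook, so print explicitly
  print("Number of duplicated rows:", len(duplicate_rows))
  print(duplicate_rows)
else:
  print("No duplicate values in Dataset")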

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
telecome_df.isna().sum()

In [None]:
# Visualizing the missing values
plt.figure(figsize=(10, 6))
plt.imshow(telecome_df.isnull(), cmap='viridis', aspect='auto')
plt.xticks(range(len(telecome_df.columns)), telecome_df.columns, rotation=90)
plt.colorbar(label='Missing Values')
plt.title('Missing Value Heatmap')

plt.show()

### What did you know about your dataset?

Based on the information fetched from the dataset, here is what we can infer about it:

- The dataset contains 3333 entries (rows) with 20 columns.
- The columns in the dataset are named "State," "Account length," "Area code," "International plan," "Voice mail plan," "Number vmail messages," "Total day minutes," "Total day calls," "Total day charge," "Total eve minutes," "Total eve calls," "Total eve charge," "Total night minutes," "Total night calls," "Total night charge," "Total intl minutes," "Total intl calls," "Total intl charge," "Customer service calls," and "Churn." None of them contains null values.
- The dataset includes information about telecom customer activities and behavior, including usage metrics, plan features, and whether a customer has churned.
- The data types of the columns include boolean (`bool`), integer (`int64`), float (`float64`), and object (`object`).
- The "Churn" column is of boolean data type (`bool`) and indicates whether a customer has churned (`True`) or not (`False`).

Based on the information retrieved, the dataset is related to telecom customer behavior analysis, particularly the factors that influence customer churn. It includes various features related to customer activity and plan details, and the "Churn" column serves as the target variable.
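
The points above (data types, non-null counts, distinct values) can also be collected into a single table. The following is a minimal sketch on invented toy data; `summarize_columns` is a hypothetical helper, and on the real data it would be called on the loaded DataFrame instead.

```python
import pandas as pd

# Toy data for illustration only; the values are invented.
toy_df = pd.DataFrame({
    "State": ["AA", "BB", "CC", "BB"],
    "Account length": [10, 20, 30, 40],
    "Total day minutes": [100.5, 200.0, 150.25, 300.75],
    "Churn": [False, False, False, True],
})


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, non-null count and distinct-value count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "n_unique": df.nunique(),
    })


summary = summarize_columns(toy_df)
print(summary)
```

A column whose `non_null` value is lower than the row count has missing values, and a column with very few distinct values is a candidate for categorical treatment.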

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
telecome_df.columns

In [None]:
# Dataset Describe
telecome_df.describe()

### Variables Description

The dataset comprises several important features that offer a comprehensive view of customer activity and behavior:

**State and Area Code**: These columns represent the geographic location of the customers. While state information might indicate regional variations, area codes could provide insights into localized patterns.

**Account Length**: This feature represents the duration for which a customer has subscribed to the service. Longer account lengths might indicate customer loyalty and satisfaction.

**International Plan and Voice Mail Plan**: These categorical features highlight whether customers have subscribed to additional services. Customers with an international plan might be less likely to churn due to having specific needs that the plan addresses. On the other hand, the voice mail plan might indicate engagement and communication.

**Number of Voice Mail Messages**: This metric indicates how actively customers engage with voice mail services. Higher engagement might indicate better communication and satisfaction.

**Total Day, Evening, and Night Minutes, Calls, and Charges**: These metrics reflect the extent of customer engagement at different times of the day. Unusually high charges or call volumes might indicate dissatisfaction.

**Total International Minutes, Calls, and Charges**: Similar to domestic calls, international calls and charges can offer insights into specialized customer needs and their satisfaction level.

**Customer Service Calls**: The number of calls made to customer service is an important indicator. Frequent calls might signify unresolved issues, which could contribute to churn.

**Churn**: This binary label indicates whether a customer has churned or not. It is the target variable, and the analysis aims to understand the factors that contribute to it.


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
# Use `col` as the loop variable so the earlier `cols` (column count) is not overwritten.
for col in telecome_df.columns:
  print(col, telecome_df[col].unique())
  print()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

### What all manipulations have you done and insights you found?

Answer Here.
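
As one hedged example of what wrangling might look like here, the sketch below converts the two plan columns to booleans and adds a combined charge column. It assumes the plan columns hold "Yes"/"No" strings and that the four charge columns exist under the names shown; the toy values are invented, and `prepare` and `Total charge` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

# Toy data for illustration only; the values are invented.
toy_df = pd.DataFrame({
    "International plan": ["No", "Yes", "No"],
    "Voice mail plan": ["Yes", "No", "No"],
    "Total day charge": [30.0, 45.5, 20.25],
    "Total eve charge": [15.0, 18.5, 10.25],
    "Total night charge": [9.0, 11.0, 7.5],
    "Total intl charge": [2.5, 3.0, 1.0],
    "Churn": [False, True, False],
})

PLAN_COLUMNS = ["International plan", "Voice mail plan"]
CHARGE_COLUMNS = [
    "Total day charge",
    "Total eve charge",
    "Total night charge",
    "Total intl charge",
]


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with boolean plan flags and a combined charge column."""
    out = df.copy()  # leave the input DataFrame untouched
    for col in PLAN_COLUMNS:
        # Normalise case and whitespace before mapping Yes/No to True/False.
        out[col] = out[col].str.strip().str.lower().map({"yes": True, "no": False})
    # Sum the four per-period charges into one overall charge per customer.
    out["Total charge"] = out[CHARGE_COLUMNS].sum(axis=1)
    return out


prepared = prepare(toy_df)
print(prepared)
```

Boolean plan flags make group-wise churn rates easier to compute, and a combined charge column gives a single spend measure to compare against churn.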

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here
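
A correlation heatmap for this chart could be drawn along the following lines. This is only a sketch on invented toy data with assumed column names; it uses plain matplotlib, and `plot_correlation_heatmap` is a hypothetical helper rather than the notebook's own code.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data for illustration only; the values are invented.
toy_df = pd.DataFrame({
    "Total day minutes": [120.0, 150.0, 260.0, 300.0],
    "Customer service calls": [1, 0, 4, 3],
    "Churn": [False, False, True, True],
})


def plot_correlation_heatmap(df: pd.DataFrame):
    """Draw a Pearson correlation heatmap of numeric and boolean columns."""
    # Cast to float so boolean columns such as Churn take part in the correlation.
    corr = df.select_dtypes(include=["number", "bool"]).astype(float).corr()
    fig, ax = plt.subplots(figsize=(6, 5))
    # Fix the colour scale to [-1, 1] so colours are comparable across plots.
    image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.columns)))
    ax.set_yticklabels(corr.columns)
    fig.colorbar(image, ax=ax, label="Pearson correlation")
    ax.set_title("Correlation heatmap (toy data)")
    fig.tight_layout()
    return corr, ax


corr, ax = plot_correlation_heatmap(toy_df)
```

In a notebook the figure is rendered when the cell finishes; in a script, `plt.show()` would be needed after the call.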

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.
