<a href="https://colab.research.google.com/github/yogesh1199/Projects/blob/main/EDA_Submission.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - **TELECOM CHURN ANALYSIS**



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member** -*Yogesh Sharma*

# **Project Summary -**


**The Orange Telecom Churn Dataset** provides valuable insights into customer behavior and factors contributing to churn in the telecommunications industry. This analysis aims to uncover key patterns, identify potential reasons for customer churn, and propose recommendations to enhance customer retention strategies.

The dataset comprises several important features that offer a comprehensive view of customer activity and behavior:

**State and Area Code**: These columns represent the geographic location of the customers. While state information might indicate regional variations, area codes could provide insights into localized patterns.

**Account Length**: This feature represents the duration for which a customer has subscribed to the service. Longer account lengths might indicate customer loyalty and satisfaction.

**International Plan and Voice Mail Plan**: These categorical features highlight whether customers have subscribed to additional services. Customers with an international plan might be less likely to churn due to having specific needs that the plan addresses. On the other hand, the voice mail plan might indicate engagement and communication.

**Number of Voice Mail Messages**: This metric indicates how actively customers engage with voice mail services. Higher engagement might indicate better communication and satisfaction.

**Total Day, Evening, and Night Minutes, Calls, and Charges**: These metrics reflect the extent of customer engagement at different times of the day. Unusually high charges or call volumes might indicate dissatisfaction.

**Total International Minutes, Calls, and Charges**: Similar to domestic calls, international calls and charges can offer insights into specialized customer needs and their satisfaction level.

**Customer Service Calls**: The number of calls made to customer service is an important indicator. Frequent calls might signify unresolved issues, which could contribute to churn.

**Churn**: This binary label indicates whether a customer has churned or not. It is the target variable, and the analysis aims to understand which factors contribute to it.


Analyzing this data requires a multi-faceted approach. This notebook focuses on exploratory data analysis:

**Exploratory Data Analysis (EDA)**:
By visually exploring distributions, correlations, and trends within the data, we can uncover valuable insights. For instance, we could visualize the distribution of customer service calls and their relationship with churn. EDA might reveal that customers with more service calls are more likely to churn.

# **GitHub Link -**


https://github.com/yogesh1199/Projects

# **Problem Statement**


**The Orange Telecom Churn Dataset** provides valuable insights into customer behavior and factors contributing to churn in the telecommunications industry. Churn, in this context, refers to the phenomenon where customers cancel their subscriptions to the telecom service. This analysis aims to uncover key patterns, identify potential reasons for customer churn, and propose recommendations to enhance customer retention strategies.

The dataset comprises a variety of columns that offer insights into customer activity and behavior, including state, account length, area code, international plan, voice mail plan, usage metrics during different times of the day and night, customer service calls, and the churn status.

The goal of this analysis is to understand the factors that contribute to customer churn and provide actionable insights to the telecom company. By analyzing the dataset, we will explore correlations between different features and the likelihood of churn.



#### **Define Your Business Objective?**

We will perform Exploratory Data Analysis (EDA) to understand the distribution of features, identify patterns, and visualize relationships between variables.
In particular, we will identify correlations and patterns associated with churn: which features are strongly correlated with it?

We will provide the telecom company with insights that can guide their business decisions, customer communication strategies, and overall customer experience enhancement.

This analysis will not only contribute to reducing customer churn but also help the company tailor its services to meet the needs of its customers more effectively.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
## Load the dataset from a raw GitHub link so it does not prompt for permission the way Google Drive does
url = "https://raw.githubusercontent.com/yogesh1199/Projects/main/Telecom%20Churn.csv"
telecome_df = pd.read_csv(url)

### Dataset First View

In [None]:
# Dataset First Look
## Fetching the top 10 rows of the dataset
telecome_df.head(10)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
rows, cols = telecome_df.shape
print("Data shape:", (rows, cols))
print("No. of rows:", rows)
print("No. of cols:", cols)

### Dataset Information

In [None]:
# Dataset Info
## Here, we are retrieving information related to checking the existence of null values and obtaining information about the data types of each and every column.
telecome_df.info()

#### Duplicate Values

In [None]:
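# Dataset Duplicate Value Count
duplicate_rows = telecome_df[telecome_df.duplicated(keep=False)]
if len(duplicate_rows) > 0:
  # A bare expression inside an if-block is not rendered by the notebook, so print it explicitly
  print(duplicate_rows)
else:
  print("No duplicate values in Dataset")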

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
telecome_df.isna().sum()

In [None]:
# Visualizing the missing values
plt.figure(figsize=(10, 6))
plt.imshow(telecome_df.isnull(), cmap='viridis', aspect='auto')
plt.xticks(range(len(telecome_df.columns)), telecome_df.columns, rotation=90)
plt.colorbar(label='Missing Values')
plt.title('Missing Value Heatmap')

plt.show()

### What did you know about your dataset?

Based on the information we fetched from the dataset, here's what we can infer about it:

- The dataset contains 3333 entries (rows) with 20 columns.
- The columns in the dataset are named: "State," "Account length," "Area code," "International plan," "Voice mail plan," "Number vmail messages," "Total day minutes," "Total day calls," "Total day charge," "Total eve minutes," "Total eve calls," "Total eve charge," "Total night minutes," "Total night calls," "Total night charge," "Total intl minutes," "Total intl calls," "Total intl charge," "Customer service calls," and "Churn." None of them contains null values.
- The dataset includes information about telecom customer activities and behavior, including usage metrics, plan features, and whether a customer has churned.
- The data types of the columns include boolean (`bool`), integer (`int64`), float (`float64`), and object (`object`).
- The "Churn" column is of boolean data type (`bool`), which indicates whether a customer has churned (`True`) or not (`False`).

Based on the information retrieved, the dataset is related to telecom customer behavior analysis, particularly regarding factors that influence customer churn. The dataset includes various features related to customer activity and plan details, and the "Churn" column serves as the target variable for understanding customer churn.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
telecome_df.columns

In [None]:
# Dataset Describe
telecome_df.describe()

### Variables Description

The dataset comprises several important features that offer a comprehensive view of customer activity and behavior:

**State and Area Code**: These columns represent the geographic location of the customers. While state information might indicate regional variations, area codes could provide insights into localized patterns.

**Account Length**: This feature represents the duration for which a customer has subscribed to the service. Longer account lengths might indicate customer loyalty and satisfaction.

**International Plan and Voice Mail Plan**: These categorical features highlight whether customers have subscribed to additional services. Customers with an international plan might be less likely to churn due to having specific needs that the plan addresses. On the other hand, the voice mail plan might indicate engagement and communication.

**Number of Voice Mail Messages**: This metric indicates how actively customers engage with voice mail services. Higher engagement might indicate better communication and satisfaction.

**Total Day, Evening, and Night Minutes, Calls, and Charges**: These metrics reflect the extent of customer engagement at different times of the day. Unusually high charges or call volumes might indicate dissatisfaction.

**Total International Minutes, Calls, and Charges**: Similar to domestic calls, international calls and charges can offer insights into specialized customer needs and their satisfaction level.

**Customer Service Calls**: The number of calls made to customer service is an important indicator. Frequent calls might signify unresolved issues, which could contribute to churn.

**Churn**: This binary label indicates whether a customer has churned or not. It is the target variable, and the analysis aims to understand which factors contribute to it.


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for col in telecome_df.columns:
  print(col, telecome_df[col].unique())
  print()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
## __ NOT REQUIRED __

### What all manipulations have you done and insights you found?

After analysing the data, we found that no manipulations are required: the dataset has no missing or duplicate values, so we can work with it directly.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
ax = sns.countplot(x=telecome_df['Churn'])

# Annotate each bar with its share of all customers, computed from the bar's
# own height so the label always matches the bar it sits on
total = len(telecome_df)
for p in ax.patches:
    height = p.get_height()
    ax.annotate(f'{100 * height / total:.2f}%', (p.get_x() + p.get_width() / 2., height), ha='center', va='bottom')

plt.xlabel('Churn')
plt.ylabel('Customers')
plt.title('Customer Churn Plot')
plt.show()


##### 1. Why did you pick the specific chart?

A count plot was chosen because it shows how many Orange Telecom customers fall into each churn category (churned vs. not churned).


##### 2. What is/are the insight(s) found from the chart?

After visualisation we found that 14.49% of users have churned (deactivated their accounts) and 85.51% of users remain active with Orange Telecom.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the visualization has the following business impact:

**Positive Impact:**  Identifying the churn rate of 14.49% allows for targeted retention strategies, improved customer experiences, and enhanced customer engagement to boost loyalty among the 85.51% of customers who remain.

**Negative Impact:**  Losing roughly one in seven customers could still damage reputation, result in revenue loss, create competitive disadvantages, and strain operations, necessitating comprehensive strategies for sustained growth.
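The percentages on the chart can also be computed directly, which keeps each share attached to its own label. The sketch below uses a small made-up frame (`toy_df` is an illustrative stand-in, not the real data); with the notebook loaded, the same call would be applied to `telecome_df['Churn']`.

```python
import pandas as pd

# Toy stand-in for telecome_df; the values are invented for illustration only.
toy_df = pd.DataFrame(
    {"Churn": [False, False, False, True, False, True, False, False]}
)

# normalize=True returns shares instead of raw counts. The result is indexed by
# the churn label itself, so a percentage cannot be paired with the wrong class.
churn_share = toy_df["Churn"].value_counts(normalize=True).mul(100).round(2)
print(churn_share)
```

Reading the share off the `True` row gives the churn rate, and the `False` row gives the retained share.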

#### Chart - 2

In [None]:
# Chart - 2 visualization code
## data visualization with states and churn

grouped = telecome_df.groupby(['State', 'Churn']).size().reset_index(name='Count')
plt.figure(figsize=(25,10))
sns.barplot(data=grouped,x='State',y='Count',hue='Churn')

plt.xlabel('State',fontsize= 25)
plt.ylabel('Churn Count',fontsize= 25)
plt.title('State vs Churn Plot',fontsize= 25)
plt.show()


##### 1. Why did you pick the specific chart?

The selection of a specific chart, like a bar plot, is based on its suitability for visualizing categorical data (states) and comparing the distribution of a categorical variable (churn status). This choice effectively represents counts, allows easy comparison, and uses color to differentiate categories, ensuring clear and concise communication of insights

##### 2. What is/are the insight(s) found from the chart?

The chart depicts the distribution of churn status across different states. Churn varies by state: for instance, the highest churn count appears to be in NJ, while WV has a relatively lower churn rate.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Impact:** The insights gained about varying churn rates across states can drive targeted strategies for customer retention, potentially leading to reduced churn and improved customer loyalty.

**Negative Impact:** While insights provide targeted strategies, focusing solely on states with high churn might divert resources from states with lower churn, potentially affecting growth opportunities in those regions. A balanced approach is needed to avoid neglecting regions with growth potential.
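A bar chart of counts favors states with many customers, so it can help to look at the churn rate per state as well. The following is a minimal sketch on an invented frame (`toy_df` and its values are assumptions for illustration); on the real data the same `groupby` would run on `telecome_df`.

```python
import pandas as pd

# Toy stand-in for telecome_df; states and outcomes are invented for illustration.
toy_df = pd.DataFrame({
    "State": ["NJ", "NJ", "NJ", "NJ", "WV", "WV", "WV", "TX", "TX", "TX", "TX"],
    "Churn": [True, True, False, False, False, False, True,
              False, False, False, True],
})

# The mean of a boolean column is the fraction of True values, i.e. the churn rate.
state_summary = (
    toy_df.groupby("State")["Churn"]
    .agg(customers="size", churned="sum", churn_rate="mean")
    .sort_values("churn_rate", ascending=False)
)
print(state_summary)
```

Sorting by `churn_rate` rather than `churned` ranks states by how likely a customer is to leave, independent of how many customers each state has.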

#### Chart - 3

In [None]:
# Chart - 3 visualization code
## Area Code VS Churn

grouped = telecome_df.groupby(['Area code', 'Churn']).size().reset_index(name='Count')
plt.figure(figsize=(6,10))
sns.barplot(data=grouped,x='Area code',y='Count',hue='Churn')

plt.xlabel('Area Code',fontsize= 25)
plt.ylabel('Churn Count',fontsize= 25)
plt.title('Area Code vs Churn',fontsize= 25)
plt.show()


##### 1. Why did you pick the specific chart?

The selection of a specific chart, like a bar plot, is based on its suitability for visualizing categorical data Area Code and comparing the distribution of a categorical variable (churn status). This choice effectively represents counts, allows easy comparison, and uses color to differentiate categories, ensuring clear and concise communication of insights

##### 2. What is/are the insight(s) found from the chart?

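The churn count varies across different area codes. For instance, the largest number of churned customers appears in area code 415, while area code 510 shows relatively fewer. Because the bars show raw counts, an area code with more customers will naturally show more churned customers, so these differences should be read alongside the total number of customers in each area code.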

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Impact:** The insights gained about varying churn across area codes can drive targeted strategies for customer retention, potentially leading to reduced churn and improved customer loyalty.

**Negative Impact:** While insights provide targeted strategies, focusing solely on area codes with high churn might divert resources from area codes with lower churn, potentially affecting growth opportunities in those regions. A balanced approach is needed to avoid neglecting regions with growth potential.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
## Area Code VS International Plan

grouped = telecome_df.groupby(['Area code', 'International plan', 'Churn']).size().reset_index(name='Count')
##sns.barplot(data=grouped, x='Area code', y='Count', hue='International plan')

# Create a pivot table for pie chart
pivot = grouped.pivot_table(index=['Area code', 'International plan'], columns='Churn', values='Count', fill_value=0)
pivot


In [None]:
# Create subplots for True and False churn
fig, axes = plt.subplots(1, 2, figsize=(12, 6))

# Plot for True churn
axes[0].pie(pivot[True], labels=pivot.index, autopct='%1.1f%%', startangle=90)
axes[0].set_title('Churned (True)')
axes[0].axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

# Plot for False churn
axes[1].pie(pivot[False], labels=pivot.index, autopct='%1.1f%%', startangle=90)
axes[1].set_title('Not Churned (False)')
axes[1].axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

plt.suptitle('Churn Distribution by Area Code and International Plan (Subplots)')
##plt.legend(title='Churn', labels=['Churned', 'Not Churned'], loc='upper right')

plt.show()

##### 1. Why did you pick the specific chart?


The choice of using pie charts in this context was made to visually represent the proportion of churned and not churned customers across different combinations of area codes and international plans. Pie charts are effective for showing parts of a whole, making it easy to compare the distribution of churn statuses within each category. However, when comparing multiple categories, it's important to consider potential limitations in conveying precise comparisons due to the circular nature of pie charts

##### 2. What is/are the insight(s) found from the chart?


The insights from the chart reveal that in cases of "True" churn, the area code 415 has a higher proportion of "Yes" for international plans (12.8%) compared to the other area codes. Additionally, the relatively lower proportion of "No" international plans in area code 415 (36.9%) might suggest that international plans could be a contributing factor to churn in that area. In contrast, for "False" churn, area code 415 has a higher proportion of "No" international plans (46.7%), indicating that customers without international plans are less likely to churn in that area. This suggests that area code 415 could benefit from targeted strategies to address churn based on international plan preferences.
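Pie slices show each group's share of all churned (or all retained) customers, which depends on group size. To compare how likely each group is to churn, a table of churn rates per area code and plan can be computed as in the sketch below. The frame `toy_df` is an invented stand-in; only the column names come from the dataset.

```python
import pandas as pd

# Toy stand-in for telecome_df; all values are invented for illustration.
toy_df = pd.DataFrame({
    "Area code": [415, 415, 415, 415, 408, 408, 408, 408],
    "International plan": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"],
    "Churn": [True, False, False, False, True, False, True, False],
})

# Mean of the boolean Churn column within each (area code, plan) group gives
# that group's churn rate; unstack() turns the plan level into columns.
rate_by_group = (
    toy_df.groupby(["Area code", "International plan"])["Churn"]
    .mean()
    .unstack()
)
print(rate_by_group)
```

Each cell answers "of the customers in this area code with (or without) the plan, what fraction churned?", which is the comparison the insight text is reaching for.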

#### Chart - 5

In [None]:
# Chart - 5 visualization code
## Area Code VS Voice Mail Plan

grouped = telecome_df.groupby(['Area code', 'Voice mail plan', 'Churn']).size().reset_index(name='Count')
##sns.barplot(data=grouped, x='Area code', y='Count', hue='International plan')

# Create a pivot table for pie chart
pivot = grouped.pivot_table(index=['Area code', 'Voice mail plan'], columns='Churn', values='Count', fill_value=0)
pivot

In [None]:
# Create subplots for True and False churn
fig, axes = plt.subplots(1, 2, figsize=(12, 6))

# Plot for True churn
axes[0].pie(pivot[True], labels=pivot.index, autopct='%1.1f%%', startangle=90)
axes[0].set_title('Churned (True)')
axes[0].axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

# Plot for False churn
axes[1].pie(pivot[False], labels=pivot.index, autopct='%1.1f%%', startangle=90)
axes[1].set_title('Not Churned (False)')
axes[1].axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

plt.suptitle('Churn Distribution by Area Code and Voice mail plan (Subplots)')
##plt.legend(title='Churn', labels=['Churned', 'Not Churned'], loc='upper right')

plt.show()

##### 1. Why did you pick the specific chart?


The choice of using pie charts in this context was made to visually represent the proportion of churned and not churned customers across different combinations of area codes and voice mail plans. Pie charts are effective for showing parts of a whole, making it easy to compare the distribution of churn statuses within each category. However, when comparing multiple categories, it's important to consider potential limitations in conveying precise comparisons due to the circular nature of pie charts.

##### 2. What is/are the insight(s) found from the chart?

The insights from the chart reveal that in cases of "True" churn, the area code 415 has a higher proportion of "Yes" for voice mail plans (7.0%) compared to the other area codes. Additionally, the relatively lower proportion of "No" voice mail plans in area code 415 (41.8%) suggests that voice mail plans could be a contributing factor to churn in that area. In contrast, for "False" churn, area code 415 has a higher proportion of "No" voice mail plans (34.5%), indicating that customers without voice mail plans are less likely to churn in that area. This suggests that area code 415 could benefit from targeted strategies to address churn based on voice mail plan preferences.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
fig, axes = plt.subplots(1, 2, figsize=(12, 6), sharey=True)

# Bar plot for churned customers
churn_val = telecome_df[telecome_df['Churn'] == True]
churn_counts = churn_val['Customer service calls'].value_counts().sort_index()
axes[0].bar(churn_counts.index, churn_counts.values, color='red', alpha=0.7)
axes[0].set_title('Bar Plot of Customer Service Calls (Churned)')
axes[0].set_xlabel('Number of Calls')
axes[0].set_ylabel('Frequency')

# Bar plot for non-churned customers
non_churn_val = telecome_df[telecome_df['Churn'] != True]
non_churn_counts = non_churn_val['Customer service calls'].value_counts().sort_index()
axes[1].bar(non_churn_counts.index, non_churn_counts.values, color='green', alpha=0.7)
axes[1].set_title('Bar Plot of Customer Service Calls (Non-Churned)')
axes[1].set_xlabel('Number of Calls')
axes[1].set_ylabel('Frequency')

plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot suits this data because the number of customer service calls is a discrete count, so each bar shows how many customers made exactly that many calls. Placing churned and non-churned customers side by side allows easy comparison, and color differentiates the two groups, ensuring clear and concise communication of insights.

##### 2. What is/are the insight(s) found from the chart?

When we explore customer behavior in churn analysis, a clear trend emerges: for those who churn, the most frequent number of service calls is '1', while '9' sees the fewest interactions. Similarly, among customers who stay, '1' again stands out as the most common, with '8' being the least frequent. These patterns paint a straightforward picture of customer engagement, shedding light on the connection between service calls and churn outcomes
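The most frequent call count alone does not show whether churned customers call more often overall. One way to check is to compare summary statistics of service calls between the two groups, as in this sketch on an invented frame (`toy_df` and the helper column `Status` are assumptions for illustration).

```python
import pandas as pd

# Toy stand-in for telecome_df; all values are invented for illustration.
toy_df = pd.DataFrame({
    "Customer service calls": [0, 1, 1, 1, 2, 4, 4, 5, 5, 5],
    "Churn": [False, False, False, True, False, True, False, True, True, False],
})

# Readable group labels (hypothetical helper column, not part of the dataset).
toy_df["Status"] = toy_df["Churn"].map({True: "churned", False: "stayed"})

# Mean and median number of service calls within each group.
calls_by_status = (
    toy_df.groupby("Status")["Customer service calls"].agg(["mean", "median"])
)
print(calls_by_status)
```

If the churned group shows a clearly higher mean or median on the real data, that would support the link between frequent service calls and churn.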

#### Chart - 7

In [None]:
# Chart - 7 visualization code
cname = ['Account length', 'Number vmail messages', 'Total day minutes',
       'Total day calls', 'Total day charge', 'Total eve minutes',
       'Total eve calls', 'Total eve charge', 'Total night minutes',
       'Total night calls', 'Total night charge', 'Total intl minutes',
       'Total intl calls', 'Total intl charge', 'Customer service calls']

# Plot one box plot per numeric column in a two-column grid
num_cols = len(cname)
num_rows = (num_cols + 1) // 2  # Round up so an odd number of columns still fits

fig, axes = plt.subplots(num_rows, 2, figsize=(15, 20))
axes = axes.flatten()

for ax, column in zip(axes, cname):
    sns.boxplot(x=telecome_df[column], color=sns.color_palette("deep")[0], ax=ax)
    ax.set_title(column)
    ax.set_xlabel(column)

# Hide any grid cell left empty when the number of columns is odd
for ax in axes[num_cols:]:
    ax.set_visible(False)

# Adjust layout
plt.tight_layout(pad=3)
plt.show()
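The points drawn beyond the whiskers of a box plot are, by the default convention, values more than 1.5 times the interquartile range outside the quartiles. A minimal sketch for counting them is shown below; the function name `iqr_outlier_count` and the sample values are hypothetical, and on the real data it would be applied to each column in `cname`.

```python
import pandas as pd

def iqr_outlier_count(series, k=1.5):
    """Count values lying more than k * IQR outside the first/third quartile."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return int(((series < lower) | (series > upper)).sum())

# Invented sample: one value sits far away from the rest.
sample = pd.Series([10, 12, 11, 13, 12, 11, 50])
print(iqr_outlier_count(sample))
```

A column-by-column count like this puts a number on what the box plots show visually, which helps when deciding whether any feature needs special treatment.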

#### Chart - 8 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
f, ax = plt.subplots(figsize=(18,12))  #Width,height

#Generating Corelation Matrix
corr = telecome_df[cname].corr()

#Plot using Seaborn library
sns.heatmap(corr, mask=np.zeros_like(corr, dtype=bool), cmap=sns.diverging_palette(220, 10, as_cmap=True),
            square=True, ax=ax, annot=True, linewidths=1, linecolor='black', vmin=-1, vmax=1)

plt.show()
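A large annotated heatmap can be hard to scan, so it may help to list only the feature pairs whose correlation is very strong. The sketch below does this with a hypothetical helper `high_corr_pairs` and an arbitrary threshold of 0.95 on an invented frame; on the real data it would be called as `high_corr_pairs(telecome_df[cname])`.

```python
from itertools import combinations

import pandas as pd

def high_corr_pairs(df, threshold=0.95):
    """Return (col_a, col_b, r) for column pairs with |Pearson r| >= threshold."""
    corr = df.corr()
    pairs = []
    # combinations() visits each unordered pair once and skips the diagonal.
    for a, b in combinations(corr.columns, 2):
        r = float(corr.loc[a, b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs

# Invented values: the charge column is a fixed multiple of the minutes column.
toy_df = pd.DataFrame({
    "Total day minutes": [100.0, 200.0, 150.0, 300.0],
    "Total day charge": [17.0, 34.0, 25.5, 51.0],
    "Total day calls": [90, 110, 80, 100],
})
strong = high_corr_pairs(toy_df)
print(strong)
```

Pairs flagged this way carry nearly the same information, so one column of each pair could be dropped from later analysis without losing much.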

#### Chart - 9 - Pair Plot

In [None]:
# Pair Plot visualization code

selected_features = ['Total day calls', 'Total eve calls', 'Total night calls', 'Total intl calls','Churn']
sns.set(style='ticks')
sns.pairplot(telecome_df[selected_features], kind='scatter', diag_kind='hist',hue='Churn')

plt.show()

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Based on the churn analysis conducted on the dataset, there are several key insights that can guide the client's actions to achieve their business objective:

**International Plan Impact:** Customers with an international plan seem to have a notable impact on churn. It's essential for the client to focus on understanding the reasons behind this trend. Initiating targeted outreach to customers with international plans and addressing their specific needs could help in retention efforts.

**Customer Service Calls**: The frequency of customer service calls plays a role in churn. The analysis reveals that those who churn have a tendency to make more customer service calls. The client can use this insight to identify the reasons behind such calls and proactively address issues to reduce churn risk.

**Area Code Insights:** The relationship between area codes, international plans, and churn is noteworthy. Targeted strategies can be implemented for specific area codes where churn is prevalent. This might include optimizing international plan offerings, customer support, or addressing unique regional challenges.

**Engagement Metrics:** Observing engagement metrics like the number of voice mail messages and total minutes of usage during different times of the day can provide insights into customer satisfaction and loyalty. The client should focus on maintaining high engagement levels through effective communication and service.

**Non-Churn Patterns:** Understanding the behaviors and attributes of non-churned customers is equally important. The client can leverage these insights to identify and reward loyal customers, potentially inspiring retention and advocacy efforts.

**Customer Segmentation:** Segmenting customers based on their usage patterns, service preferences, and demographics can lead to targeted retention strategies. By tailoring offerings to specific customer segments, the client can enhance customer satisfaction and loyalty.

**Pricing Strategies:** The analysis of charges and their impact on churn can guide the client in optimizing their pricing strategies. Ensuring that charges align with perceived value can help in reducing price-related churn.

**Communication Strategy:** Effective communication can play a pivotal role in customer retention. The client can use the insights gained from the analysis to craft personalized communication strategies that address customer needs, provide value, and maintain positive relationships.

**Continuous Monitoring:** Churn analysis is an ongoing process. The client should establish a system to continuously monitor churn metrics and customer behavior. This will allow them to adapt strategies based on evolving trends and maintain a proactive approach to customer retention.

# **Conclusion**


Through deep churn analysis, we've uncovered insights for the telecom industry. Strategies targeting customers with international plans, efficient customer service, localized approaches, engagement enhancement, personalized communication, and price alignment are key to reducing churn. Ongoing monitoring ensures adaptability for long-term growth and customer satisfaction.
