## Telco Customer Churn 

### Exploratory Data Analysis (EDA) 

### Contents: 
1. Installing Dependencies 
2. Importing Libraries 
3. Overview of the Data 
4. Data Cleaning 
5. Exploratory Data Analysis (EDA) 
<br></br>
**Note:**<br> 
This notebook focuses on the EDA portion of the workflow for predicting customer churn with the Telco Customer Churn dataset. <br> </br>
The notebook *customer_churn_ml.ipynb* covers data preprocessing, feature engineering, pipeline and model development, and model evaluation. <br>
The notebook *customer_churn_end_to_end.ipynb* contains the complete workflow, including code that saves the visualization figures to the **Visuals/** folder and the trained models to the **Models/** folder.  
<br></br> 



#### Installing Dependencies

In [None]:
# Installing dependencies via `requirements.txt`
%pip install -r ../requirements.txt 
# Alternatively (e.g. in a fresh environment without requirements.txt), uncomment the following line 
#%pip install pandas numpy matplotlib seaborn scikit-learn imbalanced-learn shap joblib


Collecting pathlib (from -r ../requirements.txt (line 5))
  Downloading pathlib-1.0.1-py3-none-any.whl.metadata (5.1 kB)
Downloading pathlib-1.0.1-py3-none-any.whl (14 kB)
Installing collected packages: pathlib
Successfully installed pathlib-1.0.1

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.1.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython -m pip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


#### Importing Libraries

In [None]:
# Importing libraries 
import os 
import warnings 
warnings.filterwarnings('ignore') 

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns 
from pathlib import Path 


In [None]:
# Setting random_state for reproducibility 
random_state = 42 
np.random.seed(random_state) 

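Seeding makes random draws repeatable across runs. The sketch below is a small, self-contained check (not part of the original workflow) showing that re-seeding NumPy's global generator reproduces the same values. It also shows `np.random.default_rng`, NumPy's newer `Generator` API, as an alternative that keeps the seed in a local object instead of global state.

```python
import numpy as np

random_state = 42

# Re-seeding the legacy global generator reproduces the same sequence
np.random.seed(random_state)
first = np.random.rand(3)
np.random.seed(random_state)
second = np.random.rand(3)
print("Global seed reproducible:", np.array_equal(first, second))

# Alternative: local Generator objects, each seeded independently
gen_a = np.random.default_rng(random_state).random(3)
gen_b = np.random.default_rng(random_state).random(3)
print("Generator reproducible:", np.array_equal(gen_a, gen_b))

# Restore the seeded global state so later cells are unaffected by this check
np.random.seed(random_state)
```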

In [None]:
# Defining folder paths 
# Current directory --> Notebooks/ 
data_path = Path('../Data/telco_churn_data.csv').resolve() 
base_data_path = Path('../Data').resolve() 
visuals_path = Path('../Visuals').resolve()  
models_path = Path('../Models').resolve() 

visuals_path.mkdir(parents=True, exist_ok=True) 
models_path.mkdir(parents=True, exist_ok=True) 

print("Data Path --> ", data_path) 
print("Base Data Path --> ", base_data_path) 
print("Visuals Path --> ", visuals_path) 
print("Models Path --> ", models_path) 

# Helper that saves a DataFrame as a CSV file in the Data/ folder 
# Not called in this notebook; it is used in the end-to-end notebook 
def save_data(data: pd.DataFrame, file_name: str): 
    os.makedirs(base_data_path, exist_ok=True) 
    file_path = os.path.join(base_data_path, f"{file_name}.csv") 
    data.to_csv(file_path, index=False) 
    print(f"Saved -> {file_path}") 


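Because the paths above are relative (`../Data`, `../Visuals`, `../Models`), `Path.resolve()` anchors them to the current working directory, which is why the cell assumes the notebook runs from `Notebooks/`. The sketch below demonstrates this in a throwaway temporary directory; the `Notebooks/` folder it creates is a stand-in for the project layout, not the real folder.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Stand-in project layout: <root>/Notebooks is used as the working directory
    notebooks_dir = Path(root) / "Notebooks"
    notebooks_dir.mkdir()

    original_cwd = os.getcwd()
    os.chdir(notebooks_dir)
    try:
        # Same expression as in the notebook; ".." is resolved against the current working directory
        resolved = Path("../Data/telco_churn_data.csv").resolve()
    finally:
        os.chdir(original_cwd)  # always restore the original working directory

    expected = Path(root).resolve() / "Data" / "telco_churn_data.csv"
    print("Resolved under project root:", resolved == expected)
```

Running the same notebook from a different directory would resolve `../Data` somewhere else, which the `FileNotFoundError` check in the loading cell would then catch.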


In [None]:
# Plotting settings 
sns.set_style('whitegrid') 
plt.rcParams['figure.figsize'] = (10,6) 


#### Overview of the Data

In [None]:
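# Loading data 

# Checking that the data file exists before reading it 
if not data_path.exists(): 
    raise FileNotFoundError(f"Expected data at {data_path}.") 

telco_data = pd.read_csv(data_path) 

# Previewing the first rows (display() renders the table inside a notebook cell) 
display(telco_data.head()) 

# info() prints its summary directly and returns None, so it is not wrapped in print() 
print("\nData Info: ")
telco_data.info()  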



The Telco Customer Churn dataset contains information about customers of a telecommunications company. <br> 
Each row in the dataset represents a customer. 
<br></br> 

##### **Dataset Columns:** 
- **customerID**: Unique customer ID 
- **gender**: The customer's gender (male or female) 
- **SeniorCitizen**: Whether the customer is a senior citizen or not (stored as 1/0 in the raw data) 
- **Partner**: Whether the customer has a partner or not 
- **Dependents**: Whether the customer has dependents or not 
- **tenure**: Number of months the customer has stayed with the company 
- **PhoneService**: Whether the customer has a phone service or not 
- **MultipleLines**: Whether the customer has multiple lines (phone service) or not 
- **InternetService**: The type of internet service the customer has (DSL, Fiber optic, or No) 
- **OnlineSecurity**: Whether the customer has online security (service) or not
- **OnlineBackup**: Whether the customer has online backup (service) or not 
- **DeviceProtection**: Whether the customer has device protection (service) or not 
- **TechSupport**: Whether the customer has tech support (service) or not 
- **StreamingTV**: Whether the customer has streaming TV (service) or not 
- **StreamingMovies**: Whether the customer has streaming movies (service) or not 
- **Contract**: The contract term of the customer 
- **PaperlessBilling**: Whether the customer has paperless billing or not 
- **PaymentMethod**: The customer's payment method 
- **MonthlyCharges**: The amount charged to the customer monthly 
- **TotalCharges**: The total amount charged to the customer 
- **Churn**: (Target variable) Whether the customer churned or not 
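Splitting columns by dtype is a common first step in EDA, but on the raw file it can disagree with the descriptions above: `SeniorCitizen` is stored as 0/1 and `TotalCharges` may load as text. The sketch below uses a few made-up rows (not real dataset values) to show this with pandas' `select_dtypes`; the Data Cleaning steps that follow address both columns.

```python
import pandas as pd

# Hypothetical rows mimicking the raw file's layout (values are made up for illustration)
raw = pd.DataFrame({
    "customerID": ["0001-A", "0002-B", "0003-C"],
    "SeniorCitizen": [0, 1, 0],                # stored as 0/1 integers
    "tenure": [0, 12, 40],
    "Contract": ["Month-to-month", "One year", "Two year"],
    "MonthlyCharges": [20.5, 70.0, 99.9],
    "TotalCharges": [" ", "840.0", "3996.0"],  # text column that includes a blank entry
    "Churn": ["No", "Yes", "No"],
})

# A naive dtype-based split: SeniorCitizen lands in numeric, TotalCharges in categorical
numeric_cols = raw.select_dtypes(include="number").columns.tolist()
categorical_cols = raw.select_dtypes(exclude="number").columns.tolist()
print("Numeric:", numeric_cols)
print("Categorical:", categorical_cols)
```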




#### Data Cleaning

In [None]:
data = telco_data.copy() 

# Data checks 
print("\nMissing values: ") 
print(data.isna().sum()) 

# Checking if there are duplicate customerID values 
print("\nAll unique customerID --> ", data['customerID'].nunique() == len(data)) 

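# Converting TotalCharges to numeric (it loads as text when some entries are not valid numbers; errors='coerce' turns those entries into NaN) 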
if data['TotalCharges'].dtype == 'object': 
    data['TotalCharges'] = pd.to_numeric(data['TotalCharges'], errors='coerce') 
    print("\nTotalCharges NA count (After numeric conversion): ", data['TotalCharges'].isna().sum())

# Checking if any rows have tenure == 0 & TotalCharges == NaN 
# Possible for new customers 
new_cust = (data['tenure'] == 0) & (data['TotalCharges'].isna()) 
print("\nRows with tenure == 0 & TotalCharges == NaN : ", new_cust.sum()) 

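With `errors='coerce'`, `pd.to_numeric` turns any value it cannot parse into `NaN` instead of raising, so the NaN count printed after conversion reflects entries that were not valid numbers in the raw text. The sketch below uses made-up values (the blank string is an assumed example of such an entry) to show how to inspect the original strings behind those NaNs before they are filled.

```python
import pandas as pd

# Made-up sample: TotalCharges read as text, with a blank entry for a zero-tenure customer
sample = pd.DataFrame({
    "tenure": [0, 5, 24],
    "TotalCharges": [" ", "150.25", "1800.5"],
})

converted = pd.to_numeric(sample["TotalCharges"], errors="coerce")

# Show the raw strings that failed to parse, alongside tenure, before overwriting the column
failed = sample.loc[converted.isna(), ["tenure", "TotalCharges"]]
print(failed)
print(converted.tolist())
```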

In [None]:
# Setting TotalCharges to 0 where tenure == 0 and TotalCharges is NaN (assumed to be new customers) 
# Any remaining NaN TotalCharges values will be handled later 
data.loc[(data['tenure']==0) & (data['TotalCharges'].isna()), 'TotalCharges'] = 0.0 
print("Remaining NaN TotalCharges values after fix: ", data['TotalCharges'].isna().sum()) 

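The fill only touches rows where both conditions hold, so a `NaN` in a row with `tenure > 0` is deliberately left for later handling. The sketch below, on made-up values, builds the condition once, applies it with `.loc`, and checks that no zero-tenure `NaN` values remain.

```python
import numpy as np
import pandas as pd

# Made-up sample after numeric conversion: one new customer (tenure 0) and one NaN with tenure > 0
demo = pd.DataFrame({
    "tenure": [0, 3, 10],
    "TotalCharges": [np.nan, np.nan, 500.0],
})

# Build the mask once so the fill and the count use the same condition
new_customer = (demo["tenure"] == 0) & demo["TotalCharges"].isna()
demo.loc[new_customer, "TotalCharges"] = 0.0

# Only NaNs outside the tenure == 0 rule should remain (left for later handling)
remaining = demo["TotalCharges"].isna()
print("Filled:", int(new_customer.sum()), "| Remaining NaN:", int(remaining.sum()))
assert not (remaining & (demo["tenure"] == 0)).any()
```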

In [None]:
# Mapping SeniorCitizen from 0/1 to 'No'/'Yes' so it matches the other binary columns 
data['SeniorCitizen'] = data['SeniorCitizen'].map({0:'No', 1:'Yes'}) 
print(data['SeniorCitizen'].head())

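`Series.map` with a dictionary returns `NaN` for any value that is not a key. Because the cell above overwrites `SeniorCitizen` in place, running it a second time maps `'No'`/`'Yes'` again and turns the whole column into `NaN`. The sketch below reproduces that on a small made-up Series and shows `Series.replace`, which leaves unmatched values unchanged, as one possible way to make the step safe to re-run.

```python
import pandas as pd

senior = pd.Series([0, 1, 0], name="SeniorCitizen")
mapping = {0: "No", 1: "Yes"}

# First run: 0/1 become 'No'/'Yes'
once = senior.map(mapping)

# Re-running map: 'No'/'Yes' are not dictionary keys, so every value becomes NaN
twice = once.map(mapping)
print("All NaN after second map:", twice.isna().all())

# replace() only changes values found in the mapping, so repeating it is harmless
safe_once = senior.replace(mapping)
safe_twice = safe_once.replace(mapping)
print(safe_twice.tolist())
```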

In [None]:
# Saving cleaned data to Data/ folder 
#save_data(data, "cleaned_raw_data") 
