# Customer Churn Analysis - Exploratory Data Analysis

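This notebook walks through exploratory data analysis (EDA) of customer churn data: loading the data, checking its quality, and examining the churn distribution, feature types, and correlations.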

## Table of Contents
1. Import Libraries
2. Load Data
3. Data Overview
4. Data Quality Check
5. Churn Distribution
6. Feature Analysis
7. Correlation Analysis
8. Key Insights

## 1. Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sys
import os

# Make the src/ modules and the project root importable
sys.path.append('../src')
sys.path.append('..')

import config
from visualization import ChurnVisualizer
from data_preprocessing import DataPreprocessor
from feature_engineering import FeatureEngineer

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

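# Set plot style (the 'seaborn' matplotlib style name was deprecated in
# matplotlib 3.6 and fails in newer releases, so use seaborn's own theme)
sns.set_theme()
sns.set_palette('husl')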

%matplotlib inline

## 2. Load Data

In [None]:
# Load the dataset
# data_path = os.path.join(config.RAW_DATA_DIR, 'customer_data.csv')
# df = pd.read_csv(data_path)

# No dataset is loaded yet; print setup instructions instead
print("To use this notebook:")
print("1. Place your customer data CSV file in the data/raw/ directory")
print("2. Uncomment the loading code above and point data_path at your file")
print("3. Uncomment and run the cells below to perform the analysis")

# Uncomment below to see data shape and first few rows
# print(f"Dataset shape: {df.shape}")
# df.head()
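
Until a real CSV is available, a small synthetic table can stand in so the later cells have something to run against. The sketch below is only an illustration: apart from `Churn`, every column name (`tenure`, `MonthlyCharges`, `Contract`) and the rule linking tenure to churn are assumptions, not properties of the actual dataset.

```python
import numpy as np
import pandas as pd


def make_sample_churn_data(n_rows: int = 500, seed: int = 42) -> pd.DataFrame:
    """Build a synthetic churn table. Column names are hypothetical."""
    rng = np.random.default_rng(seed)
    tenure = rng.integers(1, 73, size=n_rows)  # months as a customer, 1..72
    monthly_charges = rng.uniform(20.0, 120.0, size=n_rows).round(2)
    contract = rng.choice(['Month-to-month', 'One year', 'Two year'], size=n_rows)

    # Toy rule (illustrative only): shorter tenure means higher churn probability
    churn_prob = np.clip(0.5 - tenure / 150.0, 0.05, 0.95)
    churned = rng.random(n_rows) < churn_prob

    sample = pd.DataFrame({
        'tenure': tenure,
        'MonthlyCharges': monthly_charges,
        'Contract': contract,
        'Churn': np.where(churned, 'Yes', 'No'),
    })

    # Blank out a few values so the data quality check has something to find
    missing_idx = rng.choice(n_rows, size=max(1, n_rows // 50), replace=False)
    sample.loc[missing_idx, 'MonthlyCharges'] = np.nan
    return sample


df = make_sample_churn_data()
print(f"Dataset shape: {df.shape}")
```

Replace the call to `make_sample_churn_data()` with the `pd.read_csv` line once the real file is in place.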

## 3. Data Overview

In [None]:
# Display basic information about the dataset
# df.info()

# Display statistical summary
# df.describe()

## 4. Data Quality Check

In [None]:
# Check for missing values
# missing_values = df.isnull().sum()
# missing_percent = (missing_values / len(df)) * 100
# missing_df = pd.DataFrame({
#     'Missing_Count': missing_values,
#     'Missing_Percent': missing_percent
# })
# missing_df[missing_df['Missing_Count'] > 0].sort_values('Missing_Count', ascending=False)
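
The missing-value check above can be wrapped in a reusable function and tried on a toy frame. This is a minimal sketch: `summarize_missing` is a hypothetical helper name, the toy columns are made up for illustration, and the duplicate-row count is an extra check that the original cell does not perform.

```python
import numpy as np
import pandas as pd


def summarize_missing(frame: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values for columns that have any."""
    missing_count = frame.isnull().sum()
    missing_percent = missing_count / len(frame) * 100
    summary = pd.DataFrame({
        'Missing_Count': missing_count,
        'Missing_Percent': missing_percent,
    })
    # Keep only columns with at least one missing value, worst first
    return summary[summary['Missing_Count'] > 0].sort_values(
        'Missing_Count', ascending=False
    )


# Toy data with hypothetical column names
toy = pd.DataFrame({
    'tenure': [1, 12, 24, 36],
    'MonthlyCharges': [29.9, np.nan, 70.5, np.nan],
    'Contract': ['Month-to-month', None, 'One year', 'Two year'],
})

missing_summary = summarize_missing(toy)
print(missing_summary)

# Additional check (not in the original cell): fully duplicated rows
n_duplicates = int(toy.duplicated().sum())
print(f"Duplicate rows: {n_duplicates}")
```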

## 5. Churn Distribution

In [None]:
# Visualize churn distribution
# visualizer = ChurnVisualizer()
# visualizer.plot_churn_distribution(df['Churn'])
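
Alongside the plot, it helps to have the class balance as numbers, since the churn rate feeds directly into the Key Insights section. The sketch below uses plain pandas rather than `ChurnVisualizer` (whose implementation is not shown here) and assumes the `Churn` column holds the labels `'Yes'` and `'No'`; adjust the label if the real data encodes churn differently.

```python
import pandas as pd

# Toy target column; the 'Yes'/'No' encoding is an assumption
churn = pd.Series(['No', 'Yes', 'No', 'No', 'Yes', 'No', 'No', 'No'], name='Churn')

churn_counts = churn.value_counts()                # absolute counts per class
churn_share = churn.value_counts(normalize=True)   # fraction per class
churn_rate = churn_share.get('Yes', 0.0)           # 0.0 if no churners are present

print(churn_counts)
print(f"Churn rate: {churn_rate:.1%}")
```

A churn rate far from 50% signals class imbalance, which is worth recording before any modelling starts.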

## 6. Feature Analysis

In [None]:
# Analyze numerical features
# numerical_features = df.select_dtypes(include=[np.number]).columns.tolist()
# print(f"Numerical features: {numerical_features}")

# Analyze categorical features
# categorical_features = df.select_dtypes(include=['object']).columns.tolist()
# print(f"Categorical features: {categorical_features}")
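
Once the features are split by type, a natural follow-up is to compare each one against the churn label. The following is one possible sketch, not the notebook's prescribed method: it assumes a `'Yes'`/`'No'` encoded `Churn` column and uses hypothetical feature names (`tenure`, `Contract`) on toy data.

```python
import pandas as pd

# Toy data with hypothetical column names
toy = pd.DataFrame({
    'tenure': [2, 5, 40, 60, 3, 48],
    'Contract': ['Month-to-month', 'Month-to-month', 'Two year',
                 'Two year', 'Month-to-month', 'One year'],
    'Churn': ['Yes', 'Yes', 'No', 'No', 'No', 'No'],
})

# Boolean churn indicator, so that a mean is a churn rate
is_churned = toy['Churn'].eq('Yes')

# Numerical feature: compare its average between churned and retained customers
mean_tenure_by_churn = toy.groupby('Churn')['tenure'].mean()

# Categorical feature: churn rate within each category
churn_rate_by_contract = is_churned.groupby(toy['Contract']).mean()

print(mean_tenure_by_churn)
print(churn_rate_by_contract)
```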

## 7. Correlation Analysis

In [None]:
# Plot correlation matrix
# numerical_df = df.select_dtypes(include=[np.number])
# visualizer.plot_correlation_matrix(numerical_df)
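
If `Churn` is stored as text, `select_dtypes` drops it and the correlation matrix says nothing about the target. One way around this, sketched below under the assumption of a `'Yes'`/`'No'` label and with hypothetical feature names, is to add a 0/1 flag before computing Pearson correlations. `Churn_flag` is a name introduced here for illustration.

```python
import pandas as pd

# Toy data with hypothetical column names
toy = pd.DataFrame({
    'tenure': [1, 2, 3, 4, 5, 6],
    'MonthlyCharges': [90.0, 85.0, 70.0, 60.0, 40.0, 30.0],
    'Churn': ['Yes', 'Yes', 'Yes', 'No', 'No', 'No'],
})

# Keep numeric columns and append a 0/1 version of the target
numeric = toy.select_dtypes(include='number').copy()
numeric['Churn_flag'] = toy['Churn'].eq('Yes').astype(int)

corr = numeric.corr()  # Pearson correlation by default

# Correlation of each feature with the churn flag, most negative first
corr_with_churn = corr['Churn_flag'].drop('Churn_flag').sort_values()
print(corr_with_churn)
```

Correlation with a binary flag only captures linear association, so treat it as a first screen rather than a ranking of feature importance.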

## 8. Key Insights

Based on the exploratory analysis, document key findings here:

1. **Churn Rate**: [To be filled after analysis]
2. **Key Features**: [To be filled after analysis]
3. **Data Quality Issues**: [To be filled after analysis]
4. **Recommendations**: [To be filled after analysis]

## Next Steps

1. Data preprocessing and cleaning
2. Feature engineering
3. Model training and evaluation
4. Model deployment