# Customer Churn Prediction + Retention Optimization

This notebook executes the full workflow: EDA, feature engineering, modeling (LogReg/RF/XGBoost), SHAP explainability, and ROI simulation.

## 1) Load Dataset + Quick Checks
**Insights to verify while running:**
- The target is imbalanced: churners are the minority class.
- `TotalCharges` may contain blank strings, so it is read as text and must be converted to numeric.
- Contract type and tenure are strong churn signals.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('../data/WA_Fn-UseC_-Telco-Customer-Churn.csv')
# Blank strings in TotalCharges are coerced to NaN
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
df.head()

In [None]:
# Missing values per column (top 10) and churn class balance
print(df.isna().sum().sort_values(ascending=False).head(10))
print(df['Churn'].value_counts(normalize=True))
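To confirm the `TotalCharges` insight before coercion hides it, a small helper can count blank or whitespace-only entries in the raw column. This is a minimal sketch: `count_blank_strings` is a hypothetical helper, and the toy values below stand in for the real column.

```python
import pandas as pd


def count_blank_strings(series: pd.Series) -> int:
    """Count entries that are empty or whitespace-only strings.

    Run this on the raw column *before* pd.to_numeric(..., errors='coerce'),
    because coercion silently turns these entries into NaN.
    """
    as_text = series.astype(str)
    return int(as_text.str.strip().eq('').sum())


# Toy column for illustration only (not taken from the real dataset).
raw_total_charges = pd.Series(['29.85', ' ', '1889.5', '', '108.15'])
n_blank = count_blank_strings(raw_total_charges)
numeric = pd.to_numeric(raw_total_charges, errors='coerce')
# The blank count should equal the number of NaNs created by coercion.
print(n_blank, int(numeric.isna().sum()))
```

In the notebook, apply it to a freshly read column, e.g. `pd.read_csv('../data/WA_Fn-UseC_-Telco-Customer-Churn.csv')['TotalCharges']`, because the loading cell overwrites `TotalCharges` with its numeric version.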

## 2) EDA
**Key insights:**
- Customers on month-to-month contracts churn at visibly higher rates than customers on one- or two-year contracts.
- Higher monthly charges are associated with higher churn risk, especially when tenure is low.
- Longer-tenure cohorts are substantially more stable and should be segmented separately in retention campaigns.

In [None]:
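plt.figure(figsize=(6, 4))
# Pairwise Pearson correlations; NaNs in TotalCharges are excluded pair by pair
num_cols = ['tenure', 'MonthlyCharges', 'TotalCharges', 'SeniorCitizen']
sns.heatmap(df[num_cols].corr(), annot=True, cmap='Blues')
plt.title('Correlation Heatmap')
plt.show()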

In [None]:
plt.figure(figsize=(6, 4))
sns.histplot(data=df, x='MonthlyCharges', hue='Churn', kde=True, bins=30)
plt.title('MonthlyCharges Distribution by Churn')
plt.show()
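The contract and tenure insights can also be checked numerically by computing the churn rate per segment. This is a minimal sketch assuming a hypothetical `churn_rate_by` helper; the toy frame is illustrative only, not real data.

```python
import pandas as pd


def churn_rate_by(frame: pd.DataFrame, col: str, target: str = 'Churn') -> pd.Series:
    """Return the churn rate per level of `col`, highest first.

    Accepts a target stored either as 'Yes'/'No' strings (raw data)
    or as 0/1 integers (after the feature-engineering step).
    """
    y = frame[target]
    if not pd.api.types.is_numeric_dtype(y):
        y = y.eq('Yes').astype(int)
    # observed=True skips empty categories when `col` is categorical (e.g. pd.cut output)
    return y.groupby(frame[col], observed=True).mean().sort_values(ascending=False)


# Toy frame for illustration only.
toy = pd.DataFrame({
    'Contract': ['Month-to-month', 'Month-to-month', 'One year',
                 'Two year', 'Month-to-month', 'Two year'],
    'Churn': ['Yes', 'No', 'No', 'No', 'Yes', 'No'],
})
rates = churn_rate_by(toy, 'Contract')
print(rates)
```

On the real data, `churn_rate_by(df, 'Contract')` checks the contract claim, and after feature engineering `churn_rate_by(df, 'tenure_bucket')` checks the tenure claim.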

## 3) Data Cleaning + Feature Engineering

In [None]:
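# Blank TotalCharges are typically brand-new customers (tenure 0); fill with MonthlyCharges * tenure
df['TotalCharges'] = df['TotalCharges'].fillna(df['MonthlyCharges'] * df['tenure'])
df['tenure_bucket'] = pd.cut(df['tenure'], bins=[0, 12, 24, 48, 72], labels=['0-12', '13-24', '25-48', '49-72'], include_lowest=True)
df['monthly_to_tenure_ratio'] = df['MonthlyCharges'] / np.maximum(df['tenure'], 1)  # floor tenure at 1 to avoid division by zero
df['contract_risk_level'] = df['Contract'].map({'Month-to-month': 'High', 'One year': 'Medium', 'Two year': 'Low'})
# Encode the target as 0/1; the guard keeps this cell safe to re-run
if not pd.api.types.is_numeric_dtype(df['Churn']):
    df['Churn'] = (df['Churn'] == 'Yes').astype(int)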

## 4) Modeling (LogReg, RF, XGBoost)
Run `python ../src/train.py` to train all three models and save their metrics and artifacts.
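The contents of `train.py` are not shown here, so the following is only a sketch of a typical model comparison under stated assumptions: a stratified train/test split, `class_weight='balanced'` to counter the churn imbalance, and test-set ROC-AUC as the metric. It uses scikit-learn only, on synthetic data from `make_classification`; `compare_models` is a hypothetical helper. XGBoost is omitted to avoid the extra dependency, although `xgboost.XGBClassifier` exposes the same `fit`/`predict_proba` interface and could be added to the `models` dict.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, seed=42):
    """Fit baseline classifiers on a stratified split and return test ROC-AUC."""
    # stratify=y keeps the churn rate equal in the train and test splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    models = {
        # class_weight='balanced' up-weights the minority (churn) class
        'logreg': make_pipeline(
            StandardScaler(),
            LogisticRegression(max_iter=1000, class_weight='balanced'),
        ),
        'random_forest': RandomForestClassifier(
            n_estimators=100, class_weight='balanced', random_state=seed
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        proba = model.predict_proba(X_te)[:, 1]  # probability of churn
        scores[name] = roc_auc_score(y_te, proba)
    return scores


# Synthetic stand-in for the encoded features, with a minority positive class.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.73], random_state=0)
scores = compare_models(X, y)
print(scores)
```

ROC-AUC is threshold-free, which suits the later ROI step: that step ranks customers by probability rather than using a fixed cutoff.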

## 5) SHAP Explainability
- Global importance: `reports/figures/shap_summary.png`
- Single-customer explanation: `reports/figures/shap_force_single_customer.png`

**Business interpretation template:**
- This customer is predicted to churn mainly because of short tenure, high monthly charges, and a high-risk (month-to-month) contract.
- Retention actions for this segment should prioritize plan redesign and contract upgrades.
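The SHAP figures are presumably produced by `train.py`, likely with the `shap` package. For intuition about what a single-customer explanation means: for a linear model such as the logistic regression, with features treated as independent, the SHAP value of feature `j` on the log-odds scale is `w_j * (x_j - mean(x_j))`. These contributions sum to the gap between that customer's log-odds and the average log-odds. The sketch below checks that identity with NumPy on synthetic data; `linear_shap_values` is a hypothetical helper, not part of `shap`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def linear_shap_values(model, X_background, x):
    """SHAP values (log-odds scale) for a fitted binary linear classifier.

    Assumes independent features, so phi_j = w_j * (x_j - mean_j),
    where the mean is taken over the background data.
    """
    w = model.coef_.ravel()
    return w * (x - X_background.mean(axis=0))


X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

x = X[0]  # one "customer"
phi = linear_shap_values(model, X, x)
base_value = model.decision_function(X).mean()  # average log-odds

# Additivity: base value + contributions recovers this customer's log-odds.
print(np.isclose(base_value + phi.sum(), model.decision_function(x.reshape(1, -1))[0]))
# Feature indices ranked by absolute contribution, as a force plot orders them.
print(np.argsort(-np.abs(phi)))
```

This closed form applies only to the linear model; tree models such as RF and XGBoost need a tree-specific SHAP algorithm.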

## 6) ROI Simulation
- Target: top 20% of customers by predicted churn probability
- Incentive: $20 per targeted customer
- Average revenue: $70 per customer

`python ../src/train.py` also saves the final cost-benefit summary to `reports/roi_summary.json`.
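The ROI parameters above can be turned into a simple cost-benefit calculation. This is a minimal sketch that assumes two things: the incentive is paid to every targeted customer, and a hypothetical `save_rate` (here 30%) of the targeted true churners are retained because of it. Neither assumption comes from `train.py`, whose exact formula is not shown, and `simulate_roi` is a hypothetical helper.

```python
import numpy as np


def simulate_roi(proba, y_true, top_frac=0.20, incentive=20.0,
                 avg_revenue=70.0, save_rate=0.30):
    """Cost-benefit of offering an incentive to the riskiest customers.

    save_rate is a HYPOTHETICAL assumption: the share of targeted true
    churners who stay because of the incentive.
    """
    proba = np.asarray(proba, dtype=float)
    y_true = np.asarray(y_true, dtype=int)
    # Number of customers to target (at least one)
    n_target = max(1, int(round(top_frac * len(proba))))
    # Indices of the n_target highest predicted churn probabilities
    targeted = np.argsort(-proba)[:n_target]
    churners_reached = int(y_true[targeted].sum())
    cost = n_target * incentive
    revenue_saved = churners_reached * save_rate * avg_revenue
    net = revenue_saved - cost
    return {
        'n_targeted': n_target,
        'churners_reached': churners_reached,
        'cost': cost,
        'revenue_saved': revenue_saved,
        'net_benefit': net,
        'roi': net / cost if cost else float('nan'),
    }


# Toy scores and labels for illustration only.
proba = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
y_true = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
summary = simulate_roi(proba, y_true)
print(summary)
```

In the notebook, pass test-set probabilities (e.g. `model.predict_proba(X_test)[:, 1]`) and the matching labels. If the $70 is a monthly figure, `avg_revenue` should be scaled by the number of months a retained customer is expected to stay.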