# Model Comparison and Explainability

This notebook provides a comparative analysis of the machine learning models
used for customer churn prediction. The models are evaluated in terms of
predictive performance, interpretability, and practical applicability.

## Overview of Evaluated Models

Three different classification models were evaluated in this project:

- **Logistic Regression** – a linear and highly interpretable baseline model.
- **Gradient Boosting** – a non-linear ensemble method capable of capturing
  complex feature interactions.
- **Random Forest** – an ensemble of decision trees focused on stability and
  variance reduction.

Each model was trained using the same preprocessing pipeline and data split to
ensure a fair comparison.
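
A minimal sketch of such a shared setup, assuming a scikit-learn workflow (the notebook's own code is not shown here): one `ColumnTransformer` reused by every model and a single stratified train/test split. The toy DataFrame, its column names (`tenure`, `MonthlyCharges`, `Contract`), the random labels, and the split parameters are hypothetical placeholders, not the project's data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for the churn dataset (column names are assumptions).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "tenure": rng.integers(0, 72, n),
    "MonthlyCharges": rng.uniform(20, 120, n),
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], n),
})
y = (rng.random(n) < 0.27).astype(int)  # random labels, for illustration only

numeric = ["tenure", "MonthlyCharges"]
categorical = ["Contract"]

def make_preprocessor():
    # Identical preprocessing for every model: scale numerics, one-hot encode categoricals.
    return ColumnTransformer([
        ("num", StandardScaler(), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])

# One stratified split shared by all models, so every model sees the same rows.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}

# Each model gets its own fresh copy of the same preprocessing steps.
pipelines = {
    name: Pipeline([("prep", make_preprocessor()), ("model", clf)]).fit(X_train, y_train)
    for name, clf in models.items()
}
```

Wrapping preprocessing and model in one `Pipeline` keeps the scaler and encoder fitted on the training split only, so no test information leaks into any of the three models.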

## Summary of Model Performance

| Model                | ROC-AUC | Recall (Churn) | Accuracy | Interpretability |
|----------------------|--------:|---------------:|---------:|------------------|
| Logistic Regression  | 0.842   | 0.56           | 0.81     | High             |
| Gradient Boosting    | 0.846   | 0.51           | 0.80     | Medium           |
| Random Forest        | 0.829   | 0.49           | 0.80     | Medium           |

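ROC-AUC is treated as the primary evaluation metric because of the class
imbalance: accuracy can look high simply by favoring the majority class, whereas
ROC-AUC is threshold-independent and reflects the model’s ability to rank
customers by churn risk.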
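
The ranking interpretation can be made concrete: ROC-AUC equals the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as one half. The sketch below checks this on a small hypothetical set of labels and scores (not project data) against scikit-learn's `roc_auc_score`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pairwise_auc(y_true, scores):
    """ROC-AUC as the share of (churner, non-churner) pairs ranked correctly;
    tied scores count as half a correct ranking."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Score difference for every (churner, non-churner) pair.
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Hypothetical labels and churn scores, for illustration only.
y_true = np.array([0, 0, 1, 0, 1, 0, 1, 0])
scores = np.array([0.10, 0.35, 0.80, 0.40, 0.35, 0.20, 0.90, 0.60])

# Both equal 12.5 / 15 (about 0.833): 12.5 of the 15 pairs are ranked correctly.
print(pairwise_auc(y_true, scores), roc_auc_score(y_true, scores))
```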

## Model-Specific Explainability

### Logistic Regression

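Logistic Regression offers direct interpretability through model coefficients.
Each coefficient gives the direction and magnitude of a feature’s association
with the log-odds of churn, holding all other features constant; exponentiating
a coefficient yields an odds ratio. Coefficient magnitudes are only comparable
across features when the features share a common scale (e.g., after
standardization).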

This model provides clear insights into key churn drivers such as contract type,
customer tenure, and pricing-related variables, making it especially suitable
for business communication and transparency.
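
A minimal sketch of reading coefficients as log-odds and odds ratios, assuming scikit-learn. The data are synthetic from `make_classification`, and the feature names (`tenure`, `monthly_charges`, `is_month_to_month`, `num_services`) are hypothetical labels attached to synthetic columns, so the signs and magnitudes carry no business meaning.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data; the feature names are hypothetical.
X, y = make_classification(n_samples=1000, n_features=4, n_informative=3,
                           n_redundant=0, weights=[0.73], random_state=0)
feature_names = ["tenure", "monthly_charges", "is_month_to_month", "num_services"]

# Standardizing first puts all coefficients on a per-standard-deviation scale.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
coefs = pipe[-1].coef_.ravel()

summary = pd.DataFrame({
    "coef_log_odds": coefs,       # change in log-odds of churn per +1 SD of the feature
    "odds_ratio": np.exp(coefs),  # multiplicative change in the odds of churn
}, index=feature_names).sort_values("coef_log_odds", key=np.abs, ascending=False)
print(summary)
```

An odds ratio above 1 means the feature raises the odds of churn; below 1 means it lowers them, with all other features held fixed.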

### Gradient Boosting

Gradient Boosting can capture non-linear relationships and feature interactions
that a linear model cannot represent without explicit feature engineering. It
achieves the highest ROC-AUC among the evaluated models (0.846), although its
margin over Logistic Regression (0.842) is small.

However, the model is less transparent than Logistic Regression. Feature
importance scores provide a high-level view of influential variables, but
individual predictions are harder to interpret directly.
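
A minimal sketch of two common global importance views for a gradient-boosted model, assuming scikit-learn and synthetic data with placeholder feature names: impurity-based `feature_importances_` (derived from the training fit) and permutation importance, measured here as the drop in held-out ROC-AUC when one feature is shuffled. Both summarize the model as a whole and do not explain individual predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data with placeholder feature names.
X, y = make_classification(n_samples=1000, n_features=5, n_informative=3,
                           n_redundant=0, weights=[0.73], random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

gb = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Impurity-based importances: computed from the training fit, normalized to sum to 1.
impurity = gb.feature_importances_

# Permutation importances: mean drop in test ROC-AUC when a single feature is shuffled.
perm = permutation_importance(gb, X_test, y_test, scoring="roc_auc",
                              n_repeats=10, random_state=0)

# Report features from most to least important by permutation importance.
for i in np.argsort(-perm.importances_mean):
    print(f"{feature_names[i]:>10}  impurity={impurity[i]:.3f}  "
          f"perm_auc_drop={perm.importances_mean[i]:.3f}")
```

Permutation importance on held-out data is often preferred as a cross-check, since impurity-based scores are computed on training data and can favor high-cardinality features.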

### Random Forest

Random Forest aggregates predictions from multiple decision trees to improve
stability and reduce variance. In this project, the model demonstrates strong
performance on the majority (non-churn) class but exhibits reduced sensitivity
to churned customers.

Despite its ability to model complex patterns, Random Forest has both the lowest
ROC-AUC (0.829) and the lowest churn recall (0.49) of the three models on this
dataset.
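
The gap between majority-class performance and churn sensitivity shows up when recall is computed per class rather than as a single accuracy figure. A minimal sketch, assuming scikit-learn and synthetic imbalanced data (the numbers it prints are not the project's results):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (about 27% positives, with some label noise).
X, y = make_classification(n_samples=2000, n_features=8, n_informative=4,
                           weights=[0.73], flip_y=0.05, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=1)

rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_train, y_train)
# predict() returns the class with the highest tree-averaged probability.
y_pred = rf.predict(X_test)

# average=None returns one recall per class, in the order given by labels: [non-churn, churn].
recall_per_class = recall_score(y_test, y_pred, average=None, labels=[0, 1])
print(f"recall non-churn={recall_per_class[0]:.2f}, recall churn={recall_per_class[1]:.2f}")
```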

## Comparative Discussion

The results point to a trade-off between interpretability and predictive power,
although the differences in ROC-AUC between the models are small (0.829 to
0.846).

Logistic Regression is the most interpretable model and provides valuable
business insights; at the default threshold it also achieves the highest churn
recall (0.56) and accuracy (0.81). However, its linear form limits its ability
to capture non-linear churn dynamics without manual feature engineering. Random
Forest does not improve upon the baseline and shows the lowest recall for
churned customers.

Gradient Boosting achieves the best discriminative performance as measured by
ROC-AUC, suggesting that non-linear patterns and feature interactions play a
role in customer churn behavior, though the narrow margin indicates that this
role is modest.

## Final Model Selection

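Considering both predictive performance and practical applicability, Gradient
Boosting is selected as the preferred model for churn prediction in this project,
primarily on the basis of its ROC-AUC. Because its churn recall at the default
threshold (0.51) is lower than that of Logistic Regression, its decision
threshold should be tuned to the business objective before use.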

Logistic Regression remains an essential baseline due to its transparency and
interpretability, serving as a reliable reference for understanding churn
drivers and communicating results to non-technical stakeholders.

## Limitations and Future Work

This analysis focuses on baseline configurations of each model without extensive
hyperparameter tuning. Additionally, threshold-dependent metrics (recall and
accuracy) were evaluated at the default classification threshold, typically a
predicted churn probability of 0.5.

Future work may include:
- Threshold tuning to optimize churn recall based on business objectives.
- Advanced explainability techniques (e.g., SHAP) for non-linear models.
- Cost-sensitive evaluation to better reflect real-world retention strategies.
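
A minimal sketch of how threshold tuning and cost-sensitive evaluation could fit together, assuming scikit-learn and synthetic data. The target recall (`TARGET_RECALL`) and the per-error costs (`COST_FN`, `COST_FP`) are hypothetical values chosen for illustration, not figures from this project; a real analysis would tune on a validation split and keep the test set for the final estimate.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data; thresholds are tuned on a validation split.
X, y = make_classification(n_samples=2000, n_features=8, n_informative=4,
                           weights=[0.73], random_state=2)
X_train, X_val, y_train, y_val = train_test_split(X, y, stratify=y, random_state=2)
proba = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_val)[:, 1]

# Hypothetical business costs (assumptions, not project figures).
COST_FN = 500.0  # missed churner
COST_FP = 50.0   # retention offer sent to a customer who would have stayed

def evaluate(threshold):
    """Churn recall and total misclassification cost at a given threshold."""
    y_pred = (proba >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_val, y_pred, labels=[0, 1]).ravel()
    recall = tp / (tp + fn)
    cost = fn * COST_FN + fp * COST_FP
    return recall, cost

thresholds = np.linspace(0.05, 0.95, 19)
results = [(t, *evaluate(t)) for t in thresholds]

# Option A: the highest threshold that still reaches the target churn recall
# (recall can only fall as the threshold rises).
TARGET_RECALL = 0.75
meets_target = [t for t, r, _ in results if r >= TARGET_RECALL]
t_recall = max(meets_target) if meets_target else None

# Option B: the threshold with the lowest total misclassification cost.
t_cost = min(results, key=lambda row: row[2])[0]
print(f"threshold for recall>={TARGET_RECALL}: {t_recall}; cost-minimizing threshold: {t_cost:.2f}")
```

Option A encodes a recall requirement directly, while Option B lets the relative cost of missed churners and unnecessary offers determine the threshold.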

## Conclusion

This project demonstrates a structured and comparative approach to customer
churn prediction, emphasizing not only model performance but also interpretability
and decision-making trade-offs.

By combining transparent baselines with more expressive ensemble methods, the
analysis provides a balanced and practical framework for selecting machine
learning models in real-world churn prediction tasks.