# Loan Approval Classification - Technical Report


## Objective

The goal of this project is to predict **loan approval status** (`loan_status`) using a classification pipeline that handles:
- Data preprocessing (log transforms, scaling, encoding),
- Class imbalance (undersampling),
- Ensemble learning (Voting Classifier),
- Probability calibration,
- Threshold tuning for optimal F1-score.



## Dataset

The dataset consists of records containing applicant information and loan details. The target variable is:
- `loan_status`: Binary label (1 = approved, 0 = rejected)

### Features used:
- **Numerical:** `person_age`, `person_emp_exp`, `cb_person_cred_hist_length`, `credit_score`
- **Log-transformed:** `person_income`, `loan_amnt`
- **Categorical:** All object-dtype columns (e.g., `loan_intent`, `loan_grade`)
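
The three feature groups map naturally onto a scikit-learn `ColumnTransformer`. The following is a minimal sketch rather than the project's actual code: the helper name `build_preprocessor`, the use of `np.log1p` (instead of a plain log), and `handle_unknown="ignore"` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

NUM_COLS = ["person_age", "person_emp_exp", "cb_person_cred_hist_length", "credit_score"]
LOG_COLS = ["person_income", "loan_amnt"]


def build_preprocessor(X: pd.DataFrame) -> ColumnTransformer:
    """Build the preprocessing step for a feature frame X (target column excluded)."""
    # Categorical features: every object-dtype column present in X.
    cat_cols = X.select_dtypes(include="object").columns.tolist()

    # Skewed monetary features: log-transform, then standardize.
    # Assumption: log1p is used here so that zero values stay finite.
    log_scale = Pipeline([
        ("log", FunctionTransformer(np.log1p)),
        ("scale", StandardScaler()),
    ])

    return ColumnTransformer([
        ("log", log_scale, LOG_COLS),
        ("num", StandardScaler(), NUM_COLS),
        # Assumption: categories unseen during fit are encoded as all zeros.
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ])
```

Calling `build_preprocessor(X_train).fit_transform(X_train)` returns one scaled column per numerical or log-transformed feature plus one indicator column per category level.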



## Modeling Pipeline

1. **Preprocessing**
   - `person_income` and `loan_amnt` were log-transformed and scaled.
   - Other numerical features were standardized.
   - Categorical features were one-hot encoded using `OneHotEncoder`.

2. **Imbalance Handling**
   - Used `RandomUnderSampler` (imbalanced-learn) to downsample the majority class so that the two classes are balanced.

3. **Modeling**
   - Employed a **VotingClassifier** (soft voting) combining:
     - `DecisionTreeClassifier`
     - `LogisticRegression`
     - `RandomForestClassifier`
   - All models use `class_weight='balanced'`.

4. **Calibration**
   - Applied **Isotonic Calibration** via `CalibratedClassifierCV` to improve probability estimates.

5. **Threshold Tuning**
   - Evaluated different probability thresholds to optimize **F1-score** using `precision_recall_curve`.
   - Selected the best threshold based on maximum F1.



## Experiment Tracking with MLflow

All model training steps and evaluation metrics were tracked using MLflow, including:

- Model parameters (e.g., max depth, regularization strength, class weights)
- Preprocessing steps (e.g., log-transformed features, encoding type)
- Performance metrics (Accuracy, Precision, Recall, F1, AUC, Brier Score)
- Calibrated model and best-found threshold
- The trained pipeline, logged as a reproducible MLflow model artifact

This allows for:
- Reproducibility of experiments
- Easy comparison of different model runs
- Deployment-ready model tracking


## Evaluation

Two evaluation settings were compared:

### Threshold = 0.5 (Default)

| Class | Precision | Recall | F1-score |
|-------|-----------|--------|----------|
| 0     | 0.97      | 0.81   | 0.88     |
| 1     | 0.58      | 0.92   | 0.71     |

- **Accuracy:** 0.83  
- **Macro Avg F1:** 0.80  
- **Weighted Avg F1:** 0.84  

=> High **recall** on class `1` (0.92), but precision is only 0.58, so roughly 42% of predicted approvals are false positives.

---

### Threshold = Best F1 (≈ 0.57)

| Class | Precision | Recall | F1-score |
|-------|-----------|--------|----------|
| 0     | 0.92      | 0.92   | 0.92     |
| 1     | 0.74      | 0.74   | 0.74     |

- **Accuracy:** 0.88  
- **Macro Avg F1:** 0.83  
- **Weighted Avg F1:** 0.88  

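=> Precision and recall are now equal within each class (0.92 for class `0`, 0.74 for class `1`). Relative to the default threshold, class `1` gives up 0.18 recall to gain 0.16 precision.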
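
A best-F1 threshold like the one used in this section can be located from the output of `precision_recall_curve`. The following is a minimal sketch; the function name `best_f1_threshold` is hypothetical, and it assumes `y_prob` holds the calibrated probabilities for class `1`.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def best_f1_threshold(y_true, y_prob):
    """Return (threshold, f1) for the threshold that maximizes F1 on class 1."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
    # The final (precision=1, recall=0) point has no matching threshold, so drop it.
    p, r = precision[:-1], recall[:-1]
    denom = p + r
    # F1 = 2PR / (P + R), defined as 0 where P + R == 0.
    f1 = np.divide(2 * p * r, denom, out=np.zeros_like(denom), where=denom > 0)
    best = int(np.argmax(f1))
    return float(thresholds[best]), float(f1[best])
```

Predictions at the chosen threshold are then `(y_prob >= threshold).astype(int)`. Selecting the threshold on a validation split rather than on the final test data keeps the reported metrics unbiased.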



## Additional Metrics

- **AUC-ROC:** `~0.94`
- **Average Precision (PR AUC):** `~0.88`
- **Brier Score:** `0.1004`
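  - Interpretation: the Brier score is the mean squared error between predicted probabilities and actual outcomes (0 is perfect, lower is better). It reflects both calibration and discrimination, so a low value is consistent with good calibration but does not establish it by itself.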
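
For reference, the Brier score can be reproduced by hand as a mean squared error. The sketch below uses toy labels and probabilities, not the report's data, and the helper `base_rate_brier` is a hypothetical name.

```python
import numpy as np
from sklearn.metrics import brier_score_loss

# Toy values for illustration only (not taken from the report's dataset).
y_true = np.array([0, 1, 1, 0])
y_prob = np.array([0.1, 0.9, 0.8, 0.3])

# Library value and the equivalent manual computation: mean((p - y)^2).
brier = brier_score_loss(y_true, y_prob)
manual = float(np.mean((y_prob - y_true) ** 2))


def base_rate_brier(y):
    """Brier score of a constant predictor that always outputs the positive rate."""
    pi = float(np.mean(y))
    return pi * (1.0 - pi)
```

Comparing the model's score with `base_rate_brier(y_true)` shows how much the model improves on always predicting the class frequency.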



## Conclusions

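- Threshold tuning raised accuracy from 0.83 to 0.88 and class `1` F1 from 0.71 to 0.74, at the cost of lower class `1` recall (0.92 to 0.74).
- Isotonic calibration refined the predicted probabilities (Brier score 0.1004).
- At the tuned threshold, the model performs solidly on both precision and recall (0.92 each for class `0`, 0.74 each for class `1`), with an AUC-ROC of about 0.94.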



## Future Improvement

- Deploy the model with FastAPI and Kafka to serve streaming predictions in a cloud-native setup.
