# Access Permission Approval Prediction

This project predicts whether internal app permission requests will be approved, using historical decisions plus user, application, and organizational context. The model is a CatBoost classifier, validated with user-grouped cross-validation to avoid leakage.


## Summary
- Data: 116k historical decisions (3.7k users, 509 apps), balanced target (~50%).
- Model: CatBoost classifier on mixed tabular features (app identity/category, org context, machine flag, manager signals).
- Validation: GroupKFold by user to mirror deployment on new users; probabilities used for thresholded decisions.
- OOF metrics: ROC-AUC **0.932**, PR-AUC **0.930**, Accuracy **0.857**.
- Operating point: threshold **0.542** → TPR ~0.854, FPR ~0.140.


## Problem Overview
The task is to predict whether an application permission request will be granted or denied, based on historical decisions and associated metadata about users, applications, and organizational context. The goal is to provide accurate, calibrated predictions that could support or automate parts of the decision process. Success is measured by the model's ability to distinguish approvals from denials, maintain consistent performance under leakage-aware validation, and produce probabilities that remain reliable when converted to binary decisions.


## Data Audit
The dataset comprises 116,063 historical permission decisions involving 3,724 users and 509 applications. Initial exploration revealed:

- **Balanced target**: ~50.1% of requests were approved, so the two classes are nearly equal in size.
- **Clean data**: Keys in metadata are unique; there are no contradictory labels for a (user, app) pair and no exact duplicates.
- **Missing values**: Significant gaps in organizational fields (40% for department, 22% for office location), while app category shows minimal missingness (3.5%).

The high missingness in organizational fields motivates recovering department and office from manager metadata (see Feature Engineering).
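
The integrity checks listed above can be sketched with pandas. This is a minimal illustration, not the project's audit code: the label column name `granted` is an assumption, and the decisions table is assumed to carry `userId` and `appId` columns.

```python
import pandas as pd


def audit_decisions(df: pd.DataFrame, label: str = "granted") -> dict:
    """Return basic integrity checks for a table of permission decisions."""
    # Exact duplicate rows across all columns.
    n_exact_dupes = int(df.duplicated().sum())
    # (user, app) pairs that carry more than one distinct label.
    labels_per_pair = df.groupby(["userId", "appId"])[label].nunique()
    n_contradictory = int((labels_per_pair > 1).sum())
    # Share of missing values per column.
    missing_rate = df.isna().mean().round(3).to_dict()
    return {
        "exact_duplicates": n_exact_dupes,
        "contradictory_pairs": n_contradictory,
        "missing_rate": missing_rate,
    }
```

On a clean dataset, `exact_duplicates` and `contradictory_pairs` should both be zero, and `missing_rate` should reproduce the per-field gaps reported above.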


## Exploratory Data Analysis
Several patterns emerged from examining main effects:

* **Organizational structure**: Approval rates vary substantially around the global base rate across departments and offices, including high-volume ones (e.g., department 3 is much higher than average, while departments 1 and 4 are lower).
* **User type**: Machine accounts have very low grant rates (~8.5%) but represent only ~2% of requests, so the flag is informative but affects few rows.
* **Seniority**: Mid-level roles trend higher, but effects are modest compared with app and organizational signals.

These findings justify investing in app identity and organizational context while treating rare categories and small groups cautiously.


## Feature Engineering

### Missing Value Handling
* Use a sentinel (`__UNKNOWN__`) for missing categoricals so the model can learn missingness as signal.
* Join manager metadata (department, office, seniority) and **backfill** a user's missing field from the manager's value only when that value is known; otherwise retain the sentinel. Users keep their own values where present, and missingness is preserved wherever no context can be recovered.
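
A minimal sketch of the sentinel and manager backfill, assuming a single `users` table with `userId`, `managerId`, `department`, and `office` columns in which managers also appear as rows. The function name and the `manager_department` / `manager_office` column names are illustrative assumptions; the real pipeline may structure the join differently.

```python
import pandas as pd

UNKNOWN = "__UNKNOWN__"


def backfill_from_manager(users: pd.DataFrame) -> pd.DataFrame:
    """Attach manager metadata and fill missing user fields from it."""
    # Look up each user's manager in the same table (self-join on managerId).
    managers = users[["userId", "department", "office"]].rename(
        columns={
            "userId": "managerId",
            "department": "manager_department",
            "office": "manager_office",
        }
    )
    # validate raises if a managerId matches more than one manager row.
    out = users.merge(managers, on="managerId", how="left", validate="many_to_one")
    for col in ["department", "office"]:
        # Keep the user's own value; use the manager's value only where the
        # user's is missing; anything still missing becomes the sentinel.
        out[f"{col}_filled"] = out[col].fillna(out[f"manager_{col}"]).fillna(UNKNOWN)
        out[f"manager_{col}"] = out[f"manager_{col}"].fillna(UNKNOWN)
    return out
```

Because the sentinel is applied last, a row ends up as `__UNKNOWN__` only when both the user's and the manager's values are missing.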

### Identity Management
* Exclude `userId` to avoid user-specific memorization and steer the model toward app/org patterns that generalize.
* Retain `managerId` to capture team-level norms available at inference time.
* Add `manager_seniority_diff` (user seniority - manager seniority) to encode hierarchy; leave NA where no manager exists.
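
The seniority difference could be computed roughly as follows. The ordinal scale in `SENIORITY_RANK` is hypothetical (this document does not list the actual level names), and the sketch assumes `userId` is unique in the `users` table.

```python
import pandas as pd

# Hypothetical ordinal scale; the real level names are not given in this document.
SENIORITY_RANK = {"junior": 1, "mid": 2, "senior": 3, "executive": 4}


def add_manager_seniority_diff(users: pd.DataFrame) -> pd.DataFrame:
    """Add manager_seniority_diff = user rank - manager rank (NaN if unknown)."""
    out = users.copy()
    out["seniority_rank"] = out["seniority"].map(SENIORITY_RANK)
    # Map each managerId to that manager's rank; users without a manager get NaN.
    rank_by_user = out.set_index("userId")["seniority_rank"]
    manager_rank = out["managerId"].map(rank_by_user)
    # NaN propagates when either side is unknown, leaving the feature as NA.
    out["manager_seniority_diff"] = out["seniority_rank"] - manager_rank
    return out
```

CatBoost accepts NaN in numeric features, so the NA values can be passed through without imputation.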

### Categorical Encoding Notes
* Treat IDs/codes (`appId`, `managerId`, `department_filled`, `office_filled`, `category`) as nominal strings and pass as categorical features to CatBoost.
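
As an illustration of the note above, the sketch below casts the listed columns to strings and declares them through CatBoost's `cat_features` argument. The helper names and the hyperparameters (`verbose=0`, `random_seed=0`) are assumptions, not the project's configuration; `catboost` is imported lazily so the casting helper can be used without it.

```python
import pandas as pd

CAT_FEATURES = ["appId", "managerId", "department_filled", "office_filled", "category"]
UNKNOWN = "__UNKNOWN__"


def as_nominal_strings(X: pd.DataFrame, cat_features=CAT_FEATURES) -> pd.DataFrame:
    """Cast ID/code columns to strings so numeric codes are not read as ordinal."""
    out = X.copy()
    for col in cat_features:
        # Missing values become the sentinel rather than the string "nan".
        out[col] = out[col].map(lambda v: UNKNOWN if pd.isna(v) else str(v))
    return out


def fit_catboost(X: pd.DataFrame, y, cat_features=CAT_FEATURES):
    """Fit a CatBoost classifier with the listed columns declared categorical."""
    from catboost import CatBoostClassifier  # lazy import: optional dependency

    model = CatBoostClassifier(verbose=0, random_seed=0)
    model.fit(as_nominal_strings(X, cat_features), y, cat_features=list(cat_features))
    return model
```

Casting first matters because CatBoost rejects floating-point NaN in categorical columns, and an integer code left as a number would otherwise be treated as a numeric feature.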


## Modeling and Validation

### Model Selection
**CatBoost** was chosen for:
- Native handling of high-cardinality categorical features
- Robust treatment of missing values
- Strong performance on mixed-type tabular data

### Validation Strategy
Evaluation uses **GroupKFold cross-validation** grouped by `userId`, so each user's requests fall entirely within one fold. This prevents leakage from repeated user patterns and matches the intended generalization scenario: new users requesting apps the model has already seen. AUC is consistent across folds, which suggests the learned patterns generalize to unseen users.
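
A minimal sketch of user-grouped out-of-fold (OOF) scoring with scikit-learn's `GroupKFold`. The `make_model` factory and the choice of `n_splits=5` are assumptions (the number of folds is not stated here); any estimator exposing `fit` and `predict_proba` can be plugged in.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupKFold


def oof_probabilities(make_model, X: pd.DataFrame, y, groups, n_splits: int = 5):
    """Out-of-fold P(approved), with every user confined to a single fold."""
    y = np.asarray(y)
    groups = np.asarray(groups)
    oof = np.full(len(y), np.nan)
    splitter = GroupKFold(n_splits=n_splits)
    for train_idx, valid_idx in splitter.split(X, y, groups):
        # Sanity check: no userId sits on both sides of the split.
        assert not set(groups[train_idx]) & set(groups[valid_idx])
        model = make_model()  # fresh, unfitted model for each fold
        model.fit(X.iloc[train_idx], y[train_idx])
        oof[valid_idx] = model.predict_proba(X.iloc[valid_idx])[:, 1]
    return oof
```

Each row is scored exactly once by a model that never saw that row's user, so pooled OOF scores approximate performance on new users.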

### Performance (OOF)
- **ROC-AUC**: 0.932
- **PR-AUC**: 0.930
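
Both metrics can be computed from pooled OOF scores with scikit-learn. This sketch assumes PR-AUC is reported as average precision; the document does not say which PR-AUC estimator was used.

```python
from sklearn.metrics import average_precision_score, roc_auc_score


def ranking_metrics(y_true, y_score) -> dict:
    """ROC-AUC and PR-AUC (as average precision) for pooled out-of-fold scores."""
    return {
        "roc_auc": float(roc_auc_score(y_true, y_score)),
        "pr_auc": float(average_precision_score(y_true, y_score)),
    }
```

With a base rate near 50%, a random scorer would land around 0.5 on both metrics, which is the reference point for the values above.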


## Threshold Selection
Several criteria were considered (maximum accuracy, maximum F1, and a 0.50 reference). With classes nearly balanced and no stated asymmetry in error costs, maximizing accuracy is the neutral choice: it minimizes total error at a single threshold rather than overweighting the positive class.

**Selected threshold: t = 0.542**
- **Accuracy:** 85.7%
- **Precision:** 85.9%
- **Recall:** 85.4%
- **False positive rate:** 14.0%
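
The accuracy-maximizing search and the resulting operating point can be sketched as below. The 0.001-step threshold grid and the function names are assumptions; requests are treated as approved when `score >= threshold`.

```python
import numpy as np


def best_accuracy_threshold(y_true, y_score, grid=None):
    """Return (threshold, accuracy) maximizing accuracy over a threshold grid."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    if grid is None:
        grid = np.linspace(0.0, 1.0, 1001)  # assumed resolution of 0.001
    accuracies = np.array([np.mean((y_score >= t) == y_true) for t in grid])
    best = int(np.argmax(accuracies))  # first maximum if several thresholds tie
    return float(grid[best]), float(accuracies[best])


def operating_point(y_true, y_score, threshold):
    """Accuracy, precision, recall (TPR), and FPR at a fixed threshold."""
    y_true = np.asarray(y_true).astype(bool)
    pred = np.asarray(y_score, dtype=float) >= threshold
    tp = int(np.sum(pred & y_true))
    fp = int(np.sum(pred & ~y_true))
    fn = int(np.sum(~pred & y_true))
    tn = int(np.sum(~pred & ~y_true))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "recall": tp / (tp + fn) if tp + fn else float("nan"),
        "fpr": fp / (fp + tn) if fp + tn else float("nan"),
    }
```

As a consistency check on the reported numbers: with a ~50.1% approval rate, recall 0.854, and FPR 0.140, accuracy is about 0.501 × 0.854 + 0.499 × (1 − 0.140) ≈ 0.857, which matches the figure above.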


## Segment Coverage
- Evaluated top apps, departments, and manager departments (n>=100) at the chosen threshold.
- Per-segment metrics (count, approval rate, accuracy, ROC-AUC when defined) are stored in `reports/segment_metrics.json`.
- No single segment shows collapse; highest-volume apps and departments retain accuracy near the global operating point.
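
A sketch of how per-segment metrics of this kind could be produced from OOF scores. The column names `granted` and `oof_score` are hypothetical, and the layout of `reports/segment_metrics.json` may differ from the table returned here.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

COLUMNS = ["segment", "count", "approval_rate", "accuracy", "roc_auc"]


def segment_metrics(df, segment_col, threshold, min_count=100,
                    label="granted", score="oof_score") -> pd.DataFrame:
    """Per-segment count, approval rate, accuracy, and ROC-AUC (when defined)."""
    rows = []
    for value, seg in df.groupby(segment_col):
        if len(seg) < min_count:
            continue  # skip segments too small for stable estimates
        y = seg[label].to_numpy()
        p = seg[score].to_numpy()
        # ROC-AUC is undefined when a segment contains only one class.
        auc = float(roc_auc_score(y, p)) if len(np.unique(y)) == 2 else None
        rows.append({
            "segment": value,
            "count": int(len(seg)),
            "approval_rate": float(y.mean()),
            "accuracy": float(((p >= threshold) == (y == 1)).mean()),
            "roc_auc": auc,
        })
    table = pd.DataFrame(rows, columns=COLUMNS)
    return table.sort_values("count", ascending=False).reset_index(drop=True)
```

For example, `segment_metrics(oof_df, "appId", threshold=0.542)` would list the apps with at least 100 requests, largest first.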


## Key Figures
- ROC & PR: ![ROC & PR](../docs/figures/roc_pr.png)
- Score distribution: ![Score distribution](../docs/figures/score_distribution.png)
- Confusion matrix: ![Confusion matrix](../docs/figures/confusion_matrix.png)
- Calibration: ![Calibration](../docs/figures/calibration.png)
- SHAP importance: ![SHAP importance](../docs/figures/shap_importance.png)


## Feature Importance
CatBoost feature importances (relative influence on predictions, normalized to sum to 100 across all features; the main contributors are listed):

* **App**: **28.5%**. Individual applications are the largest source of signal.
* **Department**: **20.6%**. Organizational unit is strongly predictive.
* **Office**: **11.9%**. Location or functional division adds meaningful context.
* **Manager**: **11.5%**. Team-level patterns matter.
* **Manager office**: **9.0%**. Manager context reinforces organizational effects.
* **Category**: **4.1%**. Coarser app grouping contributes but far less than specific apps.

Overall, the model is app-driven and heavily informed by organizational context. Core org features account for ~46% of total importance.


## Considerations and Risks

* **Cost asymmetry**: The chosen operating point assumes roughly symmetric costs; adjust the threshold if costs differ.
* **Distribution shifts**: Shifts in app/department mix or approval rate can move the optimal threshold.
* **Cold start on new entities**: Expect lower accuracy on new apps/managers/departments/offices until history accrues.
* **Segment variability**: Aggregate metrics can hide weaker performance in specific segments; monitor segment-level metrics.


## Repro & Usage
- Train/evaluate: `python -m access_perm.train --config config/default.yaml` (artifacts in `models/` and `reports/`).
- Regenerate plots: `python -m access_perm.report --config config/default.yaml`.
- Score new requests: `python scripts/predict.py --config config/default.yaml --input data/submission.csv --output reports/predictions.csv`.
- The implementation lives in `src/access_perm` and is covered by tests and CLI entrypoints.
