
# MNPS Practical Evaluation Notebook (Data‑Science Style)

**Purpose:** Evaluate model outputs with simple accuracy gates and cost‑of‑error analysis instead of statistical inference.

- **Keep Section 4 as-is** (your batch inference that writes predictions to CSV).
- **Revised Sections 1–3**: configuration and helpers for the practical evaluation.
- **New Sections 5–7**: error thresholds and cost‑of‑error analysis for Major/Minor role groupings.

---



## 1) Overview & Acceptance Gates (Engineering Style)

We use clear, operational acceptance gates based on what an experienced HR job classification specialist would consider acceptable:

- **Major Role Grouping** acceptable error rate: **≤ 2%** (e.g., Analyst vs. Manager misgroup).
- **Minor Role Grouping** acceptable error rate: **≤ 5%** (e.g., I vs II vs III vs Lead).
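
As a small sketch of how such a gate might be applied: the counts below are made up for illustration, and `passes_gate` is a hypothetical helper, not a function defined in this notebook.

```python
def passes_gate(n_errors: int, n_total: int, max_error_rate: float) -> bool:
    """Return True when the observed error rate is at or below the gate."""
    if n_total <= 0:
        # With no predictions there is nothing to accept.
        return False
    return (n_errors / n_total) <= max_error_rate

# Illustrative counts only: 3 of 200 is 1.5%, 5 of 200 is 2.5%.
print(passes_gate(3, 200, 0.02))  # True  (within the 2% Major gate)
print(passes_gate(5, 200, 0.02))  # False (exceeds the 2% Major gate)
print(passes_gate(5, 200, 0.05))  # True  (within the 5% Minor gate)
```

The comparison is inclusive, so an error rate exactly at the threshold passes, matching the "≤" in the gate definitions.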

We also quantify **severity** via a **cost‑of‑error** analysis:
- For **Major** roles, the cost of an error is the **absolute difference** between the annual values mapped to the *true* vs the *predicted* group.
- For **Minor** roles, the cost of an error is the **absolute difference** between the values mapped to the *true* vs the *predicted* level.

> Rationale: Misclassifying across groups/levels can create pay, equity, and staffing impacts. Using the pay deltas captures the seriousness of the error.
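
To make the cost definition concrete, here is a minimal worked example for a single Analyst vs. Manager error. The two values are copied from the Major cost map in Section 2; `cost_of_single_error` is a hypothetical name used only for this illustration.

```python
# Annual values copied from the Major cost map in Section 2.
cost_map_major = {"Analyst": 78208.00, "Manager": 103904.00}

def cost_of_single_error(true_label, pred_label, cost_map):
    """Absolute difference between the mapped values; 0 for a correct prediction."""
    if true_label == pred_label:
        return 0.0
    return abs(cost_map[true_label] - cost_map[pred_label])

delta = cost_of_single_error("Analyst", "Manager", cost_map_major)
print(f"${delta:,.2f}")  # $25,696.00
```

Because the cost is an absolute difference, it is symmetric: predicting Manager for a true Analyst costs the same as predicting Analyst for a true Manager.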



## 2) Config & File Paths

Set your file paths and constants here. **Do not** change class labels unless your data uses a different spelling; these are the canonical labels used in this evaluation.


In [None]:

# ==== Required file paths (EDIT as needed) ====
PREDICTIONS_CSV = "data/batch_predictions.csv"  # produced by Section 4 (unchanged)

# ==== Class label vocabularies ====
MAJOR_CLASSES = ["Technician", "Specialist", "Analyst", "Manager", "Coordinator", "Director", "Other"]
MINOR_CLASSES = ["I", "II", "III", "Lead"]

# ==== Acceptance gates ====
ACCEPTABLE_MAJOR_ERROR_RATE = 0.02  # ≤ 2% errors
ACCEPTABLE_MINOR_ERROR_RATE = 0.05  # ≤ 5% errors

# ==== Cost maps (annual $) ====
COST_MAP_MAJOR = {
    "Technician": 54225.00,
    "Specialist": 68765.00,
    "Analyst":   78208.00,
    "Manager":  103904.00,
    "Coordinator": 123133.00,
    "Director": 146246.00,
    "Other":    0.00,   # Placeholder: any error involving "Other" costs the full value of the other group; adjust as needed.
}

COST_MAP_MINOR = {
    "I":    4405.00,
    "II":   3366.94,
    "III":  6800.00,
    "Lead": 6800.00,
}

# ==== Column names expected in predictions CSV ====
# Adjust here if your Section 4 output uses different names.
COL_RECORD_ID   = "record_id"
COL_TRUE_MAJOR  = "true_major_role"
COL_PRED_MAJOR  = "pred_major_role"
COL_TRUE_MINOR  = "true_minor_role"
COL_PRED_MINOR  = "pred_minor_role"

# ==== Output directory ====
OUTPUT_DIR = "artifacts_eval_practical"



## 3) Helpers

Utility functions for loading data, validating columns, computing confusion matrices, error rates, and cost‑of‑error metrics.
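
For orientation, the sketch below shows the confusion-matrix and error-rate idea on a toy frame. The column names `true` and `pred` and the three-label subset are assumptions for illustration, not the notebook's real data.

```python
import pandas as pd

labels = ["Analyst", "Manager", "Director"]  # illustrative subset of the Major labels
toy = pd.DataFrame({
    "true": ["Analyst", "Analyst", "Manager", "Manager"],
    "pred": ["Analyst", "Manager", "Manager", "Manager"],
})

cm = pd.crosstab(toy["true"], toy["pred"], rownames=["True"], colnames=["Pred"])
# Reindex so every label appears as both a row and a column, even if unseen.
cm = cm.reindex(index=labels, columns=labels, fill_value=0)

error_rate = (toy["true"] != toy["pred"]).mean()
print(cm)
print(f"error rate: {error_rate:.2%}")  # 1 of 4 rows is wrong -> 25.00%
```

Rows are true labels and columns are predicted labels, so off-diagonal cells are the misclassifications.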


In [None]:

import os
import math
import json
import pandas as pd
from pathlib import Path

Path(OUTPUT_DIR).mkdir(parents=True, exist_ok=True)

def _require_columns(df: pd.DataFrame, cols):
    missing = [c for c in cols if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}. "
                         f"Available: {list(df.columns)}")

def load_predictions(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)
    _require_columns(df, [COL_RECORD_ID, COL_TRUE_MAJOR, COL_PRED_MAJOR, COL_TRUE_MINOR, COL_PRED_MINOR])
    return df

def confusion(df: pd.DataFrame, true_col: str, pred_col: str, labels: list) -> pd.DataFrame:
    cm = pd.crosstab(df[true_col], df[pred_col], rownames=["True"], colnames=["Pred"], dropna=False)
    # Ensure all labels exist as rows/cols. Note: reindex also drops any value
    # (including NaN) that is not listed in `labels`.
    cm = cm.reindex(index=labels, columns=labels, fill_value=0)
    return cm

def accuracy_and_error_rate(df: pd.DataFrame, true_col: str, pred_col: str):
    total = len(df)
    # Cast to a built-in int: .sum() returns numpy.int64, which json.dump cannot serialize.
    correct = int((df[true_col] == df[pred_col]).sum())
    acc = correct / total if total > 0 else float('nan')
    err = 1.0 - acc if total > 0 else float('nan')
    return acc, err, correct, total

def cost_of_error_matrix(df: pd.DataFrame, true_col: str, pred_col: str, cost_map: dict, labels: list) -> pd.DataFrame:
    # Build a matrix of summed absolute cost deltas for each (true, pred) pair.
    # Labels missing from cost_map are treated as 0.0, matching summarize_costs.
    t_cost = df[true_col].map(cost_map).fillna(0.0)
    p_cost = df[pred_col].map(cost_map).fillna(0.0)
    # For correct cells (true==pred) both costs are equal, so the delta is 0.
    delta = (t_cost - p_cost).abs()
    cost_df = pd.DataFrame({"True": df[true_col], "Pred": df[pred_col], "CostDelta": delta})
    pivot = cost_df.pivot_table(index="True", columns="Pred", values="CostDelta", aggfunc="sum", fill_value=0.0)
    pivot = pivot.reindex(index=labels, columns=labels, fill_value=0.0)
    return pivot

def summarize_costs(df: pd.DataFrame, true_col: str, pred_col: str, cost_map: dict):
    # Total and average cost of misclassifications
    mask_wrong = df[true_col] != df[pred_col]
    if mask_wrong.any():
        deltas = (df.loc[mask_wrong, true_col].map(cost_map).fillna(0.0) - 
                  df.loc[mask_wrong, pred_col].map(cost_map).fillna(0.0)).abs()
        total_cost = float(deltas.sum())
        avg_cost = float(deltas.mean())
        n_errors = int(mask_wrong.sum())
    else:
        total_cost, avg_cost, n_errors = 0.0, 0.0, 0
    return {
        "n_errors": n_errors,
        "total_cost": total_cost,
        "avg_cost_per_error": avg_cost
    }

def export_csv(df: pd.DataFrame, name: str):
    out = Path(OUTPUT_DIR) / name
    df.to_csv(out, index=True)
    return str(out)

def export_json(obj, name: str):
    out = Path(OUTPUT_DIR) / name
    with open(out, "w", encoding="utf-8") as f:
        json.dump(obj, f, indent=2)
    return str(out)




## 4) Load Batch Predictions (Section 4 – Unchanged)

This cell **expects** that your existing batch-inference step (the original **Section 4**, kept unchanged) has already run.  
That step should produce a CSV at `PREDICTIONS_CSV` with the following columns:

- `record_id`
- `true_major_role` and `pred_major_role` (values in the set: Technician, Specialist, Analyst, Manager, Coordinator, Director, Other)
- `true_minor_role` and `pred_minor_role` (values in the set: I, II, III, Lead)

> If your column names differ, adjust the constants in Section 2.
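
Beyond column names, it may be worth checking that the label values themselves match the expected vocabularies, since a misspelled or differently cased label would otherwise be counted as an error or dropped from the matrices. The sketch below is one way to do that; `unexpected_labels` is a hypothetical helper and the toy frame is made up.

```python
import pandas as pd

def unexpected_labels(df: pd.DataFrame, cols, vocabulary) -> dict:
    """Map each column to the sorted list of its values that are not in the vocabulary."""
    allowed = set(vocabulary)
    found = {}
    for col in cols:
        extra = set(df[col].dropna().unique()) - allowed
        if extra:
            found[col] = sorted(extra)
    return found

# Toy frame with one label in the wrong case ("lead") for illustration.
toy = pd.DataFrame({
    "true_minor_role": ["I", "II", "Lead"],
    "pred_minor_role": ["I", "III", "lead"],
})
print(unexpected_labels(toy, ["true_minor_role", "pred_minor_role"], ["I", "II", "III", "Lead"]))
# {'pred_minor_role': ['lead']}
```

An empty dictionary means every non-missing value in the checked columns belongs to the vocabulary.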


In [None]:

preds = load_predictions(PREDICTIONS_CSV)
print(f"Loaded {len(preds):,} predictions")
preds.head(3)



## 5) Major Role Group Evaluation (Accuracy + Cost of Error)

- Compute Top‑1 accuracy and error rate for **Major** groups.
- Build a **confusion matrix** to see where misgrouping occurs.
- Build a **cost‑of‑error matrix** using the absolute pay delta between true and predicted groups.
- Check against **acceptable error** gate: ≤ 2%.

Exports:
- `confusion_major.csv`
- `cost_matrix_major.csv`
- `metrics_major.json`
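
To show what the cost-of-error matrix contains, here is a toy version built from four made-up predictions. The three values are copied from the Major cost map in Section 2; the column names `true`, `pred`, and `delta` are assumptions for this sketch.

```python
import pandas as pd

# Values copied from the Major cost map in Section 2 (subset for brevity).
cost_map = {"Analyst": 78208.00, "Manager": 103904.00, "Director": 146246.00}
labels = list(cost_map)

toy = pd.DataFrame({
    "true": ["Analyst", "Analyst", "Manager", "Director"],
    "pred": ["Manager", "Manager", "Manager", "Analyst"],
})

# Per-row absolute delta; correct rows map to equal values, so their delta is 0.
toy["delta"] = (toy["true"].map(cost_map) - toy["pred"].map(cost_map)).abs()

cost_matrix = (
    toy.pivot_table(index="true", columns="pred", values="delta", aggfunc="sum", fill_value=0.0)
       .reindex(index=labels, columns=labels, fill_value=0.0)
)
print(cost_matrix)
```

Each cell is the summed cost for that (true, predicted) pair: two Analyst-as-Manager errors add up to 51,392.00, and the single Director-as-Analyst error contributes 68,038.00.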


In [None]:

# Accuracy/error for Major
major_acc, major_err, major_correct, major_total = accuracy_and_error_rate(preds, COL_TRUE_MAJOR, COL_PRED_MAJOR)
major_gate_pass = (major_err <= ACCEPTABLE_MAJOR_ERROR_RATE)

# Confusion for Major
cm_major = confusion(preds, COL_TRUE_MAJOR, COL_PRED_MAJOR, MAJOR_CLASSES)
cm_major_path = export_csv(cm_major, "confusion_major.csv")

# Cost-of-error for Major
cost_major = cost_of_error_matrix(preds, COL_TRUE_MAJOR, COL_PRED_MAJOR, COST_MAP_MAJOR, MAJOR_CLASSES)
cost_major_path = export_csv(cost_major, "cost_matrix_major.csv")

# Summaries
cost_summary_major = summarize_costs(preds, COL_TRUE_MAJOR, COL_PRED_MAJOR, COST_MAP_MAJOR)

metrics_major = {
    "accuracy": major_acc,
    "error_rate": major_err,
    "correct": major_correct,
    "total": major_total,
    "accept_error_threshold": ACCEPTABLE_MAJOR_ERROR_RATE,
    "pass_gate": bool(major_gate_pass),
    "cost_summary": cost_summary_major,
    "artifacts": {
        "confusion_major_csv": cm_major_path,
        "cost_matrix_major_csv": cost_major_path
    }
}

metrics_major_path = export_json(metrics_major, "metrics_major.json")
print("Major accuracy:", round(major_acc*100, 2), "%")
print("Major error rate:", round(major_err*100, 2), "%", "| PASS gate:", major_gate_pass)
print("Major cost (total): ${:,.2f} | avg per error: ${:,.2f}".format(
    metrics_major["cost_summary"]["total_cost"],
    metrics_major["cost_summary"]["avg_cost_per_error"],
))
cm_major



## 6) Minor Role Group Evaluation (Accuracy + Cost of Error)

- Compute Top‑1 accuracy and error rate for **Minor** levels (I, II, III, Lead).
- Build a **confusion matrix**.
- Build a **cost‑of‑error matrix** using the absolute value difference between true and predicted levels.
- Check against **acceptable error** gate: ≤ 5%.

Exports:
- `confusion_minor.csv`
- `cost_matrix_minor.csv`
- `metrics_minor.json`
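
One property of the Minor cost map in Section 2 is worth keeping in mind: `III` and `Lead` map to the same value (6800.00), so an error between those two levels counts toward the error rate but adds nothing to the cost totals. The sketch below, on made-up rows, shows one way to count such zero-cost errors.

```python
import pandas as pd

# Values copied from the Minor cost map in Section 2.
cost_map_minor = {"I": 4405.00, "II": 3366.94, "III": 6800.00, "Lead": 6800.00}

toy = pd.DataFrame({
    "true": ["III", "Lead", "I", "II"],
    "pred": ["Lead", "III", "II", "II"],
})

wrong = toy["true"] != toy["pred"]
delta = (toy["true"].map(cost_map_minor) - toy["pred"].map(cost_map_minor)).abs()

# Errors the cost analysis cannot see because both labels share a mapped value.
zero_cost_errors = toy[wrong & (delta == 0)]
print(len(zero_cost_errors), "of", int(wrong.sum()), "errors carry zero cost")
```

If these errors matter in practice, reporting their count alongside the cost summary keeps them visible.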


In [None]:

# Accuracy/error for Minor
minor_acc, minor_err, minor_correct, minor_total = accuracy_and_error_rate(preds, COL_TRUE_MINOR, COL_PRED_MINOR)
minor_gate_pass = (minor_err <= ACCEPTABLE_MINOR_ERROR_RATE)

# Confusion for Minor
cm_minor = confusion(preds, COL_TRUE_MINOR, COL_PRED_MINOR, MINOR_CLASSES)
cm_minor_path = export_csv(cm_minor, "confusion_minor.csv")

# Cost-of-error for Minor
cost_minor = cost_of_error_matrix(preds, COL_TRUE_MINOR, COL_PRED_MINOR, COST_MAP_MINOR, MINOR_CLASSES)
cost_minor_path = export_csv(cost_minor, "cost_matrix_minor.csv")

# Summaries
cost_summary_minor = summarize_costs(preds, COL_TRUE_MINOR, COL_PRED_MINOR, COST_MAP_MINOR)

metrics_minor = {
    "accuracy": minor_acc,
    "error_rate": minor_err,
    "correct": minor_correct,
    "total": minor_total,
    "accept_error_threshold": ACCEPTABLE_MINOR_ERROR_RATE,
    "pass_gate": bool(minor_gate_pass),
    "cost_summary": cost_summary_minor,
    "artifacts": {
        "confusion_minor_csv": cm_minor_path,
        "cost_matrix_minor_csv": cost_minor_path
    }
}

metrics_minor_path = export_json(metrics_minor, "metrics_minor.json")
print("Minor accuracy:", round(minor_acc*100, 2), "%")
print("Minor error rate:", round(minor_err*100, 2), "%", "| PASS gate:", minor_gate_pass)
print("Minor cost (total): ${:,.2f} | avg per error: ${:,.2f}".format(
    metrics_minor["cost_summary"]["total_cost"],
    metrics_minor["cost_summary"]["avg_cost_per_error"],
))
cm_minor



## 7) Summary & Review Queue

Creates a compact summary of gates and surfaces the most **costly** errors for quick human review.
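
The ranking step can be sketched on a toy error table as follows. The rows are made up, and the per-dimension variant is an optional alternative rather than what this notebook does.

```python
import pandas as pd

errors = pd.DataFrame({
    "record_id": [101, 102, 103, 104],
    "dimension": ["major", "minor", "major", "minor"],
    "cost_delta": [25696.00, 1038.06, 68038.00, 0.00],
})

# Combined ranking: keep the k most costly rows (k=2 only to keep the toy output short).
top_k = errors.nlargest(2, "cost_delta").reset_index(drop=True)
print(top_k)

# Optional alternative: keep the most costly row per dimension instead.
per_dim = (
    errors.sort_values("cost_delta", ascending=False)
          .groupby("dimension")
          .head(1)
          .reset_index(drop=True)
)
print(per_dim)
```

With the Section 2 values, the smallest Major delta between two listed groups (9,443.00, Specialist vs. Analyst) is larger than the largest Minor delta (3,433.06, II vs. III or Lead), so a combined ranking lists Major errors ahead of all Minor errors. A per-dimension cut is one way to make sure Minor errors also reach reviewers.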


In [None]:

summary = {
    "major": metrics_major,
    "minor": metrics_minor,
    "gates": {
        "major_pass": metrics_major["pass_gate"],
        "minor_pass": metrics_minor["pass_gate"],
    }
}
summary_path = export_json(summary, "summary_overview.json")

# Build a review queue: the 200 most costly errors across major and minor combined, ranked by cost delta.
def _error_rows_with_cost(df, true_col, pred_col, cost_map, tag):
    mask = df[true_col] != df[pred_col]
    if not mask.any():
        return pd.DataFrame(columns=["record_id", "true", "pred", "cost_delta", "dimension"])
    t_cost = df.loc[mask, true_col].map(cost_map).fillna(0.0)
    p_cost = df.loc[mask, pred_col].map(cost_map).fillna(0.0)
    d = (t_cost - p_cost).abs()
    out = pd.DataFrame({
        "record_id": df.loc[mask, COL_RECORD_ID],
        "true": df.loc[mask, true_col],
        "pred": df.loc[mask, pred_col],
        "cost_delta": d,
        "dimension": tag
    })
    return out

err_major = _error_rows_with_cost(preds, COL_TRUE_MAJOR, COL_PRED_MAJOR, COST_MAP_MAJOR, "major")
err_minor = _error_rows_with_cost(preds, COL_TRUE_MINOR, COL_PRED_MINOR, COST_MAP_MINOR, "minor")

review = pd.concat([err_major, err_minor], ignore_index=True)
review = review.sort_values("cost_delta", ascending=False).head(200).reset_index(drop=True)

review_path = export_csv(review, "errors_sampled_for_review.csv")

print("Summary written to:", summary_path)
print("Review sample written to:", review_path)
review.head(10)
