# Fraud model training with MLflow (Phase 3)

This notebook demonstrates how to:

- Generate a small synthetic dataset (reusing the Phase 2 generator)
- Build a fraud training DataFrame
- Train an XGBoost model with Optuna tuning
- Log metrics, parameters, model, and SHAP artifacts to MLflow


In [1]:
from pathlib import Path

import mlflow

from scripts.seed_data import generate_synthetic_data
from common.model_utils import build_fraud_training_dataframe, train_fraud_model


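
The imports in the first cell need `mlflow` to be installed and the project packages `scripts` and `common` to be importable, which usually means the repository root is on `sys.path`. A minimal pre-flight sketch that reports which of them cannot be found is shown below. The helper name `find_missing` and the package list are illustrative assumptions, not part of the project.

```python
import importlib.util

# Top-level packages imported by the first cell. `scripts` and `common` are
# project-local, so they resolve only when the repository root is on sys.path.
REQUIRED_PACKAGES = ["mlflow", "scripts", "common"]


def find_missing(packages):
    """Return the names in `packages` that cannot be located for import."""
    missing = []
    for name in packages:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Treat lookup errors the same as "not found".
            spec = None
        if spec is None:
            missing.append(name)
    return missing


missing = find_missing(REQUIRED_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If `mlflow` is reported missing, install it into the kernel's environment. If `scripts` or `common` is missing, start the notebook from the repository root or add that directory to `sys.path` before importing.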

In [3]:
# Generate a modest dataset so the notebook runs quickly.

project_root = Path.cwd()
print(f"Project root: {project_root}")

event_metrics, user_metrics = generate_synthetic_data(
    n_events=50,
    n_users=500,
    n_transactions=5000,
    seed=42,
)

user_metrics.head()

Project root: /workspace/notebooks



In [None]:
# Build a training DataFrame with a simple binary label derived from fraud_risk_score.

train_df = build_fraud_training_dataframe(user_metrics, fraud_threshold=0.08)
train_df.head()
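
To illustrate what a threshold-derived label looks like, here is a minimal sketch in plain Python. The real logic lives in `build_fraud_training_dataframe` and is not shown in this notebook, so the function `label_from_score`, the `>=` comparison, and the sample scores are all assumptions made for illustration.

```python
def label_from_score(scores, fraud_threshold=0.08):
    """Return 1 where the score is at or above the threshold, else 0.

    Assumption: the comparison is inclusive (>=); the project helper may
    use a strict comparison instead.
    """
    return [1 if score >= fraud_threshold else 0 for score in scores]


# Hypothetical fraud_risk_score values for five users.
scores = [0.01, 0.05, 0.08, 0.12, 0.30]
labels = label_from_score(scores)

# The share of positive labels is worth checking before training, because a
# threshold that is too high or too low leaves one class nearly empty.
positive_rate = sum(labels) / len(labels)
print("labels:", labels)
print("positive rate:", positive_rate)
```

With a threshold as low as 0.08, it is worth confirming on the real `train_df` that both classes are represented before tuning.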

In [None]:
# Optionally override the tracking URI here; otherwise it is taken from the
# environment or `.env.local` via common.config.

# mlflow.set_tracking_uri("http://localhost:5000")

result = train_fraud_model(
    df=train_df,
    target_column="is_fraud_label",
    n_trials=5,
    test_size=0.2,
    random_state=123,
)

print("ROC AUC:", result.roc_auc)
print("Accuracy:", result.accuracy)
print("Run ID:", result.run_id)
print("Features:", result.feature_names)
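
The two reported metrics answer different questions, and the sketch below shows their definitions on a tiny hand-made example. It is not how `train_fraud_model` computes them (that code is not shown here and most likely relies on a library); the function names, the 0.5 cutoff, and the sample values are assumptions.

```python
def accuracy(y_true, y_prob, cutoff=0.5):
    """Fraction of rows whose thresholded prediction matches the label."""
    preds = [1 if p >= cutoff else 0 for p in y_prob]
    correct = sum(1 for pred, truth in zip(preds, y_true) if pred == truth)
    return correct / len(y_true)


def roc_auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which
    is fine for illustration but slow for large datasets.
    """
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted fraud probabilities.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.6, 0.35, 0.8]

print("ROC AUC:", roc_auc(y_true, y_prob))
print("Accuracy:", accuracy(y_true, y_prob))
```

ROC AUC depends only on how the scores rank positives against negatives, while accuracy depends on the chosen cutoff. On an imbalanced fraud label, accuracy can look high even when few fraud cases are caught, so it is best read alongside ROC AUC.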


You can now open the MLflow UI (default: `http://localhost:5000`) to inspect the
runs, parameters, metrics, model artifact, and SHAP outputs under the
`fraud_detection` experiment.
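
If you are unsure which server the run was logged to, the following sketch resolves the address from the `MLFLOW_TRACKING_URI` environment variable, which MLflow itself reads, and falls back to the default mentioned above. How `common.config` resolves the URI is not shown in this notebook, so the helper `resolve_tracking_uri` and its fallback behavior are assumptions.

```python
import os

# Default used in this notebook when no tracking URI is configured.
DEFAULT_TRACKING_URI = "http://localhost:5000"


def resolve_tracking_uri(env=None):
    """Return the configured tracking URI, or the local default if unset.

    `env` is any mapping of environment variables; it defaults to os.environ
    and exists mainly so the function can be exercised without side effects.
    """
    env = os.environ if env is None else env
    uri = env.get("MLFLOW_TRACKING_URI", "").strip()
    return uri or DEFAULT_TRACKING_URI


print("MLflow tracking URI:", resolve_tracking_uri())
```

A tracking URI is not always an HTTP address; it can also point to a local directory or a database, in which case there is no server to open in the browser until one is started against that store.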
