# Model Lineage & Audit Trail Demonstration

This notebook demonstrates how ZenML tracks complete lineage for ML models, enabling full audit trails required for compliance (HIPAA, GDPR, etc.).

## What You'll Learn

1. **Trace a prediction back to its source** - From production prediction → training data → code commit
2. **View model promotion history** - Who promoted, when, and why
3. **Access all model metadata** - Metrics, parameters, artifacts
4. **Query model lineage programmatically** - For automated compliance reporting

In [None]:
from zenml.client import Client
from zenml.enums import ModelStages

# Initialize ZenML client
client = Client()

print(f"Active ZenML stack: {client.active_stack_model.name}")

## 1. Get the Current Production Model

First, let's get the model currently deployed in production.

In [None]:
# Get production model
try:
    production_model = client.get_model_version(
        "breast_cancer_classifier", ModelStages.PRODUCTION
    )
    print(f"✅ Production Model Version: {production_model.number}")
    print(f"   Stage: {production_model.stage}")
    print(f"   Created: {production_model.created}")
    print(f"   Model version ID: {production_model.id}")
except KeyError:
    print("⚠️  No production model found. Falling back to the staging model.")
    production_model = client.get_model_version(
        "breast_cancer_classifier", ModelStages.STAGING
    )
    print(f"   Using: {production_model.stage} (version {production_model.number})")

## 2. View Model Performance Metrics

All metrics logged during training are available for review.

In [None]:
# Get model metrics
metrics = production_model.run_metadata

print("📊 Model Performance Metrics:")
print("-" * 40)
for key, value in metrics.items():
    # Extract value from metadata object
    val = value.value if hasattr(value, "value") else value
    if isinstance(val, (int, float)):
        print(f"   {key}: {val:.4f}")
    else:
        print(f"   {key}: {val}")
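
The loop above formats every `int` or `float` with four decimals, which also catches booleans (a subclass of `int` in Python) and whole-number counts. A minimal sketch of a more selective formatter follows; `format_metric` and the `sample_metrics` dictionary are hypothetical names introduced here for illustration, not part of ZenML.

```python
def format_metric(value, precision=4):
    """Render a metadata value for display.

    Unwraps objects exposing a `.value` attribute (as the loop above does),
    formats floats to a fixed precision, and leaves everything else
    (integers, booleans, strings) as plain text.
    """
    val = getattr(value, "value", value)
    if isinstance(val, float):
        return f"{val:.{precision}f}"
    return str(val)


# Hypothetical metadata, standing in for production_model.run_metadata
sample_metrics = {"accuracy": 0.97368, "n_estimators": 100, "calibrated": True}

formatted = {key: format_metric(val) for key, val in sample_metrics.items()}
for key, text in formatted.items():
    print(f"   {key}: {text}")
```

With this helper, a count such as `n_estimators` prints as `100` rather than `100.0000`, and a flag prints as `True` rather than `1.0000`.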

## 3. Trace Back to Training Pipeline Run

Every model version is linked to the pipeline run that created it.

In [None]:
# Get the training pipeline run
pipeline_runs = production_model.pipeline_runs

if pipeline_runs:
    training_run = pipeline_runs[0]
    print("🔗 Training Pipeline Run:")
    print(f"   Pipeline: {training_run.name}")
    print(f"   Status: {training_run.status}")
    print(f"   Started: {training_run.start_time}")
    print(
        f"   Duration: {training_run.end_time - training_run.start_time if training_run.end_time else 'Running'}"
    )
    print(f"   User: {training_run.user.name if training_run.user else 'Unknown'}")
else:
    print("⚠️  No pipeline run found for this model")
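
The inline duration expression above is hard to read and would fail if a run has no start time yet. A small helper can make the intent explicit. This is only a sketch: `describe_duration` is a hypothetical function, and the timestamps are made-up stand-ins for `training_run.start_time` and `training_run.end_time`.

```python
from datetime import datetime
from typing import Optional


def describe_duration(start: Optional[datetime], end: Optional[datetime]) -> str:
    """Return a readable duration, or a placeholder when a timestamp is missing."""
    if start is None:
        return "Not started"
    if end is None:
        return "Running"
    total_seconds = int((end - start).total_seconds())
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}s"


# Hypothetical timestamps standing in for a real pipeline run
started = datetime(2024, 1, 15, 9, 0, 0)
finished = datetime(2024, 1, 15, 9, 12, 30)

print(f"   Duration: {describe_duration(started, finished)}")
print(f"   Duration: {describe_duration(started, None)}")
```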

## 4. Access Training Data Artifacts

We can trace back to the exact data used to train this model.

In [None]:
if pipeline_runs:
    # Get the data loading step
    load_data_step = training_run.steps["load_data"]

    print("📦 Training Data Artifacts:")
    print(f"   Step: {load_data_step.name}")
    print(f"   Status: {load_data_step.status}")

    # List all output artifacts from data loading
    print("\n   Output Artifacts:")
    for artifact_name, artifact in load_data_step.outputs.items():
        print(f"     - {artifact_name}: {artifact.id}")
        print(f"       Type: {type(artifact).__name__}")
        print(f"       URI: {artifact.uri}")

## 5. Load and Inspect Training Data

For complete audit trails, we can actually load the exact training data.

In [None]:
if pipeline_runs:
    # Load the training data artifact
    X_train_artifact = load_data_step.outputs["X_train"]
    y_train_artifact = load_data_step.outputs["y_train"]

    # Read the data
    X_train = X_train_artifact.load()
    y_train = y_train_artifact.load()

    print("✅ Training Data Loaded:")
    print(f"   Features shape: {X_train.shape}")
    print(f"   Labels shape: {y_train.shape}")
    print("\n   First 3 rows:")
    display(X_train.head(3))

## 6. View Code Commit Information

When integrated with GitHub, ZenML tracks the exact code commit used.

In [None]:
if pipeline_runs:
    # Check for code repository metadata
    if hasattr(training_run, "code_reference") and training_run.code_reference:
        print("💻 Code Repository Information:")
        print(f"   Commit SHA: {training_run.code_reference.commit}")
        print(f"   Repository: {training_run.code_reference.code_repository.name}")
        print(f"   Subdirectory: {training_run.code_reference.subdirectory}")
    else:
        print("ℹ️  No code repository linked to this run")
        print("   To enable, register a GitHub code repository:")
        print("   zenml code-repository register ...")

## 7. Model Promotion History

Track who promoted the model and when for compliance audit trails.

In [None]:
# Get all versions of this model
all_versions = client.list_model_versions(
    model_name_or_id="breast_cancer_classifier"
)

print("📜 Model Version History:")
print("-" * 70)

for version in all_versions:
    print(f"Version {version.number}:")
    print(f"  Stage: {version.stage or 'None'}")
    print(f"  Created: {version.created}")
    print(f"  Updated: {version.updated}")
    print("-" * 70)
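
For automated reporting, the same fields can be collected into plain dictionaries instead of being printed. The sketch below assumes only the attributes used in the loop above (`number`, `stage`, `created`, `updated`); `summarize_versions` is a hypothetical helper and `sample_versions` is made-up data standing in for the result of `client.list_model_versions(...)`.

```python
from datetime import datetime
from types import SimpleNamespace


def summarize_versions(versions):
    """Return one plain dict per model version, ordered by version number."""
    rows = []
    for version in versions:
        stage = getattr(version, "stage", None)
        rows.append(
            {
                "number": version.number,
                # Stages may be enum members or plain strings; store text.
                "stage": str(getattr(stage, "value", stage)) if stage else "none",
                "created": version.created.isoformat(),
                "updated": version.updated.isoformat(),
                # A later `updated` timestamp only shows that the record
                # changed after creation; it does not say what changed.
                "modified_after_creation": version.updated > version.created,
            }
        )
    return sorted(rows, key=lambda row: row["number"])


# Hypothetical stand-ins for ZenML model version objects
sample_versions = [
    SimpleNamespace(
        number=2,
        stage="production",
        created=datetime(2024, 2, 1, 10, 0),
        updated=datetime(2024, 2, 3, 15, 30),
    ),
    SimpleNamespace(
        number=1,
        stage=None,
        created=datetime(2024, 1, 10, 9, 0),
        updated=datetime(2024, 1, 10, 9, 0),
    ),
]

history = summarize_versions(sample_versions)
for row in history:
    print(row)
```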

## 8. Complete Lineage Graph

Putting it all together: Production prediction → Model → Training Run → Training Data → Code

In [None]:
print("🔍 Complete Lineage Trace:")
print("=" * 70)

if pipeline_runs:
    print("\n1️⃣  PRODUCTION MODEL")
    print("    Model: breast_cancer_classifier")
    print(f"    Version: {production_model.number}")
    print(f"    Stage: {production_model.stage}")
    print(f"    Dashboard: https://cloud.zenml.io/.../{production_model.id}")

    print("\n    ↓")

    print("\n2️⃣  TRAINING PIPELINE RUN")
    print(f"    Pipeline: {training_run.name}")
    print(f"    Run ID: {training_run.id}")
    print(f"    User: {training_run.user.name if training_run.user else 'Unknown'}")
    print(f"    Timestamp: {training_run.start_time}")

    print("\n    ↓")

    print("\n3️⃣  TRAINING DATA")
    print(f"    Features: {X_train.shape}")
    print(f"    Storage: {X_train_artifact.uri}")
    print(f"    Artifact ID: {X_train_artifact.id}")

    print("\n    ↓")

    print("\n4️⃣  CODE COMMIT")
    if hasattr(training_run, "code_reference") and training_run.code_reference:
        print(f"    Repository: {training_run.code_reference.code_repository.name}")
        print(f"    Commit: {training_run.code_reference.commit}")
    else:
        print("    Not tracked (code repository not configured)")

    print("\n" + "=" * 70)
    print("\n✅ LINEAGE TRACED: Complete lineage from production to source!")
else:
    print("⚠️  Pipeline run not found")
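
For automated compliance reporting, the printed trace can also be captured as a JSON document. The following is a sketch under stated assumptions: `build_lineage_report` is a hypothetical helper, it reads only the attributes already used in the cells above, and the `SimpleNamespace` objects are made-up stand-ins so the example runs without a ZenML server.

```python
import json
from types import SimpleNamespace


def build_lineage_report(model_version, run, data_artifact, model_name):
    """Collect the lineage fields printed above into a JSON-serializable dict."""
    code_ref = getattr(run, "code_reference", None)
    user = getattr(run, "user", None)
    return {
        "model": {
            "name": model_name,
            "version": model_version.number,
            "stage": str(model_version.stage),
            "id": str(model_version.id),
        },
        "training_run": {
            "name": run.name,
            "id": str(run.id),
            "user": user.name if user else None,
            "started": str(run.start_time),
        },
        "training_data": {
            "artifact_id": str(data_artifact.id),
            "uri": data_artifact.uri,
        },
        # None signals that no code repository was linked to the run.
        "code": (
            {"repository": code_ref.code_repository.name, "commit": code_ref.commit}
            if code_ref
            else None
        ),
    }


# Hypothetical stand-ins for the objects retrieved earlier in the notebook
fake_model = SimpleNamespace(number=3, stage="production", id="model-version-id")
fake_run = SimpleNamespace(
    name="training_run_example",
    id="run-id",
    user=SimpleNamespace(name="alice"),
    start_time="2024-01-15 09:00:00",
    code_reference=None,
)
fake_artifact = SimpleNamespace(id="artifact-id", uri="/tmp/artifacts/X_train")

report = build_lineage_report(
    fake_model, fake_run, fake_artifact, "breast_cancer_classifier"
)
report_json = json.dumps(report, indent=2)
print(report_json)
```

With the real objects from the cells above, the call would presumably be `build_lineage_report(production_model, training_run, X_train_artifact, "breast_cancer_classifier")`.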

## Summary

This notebook demonstrated how ZenML provides:

✅ **Complete model lineage** - Trace any prediction back to source data and code  
✅ **Audit trails** - Who trained, when, with what data  
✅ **Compliance-ready** - All metadata preserved for regulatory review  
✅ **Programmatic access** - Build automated compliance reports  
✅ **Version history** - Track all model promotions and changes  

For healthcare and regulated industries, this level of traceability is essential for:
- HIPAA compliance
- FDA/regulatory submissions  
- Internal audits
- Incident investigation
- Quality assurance