# L3 M4.1: Model Cards & AI Governance

## Learning Arc

**Purpose:** Implement comprehensive AI governance for RAG systems through model card documentation, statistical bias detection, human-in-the-loop workflows, and governance committee oversight, aligned with the NIST AI RMF and EU AI Act requirements.

**Concepts Covered:**
- Three-pillar AI governance framework (Fairness, Transparency, Accountability)
- NIST AI Risk Management Framework (Govern, Map, Measure, Manage)
- EU AI Act risk classification and high-risk system requirements
- 10-section model card documentation standard
- Statistical bias detection with demographic parity testing
- Human-in-the-loop workflows for high-stakes queries
- Governance committee review processes with voting and approval thresholds
- Incident tracking and severity-based escalation
- Regulatory compliance (GDPR, DPDPA, SOX, EU AI Act)
- Cost-benefit analysis of governance investment for Global Capability Centers (GCCs)

**After Completing This Notebook:**
- You will understand AI governance principles and regulatory requirements (NIST AI RMF, EU AI Act)
- You can create comprehensive model cards documenting all 10 required sections
- You can implement statistical bias detection to identify unfair treatment across demographics
- You will recognize when queries require human review (legal, HR, financial decisions)
- You can establish governance committee processes with approval workflows
- You will know how to track and escalate AI system incidents by severity
- You can justify governance investment costs through regulatory risk analysis

**Context in Track L3.M4:**
This module builds on L3 M1-M3 (RAG fundamentals, evaluation, compliance) and prepares you for L3 M4.2 (Security & Privacy Controls). It shifts focus from technical implementation to organizational governance and regulatory compliance.

In [None]:
# Cell 1: Environment Setup
import sys

# Add src to path for imports
if '../src' not in sys.path:
    sys.path.insert(0, '../src')
if '..' not in sys.path:
    sys.path.insert(0, '..')

# This module runs entirely LOCAL/OFFLINE
# No external AI service APIs required (OpenAI, Anthropic, Pinecone, etc.)
# Uses only local Python libraries: json, pandas, scipy

print("✓ L3 M4.1: Model Cards & AI Governance")
print("✓ SERVICE: LOCAL (offline operation)")
print("✓ No external API keys required")
print("✓ Core features: Model cards, bias detection, governance workflows")

## Section 1: AI Governance Principles

### The Three Pillars of Responsible AI

**Fairness:** Ensuring your AI system does not discriminate or produce biased outcomes
- Test whether different user groups receive similar quality results
- Measure demographic parity across regions, departments, languages
- Set thresholds (e.g., <10% disparity acceptable)

**Transparency:** Making decisions explainable with source documentation
- Model cards document components, data sources, limitations
- Audit logs enable query traceability
- Users can see which documents were retrieved and why

**Accountability:** Clear ownership and oversight mechanisms
- Governance committee with decision authority
- Formal approval workflows for changes
- Incident escalation with severity-based SLAs

## Section 2: NIST AI Risk Management Framework

Four core functions for managing AI risks:

**1. GOVERN** - Establish policies and organizational structures
- Form governance committee (Security, Legal, Privacy, Product, Engineering)
- Define review cadence (quarterly)
- Set approval thresholds (75% for major changes)

**2. MAP** - Identify and assess AI-specific risks
- Document data sources and preprocessing
- Identify potential bias sources (data, retrieval, generation)
- Map high-stakes use cases requiring human review

**3. MEASURE** - Track metrics and performance testing
- Run bias tests across demographics
- Monitor retrieval quality by user group
- Track incident frequency and severity

**4. MANAGE** - Mitigate risks and respond to incidents
- Human-in-the-loop for high-risk queries
- Incident escalation based on severity
- Remediation tracking with owners and deadlines

## Section 3: EU AI Act Risk Classification

### Risk Levels and Requirements

**Unacceptable Risk** (BANNED):
- Social scoring systems
- Real-time biometric surveillance in public spaces
- Manipulation of vulnerable groups

**High Risk** (REGULATED) ← RAG systems often fall here:
- HR decisions (hiring, termination, promotions)
- Legal systems (case research, contract analysis)
- Credit scoring and financial decisions

**Requirements for High-Risk Systems:**
- ✓ Technical documentation (captured here as a model card)
- ✓ Bias assessment and testing
- ✓ Human oversight capabilities
- ✓ Audit trail logging

**Limited Risk** (TRANSPARENCY REQUIRED):
- Chatbots must disclose they are AI
- Deepfakes must be labeled

**Minimal Risk** (NO REGULATION):
- Most informational queries
- Low-stakes applications
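
The tiers above can be encoded as a lookup so that each RAG use case is tagged with the controls this notebook associates with it. The sketch below is illustrative only: `RiskTier`, `USE_CASE_TIERS`, `TIER_CONTROLS`, and the use-case labels are hypothetical names, and a real classification needs legal review rather than a dictionary lookup.

```python
from enum import Enum


class RiskTier(Enum):
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


# Hypothetical use-case labels mapped to the tiers listed in this section.
USE_CASE_TIERS = {
    "social_scoring": RiskTier.UNACCEPTABLE,
    "hr_decision_support": RiskTier.HIGH,
    "legal_research": RiskTier.HIGH,
    "credit_scoring": RiskTier.HIGH,
    "general_chatbot": RiskTier.LIMITED,
    "informational_lookup": RiskTier.MINIMAL,
}

# Controls this notebook associates with each tier (not a legal checklist).
TIER_CONTROLS = {
    RiskTier.UNACCEPTABLE: ["do not deploy"],
    RiskTier.HIGH: ["model card", "bias testing", "human oversight", "audit logging"],
    RiskTier.LIMITED: ["AI disclosure to users"],
    RiskTier.MINIMAL: [],
}


def required_controls(use_case: str) -> list[str]:
    """Return controls for a use case; unknown use cases default to HIGH."""
    tier = USE_CASE_TIERS.get(use_case, RiskTier.HIGH)
    return TIER_CONTROLS[tier]


print(required_controls("hr_decision_support"))
print(required_controls("informational_lookup"))
```

Defaulting unknown use cases to the high-risk tier is a deliberately conservative design choice: a new use case gets the full set of controls until someone classifies it explicitly.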

In [None]:
# Cell 2: Import Governance Classes
from l3_m4_enterprise_integration import (
    RAGModelCard,
    BiasDetector,
    HumanInTheLoopWorkflow,
    GovernanceReviewer
)
import json

print("✓ Imported all governance classes")
print("  - RAGModelCard: 10-section model card generation")
print("  - BiasDetector: Statistical demographic parity testing")
print("  - HumanInTheLoopWorkflow: High-stakes query routing")
print("  - GovernanceReviewer: Committee voting and incident tracking")

## Section 4: Model Card Generation

### The 10-Section Standard

Model cards are standardized documentation for AI systems, much like nutrition labels for food.

**10 Required Sections:**
1. **Model Details:** Identity, version, owner, contact
2. **Components:** Embedding model, vector DB, LLM, retrieval method
3. **Intended Use:** Primary use cases + **out-of-scope uses** (liability protection)
4. **Training & Data:** Sources, volume, preprocessing, gaps
5. **Performance:** Metrics with baselines (precision@5, recall@10)
6. **Ethical Considerations:** Fairness testing, bias mitigation, privacy
7. **Limitations:** Known failure modes and edge cases
8. **Recommendations:** Usage guidelines and monitoring
9. **Governance:** Committee structure, review cadence, escalation
10. **Change Log:** Version history with timestamps

**Critical Insight:** "Out-of-scope uses protect from liability": explicitly documenting prohibited applications helps defend against misuse claims.
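
A completeness check is one way to make the 10-section standard enforceable. The sketch below works on a plain dictionary rather than on `RAGModelCard`; `REQUIRED_SECTIONS` and `missing_sections` are hypothetical, and apart from `model_details` and `components` the section keys are assumed names that may differ from the real export.

```python
# Hypothetical section keys; the real RAGModelCard export may name them differently.
REQUIRED_SECTIONS = [
    "model_details", "components", "intended_use", "training_data",
    "performance", "ethical_considerations", "limitations",
    "recommendations", "governance", "change_log",
]


def missing_sections(card: dict) -> list[str]:
    """Return required sections that are absent or empty in a model card dict."""
    return [name for name in REQUIRED_SECTIONS if not card.get(name)]


draft_card = {
    "model_details": {"name": "GCC_Compliance_RAG", "version": "1.0.0"},
    "components": {"vector_database": "Qdrant"},
    "intended_use": {"out_of_scope_uses": ["Providing legal advice for lawsuits"]},
    "limitations": ["May hallucinate when document coverage is sparse"],
}

print(missing_sections(draft_card))
```

A check like this could run before a card is submitted for committee review, so that empty sections are caught early.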

In [None]:
# Cell 3: Create a Model Card
card = RAGModelCard(
    model_name="GCC_Compliance_RAG",
    model_version="1.0.0",
    model_owner="AI Engineering Team",
    contact_email="ai-engineering@gcc.com"
)

# Document RAG components
card.set_components(
    embedding_model="text-embedding-3-small",
    vector_database="Qdrant",
    generation_model="gpt-4",
    retrieval_method="hybrid (semantic + keyword)",
    reranker="cohere-rerank-v3"
)

# Define intended and prohibited uses
card.set_intended_use(
    primary_use_cases=[
        "Employee HR policy questions",
        "Compliance document retrieval for legal team",
        "Benefits information lookup"
    ],
    out_of_scope_uses=[
        "Making hiring/firing decisions without human review",
        "Providing legal advice for lawsuits",
        "Setting employee compensation",
        "Medical diagnosis or treatment recommendations"
    ],
    target_users=["HR team", "Legal team", "Employees"],
    use_limitations=["Requires human review for high-stakes decisions"]
)

# Add limitations
card.add_limitation("May hallucinate when document coverage is sparse")
card.add_limitation("Retrieval quality degrades with ambiguous queries")
card.add_limitation("Does not handle multi-lingual queries equally well")

# Add recommendations
card.add_recommendation("Conduct quarterly bias testing across user demographics")
card.add_recommendation("Route legal/HR/financial queries to human review")
card.add_recommendation("Monitor retrieval quality by region and department")

# Set governance
card.set_governance(
    review_committee=["Security", "Legal", "Privacy", "Product", "Engineering"],
    review_cadence="Quarterly",
    incident_escalation="Report via JIRA to AI Governance Board",
    approval_authority="VP Engineering and Chief Legal Officer"
)

# Add change log
card.add_change_log_entry(
    version="1.0.0",
    changes="Initial release with GPT-4 and hybrid search",
    author="AI Engineering Team"
)

print("✓ Model card created with 10 sections")
print(f"  Name: {card.model_name}")
print(f"  Version: {card.model_version}")
print(f"  Primary uses: {len(card.intended_use.get('primary_use_cases', []))}")
print(f"  Out-of-scope uses: {len(card.intended_use.get('out_of_scope_uses', []))}")
print(f"  Limitations: {len(card.limitations)}")
print(f"  Recommendations: {len(card.recommendations)}")

In [None]:
# Cell 4: Export Model Card as JSON
json_output = card.to_json()
parsed = json.loads(json_output)

print("✓ Model card exported to JSON")
print(f"\nTop-level keys: {list(parsed.keys())}")
print(f"\nModel Details:")
print(f"  {parsed['model_details']}")
print(f"\nComponents:")
for key, value in parsed['components'].items():
    print(f"  {key}: {value}")

# Expected: JSON with all 10 sections populated

In [None]:
# Cell 5: Export Model Card as Markdown
markdown_output = card.to_markdown()

print("✓ Model card exported to Markdown")
print(f"\nFirst 500 characters:")
print(markdown_output[:500])
print("\n...")

# Uncomment to see full markdown
# print(markdown_output)

# Expected: Markdown document with all 10 sections formatted


## Section 5: Bias Detection Framework

### Three Categories of Bias in RAG Systems

**1. Data Bias:** Training collection over-represents certain groups
- Example: 80% North America docs, 20% Asia-Pacific → NA queries get better results
- Mitigation: Balanced data collection, stratified sampling

**2. Retrieval Bias:** Semantic search favors particular document types
- Example: Technical docs rank higher than policy docs for all queries
- Mitigation: Hybrid search (semantic + keyword), reranking

**3. Generation Bias:** LLM training data imbalances
- Example: Stereotyped outputs (engineers assumed male, nurses assumed female)
- Mitigation: Fine-tuning, output filters, human review

### Statistical Testing Approach

**Demographic Parity:** Do different groups receive similar quality?
- Compare average scores across groups (Region A vs Region B)
- Flag if disparity >10% (configurable threshold)
- Use pandas and scipy for statistical significance
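
The threshold comparison described above can be sketched with the standard library alone. This is not the `BiasDetector` implementation: `relative_disparity` and `parity_check` are hypothetical helpers, and defining disparity as the gap between group means divided by the higher mean is an assumption.

```python
from statistics import mean


def relative_disparity(group_a: list[float], group_b: list[float]) -> float:
    """Gap between group means, as a fraction of the higher mean (assumed definition)."""
    mean_a, mean_b = mean(group_a), mean(group_b)
    higher = max(mean_a, mean_b)
    return abs(mean_a - mean_b) / higher if higher else 0.0


def parity_check(group_a: list[float], group_b: list[float], threshold: float = 0.10) -> dict:
    """Flag bias when the relative disparity exceeds the configured threshold."""
    disparity = relative_disparity(group_a, group_b)
    return {"disparity": disparity, "bias_detected": disparity > threshold}


similar = parity_check([0.88, 0.90, 0.87, 0.89], [0.85, 0.87, 0.86, 0.84])
unequal = parity_check([0.92, 0.94, 0.91, 0.93], [0.75, 0.73, 0.76, 0.74])

print(f"similar groups: {similar['disparity']:.1%}, flagged={similar['bias_detected']}")
print(f"unequal groups: {unequal['disparity']:.1%}, flagged={unequal['bias_detected']}")
```

A threshold on means says nothing about sample noise. One option for the significance step is Welch's t-test via `scipy.stats.ttest_ind(group_a, group_b, equal_var=False)`, applied alongside the threshold rather than instead of it.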

In [None]:
# Cell 6: Initialize Bias Detector
detector = BiasDetector(disparity_threshold=0.10)  # 10% threshold

print("✓ BiasDetector initialized")
print(f"  Disparity threshold: {detector.disparity_threshold:.0%}")
print(f"  Interpretation: Flag if quality difference between groups > {detector.disparity_threshold:.0%}")

In [None]:
# Cell 7: Test Case 1 - No Bias Detected
# Scenario: Both regions get similar quality results

north_america_scores = [0.88, 0.90, 0.87, 0.89, 0.91, 0.88, 0.90]
asia_pacific_scores = [0.85, 0.87, 0.86, 0.84, 0.88, 0.85, 0.87]

result = detector.test_demographic_parity(
    group_a_scores=north_america_scores,
    group_b_scores=asia_pacific_scores,
    group_a_name="North America",
    group_b_name="Asia Pacific"
)

print("Test Case 1: Regional Quality Comparison")
print(f"  North America avg: {result['group_a']['avg_score']}")
print(f"  Asia Pacific avg: {result['group_b']['avg_score']}")
print(f"  Relative disparity: {result['disparity']['relative'] * 100:.1f}%")
print(f"  Bias detected: {result['bias_detected']}")
print(f"  Severity: {result['severity']}")

# Expected: bias_detected = False (disparity < 10%)

In [None]:
# Cell 8: Test Case 2 - Bias Detected (Medium Severity)
# Scenario: Engineering gets much better results than Marketing

engineering_scores = [0.92, 0.94, 0.91, 0.93, 0.92, 0.94, 0.93]
marketing_scores = [0.75, 0.73, 0.76, 0.74, 0.77, 0.75, 0.76]

result = detector.test_demographic_parity(
    group_a_scores=engineering_scores,
    group_b_scores=marketing_scores,
    group_a_name="Engineering Dept",
    group_b_name="Marketing Dept"
)

print("Test Case 2: Department Quality Comparison")
print(f"  Engineering avg: {result['group_a']['avg_score']}")
print(f"  Marketing avg: {result['group_b']['avg_score']}")
print(f"  Relative disparity: {result['disparity']['relative'] * 100:.1f}%")
print(f"  Bias detected: {result['bias_detected']}")
print(f"  Severity: {result['severity']}")

# Expected: bias_detected = True (disparity >10%, likely medium severity)

In [None]:
# Cell 9: Bias Testing Summary
summary = detector.get_summary()

print("Bias Testing Summary")
print(f"  Total tests: {summary['total_tests']}")
print(f"  Bias detected: {summary['bias_detected']} tests")
print(f"  Bias rate: {summary['bias_rate'] * 100:.1f}%")
print(f"\nSeverity breakdown:")
for severity, count in summary['severities'].items():
    print(f"    {severity}: {count}")

# Add results to model card
card.set_ethical_considerations(
    fairness_testing=summary,
    bias_mitigation=[
        "Regular demographic testing across regions and departments",
        "Balanced document collection from all business units",
        "Hybrid search to reduce semantic bias"
    ],
    privacy_measures=[
        "PII detection and redaction",
        "Access controls by user role",
        "Audit logging for compliance"
    ]
)

print("\n✓ Bias testing results added to model card")


## Section 6: Human-in-the-Loop Workflows

### When to Require Human Review

**High-Risk Query Indicators (Keywords):**
- Legal: "lawsuit", "legal advice", "compliance violation"
- HR: "termination", "fire", "hire", "promotion", "discrimination"
- Financial: "investment", "financial advice", "credit"
- Medical: "diagnosis", "treatment", "medical advice"

**Workflow:**
1. Classify query risk (HIGH or LOW)
2. If HIGH-RISK: Route to review queue, block auto-response
3. Human reviewer approves or rejects within SLA (e.g., 30 minutes)
4. Audit trail logs classification, reviewer decision, timestamp

**Critical:** Enforce the review step in code. The workflow must block auto-responses for HIGH-risk queries so that review cannot be skipped under time pressure.
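
The four workflow steps can be sketched as a small routing function. This is a minimal illustration, not the `HumanInTheLoopWorkflow` implementation: `HIGH_RISK_KEYWORDS` is an illustrative subset of the keywords above, and `route_query`, `review_queue`, and `audit_log` are hypothetical names.

```python
from collections import deque
from datetime import datetime, timezone

# Illustrative subset of the keywords listed above.
HIGH_RISK_KEYWORDS = ["lawsuit", "legal advice", "termination", "discrimination",
                      "investment", "financial advice", "diagnosis"]

review_queue = deque()  # queries waiting for a human reviewer
audit_log = []          # one entry per query, high risk or not


def route_query(query: str) -> dict:
    """Classify a query and block auto-response when it is high risk."""
    text = query.lower()
    matched = [kw for kw in HIGH_RISK_KEYWORDS if kw in text]
    risk = "HIGH" if matched else "LOW"
    audit_log.append({
        "query": query,
        "risk_level": risk,
        "matched_keywords": matched,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    if risk == "HIGH":
        review_queue.append(query)
        return {"status": "queued_for_review", "queue_position": len(review_queue)}
    return {"status": "approved"}


print(route_query("What is our vacation policy?"))
print(route_query("Can I get legal advice about a lawsuit?"))
print(route_query("What happens when terminating a contract?"))
```

The third query is approved because plain substring matching does not connect "terminating" to the keyword "termination". A keyword list therefore needs stems or variants, or a classifier behind it, before it can be relied on as a safety control.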

In [None]:
# Cell 10: Initialize Human-in-the-Loop Workflow
workflow = HumanInTheLoopWorkflow()

print("✓ HumanInTheLoopWorkflow initialized")
print(f"  High-risk keywords: {len(workflow.high_risk_keywords)}")
print(f"  Examples: {workflow.high_risk_keywords[:5]}")

In [None]:
# Cell 11: Low-Risk Query Example
query_1 = "What is our vacation policy for new employees?"

result = workflow.process_query(
    query=query_1,
    user_context={"department": "HR", "user": "employee123"}
)

print(f"Query: {query_1}")
print(f"  Status: {result['status']}")
print(f"  Risk level: {result.get('classification', {}).get('risk_level', 'N/A')}")
print(f"  Requires review: {result.get('classification', {}).get('requires_review', False)}")

# Expected: status = 'approved', can proceed without review

In [None]:
# Cell 12: High-Risk Query Example (Legal)
query_2 = "What are the legal implications of terminating an employee for poor performance?"

result = workflow.process_query(
    query=query_2,
    user_context={"department": "HR", "user": "manager456"}
)

print(f"Query: {query_2}")
print(f"  Status: {result['status']}")
print(f"  Queue ID: {result.get('queue_id', 'N/A')}")
print(f"  Queue position: {result.get('queue_position', 'N/A')}")
print(f"  Estimated wait: {result.get('estimated_wait', 'N/A')}")

# Expected: status = 'queued_for_review', blocked until human approves

In [None]:
# Cell 13: High-Risk Query Example (Financial)
query_3 = "Should we invest company funds in cryptocurrency?"

result = workflow.process_query(query=query_3)

print(f"Query: {query_3}")
print(f"  Status: {result['status']}")
print(f"  Queue ID: {result.get('queue_id', 'N/A')}")

# Expected: status = 'queued_for_review' (flagged as a financial decision)

In [None]:
# Cell 14: Review Queue Status
queue_status = workflow.get_queue_status()

print("Human Review Queue Status")
print(f"  Total queued: {queue_status['total_queued']}")
print(f"  Pending review: {queue_status['pending_review']}")
print(f"  Approved: {queue_status['approved']}")
print(f"  Rejected: {queue_status['rejected']}")
print(f"  Audit log size: {queue_status['audit_log_size']}")

# Expected: 2 queries in queue (legal and financial), audit log has 3 entries (including low-risk)


## Section 7: Governance Committee Review

### Committee Structure

**5-Member Committee (Cross-Functional):**
1. **Security:** Assesses cybersecurity risks, access controls
2. **Legal:** Evaluates regulatory compliance, liability
3. **Privacy:** Reviews data protection, PII handling
4. **Product:** Considers user impact, feature value
5. **Engineering:** Evaluates technical feasibility, performance

**Approval Threshold:** 75% (4 of 5 votes required)

**Review Cadence:** Quarterly for routine updates, ad-hoc for major changes

**Changes Requiring Review:**
- Model upgrades (GPT-3.5 → GPT-4)
- New data sources
- Algorithm changes (semantic → hybrid search)
- Deployment to new user groups
- Response to critical incidents
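
The approval threshold translates into a vote count by rounding up: 75% of 5 members is 3.75, so 4 approvals are needed. The sketch below shows that arithmetic and a simple tally. `votes_needed` and `tally` are hypothetical helpers, not part of `GovernanceReviewer`, and the early-rejection rule is an assumption.

```python
import math


def votes_needed(committee_size: int, threshold: float) -> int:
    """Smallest whole number of approvals that meets the threshold."""
    # The small epsilon guards against float error pushing an exact product upward.
    return math.ceil(committee_size * threshold - 1e-9)


def tally(votes: dict[str, str], committee_size: int, threshold: float = 0.75) -> str:
    """Return 'approved', 'rejected', or 'pending' for the votes cast so far."""
    approvals = sum(1 for v in votes.values() if v == "approve")
    rejections = sum(1 for v in votes.values() if v == "reject")
    needed = votes_needed(committee_size, threshold)
    if approvals >= needed:
        return "approved"
    # Assumed rule: reject once the remaining members can no longer reach the threshold.
    if committee_size - rejections < needed:
        return "rejected"
    return "pending"


votes = {"Security": "approve", "Legal": "approve", "Privacy": "approve",
         "Product": "reject", "Engineering": "approve"}
print(votes_needed(5, 0.75), tally(votes, committee_size=5))
```

Truncating with `int()` instead of rounding up would report 3 votes needed, which is only 60% of a 5-member committee.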

In [None]:
# Cell 15: Initialize Governance Reviewer
import math

reviewer = GovernanceReviewer(
    committee_members=["Security", "Legal", "Privacy", "Product", "Engineering"],
    review_cadence="Quarterly",
    approval_threshold=0.75  # 75% = 4 of 5 votes
)

# Round up: int() would truncate 3.75 to 3 and understate the votes required
votes_needed = math.ceil(len(reviewer.committee_members) * reviewer.approval_threshold)

print("✓ GovernanceReviewer initialized")
print(f"  Committee size: {len(reviewer.committee_members)}")
print(f"  Members: {', '.join(reviewer.committee_members)}")
print(f"  Approval threshold: {reviewer.approval_threshold * 100}%")
print(f"  Votes needed: {votes_needed} of {len(reviewer.committee_members)}")

In [None]:
# Cell 16: Submit Change for Review
review_result = reviewer.submit_for_review(
    change_type="model_update",
    description="Upgrade from GPT-3.5 to GPT-4 for improved accuracy",
    impact_assessment="Expected 15% improvement in answer quality, no change to latency, +$500/month cost",
    submitted_by="AI Engineering Team"
)

print("Change Submitted for Review")
print(f"  Review ID: {review_result['review_id']}")
print(f"  Status: {review_result['status']}")
print(f"  Committee size: {review_result['committee_size']}")
print(f"  Approval threshold: {review_result['approval_threshold'] * 100}%")

In [None]:
# Cell 17: Committee Voting
# Simulate committee members voting

review_id = review_result['review_id']

# Security approves
vote_1 = reviewer.cast_vote(
    review_id=review_id,
    committee_member="Security",
    vote="approve",
    comments="Security assessment passed, no new vulnerabilities introduced"
)
print(f"Security: {vote_1['status']} - {vote_1['votes_cast']}/{vote_1['votes_needed']} votes")

# Legal approves
vote_2 = reviewer.cast_vote(
    review_id=review_id,
    committee_member="Legal",
    vote="approve",
    comments="No regulatory concerns, maintains compliance"
)
print(f"Legal: {vote_2['status']} - {vote_2['votes_cast']}/{vote_2['votes_needed']} votes")

# Privacy approves
vote_3 = reviewer.cast_vote(
    review_id=review_id,
    committee_member="Privacy",
    vote="approve",
    comments="No change to data handling, PII controls remain in place"
)
print(f"Privacy: {vote_3['status']} - {vote_3['votes_cast']}/{vote_3['votes_needed']} votes")

# Product rejects (concerned about cost)
vote_4 = reviewer.cast_vote(
    review_id=review_id,
    committee_member="Product",
    vote="reject",
    comments="Cost increase not justified for 15% quality gain, suggest A/B test first"
)
print(f"Product: {vote_4['status']} - {vote_4['votes_cast']}/{vote_4['votes_needed']} votes")

# Engineering approves (final vote)
vote_5 = reviewer.cast_vote(
    review_id=review_id,
    committee_member="Engineering",
    vote="approve",
    comments="Technical implementation straightforward, quality improvement measurable"
)
print(f"Engineering: {vote_5['status']} - {vote_5['votes_cast']}/{vote_5['votes_needed']} votes")

print(f"\nFinal Decision: {vote_5['current_decision']}")
print("(4 of 5 votes approve = 80% > 75% threshold → APPROVED)")


## Section 8: Incident Tracking and Escalation

### Severity Levels and SLAs

**LOW:** Minor issue, track for patterns
- Examples: Query timeout, slow response
- Action: Log and review at next quarterly meeting
- Escalation: None

**MEDIUM:** Needs committee review
- Examples: Hallucination, incorrect policy information
- Action: Immediate correction, review at next meeting
- Escalation: Committee review within 1 week

**HIGH:** Immediate committee attention
- Examples: 20% bias disparity detected, repeated failures
- Action: Pause deployment to affected groups
- Escalation: Committee notification within 24 hours

**CRITICAL:** System shutdown, executive escalation
- Examples: Privacy breach, PII exposure, regulatory violation
- Action: Immediate shutdown, notify affected users
- Escalation: Executive team + Board within 2 hours
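
The escalation windows above map naturally onto a lookup table. The sketch below computes an escalation deadline from a severity and a report time. `ESCALATION_SLA` and `escalation_deadline` are hypothetical names, not part of `GovernanceReviewer`.

```python
from datetime import datetime, timedelta

# Escalation windows taken from the severity levels above; LOW has no escalation.
ESCALATION_SLA = {
    "low": None,
    "medium": timedelta(weeks=1),
    "high": timedelta(hours=24),
    "critical": timedelta(hours=2),
}


def escalation_deadline(severity: str, reported_at: datetime):
    """Return the escalation deadline, or None when no escalation is required."""
    window = ESCALATION_SLA[severity.lower()]
    return reported_at + window if window is not None else None


reported = datetime(2025, 1, 6, 9, 0)
for level in ESCALATION_SLA:
    print(level, escalation_deadline(level, reported))
```

An unknown severity raises `KeyError` here rather than defaulting silently, so a mistyped severity cannot quietly skip escalation.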

In [None]:
# Cell 18: Report Low Severity Incident
incident_1 = reviewer.report_incident(
    incident_type="query_timeout",
    description="System timed out on complex multi-part query with 5 sub-questions",
    severity="low",
    reported_by="Operations Team"
)

print("Incident 1: LOW Severity")
print(f"  Incident ID: {incident_1['incident_id']}")
print(f"  Status: {incident_1['status']}")
print(f"  Escalated: {incident_1['escalated']}")
print(f"  Next steps: {incident_1['next_steps']}")

In [None]:
# Cell 19: Report High Severity Incident
incident_2 = reviewer.report_incident(
    incident_type="bias_detected",
    description="20% quality disparity detected between North America and Asia-Pacific regions in quarterly testing",
    severity="high",
    reported_by="Data Science Team"
)

print("Incident 2: HIGH Severity")
print(f"  Incident ID: {incident_2['incident_id']}")
print(f"  Status: {incident_2['status']}")
print(f"  Escalated: {incident_2['escalated']}")
print(f"  Next steps: {incident_2['next_steps']}")

In [None]:
# Cell 20: Report Critical Severity Incident
incident_3 = reviewer.report_incident(
    incident_type="privacy_breach",
    description="System exposed PII from one user to another user's query results (leaked salary info)",
    severity="critical",
    reported_by="Security Team"
)

print("Incident 3: CRITICAL Severity")
print(f"  Incident ID: {incident_3['incident_id']}")
print(f"  Status: {incident_3['status']}")
print(f"  Escalated: {incident_3['escalated']}")
print(f"  Next steps: {incident_3['next_steps']}")
print("\n⚠️ CRITICAL incidents trigger immediate executive escalation")


## Section 9: Governance Health Metrics

### Key Performance Indicators

**Review Metrics:**
- Approval rate (target: >60%)
- Average time to decision (target: <2 weeks)
- Committee participation (target: 100% voting)

**Incident Metrics:**
- Total incidents per quarter
- Critical incidents (target: 0)
- Average time to resolution

**Bias Metrics:**
- Tests conducted per release (minimum 1)
- Bias detection rate (monitored, no fixed target)
- Remediation completion rate (target: 100%)
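
Several of these KPIs are simple ratios and counts over review and incident records. The sketch below assumes plain dictionaries with hypothetical `status` and `severity` fields. It does not reflect the structure returned by `get_governance_summary()`.

```python
def governance_kpis(reviews: list[dict], incidents: list[dict]) -> dict:
    """Compute a few of the KPIs above from simple record lists."""
    decided = [r for r in reviews if r["status"] in ("approved", "rejected")]
    approved = sum(1 for r in decided if r["status"] == "approved")
    return {
        # Pending reviews are excluded so they do not drag the rate down.
        "approval_rate": approved / len(decided) if decided else None,
        "critical_incidents": sum(1 for i in incidents if i["severity"] == "critical"),
        "open_incidents": sum(1 for i in incidents if i["status"] == "open"),
    }


reviews = [{"status": "approved"}, {"status": "approved"},
           {"status": "rejected"}, {"status": "pending"}]
incidents = [{"severity": "low", "status": "closed"},
             {"severity": "critical", "status": "open"}]

kpis = governance_kpis(reviews, incidents)
print(kpis)
print("Approval target met:", kpis["approval_rate"] > 0.60)
print("Critical target met:", kpis["critical_incidents"] == 0)
```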

In [None]:
# Cell 21: Governance Summary
summary = reviewer.get_governance_summary()

print("=" * 60)
print("GOVERNANCE HEALTH SUMMARY")
print("=" * 60)

print("\nCommittee Configuration:")
print(f"  Members: {', '.join(summary['committee']['members'])}")
print(f"  Size: {summary['committee']['size']}")
print(f"  Review cadence: {summary['committee']['review_cadence']}")
print(f"  Approval threshold: {summary['committee']['approval_threshold'] * 100}%")

print("\nReview Activity:")
print(f"  Total reviews: {summary['reviews']['total']}")
print(f"  Approved: {summary['reviews']['approved']}")
print(f"  Rejected: {summary['reviews']['rejected']}")
print(f"  Pending: {summary['reviews']['pending']}")
print(f"  Approval rate: {summary['reviews']['approval_rate'] * 100:.1f}%")

print("\nIncident Tracking:")
print(f"  Total incidents: {summary['incidents']['total']}")
print(f"  Open: {summary['incidents']['open']}")
print(f"  Critical: {summary['incidents']['critical']}")

print("\n" + "=" * 60)


## Section 10: Complete Governance Workflow Integration

### End-to-End Example: Deploying a New RAG System

- **Step 1:** Create model card documenting all 10 sections
- **Step 2:** Run bias testing across demographics (regions, departments, languages)
- **Step 3:** Configure human-in-the-loop for high-stakes queries
- **Step 4:** Submit for governance committee review
- **Step 5:** Committee votes (75% approval required)
- **Step 6:** Deploy with monitoring and quarterly reviews

This workflow supports compliance with the NIST AI RMF, the EU AI Act, and corporate governance policies.
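
One way to make these steps binding is a deployment gate that lists every unmet step before a release proceeds. The sketch below is an assumption about how such a gate could look. `ReleaseEvidence` and `deployment_blockers` are hypothetical names, not part of the notebook's library.

```python
from dataclasses import dataclass


@dataclass
class ReleaseEvidence:
    """Hypothetical summary of governance evidence collected for one release."""
    missing_card_sections: int
    open_bias_findings: int
    hitl_enabled: bool
    committee_decision: str  # "approved", "rejected", or "pending"


def deployment_blockers(evidence: ReleaseEvidence) -> list[str]:
    """List every unmet governance step; an empty list means the release may proceed."""
    blockers = []
    if evidence.missing_card_sections > 0:
        blockers.append("model card incomplete")
    if evidence.open_bias_findings > 0:
        blockers.append("bias findings not remediated")
    if not evidence.hitl_enabled:
        blockers.append("human-in-the-loop not configured")
    if evidence.committee_decision != "approved":
        blockers.append("committee approval missing")
    return blockers


candidate = ReleaseEvidence(missing_card_sections=0, open_bias_findings=1,
                            hitl_enabled=True, committee_decision="approved")
print(deployment_blockers(candidate))
```

Returning all blockers at once, rather than failing on the first, gives the release owner a complete list to work through.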

In [None]:
# Cell 22: Complete Workflow Demonstration
print("COMPLETE AI GOVERNANCE WORKFLOW")
print("=" * 60)

print("\nStep 1: Model Card Created ✓")
print(f"  System: {card.model_name} v{card.model_version}")
print(f"  Sections: 10/10 complete")
print(f"  Exports: JSON, Markdown")

print("\nStep 2: Bias Testing Conducted ✓")
bias_summary = detector.get_summary()
print(f"  Tests run: {bias_summary['total_tests']}")
print(f"  Bias detected: {bias_summary['bias_detected']} tests")
print(f"  Action: Results added to model card ethical considerations")

print("\nStep 3: Human-in-the-Loop Configured ✓")
hitl_status = workflow.get_queue_status()
print(f"  High-risk keywords: {len(workflow.high_risk_keywords)}")
print(f"  Queries routed to review: {hitl_status['total_queued']}")
print(f"  Audit log entries: {hitl_status['audit_log_size']}")

print("\nStep 4: Governance Review Submitted ✓")
gov_summary = reviewer.get_governance_summary()
print(f"  Reviews submitted: {gov_summary['reviews']['total']}")
print(f"  Approval rate: {gov_summary['reviews']['approval_rate'] * 100:.1f}%")

print("\nStep 5: Incident Tracking Active ✓")
print(f"  Incidents reported: {gov_summary['incidents']['total']}")
print(f"  Critical incidents: {gov_summary['incidents']['critical']}")
print(f"  Open incidents: {gov_summary['incidents']['open']}")

print("\nStep 6: Regulatory Compliance ✓")
print("  NIST AI RMF: Govern, Map, Measure, Manage functions implemented")
print("  EU AI Act: High-risk system documentation complete")
print("  GDPR/DPDPA: Transparency and accountability mechanisms in place")

print("\n" + "=" * 60)
print("GOVERNANCE WORKFLOW COMPLETE")
print("System ready for deployment with full governance oversight")
print("=" * 60)


## Section 11: Cost-Benefit Analysis for GCCs

### Investment Justification

**Small GCC (500 employees, 10 tenants):**
- Annual Cost: ₹9.2L ($11K USD)
- ROI: Avoiding a single DPDPA fine (₹250Cr maximum) ≈ 2,700× return

**Medium GCC (2000 employees, 30 tenants):**
- Annual Cost: ₹70L ($84K USD)
- ROI: Avoiding GDPR fine (€20M = ₹180Cr) = 257× return

**Large GCC (5000 employees, 50+ tenants):**
- Annual Cost: ₹2.5Cr ($300K USD)
- ROI: Avoiding EU AI Act fine (€30M = ₹270Cr) = 108× return

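**Key Insight:** In these scenarios, preventing a single major regulatory fine returns roughly 100× to 2,700× the annual governance cost. The multiples use the maximum fine in each case, so they are upper bounds.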
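
The return multiples can be reproduced by dividing each maximum fine by the annual governance cost, using 1 lakh = ₹100,000 and 1 crore = ₹10,000,000. The sketch below uses only figures from the scenarios above. The `scenarios` dictionary and its field names are hypothetical.

```python
LAKH = 100_000
CRORE = 10_000_000

# Annual cost and maximum fine per scenario, in rupees.
scenarios = {
    "small": {"annual_cost": 9.2 * LAKH, "max_fine": 250 * CRORE},
    "medium": {"annual_cost": 70 * LAKH, "max_fine": 180 * CRORE},
    "large": {"annual_cost": 2.5 * CRORE, "max_fine": 270 * CRORE},
}

multiples = {
    name: s["max_fine"] / s["annual_cost"] for name, s in scenarios.items()
}

for name, multiple in multiples.items():
    print(f"{name}: about {multiple:,.0f}x the annual cost")
```

The calculation compares a cost that is certain with a fine that is only possible, so it is a ceiling on the benefit, not an expected return.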

## Section 12: Common Governance Failures to Avoid

### 1. Governance as Theater
- **Problem:** Model cards created but ignored, committees rubber-stamp decisions
- **Fix:** Give committee veto power, include executives, tie updates to deployment

### 2. Stale Documentation
- **Problem:** Model cards become outdated as system evolves
- **Fix:** Automate updates in deployment pipeline, enforce version matching

### 3. Bias Testing Without Remediation
- **Problem:** Bias detected but no owner or timeline to fix
- **Fix:** Assign owners, set deadlines, and block deployment while critical bias is unresolved

### 4. Human-in-the-Loop Bypassed
- **Problem:** Review requirements abandoned when queues back up
- **Fix:** Enforce review in code (auto-response is blocked for high-risk queries) and plan reviewer capacity

### 5. Committee Without Authority
- **Problem:** Advisory-only committee cannot block risky deployments
- **Fix:** Include executives with veto power, document escalation paths

## Key Takeaways

1. **"Out-of-scope uses protect from liability"** - Document prohibited uses explicitly in model cards

2. **Governance requires decision authority** - Advisory committees fail; ensure veto power

3. **Technical enforcement prevents bypass** - Code blocks high-risk auto-responses

4. **Bias testing must lead to remediation** - Detection without action is theater

5. **ROI justification is compelling** - Avoiding a single maximum regulatory fine = roughly 100-2,700× return in the scenarios above

6. **Three pillars are essential** - Fairness, Transparency, Accountability working together

7. **Documentation enables compliance** - Model cards support the documentation and transparency requirements of NIST AI RMF, EU AI Act, and GDPR

## Next Steps

- **L3 M4.2:** Security & Privacy Controls - PII detection, encryption, access controls
- **Practice:** Create model card for your own RAG system
- **Explore:** Run bias tests on production query logs
- **Implement:** Configure human-in-the-loop for high-stakes queries


---

**Notebook Complete!**

This notebook demonstrated:
- ✓ Model card generation (10 sections)
- ✓ Bias detection with statistical testing
- ✓ Human-in-the-loop query routing
- ✓ Governance committee voting
- ✓ Incident tracking and escalation
- ✓ Complete workflow integration
- ✓ Cost-benefit analysis
- ✓ Common failure modes and fixes