# L3 M1.4: Compliance Documentation & Evidence

**Module:** L3 M1 - Compliance Foundations for RAG Systems  
**Video:** M1.4 - Compliance Documentation & Evidence  
**Duration:** 40-45 minutes

## Learning Arc

**By the end of this notebook, you will:**
1. Understand the three types of compliance evidence (system, process, outcome)
2. Implement immutable audit trails using SHA-256 hash chains
3. Build the AuditEvent dataclass with cryptographic linking
4. Build the AuditTrail class for append-only logging
5. Verify hash chain integrity (tamper detection)
6. Generate compliance reports for SOX 404, ISO 27001, SOC 2
7. Conduct vendor risk assessments for third-party AI services
8. Trace requests across components using correlation IDs
9. Understand real-world GCC compliance scenarios

**Prerequisites:**
- Generic CCC Level 1 complete (RAG fundamentals, vector DBs)
- GCC Compliance M1.1 (Regulatory Landscape)
- GCC Compliance M1.2 (Data Privacy in RAG)
- GCC Compliance M1.3 (Access Control & RBAC)

**Real-World Hook:**

It's Friday at 4 PM. Your compliance officer emails:

> "We have a surprise SOX 404 audit Monday morning. They want a full year of evidence—all logs, access records, configuration changes, and **mathematical proof** that logs haven't been tampered with. Can you pull this by Monday?"

Without this module: You spend the weekend manually exporting logs from 20 different systems, praying there are no gaps. The auditor questions log integrity. Your CFO gets a deficiency letter.

With this module: You run one report call, such as `report_gen.generate_sox_404_report(start_date='2024-01-01T00:00:00Z', end_date='2024-12-31T23:59:59Z')` (built in Section 9). The report is generated in 45 seconds, the hash chain verifies intact, and the evidence package is ready for the auditor.

Let's build it.

## Section 1: Setup & Configuration

This notebook demonstrates compliance evidence concepts. PostgreSQL and AWS S3 are **optional**—the module can run in offline mode (in-memory) for learning/testing.

In [None]:
import sys
import os

# Add project root to path
sys.path.insert(0, os.path.abspath('..'))

# Check for environment configuration
from config import validate_config, get_config

is_valid = validate_config()
config = get_config()

print("Configuration Status:")
print(f"  Module: {config['module_name']}")
print(f"  Version: {config['version']}")
print(f"  PostgreSQL: {'Enabled' if config['postgres']['enabled'] else 'Disabled (in-memory mode)'}")
print(f"  AWS S3: {'Enabled' if config['s3']['enabled'] else 'Disabled (no evidence export)'}")
print(f"  Audit Retention: {config['compliance']['audit_retention_days']} days (~{config['compliance']['audit_retention_days']//365} years)")

if not config['postgres']['enabled']:
    print("\n⚠️ Running in OFFLINE mode (no PostgreSQL configured)")
    print("This is acceptable for development/testing but NOT for production.")
    print("Audit trail will use in-memory storage.")

**SAVED_SECTION:1**

## Section 2: Understanding Compliance Evidence Types

Compliance evidence comes in three categories. Understanding the taxonomy helps you organize evidence collection.

### Evidence Taxonomy

**1. System Evidence** (Technical Artifacts)
- Audit logs (who did what, when)
- Database schemas (data structure)
- Network diagrams (architecture)
- Configuration files (system settings)

**2. Process Evidence** (Documentation)
- Policies (what we must do)
- Procedures (how we do it)
- Training records (who was trained)
- Change management logs (approval workflows)

**3. Outcome Evidence** (Results)
- Penetration test reports (security testing)
- Vulnerability scans (weakness detection)
- PII detection metrics (privacy controls)
- Incident response logs (breach handling)

This module focuses primarily on **System Evidence**, specifically tamper-evident audit logs.

In [None]:
# Example: Mapping evidence types to compliance frameworks
evidence_mapping = {
    "SOX 404": {
        "System Evidence": ["Audit logs (7-year retention)", "Access control logs", "Configuration change logs"],
        "Process Evidence": ["Financial reporting policies", "Change management procedures"],
        "Outcome Evidence": ["Internal control test results", "IT general controls testing"]
    },
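    # Control IDs use ISO/IEC 27001:2013 Annex A numbering (the 2022 revision moves logging to control 8.15)
    "ISO 27001": {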
        "System Evidence": ["Event logs (A.12.4.1)", "Access logs (A.9.4.1)"],
        "Process Evidence": ["Information security policy", "Risk assessment procedures"],
        "Outcome Evidence": ["Penetration test reports", "Vulnerability scan results"]
    },
    "GDPR": {
        "System Evidence": ["Records of processing activities (Article 30)", "Breach notification logs"],
        "Process Evidence": ["Privacy policy", "Data protection impact assessments"],
        "Outcome Evidence": ["PII detection metrics", "Data subject request logs"]
    }
}

# Display mapping
for framework, categories in evidence_mapping.items():
    print(f"\n{framework}:")
    for category, examples in categories.items():
        print(f"  {category}:")
        for example in examples:
            print(f"    - {example}")

# Expected:
# SOX 404:
#   System Evidence:
#     - Audit logs (7-year retention)
#     - Access control logs
#     - Configuration change logs
#   Process Evidence:
#     - Financial reporting policies
#     - Change management procedures
#   Outcome Evidence:
#     - Internal control test results
#     - IT general controls testing
# (... similar for ISO 27001, GDPR)

**SAVED_SECTION:2**

## Section 3: Implementing Immutable Audit Trails (Hash Chains)

### The Problem with Traditional Logging

Traditional logs are **mutable**—they can be modified or deleted:
- Log rotation deletes old entries → gaps in evidence
- Database UPDATE/DELETE allowed → tampering possible
- No integrity proof → auditor questions authenticity

### The Solution: Cryptographic Hash Chaining

Each log entry contains:
```
current_hash = SHA-256(current_event_data + previous_hash)
```

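Each event's hash depends on every event before it, which makes the chain **tamper-evident**: tampering becomes detectable, not impossible. Immutability itself comes from append-only, write-once storage such as S3 Object Lock. The resulting chain looks like this: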
```
Genesis Block → Event 1 → Event 2 → Event 3 → ...
(hash = 0...0)   (hash A)   (hash B)   (hash C)
```

**Properties:**
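- **Tamper Detection:** Changing any event's data changes its recomputed hash, so it no longer matches the stored value; concealing the change requires recomputing every later hash in the chain
- **Collision Resistance:** It is computationally infeasible to find two different inputs with the same hash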
- **Avalanche Effect:** Tiny change in input creates completely different hash
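
The tamper-detection property can be checked directly: recompute each hash from the stored event data and the previous hash, then compare it with the stored value. Below is a minimal standalone sketch using the same `SHA-256(event_json + previous_hash)` construction as the formula above. The helper names `link_hash`, `build_chain`, and `verify_chain` are hypothetical and are not part of this module's API.

```python
import hashlib
import json
from typing import List, Optional, Tuple

GENESIS_HASH = "0" * 64  # fixed "previous hash" for the first event


def link_hash(event: dict, previous_hash: str) -> str:
    """SHA-256 over the event's canonical JSON followed by the previous hash."""
    payload = json.dumps(event, sort_keys=True) + previous_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(events: List[dict]) -> List[dict]:
    """Store each event alongside its previous and current hash."""
    chain, previous = [], GENESIS_HASH
    for event in events:
        current = link_hash(event, previous)
        chain.append({"event": event, "previous_hash": previous, "current_hash": current})
        previous = current
    return chain


def verify_chain(chain: List[dict]) -> Tuple[bool, Optional[int]]:
    """Return (True, None) if intact, else (False, index of the first bad entry)."""
    previous = GENESIS_HASH
    for i, entry in enumerate(chain):
        # Link check: the entry must point at the hash stored just before it.
        if entry["previous_hash"] != previous:
            return False, i
        # Content check: the stored hash must equal a fresh recomputation.
        if link_hash(entry["event"], entry["previous_hash"]) != entry["current_hash"]:
            return False, i
        previous = entry["current_hash"]
    return True, None


chain = build_chain([
    {"type": "login", "user": "alice"},
    {"type": "query", "user": "alice", "query": "Q3 revenue"},
    {"type": "logout", "user": "alice"},
])
print(verify_chain(chain))  # (True, None)

chain[1]["event"]["query"] = "Q4 revenue"  # edit event 2 in place
print(verify_chain(chain))  # (False, 1)
```

The link check catches entries deleted from or reordered within the chain, though it does not catch truncation of the newest entries. The content check catches edited fields.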

In [None]:
import hashlib
import json

# Demonstrate hash properties
def compute_sha256(data: str) -> str:
    """Compute SHA-256 hash of input string."""
    return hashlib.sha256(data.encode('utf-8')).hexdigest()

# Example 1: Deterministic (same input = same hash)
hash1 = compute_sha256("hello world")
hash2 = compute_sha256("hello world")
print(f"Hash 1: {hash1}")
print(f"Hash 2: {hash2}")
print(f"Same? {hash1 == hash2}\n")

# Example 2: Avalanche effect (tiny change = completely different hash)
hash_a = compute_sha256("hello world")
hash_b = compute_sha256("hello world!")  # Added one character
print(f"Hash A: {hash_a}")
print(f"Hash B: {hash_b}")
print(f"Difference: {sum(c1 != c2 for c1, c2 in zip(hash_a, hash_b))} / 64 characters changed\n")

# Example 3: Hash chain simulation
events = [
    {"type": "login", "user": "alice"},
    {"type": "query", "user": "alice", "query": "Q3 revenue"},
    {"type": "logout", "user": "alice"}
]

previous_hash = "0" * 64  # Genesis block
print("Hash Chain:")
for i, event in enumerate(events):
    event_str = json.dumps(event, sort_keys=True)
    current_hash = compute_sha256(event_str + previous_hash)
    print(f"  Event {i+1}: {event_str[:30]}... → Hash: {current_hash[:16]}...")
    previous_hash = current_hash

# Expected:
# Hash 1: (64 character hex string)
# Hash 2: (same 64 character hex string)
# Same? True
# Hash A: (64 character hex string)
# Hash B: (completely different 64 character hex string)
# Difference: ~60 / 64 characters changed (each hex digit differs with probability 15/16)
# Hash Chain:
#   Event 1: ... → Hash: ...
#   Event 2: ... → Hash: ...
#   Event 3: ... → Hash: ...

**SAVED_SECTION:3**

## Section 4: Building the AuditEvent Dataclass

The `AuditEvent` dataclass represents a single audit log entry with all required fields for compliance.
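
The module's `AuditEvent` is imported below rather than defined inline. To give a rough picture of what such a dataclass can look like, here is a hypothetical minimal version, `SketchAuditEvent`. Its field names mirror the attributes printed in this section. The defaults and the hashing details (canonical JSON with sorted keys, excluding `current_hash` itself) are assumptions, not the module's actual implementation.

```python
import hashlib
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64


@dataclass
class SketchAuditEvent:
    """Hypothetical stand-in for the module's AuditEvent."""
    event_type: str
    user_id: str
    resource_id: str
    action: str
    metadata: dict = field(default_factory=dict)
    # UTC ISO 8601 timestamp and UUID v4 correlation ID, filled in on creation.
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    previous_hash: str = GENESIS_HASH
    current_hash: str = ""

    def compute_hash(self) -> str:
        """Hash every field except current_hash, using canonical JSON."""
        payload = asdict(self)
        payload.pop("current_hash")  # a hash cannot cover itself
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


sketch_event = SketchAuditEvent(
    event_type="document_ingested",
    user_id="system_pipeline",
    resource_id="financial_report_q3.pdf",
    action="create",
    metadata={"contains_pii": False, "sensitivity": "confidential"},
)
sketch_event.current_hash = sketch_event.compute_hash()
print(sketch_event.current_hash[:16], sketch_event.previous_hash[:16])
```

`sort_keys=True` keeps the hash stable regardless of dictionary insertion order, so the same event always produces the same hash.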

In [None]:
from src.l3_m1_compliance_foundations_rag_systems import AuditEvent

# Create an audit event
event = AuditEvent(
    event_type="document_ingested",
    user_id="system_pipeline",
    resource_id="financial_report_q3.pdf",
    action="create",
    metadata={
        "contains_pii": False,
        "sensitivity": "confidential",
        "document_size_bytes": 2457600
    }
)

# Compute hash
event.current_hash = event.compute_hash()

# Display event
print("AuditEvent Created:")
print(f"  Event Type: {event.event_type}")
print(f"  User ID: {event.user_id}")
print(f"  Resource ID: {event.resource_id}")
print(f"  Action: {event.action}")
print(f"  Timestamp: {event.timestamp}")
print(f"  Correlation ID: {event.correlation_id}")
print(f"  Metadata: {event.metadata}")
print(f"  Previous Hash: {event.previous_hash[:16]}... (genesis)")
print(f"  Current Hash: {event.current_hash[:16]}...")

# Expected:
# AuditEvent Created:
#   Event Type: document_ingested
#   User ID: system_pipeline
#   Resource ID: financial_report_q3.pdf
#   Action: create
#   Timestamp: (ISO 8601 timestamp)
#   Correlation ID: (UUID v4)
#   Metadata: {'contains_pii': False, 'sensitivity': 'confidential', ...}
#   Previous Hash: 0000000000000000... (genesis)
#   Current Hash: (16 characters of 64-char hash)...

**SAVED_SECTION:4**

## Section 5: Building the AuditTrail Class

The `AuditTrail` class manages the append-only log with hash chain integrity.
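
In practice, two properties make a trail append-only. First, the class exposes no update or delete operations. Second, appends are serialized, so two concurrent writers cannot both link to the same previous hash and fork the chain. Below is a hypothetical in-memory sketch using `threading.Lock`; the class and method names are illustrative. A database-backed version would need an equivalent guarantee, for example a transaction that locks the chain head.

```python
import hashlib
import json
import threading

GENESIS_HASH = "0" * 64


class SketchAuditTrail:
    """Hypothetical in-memory append-only hash chain."""

    def __init__(self) -> None:
        self._entries: list = []
        self._latest_hash = GENESIS_HASH
        self._lock = threading.Lock()

    def append(self, event: dict) -> str:
        """Link the event to the current head and return its hash."""
        payload = json.dumps(event, sort_keys=True)
        # Read head, hash, write head must be atomic; otherwise two threads
        # could both use the same previous hash and fork the chain.
        with self._lock:
            previous = self._latest_hash
            current = hashlib.sha256((payload + previous).encode("utf-8")).hexdigest()
            self._entries.append((payload, previous, current))
            self._latest_hash = current
            return current

    def entries(self) -> tuple:
        """Read-only snapshot; there is deliberately no update or delete method."""
        with self._lock:
            return tuple(self._entries)

    def verify(self) -> bool:
        """Recompute every link and hash from the snapshot."""
        previous = GENESIS_HASH
        for payload, prev, current in self.entries():
            expected = hashlib.sha256((payload + prev).encode("utf-8")).hexdigest()
            if prev != previous or current != expected:
                return False
            previous = current
        return True


trail = SketchAuditTrail()
threads = [threading.Thread(target=trail.append, args=({"seq": i},)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(trail.entries()), trail.verify())  # 50 True
```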

In [None]:
from src.l3_m1_compliance_foundations_rag_systems import AuditTrail

# Initialize audit trail (in-memory mode for demonstration)
audit = AuditTrail()

print("AuditTrail Initialized:")
print(f"  Latest Hash: {audit._latest_hash[:16]}... (genesis)")
print(f"  Event Count: {audit._event_count}")
print(f"  Storage Mode: In-Memory (for testing)")

# Expected:
# AuditTrail Initialized:
#   Latest Hash: 0000000000000000... (genesis)
#   Event Count: 0
#   Storage Mode: In-Memory (for testing)

**SAVED_SECTION:5**

## Section 6: Logging Events with Hash Chaining

Now let's log some events and observe the hash chain forming.

In [None]:
from src.l3_m1_compliance_foundations_rag_systems import EventType

# Log multiple events
events_to_log = [
    {
        "event_type": EventType.DOCUMENT_INGESTED.value,
        "user_id": "system_pipeline",
        "resource_id": "financial_report_q3.pdf",
        "action": "create",
        "metadata": {"contains_pii": False, "sensitivity": "confidential"}
    },
    {
        "event_type": EventType.QUERY_EXECUTED.value,
        "user_id": "john.doe@example.com",
        "resource_id": "query_12345",
        "action": "execute",
        "metadata": {"query_text": "What were Q3 revenue figures?"}
    },
    {
        "event_type": EventType.PII_DETECTED.value,
        "user_id": "pii_scanner_service",
        "resource_id": "employee_data.xlsx",
        "action": "read",
        "metadata": {"pii_types": ["email", "phone"], "pii_count": 247}
    },
    {
        "event_type": EventType.ACCESS_GRANTED.value,
        "user_id": "jane.smith@example.com",
        "resource_id": "compliance_dashboard",
        "action": "read",
        "metadata": {"role": "compliance_officer", "mfa_verified": True}
    },
    {
        "event_type": EventType.CONFIGURATION_CHANGED.value,
        "user_id": "admin@example.com",
        "resource_id": "audit_retention_policy",
        "action": "update",
        "metadata": {"old_retention_days": 365, "new_retention_days": 2555}
    }
]

print("Logging Events and Building Hash Chain:\n")
logged_events = []
for i, event_data in enumerate(events_to_log):
    event = audit.log_event(**event_data)
    logged_events.append(event)
    
    print(f"Event {i+1}: {event.event_type}")
    print(f"  User: {event.user_id}")
    print(f"  Previous Hash: {event.previous_hash[:16]}...")
    print(f"  Current Hash: {event.current_hash[:16]}...")
    print()

print(f"\nTotal Events Logged: {audit._event_count}")
print(f"Latest Hash: {audit._latest_hash[:16]}...")

# Expected:
# Logging Events and Building Hash Chain:
#
# Event 1: document_ingested
#   User: system_pipeline
#   Previous Hash: 0000000000000000... (genesis)
#   Current Hash: (hash A)
#
# Event 2: query_executed
#   User: john.doe@example.com
#   Previous Hash: (hash A)
#   Current Hash: (hash B)
# (... similar for remaining events)
#
# Total Events Logged: 5
# Latest Hash: (hash E)...

**SAVED_SECTION:6**

## Section 7: Verifying Chain Integrity

The critical feature: Detect if any event has been tampered with.
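
Verification only compares the chain with itself. Someone with write access to the log store can edit an event and then recompute every later hash, and the rewritten chain will still verify. The usual countermeasure is to record the latest hash somewhere that person cannot write, such as an S3 bucket with Object Lock or a signed checkpoint shared with the auditor, and compare against it. Below is a minimal standalone sketch of the failure and the fix; the function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def chain_hashes(events: list) -> list:
    """Return the hash for each event in a SHA-256 hash chain."""
    hashes, previous = [], GENESIS_HASH
    for event in events:
        previous = hashlib.sha256(
            (json.dumps(event, sort_keys=True) + previous).encode("utf-8")
        ).hexdigest()
        hashes.append(previous)
    return hashes


def self_consistent(events: list, stored_hashes: list) -> bool:
    """Internal check only: stored hashes match a recomputation."""
    return chain_hashes(events) == stored_hashes


events = [{"user": "alice", "action": "read"}, {"user": "bob", "action": "update"}]
stored = chain_hashes(events)
external_anchor = stored[-1]  # copied to write-once storage at checkpoint time

# An insider edits event 1 AND recomputes every hash after it.
events[0]["user"] = "mallory"
stored = chain_hashes(events)

print(self_consistent(events, stored))  # True: the internal check is fooled
print(stored[-1] == external_anchor)    # False: the anchor exposes the rewrite
```

An anchor only protects history up to the checkpoint when it was taken, so checkpoint frequency bounds how much history could be rewritten unnoticed.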

In [None]:
# Verify integrity of valid chain
print("Verifying Hash Chain Integrity (Valid Chain):\n")
is_valid, message = audit.verify_chain_integrity()

print(f"Integrity Check Result: {is_valid}")
print(f"Message: {message}\n")

# Now simulate tampering
print("Simulating Tampering (Modifying Event 2):\n")
# Save original value
original_metadata = audit._in_memory_chain[1].metadata.copy()
# Tamper with event
audit._in_memory_chain[1].metadata["tampered"] = True

# Verify again
is_valid_tampered, message_tampered = audit.verify_chain_integrity()

print(f"Integrity Check Result: {is_valid_tampered}")
print(f"Message: {message_tampered}\n")

# Restore original (so rest of notebook works)
audit._in_memory_chain[1].metadata = original_metadata

print("Explanation:")
print("  - Valid chain: every stored hash matches a fresh recomputation and every link matches")
print("  - Tampered chain: Event 2's data changed, so its recomputed hash no longer matches its stored hash")
print("  - Limit: editing an event AND recomputing every later hash passes this check; anchor the latest hash externally")

# Expected:
# Verifying Hash Chain Integrity (Valid Chain):
#
# Integrity Check Result: True
# Message: ✅ Hash chain verified: 5 events intact, no tampering detected
#
# Simulating Tampering (Modifying Event 2):
#
# Integrity Check Result: False
# Message: ❌ Hash mismatch at event 1: ...   (zero-based index 1 = Event 2)
#
# Explanation:
#   - Valid chain: every stored hash matches a fresh recomputation and every link matches
#   - Tampered chain: Event 2's data changed, so its recomputed hash no longer matches its stored hash
#   - Limit: editing an event AND recomputing every later hash passes this check; anchor the latest hash externally

**SAVED_SECTION:7**

## Section 8: Generating Compliance Reports

The real-world use case: Respond to audit requests in seconds.
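
Most of a summary like `report['summary']` can be derived from the event list with standard-library tools. The following is a hypothetical sketch of that aggregation. The dictionary keys mirror the ones printed below; how the module actually computes them is an assumption.

```python
from collections import Counter
from typing import List


def summarize(events: List[dict]) -> dict:
    """Build a report summary from event dicts with ISO 8601 UTC timestamps."""
    # Fixed-format ISO 8601 strings with the same UTC offset sort chronologically.
    timestamps = sorted(e["timestamp"] for e in events)
    return {
        "total_events": len(events),
        "date_range": {
            "start": timestamps[0] if timestamps else None,
            "end": timestamps[-1] if timestamps else None,
        },
        "unique_users": len({e["user_id"] for e in events}),
        "unique_resources": len({e["resource_id"] for e in events}),
        "event_type_distribution": dict(Counter(e["event_type"] for e in events)),
    }


sample = [
    {"event_type": "pii_detected", "user_id": "scanner", "resource_id": "a.pdf",
     "timestamp": "2024-03-01T10:00:00+00:00"},
    {"event_type": "access_denied", "user_id": "jane", "resource_id": "admin",
     "timestamp": "2024-01-15T09:30:00+00:00"},
    {"event_type": "pii_detected", "user_id": "scanner", "resource_id": "b.csv",
     "timestamp": "2024-06-30T18:45:00+00:00"},
]
print(summarize(sample))
```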

In [None]:
# Generate comprehensive compliance report
report = audit.generate_compliance_report()

print("Compliance Report Summary:\n")
print(f"Total Events: {report['summary']['total_events']}")
print(f"Date Range: {report['summary']['date_range']['start']} to {report['summary']['date_range']['end']}")
print(f"Unique Users: {report['summary']['unique_users']}")
print(f"Unique Resources: {report['summary']['unique_resources']}")
print(f"\nEvent Type Distribution:")
for event_type, count in report['summary']['event_type_distribution'].items():
    print(f"  {event_type}: {count}")

print(f"\nCompliance Statement:")
print(f"  Chain Integrity: {report['compliance_statement']['chain_integrity']}")
print(f"  Message: {report['compliance_statement']['message']}")
print(f"  Standards: {report['compliance_statement']['audit_standard']}")

print(f"\nReport Metadata:")
print(f"  Generated At: {report['metadata']['report_generated_at']}")
print(f"  Total Events in System: {report['metadata']['total_events_in_system']}")
print(f"  Events in Report: {report['metadata']['events_in_report']}")

# Expected:
# Compliance Report Summary:
#
# Total Events: 5
# Date Range: (timestamp) to (timestamp)
# Unique Users: 5
# Unique Resources: 5
#
# Event Type Distribution:
#   document_ingested: 1
#   query_executed: 1
#   pii_detected: 1
#   access_granted: 1
#   configuration_changed: 1
#
# Compliance Statement:
#   Chain Integrity: VERIFIED
#   Message: ✅ Hash chain verified: 5 events intact, no tampering detected
#   Standards: SOX 404, ISO 27001 A.12.4.1, SOC 2 CC7.2, GDPR Article 30
#
# Report Metadata:
#   Generated At: (ISO 8601 timestamp)
#   Total Events in System: 5
#   Events in Report: 5

**SAVED_SECTION:8**

## Section 9: Framework-Specific Reports (SOX 404, ISO 27001)

Different compliance frameworks require different event types and formats.
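
A framework-specific report is essentially a filter over the same chain. It keeps events whose type is relevant to the framework and whose timestamp falls in the requested window. Below is a hypothetical sketch. The `FRAMEWORK_EVENT_TYPES` mapping is illustrative and is not taken from the module or from the standards themselves. Timestamps are parsed with `datetime.fromisoformat` after replacing a trailing `Z`, because Python versions before 3.11 do not accept `Z` there.

```python
from datetime import datetime

# Illustrative mapping only; derive the real one from your control matrix.
FRAMEWORK_EVENT_TYPES = {
    "SOX 404": {"access_granted", "access_denied", "configuration_changed", "data_exported"},
    "ISO 27001 A.12.4.1": None,  # None = every event type is in scope
}


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' on Python 3.7+."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def filter_for_framework(events: list, framework: str, start: str, end: str) -> list:
    """Keep in-scope events with start <= timestamp <= end (inclusive)."""
    allowed = FRAMEWORK_EVENT_TYPES[framework]
    lo, hi = parse_utc(start), parse_utc(end)
    return [
        e for e in events
        if (allowed is None or e["event_type"] in allowed)
        and lo <= parse_utc(e["timestamp"]) <= hi
    ]


events = [
    {"event_type": "access_denied", "timestamp": "2024-02-10T08:00:00Z"},
    {"event_type": "query_executed", "timestamp": "2024-05-01T12:00:00Z"},
    {"event_type": "configuration_changed", "timestamp": "2025-01-03T09:00:00Z"},
]
sox = filter_for_framework(events, "SOX 404", "2024-01-01T00:00:00Z", "2024-12-31T23:59:59Z")
iso = filter_for_framework(events, "ISO 27001 A.12.4.1", "2024-01-01T00:00:00Z", "2024-12-31T23:59:59Z")
print([e["event_type"] for e in sox])  # ['access_denied']
print(len(iso))                        # 2
```

Parsing before comparing avoids comparing `...Z` strings against `...+00:00` strings character by character, which can order them incorrectly.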

In [None]:
from src.l3_m1_compliance_foundations_rag_systems import ComplianceReportGenerator
from datetime import datetime, timezone

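# Initialize report generator
report_gen = ComplianceReportGenerator(audit)

# The events above appear to be timestamped when they were logged (see the Timestamp
# printed in Section 4), so report on the current calendar year (UTC). A hardcoded
# 2024 window would likely capture no events when the notebook runs in a later year.
now = datetime.now(timezone.utc)
start_date = now.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0).strftime("%Y-%m-%dT%H:%M:%SZ")
end_date = now.replace(month=12, day=31, hour=23, minute=59, second=59, microsecond=0).strftime("%Y-%m-%dT%H:%M:%SZ")

# Generate SOX 404 report (focuses on financial controls)
print("SOX Section 404 Compliance Report:\n")
sox_report = report_gen.generate_sox_404_report(
    start_date=start_date,
    end_date=end_date
)

print(f"Framework: {sox_report['framework']}")
print(f"Retention Requirement: {sox_report['retention_requirement']}")
print(f"Events Captured: {sox_report['summary']['total_events']}")
print("Event Types:")
for event_type, count in sox_report['summary']['event_type_distribution'].items():
    print(f"  - {event_type}: {count}")

print("\n" + "="*60 + "\n")

# Generate ISO 27001 report (information security controls)
print("ISO 27001 Control Evidence Report:\n")
iso_report = report_gen.generate_iso_27001_report(
    control="A.12.4.1",
    start_date=start_date,
    end_date=end_date
)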

print(f"Framework: {iso_report['framework']}")
print(f"Control: {iso_report['control']} (Event Logging)")
print(f"Events Captured: {iso_report['summary']['total_events']}")
print(f"Compliance Status: {iso_report['compliance_statement']['chain_integrity']}")

# Expected:
# SOX Section 404 Compliance Report:
#
# Framework: SOX Section 404
# Retention Requirement: 7 years
# Events Captured: (count of relevant events)
# Event Types:
#   - document_ingested: 1
#   - access_granted: 1
#   - configuration_changed: 1
#   - pii_detected: 1
#
# ============================================================
#
# ISO 27001 Control Evidence Report:
#
# Framework: ISO 27001
# Control: A.12.4.1 (Event Logging)
# Events Captured: 5
# Compliance Status: VERIFIED

**SAVED_SECTION:9**

## Section 10: Vendor Risk Assessment

Evaluate third-party AI vendors (OpenAI, Pinecone, etc.) against compliance requirements.
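
The expected output in this section is consistent with a simple pass-rate score: 9 of 10 criteria gives 90.0%, and 2 of 10 gives 20.0%. A higher score therefore means lower risk. Below is a hypothetical sketch of that scoring, with an optional set of must-pass criteria that block approval regardless of the score (for example, data residency for a GCC subject to DPDPA). The thresholds, the `must_pass` idea, and the function name are assumptions for illustration, not the values or API of `VendorRiskAssessment`.

```python
from typing import Dict, Iterable


def score_vendor(responses: Dict[str, bool], must_pass: Iterable[str] = ()) -> dict:
    """Pass-rate vendor score with optional blocking criteria (thresholds are illustrative)."""
    if not responses:
        raise ValueError("responses must contain at least one criterion")
    passed = [k for k, ok in responses.items() if ok]
    failed = [k for k, ok in responses.items() if not ok]
    score = 100.0 * len(passed) / len(responses)

    if score >= 85:
        level = "LOW"
    elif score >= 70:
        level = "MEDIUM"
    elif score >= 50:
        level = "HIGH"
    else:
        level = "CRITICAL"

    # A failed must-pass criterion blocks approval regardless of the score.
    blockers = [k for k in must_pass if k in failed]
    approved = level in ("LOW", "MEDIUM") and not blockers
    return {
        "risk_score": score,
        "risk_level": level,
        "failed_criteria": failed,
        "blockers": blockers,
        "approved": approved,
    }


responses = {"data_residency": False, "soc2_certified": True, "encryption_at_rest": True,
             "encryption_in_transit": True, "access_logs": True}
print(score_vendor(responses))
print(score_vendor(responses, must_pass=["data_residency"])["approved"])  # False
```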

In [None]:
from src.l3_m1_compliance_foundations_rag_systems import VendorRiskAssessment

# Initialize assessor
assessor = VendorRiskAssessment()

# Display risk criteria
print("Vendor Risk Assessment Criteria:\n")
for i, (criterion, question) in enumerate(assessor.RISK_CRITERIA.items(), 1):
    print(f"{i}. {criterion}: {question}")

print("\n" + "="*80 + "\n")

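# Assess OpenAI (illustrative answers; confirm each against the vendor's current
# documentation and contract before relying on the result)
openai_responses = {
    "data_residency": False,  # Assumed here: no contractual India data residency guarantee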
    "soc2_certified": True,
    "gdpr_compliant": True,
    "encryption_at_rest": True,
    "encryption_in_transit": True,
    "access_logs": True,
    "data_retention": True,
    "data_deletion": True,
    "incident_response": True,
    "subprocessor_disclosure": True
}

openai_assessment = assessor.assess_vendor(
    vendor_name="OpenAI",
    responses=openai_responses
)

print("OpenAI Vendor Risk Assessment:\n")
print(f"Vendor: {openai_assessment['vendor_name']}")
print(f"Assessment Date: {openai_assessment['assessment_date']}")
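# In the expected output, risk_score equals the share of criteria passed (9/10 -> 90.0%), so higher means lower risk
print(f"Risk Score: {openai_assessment['risk_score']:.1f}%")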
print(f"Risk Level: {openai_assessment['risk_level']}")
print(f"Recommendation: {openai_assessment['recommendation']}")
print(f"Criteria Passed: {openai_assessment['criteria_passed']} / {openai_assessment['criteria_total']}")
if openai_assessment['failed_criteria']:
    print(f"Failed Criteria: {', '.join(openai_assessment['failed_criteria'])}")

print("\n" + "="*80 + "\n")

# Assess a hypothetical non-compliant vendor
noncompliant_responses = {
    "data_residency": False,
    "soc2_certified": False,
    "gdpr_compliant": False,
    "encryption_at_rest": True,
    "encryption_in_transit": True,
    "access_logs": False,
    "data_retention": False,
    "data_deletion": False,
    "incident_response": False,
    "subprocessor_disclosure": False
}

noncompliant_assessment = assessor.assess_vendor(
    vendor_name="NonCompliantVendor",
    responses=noncompliant_responses
)

print("NonCompliantVendor Risk Assessment:\n")
print(f"Vendor: {noncompliant_assessment['vendor_name']}")
print(f"Risk Score: {noncompliant_assessment['risk_score']:.1f}%")
print(f"Risk Level: {noncompliant_assessment['risk_level']}")
print(f"Recommendation: {noncompliant_assessment['recommendation']}")
print(f"Failed Criteria: {', '.join(noncompliant_assessment['failed_criteria'])}")

# Expected:
# Vendor Risk Assessment Criteria:
#
# 1. data_residency: Does vendor guarantee GCC data stays in India/compliant region?
# 2. soc2_certified: Is vendor SOC 2 Type II certified?
# (... all 10 criteria)
#
# ================================================================================
#
# OpenAI Vendor Risk Assessment:
#
# Vendor: OpenAI
# Assessment Date: (timestamp)
# Risk Score: 90.0%
# Risk Level: LOW
# Recommendation: Approved for production use
# Criteria Passed: 9 / 10
# Failed Criteria: data_residency
#
# ================================================================================
#
# NonCompliantVendor Risk Assessment:
#
# Vendor: NonCompliantVendor
# Risk Score: 20.0%
# Risk Level: CRITICAL
# Recommendation: Not approved - find alternative vendor
# Failed Criteria: (list of 8 failed criteria)

**SAVED_SECTION:10**

## Section 11: Request Tracing with Correlation IDs

Trace a single request across multiple components (RAG query → vector DB → LLM → response).
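
Passing `correlation_id=` to every `log_event` call, as below, works but is easy to forget in a deep call stack. One common Python alternative is to hold the current request's ID in a `contextvars.ContextVar`. You set it once at the request boundary and read it wherever an event is logged; values are isolated per thread and per asyncio task. Below is a hypothetical sketch. `handle_request`, `vector_search`, and `log` are illustrative names, not part of this module.

```python
import contextvars
import uuid

# Holds the correlation ID of the request currently being handled.
current_correlation_id: contextvars.ContextVar = contextvars.ContextVar(
    "current_correlation_id", default=None
)
logged: list = []


def log(event_type: str, **metadata) -> None:
    """Record an event tagged with whatever correlation ID is active."""
    logged.append({
        "event_type": event_type,
        "correlation_id": current_correlation_id.get(),
        "metadata": metadata,
    })


def vector_search(query: str) -> list:
    log("vector_search", top_k=5)  # no correlation_id argument needed
    return ["chunk-1", "chunk-2"]


def handle_request(query: str) -> str:
    token = current_correlation_id.set(str(uuid.uuid4()))
    try:
        log("query_received", query=query)
        chunks = vector_search(query)
        log("response_sent", chunks_used=len(chunks))
        return current_correlation_id.get()
    finally:
        current_correlation_id.reset(token)  # don't leak the ID into later work


cid = handle_request("What were Q3 2024 revenue figures?")
trace = [e["event_type"] for e in logged if e["correlation_id"] == cid]
print(trace)  # ['query_received', 'vector_search', 'response_sent']
```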

In [None]:
from src.l3_m1_compliance_foundations_rag_systems import generate_correlation_id

# Simulate a RAG request flow
correlation_id = generate_correlation_id()
print(f"Generated Correlation ID: {correlation_id}\n")

# Create audit trail for correlation demo
audit_corr = AuditTrail()

print("Simulating RAG Request Flow with Correlation ID:\n")

# Step 1: User submits query
event1 = audit_corr.log_event(
    event_type="query_received",
    user_id="alice@example.com",
    resource_id="rag_api",
    action="execute",
    correlation_id=correlation_id,
    metadata={"query": "What were Q3 2024 revenue figures?"}
)
print(f"Step 1: Query received - {event1.event_type} (correlation: {event1.correlation_id})")

# Step 2: Vector DB search
event2 = audit_corr.log_event(
    event_type="vector_search",
    user_id="rag_system",
    resource_id="pinecone_index",
    action="read",
    correlation_id=correlation_id,
    metadata={"top_k": 5, "results_found": 5}
)
print(f"Step 2: Vector search - {event2.event_type} (correlation: {event2.correlation_id})")

# Step 3: LLM call
event3 = audit_corr.log_event(
    event_type="llm_invocation",
    user_id="rag_system",
    resource_id="openai_gpt4",
    action="execute",
    correlation_id=correlation_id,
    metadata={"model": "gpt-4", "tokens_used": 342}
)
print(f"Step 3: LLM invocation - {event3.event_type} (correlation: {event3.correlation_id})")

# Step 4: Response returned
event4 = audit_corr.log_event(
    event_type="response_sent",
    user_id="rag_system",
    resource_id="rag_api",
    action="create",
    correlation_id=correlation_id,
    metadata={"response_time_ms": 1250}
)
print(f"Step 4: Response sent - {event4.event_type} (correlation: {event4.correlation_id})")

print("\n" + "="*80 + "\n")

# Retrieve all events for this request
print(f"Tracing Request (Correlation ID: {correlation_id}):\n")
correlated_events = audit_corr.get_events_by_correlation_id(correlation_id)

print(f"Found {len(correlated_events)} events for this request:\n")
for i, event in enumerate(correlated_events, 1):
    print(f"{i}. {event.event_type}")
    print(f"   User: {event.user_id}")
    print(f"   Resource: {event.resource_id}")
    print(f"   Timestamp: {event.timestamp}")
    print(f"   Metadata: {event.metadata}")
    print()

print("Use Case: When investigating slow queries or errors, trace the entire")
print("request path across all components using the correlation ID.")

# Expected:
# Generated Correlation ID: (UUID v4)
#
# Simulating RAG Request Flow with Correlation ID:
#
# Step 1: Query received - query_received (correlation: <UUID>)
# Step 2: Vector search - vector_search (correlation: <UUID>)
# Step 3: LLM invocation - llm_invocation (correlation: <UUID>)
# Step 4: Response sent - response_sent (correlation: <UUID>)
#
# ================================================================================
#
# Tracing Request (Correlation ID: <UUID>):
#
# Found 4 events for this request:
#
# 1. query_received
#    User: alice@example.com
#    Resource: rag_api
#    Timestamp: (ISO 8601)
#    Metadata: {'query': 'What were Q3 2024 revenue figures?'}
# (... similar for other 3 events)

**SAVED_SECTION:11**

## Section 12: Real-World GCC Scenario

Let's simulate a realistic compliance scenario combining all concepts.

In [None]:
print("Real-World Scenario: Multi-Framework Compliance Audit\n")
print("="*80)
print("\nContext:")
print("  - Your GCC serves a US parent company (SOX 404 + SOC 2)")
print("  - India operations subject to DPDPA")
print("  - External audit scheduled for next week")
print("  - Need evidence for Q1-Q4 2024")
print("\nAuditor Requests:")
print("  1. Prove audit logs haven't been tampered with")
print("  2. Show all PII detection events (DPDPA)")
print("  3. Show access control events (SOX 404)")
print("  4. Vendor risk assessment for third-party AI services (SOC 2)")
print("\n" + "="*80 + "\n")

# Create comprehensive audit trail
audit_real = AuditTrail()

# Simulate a year of activity with a few sample events (timestamps are the current time; month labels are illustrative)
print("Step 1: Loading audit events from Q1-Q4 2024...\n")

sample_events = [
    # January: Document ingestion
    {"event_type": EventType.DOCUMENT_INGESTED.value, "user_id": "pipeline", "resource_id": "jan_report.pdf", "action": "create"},
    {"event_type": EventType.PII_DETECTED.value, "user_id": "scanner", "resource_id": "jan_report.pdf", "action": "read"},
    
    # February: Access control
    {"event_type": EventType.ACCESS_GRANTED.value, "user_id": "john@example.com", "resource_id": "dashboard", "action": "read"},
    {"event_type": EventType.ACCESS_DENIED.value, "user_id": "jane@example.com", "resource_id": "admin_panel", "action": "read"},
    
    # March: Configuration changes
    {"event_type": EventType.CONFIGURATION_CHANGED.value, "user_id": "admin@example.com", "resource_id": "retention_policy", "action": "update"},
    
    # April-December: More events
    {"event_type": EventType.QUERY_EXECUTED.value, "user_id": "analyst@example.com", "resource_id": "rag_query_1", "action": "execute"},
    {"event_type": EventType.PII_DETECTED.value, "user_id": "scanner", "resource_id": "employee_data.csv", "action": "read"},
    {"event_type": EventType.DATA_EXPORTED.value, "user_id": "compliance@example.com", "resource_id": "audit_logs", "action": "export"},
]

for event_data in sample_events:
    audit_real.log_event(**event_data)

print(f"Loaded {audit_real._event_count} events\n")

# Request 1: Prove integrity
print("Auditor Request 1: Prove logs haven't been tampered with\n")
is_valid, message = audit_real.verify_chain_integrity()
print(f"Response: {message}\n")

# Request 2: PII detection events (DPDPA)
print("Auditor Request 2: Show all PII detection events (DPDPA compliance)\n")
pii_report = audit_real.generate_compliance_report(
    event_types=[EventType.PII_DETECTED.value]
)
print(f"Response: Found {pii_report['summary']['total_events']} PII detection events")
print(f"Events: {[e['resource_id'] for e in pii_report['events']]}\n")

# Request 3: Access control events (SOX 404)
print("Auditor Request 3: Show access control events (SOX 404)\n")
access_report = audit_real.generate_compliance_report(
    event_types=[EventType.ACCESS_GRANTED.value, EventType.ACCESS_DENIED.value]
)
print(f"Response: Found {access_report['summary']['total_events']} access control events")
print(f"Distribution: {access_report['summary']['event_type_distribution']}\n")

# Request 4: Vendor risk assessment (SOC 2)
print("Auditor Request 4: Vendor risk assessment for OpenAI (SOC 2)\n")
vendor_assessment = assessor.assess_vendor(
    vendor_name="OpenAI",
    responses=openai_responses
)
print(f"Response: Vendor={vendor_assessment['vendor_name']}, Risk Level={vendor_assessment['risk_level']}, Score={vendor_assessment['risk_score']:.1f}%")
print(f"Recommendation: {vendor_assessment['recommendation']}\n")

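print("="*80)
# Derive each status from the results above instead of hardcoding "PASSED"
pii_ok = pii_report['summary']['total_events'] > 0
access_ok = access_report['summary']['total_events'] > 0
all_ok = is_valid and pii_ok and access_ok
print(f"\nAudit Result: {'✅ PASSED' if all_ok else '❌ GAPS FOUND'}")
print(f"  - Hash chain integrity: {'VERIFIED' if is_valid else 'BROKEN'}")
print(f"  - PII events: {'DOCUMENTED' if pii_ok else 'MISSING'}")
print(f"  - Access controls: {'EVIDENCED' if access_ok else 'MISSING'}")
print("  - Vendor risk: ASSESSED")
print("\nTime to generate all evidence: <60 seconds")
print("Manual effort: 0 hours (vs 40-80 hours without automation)")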

# Expected:
# Real-World Scenario: Multi-Framework Compliance Audit
#
# ================================================================================
#
# Context:
#   - Your GCC serves a US parent company (SOX 404 + SOC 2)
#   - India operations subject to DPDPA
#   - External audit scheduled for next week
#   - Need evidence for Q1-Q4 2024
#
# Auditor Requests:
#   1. Prove audit logs haven't been tampered with
#   2. Show all PII detection events (DPDPA)
#   3. Show access control events (SOX 404)
#   4. Vendor risk assessment for third-party AI services (SOC 2)
#
# ================================================================================
#
# Step 1: Loading audit events from Q1-Q4 2024...
#
# Loaded 8 events
#
# Auditor Request 1: Prove logs haven't been tampered with
#
# Response: ✅ Hash chain verified: 8 events intact, no tampering detected
#
# Auditor Request 2: Show all PII detection events (DPDPA compliance)
#
# Response: Found 2 PII detection events
# Events: ['jan_report.pdf', 'employee_data.csv']
#
# Auditor Request 3: Show access control events (SOX 404)
#
# Response: Found 2 access control events
# Distribution: {'access_granted': 1, 'access_denied': 1}
#
# Auditor Request 4: Vendor risk assessment for OpenAI (SOC 2)
#
# Response: Vendor=OpenAI, Risk Level=LOW, Score=90.0%
# Recommendation: Approved for production use
#
# ================================================================================
#
# Audit Result: ✅ PASSED
#   - Hash chain integrity: VERIFIED
#   - PII events: DOCUMENTED
#   - Access controls: EVIDENCED
#   - Vendor risk: ASSESSED
#
# Time to generate all evidence: <60 seconds
# Manual effort: 0 hours (vs 40-80 hours without automation)

**SAVED_SECTION:12**

## Summary & Key Takeaways

### What You Built

1. **Immutable Audit Trail** - Cryptographic hash chains (SHA-256) that detect tampering
2. **AuditEvent & AuditTrail Classes** - Append-only logging infrastructure (in-memory here; PostgreSQL-backed for production)
3. **Compliance Report Generator** - Framework-specific reports (SOX, ISO, SOC 2, GDPR)
4. **Vendor Risk Assessor** - Objective evaluation of third-party AI services
5. **Request Tracing** - Correlation IDs to trace requests across components

### Why This Matters for GCCs

**Business Impact:**
- Reduces audit prep from **2-4 weeks to <1 hour**
- Reduces the risk of audit findings, which can cost ₹50L-5Cr in remediation
- Supports SOC 2 attestation (SOC 2 is an audit report, not a certification), which can unlock enterprise contracts worth ₹10-50Cr

**Technical Excellence:**
- Cryptographic tamper evidence for logs that auditors can verify independently
- Automated evidence collection (minimal manual effort)
- Multi-framework support (SOX, ISO, SOC 2, GDPR, DPDPA)

**Career Impact:**
- Distinguishes you from developers who ignore compliance
- Positions you for senior IC roles (L5/L6 at FAANG)
- Demonstrates business acumen + technical depth

### Next Steps

1. **Integrate with your RAG system** - Add audit logging to every operation
2. **Set up PostgreSQL** - Move from in-memory to production database
3. **Configure S3 Object Lock** - Enable immutable evidence storage
4. **Schedule daily exports** - Automate evidence collection
5. **Practice PractaThon Exercise 1.4** - Build multi-tenant compliance system

### Additional Resources

- **README.md** - Comprehensive documentation with cost estimates and real-world examples
- **Augmented Script** - Deep dive into implementation details
- **API Documentation** - FastAPI server endpoints at `/docs`
- **Test Suite** - 30+ tests covering all functionality

---

**Congratulations!** You now have the core of a compliance evidence system; completing the Next Steps above takes it to production. This is the foundation for building audit-ready RAG systems in GCC environments.

**Remember:** Compliance is not a checkbox—it's a competitive advantage. Build it from Day 1, not as an afterthought.