# L3 M1.4: Compliance Documentation & Evidence

**Module:** L3 M1 - Compliance Foundations for RAG Systems  
**Video:** M1.4 - Compliance Documentation & Evidence  
**Duration:** 40-45 minutes  
**Track:** GCC Compliance Basics

## Learning Arc

**By the end of this notebook, you will:**

1. **Understand compliance evidence types** (system, process, outcome)
2. **Implement immutable audit trails** using SHA-256 hash chains
3. **Build automated evidence collection** pipelines for regulatory compliance
4. **Create compliance documentation structures** with version control
5. **Conduct vendor risk assessments** for third-party AI providers
6. **Generate multi-framework reports** (SOX, SOC 2, ISO 27001, GDPR, DPDPA)

**Prerequisites:**
- Generic CCC Level 1 complete (RAG fundamentals, vector DB, production patterns)
- GCC Compliance M1.1 (Regulatory Landscape)
- GCC Compliance M1.2 (Data Privacy in RAG)
- GCC Compliance M1.3 (Access Control & RBAC)

**Key Innovation:** Cryptographic hash chaining makes audit logs tamper-evident: any later modification can be detected by recomputing the chain.

## Setup & Configuration

This notebook demonstrates compliance evidence concepts using **OFFLINE mode** (no external AI APIs required).  
PostgreSQL and AWS S3 are optional for development - the module works with in-memory storage.

In [None]:
import sys
import os

# Add project root to path
sys.path.insert(0, os.path.abspath('..'))

# Check for environment configuration
from config import validate_config, get_config

is_valid = validate_config()
config = get_config()

print("Configuration Status:")
print(f"  Valid: {is_valid}")
print(f"  Module: {config['module_name']}")
print(f"  Version: {config['version']}")
print(f"  Mode: {'OFFLINE' if config['offline_mode'] else 'ONLINE'}")

if config['offline_mode']:
    print("\n⚠️  Running in OFFLINE mode (no external AI services)")
    print("PostgreSQL/S3 are optional - using in-memory storage for demos")

## Section 1: Understanding Compliance Evidence

Compliance evidence comes in three categories:

### 1. System Evidence (Technical Artifacts)
- Audit logs with cryptographic integrity
- Database schemas and configurations
- Network diagrams and architecture docs
- Access control matrices

### 2. Process Evidence (Governance Documentation)
- Policies and procedures (version-controlled)
- Training records and certifications
- Incident response playbooks
- Change management workflows

### 3. Outcome Evidence (Results)
- Penetration test reports
- Vulnerability scan results
- PII detection metrics
- Business continuity test results
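
One way to keep the three categories explicit in code is to tag every artifact with its category when it is collected. The sketch below is illustrative only: `EvidenceCategory`, `EvidenceItem`, and the sample artifact names are hypothetical and not part of the course module.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class EvidenceCategory(Enum):
    SYSTEM = "system"      # technical artifacts
    PROCESS = "process"    # governance documentation
    OUTCOME = "outcome"    # results of tests and measurements


@dataclass(frozen=True)
class EvidenceItem:
    name: str
    category: EvidenceCategory


def group_by_category(items):
    """Return a dict mapping each category value to the names filed under it."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item.category.value].append(item.name)
    return dict(grouped)


# Hypothetical sample artifacts, one or more per category
items = [
    EvidenceItem("audit_log_2024_09.jsonl", EvidenceCategory.SYSTEM),
    EvidenceItem("Access Control Policy v1.5", EvidenceCategory.PROCESS),
    EvidenceItem("Penetration test report", EvidenceCategory.OUTCOME),
    EvidenceItem("Access control matrix", EvidenceCategory.SYSTEM),
]
grouped = group_by_category(items)
print(grouped)
```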

In [None]:
from src.l3_m1_compliance_foundations_rag_systems import (
    AuditEvent,
    create_audit_trail,
    EventType,
    ComplianceFramework
)

# Example: Create a simple audit event
event = AuditEvent(
    event_type="document_accessed",
    user_id="analyst_jane_doe",
    resource_id="financial_report_q3_2024.pdf",
    action="read",
    metadata={
        "ip_address": "192.168.1.105",
        "sensitivity_level": "confidential"
    }
)

print("Audit Event Created:")
print(f"  Event Type: {event.event_type}")
print(f"  User: {event.user_id}")
print(f"  Resource: {event.resource_id}")
print(f"  Timestamp: {event.timestamp}")
print(f"  Correlation ID: {event.correlation_id}")

# Expected: Event created with auto-generated timestamp and correlation ID

## Section 2: Immutable Audit Trails with Hash Chaining

**Problem:** Traditional database logs can be modified or deleted without leaving a trace, which weakens their value as audit evidence.

**Solution:** Cryptographic hash chaining creates a tamper-evident chain: modifying any event invalidates its hash and every link after it.

### Hash Chain Mechanism

```
Event 1: hash(event_data_1 + null) → hash_1
Event 2: hash(event_data_2 + hash_1) → hash_2
Event 3: hash(event_data_3 + hash_2) → hash_3
```

**Key Properties:**
- **Append-only:** No updates or deletes allowed
- **Cryptographic linking:** Each event stores the SHA-256 hash of the previous event
- **Tamper-evident:** Modifying any event breaks the chain
- **Verifiable:** Chain integrity can be recomputed and verified anytime
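
The mechanism above can be sketched with the standard library alone. This is a minimal illustration, not the module's implementation: `chain_hash` and `append_event` are hypothetical helpers, and serializing events as sorted-key JSON is an assumption made so that the same event always hashes to the same value.

```python
import hashlib
import json
from typing import Optional


def chain_hash(event_data: dict, previous_hash: Optional[str]) -> str:
    """SHA-256 over the canonical JSON of the event plus the previous hash."""
    payload = json.dumps(event_data, sort_keys=True) + (previous_hash or "")
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(chain: list, event_data: dict) -> dict:
    """Append an entry linked to the last entry's hash (None for the first entry)."""
    previous_hash = chain[-1]["current_hash"] if chain else None
    entry = {
        "data": event_data,
        "previous_hash": previous_hash,
        "current_hash": chain_hash(event_data, previous_hash),
    }
    chain.append(entry)
    return entry


chain = []
first = append_event(chain, {"event_type": "user_login", "user_id": "analyst_jane_doe"})
second = append_event(chain, {"event_type": "document_accessed", "user_id": "analyst_jane_doe"})

print(first["previous_hash"])                            # None
print(second["previous_hash"] == first["current_hash"])  # True
```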

In [None]:
# Create audit trail
audit_trail = create_audit_trail()

# Log first event
event1 = audit_trail.log_event(
    event_type="user_login",
    user_id="analyst_jane_doe",
    resource_id="rag_system",
    action="login",
    metadata={"ip_address": "192.168.1.105"}
)

print("Event 1:")
print(f"  Previous Hash: {event1.previous_hash}")
print(f"  Current Hash: {event1.current_hash[:32]}...")

# Log second event (chained to first)
event2 = audit_trail.log_event(
    event_type="document_accessed",
    user_id="analyst_jane_doe",
    resource_id="financial_report.pdf",
    action="read"
)

print("\nEvent 2:")
print(f"  Previous Hash: {event2.previous_hash[:32]}...")
print(f"  Current Hash: {event2.current_hash[:32]}...")

# Verify hash chain linkage
print("\nHash Chain Verification:")
print(f"  Event1.current_hash == Event2.previous_hash: {event1.current_hash == event2.previous_hash}")

# Expected: Event2's previous_hash matches Event1's current_hash (chain intact)

## Section 3: Detecting Tampering with Integrity Verification

The power of hash chaining is **tamper detection**. If anyone modifies an event after creation, its recomputed hash no longer matches the stored one and verification fails.
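
To show what such a check involves, here is a self-contained sketch that recomputes every hash and checks every link. It is not the module's `verify_audit_integrity`; the helper names, the dict layout, and the error strings are assumptions for illustration.

```python
import hashlib
import json


def chain_hash(data, previous_hash):
    """SHA-256 over sorted-key JSON of the event plus the previous hash."""
    payload = json.dumps(data, sort_keys=True) + (previous_hash or "")
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(events):
    """Link each event to its predecessor's hash."""
    chain, previous_hash = [], None
    for data in events:
        current_hash = chain_hash(data, previous_hash)
        chain.append({"data": data, "previous_hash": previous_hash, "current_hash": current_hash})
        previous_hash = current_hash
    return chain


def verify_chain(chain):
    """Return (is_valid, error_message) after rechecking every link and hash."""
    expected_previous = None
    for index, entry in enumerate(chain):
        if entry["previous_hash"] != expected_previous:
            return False, f"Broken link at index {index}"
        if chain_hash(entry["data"], entry["previous_hash"]) != entry["current_hash"]:
            return False, f"Hash mismatch at index {index}"
        expected_previous = entry["current_hash"]
    return True, None


chain = build_chain([{"user_id": f"user_{i}"} for i in range(5)])
print(verify_chain(chain))   # (True, None)

chain[2]["data"]["user_id"] = "hacker_modified_this"
print(verify_chain(chain))   # (False, 'Hash mismatch at index 2')
```

A check like this only catches edits if the attacker cannot also recompute every later hash. Storing the latest hash somewhere the log writer cannot modify closes that gap.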

In [None]:
from src.l3_m1_compliance_foundations_rag_systems import verify_audit_integrity

# Create fresh trail with multiple events
audit_trail = create_audit_trail()

for i in range(5):
    audit_trail.log_event(
        event_type="test_event",
        user_id=f"user_{i}",
        resource_id=f"resource_{i}",
        action="test"
    )

print(f"Logged {len(audit_trail.events)} events")

# Verify integrity (should pass)
is_valid, error_msg = verify_audit_integrity(audit_trail)
print(f"\nIntegrity Check #1: {is_valid}")
print(f"  Message: {'✅ Chain intact' if is_valid else f'❌ {error_msg}'}")

# Simulate tampering (modify the event at index 2, i.e. the third event)
print("\n⚠️  Simulating tampering (modifying the event at index 2)...")
audit_trail.events[2].user_id = "hacker_modified_this"

# Verify integrity (should fail)
is_valid, error_msg = verify_audit_integrity(audit_trail)
print(f"\nIntegrity Check #2: {is_valid}")
print(f"  Message: {'✅ Chain intact' if is_valid else f'❌ {error_msg}'}")

# Expected: Second check fails with "Hash mismatch" error

## Section 4: Compliance Report Generation

Different compliance frameworks require different reporting formats:

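- **SOX Section 404:** Annual management assessment of internal controls, typically backed by quarterly ITGC evidence
- **SOC 2 Type II:** Trust Services Criteria evidence over an observation period (commonly 6-12 months)
- **ISO 27001:** Annual control implementation evidence
- **GDPR Article 30:** Records of processing activities
- **DPDPA:** Breach notification evidence (the 6-hour reporting window comes from the CERT-In directions that apply alongside the Act)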

In [None]:
from src.l3_m1_compliance_foundations_rag_systems import (
    ComplianceReporter,
    generate_compliance_report
)
from datetime import datetime, timedelta

# Create audit trail with sample events
audit_trail = create_audit_trail()

# Log various compliance-relevant events
events_data = [
    ("user_login", "analyst_jane", "system", "login"),
    ("document_accessed", "analyst_jane", "financial_report.pdf", "read"),
    ("pii_accessed", "analyst_jane", "customer_db", "query"),
    ("config_changed", "admin_bob", "access_control", "update"),
    ("user_logout", "analyst_jane", "system", "logout"),
]

for event_type, user, resource, action in events_data:
    audit_trail.log_event(
        event_type=event_type,
        user_id=user,
        resource_id=resource,
        action=action
    )

print(f"Logged {len(audit_trail.events)} events\n")

# Generate SOX compliance report
report = generate_compliance_report(
    audit_trail=audit_trail,
    framework=ComplianceFramework.SOX
)

print("SOX Compliance Report:")
print(f"  Framework: {report['framework']}")
print(f"  Total Events: {report['total_events']}")
print(f"  Integrity Verified: {report['integrity_verified']}")
print(f"  Event Types: {report['event_types_summary']}")

# Expected: Report with all 5 events, integrity=True

## Section 5: SOX Section 404 Reporting

**SOX Section 404** requires management to assess the effectiveness of internal control over financial reporting each year; ITGC evidence is commonly collected quarterly to support that assessment.

**Key Controls (ITGCs):**
- **ITGC-01:** Access Controls (who can access financial data)
- **ITGC-02:** Change Management (approval for system changes)
- **ITGC-03:** Data Backup & Recovery (7-year retention)
- **ITGC-04:** Incident Response (security monitoring)
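
Mapping audit events to these controls can be as simple as a lookup table. The sketch below is a hypothetical mapping for illustration: how the module's `generate_sox_report` assigns events to controls is not shown in this notebook, and `backup_completed` is an invented event type.

```python
from collections import Counter

# Hypothetical mapping from audit event types to the ITGC controls listed above
EVENT_TO_ITGC = {
    "user_login": "ITGC-01",
    "user_logout": "ITGC-01",
    "document_accessed": "ITGC-01",
    "pii_accessed": "ITGC-01",
    "config_changed": "ITGC-02",
    "backup_completed": "ITGC-03",  # invented event type, for illustration
    "security_alert": "ITGC-04",
}


def summarize_controls(event_types):
    """Count events per ITGC control; unmapped event types are returned separately."""
    counts = Counter()
    unmapped = []
    for event_type in event_types:
        control = EVENT_TO_ITGC.get(event_type)
        if control is None:
            unmapped.append(event_type)
        else:
            counts[control] += 1
    return dict(counts), unmapped


counts, unmapped = summarize_controls(
    ["user_login", "document_accessed", "pii_accessed",
     "config_changed", "user_logout", "test_event"]
)
print(counts)    # {'ITGC-01': 4, 'ITGC-02': 1}
print(unmapped)  # ['test_event']
```

Reporting unmapped event types separately makes gaps in control coverage visible instead of silently dropping events.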

In [None]:
# Generate SOX report for specific fiscal period
reporter = ComplianceReporter(audit_trail)

sox_report = reporter.generate_sox_report(
    fiscal_year=2024,
    quarter=3
)

print("SOX Section 404 Report - FY2024 Q3\n")
print(f"Framework: {sox_report['framework']}")
print(f"Total Events: {sox_report['total_events']}")
print(f"Integrity Verified: {sox_report['integrity_verified']}")
print("\nSOX Controls:")
for control, status in sox_report['sox_controls'].items():
    print(f"  {control}: {status}")

# Expected: Report with ITGC controls mapped to audit events

## Section 6: SOC 2 Type II Reporting

**SOC 2 Type II** requires evidence that security controls operated effectively over an observation period; this notebook uses 12 months.

**Trust Services Criteria (TSC):**
- **CC6.1:** Logical and Physical Access Controls
- **CC6.2:** Prior to Issuing Credentials
- **CC6.3:** Provisioning and Modification
- **CC7.2:** Detection of Security Events
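
A Type II report only counts events that fall inside the observation period. The following sketch shows one way to select them with a rolling window. `events_in_period` and the dict-based events are assumptions; the notebook does not show how `generate_soc2_report` filters events internally.

```python
from datetime import datetime, timedelta, timezone


def events_in_period(events, period_days, now):
    """Keep events whose timestamp falls within the last `period_days` days before `now`."""
    start = now - timedelta(days=period_days)
    return [e for e in events if start <= e["timestamp"] <= now]


# Fixed reference time so the example is reproducible
now = datetime(2024, 9, 30, tzinfo=timezone.utc)
events = [
    {"event_type": "user_login", "timestamp": now - timedelta(days=10)},
    {"event_type": "config_changed", "timestamp": now - timedelta(days=200)},
    {"event_type": "pii_accessed", "timestamp": now - timedelta(days=400)},
]

selected = events_in_period(events, period_days=365, now=now)
print([e["event_type"] for e in selected])  # ['user_login', 'config_changed']
```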

In [None]:
# Generate SOC 2 report for last 365 days
soc2_report = reporter.generate_soc2_report(
    report_period_days=365
)

print("SOC 2 Type II Report - Last 365 Days\n")
print(f"Framework: {soc2_report['framework']}")
print(f"Total Events: {soc2_report['total_events']}")
print(f"Integrity Verified: {soc2_report['integrity_verified']}")
print("\nTrust Service Criteria:")
for criterion, description in soc2_report['trust_service_criteria'].items():
    print(f"  {criterion}: {description}")

# Expected: Report with TSC controls mapped to audit events

## Section 7: Evidence Collection and Export

**Problem:** Manual evidence collection can take 2-4 weeks during an audit.

**Solution:** Automated daily evidence exports organized by compliance framework.

### Evidence Collection Pipeline

1. **Daily Job (03:00 UTC):** Export logs, configs, test results
2. **Organize by Framework:** /sox/, /soc2/, /iso27001/
3. **Upload to S3 with Object Lock:** Immutable 7-year retention
4. **Generate Compliance Reports:** Pre-formatted for auditors
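
Steps 1 and 2 of the pipeline can be sketched on the local filesystem. This is an illustration under stated assumptions: `export_daily_evidence`, the `audit_events.json` file name, and the manifest layout are hypothetical, and scheduling and the S3 upload are left out. For step 3, boto3's `put_object` accepts `ObjectLockMode` and `ObjectLockRetainUntilDate` parameters when the bucket has Object Lock enabled.

```python
import hashlib
import json
import tempfile
from pathlib import Path

FRAMEWORK_DIRS = ["sox", "soc2", "iso27001"]


def export_daily_evidence(root: Path, export_date: str, payloads: dict) -> dict:
    """Write one JSON file per framework and return a manifest of SHA-256 digests."""
    manifest = {}
    for framework in FRAMEWORK_DIRS:
        target_dir = root / framework / export_date
        target_dir.mkdir(parents=True, exist_ok=True)
        # Frameworks without events for the day still get an (empty) export file
        body = json.dumps(payloads.get(framework, []), sort_keys=True).encode("utf-8")
        target = target_dir / "audit_events.json"
        target.write_bytes(body)
        manifest[target.relative_to(root).as_posix()] = hashlib.sha256(body).hexdigest()
    (root / f"manifest_{export_date}.json").write_text(json.dumps(manifest, indent=2))
    return manifest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    manifest = export_daily_evidence(
        root,
        "2024-09-30",
        {"sox": [{"event_type": "config_changed"}], "soc2": [{"event_type": "user_login"}]},
    )
    exported = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.json"))
    print(exported)
```

Recording a digest per file lets an auditor confirm that an exported file has not changed since the day it was written.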

In [None]:
from src.l3_m1_compliance_foundations_rag_systems import EvidenceCollector

# Initialize evidence collector
collector = EvidenceCollector(s3_bucket="compliance-evidence")

# Define collection period (last 30 days)
end_date = datetime.utcnow()
start_date = end_date - timedelta(days=30)

print(f"Collecting evidence from {start_date.date()} to {end_date.date()}\n")

# Collect system evidence
system_evidence = collector.collect_system_evidence(
    audit_trail=audit_trail,
    start_date=start_date,
    end_date=end_date
)

print("System Evidence Collected:")
print(f"  Evidence Type: {system_evidence['evidence_type']}")
print(f"  Collection Date: {system_evidence['collection_date']}")
print(f"  Log Count: {system_evidence['artifacts']['log_count']}")
print(f"  Integrity Status: {system_evidence['artifacts']['integrity_status']}")

# Expected: Evidence package with audit logs and integrity status

## Section 8: Process and Outcome Evidence

Beyond system logs, compliance requires **process** and **outcome** evidence.

In [None]:
# Collect process evidence (policies/procedures)
policy_documents = [
    {
        "name": "Data Retention Policy",
        "version": "v2.0",
        "approved_date": "2024-01-15",
        "approver": "compliance_officer",
        "frameworks": ["SOX", "GDPR"]
    },
    {
        "name": "Access Control Policy",
        "version": "v1.5",
        "approved_date": "2024-02-01",
        "approver": "ciso",
        "frameworks": ["SOC2", "ISO27001"]
    },
    {
        "name": "Incident Response Playbook",
        "version": "v3.0",
        "approved_date": "2024-03-10",
        "approver": "security_lead",
        "frameworks": ["SOC2", "DPDPA"]
    }
]

process_evidence = collector.collect_process_evidence(
    policy_documents=policy_documents
)

print("Process Evidence Collected:")
print(f"  Policy Count: {process_evidence['artifacts']['policy_count']}")
for policy in process_evidence['artifacts']['policies']:
    print(f"  - {policy['name']} ({policy['version']})")

# Collect outcome evidence (test results)
test_results = [
    {
        "test": "Penetration Test",
        "date": "2024-09-01",
        "result": "PASS",
        "findings": "0 critical, 2 medium (remediated)"
    },
    {
        "test": "Vulnerability Scan",
        "date": "2024-09-15",
        "result": "PASS",
        "findings": "0 high, 5 low (accepted risk)"
    }
]

outcome_evidence = collector.collect_outcome_evidence(
    test_results=test_results
)

print("\nOutcome Evidence Collected:")
print(f"  Test Count: {outcome_evidence['artifacts']['test_count']}")
for test in outcome_evidence['artifacts']['results']:
    print(f"  - {test['test']}: {test['result']}")

# Expected: Process evidence (3 policies) + Outcome evidence (2 tests)

## Section 9: Evidence Package Export

Export complete evidence packages for auditor review.

In [None]:
# Export evidence package for SOX audit
export_package = collector.export_evidence_package(
    framework=ComplianceFramework.SOX,
    export_path="./exports/sox_2024_q3"
)

print("Evidence Package Exported:")
print(f"  Framework: {export_package['framework']}")
print(f"  Export Date: {export_package['export_date']}")
print(f"  Export Path: {export_package['export_path']}")
print(f"  Total Evidence Items: {export_package['total_evidence_items']}")
print("\nEvidence Breakdown:")
for evidence_type, items in export_package['evidence'].items():
    print(f"  {evidence_type.capitalize()}: {len(items)} items")

# Expected: Export package with system + process + outcome evidence

## Section 10: Vendor Risk Assessment

**Challenge:** GCC RAG systems often use third-party AI vendors (OpenAI, Anthropic, Pinecone).  
**Auditor Question:** "Do these vendors comply with your security standards?"

**Solution:** Structured vendor risk assessment with quantitative scoring.
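
Quantitative scoring here can be read as a weighted sum of per-criterion scores. The sketch below assumes that reading and uses the same weights and scores as the OpenAI example in the next cell. `weighted_score`, `risk_level`, and the 0.8 / 0.6 thresholds are hypothetical; the module's actual formula and cut-offs are not shown in this notebook.

```python
def weighted_score(risk_criteria):
    """Weighted sum of criterion scores; weights are expected to sum to 1.0."""
    total_weight = sum(c["weight"] for c in risk_criteria.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"Weights sum to {total_weight}, expected 1.0")
    return sum(c["weight"] * c["score"] for c in risk_criteria.values())


def risk_level(score, low_threshold=0.8, medium_threshold=0.6):
    """Hypothetical thresholds: a higher score means a stronger compliance posture."""
    if score >= low_threshold:
        return "LOW"
    if score >= medium_threshold:
        return "MEDIUM"
    return "HIGH"


criteria = {
    "data_residency": {"weight": 0.3, "score": 0.7},
    "soc2_certified": {"weight": 0.25, "score": 1.0},
    "gdpr_compliant": {"weight": 0.25, "score": 0.8},
    "incident_history": {"weight": 0.2, "score": 0.9},
}

score = weighted_score(criteria)
print(f"{score:.2f}", risk_level(score))  # 0.84 LOW
```

Note the direction of the scale: the number measures how well a vendor meets each criterion, so a higher score maps to a lower risk level.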

In [None]:
from src.l3_m1_compliance_foundations_rag_systems import VendorRiskAssessment

# Initialize vendor risk assessor
assessor = VendorRiskAssessment()

# Assess OpenAI
openai_assessment = assessor.assess_vendor(
    vendor_name="OpenAI",
    services_used=["GPT-4", "Embeddings API", "Fine-tuning"],
    compliance_frameworks=[ComplianceFramework.SOC2, ComplianceFramework.GDPR],
    risk_criteria={
        "data_residency": {"weight": 0.3, "score": 0.7},  # US-only
        "soc2_certified": {"weight": 0.25, "score": 1.0},  # Yes
        "gdpr_compliant": {"weight": 0.25, "score": 0.8},  # DPA available
        "incident_history": {"weight": 0.2, "score": 0.9}   # No major breaches
    }
)

print("Vendor Risk Assessment: OpenAI\n")
print(f"Services Used: {', '.join(openai_assessment['services_used'])}")
print(f"Frameworks: {', '.join(openai_assessment['compliance_frameworks'])}")
print(f"\nRisk Score: {openai_assessment['overall_risk_score']:.2f} / 1.00")
print(f"Risk Level: {openai_assessment['risk_level']}")
print("\nRecommendations:")
for rec in openai_assessment['recommendations']:
    print(f"  - {rec}")

# Expected: Score of about 0.84 if computed as the weighted sum of the criteria (LOW risk), annual reassessment recommended

## Section 11: Multi-Vendor Comparison

Compare multiple vendors to make informed procurement decisions.

In [None]:
# Assess Pinecone
pinecone_assessment = assessor.assess_vendor(
    vendor_name="Pinecone",
    services_used=["Vector Database", "Hybrid Search"],
    compliance_frameworks=[ComplianceFramework.SOC2, ComplianceFramework.GDPR, ComplianceFramework.ISO27001],
    risk_criteria={
        "data_residency": {"weight": 0.3, "score": 0.9},  # Multi-region
        "soc2_certified": {"weight": 0.25, "score": 1.0},  # Yes
        "gdpr_compliant": {"weight": 0.25, "score": 1.0},  # EU residency available
        "incident_history": {"weight": 0.2, "score": 1.0}   # No incidents
    }
)

# Assess AWS
aws_assessment = assessor.assess_vendor(
    vendor_name="AWS",
    services_used=["S3", "RDS", "Lambda"],
    compliance_frameworks=[
        ComplianceFramework.SOX,
        ComplianceFramework.SOC2,
        ComplianceFramework.ISO27001,
        ComplianceFramework.GDPR
    ],
    risk_criteria={
        "data_residency": {"weight": 0.3, "score": 1.0},  # Global regions
        "soc2_certified": {"weight": 0.25, "score": 1.0},  # SOC 1/2/3
        "gdpr_compliant": {"weight": 0.25, "score": 1.0},  # GDPR DPA
        "incident_history": {"weight": 0.2, "score": 0.95}  # Rare incidents
    }
)

# Compare all vendors
print("Vendor Risk Comparison\n")
print(f"{'Vendor':<15} {'Risk Score':<12} {'Risk Level':<12} {'Status'}")
print("-" * 60)

for assessment in [openai_assessment, pinecone_assessment, aws_assessment]:
    score = assessment['overall_risk_score']
    level = assessment['risk_level']
    status = "✅ Approved" if level == "LOW" else "⚠️  Review" if level == "MEDIUM" else "❌ Rejected"
    print(f"{assessment['vendor_name']:<15} {score:<12.2f} {level:<12} {status}")

# Expected: All three vendors LOW risk, approved for use

## Section 12: End-to-End Compliance Workflow

**Real-World Scenario:** Your GCC faces two audits at once: SOX plus a data-privacy audit (the exercise frames this as DPDPA; the code below generates the GDPR report).

**Challenge:** Generate evidence for both frameworks from the same audit trail.

Let's simulate a complete compliance workflow:

In [None]:
print("=" * 70)
print("COMPLIANCE WORKFLOW: Dual Audit Scenario")
print("=" * 70)
print()

# Step 1: Create comprehensive audit trail
print("Step 1: Creating Comprehensive Audit Trail")
full_audit_trail = create_audit_trail()

# Simulate 30 days of activity (50 events; the day number is recorded in metadata only)
event_types = [
    "user_login", "document_accessed", "pii_accessed",
    "config_changed", "security_alert", "user_logout"
]
users = ["analyst_jane", "analyst_bob", "admin_alice", "compliance_charlie"]

for i in range(50):
    full_audit_trail.log_event(
        event_type=event_types[i % len(event_types)],
        user_id=users[i % len(users)],
        resource_id=f"resource_{i}",
        action="test_action",
        metadata={"day": i % 30}
    )

print(f"  ✅ Logged {len(full_audit_trail.events)} events")
print()

# Step 2: Verify hash chain integrity
print("Step 2: Verifying Hash Chain Integrity")
is_valid, error = verify_audit_integrity(full_audit_trail)
print(f"  {'✅' if is_valid else '❌'} Integrity: {is_valid}")
print()

# Step 3: Generate SOX report
print("Step 3: Generating SOX Section 404 Report")
reporter = ComplianceReporter(full_audit_trail)
sox_report = reporter.generate_sox_report(fiscal_year=2024, quarter=3)
print(f"  ✅ SOX Report: {sox_report['total_events']} events")
print(f"     Integrity: {sox_report['integrity_verified']}")
print(f"     Controls: {', '.join(sox_report['sox_controls'].keys())}")
print()

# Step 4: Generate GDPR report
print("Step 4: Generating GDPR Compliance Report")
gdpr_report = generate_compliance_report(
    audit_trail=full_audit_trail,
    framework=ComplianceFramework.GDPR
)
print(f"  ✅ GDPR Report: {gdpr_report['total_events']} events")
print(f"     Integrity: {gdpr_report['integrity_verified']}")
print()

# Step 5: Collect all evidence types
print("Step 5: Collecting Evidence (System + Process + Outcome)")
collector = EvidenceCollector()
now = datetime.utcnow()

collector.collect_system_evidence(full_audit_trail, now - timedelta(days=30), now)
collector.collect_process_evidence(policy_documents)
collector.collect_outcome_evidence(test_results)

total_evidence = sum(len(items) for items in collector.collected_evidence.values())
print(f"  ✅ Collected {total_evidence} evidence items")
print()

# Step 6: Export evidence packages for both audits
print("Step 6: Exporting Evidence Packages")
sox_package = collector.export_evidence_package(
    framework=ComplianceFramework.SOX,
    export_path="./exports/sox_2024_q3"
)
print(f"  ✅ SOX Package: {sox_package['total_evidence_items']} items")

gdpr_package = collector.export_evidence_package(
    framework=ComplianceFramework.GDPR,
    export_path="./exports/gdpr_2024"
)
print(f"  ✅ GDPR Package: {gdpr_package['total_evidence_items']} items")
print()

# Step 7: Vendor risk assessment
print("Step 7: Vendor Risk Assessment")
print(f"  ✅ Assessed {len(assessor.assessments)} vendors")
for assessment in assessor.assessments:
    print(f"     - {assessment['vendor_name']}: {assessment['risk_level']} risk")
print()

print("=" * 70)
print("✅ COMPLIANCE WORKFLOW COMPLETE")
print("=" * 70)
print()
print("Auditor Deliverables:")
print(f"  - SOX Report: {sox_report['total_events']} events, integrity verified")
print(f"  - GDPR Report: {gdpr_report['total_events']} events, integrity verified")
print(f"  - Evidence Packages: {sox_package['total_evidence_items']} + {gdpr_package['total_evidence_items']} items")
print(f"  - Vendor Assessments: {len(assessor.assessments)} completed")
print()
print("Time to Generate: <60 seconds (vs 2-4 weeks manual collection)")

# Expected: Complete audit-ready evidence in <60 seconds

## Summary & Key Takeaways

### What You've Learned

1. **Immutable Audit Trails**
   - SHA-256 hash chaining creates tamper-evident logs
   - Append-only design means events are never updated or deleted in normal operation
   - Integrity verification detects any tampering

2. **Automated Evidence Collection**
   - Daily exports cut evidence gathering from weeks to under a minute
   - Organized by compliance framework (SOX, SOC 2, GDPR, etc.)
   - Three evidence types: System, Process, Outcome

3. **Multi-Framework Compliance**
   - Same audit trail feeds reports for SOX, SOC 2, ISO 27001, GDPR, DPDPA
   - Framework-specific reports with control mapping
   - Automated generation in <60 seconds

4. **Vendor Risk Assessment**
   - Quantitative scoring for third-party AI vendors
   - Annual reassessment workflow
   - Risk-based recommendations

### Real-World Impact

**Before:**
- Manual evidence collection: 2-4 weeks
- No tamper detection
- Scattered documentation
- Ad-hoc vendor vetting

**After:**
- Automated evidence: <60 seconds
- Cryptographic integrity proof
- Centralized evidence repository
- Structured vendor assessments

### Next Steps

1. **Production Deployment**
   - Configure PostgreSQL for audit trail persistence
   - Set up S3 Object Lock for evidence storage
   - Schedule daily evidence export jobs

2. **Integration**
   - Integrate with existing RAG system
   - Add audit logging to all sensitive operations
   - Configure alerts for security events

3. **PractaThon Exercise 1.4**
   - Build multi-tenant compliance system
   - Handle SOX + DPDPA audits simultaneously
   - Generate audit-ready reports in <60 seconds

### Resources

- **Augmented Script:** [Augmented_GCC_Compliance_M1_4_Compli.md](https://github.com/yesvisare/gcc_comp_ai_ccc_l2/blob/main/Augmented_GCC_Compliance_M1_4_Compli.md)
- **API Docs:** http://localhost:8000/docs
- **Test Suite:** `pytest tests/ -v`
- **README:** Full documentation with cost estimates and real-world examples

**Congratulations! You've completed L3 M1.4: Compliance Documentation & Evidence** 🎉