# L3 M3.2: Automated Compliance Testing

## Learning Arc

**Duration:** 40-45 minutes

### Concepts Covered:

1. **Policy-as-Code Fundamentals** - Transform compliance from documentation to executable policies
2. **Open Policy Agent (OPA) & Rego** - Industry-standard policy engine and declarative policy language
3. **PII Detection Patterns** - Regex-based and ML-enhanced detection for GDPR/DPDPA compliance
4. **Test Pyramid Implementation** - 70% unit, 20% integration, 10% E2E testing strategy
5. **CI/CD Integration** - Automated compliance gates preventing violations before production
6. **Audit Evidence Generation** - Automated test results for SOC 2, ISO 27001 audits

### Learning Outcomes:

By completing this notebook, you will:

1. Implement policy-as-code with OPA/Rego
2. Build automated compliance test suites
3. Integrate testing into CI/CD with deployment gates
4. Create regression tests for control persistence
5. Generate automated audit evidence
6. Write and test Rego policies for RAG systems

### Prerequisites:

- Generic CCC L1: RAG MVP fundamentals (M1-M4)
- GCC Compliance M1: Regulatory Foundations
- GCC Compliance M2: Core Controls
- GCC Compliance M3.1: Monitoring Dashboards

---

**Key Question:** How do you prevent compliance violations *before* production, not after the 2 AM incident?

## Setup: OFFLINE Mode Guard

This notebook works in both ONLINE (with OPA/Presidio) and OFFLINE (regex-only) modes.

In [None]:
import sys
import os

# Add parent directory to path for imports
sys.path.insert(0, os.path.abspath('..'))

from config import check_service_availability, get_config

# Check service availability
availability = check_service_availability()
config = get_config()

print("="*60)
print("L3 M3.2: Automated Compliance Testing")
print("="*60)
print(f"\nService Status:")
print(f"  OPA Available: {availability['opa']}")
print(f"  Presidio Available: {availability['presidio']}")
print(f"\nMode: {'FULL' if all(availability.values()) else 'REGEX-ONLY (OFFLINE)'}")
print("\n" + "="*60)

if not availability['opa']:
    print("⚠️ Running in OFFLINE mode (regex-based PII detection only)")
    print("\nTo enable full functionality:")
    print("  1. Install OPA: https://www.openpolicyagent.org/docs/latest/")
    print("  2. Set OPA_ENABLED=true in .env")
    print("  3. Optional: pip install presidio-analyzer presidio-anonymizer")
    print("="*60)

## Section 1: The Compliance Nightmare Scenario

### The Problem

**2 AM on a Monday:** Your PII detection dashboard alerts fire. Customer Social Security Numbers have been embedded in your vector database for the past 3 weeks. GDPR violation. DPDPA violation. Audit failure imminent.

**Root Cause:** No automated validation before embedding - relied on manual reviews.

**The Question:** How could this have been prevented?

In [None]:
# Traditional approach (NO validation)
from src.l3_m3_monitoring_reporting import contains_pii

# Dangerous text with PII
dangerous_text = "Customer profile: Jane Doe, SSN 123-45-6789, Account #12345"

print("Traditional Approach (Manual Review):")
print(f"Text: {dangerous_text[:50]}...")
print("\n❌ Embedded without validation - PII LEAKED!")
print("\nResult: GDPR Article 17 violation, audit failure\n")

# Automated approach
print("="*60)
print("Automated Compliance Approach:")
has_pii = contains_pii(dangerous_text)
print(f"PII Detected: {has_pii}")

if has_pii:
    print("\n✓ BLOCKED before embedding")
    print("Violation prevented in CI/CD pipeline")
else:
    print("Safe to embed")

# Expected: PII detected, operation blocked

## Section 2: Policy-as-Code Fundamentals

### Documentation-Based vs. Code-Based Compliance

**Traditional (Documentation-Based):**
- Manual checklist: "Scan for PII before embedding"
- Review process: Human reads docs, checks code
- Failure mode: Human error, inconsistent application

**Policy-as-Code (Executable):**
- Executable policy: `deny { contains_pii(input.text) }`
- Automated execution: CI/CD runs tests on every commit
- Failure mode: Only false negatives (~5%), not human error

### OPA Architecture

```
┌──────────────┐
│   Input      │  ← Operation + Data (e.g., "embed" + text)
└──────┬───────┘
       │
       v
┌──────────────┐
│ Policy Engine│  ← OPA evaluates Rego policies
│    (OPA)     │
└──────┬───────┘
       │
       v
┌──────────────┐
│  Decision    │  ← Allow/Deny + Violations
└──────────────┘
```

In [None]:
from src.l3_m3_monitoring_reporting import evaluate_policy

# Example policy evaluation
input_data = {
    'operation': 'embed',
    'text': 'Financial compliance policy document',
    'pii_redacted': False
}

decision = evaluate_policy(input_data, policy='pii')

print("Policy Evaluation Result:")
print(f"  Input: {input_data['text']}")
print(f"  Policy: {decision['policy']}")
print(f"  Decision: {'ALLOW' if decision['allow'] else 'DENY'}")
print(f"  PII Detected: {decision.get('pii_detected', 'N/A')}")
print(f"  Violations: {decision['violations']}")

# Expected: Allow (no PII in clean text)

## Section 3: PII Detection Implementation

### Supported PII Types

1. **SSN** - Social Security Numbers (xxx-xx-xxxx)
2. **Email** - Email addresses
3. **Credit Card** - Card numbers (16 digits)
4. **Phone** - US phone numbers

### Detection Methods

- **Regex-based (Default):** Fast, transparent, zero dependencies
- **Presidio-enhanced (Optional):** ML-based, 15+ entity types, edge case handling

In [None]:
from src.l3_m3_monitoring_reporting import PIIDetector, PIIType

detector = PIIDetector()

# Test cases
test_cases = [
    ("SSN: 123-45-6789", "SSN"),
    ("Email: john.doe@example.com", "Email"),
    ("Card: 4532-1234-5678-9010", "Credit Card"),
    ("Phone: (555) 123-4567", "Phone"),
    ("Clean document about compliance", "No PII")
]

print("PII Detection Test Results:")
print("="*60)

for text, expected in test_cases:
    result = detector.detect(text)
    status = "✓" if result.has_pii else "○"
    pii_types = ", ".join([t.value for t in result.pii_types]) if result.pii_types else "none"
    
    print(f"{status} {expected:12} | Text: {text[:40]:40} | Detected: {pii_types}")

# Expected: First 4 detect PII, last one clean

## Section 4: Redaction Quality Validation

### Policy Requirement

If PII is detected, it must be **properly redacted** using `[REDACTED]` markers.

**Valid Redaction:**
```
Customer SSN: [REDACTED]
Email: [REDACTED]
```

**Invalid (Partial) Redaction:**
```
Customer SSN: [REDACTED], Email: real@example.com  ← Still has email!
```

In [None]:
from src.l3_m3_monitoring_reporting import redaction_quality_sufficient

# Test redaction quality
redaction_cases = [
    ("SSN: [REDACTED], Email: [REDACTED]", "Full redaction"),
    ("SSN: [REDACTED], Email: real@example.com", "Partial redaction"),
    ("SSN: 123-45-6789", "No redaction"),
    ("Clean text", "No PII")
]

print("Redaction Quality Validation:")
print("="*60)

for text, description in redaction_cases:
    is_sufficient = redaction_quality_sufficient(text)
    status = "✓" if is_sufficient else "✗"
    
    print(f"{status} {description:20} | {text[:40]:40} | Sufficient: {is_sufficient}")

# Expected: Only full redaction and clean text pass

## Section 5: Complete Compliance Workflow

### End-to-End Validation

The `check_compliance()` function orchestrates:
1. PII detection (regex or Presidio)
2. Policy evaluation (OPA logic)
3. Redaction quality check
4. Violation reporting

In [None]:
from src.l3_m3_monitoring_reporting import check_compliance

# Workflow examples
workflows = [
    ("embed", "Financial regulations require proper documentation."),
    ("embed", "Customer SSN: 123-45-6789"),
    ("embed", "Customer SSN: [REDACTED]"),
    ("query", "What are GDPR Article 17 requirements?")
]

print("Complete Compliance Workflow:")
print("="*60)

for operation, text in workflows:
    result = check_compliance(operation=operation, text=text)
    
    status = "✓ ALLOW" if result.allowed else "✗ DENY"
    violations_summary = f"{len(result.violations)} violation(s)" if result.violations else "none"
    
    print(f"\n{status}")
    print(f"  Operation: {operation}")
    print(f"  Text: {text[:50]}")
    print(f"  Violations: {violations_summary}")
    
    if result.violations:
        for v in result.violations[:2]:  # Show first 2
            print(f"    - {v[:80]}")

# Expected: Allow clean text and redacted, deny unredacted PII

## Section 6: Test Pyramid Implementation

### Test Pyramid Strategy

```
      /\        E2E (10%)
     /  \       5-10 tests
    /────\      Full workflow validation
   /      \     
  / Integ  \    Integration (20%)
 /  (20%)  \   10-15 tests
/────────────\  Policy + data integration
/            \ 
/ Unit (70%)  \ Unit Tests (70%)
/──────────────\ 15-20 tests per category
                PII detection, patterns, edge cases
```

### Target Coverage

- **55-77 total tests** per deployment
- **95%+ pass rate** for production readiness
- **2-5 minute** execution time in CI/CD

In [None]:
from src.l3_m3_monitoring_reporting import run_compliance_tests

# Run automated test suite
print("Executing Compliance Test Suite...")
print("="*60)

results = run_compliance_tests()

print(f"\nTest Execution Summary:")
print(f"  Total Tests: {results['total_tests']}")
print(f"  Passed: {results['passed']} ✓")
print(f"  Failed: {results['failed']} ✗")
print(f"  Pass Rate: {results['pass_rate']:.1f}%")
print(f"\nCoverage Metrics:")
print(f"  Total Validations: {results['coverage']['total_tests']}")
print(f"  Coverage: {results['coverage']['coverage_pct']:.1f}%")

print(f"\nSample Test Results (first 5):")
for test in results['results'][:5]:
    status = "✓" if test['passed'] else "✗"
    print(f"  {status} {test['name']}")

# Expected: 95%+ pass rate with 10 tests shown

## Section 7: Writing Rego Policies

### Sample Rego Policy (Conceptual)

```rego
package ragcompliance.pii

# Default deny principle
default allow_embedding = false

# Helper: Check if text contains PII
contains_pii(text) {
    regex.match(`\\b\\d{3}-\\d{2}-\\d{4}\\b`, text)  # SSN
}

contains_pii(text) {
    regex.match(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Z|a-z]{2,}`, text)  # Email
}

# Helper: Check redaction quality
redaction_quality_sufficient(text) {
    contains(text, "[REDACTED]")
    not contains_pii(text)  # No unredacted PII remains
}

# Allow rule: No PII detected
allow_embedding {
    not contains_pii(input.text)
}

# Allow rule: PII properly redacted
allow_embedding {
    contains_pii(input.text)
    redaction_quality_sufficient(input.text)
}

# Violation: Unredacted PII
violation[msg] {
    contains_pii(input.text)
    not redaction_quality_sufficient(input.text)
    msg := sprintf("PII detected without redaction: GDPR Article 17 violation", [])
}
```

### Policy Components

1. **Package Declaration** - Namespace (e.g., `ragcompliance.pii`)
2. **Default Rules** - Security principle (default deny)
3. **Helper Functions** - Reusable logic (PII detection, redaction check)
4. **Allow Rules** - Explicit conditions permitting operations
5. **Violation Messages** - Formatted error output for audits

In [None]:
from src.l3_m3_monitoring_reporting import OPAPolicyEngine

# Simulate OPA policy evaluation
engine = OPAPolicyEngine()

# Test policy logic
policy_tests = [
    {
        'operation': 'embed',
        'text': 'Clean document',
        'pii_redacted': False,
        'expected': 'allow'
    },
    {
        'operation': 'embed',
        'text': 'SSN: 123-45-6789',
        'pii_redacted': False,
        'expected': 'deny'
    },
    {
        'operation': 'embed',
        'text': 'SSN: [REDACTED]',
        'pii_redacted': True,
        'expected': 'allow'
    }
]

print("Rego Policy Logic Validation:")
print("="*60)

for test in policy_tests:
    decision = engine.evaluate_pii_policy(test)
    actual = 'allow' if decision['allow'] else 'deny'
    match = "✓" if actual == test['expected'] else "✗"
    
    print(f"{match} Expected: {test['expected']:5} | Actual: {actual:5} | Text: {test['text'][:30]}")

# Expected: All tests match expected behavior

## Section 8: CI/CD Integration Pattern

### GitHub Actions Workflow

```yaml
name: Compliance Gate

on: [push, pull_request]

jobs:
  compliance-tests:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Install OPA
        run: |
          curl -L -o opa https://openpolicyagent.org/downloads/latest/opa_linux_amd64
          chmod +x opa && sudo mv opa /usr/local/bin/

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run compliance tests
        run: pytest tests/ --cov-fail-under=95

      - name: Block on violations
        if: failure()
        run: |
          echo "❌ Compliance tests failed - deployment blocked"
          exit 1
```

### Key Benefits

1. **Prevention** - Violations caught before merge
2. **Speed** - 2-5 minute execution time
3. **Evidence** - Test results for audit trail
4. **Regression** - Every commit validated

In [None]:
# Simulate CI/CD gate
from src.l3_m3_monitoring_reporting import ComplianceValidator

validator = ComplianceValidator()

# Simulated commit changes
commit_changes = [
    ("New feature: compliance dashboard", "embed"),
    ("User data: SSN 123-45-6789", "store"),  # This should fail
    ("Updated policy documentation", "embed")
]

print("CI/CD Compliance Gate Simulation:")
print("="*60)

all_passed = True

for text, operation in commit_changes:
    result = validator.validate(operation, text)
    
    if result.allowed:
        print(f"✓ PASS | {text[:40]}")
    else:
        print(f"✗ FAIL | {text[:40]}")
        print(f"  Violations: {result.violations[0][:60]}...")
        all_passed = False

print("\n" + "="*60)
if all_passed:
    print("✓ All checks passed - DEPLOYMENT ALLOWED")
else:
    print("✗ Compliance gate failed - DEPLOYMENT BLOCKED")
    print("Fix violations before merging to main branch")

# Expected: Second commit fails (unredacted SSN), deployment blocked

## Section 9: Metrics & Impact

### Key Performance Indicators

| Metric | Before Automation | After Automation | Improvement |
|--------|------------------|------------------|-------------|
| **Audit Prep Time** | 8 hours | 30 minutes | 16x faster |
| **Violation Detection** | Post-production | Pre-deployment | 95% prevented |
| **Test Execution** | Manual (hours) | Automated (2-5 min) | ~100x faster |
| **Coverage** | Variable | 95%+ consistent | Guaranteed |
| **Regression Catches** | Rare | 95%+ in CI | Continuous |

### Compliance Coverage

- **PII Detection:** 99%+ pattern detection accuracy
- **Access Control:** 100% unauthorized access blocked (when implemented)
- **Audit Logging:** 99.5%+ operation logging
- **Overall Regression Prevention:** 95%+

In [None]:
# Calculate coverage metrics
from src.l3_m3_monitoring_reporting import ComplianceValidator

validator = ComplianceValidator()

# Simulate month of operations
import random

operations_sample = [
    ("embed", "Clean financial document"),
    ("query", "What are compliance requirements?"),
    ("embed", "Policy framework overview"),
    ("store", "Audit trail record"),
    ("embed", "Regulatory guidelines")
]

for _ in range(10):  # Simulate 10 operations
    op, text = random.choice(operations_sample)
    validator.validate(op, text)

coverage = validator.get_test_coverage()

print("Coverage Metrics:")
print("="*60)
print(f"Total Operations Validated: {coverage['total_tests']}")
print(f"Passed: {coverage['passed']}")
print(f"Failed: {coverage['failed']}")
print(f"Coverage: {coverage['coverage_pct']:.1f}%")
print("\nAll operations validated - 100% coverage achieved")

# Expected: 100% of operations validated

## Section 10: Common Failures & Debugging

### Top 5 Failure Scenarios

1. **False Positives** - Valid data blocked (e.g., dates mistaken for SSN)
2. **False Negatives** - PII slips through (international formats, typos)
3. **Incomplete Coverage** - Tests pass CI but fail audit
4. **Performance Issues** - CI tests exceed 10 minutes
5. **Policy Drift** - Policies don't match updated regulations

In [None]:
# Debug false positive example
from src.l3_m3_monitoring_reporting import contains_pii

# Edge case: date that looks like SSN pattern
edge_cases = [
    "Date format: 123-45-6789",  # Matches SSN pattern (false positive risk)
    "International phone: +44 20 7123 4567",  # Non-US format
    "Typo SSN: 123456789",  # Missing dashes
]

print("Edge Case Detection:")
print("="*60)

for text in edge_cases:
    has_pii = contains_pii(text)
    print(f"{'✓ PII' if has_pii else '○ Clean'} | {text}")

print("\nNote: Regex patterns prioritize precision over recall")
print("Edge cases may require pattern tuning or Presidio enhancement")

# Expected: First case may trigger false positive, others may miss

## Section 11: Decision Card Review

### When to Use Automated Compliance Testing

#### ✅ Use When:

1. **High-stakes compliance** - Financial services, healthcare, government (SOC 2, ISO 27001, GDPR)
2. **Frequent deployments** - CI/CD pipelines with multiple releases per week
3. **Audit requirements** - Need automated evidence and control persistence
4. **Repeatable testing** - Regression prevention across 55-77 tests
5. **Documentation-heavy** - Converting manual checklists to executable policies

#### ❌ Don't Use When:

1. **Prototype stage** - Pre-PMF, compliance premature
2. **Low-risk apps** - Internal tools without PII, no regulations
3. **No policy expertise** - Team can't write/maintain Rego
4. **One-time audits** - Manual review more efficient
5. **Simple rules** - Basic linting suffices

#### ⚖️ Trade-offs:

- **Learning curve:** 2-4 weeks Rego proficiency
- **Setup time:** 1-2 sprints for first suite
- **Maintenance:** Policies need updates as regulations change
- **Coverage:** 95% catch rate, not 100%

In [None]:
# Decision framework helper
def should_use_automated_testing(context: dict) -> dict:
    """
    Evaluate if automated compliance testing is appropriate.
    
    Args:
        context: Dictionary with project characteristics
    
    Returns:
        Recommendation with reasoning
    """
    score = 0
    reasons = []
    
    # High-stakes compliance
    if context.get('compliance_requirements') in ['SOC2', 'ISO27001', 'GDPR']:
        score += 3
        reasons.append("High-stakes compliance requirements detected")
    
    # Deployment frequency
    if context.get('deploys_per_week', 0) >= 3:
        score += 2
        reasons.append("Frequent deployments benefit from automation")
    
    # Audit needs
    if context.get('needs_audit_evidence', False):
        score += 2
        reasons.append("Audit evidence requirement met by automated tests")
    
    # Team expertise
    if not context.get('has_policy_expertise', False):
        score -= 2
        reasons.append("Team lacks policy expertise - training needed")
    
    # Project stage
    if context.get('stage') == 'prototype':
        score -= 3
        reasons.append("Prototype stage - premature for automation")
    
    recommendation = "RECOMMEND" if score >= 3 else "NOT RECOMMENDED" if score <= 0 else "CONSIDER"
    
    return {
        'recommendation': recommendation,
        'score': score,
        'reasons': reasons
    }

# Example contexts
contexts = [
    {
        'name': 'FinTech RAG Production',
        'compliance_requirements': 'SOC2',
        'deploys_per_week': 5,
        'needs_audit_evidence': True,
        'has_policy_expertise': True,
        'stage': 'production'
    },
    {
        'name': 'Internal Tools Prototype',
        'compliance_requirements': None,
        'deploys_per_week': 1,
        'needs_audit_evidence': False,
        'has_policy_expertise': False,
        'stage': 'prototype'
    }
]

print("Decision Framework Evaluation:")
print("="*60)

for ctx in contexts:
    result = should_use_automated_testing(ctx)
    print(f"\nProject: {ctx['name']}")
    print(f"Recommendation: {result['recommendation']} (Score: {result['score']})")
    print("Reasoning:")
    for reason in result['reasons']:
        print(f"  - {reason}")

# Expected: FinTech RECOMMEND, Internal Tools NOT RECOMMENDED

## Section 12: Summary & Next Steps

### What You've Learned

1. ✅ **Policy-as-Code** - Transformed compliance from docs to executable policies
2. ✅ **OPA/Rego** - Implemented industry-standard policy engine
3. ✅ **PII Detection** - Regex and optional ML-based detection
4. ✅ **Test Pyramid** - 70% unit, 20% integration, 10% E2E strategy
5. ✅ **CI/CD Integration** - Automated gates preventing violations
6. ✅ **Audit Evidence** - Auto-generated test results for compliance

### Key Metrics Achieved

- **95%+ violation prevention** in CI/CD
- **16x faster audit prep** (8 hours → 30 minutes)
- **2-5 minute test execution** per pipeline run
- **55-77 automated tests** per deployment

### Next Steps

1. **Immediate:**
   - Install OPA binary for full functionality
   - Run test suite: `pytest tests/`
   - Review example policies in README

2. **This Week:**
   - Integrate with M3.1 monitoring dashboards
   - Set up CI/CD compliance gates
   - Write first custom Rego policy

3. **This Month:**
   - Complete M3.3: Incident Response
   - Conduct first automated audit evidence review
   - Expand test coverage to access control and retention policies

### Resources

- [OPA Documentation](https://www.openpolicyagent.org/docs/latest/)
- [Rego Playground](https://play.openpolicyagent.org/)
- [Presidio Documentation](https://microsoft.github.io/presidio/)
- Module README: `../README.md`

---

**Remember:** Automated testing catches ~95% of violations. Still need:
- M3.1 dashboards for runtime detection
- Annual third-party audits
- Incident response for the 5% that slip through

**Prevention over detection, but never 100%.**

In [None]:
# Final summary
print("="*60)
print("L3 M3.2: Automated Compliance Testing - Complete")
print("="*60)
print("\n✓ Module completed successfully!\n")
print("Key Takeaways:")
print("  1. Policy-as-code transforms compliance from docs to executable tests")
print("  2. OPA/Rego provides industry-standard policy engine")
print("  3. Test pyramid ensures comprehensive coverage (70/20/10)")
print("  4. CI/CD integration prevents 95%+ violations pre-deployment")
print("  5. Automated evidence reduces audit prep 16x (8h → 30min)")
print("\nNext Module: M3.3 - Incident Response")
print("="*60)