# L3 M4.2: Vendor Risk Assessment

## Learning Arc

**Purpose:** Build a comprehensive vendor risk assessment framework for Global Capability Centers (GCCs) managing third-party compliance in RAG systems. Learn to prevent vendor-caused data breaches and ensure regulatory compliance through systematic risk evaluation.

**Concepts Covered:**
1. **Vendor Risk Propagation** - Understanding "Third-party risk = Your risk" and legal liability frameworks (GDPR Article 28)
2. **5-Category Weighted Evaluation Matrix** - Security (30%), Privacy (25%), Compliance (20%), Reliability (15%), Data Residency (10%)
3. **Data Processing Agreements (DPAs)** - 12 essential GDPR clauses and automated DPA review
4. **Subprocessor Management** - Tracking vendors' vendors through the supply chain
5. **Continuous Monitoring** - Detecting certification lapses, security incidents, and contract changes
6. **Risk Score Interpretation** - 0-100 scoring with approval thresholds (90+ = Approved, <50 = Rejected)
7. **Real-World Case Studies** - Capital One breach, Facebook-Cambridge Analytica, SolarWinds, MOVEit

**After Completing This Notebook:**
- You will understand how vendor security failures cascade to your organization's liability
- You can evaluate third-party vendors using a 5-category weighted risk matrix
- You can identify missing DPA clauses that violate GDPR Article 28 requirements
- You will recognize when vendor changes (certifications, subprocessors) trigger re-assessment
- You can implement continuous monitoring for 50-200 third-party vendors at scale
- You can generate CFO-ready risk reports showing cost-vs-risk trade-offs

**Context in Track L3.M4:**
This module builds on **M4.1 Model Cards & Documentation** (extracting third-party dependencies) and prepares you for **M4.3 Change Management** (triggering re-assessments when vendors change).

In [None]:
# Cell 2: Environment Setup
import os
import sys
from datetime import datetime, timedelta
import json

# Add src to path for imports
if './src' not in sys.path:
    sys.path.insert(0, './src')

# OFFLINE mode (no external services required for this module)
OFFLINE = os.getenv("OFFLINE", "true").lower() == "true"

print("L3 M4.2: Vendor Risk Assessment - Environment Setup")
print("="*60)
print(f"Processing Mode: {'OFFLINE (Local processing only)' if OFFLINE else 'Online'}")
print("\nThis module operates entirely offline:")
print("  ✓ No external LLM APIs (OpenAI, Anthropic)")
print("  ✓ No vector databases (Pinecone, Qdrant)")
print("  ✓ Local risk calculations using pandas")
print("  ✓ Optional: sentence-transformers for DPA analysis (local model)")
print("\nReady to begin!")

## Section 1: The Business Problem - Why Vendor Risk Assessment Matters

### Real-World Scenario: Bangalore GCC Breach via Vendor

A major Global Capability Center in Bangalore serves 50+ business units with 500K daily RAG queries. Their vector database vendor experiences a data breach exposing:
- API keys for 200 customer environments
- Embeddings metadata (reveals document structure)
- Query logs (contains business intelligence)

**Consequences:**
- **₹8 crore** in incident response costs (forensics, customer notification, credit monitoring)
- Parent company audit reveals **15 vendor compliance gaps**
- **6-month remediation timeline** disrupting normal operations
- **Multi-country regulatory notifications** (GDPR in EU, DPDPA in India, CCPA in California)

**Key Insight:** The GCC's own security was strong, but its **vendor's security was weak**. Under GDPR Article 28, the GCC (as controller) remains accountable for the processors it chooses, so the vendor's failure becomes the GCC's liability.

### What Makes GCCs Vulnerable?

1. **Multi-Jurisdictional Operations** - One vendor breach triggers notifications in 15+ countries with different requirements
2. **Shared Services Model** - One RAG platform serves 50+ business units; vendor breach impacts entire organization
3. **Compliance Stack** - Must satisfy parent company (SOX), India requirements (DPDPA), AND client countries (GDPR, CCPA)
4. **Scale** - 10-20 vendors per RAG system × 5-10 subprocessors each = **50-200 third parties in supply chain**

**This module solves:** How to systematically evaluate, monitor, and manage 50-200 third-party vendors with consistent, auditable methodology.

In [None]:
# Cell 4: Import the VendorRiskAssessment framework
from l3_m4_enterprise_integration_governance import VendorRiskAssessment

# Initialize the assessor
assessor = VendorRiskAssessment()

print("VendorRiskAssessment Framework Initialized")
print("\nCategory Weights:")
for category, weight in assessor.WEIGHTS.items():
    print(f"  {category.replace('_', ' ').title()}: {weight*100:.0f}%")

print("\n✓ Framework ready for vendor assessments")

# Expected output:
# Category Weights:
#   Security: 30%
#   Privacy: 25%
#   Compliance: 20%
#   Reliability: 15%
#   Data Residency: 10%

## Section 2: Concept 1 - Vendor Risk Propagation

### Core Principle: "Third-party risk = Your risk"

RAG systems depend on multiple vendors:
- **LLM APIs:** OpenAI, Anthropic (process queries containing sensitive data)
- **Vector Databases:** Pinecone, Weaviate (store embeddings with metadata)
- **Cloud Providers:** AWS, GCP, Azure (host infrastructure)
- **Observability Tools:** Datadog, New Relic (view logs containing PII)

### Legal Framework: GDPR Article 28

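**GDPR Article 28** establishes **shared responsibility**: controllers may use only processors that provide sufficient guarantees of appropriate technical and organizational measures, and must bind them by contract to the same data protection obligations.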

**Regulators don't differentiate:**
- Your breach = Your liability
- **Your vendor's breach = ALSO your liability**

### Supply Chain Analogy

If your ingredient supplier uses contaminated water, **you're liable for food poisoning** despite not directly using contaminated water. Similarly, if your vector database vendor gets breached, **you're liable for the data exposure** despite not directly causing the breach.

### Real-World Examples

**Capital One Breach (2019):**
- **Vendor:** AWS (cloud provider)
- **Issue:** A misconfigured web application firewall in Capital One's own AWS environment exposed ~100M customer records
- **Responsibility:** Capital One's, because under the cloud shared-responsibility model the customer configures the services it runs
- **Cost:** $80M OCC fine + $190M class action settlement
- **Lesson:** Misconfigured cloud provider services = your breach

**Facebook-Cambridge Analytica (2018):**
- **Vendor:** Cambridge Analytica (third party that obtained user data through a developer's app)
- **Issue:** Vendor misused shared data
- **Cost:** £500K UK fine + $5B US FTC fine
- **Lesson:** Vendor data handling violations are your violations

## Section 3: Concept 2 - The 5-Category Weighted Evaluation Matrix

### Why Weighted Scoring?

Not all risks are equal. Security breaches occur far more frequently than data residency violations, so **Security gets 30% weight vs. Data Residency's 10%**.

### Category Breakdown

| Category | Weight | What We Evaluate | Why This Weight? |
|----------|--------|------------------|------------------|
| **Security** | 30% | SOC 2 Type II, ISO 27001, penetration testing, incident history | Breaches are most common vendor risk; highest impact |
| **Privacy** | 25% | GDPR/CCPA compliance, DPA availability, data handling policies | Privacy violations trigger regulatory fines (GDPR: up to €20M or 4% of global annual turnover, whichever is higher) |
| **Compliance** | 20% | Industry certifications, audit reports, regulatory alignment | Proves baseline standards; required for regulated industries |
| **Reliability** | 15% | SLA guarantees, uptime history, support responsiveness | Downtime costly but less catastrophic than breaches |
| **Data Residency** | 10% | Geographic locations, data sovereignty, subprocessors | Important for regulated industries; niche requirement |

### Risk Score Interpretation

- **90-100:** Low Risk → **APPROVED** (suitable for production use)
- **70-89:** Medium Risk → **APPROVED WITH CONDITIONS** (require additional controls/monitoring)
- **50-69:** High Risk → **ADDITIONAL CONTROLS REQUIRED** (risk mitigation plan needed)
- **0-49:** Critical Risk → **REJECTED** (seek alternative vendor)

### Weighted Average Formula

```
Overall Score = (Security × 0.30) + (Privacy × 0.25) + (Compliance × 0.20) 
                + (Reliability × 0.15) + (Data Residency × 0.10)
```

**Example:**
- Security: 95/100
- Privacy: 90/100
- Compliance: 85/100
- Reliability: 88/100
- Data Residency: 80/100

**Overall:** (95×0.30) + (90×0.25) + (85×0.20) + (88×0.15) + (80×0.10) = 28.5 + 22.5 + 17.0 + 13.2 + 8.0 = **89.2** → **Medium Risk (Approved with Conditions)**
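The formula and thresholds above fit in a few lines of Python. This is a minimal sketch: the `WEIGHTS` dict mirrors the percentages in this section, but `overall_score` and `risk_level` are illustrative names, not the `VendorRiskAssessment` API.

```python
# Category weights from the 5-category matrix (must sum to 1.0)
WEIGHTS = {"security": 0.30, "privacy": 0.25, "compliance": 0.20,
           "reliability": 0.15, "data_residency": 0.10}

def overall_score(category_scores):
    """Weighted average of 0-100 category scores, rounded to one decimal."""
    missing = WEIGHTS.keys() - category_scores.keys()
    if missing:
        raise ValueError(f"Missing category scores: {sorted(missing)}")
    return round(sum(category_scores[c] * w for c, w in WEIGHTS.items()), 1)

def risk_level(score):
    """Map an overall score to (risk level, recommendation) using the bands above."""
    if score >= 90:
        return "LOW RISK", "APPROVED"
    if score >= 70:
        return "MEDIUM RISK", "APPROVED WITH CONDITIONS"
    if score >= 50:
        return "HIGH RISK", "ADDITIONAL CONTROLS REQUIRED"
    return "CRITICAL RISK", "REJECTED"

example = {"security": 95, "privacy": 90, "compliance": 85,
           "reliability": 88, "data_residency": 80}
score = overall_score(example)
print(score, risk_level(score))  # 89.2 ('MEDIUM RISK', 'APPROVED WITH CONDITIONS')
```

Rounding happens once, at the end, so floating-point noise in the individual products never shifts a vendor across a threshold.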

In [None]:
# Cell 7: Example - Assess a Low-Risk Vendor (OpenAI)

# Inputs for OpenAI assessment
openai_inputs = {
    # Security inputs
    'soc2_date': datetime.now() - timedelta(days=180),  # 6 months ago
    'iso27001': True,
    'pentest_date': datetime.now() - timedelta(days=90),  # 3 months ago
    'breaches_count': 0,
    
    # Privacy inputs
    'gdpr_compliant': True,
    'dpa_available': True,
    'data_policy_score': 3,  # 0-3 scale, 3 = excellent
    'deletion_process': 'automated_verified',
    'access_controls': 'strong',
    
    # Compliance inputs
    'certifications': ['soc2', 'iso27001'],
    'audit_date': datetime.now() - timedelta(days=150),
    'notification_process': 'proactive',
    'violations_count': 0,
    
    # Reliability inputs
    'sla_guarantee': 99.9,
    'actual_uptime_12m': 99.95,
    'support_response_time': '<1h',
    'dr_plan': 'tested_annually',
    
    # Data residency inputs
    'dc_locations': ['US', 'EU', 'Asia'],
    'dc_selectable': True,
    'subproc_locations': ['US', 'EU'],
    'sccs_available': True,
    'localization_support': 'full'
}

# Calculate overall risk
openai_result = assessor.calculate_overall_risk('OpenAI', openai_inputs)

# Display results
print("=" * 60)
print(f"Vendor Risk Assessment: {openai_result['vendor']}")
print("=" * 60)
print(f"Overall Score: {openai_result['overall_score']}/100")
print(f"Risk Level: {openai_result['risk_level']}")
print(f"Recommendation: {openai_result['recommendation']}")
print("\nCategory Scores:")
for category, score in openai_result['category_scores'].items():
    print(f"  {category.replace('_', ' ').title()}: {score}/100")

print("\nKey Findings (Security):")
for finding in openai_result['findings']['security'][:3]:  # Show first 3
    print(f"  {finding}")

# Expected output:
# Overall Score: 92-95/100
# Risk Level: LOW RISK
# Recommendation: APPROVED

## Section 4: Security Evaluation Deep Dive (30% Weight)

### Why Security Has Highest Weight

Security breaches are the **most common and costly** vendor risk:
- Capital One: $270M total cost
- SolarWinds: up to 18,000 organizations installed a compromised update
- MOVEit: 77M+ individuals compromised

### Security Scoring Criteria (100 points total)

#### 1. SOC 2 Type II (30 points)
- **30 points:** Report <12 months old
- **15 points:** Report 12-24 months old
- **0 points:** Report >24 months old OR no report

**Why SOC 2 Type II matters:**
- Type I = Point-in-time snapshot (insufficient)
- Type II = Controls tested over a review period (typically 6-12 months), proving sustained effectiveness
- Most GCC parent companies require SOC 2 Type II for all critical vendors

#### 2. ISO 27001 Certification (20 points)
- **20 points:** Currently certified
- **0 points:** Not certified

**Why ISO 27001 matters:**
- International standard for information security management
- Shows vendor has formal security program, not ad-hoc controls
- Required for many EU customers

#### 3. Penetration Testing (20 points)
- **20 points:** Test <12 months old
- **10 points:** Test 12-24 months old
- **0 points:** Test >24 months old OR no testing

**Why pentesting matters:**
- Proactively identifies vulnerabilities before attackers exploit them
- Annual testing is industry standard for high-risk vendors
- Shows vendor invests in offensive security, not just defensive

#### 4. Incident History (30 points)
- **30 points:** 0 breaches in past 3 years
- **Deduct 10 points per breach** (maximum deduction 30, so this criterion bottoms out at 0)

**Why breach history matters:**
- Past breaches predict future risk
- Multiple breaches indicate systemic security failures, not one-off mistakes
- 3+ breaches exhaust all 30 Incident History points, and the framework also auto-fails the Security category (0/100) at 3+ breaches, which usually drives an overall REJECTED
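The four Security criteria can be expressed as a single scoring function. This is a hedged sketch of the rules listed above, not the framework's `evaluate_security` implementation; it assumes "12 months" means 365 days and "24 months" means 730 days, and it applies the 3+ breach auto-fail rule described in this section.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

def score_security(soc2_date: Optional[datetime], iso27001: bool,
                   pentest_date: Optional[datetime], breaches_count: int,
                   now: Optional[datetime] = None) -> Tuple[int, List[str]]:
    """Score the Security category (max 100) and collect findings."""
    now = now or datetime.now()
    score, findings = 0, []

    def age_days(when):
        return None if when is None else (now - when).days

    # 1. SOC 2 Type II: 30 (<12 months), 15 (12-24 months), 0 (older or missing)
    soc2_age = age_days(soc2_date)
    if soc2_age is not None and soc2_age < 365:
        score += 30
    elif soc2_age is not None and soc2_age < 730:
        score += 15
        findings.append("SOC 2 Type II report is 12-24 months old")
    else:
        findings.append("SOC 2 Type II report missing or older than 24 months")

    # 2. ISO 27001: 20 if currently certified
    if iso27001:
        score += 20
    else:
        findings.append("No ISO 27001 certification")

    # 3. Penetration test: 20 (<12 months), 10 (12-24 months), 0 otherwise
    pentest_age = age_days(pentest_date)
    if pentest_age is not None and pentest_age < 365:
        score += 20
    elif pentest_age is not None and pentest_age < 730:
        score += 10
        findings.append("Penetration test is 12-24 months old")
    else:
        findings.append("Penetration test missing or older than 24 months")

    # 4. Incident history: 30 minus 10 per breach, never below 0
    score += max(0, 30 - 10 * breaches_count)
    if breaches_count:
        findings.append(f"{breaches_count} breach(es) in the past 3 years")

    # Auto-fail rule: 3+ breaches zero the whole Security category
    if breaches_count >= 3:
        score = 0
        findings.append("Auto-fail: 3+ breaches")
    return score, findings
```

Without the auto-fail, a vendor with 3 breaches but a recent pentest could still collect points from the other criteria; the override keeps systemic failures from hiding behind paperwork.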

In [None]:
# Cell 9: Example - Assess Security for Hypothetical Risky Vendor

risky_vendor_inputs = {
    'soc2_date': datetime.now() - timedelta(days=900),  # 30 months ago (expired)
    'iso27001': False,
    'pentest_date': datetime.now() - timedelta(days=540),  # 18 months ago
    'breaches_count': 3,  # 3 breaches in past 3 years
    
    # Privacy (for overall calculation)
    'gdpr_compliant': False,
    'dpa_available': False,
    'data_policy_score': 0,
    'deletion_process': 'unclear',
    'access_controls': 'weak',
    
    # Compliance
    'certifications': [],
    'audit_date': datetime.now() - timedelta(days=900),
    'notification_process': 'reactive',
    'violations_count': 2,
    
    # Reliability
    'sla_guarantee': 99.0,
    'actual_uptime_12m': 98.5,
    'support_response_time': '24h',
    'dr_plan': 'none',
    
    # Data Residency
    'dc_locations': ['US'],
    'dc_selectable': False,
    'subproc_locations': [],
    'sccs_available': False,
    'localization_support': 'none'
}

# Evaluate just security category
security_score, security_findings = assessor.evaluate_security('RiskyVendor', risky_vendor_inputs)

print("Security Evaluation: RiskyVendor")
print("=" * 60)
print(f"Security Score: {security_score}/100\n")
print("Findings:")
for finding in security_findings:
    print(f"  {finding}")

# Calculate overall risk
risky_result = assessor.calculate_overall_risk('RiskyVendor', risky_vendor_inputs)
print(f"\nOverall Score: {risky_result['overall_score']}/100")
print(f"Risk Level: {risky_result['risk_level']}")
print(f"Recommendation: {risky_result['recommendation']}")

# Expected output:
# Security Score: 0/100 (3 breaches trigger the auto-fail; otherwise 10/100 from the 18-month-old pentest alone)
# Overall Score: <50 (CRITICAL RISK)
# Recommendation: REJECTED

## Section 5: Privacy Evaluation Deep Dive (25% Weight)

### Why Privacy is Second-Highest Weight

Under GDPR, privacy violations can draw **fines of up to €20M or 4% of global annual turnover, whichever is higher**.

### Privacy Scoring Criteria (100 points total)

#### 1. GDPR Compliance (40 points)
- **40 points:** GDPR compliant + DPA available
- **20 points:** Claims GDPR compliance but no DPA
- **0 points:** Not GDPR compliant

**CRITICAL:** DPA is **LEGALLY REQUIRED** by GDPR Article 28 for processing EU personal data. No DPA = cannot use vendor for EU data.

#### 2. Data Handling Policies (30 points)
- **30 points:** Transparent policies (score 2-3)
- **15 points:** Basic policies (score 1)
- **0 points:** Unclear policies (score 0)

**Why transparency matters:** Can you explain to auditors what vendor does with your data?

#### 3. Data Deletion (20 points)
- **20 points:** Automated deletion with verification
- **10 points:** Manual deletion process
- **0 points:** Unclear deletion process

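**GDPR Article 17 (Right to Erasure):** You must erase personal data on a valid request, including copies your vendors hold, and be able to prove it; vendors need a deletion process you can verify.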

#### 4. Data Access Controls (10 points)
- **10 points:** Strong (MFA, audit logs, need-to-know)
- **5 points:** Basic controls
- **0 points:** Weak controls

**Why this matters:** Who at vendor can access your data? How is access logged? GCCs need to prove to auditors that vendor access is controlled.
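The Privacy rubric maps directly onto the notebook's input fields (`gdpr_compliant`, `dpa_available`, `data_policy_score`, `deletion_process`, `access_controls`). A minimal sketch, assuming that only `'automated_verified'`, `'unclear'`, `'strong'` and `'weak'` come from this notebook; the `'manual'` and `'basic'` values are hypothetical names for the middle tiers.

```python
from typing import List, Tuple

# Points per input value; 'manual' and 'basic' are assumed names, unknown values score 0
DELETION_POINTS = {"automated_verified": 20, "manual": 10}
ACCESS_POINTS = {"strong": 10, "basic": 5}

def score_privacy(gdpr_compliant: bool, dpa_available: bool, data_policy_score: int,
                  deletion_process: str, access_controls: str) -> Tuple[int, List[str]]:
    """Score the Privacy category (max 100) and collect findings."""
    score, findings = 0, []
    # 1. GDPR compliance (40): full credit only with a DPA, which Article 28 requires
    if gdpr_compliant and dpa_available:
        score += 40
    elif gdpr_compliant:
        score += 20
        findings.append("Claims GDPR compliance but offers no DPA: not usable for EU personal data")
    else:
        findings.append("Not GDPR compliant")
    # 2. Data handling policies (30): transparent (2-3) = 30, basic (1) = 15, unclear (0) = 0
    if data_policy_score >= 2:
        score += 30
    elif data_policy_score == 1:
        score += 15
    else:
        findings.append("Unclear data handling policies")
    # 3. Data deletion (20) and 4. data access controls (10)
    score += DELETION_POINTS.get(deletion_process, 0)
    score += ACCESS_POINTS.get(access_controls, 0)
    return score, findings
```

With the OpenAI inputs from Cell 7 this returns 100; with the RiskyVendor inputs from Cell 9 it returns 0.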

## Section 6: Concept 3 - Data Processing Agreements (DPAs)

### What is a DPA?

A **Data Processing Agreement (DPA)** is a contract between:
- **Data Controller:** Your organization (determines purposes/means of processing)
- **Data Processor:** Vendor (processes data on your behalf)

**Required by GDPR Article 28** for any vendor processing EU personal data.

### 12 Essential DPA Clauses

| # | Clause | Why Critical | Missing = Risk |
|---|--------|--------------|----------------|
| 1 | **Data scope and processing activities** | Defines what data vendor can access | Vendor accesses data outside agreed scope |
| 2 | **Purpose limitation** | Vendor can ONLY use data for specified purposes | Vendor resells data, uses for training LLMs |
| 3 | **Technical/organizational security measures** | Minimum security requirements (encryption, access controls) | No security baseline = breaches |
| 4 | **Subprocessor approval requirements** | Vendor must get approval before adding subprocessors | Vendor adds untrusted subprocessor without notice |
| 5 | **Data subject rights assistance** | Vendor helps with access, deletion requests | Cannot fulfill GDPR data subject requests |
| 6 | **Breach notification** | Vendor notifies you without undue delay (contracts commonly specify 24-72 hours) | Late notification jeopardizes your 72-hour GDPR deadline to notify the regulator |
| 7 | **Data location specifications** | Where data is stored geographically | Data stored in prohibited countries |
| 8 | **Cross-border transfer safeguards** | Standard Contractual Clauses (SCCs) for EU→non-EU | GDPR violation for cross-border transfers |
| 9 | **Audit rights** | Your right to audit vendor's controls | Cannot verify vendor compliance |
| 10 | **Data deletion timelines** | When/how data is deleted post-contract | Data persists indefinitely after termination |
| 11 | **Liability assignments** | Who pays fines if breach occurs | Unclear liability in breach scenarios |
| 12 | **Termination and data return** | Data returned/deleted upon contract end | Data loss or vendor lock-in |

### Failure Scenario: DPA Without Essential Clauses

**Problem:** Vendor provides DPA but missing subprocessor approval clause (Clause 4). You sign blindly.

**Consequence:** Vendor adds untrusted subprocessor without your consent. Subprocessor gets breached. You're liable for GDPR violation because DPA didn't require approval.

**Mitigation:** Use automated DPA clause checker (optional feature with `USE_DPA_ANALYSIS=true`) before signing. Flag missing clauses to legal team.
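A first-pass clause check can be as simple as pattern matching over the DPA text. The sketch below is a keyword heuristic, not legal review and not the notebook's `USE_DPA_ANALYSIS` feature (which may use embeddings instead); the clause keys and regex patterns are hypothetical and deliberately conservative, so treat every flag as a prompt for the legal team.

```python
import re
from typing import List

# Hypothetical patterns, one per essential clause (any match = clause considered present)
CLAUSE_PATTERNS = {
    "1_data_scope": r"categories of (personal )?data|subject[- ]matter",
    "2_purpose_limitation": r"documented instructions|only for the (specified )?purposes?",
    "3_security_measures": r"technical and organi[sz]ational measures",
    # Mentioning sub-processors is not enough: require prior/written authorisation
    "4_subprocessor_approval": r"sub-?processors?.{0,80}(prior|written) .{0,20}authori[sz]ation",
    "5_data_subject_rights": r"data subject",
    "6_breach_notification": r"personal data breach",
    "7_data_location": r"data (will be )?(stored|located|hosted) in|data cent(er|re)s? in",
    "8_transfer_safeguards": r"standard contractual clauses",
    "9_audit_rights": r"\baudits?\b|inspections?",
    "10_deletion_timelines": r"delet(e|ion).{0,60}\d+ days",
    "11_liability": r"liabilit(y|ies)|indemnif",
    "12_termination_return": r"(upon|on) (the )?terminat.{0,80}return",
}

def missing_dpa_clauses(dpa_text: str) -> List[str]:
    """Return the clause keys whose pattern does not appear in the DPA text."""
    text = " ".join(dpa_text.split())  # collapse line breaks and repeated spaces
    return [name for name, pattern in CLAUSE_PATTERNS.items()
            if not re.search(pattern, text, re.IGNORECASE | re.DOTALL)]

sample_dpa = """
The Processor shall process personal data only on documented instructions from the Controller.
The categories of personal data are listed in Annex 1. The Processor shall implement
appropriate technical and organisational measures. The Processor may engage
sub-processors at its discretion.
"""
print(missing_dpa_clauses(sample_dpa))
```

The sample mirrors the failure scenario above: it mentions sub-processors but grants the vendor discretion, so `4_subprocessor_approval` is flagged along with the clauses the excerpt never covers.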

## Section 7: Concept 4 - Subprocessor Management

### The Subprocessor Problem

**Your vendors use other vendors.** Those are **subprocessors**.

**Example Chain:**
```
You → Pinecone (vector database) → AWS (cloud provider) → Data center operator
```

AWS is Pinecone's **subprocessor**, but AWS processes **your data** indirectly.

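### Regulatory Requirement: GDPR Article 28(4)

When a processor engages a subprocessor, it must impose the same data protection obligations on that subprocessor, and it remains fully liable to you for the subprocessor's performance.

**If a subprocessor fails, the vendor is liable to you, but YOU remain accountable to regulators and data subjects.**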

### Subprocessor Management Requirements

1. **Registry:** Maintain list of all subprocessors per vendor
2. **Approval:** Vendor must obtain approval before adding subprocessors
3. **DPAs:** Vendor must maintain DPAs with all subprocessors
4. **Monitoring:** Track when subprocessors change
5. **Evaluation:** Assess subprocessors using same risk matrix

### Real-World Case: SolarWinds Supply Chain Attack (2020)

- **Issue:** Attackers inserted malicious code (SUNBURST) into Orion builds, which reached customers as signed software updates
- **Scope:** Up to ~18,000 organizations installed the compromised update (including US government agencies)
- **Chain:** Customer → SolarWinds (vendor) → Orion (software) → Malicious update
- **Lesson:** Vendor compromises propagate to all customers

### Failure Scenario: Subprocessor Supply Chain Breach

**Problem:** Vendor A uses Vendor B, Vendor B uses Vendor C. Vendor C gets breached.

**Consequence:** Your data exposed through chain you didn't monitor.

**Mitigation:** Maintain subprocessor registry at **all levels**. Audit subprocessor's risk scores too.
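A registry "at all levels" is a graph problem: walk from your direct vendors through every subprocessor they declare. This sketch assumes a hypothetical registry shaped as `{vendor: [subprocessors]}` and reuses this module's threshold (score below 70 requires mitigation); parties with no assessment on file are flagged too.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional

def supply_chain(root_vendors: Iterable[str], registry: Dict[str, List[str]]) -> Dict[str, int]:
    """Breadth-first walk of the subprocessor registry.

    Returns {party: depth}, with your direct vendors at depth 1.
    """
    depth: Dict[str, int] = {}
    queue = deque((vendor, 1) for vendor in root_vendors)
    while queue:
        party, level = queue.popleft()
        if party in depth:  # already reached at a shallower or equal level (also breaks cycles)
            continue
        depth[party] = level
        for sub in registry.get(party, []):
            queue.append((sub, level + 1))
    return depth

def needs_attention(chain: Dict[str, int], scores: Dict[str, Optional[float]],
                    threshold: float = 70) -> List[str]:
    """Parties never assessed, or scoring below the mitigation threshold."""
    return sorted(p for p in chain if scores.get(p) is None or scores[p] < threshold)

# The failure scenario above: A uses B, B uses C, and nobody assessed C
registry = {"VendorA": ["VendorB"], "VendorB": ["VendorC"]}
chain = supply_chain(["VendorA"], registry)
print(chain)                                                   # {'VendorA': 1, 'VendorB': 2, 'VendorC': 3}
print(needs_attention(chain, {"VendorA": 92, "VendorB": 81}))  # ['VendorC']
```

Breadth-first order records each party at its shortest distance from you, which is usually the depth that matters when deciding how much contractual leverage you have.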

## Section 8: Concept 5 - Continuous Monitoring

### Why Vendors Become Riskier Over Time

Vendor risk is **not static**:

1. **Security incidents** - Vendor gets breached, delays disclosure
2. **Certification lapses** - SOC 2 expires, vendor doesn't renew immediately
3. **Terms modifications** - Vendor adds subprocessors, changes data handling
4. **Regulatory changes** - New requirements (e.g., India's DPDPA, enacted 2023)
5. **Financial instability** - Vendor cuts security investment to reduce costs

### Continuous Monitoring Components

| Frequency | Activity | Alert Trigger |
|-----------|----------|---------------|
| **Weekly** | Check vendor status pages | Incident posted |
| **Monthly** | Review security advisories | New CVE affecting vendor |
| **Quarterly** | Re-evaluate risk scores | Certification expired OR Score drops ≥10 points |
| **Annually** | Request updated SOC 2 reports | Report >12 months old |
| **On-change** | Monitor subprocessor changes | Vendor adds subprocessor without notice |

### Failure Scenario: Vendor Certification Lapses Undetected

**Problem:** Vendor's SOC 2 expired 6 months ago. You still use them (didn't notice).

**Consequence:** SOC 2 provides assurance vendor's controls work. Expired SOC 2 = no assurance. Auditor red-flags this during your annual audit.

**Mitigation:** Continuous monitoring watches certification dates, sends quarterly alerts.
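Certification tracking reduces to comparing report dates against the age bands used in the Security criteria. A minimal sketch, assuming 12 months = 365 days, 24 months = 730 days, and a 30-day early warning; the `soc2_dates` registry is hypothetical.

```python
from datetime import date
from typing import Dict

def soc2_status(report_date: date, today: date) -> str:
    """Classify a SOC 2 Type II report by its age."""
    age_days = (today - report_date).days
    if age_days > 730:
        return "LAPSED"          # >24 months: earns 0 SOC 2 points
    if age_days > 365:
        return "REQUEST_UPDATE"  # >12 months: annual refresh trigger, 15 points
    if age_days > 335:
        return "EXPIRING_SOON"   # within 30 days of the 12-month mark
    return "CURRENT"

def certification_alerts(soc2_dates: Dict[str, date], today: date) -> Dict[str, str]:
    """Return {vendor: status} for every vendor whose SOC 2 report needs attention."""
    alerts = {}
    for vendor, report_date in soc2_dates.items():
        status = soc2_status(report_date, today)
        if status != "CURRENT":
            alerts[vendor] = status
    return alerts

soc2_dates = {"VendorA": date(2024, 7, 5), "VendorB": date(2023, 6, 1), "VendorC": date(2022, 1, 1)}
print(certification_alerts(soc2_dates, today=date(2025, 1, 1)))
# {'VendorB': 'REQUEST_UPDATE', 'VendorC': 'LAPSED'}
```

Passing `today` explicitly keeps the check deterministic in tests and lets you replay past quarters during an audit.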

### Implementation Pattern

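```python
# Continuous monitoring sketch. The callables passed in (fetch_incidents, get_latest_inputs,
# get_previous_score, days_until_soc2_expiry, alert) are hypothetical hooks you supply.
SCORE_DROP_THRESHOLD = 10  # matches the quarterly trigger: score drops >= 10 points
SOC2_WARNING_DAYS = 30

def weekly_monitoring(active_vendors, fetch_incidents, alert):
    for vendor in active_vendors:
        # Check the vendor's status page for posted incidents
        incidents = fetch_incidents(vendor)
        if incidents:
            alert(vendor, incidents)

def quarterly_monitoring(assessor, active_vendors, get_latest_inputs,
                         get_previous_score, days_until_soc2_expiry, alert):
    for vendor in active_vendors:
        # Re-evaluate risk with the latest evidence
        result = assessor.calculate_overall_risk(vendor, get_latest_inputs(vendor))
        drop = get_previous_score(vendor) - result['overall_score']
        if drop >= SCORE_DROP_THRESHOLD:
            alert(vendor, f"Risk score dropped {drop:.1f} points")

        # Check certifications
        if days_until_soc2_expiry(vendor) <= SOC2_WARNING_DAYS:
            alert(vendor, f"SOC 2 expiring within {SOC2_WARNING_DAYS} days")
```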

In [None]:
# Cell 15: Example - Simulating Quarterly Re-Evaluation

# Vendor was LOW RISK in Q1, but SOC 2 expired in Q2
print("Scenario: Quarterly Re-Evaluation - Vendor Certification Lapsed\n")
print("=" * 60)

# Q1 Assessment (SOC 2 current)
q1_inputs = {
    'soc2_date': datetime.now() - timedelta(days=180),  # 6 months ago (current)
    'iso27001': True,
    'pentest_date': datetime.now() - timedelta(days=90),
    'breaches_count': 0,
    'gdpr_compliant': True,
    'dpa_available': True,
    'data_policy_score': 3,
    'deletion_process': 'automated_verified',
    'access_controls': 'strong',
    'certifications': ['soc2', 'iso27001'],
    'audit_date': datetime.now() - timedelta(days=150),
    'notification_process': 'proactive',
    'violations_count': 0,
    'sla_guarantee': 99.9,
    'actual_uptime_12m': 99.95,
    'support_response_time': '<1h',
    'dr_plan': 'tested_annually',
    'dc_locations': ['US', 'EU'],
    'dc_selectable': True,
    'subproc_locations': ['US', 'EU'],
    'sccs_available': True,
    'localization_support': 'full'
}

q1_result = assessor.calculate_overall_risk('VendorX_Q1', q1_inputs)
print(f"Q1 Assessment (SOC 2 current):")
print(f"  Overall Score: {q1_result['overall_score']}/100")
print(f"  Risk Level: {q1_result['risk_level']}")

# Q2 Assessment (SOC 2 expired - 30 months old)
q2_inputs = q1_inputs.copy()
q2_inputs['soc2_date'] = datetime.now() - timedelta(days=900)  # 30 months ago (expired)

q2_result = assessor.calculate_overall_risk('VendorX_Q2', q2_inputs)
print(f"\nQ2 Assessment (SOC 2 expired - 30 months old):")
print(f"  Overall Score: {q2_result['overall_score']}/100")
print(f"  Risk Level: {q2_result['risk_level']}")

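# Apply the quarterly alert rule: certification expired OR score drops >= 10 points
score_drop = q1_result['overall_score'] - q2_result['overall_score']
soc2_expired = (datetime.now() - q2_inputs['soc2_date']).days > 730  # older than 24 months
print(f"\nRisk score change: -{score_drop:.1f} points")

alerts = []
if soc2_expired:
    alerts.append("SOC 2 Type II report older than 24 months")
if score_drop >= 10:
    alerts.append(f"risk score dropped {score_drop:.1f} points")
if alerts:
    print("⚠️ ALERT: " + "; ".join(alerts))
    print("Action Required: Request updated SOC 2 Type II report from vendor")

# Expected output:
# Q1: ~92-95 (LOW RISK)
# Q2: at least ~9 points lower (SOC 2 drops from 30 to 0 Security points, × 0.30 weight) → MEDIUM RISK
# Alert fires because the SOC 2 report is expired, even if the drop is under 10 points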

## Section 9: Hands-On - Assess Multiple Vendors

Let's assess a realistic vendor portfolio for a GCC RAG system:

1. **OpenAI** - LLM API provider
2. **Pinecone** - Vector database
3. **AWS** - Cloud infrastructure (with breach history)
4. **Hypothetical Risky Vendor** - Example of vendor that should be rejected

In [None]:
# Cell 17: Load example vendor data from JSON
import json

# Load example data
with open('../example_data.json', 'r') as f:
    example_data = json.load(f)

print(f"Loaded {len(example_data['vendors'])} example vendors:\n")
for vendor in example_data['vendors']:
    print(f"  - {vendor['vendor_name']}: {vendor['description']}")

# Expected output:
# - OpenAI: LLM API provider for RAG systems
# - Pinecone: Vector database for embeddings storage
# - AWS: Cloud infrastructure provider
# - Hypothetical_Risky_Vendor: Example of high-risk vendor

In [None]:
# Cell 18: Assess all vendors from example data

# Create fresh assessor for batch assessment
batch_assessor = VendorRiskAssessment()

print("Assessing All Vendors...\n")
print("=" * 80)

for vendor_data in example_data['vendors']:
    vendor_name = vendor_data['vendor_name']
    
    # Convert date strings to datetime objects
    inputs = vendor_data.copy()
    if inputs.get('soc2_date'):
        inputs['soc2_date'] = datetime.fromisoformat(inputs['soc2_date'])
    if inputs.get('pentest_date'):
        inputs['pentest_date'] = datetime.fromisoformat(inputs['pentest_date'])
    if inputs.get('audit_date'):
        inputs['audit_date'] = datetime.fromisoformat(inputs['audit_date'])
    
    # Remove metadata fields
    inputs.pop('description', None)
    
    # Assess vendor
    result = batch_assessor.calculate_overall_risk(vendor_name, inputs)
    
    # Display summary
    print(f"\n{vendor_name}:")
    print(f"  Overall Score: {result['overall_score']}/100")
    print(f"  Risk Level: {result['risk_level']}")
    print(f"  Recommendation: {result['recommendation']}")
    print(f"  Category Scores: Security={result['category_scores']['security']}, "
          f"Privacy={result['category_scores']['privacy']}, "
          f"Compliance={result['category_scores']['compliance']}")

print("\n" + "=" * 80)
print(f"\n✓ Assessed {len(batch_assessor.vendors)} vendors")

# Expected output:
# OpenAI: 90+ (LOW RISK) - APPROVED
# Pinecone: 80-90 (MEDIUM RISK) - APPROVED WITH CONDITIONS
# AWS: 75-85 (MEDIUM RISK) - Has 1 breach but strong certifications
# Hypothetical_Risky_Vendor: <50 (CRITICAL RISK) - REJECTED

## Section 10: Generate Reports for Stakeholders

GCC compliance teams need to present vendor risk to different audiences:

1. **CFO:** Cost-vs-risk trade-offs, ROI calculations
2. **CTO:** Technical risk details, vendor comparison
3. **Auditors:** Compliance evidence, assessment methodology
4. **Legal:** DPA clause analysis, liability assignments

In [None]:
# Cell 20: Generate Summary Report

# Get report as DataFrame
report_df = batch_assessor.generate_report(output_format='dataframe')

print("Vendor Risk Assessment Summary Report")
print("=" * 80)
print(report_df.to_string(index=False))

# Calculate summary statistics
print("\n\nSummary Statistics:")
print("=" * 80)
print(f"Total Vendors Assessed: {len(report_df)}")
print(f"Average Overall Score: {report_df['Overall Score'].mean():.1f}/100")
print(f"\nRisk Level Distribution:")
risk_counts = report_df['Risk Level'].value_counts()
for risk_level, count in risk_counts.items():
    print(f"  {risk_level}: {count} vendor(s)")

# Identify vendors requiring action
print("\n\nAction Required:")
print("=" * 80)
critical_vendors = report_df[report_df['Overall Score'] < 50]
if len(critical_vendors) > 0:
    print(f"⚠️ CRITICAL: {len(critical_vendors)} vendor(s) should be REJECTED:")
    for _, vendor in critical_vendors.iterrows():
        print(f"   - {vendor['Vendor']} (Score: {vendor['Overall Score']})")
else:
    print("✓ No critical-risk vendors")

high_risk_vendors = report_df[(report_df['Overall Score'] >= 50) & (report_df['Overall Score'] < 70)]
if len(high_risk_vendors) > 0:
    print(f"\n⚠️ HIGH RISK: {len(high_risk_vendors)} vendor(s) require mitigation plans:")
    for _, vendor in high_risk_vendors.iterrows():
        print(f"   - {vendor['Vendor']} (Score: {vendor['Overall Score']})")

# Expected output:
# Table with 4 vendors sorted by risk score
# Summary showing distribution of risk levels
# Action items for critical/high-risk vendors

In [None]:
# Cell 21: Generate Excel Report (Optional)

# Generate Excel report for CFO presentation
excel_result = batch_assessor.generate_report(output_format='excel')

print("Excel Report Generated:")
print(f"  Filename: {excel_result['filename']}")
print(f"  Vendors: {excel_result['rows']}")
print("\n✓ Report ready for CFO/auditor presentation")

# Expected output:
# Filename: vendor_risk_assessment_20250116.xlsx
# Vendors: 4

## Section 11: Real-World Case Studies - Lessons Learned

### Case Study 1: Capital One Breach via AWS (2019)

**Vendor:** AWS (cloud provider)

**Issue:** A misconfigured web application firewall in Capital One's AWS environment exposed ~100M customer records

**Who was responsible?** Capital One: the misconfiguration was in its own setup, even though it ran on AWS infrastructure

**Consequences:**
- **$80M** OCC (Office of the Comptroller of the Currency) fine
- **$190M** class action settlement
- CISO replaced
- Reputational damage

**Lesson for GCCs:** Cloud provider misconfiguration = **your breach**. You're responsible for configuring vendor services correctly, even if vendor provides the tool.

**How vendor risk assessment helps:**
- AWS would score MEDIUM RISK (breach history deducts 10 points from Security)
- Triggers additional controls: configuration audits, cloud security posture management (CSPM)
- Acceptance criteria: Use AWS but implement guardrails

---

### Case Study 2: Facebook-Cambridge Analytica (2018)

**Vendor:** Cambridge Analytica (third party that obtained Facebook user data through a developer's app)

**Issue:** User data harvested by a third-party quiz app on Facebook's platform was passed to Cambridge Analytica, which misused it for political profiling

**Who was responsible?** Facebook, even though the misuse happened at a third party

**Consequences:**
- **£500K** UK ICO fine
- **$5B** US FTC fine
- CEO testimony before Congress

**Lesson for GCCs:** Vendor data handling violations are **your violations**. DPA must explicitly prohibit data resale/reuse.

**How vendor risk assessment helps:**
- Privacy evaluation checks for DPA clause: "Purpose limitation - vendor can ONLY use data for specified purposes"
- Vendors without this clause score low on Privacy category
- Rejection criteria: No DPA clause 2 = cannot use for personal data

---

### Case Study 3: SolarWinds Supply Chain Attack (2020)

**Vendor:** SolarWinds (software provider)

**Issue:** Nation-state attackers compromised the Orion build process and distributed trojanized software updates

**Scope:** Up to ~18,000 organizations installed the compromised update, including US government agencies

**Consequences:**
- Classified data accessed
- Multi-year remediation
- Congressional hearings

**Lesson for GCCs:** Vendor compromises **propagate to all customers**. Subprocessor tracking critical.

**How vendor risk assessment helps:**
- Data Residency evaluation tracks subprocessors
- Continuous monitoring watches for vendor security incidents
- Mitigation: Require vendors to notify of subprocessor changes, assess subprocessor risk scores

---

### Case Study 4: MOVEit Transfer Vulnerability (2023)

**Vendor:** Progress Software (file transfer tool)

**Issue:** Zero-day vulnerability in widely-used file transfer software

**Scope:** 2,600+ organizations, 77M+ individuals affected

**Consequences:**
- Mass data exfiltration
- Regulatory notifications across multiple countries
- Class action lawsuits

**Lesson for GCCs:** Single vulnerable vendor tool affects **thousands simultaneously**. Software supply chain monitoring mandatory.

**How vendor risk assessment helps:**
- Security evaluation requires annual penetration testing
- Continuous monitoring tracks vendor security advisories
- Mitigation: Patch management SLAs, alternative vendor readiness

## Section 12: Key Takeaways & Next Steps

### Key Takeaways

1. **Third-party risk = Your risk** - GDPR Article 28 establishes shared responsibility

2. **5-Category Weighted Matrix:**
   - Security (30%): Breaches are most common vendor risk
   - Privacy (25%): GDPR fines up to 4% global revenue
   - Compliance (20%): Industry certifications prove baseline
   - Reliability (15%): Downtime costly but less catastrophic
   - Data Residency (10%): Important for regulated industries

3. **DPA is LEGALLY REQUIRED** - 12 essential clauses for GDPR Article 28 compliance

4. **Subprocessor tracking mandatory** - Vendors' vendors process your data indirectly

5. **Continuous monitoring required** - Risk not static (certifications expire, incidents occur)

6. **Scale via automation:** Replace 2 FTE analysts (₹1.34L/month) with automated framework (₹10K/month)

### ROI Summary

**Year 1 Costs:** ₹2.4L-3.6L (implementation + infrastructure + monitoring)

**Year 1 Benefits:**
- Prevent 1 medium vendor breach: **₹2Cr saved**
- Replace 2 FTE analysts: **₹16L saved**

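**ROI:** roughly 4.4× to 6.7× from analyst savings alone (₹16L ÷ ₹3.6L to ₹2.4L); roughly 60× to 90× if the prevented breach is also counted (₹2.16Cr ÷ ₹3.6L to ₹2.4L)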

### Common Failures & How to Avoid

| Failure | Prevention |
|---------|------------|
| **DPA missing essential clauses** | Automated DPA checker before signing |
| **Certification lapses undetected** | Quarterly re-evaluation alerts |
| **Subprocessor breach** | Subprocessor registry at all levels |
| **Vendor with 3+ breaches approved** | Security category auto-fails with 3+ breaches |

### Next Steps

1. **Assess your current vendors:**
   - List all vendors processing personal data
   - Gather SOC 2 reports, DPAs, SLA metrics
   - Run assessments using this framework

2. **Identify high-risk vendors:**
   - Score <70 = requires mitigation plan
   - Score <50 = seek alternative vendor

3. **Implement continuous monitoring:**
   - Set up quarterly re-evaluation calendar
   - Monitor vendor status pages weekly
   - Track certification expiration dates

4. **Integrate with existing processes:**
   - Link to M4.1 Model Cards (extract vendor dependencies)
   - Link to M3.2 Audit Logs (vendor access monitoring)
   - Link to M2.1 Secrets (vendor API key management)

### Next Module: M4.3 Change Management & Compliance

**Prerequisites satisfied by this module:**
- ✅ Understand vendor risk assessment framework
- ✅ Can evaluate vendors using 5-category matrix
- ✅ Can identify missing DPA clauses
- ✅ Can track subprocessors

**M4.3 builds on this:**
- When vendor changes (contract, subprocessor, certification), trigger **change management process**
- Compliance change = **re-evaluate risk score** + **update documentation** + **notify stakeholders**
- Implement **approval workflows** for risk level changes (LOW → MEDIUM requires approval)

---

**Congratulations!** You've completed L3 M4.2: Vendor Risk Assessment. You can now systematically evaluate and monitor third-party vendors for GCC RAG systems.