# L3 M11.3: Database Isolation & Cross-Tenant Security

## Learning Arc

**Purpose:** Build defense-in-depth multi-tenant isolation systems that prevent cross-tenant data leaks at the database, vector store, cache, and storage layers.

**Concepts Covered:**
- PostgreSQL Row-Level Security (RLS) for 99.9% isolation
- Namespace-Based Isolation in Pinecone for 99.95% isolation
- Separate Database Per Tenant for 99.999% isolation
- Cross-Tenant Security Testing with 1,000+ adversarial queries
- Defense-in-Depth: Multiple isolation layers
- Tenant Context Management (never trust user input)
- Redis Cache Isolation with key prefixing
- S3 Prefix Isolation for document storage
- Audit Logging for all data access
- Incident Response for isolation breaches
- Cost vs. Security Trade-offs (₹5L to ₹50L/month)
- Real Production Failures (CVE-2022-1552, namespace typos, performance degradation)

**After Completing This Notebook:**
- You will understand how to implement PostgreSQL RLS policies that enforce tenant_id filtering on every query
- You can build namespace-based isolation in vector databases with validation logic
- You will design automated cross-tenant leak testing to catch isolation bugs before production
- You can choose the right isolation strategy based on cost, security requirements, and scale
- You will implement defense-in-depth patterns where multiple layers protect against leaks
- You can respond to isolation incidents with proper audit trails and investigation procedures
- You will recognize when NOT to use multi-tenancy (1-5 tenants, extreme customization, physical separation requirements)

**Context in Track L3.M11:**
This module builds on **M11.2 Tenant Registry & Lifecycle** and prepares you for **M11.4 Rate Limiting & Resource Quotas**. You now have tenant identities (M11.2) and secure data isolation (M11.3). Next, you'll enforce fair resource usage across tenants (M11.4).

In [None]:
# Cell 2: Environment Setup
import os
import sys

# Add src to path for imports
if '../src' not in sys.path:
    sys.path.insert(0, '../src')

# OFFLINE mode for L3 consistency (no external services required)
OFFLINE = os.getenv("OFFLINE", "true").lower() == "true"

# Multi-service detection from environment
POSTGRES_ENABLED = os.getenv("POSTGRES_ENABLED", "false").lower() == "true"
PINECONE_ENABLED = os.getenv("PINECONE_ENABLED", "false").lower() == "true"
REDIS_ENABLED = os.getenv("REDIS_ENABLED", "false").lower() == "true"
AWS_ENABLED = os.getenv("AWS_ENABLED", "false").lower() == "true"

if OFFLINE or not any([POSTGRES_ENABLED, PINECONE_ENABLED, REDIS_ENABLED, AWS_ENABLED]):
    print("⚠️ Running in OFFLINE/SERVICE_DISABLED mode")
    print("   → External API calls will be skipped")
    print("   → Set service flags in .env to enable:")
    print("      - POSTGRES_ENABLED=true (RLS strategy)")
    print("      - PINECONE_ENABLED=true (Namespace strategy)")
    print("      - REDIS_ENABLED=true (Cache isolation)")
    print("      - AWS_ENABLED=true (S3 prefix isolation)")
else:
    print("✓ Online mode - external services enabled")
    print(f"   - PostgreSQL: {POSTGRES_ENABLED}")
    print(f"   - Pinecone: {PINECONE_ENABLED}")
    print(f"   - Redis: {REDIS_ENABLED}")
    print(f"   - AWS: {AWS_ENABLED}")

print("\n✓ Environment setup complete")

## Section 1: The Cross-Tenant Data Leak Nightmare

### The 2:47 AM Incident

**Scenario:** Your GCC (Global Capability Center) platform supports 50 tenants (Finance, Legal, HR, Marketing, etc.). Each has its own RAG workspace. But at 2:47 AM, the Legal VP messages:

> "Why is Finance seeing our privileged attorney-client documents in their RAG queries? We're testing a $500M merger - this data cannot leak. We need answers NOW."

**What happened:**
- Finance ran a routine embedding search at 2:43 AM
- The vector database returned 5 documents - ALL from Legal's privileged namespace
- Finance shouldn't have access
- But your 'multi-tenant' system leaked cross-tenant data

**Consequences:**
- Legal VP threatening to shut down entire GCC platform
- CFO asking if this violates SOX controls
- CISO preparing incident report

**The brutal truth:** Tenant registry alone doesn't prevent data leakage. You need isolation at EVERY data layer:
- PostgreSQL rows (RLS policies)
- Vector database namespaces (Pinecone, Qdrant, Chroma)
- S3 buckets (prefix isolation)
- Redis caches (key prefixing)

Miss one layer, and you have a compliance incident.

### What We're Building Today

**Defense-in-Depth Multi-Tenant Isolation System** with three complete strategies:

1. **PostgreSQL Row-Level Security (RLS)**
   - Single shared database, tenant_id in every row
   - PostgreSQL policies enforce isolation automatically
   - Cost: ₹5L/month for 50 tenants
   - Isolation: 99.9%

2. **Namespace-Based Isolation (Pinecone)**
   - Separate namespaces per tenant (logical partitioning)
   - Namespace validation at query time
   - Cost: ₹15L/month for 50 tenants
   - Isolation: 99.95%

3. **Separate Database Per Tenant**
   - Complete physical isolation (50 PostgreSQL instances)
   - No shared resources, zero risk of policy bugs
   - Cost: ₹50L/month for 50 tenants
   - Isolation: 99.999%

By the end of this notebook:
- ✅ All three strategies implemented with working code
- ✅ Cross-tenant leak testing framework
- ✅ Decision criteria for choosing strategy
- ✅ Audit trails capturing every data access
- ✅ Real incident playbook

## Section 2: Tenant Context Management (Defense Layer 0)

### Critical Security Rule: Never Trust User Input for Tenant ID

**Bad (Vulnerable):**
```python
# DON'T DO THIS - Attacker can change URL parameter
tenant_id = request.get('tenant_id')  # From ?tenant_id=legal
namespace = f'tenant-{tenant_id}'
```

**Good (Secure):**
```python
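# Get tenant_id from a verified JWT (not user input).
# Assuming the PyJWT library: decode() needs the verification key and an explicit
# algorithm allow-list. JWT_PUBLIC_KEY is a placeholder for your identity provider's key.
tenant_id = jwt.decode(token, JWT_PUBLIC_KEY, algorithms=['RS256'])['tenant_id']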
# Validate tenant exists in registry
tenant = registry.get_tenant(tenant_id)
if not tenant:
    raise Unauthorized()
```

Let's implement proper tenant context management:

In [None]:
import uuid
from l3_m11_multi_tenant_foundations import TenantContextManager

# Initialize tenant context manager
ctx_mgr = TenantContextManager()

# Example: Set tenant context (in production: from JWT)
tenant_id = uuid.uuid4()
print(f"Setting tenant context: {tenant_id}")
ctx_mgr.set_tenant_context(tenant_id)

# Get tenant context (used in every query)
current_tenant = ctx_mgr.get_tenant_context()
print(f"Current tenant: {current_tenant}")

# Clear context after request
ctx_mgr.clear_tenant_context()
print("✓ Tenant context cleared")

# Expected: Setting tenant context, then getting it, then clearing

In [None]:
# Test: What happens if we forget to set tenant context?
ctx_mgr_test = TenantContextManager()

try:
    # This should FAIL (security violation)
    current_tenant = ctx_mgr_test.get_tenant_context()
except RuntimeError as e:
    print(f"✓ Security check worked: {e}")
    print("   → Queries without tenant context are blocked")

# Expected: RuntimeError - "Tenant context not set - security violation!"

**Key Takeaway:** Every query MUST have tenant context set. No exceptions, even for admin users.
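
One way a context holder like this could work is sketched below with the standard-library `contextvars` module. This is an illustration only, not the implementation of `TenantContextManager`; the names `_current_tenant`, `get_tenant_context`, and `tenant_scope` are hypothetical. The point is that reads fail closed when nothing is set, and that the context is always cleared when the request ends.

```python
import contextvars
import uuid
from contextlib import contextmanager

# Hypothetical holder. A ContextVar keeps a separate value per thread and per asyncio task.
_current_tenant = contextvars.ContextVar("current_tenant", default=None)


def get_tenant_context() -> uuid.UUID:
    """Return the active tenant, or fail closed when none is set."""
    tenant_id = _current_tenant.get()
    if tenant_id is None:
        raise RuntimeError("Tenant context not set - security violation!")
    return tenant_id


@contextmanager
def tenant_scope(tenant_id: uuid.UUID):
    """Set the tenant for one request and always restore the previous state."""
    if not isinstance(tenant_id, uuid.UUID):
        raise TypeError("tenant_id must be a uuid.UUID")
    token = _current_tenant.set(tenant_id)
    try:
        yield tenant_id
    finally:
        _current_tenant.reset(token)  # runs even if the request handler raises


finance = uuid.UUID("11111111-1111-1111-1111-111111111111")
with tenant_scope(finance):
    inside = get_tenant_context()
print(f"Tenant inside scope: {inside}")
```

Wrapping the request in a `with` block means a forgotten `clear_tenant_context()` call cannot leave one tenant's context active for the next request.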

## Section 3: Strategy 1 - PostgreSQL Row-Level Security (RLS)

### How RLS Works

**Concept:** PostgreSQL policies enforce tenant_id filtering on EVERY query, even if a developer forgets the WHERE clause.

**RLS Policy:**
```sql
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
-- Table owners bypass RLS unless it is forced
ALTER TABLE documents FORCE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON documents
USING (
    tenant_id = current_setting('app.tenant_id')::uuid
);
```

**Session Context:**
```sql
-- SET LOCAL only takes effect inside a transaction and is reset when it ends,
-- so set the tenant context at the start of every transaction
BEGIN;
SET LOCAL app.tenant_id = '11111111-1111-1111-1111-111111111111';

-- Query (no WHERE tenant_id needed - RLS enforces automatically)
SELECT * FROM documents WHERE title LIKE '%contract%';
-- Returns ONLY documents where tenant_id matches session variable
COMMIT;
```
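
From application code, the same session setup might look like the sketch below. It assumes a DB-API connection in the style of psycopg2, where `with conn:` wraps one transaction. `run_tenant_query` and the recording stand-in classes are hypothetical names added so the cell runs without PostgreSQL. `set_config(name, value, true)` is PostgreSQL's function form of `SET LOCAL`, and unlike `SET LOCAL` it accepts a bind parameter.

```python
import uuid


def run_tenant_query(conn, tenant_id: uuid.UUID, sql: str, params=()):
    """Run one query inside a transaction scoped to tenant_id."""
    with conn:  # psycopg2-style: commit on success, roll back on error
        cur = conn.cursor()
        # is_local=true limits the setting to this transaction, like SET LOCAL.
        # Passing the tenant id as a parameter keeps it out of the SQL text.
        cur.execute("SELECT set_config('app.tenant_id', %s, true)", (str(tenant_id),))
        cur.execute(sql, params)
        return cur.fetchall()


class _RecordingCursor:
    """Offline stand-in for a cursor: records statements, returns no rows."""

    def __init__(self, log):
        self.log = log

    def execute(self, sql, params=()):
        self.log.append((sql, params))

    def fetchall(self):
        return []


class _RecordingConnection:
    """Offline stand-in for a connection so the sketch runs without PostgreSQL."""

    def __init__(self):
        self.log = []

    def cursor(self):
        return _RecordingCursor(self.log)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False


conn = _RecordingConnection()
finance = uuid.UUID("11111111-1111-1111-1111-111111111111")
rows = run_tenant_query(
    conn, finance, "SELECT * FROM documents WHERE title LIKE %s", ("%contract%",)
)
for statement, params in conn.log:
    print(statement, params)
```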

### Cost vs. Security Trade-off

- **Cost:** ₹5L/month for 50 tenants (₹10K per tenant)
- **Isolation:** 99.9% (depends on policy correctness)
- **Risk:** Policy bugs or database vulnerabilities can cause leaks (e.g., CVE-2022-1552)
- **When to use:** Low-risk data, cost-sensitive, < 50 tenants

In [None]:
from l3_m11_multi_tenant_foundations import PostgresRLSManager

# Initialize RLS manager (offline mode - no actual PostgreSQL)
postgres_mgr = PostgresRLSManager(connection_pool=None)

# Example: Query documents with RLS enforcement
tenant_finance = uuid.UUID('11111111-1111-1111-1111-111111111111')

if OFFLINE or not POSTGRES_ENABLED:
    print("⚠️ PostgreSQL not available - demonstrating logic")
    print("   In production with PostgreSQL enabled:")
    print(f"   1. Set session: SET LOCAL app.tenant_id = '{tenant_finance}'")
    print("   2. Query: SELECT * FROM documents WHERE title LIKE '%merger%'")
    print("   3. RLS policy automatically filters to tenant_id match")
    print("   4. Returns ONLY Finance's documents (99.9% isolation)")
else:
    # Actual query with RLS (requires PostgreSQL)
    docs = postgres_mgr.query_documents(tenant_finance, "merger")
    print(f"Found {len(docs)} documents for Finance tenant")
    for doc in docs:
        print(f"  - {doc['title']} (tenant: {doc['tenant_id']})")

# Expected: Offline message or actual query results (tenant-filtered)

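### Real Failure: CVE-2022-1552 (Privilege Escalation That Defeats RLS)

**The Bug:** PostgreSQL releases before 14.3 (and before 13.7, 12.11, 11.16, and 10.21 on older branches) did not fully sandbox maintenance operations such as autovacuum, REINDEX, CREATE INDEX, REFRESH MATERIALIZED VIEW, and CLUSTER. A user allowed to create objects in any schema could get arbitrary SQL functions executed under a superuser identity.

**Impact:** Superusers are not subject to RLS policies, so an attacker who escalated this way could read every tenant's rows, no matter how correct the policies were.

**Fix:** Patched in PostgreSQL 14.3 and the matching minor releases of versions 10 through 13.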

**Lesson:** Even battle-tested features have bugs. That 0.1% isolation gap is why you need defense-in-depth.

## Section 4: Strategy 2 - Namespace-Based Isolation (Pinecone)

### How Namespace Isolation Works

**Concept:** Give each tenant its own Pinecone namespace. Every query MUST specify the namespace, and the vector store scopes the search to it.

**Pattern:**
```python
# Construct namespace
namespace = f'tenant-{tenant_id}'  # e.g., 'tenant-11111111-...'

# Query specific namespace ONLY
results = index.query(
    vector=query_embedding,
    namespace=namespace,  # Enforced isolation
    top_k=5
)
# Returns ONLY vectors from this tenant's namespace
```

### Cost vs. Security Trade-off

- **Cost:** ₹15L/month for 50 tenants (₹30K per tenant)
- **Isolation:** 99.95% (stronger than RLS, namespace enforced by vector store)
- **Risk:** Namespace typos cause silent failures (see real incident below)
- **When to use:** Standard GCC RAG, balance cost vs security, 10-100 tenants

In [None]:
from l3_m11_multi_tenant_foundations import PineconeNamespaceManager

# Initialize namespace manager (offline mode)
pinecone_mgr = PineconeNamespaceManager(pinecone_client=None, index_name="multi-tenant-rag")

# Example: Validate namespace
tenant_legal = uuid.UUID('22222222-2222-2222-2222-222222222222')
namespace = pinecone_mgr.validate_namespace(tenant_legal)
print(f"✓ Validated namespace: {namespace}")
print(f"   Format: tenant-{tenant_legal}")
print(f"   This namespace is isolated from all other tenants")

# Mock query vector (1536-dim for OpenAI embeddings)
query_vector = [0.1] * 1536

if OFFLINE or not PINECONE_ENABLED:
    print("\n⚠️ Pinecone not available - demonstrating logic")
    print("   In production with Pinecone enabled:")
    print(f"   1. Validate namespace: '{namespace}'")
    print("   2. Query: index.query(vector=..., namespace=namespace, top_k=5)")
    print("   3. Pinecone enforces isolation - cannot access other namespaces")
    print("   4. Returns ONLY Legal's vectors (99.95% isolation)")
else:
    # Actual query (requires Pinecone)
    results = pinecone_mgr.query_vectors(tenant_legal, query_vector, top_k=5)
    print(f"\nFound {len(results['matches'])} matches in namespace {results['namespace']}")

# Expected: Validated namespace and offline message or actual results

### Real Failure: Namespace Typo (2024 GCC Incident)

**The Bug:**
```python
# Developer accidentally added trailing dash
namespace = f'tenant-{tenant_id}-'  # WRONG!
# Should be: f'tenant-{tenant_id}'
```

**Result:**
- All queries returned 0 results
- Namespace `tenant-uuid-` doesn't exist
- Pinecone returns empty (not error)
- Tenants thought system was broken

**Impact:** 2-hour outage, ₹5L revenue loss (SLA credits)

**Fix:** Always validate namespace format BEFORE query

**Lesson:** String manipulation is error-prone. Validate before using.
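
A minimal sketch of that validation step follows. It is not the code behind `validate_namespace`; `build_namespace` and `check_namespace` are hypothetical helpers. The idea is to build the namespace only from a parsed UUID and to check any namespace string against the exact expected format before it reaches the vector store.

```python
import re
import uuid

# Exactly "tenant-" followed by a lowercase canonical UUID, nothing before or after.
_NAMESPACE_RE = re.compile(
    r"tenant-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def build_namespace(tenant_id) -> str:
    """Build a namespace from a UUID (or UUID string); raises ValueError if it is not one."""
    tenant_uuid = tenant_id if isinstance(tenant_id, uuid.UUID) else uuid.UUID(str(tenant_id))
    return f"tenant-{tenant_uuid}"


def check_namespace(namespace: str) -> str:
    """Reject any namespace string that does not match the expected format."""
    if not _NAMESPACE_RE.fullmatch(namespace):
        raise ValueError(f"Malformed namespace: {namespace!r}")
    return namespace


legal = uuid.UUID("22222222-2222-2222-2222-222222222222")
namespace = check_namespace(build_namespace(legal))
print(namespace)

try:
    check_namespace(f"tenant-{legal}-")  # the trailing-dash typo from the incident
except ValueError as err:
    print(f"Blocked before query: {err}")
```

Because the typo is caught as an exception, the failure is loud instead of an empty result set.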

In [None]:
# Test: What happens with invalid namespace?
try:
    # This should FAIL validation
    invalid_namespace = pinecone_mgr.validate_namespace("not-a-uuid")
except (ValueError, TypeError) as e:
    print(f"✓ Namespace validation caught error: {e}")
    print("   → Invalid namespaces are blocked before query")

# Expected: ValueError - namespace validation failed

## Section 5: Strategy 3 - Separate Database Per Tenant

### How Physical Isolation Works

**Concept:** Each tenant gets their own PostgreSQL database. Complete separation - no shared resources.

**Pattern:**
```python
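from psycopg2 import pool

# Tenant provisioning (via Terraform in production)
# str() is needed because uuid.UUID objects have no .replace() method
tenant_db = f'tenant_{str(tenant_id).replace("-", "_")}'

# Create dedicated connection pool for this tenant
connection_pool = pool.SimpleConnectionPool(
    1, 10,  # minconn, maxconn
    host=f'tenant-db-{tenant_id}.rds.amazonaws.com',
    database=tenant_db
)

# Query tenant's database (no tenant_id filter needed)
conn = connection_pool.getconn()
try:
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM documents WHERE title LIKE %s", ('%contract%',))
        rows = cur.fetchall()
finally:
    connection_pool.putconn(conn)
# Returns this tenant's data only - other tenants live in different databases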
```

### Cost vs. Security Trade-off

- **Cost:** ₹50L/month for 50 tenants (₹1L per tenant)
- **Isolation:** 99.999% (only hardware failure can cause leak)
- **Risk:** Operational complexity (50 databases to manage)
- **When to use:** High-value data (₹10Cr+ breach cost), HIPAA/PCI-DSS, < 10 high-security tenants

In [None]:
from l3_m11_multi_tenant_foundations import SeparateDatabaseManager

# Initialize separate database manager
sep_db_mgr = SeparateDatabaseManager()

tenant_hr = uuid.UUID('33333333-3333-3333-3333-333333333333')

print("Separate Database Strategy (Physical Isolation)")
print("=" * 50)
print(f"Tenant: HR Department ({tenant_hr})")
print(f"Database: tenant_{str(tenant_hr).replace('-', '_')}")
print(f"Endpoint: tenant-db-{tenant_hr}.rds.amazonaws.com")
print("\nIsolation: 99.999%")
print("Cost: ₹1L/month per tenant")
print("\nAdvantages:")
print("  ✓ Complete physical separation")
print("  ✓ No RLS policy bugs possible")
print("  ✓ No namespace typos possible")
print("  ✓ Blast radius contained (one tenant's issue doesn't affect others)")
print("\nDisadvantages:")
print("  ✗ ~3.3× more expensive than namespace isolation (₹1L vs ₹30K per tenant)")
print("  ✗ Operational complexity (50 databases to manage)")
print("  ✗ Slow tenant onboarding (10-15 min to provision)")
print("  ✗ Requires Terraform/IaC expertise")

# Expected: Description of separate database strategy

### Hybrid Approach (Most Common in Production)

Most GCCs use **hybrid isolation** based on tenant risk:

- **High-risk tenants (5):** Finance, Legal → Separate Database (₹5L/month)
- **Standard tenants (45):** HR, Marketing, IT → Namespace Isolation (₹13.5L/month)
- **Total cost:** ₹18.5L/month vs. ₹50L for all separate DBs

**Savings:** ₹31.5L/month (63% cost reduction) while maintaining highest security for critical data.
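
The arithmetic behind these figures can be checked in a few lines. The per-tenant rates are the ones quoted earlier in this notebook (₹1L for a separate database, ₹30K for namespace isolation); amounts are kept in thousands of rupees so the sums stay exact.

```python
# Per-tenant monthly cost in thousands of rupees (₹1L = 100K)
SEPARATE_DB_K = 100
NAMESPACE_K = 30

high_risk_tenants = 5
standard_tenants = 45
total_tenants = high_risk_tenants + standard_tenants

hybrid_k = high_risk_tenants * SEPARATE_DB_K + standard_tenants * NAMESPACE_K
all_separate_k = total_tenants * SEPARATE_DB_K
savings_k = all_separate_k - hybrid_k
savings_pct = savings_k / all_separate_k * 100

print(f"Hybrid:       ₹{hybrid_k / 100:.1f}L/month")
print(f"All separate: ₹{all_separate_k / 100:.1f}L/month")
print(f"Savings:      ₹{savings_k / 100:.1f}L/month ({savings_pct:.0f}%)")
```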

## Section 6: Defense-in-Depth (Multiple Layers)

### Why Single-Layer Isolation Fails

**Scenario:** You trust RLS policies to prevent cross-tenant access.

**What breaks:**
1. Developer forgets to set tenant context → Query returns 0 results (or error)
2. Database engine has a vulnerability (e.g., CVE-2022-1552) → Attacker bypasses RLS
3. Admin user has BYPASSRLS privilege → Accidentally sees all tenants' data
4. Application has SQL injection → Attacker manipulates session variables

### Defense-in-Depth Pattern

**Layer 1: Application (Tenant Context)**
- Verify tenant_id from JWT (not user input)
- Set tenant context before EVERY query
- Audit log all access attempts

**Layer 2: Database (RLS Policies)**
- Enforce tenant_id filtering at database level
- No admin exceptions (no BYPASSRLS)
- Continuous testing (1,000+ adversarial queries)

**Layer 3: Network (VPC Isolation)**
- Separate VPCs for high-security tenants
- Firewall rules blocking cross-tenant traffic
- Intrusion detection for anomalous access

**Layer 4: Monitoring (Alerting)**
- Detect cross-tenant queries (user accessing multiple tenant_ids)
- Alert on RLS policy changes
- Weekly security audits

**Result:** If one layer fails, others still protect. No single point of failure.
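
As one illustration of Layer 4, the "user accessing multiple tenant_ids" check could be sketched as a pass over access events. The event shape (`user_id`, `tenant_id`) follows the audit fields used in this notebook; the function name and the sample events are made up for the example.

```python
from collections import defaultdict


def users_spanning_tenants(events):
    """Return {user_id: [tenant_ids]} for users seen under more than one tenant."""
    tenants_by_user = defaultdict(set)
    for event in events:
        tenants_by_user[event["user_id"]].add(event["tenant_id"])
    return {
        user: sorted(tenants)
        for user, tenants in tenants_by_user.items()
        if len(tenants) > 1
    }


# Made-up sample events for illustration
events = [
    {"user_id": "finance_user_123", "tenant_id": "finance"},
    {"user_id": "finance_user_123", "tenant_id": "legal"},
    {"user_id": "legal_user_456", "tenant_id": "legal"},
    {"user_id": "legal_user_456", "tenant_id": "legal"},
]
suspicious = users_spanning_tenants(events)
print(suspicious)
```

A hit is a signal to investigate, not proof of a leak: some users may legitimately belong to more than one tenant.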

## Section 7: Cross-Tenant Security Testing

### Automated Leak Detection

**Goal:** Run 1,000+ adversarial queries to validate isolation BEFORE production deployment.

**Test Patterns:**
1. Query as Tenant A, verify results only contain Tenant A's data
2. Query with wrong namespace, verify returns error or 0 results
3. Query without tenant context, verify raises security error
4. Attempt SQL injection in search pattern, verify sanitized
5. Admin user query, verify audit log captures access

**Frequency:** Run daily (automated via CI/CD)
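
Test pattern 1 can be sketched as a small harness. This is not the `CrossTenantSecurityTests` implementation; `find_leaks` and `LeakyManager` are hypothetical, and the manager is deliberately broken so that the failing case is visible.

```python
import uuid


def find_leaks(manager, tenants, pattern="%"):
    """Query as each tenant and collect every row owned by a different tenant."""
    failures = []
    for tenant in tenants:
        for row in manager.query_documents(tenant, pattern):
            if str(row["tenant_id"]) != str(tenant):
                failures.append({
                    "tenant": str(tenant),
                    "leaked_from": str(row["tenant_id"]),
                    "doc_id": row["id"],
                })
    return failures


class LeakyManager:
    """Deliberately broken stand-in: ignores the tenant and returns every row."""

    def __init__(self, rows):
        self.rows = rows

    def query_documents(self, tenant_id, pattern):
        return list(self.rows)


finance = uuid.UUID("11111111-1111-1111-1111-111111111111")
legal = uuid.UUID("22222222-2222-2222-2222-222222222222")
rows = [
    {"id": "doc_fin", "tenant_id": str(finance)},
    {"id": "doc_legal", "tenant_id": str(legal)},
]

failures = find_leaks(LeakyManager(rows), [finance, legal])
for failure in failures:
    print(f"LEAK: tenant {failure['tenant'][:8]} saw {failure['doc_id']} "
          f"from {failure['leaked_from'][:8]}")
```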

In [None]:
from l3_m11_multi_tenant_foundations import CrossTenantSecurityTests

# Generate test tenant IDs
test_tenants = [
    uuid.UUID('11111111-1111-1111-1111-111111111111'),  # Finance
    uuid.UUID('22222222-2222-2222-2222-222222222222'),  # Legal
    uuid.UUID('33333333-3333-3333-3333-333333333333'),  # HR
    uuid.UUID('44444444-4444-4444-4444-444444444444'),  # Marketing
    uuid.UUID('55555555-5555-5555-5555-555555555555'),  # IT
]

# Mock isolation manager for testing
class MockIsolationManager:
    """Mock manager that returns correct tenant-scoped data."""
    def query_documents(self, tenant_id, pattern):
        # Simulate correct isolation (returns data for querying tenant only)
        return [
            {"id": f"doc_{i}", "title": f"Document {i}", "tenant_id": str(tenant_id)}
            for i in range(3)
        ]

mock_mgr = MockIsolationManager()
security_tester = CrossTenantSecurityTests(mock_mgr)

print("Running Cross-Tenant Security Tests...")
print("=" * 50)
print(f"Testing {len(test_tenants)} tenants")
print("Pattern: Query as Tenant A, verify no Tenant B data returned\n")

# Run adversarial tests
results = security_tester.run_adversarial_tests(test_tenants)

print(f"\nTest Results:")
print(f"  Total tests: {results['total_tests']}")
print(f"  Passed: {results['passed']} ✓")
print(f"  Failed: {results['failed']} ✗")
print(f"  Isolation effective: {results['failed'] == 0}")

if results['failures']:
    print(f"\n⚠️ SECURITY BREACH DETECTED:")
    for failure in results['failures'][:3]:  # Show first 3
        print(f"   - {failure['test']}: Tenant {failure['tenant']} leaked data from {failure['leaked_from']}")
else:
    print("\n✓ All security tests passed - isolation is effective")

# Expected: All tests passed (mock manager returns correct data)

### Production Security Test Schedule

**Daily (Automated):**
- Run 1,000+ adversarial queries
- Verify RLS policies still active
- Check namespace validation logic
- Alert on any failures

**Weekly (Manual):**
- Review audit logs for anomalies
- Penetration testing (security team)
- Update test scenarios with new attack patterns

**After Every Deployment:**
- Re-run all security tests
- Block deployment if any test fails
- Require security sign-off

## Section 8: Audit Logging & Incident Response

### Why Audit Logs Matter

**Scenario:** Security test detects cross-tenant leak in production.

**Questions to answer:**
1. Who accessed the leaked data?
2. When did the leak start?
3. What data was exposed?
4. How many tenants affected?

**Without audit logs:** You can't answer these questions. Legal and compliance nightmare.

**With audit logs:** You know exactly what happened, who to notify, and how to fix.
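
To make the four questions concrete, here is a sketch that answers them from a list of log entries. It assumes each entry also records a `data_owner` field (the tenant that owns the returned data); that field is hypothetical and not part of the `AuditLogger` used below. The sample entries are invented, with timestamps as ISO 8601 strings in one timezone so they sort correctly as text.

```python
def summarize_leak(entries):
    """Answer who / when / what / how many from audit entries, or None if no leak."""
    # A leak entry is one where the caller's tenant differs from the data's owner
    leaks = [e for e in entries if e["data_owner"] != e["tenant_id"]]
    if not leaks:
        return None
    return {
        "who": sorted({e["user_id"] for e in leaks}),
        "started": min(e["timestamp"] for e in leaks),
        "resources": sorted({e["resource"] for e in leaks}),
        "affected_tenants": sorted({e["data_owner"] for e in leaks}),
    }


# Invented sample entries for illustration
entries = [
    {"timestamp": "2025-01-10T02:41:00+00:00", "tenant_id": "finance",
     "user_id": "finance_user_123", "resource": "vectors", "data_owner": "finance"},
    {"timestamp": "2025-01-10T02:43:00+00:00", "tenant_id": "finance",
     "user_id": "finance_user_123", "resource": "vectors", "data_owner": "legal"},
    {"timestamp": "2025-01-10T02:45:00+00:00", "tenant_id": "finance",
     "user_id": "finance_user_123", "resource": "vectors", "data_owner": "legal"},
]
summary = summarize_leak(entries)
print(summary)
```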

In [None]:
from l3_m11_multi_tenant_foundations import AuditLogger

# Initialize audit logger
audit_logger = AuditLogger()

# Simulate data access events
print("Logging tenant data access events...")
print("=" * 50)

# Finance queries documents
audit_logger.log_access(
    tenant_id=uuid.UUID('11111111-1111-1111-1111-111111111111'),
    user_id="finance_user_123",
    action="query_rls",
    resource="documents",
    result="success"
)
print("✓ Logged: Finance user queried documents (success)")

# Legal queries vectors
audit_logger.log_access(
    tenant_id=uuid.UUID('22222222-2222-2222-2222-222222222222'),
    user_id="legal_user_456",
    action="query_namespace",
    resource="vectors",
    result="success"
)
print("✓ Logged: Legal user queried vectors (success)")

# HR attempts cross-tenant access (BLOCKED)
audit_logger.log_access(
    tenant_id=uuid.UUID('33333333-3333-3333-3333-333333333333'),
    user_id="hr_user_789",
    action="query_rls",
    resource="documents",
    result="blocked - cross-tenant attempt"
)
print("✓ Logged: HR user attempted cross-tenant access (BLOCKED)")

# Get all audit logs
print("\nAudit Trail:")
all_logs = audit_logger.get_audit_trail()
for log in all_logs:
    print(f"  [{log['timestamp']}] Tenant {log['tenant_id'][:8]}...")
    print(f"    User: {log['user_id']} | Action: {log['action']} | Result: {log['result']}")

# Get logs for specific tenant (incident investigation)
hr_tenant = uuid.UUID('33333333-3333-3333-3333-333333333333')
hr_logs = audit_logger.get_audit_trail(hr_tenant)
print(f"\nHR Tenant Activity ({hr_tenant}):")
print(f"  {len(hr_logs)} events logged")
for log in hr_logs:
    print(f"    - {log['action']}: {log['result']}")

# Expected: 3 audit log entries, with HR's blocked attempt highlighted

### Incident Response Playbook

**Step 1: Detection (0-5 minutes)**
- Security test fails OR monitoring alerts on cross-tenant query
- Page on-call engineer immediately

**Step 2: Investigation (5-30 minutes)**
- Query audit logs for affected tenants
- Identify: Who accessed what, when, how much data exposed
- Determine root cause (RLS bug, namespace typo, missing context, etc.)

**Step 3: Containment (30-60 minutes)**
- Disable affected endpoint OR roll back deployment
- Verify no ongoing leaks (re-run security tests)
- Preserve logs for investigation

**Step 4: Notification (1-4 hours)**
- Notify affected tenants (be transparent, show audit logs)
- Notify legal/compliance (incident report)
- Notify CISO (breach disclosure timeline)

**Step 5: Remediation (4-24 hours)**
- Fix root cause (patch RLS policy, fix namespace validation, etc.)
- Deploy fix with security tests
- Re-enable endpoint

**Step 6: Post-Mortem (1-2 weeks)**
- Write incident report (root cause, timeline, impact, lessons)
- Add test case for this failure pattern
- Update security testing to catch similar issues

## Section 9: Redis Cache Isolation & S3 Prefix Isolation

### Redis Cache Isolation (Key Prefixing)

**Pattern:** Prefix every cache key with tenant UUID.

```python
# Construct tenant-scoped key
cache_key = f'tenant:{tenant_id}:user_session'

# Set cache
redis.setex(cache_key, 3600, session_data)

# Get cache (automatically scoped to tenant)
session_data = redis.get(cache_key)
```

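**Security:** Key prefixing is a naming convention, not an access control. Isolation holds only if every cache access builds its key from the verified tenant context. To enforce it in Redis itself, give each tenant's credentials an ACL key pattern (Redis 6+) such as `~tenant:<uuid>:*`.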
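
One way to keep key construction out of callers' hands is a wrapper bound to a single tenant, sketched below. `TenantScopedCache` is a hypothetical class, and a plain dict stands in for Redis so the cell runs offline; with a real client the two methods would call `setex` and `get` instead.

```python
import uuid


class TenantScopedCache:
    """Cache view bound to one tenant: callers pass bare keys, never prefixes."""

    def __init__(self, backend, tenant_id: uuid.UUID):
        if not isinstance(tenant_id, uuid.UUID):
            raise TypeError("tenant_id must be a uuid.UUID")
        self._backend = backend  # dict stand-in for a Redis client
        self._prefix = f"tenant:{tenant_id}:"

    def _key(self, key: str) -> str:
        return self._prefix + key

    def set(self, key: str, value):
        self._backend[self._key(key)] = value

    def get(self, key: str):
        return self._backend.get(self._key(key))


backend = {}
marketing = TenantScopedCache(backend, uuid.UUID("44444444-4444-4444-4444-444444444444"))
legal = TenantScopedCache(backend, uuid.UUID("22222222-2222-2222-2222-222222222222"))

marketing.set("user_session", "session_data_xyz")
print(marketing.get("user_session"))  # Marketing reads its own entry
print(legal.get("user_session"))      # Legal's view of the same bare key
print(list(backend))
```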

In [None]:
from l3_m11_multi_tenant_foundations import RedisIsolationManager

# Initialize Redis manager (offline mode)
redis_mgr = RedisIsolationManager(redis_client=None)

tenant_marketing = uuid.UUID('44444444-4444-4444-4444-444444444444')

# Demonstrate key prefixing
cache_key = redis_mgr.get_tenant_key(tenant_marketing, "user_session_abc123")
print(f"Tenant-scoped cache key: {cache_key}")
print(f"Format: tenant:{tenant_marketing}:user_session_abc123")
print("\nSecurity: Other tenants cannot access this key (prefix mismatch)")

if OFFLINE or not REDIS_ENABLED:
    print("\n⚠️ Redis not available - demonstrating logic")
    print("   In production with Redis enabled:")
    print("   1. Construct key: f'tenant:{uuid}:{key}'")
    print("   2. Set: redis.setex(key, ttl, value)")
    print("   3. Get: redis.get(key)")
    print("   4. Isolation: Prefix ensures tenant-scoped access")
else:
    # Actual Redis operations
    redis_mgr.set_cache(tenant_marketing, "user_session", "session_data_xyz", ttl=3600)
    value = redis_mgr.get_cache(tenant_marketing, "user_session")
    print(f"\nCached value: {value}")

# Expected: Demonstrated tenant-scoped key construction

### S3 Prefix Isolation

**Pattern:** Store documents under tenant-specific S3 prefix.

```python
# Construct S3 key with tenant prefix
s3_key = f'tenants/{tenant_id}/{document_id}/filename.pdf'

# Upload with tenant prefix
s3.put_object(Bucket='multi-tenant-docs', Key=s3_key, Body=content)

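# IAM policy statement that enforces prefix isolation (JSON, shown here as a dict).
# ${aws:PrincipalTag/tenant_id} is an IAM policy variable; this assumes each
# tenant's role or session carries a tenant_id principal tag.
iam_statement = {
  "Effect": "Allow",
  "Action": "s3:GetObject",
  "Resource": "arn:aws:s3:::multi-tenant-docs/tenants/${aws:PrincipalTag/tenant_id}/*"
}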
```

**Security:** IAM policy blocks access to other tenants' prefixes at AWS level.
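
The key layout and a per-tenant read policy could be generated as in the sketch below. `tenant_s3_key` and `tenant_read_policy` are hypothetical helpers, not part of `S3PrefixIsolationManager`, and nothing here calls AWS. The policy document follows the standard IAM JSON structure (`Version` plus a `Statement` list).

```python
import json
import uuid


def tenant_s3_key(tenant_id: uuid.UUID, document_id: str, filename: str) -> str:
    """Build an object key under the tenant's prefix, rejecting odd path parts."""
    if not isinstance(tenant_id, uuid.UUID):
        raise TypeError("tenant_id must be a uuid.UUID")
    for part in (document_id, filename):
        # A slash would change the key's depth; empty or dot parts are never valid
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"Invalid key component: {part!r}")
    return f"tenants/{tenant_id}/{document_id}/{filename}"


def tenant_read_policy(bucket: str, tenant_id: uuid.UUID) -> dict:
    """IAM policy document allowing reads only under this tenant's prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/tenants/{tenant_id}/*",
        }],
    }


tenant_it = uuid.UUID("55555555-5555-5555-5555-555555555555")
key = tenant_s3_key(tenant_it, "doc123", "architecture.pdf")
policy = tenant_read_policy("multi-tenant-docs", tenant_it)
print(key)
print(json.dumps(policy, indent=2))
```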

In [None]:
from l3_m11_multi_tenant_foundations import S3PrefixIsolationManager

# Initialize S3 manager (offline mode)
s3_mgr = S3PrefixIsolationManager(s3_client=None, bucket="multi-tenant-docs")

tenant_it = uuid.UUID('55555555-5555-5555-5555-555555555555')

# Demonstrate prefix construction
prefix = s3_mgr.get_tenant_prefix(tenant_it)
print(f"Tenant S3 prefix: {prefix}")
print(f"Format: tenants/{tenant_it}/")
print("\nExample S3 keys under this prefix:")
print(f"  - {prefix}doc123/architecture.pdf")
print(f"  - {prefix}doc456/security_plan.docx")
print(f"  - {prefix}doc789/backup_procedures.txt")
print("\nSecurity: IAM policy blocks access to other tenants' prefixes")

if OFFLINE or not AWS_ENABLED:
    print("\n⚠️ AWS S3 not available - demonstrating logic")
    print("   In production with AWS enabled:")
    print("   1. Construct key: f'tenants/{uuid}/{doc_id}/filename'")
    print("   2. Upload: s3.put_object(Bucket=..., Key=key, Body=content)")
    print("   3. IAM policy enforces prefix isolation at AWS level")
    print("   4. Isolation: Tenant cannot access other prefixes")

# Expected: Demonstrated tenant-scoped S3 prefix construction

## Section 10: Decision Card - Choosing the Right Strategy

### Decision Flowchart

**Step 1: What's the cost of a data leak?**

- **> ₹10Cr** (legal, healthcare, financial trading) → **Separate Database** (99.999% isolation)
- **₹1-10Cr** (general enterprise data) → **Namespace Isolation** (99.95% isolation)
- **< ₹1Cr** (low-sensitivity data) → **Row-Level Security** (99.9% isolation)

**Step 2: Does compliance require physical separation?**

- **YES** (HIPAA, PCI-DSS Level 1, defense contracts) → **Separate Database or Cluster-per-Tenant**
- **NO** (GDPR, SOX, general compliance) → **Namespace Isolation** sufficient

**Step 3: How many tenants?**

- **< 10 tenants** → **Separate Database** (manageable operational complexity)
- **10-100 tenants** → **Namespace Isolation** (best balance)
- **> 100 tenants** → **Hybrid Approach** (tier by risk)

**Step 4: What's your team's expertise?**

- **Senior PostgreSQL DBAs** → **RLS** (can write correct policies)
- **Mid-level engineers, strong on APIs** → **Namespace Isolation** (simpler conceptually)
- **Junior team or small team** → **Separate Database** (least risk of misconfiguration)
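
Steps 1 and 2 of the flowchart can be written as a small function. This sketch encodes only those two steps; letting a physical-separation requirement override the cost tiers is an assumption of the example, and `recommend_strategy` is a hypothetical name. Steps 3 and 4 still need human judgment.

```python
def recommend_strategy(leak_cost_crore: float, needs_physical_separation: bool) -> str:
    """Suggest a baseline isolation strategy from leak cost and compliance needs."""
    # Step 2 takes precedence in this sketch: compliance is not negotiable
    if needs_physical_separation or leak_cost_crore > 10:
        return "separate_database"
    if leak_cost_crore >= 1:
        return "namespace_isolation"
    return "row_level_security"


for cost, physical in [(15, False), (5, False), (0.5, False), (0.5, True)]:
    print(f"leak cost ₹{cost}Cr, physical separation={physical}: "
          f"{recommend_strategy(cost, physical)}")
```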

---

### Example Deployments

**Small GCC (10 business units, 500 users, 5M docs):**
- Strategy: RLS
- Monthly: ₹8L
- Per tenant: ₹80K/month
- Isolation: 99.9%

**Medium GCC (50 business units, 5K users, 50M docs):**
- Strategy: Namespace Isolation
- Monthly: ₹18L
- Per tenant: ₹36K/month
- Isolation: 99.95%

**Large GCC (100 business units, 20K users, 200M docs):**
- Strategy: Hybrid (5 separate DB, 95 namespace)
- Monthly: ₹35L
- Per tenant: ₹35K/month
- Isolation: 99.999% for high-risk, 99.95% for standard

---

### When to Upgrade

**From RLS to Namespace:**
- Security incident (leak detected in testing)
- Compliance requirement changed
- Performance degradation (> 50 tenants)
- Cost: ₹10L one-time migration

**From Namespace to Separate DB:**
- High-value tenant demands physical separation
- Regulatory requirement (HIPAA, PCI-DSS)
- Noisy neighbor problem
- Cost: ₹5L one-time migration per tenant

**Key Insight:** Start with namespace isolation for most GCCs. Upgrade specific high-risk tenants to separate DB as needed.

## Section 11: Common Failures & Fixes

### Failure 1: Forgot to Set Tenant Context

**Code:**
```python
# BUG: No set_tenant_context() called
results = db.query("SELECT * FROM documents WHERE title LIKE '%contract%'")
```

**Impact:** User complains "System shows no documents." Support escalates.

**Fix:**
```python
@require_tenant_context
def query_documents(tenant_id, pattern):
    # Decorator verifies tenant context set before allowing query
    ...
```

---

### Failure 2: RLS Policy Had Exception Clause

**Code:**
```sql
CREATE POLICY tenant_isolation ON documents
USING (
    tenant_id = current_setting('app.tenant_id')::uuid
    OR current_user = 'admin'  -- Exception for admin!
);
```

**Impact:** Admin user accidentally queries all tenants when debugging.

**Fix:** Remove exception - NO users bypass RLS. Create audited admin view instead.

---

### Failure 3: Namespace Constructed from User Input

**Code:**
```python
user_provided_tenant = request.get('tenant_id')  # From URL param!
namespace = f'tenant-{user_provided_tenant}'
```

**Impact:** Attacker changes URL: `?tenant_id=legal` and sees Legal's documents.

**Fix:**
```python
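# Get tenant_id from a verified JWT (not user input).
# Assuming PyJWT: pass the verification key and an explicit algorithm allow-list.
# JWT_PUBLIC_KEY is a placeholder for your identity provider's key.
tenant_id = jwt.decode(token, JWT_PUBLIC_KEY, algorithms=['RS256'])['tenant_id']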
tenant = registry.get_tenant(tenant_id)
if not tenant:
    raise Unauthorized()
```

---

### Failure 4: Performance Degradation (50+ Tenants)

**Symptom:** Query latency increases:
- 10 tenants: 50ms
- 50 tenants: 150ms
- 100 tenants: 400ms

**Why:** Index size growth, connection pool contention, cache thrashing.

**Fix:** Shard by tenant groups:
- DB1: Tenants 1-25
- DB2: Tenants 26-50
- DB3: Tenants 51-75
- DB4: Tenants 76-100

Each DB now holds 25 tenants, and latency drops to roughly 80ms.
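
The range-based assignment above reduces to integer division. The sketch below assumes tenants are numbered from 1 in onboarding order; `shard_for` is a hypothetical helper, and a production router would more likely look the shard up in the tenant registry.

```python
def shard_for(tenant_number: int, tenants_per_shard: int = 25) -> str:
    """Map a 1-based tenant number to its shard name (DB1, DB2, ...)."""
    if tenant_number < 1:
        raise ValueError("tenant_number must be 1 or greater")
    return f"DB{(tenant_number - 1) // tenants_per_shard + 1}"


for n in (1, 25, 26, 50, 51, 75, 76, 100):
    print(f"Tenant {n:>3} -> {shard_for(n)}")
```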

---

### Failure 5: Single Encryption Key for All Tenants

**Code:**
```python
KEY = os.environ['ENCRYPTION_KEY']
encrypted_doc = encrypt(document, KEY)
```

**Impact:** Key compromised → ALL tenants' data decryptable.

**Fix:** Tenant-specific encryption keys:
```python
tenant_key = get_tenant_encryption_key(tenant_id)  # e.g., backed by a per-tenant AWS KMS key
encrypted_doc = encrypt(document, tenant_key)
```

**Cost:** ₹200/month per tenant (AWS KMS key cost). Worth it for breach containment.

## Section 12: Summary & Next Steps

### What You've Learned

✅ **Three isolation strategies:**
- PostgreSQL RLS (99.9%, ₹5L/month, cost-efficient)
- Pinecone Namespace (99.95%, ₹15L/month, balanced)
- Separate Database (99.999%, ₹50L/month, highest security)

✅ **Defense-in-depth:** Multiple layers protect even if one fails

✅ **Security testing:** 1,000+ adversarial queries to catch leaks before production

✅ **Audit logging:** Track every access for incident investigation

✅ **Real failures:** CVE-2022-1552, namespace typos, performance degradation

✅ **Decision criteria:** Choose strategy based on cost, compliance, scale, expertise

---

### Key Takeaways

**1. No isolation strategy is 100% reliable.**
- RLS: 99.9% (policy bugs possible)
- Namespace: 99.95% (typos cause failures)
- Separate DB: 99.999% (hardware failure only)

**That's why you need:**
- Multiple layers (defense-in-depth)
- Continuous testing (daily security tests)
- Audit logging (detect breaches fast)
- Incident response (know what to do when leak happens)

**2. Never trust user input for tenant_id.**
- Always extract from verified JWT
- Validate tenant exists in registry
- Set tenant context before EVERY query

**3. Most GCCs use hybrid approach.**
- High-risk tenants (Finance, Legal): Separate DB
- Standard tenants (HR, Marketing, IT): Namespace
- Cost savings: 63% vs. all separate DBs

**4. Performance matters at scale.**
- RLS slows down at > 50 tenants
- Shard by tenant groups (25 per DB)
- Monitor latency and plan for sharding

---

### Next Steps

**1. Set up local development environment:**
- Install PostgreSQL, Redis (local)
- Sign up for Pinecone (free tier)
- Run example code with real services

**2. Implement security tests in your codebase:**
- Copy `CrossTenantSecurityTests` class
- Add to CI/CD pipeline (run on every deployment)
- Block deployment if any test fails

**3. Add audit logging to all queries:**
- Log: tenant_id, user_id, action, resource, result
- Store in centralized logging (CloudWatch, Datadog)
- Set up alerts for cross-tenant queries

**4. Choose isolation strategy for your GCC:**
- Calculate cost of data leak
- Check compliance requirements
- Count tenants (current + 2-year projection)
- Use decision card to choose

**5. Plan for incident response:**
- Write incident response playbook
- Practice incident drills (tabletop exercises)
- Know who to notify (legal, compliance, CISO)

---

### Module M11.4 Preview: Rate Limiting & Resource Quotas

Now you have:
- ✅ Tenant identities (M11.2 Registry)
- ✅ Data isolation (M11.3 Database Security)

Next challenge: **Fair resource usage**
- What if one tenant runs 10,000 queries/sec?
- How do you prevent "noisy neighbor" problem?
- How do you enforce per-tenant quotas (CPU, memory, API calls)?

In M11.4, you'll implement:
- Token bucket rate limiting (per-tenant)
- Resource quotas (CPU, memory, storage)
- Throttling and backpressure
- Fair queueing across tenants

**See you in M11.4!**