# L3 M13.3: Cost Optimization Strategies

## Learning Arc

**Purpose:** Implement cost attribution and optimization for multi-tenant RAG platforms serving enterprise organizations. This module teaches you how to track per-tenant usage, calculate costs with volume discounts, generate CFO-ready invoices, and detect cost anomalies.

**Concepts Covered:**
- Cost Attribution Layers (Direct Costs, Overhead Costs, Allocation Methods)
- Multi-Component Cost Formula (LLM + Storage + Compute + Vector Operations)
- Volume Discount Tiers (0%, 15%, 30%, 40% based on usage)
- Usage Metering with Prometheus/StatsD
- Chargeback vs. Showback models
- Cost Anomaly Detection (>50% month-over-month spikes)
- Platform Economics and ROI calculation
- CFO-ready reporting and invoice generation

**After Completing This Notebook:**
- You will understand the four cost attribution layers and allocation methods
- You can implement usage metering that tracks queries, storage, compute, and vector operations per tenant
- You will build a cost calculation engine that applies the multi-component formula with overhead allocation
- You can generate chargeback reports that produce CFO-ready monthly invoices with cost breakdowns
- You will detect cost anomalies and alert on >50% cost spikes with root cause hints
- You understand when to use cost attribution vs. simpler alternatives (usage caps, flat fees)
- You can validate cost attribution accuracy against actual cloud bills (±10% tolerance)

**Context in Track L3.M13:**
This module builds on M13.1 (caching strategies) and M13.2 (query optimization) by adding financial visibility to your performance optimizations. It prepares you for M13.4 (infrastructure scaling) by quantifying the cost impact of scaling decisions.

In [None]:
# Environment Setup
import os
import sys

# Add src to path for imports
if '../src' not in sys.path:
    sys.path.insert(0, '../src')
if '..' not in sys.path:
    sys.path.insert(0, '..')

# This module does NOT require external AI services
# It uses local processing with optional infrastructure (Prometheus, StatsD, PostgreSQL)
# All functionality works out-of-the-box using in-memory storage

print("✓ Environment configured for L3 M13.3: Cost Optimization Strategies")
print("  → No external AI services required (OpenAI, Anthropic, etc.)")
print("  → Running in in-memory mode (perfect for learning)")
print("  → Optional: Enable Prometheus/StatsD/PostgreSQL in .env for production")

## 1. Cost Attribution Layers

Cost attribution has four layers:

**Layer 1: Direct Costs** (measurable, tenant-specific usage)
- LLM API calls: \$0.002 per 1K tokens (~1 query)
- Storage: \$0.023 per GB/month
- Compute: \$0.05 per pod-hour
- Vector Operations: \$0.0001 per operation

**Layer 2: Overhead Costs** (shared platform expenses)
- Platform team salaries (DevOps, SRE)
- Monitoring tools (Prometheus, Grafana, PagerDuty)
- Shared infrastructure (load balancers, API gateways)
- Typical allocation: 15-25% of direct costs

**Layer 3: Allocation Methods**
- Usage-based: Overhead proportional to usage (our approach)
- Headcount-based: Overhead based on team size
- Revenue-based: Overhead based on business unit revenue

**Layer 4: Billing Models**
- **Chargeback:** Actual billing with budget transfers (real money moves)
- **Showback:** Transparency reporting without billing (visibility only)

**Multi-Component Cost Formula:**
```
Direct Costs + (20% Overhead) - Volume Discounts = Final Cost
```
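
The usage-based allocation method from Layer 3 can be sketched as follows. This is a minimal illustration, not the module's implementation: `allocate_overhead` and the rupee figures are hypothetical, and it assumes each tenant's share of a shared overhead pool equals its share of total direct cost.

```python
def allocate_overhead(direct_costs: dict[str, float], overhead_pool: float) -> dict[str, float]:
    """Split a shared overhead pool across tenants in proportion to direct cost."""
    total_direct = sum(direct_costs.values())
    if total_direct == 0:
        # No usage recorded: split evenly rather than divide by zero.
        even_share = overhead_pool / len(direct_costs) if direct_costs else 0.0
        return {tenant: even_share for tenant in direct_costs}
    return {
        tenant: overhead_pool * cost / total_direct
        for tenant, cost in direct_costs.items()
    }


# Hypothetical direct costs (INR) and an overhead pool equal to 20% of their sum
direct = {"finance": 20_000.0, "legal": 10_000.0, "hr": 2_000.0}
pool = 0.20 * sum(direct.values())
shares = allocate_overhead(direct, pool)
for tenant, share in shares.items():
    print(f"{tenant}: {share:,.2f}")
```

When the pool is sized as a fixed percentage of total direct cost, each tenant's share works out to that same percentage of its own direct cost, which is why the formula above can apply a flat 20% per tenant.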

In [None]:
# Import cost optimization components
from l3_m13_cost_optimization_strategies import (
    TenantUsageMetering,
    CostCalculationEngine,
    ChargebackReportGenerator,
    CostAnomalyDetector,
    UsageMetrics,
    VolumeDiscountTier,
    validate_cost_attribution
)
from datetime import datetime, timedelta

# Display cost constants
print("Cost Constants (as of 2025):")
print(f"  LLM API: ${TenantUsageMetering.LLM_COST_PER_QUERY} per query")
print(f"  Storage: ${TenantUsageMetering.STORAGE_COST_PER_GB} per GB/month")
print(f"  Compute: ${TenantUsageMetering.COMPUTE_COST_PER_POD_HOUR} per pod-hour")
print(f"  Vector Ops: ${TenantUsageMetering.VECTOR_COST_PER_OP} per operation")
print(f"  USD to INR: {TenantUsageMetering.USD_TO_INR}")
print(f"  Overhead Rate: {CostCalculationEngine.DEFAULT_OVERHEAD_RATE*100:.0f}%")

# Expected: Cost constants displayed

## 2. Volume Discount Tiers

Volume discounts incentivize platform usage and reward high-volume tenants:

| Tier | Monthly Queries | Discount |
|------|-----------------|----------|
| 0    | < 10K           | 0%       |
| 1    | 10K - 100K      | 15%      |
| 2    | > 100K - 1M     | 30%      |
| 3    | > 1M            | 40%      |

**Why volume discounts matter:**
- Rewards strategic tenants who drive platform adoption
- Improves platform economics (higher volume → lower cost per query)
- Large tenants subsidize platform development → everyone benefits
- Encourages consolidation (better to have 1 tenant @ 1M queries than 100 @ 10K)
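
A tier lookup for the discount table might look like the sketch below. The function name `volume_discount_rate` is hypothetical, and the boundary handling is an assumption: a tenant sitting exactly on 100K or 1M queries stays in the lower tier, which matches the Finance example later in this notebook (100,000 queries at 15%). The module's `VolumeDiscountTier` enum may draw the boundaries differently.

```python
def volume_discount_rate(monthly_queries: int) -> float:
    """Return the discount rate for a tenant's monthly query count."""
    if monthly_queries < 0:
        raise ValueError("monthly_queries must be non-negative")
    if monthly_queries < 10_000:
        return 0.00  # Tier 0
    if monthly_queries <= 100_000:
        return 0.15  # Tier 1 (assumed to include exactly 100K)
    if monthly_queries <= 1_000_000:
        return 0.30  # Tier 2 (assumed to include exactly 1M)
    return 0.40      # Tier 3


for queries in (5_000, 100_000, 250_000, 2_000_000):
    print(f"{queries:>9,} queries: {volume_discount_rate(queries) * 100:.0f}% discount")
```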

In [None]:
# Display volume discount tiers
print("Volume Discount Tiers:")
for tier in VolumeDiscountTier:
    min_q, max_q, discount = tier.value
    max_display = f"{max_q:,}" if max_q != float('inf') else "∞"
    print(f"  {tier.name}: {min_q:,} - {max_display} queries → {discount*100:.0f}% discount")

# Expected: Tier table displayed

## 3. Usage Metering

Track per-tenant usage with minimal latency overhead (<5ms):

**Metering Events:**
- **Queries:** LLM API calls (count)
- **Storage:** Document repository size (GB)
- **Compute:** Processing time (pod-hours)
- **Vector Operations:** Embedding searches/inserts (count)

**Singleton Pattern:**
- Shared metering instance across service workers
- Labels: `tenant_id`, `event_type`, `timestamp`
- Error handling: Metering failures don't impact query execution
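
To illustrate the singleton and error-handling points, here is a minimal in-memory sketch. `InMemoryMeter` is a hypothetical class, not the module's `TenantUsageMetering`; it assumes one shared instance per process, guards counters with a lock, and swallows metering errors so the calling query path never fails because of them.

```python
import threading
from collections import defaultdict


class InMemoryMeter:
    """Process-wide usage counters keyed by (tenant_id, event_type)."""

    _instance = None
    _instance_lock = threading.Lock()

    def __new__(cls):
        # Create the shared instance once; later calls return the same object.
        with cls._instance_lock:
            if cls._instance is None:
                instance = super().__new__(cls)
                instance._counts = defaultdict(float)
                instance._lock = threading.Lock()
                cls._instance = instance
        return cls._instance

    def record(self, tenant_id: str, event_type: str, amount: float = 1.0) -> None:
        try:
            if amount < 0:
                raise ValueError("amount must be non-negative")
            with self._lock:
                self._counts[(tenant_id, event_type)] += amount
        except Exception as exc:
            # Metering must never break the query path: report and continue.
            print(f"metering error ignored: {exc}")

    def usage(self, tenant_id: str) -> dict[str, float]:
        with self._lock:
            return {event: value for (tenant, event), value in self._counts.items()
                    if tenant == tenant_id}


meter_a = InMemoryMeter()
meter_b = InMemoryMeter()
meter_a.record("finance", "query", 3)
meter_b.record("finance", "query", 2)
meter_b.record("finance", "query", -1)  # rejected, but does not raise
print(meter_a is meter_b)
print(meter_a.usage("finance"))
```

A singleton like this is shared only within one process. Workers running as separate processes would each hold their own counters, which is one reason to export the events to Prometheus or StatsD in production.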

In [None]:
# Initialize usage metering
metering = TenantUsageMetering()

# Record usage for Finance tenant
metering.record_query("finance", 100_000)
metering.record_storage("finance", 200.0)
metering.record_compute("finance", 500.0)
metering.record_vector_operation("finance", 500_000)

# Record usage for Legal tenant
metering.record_query("legal", 50_000)
metering.record_storage("legal", 100.0)
metering.record_compute("legal", 200.0)
metering.record_vector_operation("legal", 250_000)

# Record usage for HR tenant
metering.record_query("hr", 5_000)
metering.record_storage("hr", 20.0)
metering.record_compute("hr", 50.0)
metering.record_vector_operation("hr", 25_000)

# Display usage summary
print("Recorded Usage for 3 Tenants:")
for tenant_id, usage in metering.get_all_usage().items():
    print(f"\n{tenant_id.upper()}:")
    print(f"  Queries: {usage.query_count:,}")
    print(f"  Storage: {usage.storage_gb:.1f} GB")
    print(f"  Compute: {usage.compute_pod_hours:.1f} pod-hours")
    print(f"  Vector Ops: {usage.vector_operations:,}")

# Expected: Usage summary for 3 tenants

## 4. Cost Calculation Engine

Calculate tenant costs with multi-component formula:

**Steps:**
1. Calculate direct costs (LLM + Storage + Compute + Vector)
2. Add overhead (20% of direct costs)
3. Determine volume discount tier based on query count
4. Apply discount to (direct + overhead)
5. Calculate cost per query

**Formula:**
```python
direct_total = llm + storage + compute + vector
overhead = direct_total * 0.20
discount = (direct_total + overhead) * discount_rate
final_cost = direct_total + overhead - discount
```
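
The same steps can be wrapped into a small function and checked against real numbers. This is a sketch rather than the engine's code: the unit prices come from Section 1, `final_cost_inr` is a hypothetical name, and the exchange rate of 83 INR per USD is an assumption inferred from the Finance walkthrough (\$200 = ₹16,600).

```python
LLM_USD_PER_QUERY = 0.002
STORAGE_USD_PER_GB = 0.023
COMPUTE_USD_PER_POD_HOUR = 0.05
VECTOR_USD_PER_OP = 0.0001
USD_TO_INR = 83.0       # assumption, see lead-in
OVERHEAD_RATE = 0.20


def final_cost_inr(queries, storage_gb, pod_hours, vector_ops, discount_rate):
    """Apply direct cost + overhead - volume discount and return INR."""
    direct_usd = (
        queries * LLM_USD_PER_QUERY
        + storage_gb * STORAGE_USD_PER_GB
        + pod_hours * COMPUTE_USD_PER_POD_HOUR
        + vector_ops * VECTOR_USD_PER_OP
    )
    direct_total = direct_usd * USD_TO_INR
    overhead = direct_total * OVERHEAD_RATE
    discount = (direct_total + overhead) * discount_rate
    return direct_total + overhead - discount


# Finance usage from this notebook: 100K queries, 200 GB, 500 pod-hours, 500K vector ops
example_cost = final_cost_inr(100_000, 200.0, 500.0, 500_000, discount_rate=0.15)
print(f"Final cost: {example_cost:,.0f} INR")
print(f"Cost per query: {example_cost / 100_000:.4f} INR")
```

Because the discount is applied to direct cost plus overhead, the result equals `direct_total * 1.20 * (1 - discount_rate)`.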

In [None]:
# Initialize cost calculation engine
cost_engine = CostCalculationEngine()

# Calculate costs for all tenants
print("Cost Breakdown by Tenant:\n")

for tenant_id, usage in metering.get_all_usage().items():
    breakdown = cost_engine.calculate_tenant_cost(tenant_id, usage)
    
    print(f"{tenant_id.upper()}:")
    print(f"  Direct Costs: ₹{breakdown.direct_total:.2f}")
    print(f"    - LLM: ₹{breakdown.llm_cost:.2f}")
    print(f"    - Storage: ₹{breakdown.storage_cost:.2f}")
    print(f"    - Compute: ₹{breakdown.compute_cost:.2f}")
    print(f"    - Vector: ₹{breakdown.vector_cost:.2f}")
    print(f"  Overhead ({breakdown.overhead_rate*100:.0f}%): ₹{breakdown.overhead_cost:.2f}")
    print(f"  Volume Discount ({breakdown.volume_discount_rate*100:.0f}%): -₹{breakdown.volume_discount_amount:.2f}")
    print(f"  FINAL COST: ₹{breakdown.final_cost:.2f}")
    print(f"  Cost per Query: ₹{breakdown.cost_per_query:.4f}")
    print()

# Expected: Cost breakdown for 3 tenants with different discount tiers

## 5. Real-World Example: Finance Department

Let's walk through the Finance department's cost calculation in detail:

**Monthly Usage:**
- 100,000 queries
- 200 GB storage
- 500 pod-hours compute
- 500,000 vector operations

**Cost Calculation:**
- LLM: 100K × \$0.002 = \$200 = ₹16,600
- Storage: 200 GB × \$0.023 = \$4.60 = ₹382
- Compute: 500 × \$0.05 = \$25 = ₹2,075
- Vector: 500K × \$0.0001 = \$50 = ₹4,150
- **Direct Total: ₹23,207**
- Overhead (20%): ₹4,641
- Subtotal: ₹27,848
- Volume Discount (15% @ TIER_1): -₹4,177
- **Final Cost: ₹23,671**
- **Cost per Query: ₹0.24**

**ROI Analysis:**
- Finance saves ₹12 Cr/year in manual research costs
- Platform cost: ₹2.84 L/year (₹23,671 × 12)
- **ROI: ~422x return on investment** (₹12 Cr ÷ ₹2.84 L)

In [None]:
# Detailed Finance department analysis
finance_usage = metering.get_tenant_usage("finance")
finance_breakdown = cost_engine.calculate_tenant_cost("finance", finance_usage)

print("Finance Department - Detailed Cost Analysis")
print("="*60)
print(f"Monthly Usage:")
print(f"  Queries: {finance_usage.query_count:,}")
print(f"  Storage: {finance_usage.storage_gb:.1f} GB")
print(f"  Compute: {finance_usage.compute_pod_hours:.1f} pod-hours")
print(f"  Vector Ops: {finance_usage.vector_operations:,}")
print()
print("Cost Breakdown:")
print(f"  LLM API: ₹{finance_breakdown.llm_cost:.2f} ({finance_usage.query_count:,} × ${TenantUsageMetering.LLM_COST_PER_QUERY})")
print(f"  Storage: ₹{finance_breakdown.storage_cost:.2f} ({finance_usage.storage_gb:.1f} GB × ${TenantUsageMetering.STORAGE_COST_PER_GB})")
print(f"  Compute: ₹{finance_breakdown.compute_cost:.2f} ({finance_usage.compute_pod_hours:.1f} × ${TenantUsageMetering.COMPUTE_COST_PER_POD_HOUR})")
print(f"  Vector Ops: ₹{finance_breakdown.vector_cost:.2f} ({finance_usage.vector_operations:,} × ${TenantUsageMetering.VECTOR_COST_PER_OP})")
print(f"  Direct Total: ₹{finance_breakdown.direct_total:.2f}")
print(f"  Overhead (20%): ₹{finance_breakdown.overhead_cost:.2f}")
print(f"  Volume Discount (15%): -₹{finance_breakdown.volume_discount_amount:.2f}")
print(f"  FINAL COST: ₹{finance_breakdown.final_cost:.2f}")
print(f"  Cost per Query: ₹{finance_breakdown.cost_per_query:.4f}")
print()
print("ROI Analysis:")
annual_cost = finance_breakdown.final_cost * 12
savings_per_year = 12_00_00_000  # ₹12 Cr
roi = savings_per_year / annual_cost
print(f"  Annual Platform Cost: ₹{annual_cost:,.0f}")
print(f"  Annual Savings: ₹{savings_per_year:,}")
print(f"  ROI: {roi:.1f}x")

# Expected: Detailed Finance breakdown with ROI

## 6. Chargeback Report Generation

Generate CFO-ready invoices with detailed line items:

**Invoice Components:**
- Invoice ID (unique per tenant per month)
- Billing period
- Line items (LLM, Storage, Compute, Vector)
- Subtotal (direct costs)
- Overhead charge
- Volume discount
- Total amount
- Cost per query metric

**Format:** JSON (can be extended to PDF with ReportLab)
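
As a rough sketch of how these components could fit into a JSON document: `build_invoice`, its field names, and the `INV-<TENANT>-<period>` ID scheme are hypothetical and may differ from what `ChargebackReportGenerator` emits. The line-item amounts reuse the Finance figures from Section 5, and the billing period is a placeholder.

```python
import json


def build_invoice(tenant_id, billing_period, line_items, query_count,
                  overhead_rate=0.20, discount_rate=0.0):
    """Assemble a JSON-serializable invoice from per-component costs in INR."""
    subtotal = sum(item["cost_inr"] for item in line_items)
    overhead = subtotal * overhead_rate
    discount = (subtotal + overhead) * discount_rate
    total = subtotal + overhead - discount
    return {
        # Hypothetical ID scheme: one invoice per tenant per billing month
        "invoice_id": f"INV-{tenant_id.upper()}-{billing_period}",
        "tenant_id": tenant_id,
        "billing_period": billing_period,
        "line_items": line_items,
        "subtotal_inr": round(subtotal, 2),
        "overhead_inr": round(overhead, 2),
        "discount_inr": round(discount, 2),
        "total_inr": round(total, 2),
        "cost_per_query_inr": round(total / query_count, 4) if query_count else None,
    }


finance_items = [
    {"description": "LLM queries", "quantity": 100_000, "cost_inr": 16_600.00},
    {"description": "Storage (GB)", "quantity": 200, "cost_inr": 381.80},
    {"description": "Compute (pod-hours)", "quantity": 500, "cost_inr": 2_075.00},
    {"description": "Vector operations", "quantity": 500_000, "cost_inr": 4_150.00},
]
invoice = build_invoice("finance", "2025-01", finance_items,
                        query_count=100_000, discount_rate=0.15)
print(json.dumps(invoice, indent=2))
```

Keeping amounts as numbers in the JSON and formatting currency only at display time makes the invoice easier to reconcile later.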

In [None]:
# Initialize report generator
report_generator = ChargebackReportGenerator(cost_engine)

# Generate invoice for Finance
finance_invoice = report_generator.generate_monthly_invoice("finance", finance_usage)

print("CFO-Ready Invoice - Finance Department")
print("="*60)
print(f"Invoice ID: {finance_invoice['invoice_id']}")
print(f"Tenant: {finance_invoice['tenant_id']}")
print(f"Billing Period: {finance_invoice['billing_period']}")
print()
print("Line Items:")
for item in finance_invoice['line_items']:
    print(f"  {item['description']:20s} {str(item['quantity']):>15s}  {item['cost']:>12s}")
print()
print(f"Subtotal:        {finance_invoice['subtotal']:>12s}")
print(f"Overhead:        {finance_invoice['overhead']:>12s}")
print(f"Discount:        {finance_invoice['discount']:>12s}")
print(f"TOTAL:           {finance_invoice['total']:>12s}")
print()
print(f"Cost per Query:  {finance_invoice['cost_per_query']:>12s}")

# Expected: Formatted CFO invoice

## 7. Platform Summary Report

Generate platform-wide cost summary across all tenants:

**Metrics:**
- Total tenant count
- Total platform cost (sum of all tenant costs)
- Total queries across all tenants
- Average cost per query (platform efficiency)
- Top tenants by cost (identify major consumers)
- Cost distribution (share of cost from the top 10% and top 50% of tenants)
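
The metrics above reduce to a few aggregations over per-tenant costs. The sketch below is illustrative only: `summarize_platform` and its output keys are hypothetical, the tenant costs are made-up figures, and it assumes a "top N%" group always contains at least one tenant (rounding up).

```python
import math


def summarize_platform(tenant_costs: dict[str, float], tenant_queries: dict[str, int]) -> dict:
    """Aggregate per-tenant costs into platform-wide metrics."""
    ranked = sorted(tenant_costs.items(), key=lambda kv: kv[1], reverse=True)
    total_cost = sum(tenant_costs.values())
    total_queries = sum(tenant_queries.values())

    def top_share(fraction: float) -> float:
        # Cost carried by the top `fraction` of tenants, at least one tenant.
        if not ranked:
            return 0.0
        n = max(1, math.ceil(len(ranked) * fraction))
        return sum(cost for _, cost in ranked[:n])

    return {
        "tenant_count": len(ranked),
        "total_cost_inr": total_cost,
        "total_queries": total_queries,
        "avg_cost_per_query": total_cost / total_queries if total_queries else 0.0,
        "top_tenants": ranked[:3],
        "top_10_percent_cost": top_share(0.10),
        "top_50_percent_cost": top_share(0.50),
    }


# Hypothetical monthly costs (INR) and query counts
sketch_summary = summarize_platform(
    {"finance": 23_671.0, "legal": 12_000.0, "hr": 1_500.0},
    {"finance": 100_000, "legal": 50_000, "hr": 5_000},
)
print(sketch_summary)
```

With only three tenants, the "top 10%" bucket is a single tenant, so the distribution figures become meaningful only once the tenant count grows.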

In [None]:
# Generate platform summary
platform_summary = report_generator.generate_platform_summary(metering.get_all_usage())

print("Platform-Wide Cost Summary")
print("="*60)
print(f"Tenant Count: {platform_summary['tenant_count']}")
print(f"Total Cost: ₹{platform_summary['total_cost_inr']:,.2f}")
print(f"Total Queries: {platform_summary['total_queries']:,}")
print(f"Average Cost per Query: ₹{platform_summary['avg_cost_per_query']:.4f}")
print()
print("Top Tenants by Cost:")
for i, tenant in enumerate(platform_summary['top_tenants'], 1):
    print(f"  {i}. {tenant['tenant_id']:10s}  ₹{tenant['cost']:>10,.2f}  ({tenant['queries']:>8,} queries)")
print()
print("Cost Distribution:")
top_10_pct = platform_summary['cost_distribution']['top_10_percent']
top_50_pct = platform_summary['cost_distribution']['top_50_percent']
total = platform_summary['total_cost_inr']
print(f"  Top 10% of tenants: ₹{top_10_pct:,.2f} ({top_10_pct/total*100:.1f}%)")
print(f"  Top 50% of tenants: ₹{top_50_pct:,.2f} ({top_50_pct/total*100:.1f}%)")

# Expected: Platform summary with distribution

## 8. Cost Anomaly Detection

Detect cost spikes and anomalies:

**Alert Triggers:**
- >50% month-over-month cost increase
- Requires at least 1 historical data point

**Root Cause Analysis:**
- Query surge (new feature, data leak)
- Storage growth (bulk upload, migration)
- Compute spike (inefficient queries)
- Vector ops increase (excessive re-indexing)

**Escalation Path:**
1. Platform team alerted
2. Contact tenant owner
3. Verify usage is legitimate
4. If spike continues, escalate to CTO/CFO
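
The alert trigger itself is a simple month-over-month comparison. A minimal sketch, assuming a strict "greater than 50%" rule and no alert when there is no usable history; `check_spike` and its return shape are hypothetical, not the `CostAnomalyDetector` API:

```python
from typing import Optional

SPIKE_THRESHOLD = 0.50  # alert when cost grows by more than 50% month over month


def check_spike(previous_cost: Optional[float], current_cost: float,
                threshold: float = SPIKE_THRESHOLD) -> Optional[dict]:
    """Return alert details if the increase exceeds the threshold, else None."""
    if previous_cost is None or previous_cost <= 0:
        return None  # no baseline yet, nothing to compare against
    change = (current_cost - previous_cost) / previous_cost
    if change <= threshold:
        return None
    return {
        "previous_cost": previous_cost,
        "current_cost": current_cost,
        "change_percent": change * 100,
        "threshold_percent": threshold * 100,
    }


print(check_spike(None, 100.0))    # first month: baseline only
print(check_spike(100.0, 120.0))   # +20%: below threshold
print(check_spike(100.0, 160.0))   # +60%: alert
```

Comparing against only the previous month keeps the check simple, but a tenant with seasonal usage may trigger false alarms; a rolling average baseline is a common alternative.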

In [None]:
# Initialize anomaly detector
anomaly_detector = CostAnomalyDetector()

# Simulate 3 months of Finance usage
print("Simulating 3 Months of Finance Usage:")
print("="*60)

# Month 1: Normal (100K queries)
month1_cost = finance_breakdown.final_cost
anomaly1 = anomaly_detector.check_anomaly("finance", month1_cost, finance_usage)
print(f"Month 1: ₹{month1_cost:,.0f} - {anomaly1 if anomaly1 else 'No anomaly (baseline)'}")

# Month 2: Small increase (120K queries)
now = datetime.utcnow()
month2_usage = UsageMetrics(
    tenant_id="finance",
    query_count=120_000,
    storage_gb=220.0,
    compute_pod_hours=550.0,
    vector_operations=600_000,
    period_start=now.isoformat(),
    period_end=(now + timedelta(days=30)).isoformat()
)
month2_breakdown = cost_engine.calculate_tenant_cost("finance", month2_usage)
month2_cost = month2_breakdown.final_cost
anomaly2 = anomaly_detector.check_anomaly("finance", month2_cost, month2_usage)
print(f"Month 2: ₹{month2_cost:,.0f} - {anomaly2 if anomaly2 else 'No anomaly (queries +20%)'}")

# Month 3: SPIKE (250K queries - >50% increase)
month3_usage = UsageMetrics(
    tenant_id="finance",
    query_count=250_000,
    storage_gb=500.0,
    compute_pod_hours=1200.0,
    vector_operations=1_500_000,
    period_start=now.isoformat(),
    period_end=(now + timedelta(days=30)).isoformat()
)
month3_breakdown = cost_engine.calculate_tenant_cost("finance", month3_usage)
month3_cost = month3_breakdown.final_cost
anomaly3 = anomaly_detector.check_anomaly("finance", month3_cost, month3_usage)

# Report the status from the detector's result instead of assuming a spike
print(f"Month 3: ₹{month3_cost:,.0f} - {'ANOMALY DETECTED!' if anomaly3 else 'No anomaly'}")
if anomaly3:
    print()
    print("🚨 COST SPIKE ALERT")
    print(f"  Tenant: {anomaly3['tenant_id']}")
    print(f"  Previous Cost: ₹{anomaly3['previous_cost']:,.0f}")
    print(f"  Current Cost: ₹{anomaly3['current_cost']:,.0f}")
    print(f"  Change: +{anomaly3['change_percent']:.1f}% (threshold: {anomaly3['threshold_percent']:.0f}%)")
    print(f"  Root Cause Hints:")
    for hint in anomaly3['root_cause_hints']:
        print(f"    - {hint}")
    print(f"  Action: {anomaly3['action_required']}")

# Display cost trend
print()
print("Cost Trend (Last 3 Months):")
trend = anomaly_detector.get_cost_trend("finance")
for i, cost in enumerate(trend, 1):
    print(f"  Month {i}: ₹{cost:,.0f}")

# Expected: Anomaly detected in Month 3 with alerts

## 9. Migration Cost Estimation

Help tenants estimate costs before bulk uploads:

**Scenario:** Legal wants to upload 5,000 contracts to RAG platform

**Pre-Upload Questions:**
- How many documents?
- Average document size?
- Monthly storage cost impact?

**Warning Threshold:** ₹50K/month storage

**Benefit:** Prevents surprise invoices, gets tenant approval upfront
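
The estimate behind those questions is straightforward arithmetic. The sketch below is an assumption-laden illustration, not `estimate_migration_cost` itself: `estimate_storage_cost` is a hypothetical name, it uses the \$0.023 per GB/month storage price from Section 1, treats 1 GB as 1024 MB, and assumes 83 INR per USD.

```python
STORAGE_USD_PER_GB_MONTH = 0.023
USD_TO_INR = 83.0                  # assumption
WARNING_THRESHOLD_INR = 50_000.0   # from the warning threshold above


def estimate_storage_cost(num_documents: int, avg_doc_size_mb: float) -> dict:
    """Estimate monthly storage cost for a bulk upload before it happens."""
    total_gb = num_documents * avg_doc_size_mb / 1024  # assumes 1 GB = 1024 MB
    monthly_usd = total_gb * STORAGE_USD_PER_GB_MONTH
    monthly_inr = monthly_usd * USD_TO_INR
    return {
        "total_size_gb": total_gb,
        "monthly_storage_cost_usd": monthly_usd,
        "monthly_storage_cost_inr": monthly_inr,
        "needs_approval": monthly_inr > WARNING_THRESHOLD_INR,
    }


small_upload = estimate_storage_cost(num_documents=5_000, avg_doc_size_mb=2.0)
print(f"{small_upload['total_size_gb']:.2f} GB, "
      f"{small_upload['monthly_storage_cost_inr']:.2f} INR/month, "
      f"approval needed: {small_upload['needs_approval']}")
```

This covers raw storage only. Embedding and indexing the uploaded documents also generate vector operations and compute, so a fuller estimate would add those components.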

In [None]:
# Estimate migration cost for Legal's contract upload
print("Migration Cost Estimation - Legal Department")
print("="*60)
print("Scenario: Upload 5,000 contracts to RAG platform")
print()

# Estimate for 5,000 contracts @ 2 MB each
estimate1 = cost_engine.estimate_migration_cost(
    num_documents=5_000,
    avg_doc_size_mb=2.0
)

print("Estimate 1 (Normal Upload):")
print(f"  Documents: {estimate1['num_documents']:,}")
print(f"  Total Size: {estimate1['total_size_gb']:.2f} GB")
print(f"  Monthly Storage Cost: ${estimate1['monthly_storage_cost_usd']:.2f} (₹{estimate1['monthly_storage_cost_inr']:.2f})")
print(f"  Warning: {estimate1['warning'] if estimate1['warning'] else 'None - cost acceptable'}")
print()

# Estimate for 50,000 contracts @ 5 MB each (large migration)
estimate2 = cost_engine.estimate_migration_cost(
    num_documents=50_000,
    avg_doc_size_mb=5.0
)

print("Estimate 2 (Large Migration):")
print(f"  Documents: {estimate2['num_documents']:,}")
print(f"  Total Size: {estimate2['total_size_gb']:.2f} GB")
print(f"  Monthly Storage Cost: ${estimate2['monthly_storage_cost_usd']:.2f} (₹{estimate2['monthly_storage_cost_inr']:.2f})")
print(f"  Warning: {estimate2['warning'] if estimate2['warning'] else 'None'}")
print()
print("Recommendation:")
if estimate2['warning']:
    print("  🚨 Get tenant approval before proceeding with large migration")
    print("  📧 Send cost estimate to Legal department head")
    print("  ✅ Proceed only after written approval")
else:
    print("  ✓ Cost acceptable - proceed with upload")

# Expected: Two estimates with warning for large migration

## 10. Cost Attribution Validation

Monthly reconciliation to ensure accuracy:

**Process:**
1. Sum all tenant costs (attributed total)
2. Compare to actual cloud bill (AWS/Azure/GCP)
3. Calculate variance percentage
4. If variance > 10%, investigate missing cost components

**Acceptable Variance:** ±10%

**Common Missing Components:**
- Vector database operations (forgot to track)
- Network egress costs
- Load balancer charges
- API gateway fees

**Goal:** Ensure all costs are captured (no budget deficit)
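
The reconciliation check can be sketched in a few lines. `attribution_variance` and its PASS/FAIL labels are hypothetical, not the signature of `validate_cost_attribution`, and the variance here is measured relative to the attributed total, which is an assumption chosen so that a bill 5% above the attributed total reads as 5% variance.

```python
def attribution_variance(attributed_total: float, actual_bill: float,
                         tolerance_percent: float = 10.0) -> dict:
    """Compare attributed tenant costs with the actual cloud bill."""
    if attributed_total <= 0:
        raise ValueError("attributed_total must be positive")
    variance_percent = (actual_bill - attributed_total) / attributed_total * 100
    within_tolerance = abs(variance_percent) <= tolerance_percent
    return {
        "variance_percent": variance_percent,
        "within_tolerance": within_tolerance,
        "status": "PASS" if within_tolerance else "FAIL",
    }


# Hypothetical totals in INR
print(attribution_variance(100_000.0, 105_000.0))  # small gap: pass
print(attribution_variance(100_000.0, 120_000.0))  # large gap: fail, costs are missing
```

A positive variance means the cloud bill exceeds what was attributed, so some cost is not being charged to any tenant. A negative variance means tenants are being overcharged.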

In [None]:
# Monthly cost attribution validation
print("Monthly Cost Attribution Validation")
print("="*60)

# Calculate total attributed costs
all_usage = metering.get_all_usage()
total_attributed = 0.0
tenant_breakdown = []

for tenant_id, usage in all_usage.items():
    breakdown = cost_engine.calculate_tenant_cost(tenant_id, usage)
    total_attributed += breakdown.final_cost
    tenant_breakdown.append((tenant_id, breakdown.final_cost))

print("Attributed Costs:")
for tenant_id, cost in tenant_breakdown:
    print(f"  {tenant_id:10s}: ₹{cost:>10,.2f}")
print(f"  {'TOTAL':10s}: ₹{total_attributed:>10,.2f}")
print()

# Scenario 1: Accurate attribution (variance < 10%)
actual_bill_accurate = total_attributed * 1.05  # 5% variance
validation1 = validate_cost_attribution(total_attributed, actual_bill_accurate)

print("Scenario 1: Accurate Attribution")
print(f"  Attributed Total: ₹{validation1['total_attributed_cost']:,.2f}")
print(f"  Actual Cloud Bill: ₹{validation1['actual_cloud_bill']:,.2f}")
print(f"  Variance: {validation1['variance_percent']:.2f}%")
print(f"  Status: {validation1['status']}")
print(f"  Message: {validation1['message']}")
print()

# Scenario 2: Missing cost components (variance > 10%)
actual_bill_missing = total_attributed * 1.20  # 20% variance (missing vector ops)
validation2 = validate_cost_attribution(total_attributed, actual_bill_missing)

print("Scenario 2: Missing Cost Components")
print(f"  Attributed Total: ₹{validation2['total_attributed_cost']:,.2f}")
print(f"  Actual Cloud Bill: ₹{validation2['actual_cloud_bill']:,.2f}")
print(f"  Variance: {validation2['variance_percent']:.2f}%")
print(f"  Status: {validation2['status']} ❌")
print(f"  Message: {validation2['message']}")
print()
print("Investigation Checklist:")
print("  □ Are vector database operations tracked?")
print("  □ Are network egress costs included?")
print("  □ Are load balancer charges captured?")
print("  □ Are API gateway fees accounted for?")
print("  □ Run monthly reconciliation to identify gaps")

# Expected: Two validation scenarios (pass and fail)

## 11. Key Takeaways

**What You Learned:**

1. **Cost Attribution Layers:** Direct costs, overhead, allocation methods, billing models
2. **Multi-Component Formula:** LLM + Storage + Compute + Vector + Overhead - Discounts
3. **Volume Discounts:** 0% → 15% → 30% → 40% based on query volume
4. **Usage Metering:** Track per-tenant usage with <5ms latency overhead
5. **Cost Calculation:** Apply formula with accurate cost constants
6. **Chargeback Reports:** Generate CFO-ready invoices with line items
7. **Anomaly Detection:** Alert on >50% cost spikes with root cause hints
8. **Migration Estimation:** Prevent surprise costs with pre-upload estimates
9. **Attribution Validation:** Monthly reconciliation ensures ±10% accuracy

**When to Use Cost Attribution:**
- ✅ 10-50 tenants with stable usage
- ✅ CFO enforces chargeback culture
- ✅ Platform > 6 months old
- ✅ Need ROI proof for budget justification

**When NOT to Use:**
- ❌ < 10 tenants (manual tracking sufficient)
- ❌ Platform < 6 months old (wait for stability)
- ❌ Showback culture with no accountability
- ❌ Single tenant > 80% of usage (everyone knows who pays)

**Platform Economics:**
- Cost per query decreases with volume (economies of scale)
- Volume discounts incentivize growth and consolidation
- Chargeback drives optimization and cost discipline
- ROI: ₹10 Cr platform → ₹36 Cr value → 3.6x return

**Next Steps:**
1. Implement usage metering in your RAG platform
2. Calculate costs with volume discounts
3. Generate monthly invoices for CFO
4. Set up anomaly detection alerts
5. Run monthly reconciliation (validate ±10% accuracy)
6. Optimize based on cost insights (cache high-cost queries, archive old data)

**Continue to M13.4:** Infrastructure Scaling (quantify cost impact of scaling decisions)

In [None]:
print("🎓 L3 M13.3: Cost Optimization Strategies - COMPLETE")
print()
print("You have successfully learned:")
print("  ✓ Cost attribution layers and allocation methods")
print("  ✓ Multi-component cost formula with volume discounts")
print("  ✓ Usage metering for queries, storage, compute, vector ops")
print("  ✓ Chargeback report generation (CFO-ready invoices)")
print("  ✓ Cost anomaly detection (>50% spikes)")
print("  ✓ Migration cost estimation")
print("  ✓ Attribution validation (monthly reconciliation)")
print()
print("Next Module: M13.4 - Infrastructure Scaling")
print("  → Quantify cost impact of scaling decisions")
print("  → Auto-scaling policies based on cost thresholds")
print("  → Right-sizing infrastructure for cost efficiency")