# Bridge L3.M7.2 → L3.M7.3 Readiness Validation

## Purpose

This bridge validates that **M7.2 (APM Complete)** delivered production-grade observability before you advance to **M7.3 (Custom Business Metrics)**. M7.2 focused on technical performance monitoring: flame graphs, memory profiling, and query optimization. M7.3 shifts to **business-level questions**: Which features drive retention? What's the cost-per-user? How do we detect satisfaction drops before churn?

Without validated APM infrastructure and cost tracking from M7.2, M7.3's cohort analysis would lack the foundational data layer needed to answer executive questions.

## Concepts Covered

**Delta from M7.2:**
- Validating APM infrastructure is live (not just installed)
- Confirming user feedback collection exists (prerequisite for satisfaction metrics)
- Verifying cost tracking foundations (OpenAI, Pinecone)

**New for M7.3 (call-forward):**
- RAG-specific quality metrics (accuracy, satisfaction scoring)
- Cohort segmentation (enterprise vs. free tier, query categories)
- Business KPI translation (revenue-per-query, cost-per-active-user)

## After Completing This Notebook

You will be able to:
- Verify Datadog APM is actively profiling your application (flame graphs visible)
- Confirm Prometheus/Grafana dashboards are receiving live metrics
- Validate user feedback mechanisms are capturing ratings for analysis
- Demonstrate cost tracking for LLM and vector database services
- Articulate the business questions M7.3 will address (satisfaction drop investigation)

## Context in Track

**Bridge:** L3.M7.2 → L3.M7.3  
**Prerequisite:** Module 7.2 (APM Complete)  
**Next Module:** Module 7.3 (Custom Business Metrics)  

**Run Locally (Windows):**
```powershell
$env:PYTHONPATH = "$PWD"; jupyter notebook
```

**Run Locally (macOS/Linux):**
```bash
PYTHONPATH="$PWD" jupyter notebook
```

---

## Section 1: Recap — What M7.2 Shipped

Module 7.2 (APM Complete) delivered comprehensive application performance monitoring:

### 1. Datadog APM Integration
- Installed `ddtrace` with API keys configured
- Continuous profiling at 5% sampling
- Maintained <3% CPU overhead

### 2. Function-Level Bottleneck Analysis
- Flame graphs identifying specific performance issues
- Flagged `serialize_prompt()` (750ms) as needing optimization
- Found O(n²) loop at line 187

### 3. Memory Profiling
- Heap growth tracking enabled
- Identified unbounded `document_cache` dictionary
- Implemented LRU eviction policies

### 4. Database Query Optimization
- APM visualization of SQL queries
- Exposed N+1 query problems
- Optimized to single JOIN queries → 10x speedup

---

## Section 2: Readiness Check #1 — Datadog Flame Graphs

**Requirement:** Datadog UI must display flame graphs from the past hour.

**What to verify:**
- Datadog APM is actively receiving traces
- Flame graph visualization is available
- Recent profiling data (last 60 minutes) exists

### Check Datadog API Keys

This cell checks that the Datadog credentials are set as environment variables; it does not confirm that the keys are valid. If the keys are missing, the check skips gracefully, which is acceptable for offline or demo environments. With keys present, manual verification in the Datadog UI is still required.

```python
import os

# Check 1: Datadog flame graphs availability
dd_api_key = os.getenv("DD_API_KEY")
dd_app_key = os.getenv("DD_APP_KEY")

if not dd_api_key or not dd_app_key:
    print("⚠️  Skipping (no Datadog keys)")
    print("   To enable: Set DD_API_KEY and DD_APP_KEY environment variables")
else:
    print("✓ Datadog keys configured")
    print("  Manual step: Open Datadog UI → APM → Profiling → Flame Graphs (last 1h)")
```
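
Where network access is available, the presence check can be taken one step further by asking Datadog whether the API key is accepted. The following is a minimal sketch, not part of the original notebook: it assumes Datadog's key-validation endpoint (`GET /api/v1/validate` with a `DD-API-KEY` header) and the default `datadoghq.com` site. The helper names `build_validate_request` and `datadog_key_is_valid` are hypothetical.

```python
import urllib.request


def build_validate_request(api_key: str, site: str = "datadoghq.com") -> urllib.request.Request:
    """Build (without sending) a GET request to Datadog's key-validation endpoint."""
    url = f"https://api.{site}/api/v1/validate"
    return urllib.request.Request(url, headers={"DD-API-KEY": api_key}, method="GET")


def datadog_key_is_valid(api_key: str, site: str = "datadoghq.com", timeout: float = 5.0) -> bool:
    """Send the request; return True only if Datadog answers with HTTP 200."""
    try:
        with urllib.request.urlopen(build_validate_request(api_key, site), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, and timeouts
        return False


# Building the request needs no network access, so this stays offline-safe.
request = build_validate_request("example-key")
print(request.get_method(), request.full_url)
```

Calling `datadog_key_is_valid(os.environ["DD_API_KEY"])` performs the actual round trip. An accepted key only shows that the credentials work; it does not show that profiles are arriving, so the manual flame graph check still applies.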

---

## Section 3: Readiness Check #2 — Grafana/Prometheus Metrics

**Requirement:** Grafana must confirm Prometheus metrics are actively updating.

**What to verify:**
- Prometheus is scraping application metrics
- Grafana dashboards show recent data
- Key metrics (request_count, latency_p95, error_rate) are updating

### Check Prometheus and Grafana Endpoints

This cell reads the Prometheus and Grafana service URLs from the environment, falling back to the local defaults when they are unset. No HTTP requests are made, so the cell is offline-safe. Manual dashboard verification is required when the services are available.

```python
import os  # repeated so this cell also runs on its own

# Check 2: Prometheus/Grafana metrics availability
prometheus_url = os.getenv("PROMETHEUS_URL", "http://localhost:9090")
grafana_url = os.getenv("GRAFANA_URL", "http://localhost:3000")

print(f"Prometheus endpoint: {prometheus_url}")
print(f"Grafana endpoint: {grafana_url}")
print("⚠️  Manual verification required:")
print("   Open Grafana dashboard, verify metrics updated in last 5 minutes")
```
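
The "updated in the last 5 minutes" condition can also be checked programmatically against the Prometheus HTTP API. The sketch below is an illustration under stated assumptions: it uses the instant-query endpoint (`/api/v1/query`) with the PromQL `timestamp()` function, which returns each sample's timestamp as its value, and it parses a response body that has already been fetched. The helper names and the sample payload are hypothetical; `request_count` is the metric named above.

```python
import urllib.parse
from typing import Optional


def build_freshness_query_url(prometheus_url: str, metric: str) -> str:
    """URL for an instant query that returns the timestamp of the latest sample."""
    query = urllib.parse.urlencode({"query": f"timestamp({metric})"})
    return f"{prometheus_url.rstrip('/')}/api/v1/query?{query}"


def newest_sample_age(payload: dict, now: float) -> Optional[float]:
    """Seconds since the newest sample, or None if the query returned no data."""
    if payload.get("status") != "success":
        return None
    results = payload.get("data", {}).get("result", [])
    if not results:
        return None
    # Each series carries value = [evaluation_time, "<sample timestamp as string>"].
    newest = max(float(series["value"][1]) for series in results)
    return now - newest


# Hypothetical response body, shaped like a Prometheus instant-vector result.
sample_payload = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {"job": "rag-api"}, "value": [1700000100.0, "1700000090"]}],
    },
}

url = build_freshness_query_url("http://localhost:9090", "request_count")
age = newest_sample_age(sample_payload, now=1700000100.0)
print(url)
print("fresh" if age is not None and age <= 300 else "stale or missing")
```

In a live environment the payload would come from fetching `url` and decoding the JSON body; the 300-second threshold mirrors the 5-minute window used for the manual check.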

---

## Section 4: Readiness Check #3 — User Feedback Mechanism

**Requirement:** User feedback capability (thumbs up/down) must exist in the RAG API.

**What to verify:**
- API endpoint accepts feedback (rating/thumbs)
- Feedback is stored for analysis
- Basic schema: query_id, rating, timestamp

### Define Expected Feedback Schema

This cell defines the minimal schema required for user feedback collection. The schema serves as a contract for M7.3's satisfaction analysis. No external calls are made.

```python
# Check 3: User feedback mechanism stub
feedback_schema = {
    "query_id": "str",
    "rating": "int (1=down, 5=up) or bool",
    "timestamp": "datetime"
}

print("✓ Expected feedback schema defined")
print("  Manual step: Verify POST /api/feedback endpoint exists")
print("  Expected behavior: Accepts {query_id, rating}, returns 200 OK")
print("  Storage: Database table 'user_feedback' contains recent entries")
```
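
Because the schema above is descriptive rather than enforced, a small validator can make the contract concrete when spot-checking rows from `user_feedback`. This is a minimal sketch, not the API's actual validation logic: the function name `validate_feedback` is hypothetical, and treating integer ratings as a 1 to 5 scale is an assumption drawn from the "1=down, 5=up" note.

```python
from datetime import datetime, timezone


def validate_feedback(record: dict) -> list:
    """Return a list of schema violations; an empty list means the record conforms."""
    errors = []

    query_id = record.get("query_id")
    if not isinstance(query_id, str) or not query_id:
        errors.append("query_id must be a non-empty string")

    rating = record.get("rating")
    if isinstance(rating, bool):
        pass  # thumbs up/down stored as True/False is allowed
    elif isinstance(rating, int):
        if not 1 <= rating <= 5:  # assumed scale: 1 = down, 5 = up
            errors.append("rating must be between 1 and 5")
    else:
        errors.append("rating must be an int or a bool")

    if not isinstance(record.get("timestamp"), datetime):
        errors.append("timestamp must be a datetime")

    return errors


# Illustrative records only; real rows would come from the feedback store.
good = {"query_id": "q-123", "rating": 5, "timestamp": datetime.now(timezone.utc)}
bad = {"query_id": "", "rating": 9, "timestamp": "yesterday"}

print(validate_feedback(good))
print(validate_feedback(bad))
```

The `bool` check comes before the `int` check because `bool` is a subclass of `int` in Python; reversing the order would push `True` and `False` through the numeric range test.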

---

## Section 5: Readiness Check #4 — Cost Data Documentation

**Requirement:** Cost data must be documented for OpenAI and Pinecone services.

**What to verify:**
- Cost tracking exists for API usage
- Recent billing data available (OpenAI tokens, Pinecone queries)
- Documentation includes: service, metric, cost/unit, monthly estimate

### Define Cost Tracking Template

This cell establishes the expected structure for cost documentation. M7.3 will use this data for cohort-level cost analysis (cost-per-user, revenue-per-query). Template values are placeholders; actual usage data should exist in your billing dashboard or costs.json file.

```python
# Check 4: Cost data documentation stub
cost_template = {
    "openai": {"tokens_used": 0, "cost_per_1k": 0.002, "monthly_est": 0},
    "pinecone": {"queries": 0, "cost_per_1k": 0.001, "monthly_est": 0}
}

print("✓ Cost tracking template defined")
print("  Manual step: Verify costs.json or billing dashboard exists")
print("  Expected: Contains actual usage data from last 30 days")
```
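
Once real usage figures are in place, `monthly_est` follows from the other two fields: units used, divided by 1,000, times the per-1k rate. The sketch below shows that arithmetic under two assumptions: the usage counts cover a 30-day window, and the rates are the template's placeholder values, not current vendor pricing. The usage numbers and the helper name `with_monthly_estimates` are made up for illustration.

```python
# Which field holds the usage count for each service in the template.
USAGE_FIELDS = {"openai": "tokens_used", "pinecone": "queries"}


def with_monthly_estimates(costs: dict) -> dict:
    """Return a copy of the cost data with monthly_est filled in per service."""
    filled = {}
    for service, entry in costs.items():
        units = entry[USAGE_FIELDS[service]]
        estimate = round(units / 1000 * entry["cost_per_1k"], 2)
        filled[service] = {**entry, "monthly_est": estimate}
    return filled


# Illustrative 30-day usage; replace with figures from costs.json or billing.
example_costs = {
    "openai": {"tokens_used": 1_500_000, "cost_per_1k": 0.002, "monthly_est": 0},
    "pinecone": {"queries": 200_000, "cost_per_1k": 0.001, "monthly_est": 0},
}

estimates = with_monthly_estimates(example_costs)
for service, entry in estimates.items():
    print(f"{service}: ${entry['monthly_est']:.2f} per month")
```

The function returns a new dictionary instead of mutating its input, so the original template stays available as a reference for the expected structure.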

---

## Section 6: Call-Forward — What M7.3 Will Introduce

Module 7.3 (Custom Business Metrics) builds on M7.2's APM foundation to answer critical business questions.

### 1. RAG-Specific Quality Metrics
- **Answer Accuracy Tracking**: Per-query correctness measurement
- **Satisfaction Scores**: User rating aggregation and trending
- **Automated Degradation Alerts**: Threshold-based notifications when quality drops

### 2. Cohort Analysis
- **Cost Segmentation by User Type**: Enterprise vs. free tier usage patterns
- **Feature Usage Patterns**: Which RAG features drive the most engagement
- **Retention by Query Category**: Identify high-value query types

### 3. Executive Dashboards
- **Business Language Translation**: Convert technical metrics to KPIs
- **Revenue per Query**: Monetization efficiency tracking
- **Cost per Active User**: Unit economics for sustainable growth

### Driving Question for M7.3
**Scenario**: Satisfaction score dropped from 4.2 → 3.8 in the past week.  
**Investigation**: Which query types are problematic? Which user cohorts are affected? What's the cost impact?
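
As a preview of the kind of drill-down this investigation involves, the sketch below compares mean ratings per group across two weeks and surfaces the group with the largest drop. It is an assumption-laden illustration, not M7.3's implementation: the `query_category` field, the helper names, and all ratings are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_rating_by(records: list, key: str) -> dict:
    """Average rating for each distinct value of `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["rating"])
    return {group: mean(ratings) for group, ratings in groups.items()}


def rating_change_by(previous: list, current: list, key: str) -> dict:
    """Change in mean rating per group, for groups present in both periods."""
    before = mean_rating_by(previous, key)
    after = mean_rating_by(current, key)
    return {group: after[group] - before[group] for group in before.keys() & after.keys()}


# Made-up feedback rows for two consecutive weeks.
last_week = [
    {"query_category": "billing", "rating": 4},
    {"query_category": "billing", "rating": 4},
    {"query_category": "search", "rating": 4},
]
this_week = [
    {"query_category": "billing", "rating": 2},
    {"query_category": "billing", "rating": 2},
    {"query_category": "search", "rating": 4},
]

changes = rating_change_by(last_week, this_week, "query_category")
worst = min(changes, key=changes.get)
print(f"Largest drop: {worst} ({changes[worst]:+})")
```

Passing a different key, such as a user-tier field, would apply the same comparison to user cohorts instead of query types.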

---

## Pass Criteria

All 4 readiness checks must pass (or show documented workarounds):
1. ✓ Datadog flame graphs visible (last 1 hour)
2. ✓ Grafana/Prometheus metrics updating (last 5 minutes)
3. ✓ User feedback endpoint functional
4. ✓ Cost data documented (OpenAI + Pinecone)
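
The pass rule can be recorded as a small summary at the end of the notebook. This is a sketch under assumptions: the status labels (`pass`, `workaround`, `fail`) and the helper name `bridge_ready` are hypothetical, and the statuses shown are placeholders to be replaced with the outcome of each manual check.

```python
# A check counts toward readiness if it passed or has a documented workaround.
ACCEPTED = {"pass", "workaround"}


def bridge_ready(results: dict) -> bool:
    """True only when every readiness check has an accepted status."""
    return bool(results) and all(status in ACCEPTED for status in results.values())


# Placeholder statuses; fill these in after completing the manual steps.
results = {
    "Datadog flame graphs visible (last 1 hour)": "pass",
    "Grafana/Prometheus metrics updating (last 5 minutes)": "pass",
    "User feedback endpoint functional": "workaround",
    "Cost data documented (OpenAI + Pinecone)": "fail",
}

for check, status in results.items():
    print(f"{status:<10} {check}")
print("Ready for M7.3" if bridge_ready(results) else "Not ready: resolve failing checks first")
```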

**Next Step**: Proceed to Module 7.3 to implement custom business metrics and cohort analysis.