# Bridge L3.M5.3 → L3.M5.4 Readiness Validation

## Purpose

You built data quality systems in M5.3; M5.4 shifts focus to **vector index resilience**. This bridge validates that your quality pipeline is operational and that your vector infrastructure has the scale and metadata schema needed for backup/restore, blue-green deployments, and health monitoring.

**Why it matters:** Without verified quality metrics, sufficient vector count, and tracking infrastructure, M5.4's index management features cannot be meaningfully tested or deployed.

## Concepts Covered (Delta Only)

- **Readiness validation** for operational transitions (not new implementations)
- **Infrastructure prerequisites** for index management (vector count, metadata schema)
- **Graceful degradation** when services are unavailable (offline-friendly checks)

## After Completing This Bridge

You will be able to:
- ✓ Verify that M5.3 quality systems are actively logging metrics
- ✓ Confirm your Pinecone index has sufficient scale (≥5,000 vectors) for M5.4 testing
- ✓ Validate required metadata fields (`document_id`, `version`) are present
- ✓ Check that Prometheus is tracking index size metrics
- ✓ Identify gaps in infrastructure before starting M5.4

## Context in Track

**Bridge L3.M5.3 → L3.M5.4**  
**From:** Data Quality & Validation  
**To:** Vector Index Management

This bridge sits between quality assurance and infrastructure operations, ensuring the foundation is solid before adding backup, deployment, and monitoring capabilities.

---

## Run Locally (Windows-First)

```powershell
# PowerShell
$env:PYTHONPATH="$PWD"; jupyter notebook
```

```bash
# macOS/Linux
PYTHONPATH=$PWD jupyter notebook
```

**Optional environment variables:**
```powershell
$env:PINECONE_API_KEY="your-key"
$env:PINECONE_INDEX="your-index-name"
$env:QUALITY_METRICS_PATH="./quality_metrics.json"
$env:PROMETHEUS_URL="http://localhost:9090"
```

---

## Recap: What M5.3 (Data Quality & Validation) Shipped

Before validating readiness, review what the previous module delivered:

### 1. Quality Scoring Algorithms
Systems detecting corrupted text and OCR failures with **>80% accuracy** using character distribution analysis.

### 2. Duplicate Detection at Scale
MinHash/LSH technology identifying near-duplicates across millions of chunks with **<5% false positive rate**.

### 3. Drift Monitoring
Statistical tests (Chi-square) that alert when document distributions change meaningfully, catching corpus shifts before impact.

### 4. Grafana Quality Dashboards
Real-time visibility into quality metrics, with updates surfaced in **under 30 seconds**.

---

## Readiness Check #1: Quality Validation Pipeline Active

**Requirement:** Quality validation actively running with recent metrics logged.

This verifies M5.3 systems are operational and producing the quality metrics that will inform M5.4's index health decisions.

Check for the quality metrics file at the configured path. The check skips gracefully if the file is absent (offline-friendly).

```python
import os
from datetime import datetime, timedelta

# Check for quality metrics log/database
METRICS_PATH = os.getenv("QUALITY_METRICS_PATH", "./quality_metrics.json")

if not os.path.exists(METRICS_PATH):
    print("⚠️ Skipping (no quality metrics file found)")
    print(f"   Expected: {METRICS_PATH}")
else:
    # Expected: Recent metrics within last 24h
    # Expected: quality_score, duplicate_rate, drift_score present
    print("✓ Quality metrics file found")
    print(f"  Location: {METRICS_PATH}")
    # In production: verify timestamp < 24h, required fields present
```
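The cell above only confirms that the file exists. A minimal sketch of the production check it describes (freshness within 24 hours, required fields present) might look like the following. It assumes the file holds a single JSON object and that the log time is stored under a `timestamp` key in ISO 8601 format; both are assumptions, since the M5.3 file layout is not shown here. The helper names `validate_metrics` and `load_and_validate` are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

# Field names come from the check above; the "timestamp" key is an assumption.
REQUIRED_FIELDS = ("quality_score", "duplicate_rate", "drift_score")
MAX_AGE = timedelta(hours=24)


def validate_metrics(record, now=None):
    """Return a list of problems found in one metrics record (empty if OK)."""
    now = now or datetime.now(timezone.utc)
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]

    raw_ts = record.get("timestamp")
    if raw_ts is None:
        problems.append("missing field: timestamp")
        return problems
    try:
        logged_at = datetime.fromisoformat(raw_ts)
    except (TypeError, ValueError):
        problems.append(f"unparseable timestamp: {raw_ts!r}")
        return problems
    if logged_at.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        logged_at = logged_at.replace(tzinfo=timezone.utc)
    if now - logged_at > MAX_AGE:
        problems.append(f"stale metrics: last logged {logged_at.isoformat()}")
    return problems


def load_and_validate(path):
    """Read the JSON file at `path` and validate it as a single record."""
    with open(path, encoding="utf-8") as fh:
        return validate_metrics(json.load(fh))
```

Calling `load_and_validate(METRICS_PATH)` in the `else` branch and printing any returned problems would turn the existence check into a freshness and schema check. Returning a list rather than raising keeps the notebook offline-friendly.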

---

## Readiness Check #2: Minimum Vector Count in Pinecone

**Requirement:** Minimum 5,000 vectors in Pinecone for meaningful M5.4 testing.

⚠️ **Note:** Backup/restore operations can overwhelm the free Pinecone tier. Test with a subset of vectors first.

Connect to Pinecone and query index statistics. Skips if API key is absent (offline-friendly).

```python
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_INDEX = os.getenv("PINECONE_INDEX", "rag-index")

if not PINECONE_API_KEY:
    print("⚠️ Skipping (no PINECONE_API_KEY)")
    print("   Set PINECONE_API_KEY to validate vector count")
else:
    try:
        # Stub: In production, connect and query index stats
        # from pinecone import Pinecone
        # pc = Pinecone(api_key=PINECONE_API_KEY)
        # index = pc.Index(PINECONE_INDEX)
        # stats = index.describe_index_stats()
        # vector_count = stats.total_vector_count

        # Expected: vector_count >= 5000
        print("✓ Pinecone configuration loaded")
        print(f"  Index: {PINECONE_INDEX}")
        print("  Expected: ≥5,000 vectors for M5.4 backup/restore testing")
    except Exception as e:
        print(f"⚠️ Pinecone check failed: {e}")
```
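Once the stubbed lines are enabled, the threshold comparison itself can be kept separate from the network call so it stays testable offline. The sketch below is one way to do that; `read_total_vector_count` and `check_vector_count` are hypothetical helper names, and the code assumes only that the stats result exposes `total_vector_count` either as an attribute (as in the stub above) or as a dict key.

```python
MIN_VECTORS = 5000  # threshold stated in the requirement above


def read_total_vector_count(stats):
    """Extract total_vector_count from a stats object or a plain dict."""
    if isinstance(stats, dict):
        return int(stats.get("total_vector_count", 0))
    return int(getattr(stats, "total_vector_count", 0))


def check_vector_count(stats, minimum=MIN_VECTORS):
    """Return (ok, message) for the vector-count readiness check."""
    count = read_total_vector_count(stats)
    if count >= minimum:
        return True, f"✓ {count:,} vectors (minimum {minimum:,})"
    return False, f"✗ {count:,} vectors, {minimum - count:,} short of {minimum:,}"
```

With a live connection, `ok, message = check_vector_count(index.describe_index_stats())` would report how far the index is from the 5,000-vector minimum.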

---

## Readiness Check #3: Required Metadata Fields

**Requirement:** All vectors include `document_id` and `version` metadata fields.

These fields enable targeted backup/restore operations and blue-green deployment version tracking in M5.4.

Query a sample vector to validate metadata schema. Skips if Pinecone is unavailable (offline-friendly).

```python
if not PINECONE_API_KEY:
    print("⚠️ Skipping (no PINECONE_API_KEY)")
else:
    try:
        # Stub: Query sample vectors and check metadata
        # index = pc.Index(PINECONE_INDEX)
        # Note: the query vector length (1536 here) must match the index dimension
        # sample = index.query(vector=[0]*1536, top_k=1, include_metadata=True)
        # metadata = sample['matches'][0]['metadata']
        # assert 'document_id' in metadata
        # assert 'version' in metadata

        # Expected: All vectors have 'document_id' and 'version' in metadata
        print("✓ Metadata schema check configured")
        print("  Required fields: document_id, version")
    except Exception as e:
        print(f"⚠️ Metadata check failed: {e}")
```
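The stub inspects a single match (`top_k=1`), which cannot say much about "all vectors". A sketch that reports gaps across a larger sample is shown below. It assumes each match has already been converted to a plain dict with `id` and `metadata` keys, mirroring the `sample['matches'][0]['metadata']` access in the stub; `find_metadata_gaps` is a hypothetical helper name.

```python
REQUIRED_METADATA = ("document_id", "version")


def find_metadata_gaps(matches, required=REQUIRED_METADATA):
    """Map each match id to the required metadata fields it lacks."""
    gaps = {}
    for match in matches:
        # Treat a missing or null metadata entry as an empty schema.
        metadata = match.get("metadata") or {}
        missing = [field for field in required if field not in metadata]
        if missing:
            gaps[match.get("id", "<unknown>")] = missing
    return gaps
```

An empty result means every sampled vector carries both fields. A sample is still only a sample: raising `top_k` widens coverage but does not prove the schema holds for the whole index.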

---

## Readiness Check #4: Prometheus Metrics Tracking Index Size

**Requirement:** Prometheus actively tracking `rag_documents_in_index` metric.

This metric feeds M5.4's index health monitoring and alerting for capacity planning.

Query Prometheus for the index size metric. Skips if Prometheus is unreachable (offline-friendly).

```python
PROMETHEUS_URL = os.getenv("PROMETHEUS_URL", "http://localhost:9090")

try:
    # Stub: Query Prometheus for the metric
    # import requests
    # response = requests.get(f"{PROMETHEUS_URL}/api/v1/query",
    #                        params={'query': 'rag_documents_in_index'},
    #                        timeout=2)
    # if response.ok and response.json()['data']['result']:
    #     metric_value = response.json()['data']['result'][0]['value'][1]
    #     print(f"✓ Metric value: {metric_value}")

    # Expected: Metric 'rag_documents_in_index' exists and is being tracked
    print("✓ Prometheus endpoint configured")
    print(f"  URL: {PROMETHEUS_URL}")
    print("  Expected metric: rag_documents_in_index")
except Exception as e:
    print(f"⚠️ Skipping (Prometheus not accessible): {e}")
```
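The stubbed request returns the standard Prometheus instant-query payload, in which each sample's value arrives as a `[timestamp, "string"]` pair. A small parser, sketched below, makes the "metric exists and has a value" decision explicit and can be exercised without a running Prometheus. `extract_metric_value` is a hypothetical helper name, and the sketch only reads the first series in the result.

```python
def extract_metric_value(payload):
    """Return the first sample of an instant-query response as a float, or None."""
    if payload.get("status") != "success":
        return None
    results = payload.get("data", {}).get("result", [])
    if not results:
        # Query succeeded but no series matched: the metric is not being tracked.
        return None
    # Each instant-vector sample is {"metric": {...}, "value": [timestamp, "string"]}.
    _timestamp, raw_value = results[0]["value"]
    return float(raw_value)
```

In the cell above, `extract_metric_value(response.json())` returning `None` would indicate that `rag_documents_in_index` is missing, which is a readiness gap rather than a connectivity error.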

---

## Call-Forward: What M5.4 (Vector Index Management) Will Introduce

Having validated that your data quality systems are operational and your vector infrastructure is ready, you're now prepared to tackle the next critical challenge: **infrastructure resilience**.

### The Problem
RAG systems face three major operational risks:
- **Data loss** from accidental deletions or corruption
- **Downtime** during index updates or schema migrations  
- **Performance degradation** that goes undetected until users complain

### What M5.4 Delivers

#### 1. Automated Backup and Restore
- **Nightly backups** with verification checksums
- **Recovery in minutes** rather than hours
- Enables confident experimentation and rollback

#### 2. Blue-Green Deployments
- **Zero-downtime** index migrations
- **Instant version switching** between old and new indices
- **Rollback capability** if issues are detected

#### 3. Index Health Monitoring
- **Automated alerts** for query latency spikes
- **Index size tracking** to prevent capacity issues
- **Corruption detection** before it impacts production

### Driving Question
*How do we ensure our RAG infrastructure is resilient, recoverable, and always available?*

M5.4 will answer this by implementing the operational safety net every production RAG system requires.

---

**Next Step:** Proceed to Module 5.4 to implement Vector Index Management.