# Bridge M4.1 → M4.2: Cost & Vendor Readiness

**Purpose:** Validate readiness for M4.2 (Beyond Pinecone Free Tier) by checking M4.1 deliverables and creating a cost model.

**Scope:** Artifact validation + local cost calculations (no external API calls).

---

## Section 1: Recap — What Hybrid Search Shipped

M4.1 delivered a production-grade hybrid search system with:

✓ **Dual-index retrieval** — BM25 sparse + Pinecone dense vectors, synchronized updates  
✓ **Weighted & RRF merging** — Alpha-weighted (0.5 default, 0.0-1.0 tunable) + Reciprocal Rank Fusion  
✓ **Dynamic alpha selection** — Query-dependent: exact codes (0.2), technical terms (0.4), natural language (0.7)  
✓ **5 critical failures debugged** — Index sync bugs, alpha tuning issues, tokenization mismatches, memory overflow, RRF ranking problems  
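
The two merge strategies listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the M4.1 implementation: the function names, the min-max normalization step, and the RRF constant `k=60` are all hypothetical choices. Alpha is treated as the dense weight, so 0.0 means pure BM25 and 1.0 means pure semantic.

```python
def min_max(scores):
    """Scale a {doc_id: score} dict into [0, 1] (assumed normalization step)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_merge(sparse, dense, alpha=0.5):
    """Alpha-weighted fusion: alpha weights dense, (1 - alpha) weights sparse."""
    sparse_n, dense_n = min_max(sparse), min_max(dense)
    docs = set(sparse_n) | set(dense_n)
    fused = {
        d: (1 - alpha) * sparse_n.get(d, 0.0) + alpha * dense_n.get(d, 0.0)
        for d in docs
    }
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


def rrf_merge(sparse_ranking, dense_ranking, k=60):
    """Reciprocal Rank Fusion over two ranked lists of doc ids (ranks start at 1)."""
    fused = {}
    for ranking in (sparse_ranking, dense_ranking):
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical scores for three documents.
bm25_scores = {"doc_a": 12.0, "doc_b": 3.0, "doc_c": 1.0}
dense_scores = {"doc_b": 0.91, "doc_c": 0.88, "doc_a": 0.10}

sparse_only = weighted_merge(bm25_scores, dense_scores, alpha=0.0)
dense_only = weighted_merge(bm25_scores, dense_scores, alpha=1.0)
rrf_order = rrf_merge(["doc_a", "doc_b", "doc_c"], ["doc_b", "doc_c", "doc_a"])
print(sparse_only, dense_only, rrf_order)
```

Weighted merging depends on score magnitudes, which is why it needs normalization; RRF uses only rank positions, so it needs no normalization but ignores how large the score gaps are.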

**Key Achievement:** Handles both "SKU-A1234" exact matches AND "how to secure user data" semantic queries.
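
A query-dependent alpha can be approximated with simple pattern checks. The sketch below is a hypothetical heuristic: only the three alpha values (0.2, 0.4, 0.7) come from the recap above, while the regex for codes and the "letters mixed with digits or underscores" rule for technical terms are assumptions made for illustration.

```python
import re

# Alpha values per query type, taken from the M4.1 recap.
ALPHA_EXACT_CODE = 0.2
ALPHA_TECHNICAL = 0.4
ALPHA_NATURAL = 0.7

# Assumed code shape: uppercase prefix, hyphen, alphanumeric body (e.g. "SKU-A1234").
CODE_PATTERN = re.compile(r"^[A-Z]{2,}-[A-Z0-9]+$")


def looks_technical(token):
    """Assumed heuristic: a token mixing letters with digits or underscores."""
    has_letter = any(c.isalpha() for c in token)
    has_marker = any(c.isdigit() or c == "_" for c in token)
    return has_letter and has_marker


def select_alpha(query):
    """Pick the dense weight for a query based on its surface pattern."""
    q = query.strip()
    if CODE_PATTERN.match(q):
        return ALPHA_EXACT_CODE
    if any(looks_technical(tok) for tok in q.split()):
        return ALPHA_TECHNICAL
    return ALPHA_NATURAL


for query in ("SKU-A1234", "configure oauth2 tokens", "how to secure user data"):
    print(query, "->", select_alpha(query))
```

A production classifier would likely need a domain-specific vocabulary; the point here is only that the routing decision happens before retrieval and costs almost nothing.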

## Section 2: Check — Alpha Tuning Evidence

**Requirement:** Document showing test results for alpha={0.0, 0.3, 0.5, 0.7, 1.0} with precision scores.

**Validation:** Check if `alpha_results.csv` exists; create stub if missing.

In [None]:
import os
import pandas as pd

# Check for alpha_results.csv
alpha_file = "alpha_results.csv"

if os.path.exists(alpha_file):
    df = pd.read_csv(alpha_file)
    print("✓ Alpha tuning results found!")
    print(f"\n{df.to_string(index=False)}")
else:
    print("⚠️ alpha_results.csv not found. Creating stub...")
    
    # Create stub with expected structure
    stub_data = {
        "alpha": [0.0, 0.3, 0.5, 0.7, 1.0],
        "query_type": ["exact_code", "technical", "mixed", "natural", "semantic"],
        "precision": [0.95, 0.85, 0.78, 0.82, 0.70],
        "notes": [
            "Pure BM25 - best for codes",
            "Sparse-heavy - good for technical",
            "Balanced - default",
            "Dense-heavy - good for natural language",
            "Pure semantic - worst for exact matches"
        ]
    }
    
    df = pd.DataFrame(stub_data)
    df.to_csv(alpha_file, index=False)
    print(f"✓ Created {alpha_file} with example data")
    print(f"\n{df.to_string(index=False)}")

# Expected: Table with alpha values 0.0-1.0, query types, precision scores
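
The existence check above does not confirm that the file actually covers the required alpha sweep. A minimal completeness check might look like the sketch below; the function name and the problem messages are hypothetical, and it uses the standard `csv` module so it runs without pandas.

```python
import csv
import io

REQUIRED_ALPHAS = {0.0, 0.3, 0.5, 0.7, 1.0}
REQUIRED_COLUMNS = {"alpha", "precision"}


def validate_alpha_rows(rows):
    """Return a list of problems; an empty list means the table looks complete."""
    rows = list(rows)
    if not rows or not REQUIRED_COLUMNS <= set(rows[0]):
        return ["missing rows or required columns (alpha, precision)"]
    problems = []
    found = {round(float(r["alpha"]), 2) for r in rows}
    missing = sorted(REQUIRED_ALPHAS - found)
    if missing:
        problems.append(f"missing alpha values: {missing}")
    if any(not 0.0 <= float(r["precision"]) <= 1.0 for r in rows):
        problems.append("precision outside [0, 1]")
    return problems


# Hypothetical incomplete file: alpha=0.3 and alpha=0.7 were never tested.
sample = io.StringIO("alpha,precision\n0.0,0.95\n0.5,0.78\n1.0,0.70\n")
problems = validate_alpha_rows(csv.DictReader(sample))
print(problems)
```

To run it against the real artifact, open `alpha_results.csv` with `newline=""` and pass `csv.DictReader(f)` to the same function.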

## Section 3: Check — 5 Common Failures Documented

**Requirement:** Document listing all 5 failures from M4.1:
1. Index synchronization bugs
2. Alpha tuning producing poor results
3. Tokenization mismatches
4. Memory overflow at scale
5. RRF counterintuitive ranking

**Validation:** Check that `failures_log.md` exists; create a stub if it is missing.

In [None]:
failures_file = "failures_log.md"

if os.path.exists(failures_file):
    print(f"✓ Failures log found: {failures_file}")
    with open(failures_file, 'r') as f:
        preview = f.read()[:500]
        print(f"\nPreview:\n{preview}...")
else:
    print("⚠️ failures_log.md not found. Creating stub...")
    
    stub_content = """# M4.1 Hybrid Search: 5 Common Failures

## 1. Index Synchronization Bugs
- **Issue:** BM25 and Pinecone indices out of sync after document updates
- **Symptom:** Results missing from one index but present in the other
- **Fix:** Implement atomic batch updates with rollback on failure

## 2. Alpha Tuning Producing Poor Results
- **Issue:** Default alpha=0.5 not optimal for all query types
- **Symptom:** Exact code searches return irrelevant semantic results
- **Fix:** Dynamic alpha selection based on query pattern detection

## 3. Tokenization Mismatches
- **Issue:** BM25 tokenizer differs from embedding model tokenizer
- **Symptom:** Queries tokenize differently across systems
- **Fix:** Standardize preprocessing pipeline for both indices

## 4. Memory Overflow at Scale
- **Issue:** In-memory BM25 index exceeds RAM at 1M+ documents
- **Symptom:** OOM errors during index rebuild
- **Fix:** Use disk-backed BM25 index (Redis/Elasticsearch) or chunked processing

## 5. RRF Counterintuitive Ranking
- **Issue:** Lower-ranked results from both systems beat high-ranked single-system results
- **Symptom:** Mediocre results ranked higher than excellent single-source results
- **Fix:** Add score threshold filtering before RRF merge
"""
    
    with open(failures_file, 'w') as f:
        f.write(stub_content)
    
    print(f"✓ Created {failures_file}")
    print(f"\nContains all 5 failures from M4.1 bridge requirements")

# Expected: Markdown file with 5 failure types, symptoms, fixes
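
An existing `failures_log.md` may still be incomplete. The sketch below checks that each of the five failure topics appears in a `## ` heading; the keyword list and function name are assumptions, chosen to match the requirement list in this section rather than any fixed file format.

```python
# Assumed keywords, one per required failure topic.
EXPECTED_FAILURES = [
    "Index Synchronization",
    "Alpha Tuning",
    "Tokenization",
    "Memory Overflow",
    "RRF",
]


def missing_failures(markdown_text):
    """Return expected failure topics that have no matching '## ' heading."""
    headings = [
        line[3:].lower()
        for line in markdown_text.splitlines()
        if line.startswith("## ")
    ]
    return [
        topic
        for topic in EXPECTED_FAILURES
        if not any(topic.lower() in heading for heading in headings)
    ]


# Hypothetical partial log with only two of the five sections written.
partial_log = "# Log\n\n## 1. Index Synchronization Bugs\n\n## 5. RRF Counterintuitive Ranking\n"
gaps = missing_failures(partial_log)
print(gaps)
```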

## Section 4: Check — Decision Card Screenshot/Text

**Requirement:** M4.1 Decision Card with "USE WHEN" criteria for reference in vendor evaluation.

**Validation:** Check for `decision_card/` directory or files matching `*decision*.{png,jpg,txt,md}`.

In [None]:
from pathlib import Path
import glob

# Check for decision card artifacts
decision_card_dir = Path("decision_card")
decision_files = glob.glob("*decision*.*") + glob.glob("decision_card/*")

if decision_card_dir.exists():
    files = list(decision_card_dir.glob("*"))
    print(f"✓ Decision card directory found with {len(files)} file(s)")
    for f in files:
        print(f"  - {f.name}")
elif decision_files:
    print(f"✓ Decision card file(s) found:")
    for f in decision_files:
        print(f"  - {f}")
else:
    print("⚠️ No decision card artifacts found. Creating reference...")
    
    decision_card_dir.mkdir(exist_ok=True)
    
    reference_content = """# M4.1 Hybrid Search Decision Card

## USE WHEN:

✓ **Exact match queries mixed with semantic search**  
  Example: "SKU-A1234" + "how to secure user data"

✓ **Domain has specialized terminology**  
  Technical terms, product codes, jargon that embeddings miss

✓ **Query distribution unknown upfront**  
  Need system to handle both keyword and natural language

✓ **Precision matters more than pure recall**  
  Better to return 5 perfect results than 20 mediocre ones

## DON'T USE WHEN:

✗ Pure semantic search sufficient (no exact match requirements)  
✗ Operational complexity outweighs benefits  
✗ Sub-100ms latency critical (dual-index adds 20-50ms)  
✗ Budget constraints prevent dual infrastructure

## KEY CRITERIA:

- Query diversity: >30% queries benefit from both indices
- Alpha tuning: Can identify optimal alpha for your domain (0.3-0.7 typical)
- Infrastructure: Can maintain two synchronized indices
- Scale: <5M documents (in-memory BM25) OR disk-backed BM25 available

## VENDOR EVALUATION:

These criteria apply when comparing Pinecone, Weaviate, Qdrant, Milvus:
- Does vendor offer native hybrid search (built-in BM25)?
- If not, can you run external BM25 efficiently?
- Cost of dual-index vs single-index at your scale?
"""
    
    decision_file = decision_card_dir / "M4.1_decision_card.md"
    with open(decision_file, 'w', encoding='utf-8') as f:  # content contains ✓/✗, so force UTF-8
        f.write(reference_content)
    
    print(f"✓ Created {decision_file}")
    print("  Note: Reference document for M4.2 vendor comparison")

# Expected: Directory or files containing decision criteria for hybrid search
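
The stated validation pattern restricts matches to `png`, `jpg`, `txt`, and `md`, while a plain `*decision*.*` glob accepts any extension. A stricter lookup could be sketched like this; the helper name is hypothetical, and the temporary directory exists only to make the example self-contained.

```python
import tempfile
from pathlib import Path

ALLOWED_SUFFIXES = {".png", ".jpg", ".txt", ".md"}


def find_decision_files(root):
    """List files in root whose name contains 'decision' and has an allowed suffix."""
    root = Path(root)
    return sorted(
        p
        for p in root.glob("*decision*")
        if p.is_file() and p.suffix.lower() in ALLOWED_SUFFIXES
    )


# Demonstrate on throwaway files so nothing in the working directory is touched.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("decision_card.md", "decision_notes.docx", "readme.md"):
        (Path(tmp) / name).write_text("placeholder", encoding="utf-8")
    matches = [p.name for p in find_decision_files(tmp)]

print(matches)
```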

## Section 5: Mini Cost Model — Local Calculator

**Goal:** Calculate monthly costs for hybrid search at scale: 100K, 500K, 1M, 5M, 10M vectors.

**Components:**
- Vector database (Pinecone, Weaviate, Qdrant, self-hosted)
- BM25 index hosting (in-memory, Redis, managed)
- Embedding API costs (OpenAI/alternatives)
- DevOps overhead for self-hosted

**Output:** `cost_model.csv` with vendor comparison.

In [None]:
# Cost Model Calculator (Local - No API Calls)

# Pricing assumptions (2024 estimates - adjust in README)
PINECONE_POD_COST = 280  # per pod/month for Standard tier
WEAVIATE_CLOUD_BASE = 100  # per 500K vectors
QDRANT_CLOUD_BASE = 90  # per 500K vectors
AWS_EC2_BASE = 60  # t3.xlarge 4vCPU, 16GB RAM/month
BM25_REDIS_COST = 50  # managed Redis for BM25
EMBEDDING_COST_PER_1M = 10  # assumed USD per 1M query embeddings (OpenAI ada-002)
DEVOPS_HOURLY = 50  # DevOps hourly rate
DEVOPS_HOURS_SELFHOST = 20  # hours/month for self-hosted

def calculate_costs(vector_count):
    """Calculate monthly costs for different vendor options."""
    
    # Pinecone - scales with pod count
    if vector_count <= 100000:
        pinecone_cost = 0  # Free tier
    else:
        # Ceiling division: e.g. 1.5M vectors needs 2 pods, not 1 (assumes ~1M vectors per pod)
        pods_needed = max(1, -(-vector_count // 1_000_000))
        pinecone_cost = pods_needed * PINECONE_POD_COST
    
    # Weaviate Cloud - linear scaling
    if vector_count <= 100000:
        weaviate_cost = 0  # Free tier
    else:
        weaviate_cost = (vector_count / 500000) * WEAVIATE_CLOUD_BASE
    
    # Qdrant Cloud - similar to Weaviate
    if vector_count <= 100000:
        qdrant_cost = 0
    else:
        qdrant_cost = (vector_count / 500000) * QDRANT_CLOUD_BASE
    
    # Self-hosted AWS - instance sizing
    if vector_count <= 100000:
        aws_cost = AWS_EC2_BASE  # Single instance
    elif vector_count <= 1000000:
        aws_cost = AWS_EC2_BASE * 2  # Scale up
    elif vector_count <= 5000000:
        aws_cost = AWS_EC2_BASE * 4
    else:
        aws_cost = AWS_EC2_BASE * 8
    
    # BM25 hosting - scales with document count
    if vector_count <= 100000:
        bm25_cost = 0  # In-memory on app server
    elif vector_count <= 1000000:
        bm25_cost = BM25_REDIS_COST
    else:
        bm25_cost = BM25_REDIS_COST * 2  # Larger Redis cluster
    
    # Embedding costs (assume 10K queries/day baseline)
    embedding_cost = (10000 * 30 / 1000000) * EMBEDDING_COST_PER_1M
    
    # DevOps overhead for self-hosted
    devops_cost = DEVOPS_HOURLY * DEVOPS_HOURS_SELFHOST
    
    return {
        "vectors": vector_count,
        "pinecone": pinecone_cost + bm25_cost + embedding_cost,
        "weaviate": weaviate_cost + embedding_cost,  # Native hybrid search
        "qdrant": qdrant_cost + bm25_cost + embedding_cost,
        "self_hosted": aws_cost + bm25_cost + embedding_cost + devops_cost
    }

# Calculate for target scales
scales = [100_000, 500_000, 1_000_000, 5_000_000, 10_000_000]
results = [calculate_costs(s) for s in scales]

# Create DataFrame
cost_df = pd.DataFrame(results)
cost_df['optimal_choice'] = cost_df[['pinecone', 'weaviate', 'qdrant', 'self_hosted']].idxmin(axis=1)

print("✓ Cost Model Generated (Monthly USD)\n")
print(cost_df.to_string(index=False))

# Save to CSV
cost_df.to_csv("cost_model.csv", index=False)
print("\n✓ Saved to cost_model.csv")

# Expected: Table with 5 scale tiers, 4 vendor options, optimal choice column
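
As a sanity check on the calculator, the 1M-vector row can be worked by hand. The sketch below repeats the same assumed prices so that it runs on its own; none of the figures are vendor quotes, and the variable names `row` and `cheapest` are introduced only for this example.

```python
# Same assumed monthly prices (USD) as the calculator cell.
PINECONE_POD_COST = 280
WEAVIATE_CLOUD_BASE = 100
QDRANT_CLOUD_BASE = 90
AWS_EC2_BASE = 60
BM25_REDIS_COST = 50
EMBEDDING_COST_PER_1M = 10
DEVOPS_HOURLY = 50
DEVOPS_HOURS_SELFHOST = 20

vectors = 1_000_000

# 10K queries/day * 30 days = 0.3M queries/month.
embedding = (10_000 * 30 / 1_000_000) * EMBEDDING_COST_PER_1M

row = {
    # One pod, plus external BM25 on a single managed Redis.
    "pinecone": 1 * PINECONE_POD_COST + BM25_REDIS_COST + embedding,
    # Native hybrid search, so no separate BM25 line item.
    "weaviate": (vectors / 500_000) * WEAVIATE_CLOUD_BASE + embedding,
    "qdrant": (vectors / 500_000) * QDRANT_CLOUD_BASE + BM25_REDIS_COST + embedding,
    # 1M vectors falls in the "two instances" tier; DevOps time dominates.
    "self_hosted": 2 * AWS_EC2_BASE + BM25_REDIS_COST + embedding
    + DEVOPS_HOURLY * DEVOPS_HOURS_SELFHOST,
}
cheapest = min(row, key=row.get)

for vendor, cost in row.items():
    print(f"{vendor:12s} ${cost:,.2f}")
print("cheapest:", cheapest)
```

Under these assumptions, the fixed DevOps line (20 hours at $50/hour) accounts for most of the self-hosted total at this scale, which is why self-hosting ranks last in this row.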

## Section 6: Call-Forward — Vendor & Self-Host Readiness

**M4.2 Preview:** Beyond Pinecone Free Tier evaluation includes:

1. **Weaviate** — Open-source with native hybrid search (no separate BM25)
2. **Qdrant** — Rust-based, 10x performance claims, lower memory footprint
3. **Milvus** — Enterprise-scale (billions of vectors), GPU acceleration
4. **Self-Host vs Managed** — Economics of infrastructure + DevOps overhead

**Decision Framework:** When does paying more for managed beat self-hosting at your scale?

In [None]:
# Vendor Comparison Summary (from bridge document)

vendor_summary = """
# Vendor Comparison for M4.2

## Weaviate (Open-Source Native Hybrid)
✓ Built-in hybrid search (BM25 + vector in single system)
✓ Self-host or cloud options
✓ GraphQL API
✗ Learning curve for schema design
→ Best for: Simplifying architecture, avoiding dual-index complexity

## Qdrant (Rust Performance Focus)
✓ Claims 10x faster than competitors
✓ Lower memory footprint
✓ Disk-based indexes for cost savings
✓ Python/REST APIs
✗ Smaller ecosystem than Pinecone
→ Best for: Cost-sensitive high-performance needs

## Milvus (Enterprise Scale)
✓ Handles billions of vectors
✓ GPU acceleration support
✓ Kubernetes-native deployment
✗ Complex setup (requires 5+ services)
✗ Overkill for <50M vectors
→ Best for: Massive scale (50M+ vectors)

## Self-Host vs Managed Decision Factors

| Factor | Managed Service | Self-Hosted |
|--------|----------------|-------------|
| Upfront Cost | Higher monthly fee | Server costs + setup time |
| DevOps Overhead | $0 | 20+ hours/month |
| Scalability | One-click scaling | Manual provisioning |
| Security | Vendor-managed | Your responsibility |
| Customization | Limited | Full control |

**Breakeven Analysis:**
- Example: managed costs $2,000/month; self-host costs $900/month infrastructure + $1,000 DevOps (20 hours at $50/hour) = $1,900/month
- Managed wins if DevOps time > 22 hours/month (($2,000 - $900) / $50/hour)
- Self-host wins if you have in-house expertise and predictable load

## Key Question for M4.2:
"At MY scale, which vector DB gives best price-performance with hybrid search?"
"""

print(vendor_summary)

# Write summary to file
with open("vendor_comparison_summary.md", 'w', encoding='utf-8') as f:  # summary contains ✓/✗/→
    f.write(vendor_summary)

print("\n✓ Saved to vendor_comparison_summary.md")

# Expected: Markdown summary of vendor options from bridge document
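
The breakeven question reduces to one line of arithmetic: self-hosting costs infrastructure plus DevOps hours times the hourly rate, so the breakeven point is the number of hours at which that sum equals the managed fee. The function below is a sketch with hypothetical names, using the figures from the summary ($2,000 managed, $900 self-host infrastructure, $50/hour).

```python
def breakeven_devops_hours(managed_monthly, selfhost_infra_monthly, devops_hourly):
    """DevOps hours/month at which self-hosting costs the same as managed."""
    if devops_hourly <= 0:
        raise ValueError("devops_hourly must be positive")
    return (managed_monthly - selfhost_infra_monthly) / devops_hourly


hours = breakeven_devops_hours(2000, 900, 50)
print(f"Breakeven at {hours:.0f} DevOps hours/month")
```

Below the breakeven number of hours, self-hosting is cheaper; above it, the managed service is. A negative result would mean the managed service is cheaper than the self-hosted infrastructure alone, before any DevOps time is counted.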