# M4.1 - Hybrid Search (Sparse + Dense)

## Learning Arc

**Purpose**
This module teaches production-grade hybrid search combining BM25 keyword matching with dense vector embeddings. You'll learn when hybrid search justifies its complexity overhead and when simpler approaches suffice.

**Concepts Covered**
- BM25 sparse retrieval for exact keyword matching
- Dense vector embeddings for semantic understanding
- Alpha weighting to balance sparse and dense scores
- Reciprocal Rank Fusion (RRF) for rank-based merging
- Smart query analysis to auto-tune search strategy
- Cost-benefit analysis of latency, complexity, and accuracy trade-offs

**After Completing**
You will be able to:
- Implement hybrid search with configurable merge strategies
- Tune alpha weights based on query characteristics  
- Decide when hybrid search adds value vs overhead
- Debug common issues (normalization, namespace mismatches)
- Estimate costs and latency for production deployments
- Choose between alpha weighting and RRF for your use case

**Context in Track**
- **Prerequisite**: M1 (Vector Databases), M2 (Cost Optimization)
- **Builds on**: Dense embeddings from M1.4, cost awareness from M2.1
- **Prepares for**: M4.2 (Beyond Free Tier), M4.3 (Portfolio Projects)
- **Real-world**: E-commerce search, technical docs, mixed-content platforms

---
## Section 1: Prereq Check & Reality Check

Before diving into hybrid search, let's verify our environment and set realistic expectations.

### 1.1 Dependency Check

First, we'll check that all required libraries for this module (like `openai`, `pinecone`, and `rank_bm25`) are installed in your environment. If any are missing, a `pip install` command will be suggested.

In [None]:
# Import required libraries
import os
import sys
from dotenv import load_dotenv

# Check library availability
required_libs = {
    'openai': '1.46.0',
    'rank_bm25': '0.2.2',
    'nltk': '3.9.1',
    'numpy': '1.26.4',
    'pinecone': '3.0.0'
}

print("📦 Checking Dependencies...\n")
missing = []

for lib, version in required_libs.items():
    try:
        if lib == 'rank_bm25':
            import rank_bm25
            print(f"✓ {lib} installed")
        elif lib == 'pinecone':
            from pinecone import Pinecone
            print(f"✓ {lib} installed")
        else:
            __import__(lib)
            print(f"✓ {lib} installed")
    except ImportError:
        print(f"✗ {lib} MISSING (required: {version})")
        missing.append(lib)

if missing:
    print(f"\n⚠ Install missing: pip install {' '.join(missing)}")
else:
    print("\n✓ All dependencies installed!")

### 1.2 API Key & Environment Setup

This cell loads your `.env` file and verifies your `OPENAI_API_KEY` and `PINECONE_API_KEY`. The notebook will run in **"FULL"** mode if keys are found, or **"STUB"** mode (with limited features) if they are missing.

In [None]:
# Load environment variables
load_dotenv()

print("🔑 Checking API Keys...\n")

openai_key = os.getenv('OPENAI_API_KEY', '')
pinecone_key = os.getenv('PINECONE_API_KEY', '')

# Check OpenAI
if openai_key and openai_key.startswith('sk-'):
    print(f"✓ OpenAI API key found ({openai_key[:8]}...)")
    OPENAI_OK = True
else:
    print("✗ OpenAI API key missing or invalid")
    OPENAI_OK = False

# Check Pinecone
if pinecone_key and len(pinecone_key) > 10:
    print(f"✓ Pinecone API key found ({pinecone_key[:8]}...)")
    PINECONE_OK = True
else:
    print("✗ Pinecone API key missing")
    PINECONE_OK = False

# Set mode
if OPENAI_OK and PINECONE_OK:
    MODE = "FULL"
    print("\n🚀 Running in FULL mode with API access")
else:
    MODE = "STUB"
    print("\n🔧 Running in STUB mode (limited functionality)")
    print("   → BM25 and smart_alpha will work")
    print("   → Dense/Pinecone features will use stubs")

### 1.3 Reality Check: When to Use Hybrid Search

Before we build, it's critical to know *why*. Hybrid search adds complexity. This "reality check" outlines the specific scenarios where it provides significant value (e.g., mixed queries, product SKUs) and when it's better to avoid it (e.g., small datasets, pure conversational search).

In [None]:
# Reality Check: When to use hybrid search
print("📊 Reality Check: Hybrid Search Trade-offs\n")
print("="*60)

print("\n✅ USE HYBRID SEARCH WHEN:")
print("  • Mixed queries: natural language + technical terms/codes")
print("  • Product catalogs with SKUs, IDs, or model numbers")
print("  • Need 40-60% improvement on exact match accuracy")
print("  • Reducing false positives from pure semantic search")

print("\n❌ AVOID HYBRID SEARCH WHEN:")
print("  • Corpus < 1,000 docs (overhead not justified)")
print("  • Need P99 latency < 50ms (hybrid totals ~85-140ms; see Section 6)")
print("  • 90%+ purely conversational queries (dense-only better)")
print("  • Limited resources (two indexes to maintain)")

print("\n💰 COST CONSIDERATIONS:")
print("  • Development: 12-16 hours for a proper implementation")
print("  • Scale: In-memory BM25 viable for <100K docs")
print("  • Beyond 100K: Need Elasticsearch ($150-500/month)")
print("  • Query cost: 2+ API calls per search")

print("\n" + "="*60)
print("\n✓ Prereqs verified. Ready to proceed!")

In [None]:
print("\n📝 SAVED_SECTION: 1")

---
## Section 2: BM25 vs Dense Recap

Understanding the strengths and weaknesses of each approach is crucial for effective hybrid search.

### 2.1 BM25 vs Dense Comparison Table

Let's compare the two retrieval approaches side-by-side to understand their complementary strengths and weaknesses.

In [None]:
# Comparison table
import pandas as pd

comparison_data = {
    'Aspect': [
        'Algorithm',
        'Strengths',
        'Weaknesses',
        'Best For',
        'Latency',
        'Cost'
    ],
    'BM25 (Sparse)': [
        'Term frequency + IDF + doc length',
        'Exact matches, technical terms, IDs/SKUs',
        'No semantic understanding, synonym issues',
        'Keyword search, product codes, exact phrases',
        '<1ms (in-memory)',
        'Free (in-memory)'
    ],
    'Dense (Embeddings)': [
        'Neural network vector similarity',
        'Semantic understanding, synonyms, context',
        'May miss exact terms, semantically close false positives',
        'Natural language, conceptual queries',
        '50-100ms (API + vector DB)',
        '$0.0001/query (embedding cost)'
    ]
}

df = pd.DataFrame(comparison_data)
print("📊 BM25 vs Dense Comparison\n")
print(df.to_string(index=False))

print("\n" + "="*80)
print("Key Insight: Hybrid combines BOTH approaches to cover each other's weaknesses!")
print("="*80)
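
To make the "Term frequency + IDF + doc length" row concrete, here is a minimal from-scratch sketch of BM25 scoring. It is illustrative only: the function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, and the non-negative IDF variant are assumptions for this sketch and may differ from what `rank_bm25` computes internally.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against the query (hypothetical helper)."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for doc in corpus_tokens:
        doc_freq.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # IDF: rare terms (like a SKU) weigh more than common ones.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: long documents are penalized via b.
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


corpus = [
    "python is a programming language".split(),
    "product sku abc-12345 is in stock".split(),
    "order number 67890 shipped yesterday".split(),
]
scores = bm25_scores(["abc-12345"], corpus)
best = max(range(len(scores)), key=scores.__getitem__)
print(f"Best match: document {best}")
```

Only the document containing the exact token receives a non-zero score, which is the behavior that makes BM25 useful for SKUs and IDs.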

### 2.2 Quick BM25 Demo

Here's a hands-on demo of BM25 in action. This works even in STUB mode (no API keys required) since BM25 runs entirely in-memory.

In [None]:
# Quick BM25 demo (works in STUB mode)
from rank_bm25 import BM25Okapi
from nltk.tokenize import word_tokenize
import nltk

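# Download NLTK tokenizer data if needed.
# NLTK 3.9+ loads 'punkt_tab'; older releases use 'punkt'.
for resource in ('punkt_tab', 'punkt'):
    try:
        nltk.data.find(f'tokenizers/{resource}')
    except LookupError:
        nltk.download(resource, quiet=True)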

# Sample documents
docs = [
    "Python is a high-level programming language",
    "Machine learning algorithms use neural networks",
    "Product SKU ABC-12345 is available in stock",
    "Natural language processing enables text understanding",
    "Order number 67890 shipped yesterday"
]

# Tokenize and build BM25 index
tokenized = [word_tokenize(doc.lower()) for doc in docs]
bm25 = BM25Okapi(tokenized)

print("🔍 BM25 Examples:\n")

# Test queries
test_queries = [
    "ABC-12345",           # Exact code match
    "programming python",  # Keywords
    "understanding text"   # Reordered keywords (BM25 ignores word order)
]

for query in test_queries:
    tokenized_query = word_tokenize(query.lower())
    scores = bm25.get_scores(tokenized_query)
    top_idx = scores.argmax()
    
    print(f"Query: '{query}'")
    print(f"  → Top match: {docs[top_idx][:60]}...")
    print(f"  → Score: {scores[top_idx]:.4f}\n")

In [None]:
print("\n📝 SAVED_SECTION: 2")

---
## Section 3: Index Build (Dense + Sparse)

Let's build both indexes using our HybridSearchEngine class.

### 3.1 Initialize HybridSearchEngine

We'll import and initialize our `HybridSearchEngine` class, which manages both BM25 and dense vector search. The engine auto-detects API keys and adjusts functionality accordingly.

In [None]:
# Import our hybrid search module
import sys
sys.path.insert(0, '..')

from m4_1_hybrid_search import HybridSearchEngine

# Initialize engine
engine = HybridSearchEngine(
    openai_api_key=os.getenv('OPENAI_API_KEY'),
    pinecone_api_key=os.getenv('PINECONE_API_KEY'),
    index_name=os.getenv('PINECONE_INDEX_NAME', 'hybrid-search'),
    namespace='m4-demo'
)

print("✓ HybridSearchEngine initialized")
print(f"  Mode: {MODE}")
print(f"  Namespace: {engine.namespace}")

### 3.2 Create Sample Dataset

Let's create a sample dataset with a mix of technical content (SKUs, model numbers) and natural language descriptions. This diversity is perfect for demonstrating hybrid search benefits.

In [None]:
# Sample dataset: Mix of technical and natural language
documents = [
    {
        "id": "doc1",
        "text": "Python is a versatile programming language used for web development, data science, and automation.",
        "metadata": {"category": "programming", "difficulty": "beginner"}
    },
    {
        "id": "doc2",
        "text": "Machine learning algorithms like neural networks can recognize patterns in large datasets.",
        "metadata": {"category": "ai", "difficulty": "advanced"}
    },
    {
        "id": "doc3",
        "text": "Product SKU ABC-12345 is a wireless keyboard with RGB backlight. Model number KBD-2024.",
        "metadata": {"category": "product", "in_stock": True}
    },
    {
        "id": "doc4",
        "text": "Natural language processing enables computers to understand human text and speech.",
        "metadata": {"category": "ai", "difficulty": "intermediate"}
    },
    {
        "id": "doc5",
        "text": "Order #67890 contains 3 items: laptop, mouse, and headphones. Shipped via FedEx.",
        "metadata": {"category": "order", "shipped": True}
    },
    {
        "id": "doc6",
        "text": "Deep learning models require GPU acceleration for efficient training on large corpora.",
        "metadata": {"category": "ai", "difficulty": "advanced"}
    },
    {
        "id": "doc7",
        "text": "Monitor model MON-4K-27 has 3840x2160 resolution. Part number DISPLAY-789.",
        "metadata": {"category": "product", "in_stock": False}
    },
    {
        "id": "doc8",
        "text": "JavaScript frameworks like React enable building interactive user interfaces for web applications.",
        "metadata": {"category": "programming", "difficulty": "intermediate"}
    }
]

print(f"📚 Created {len(documents)} sample documents")
print(f"   Categories: {set(d['metadata']['category'] for d in documents)}")
print("\nSample document:")
print(f"  ID: {documents[0]['id']}")
print(f"  Text: {documents[0]['text'][:60]}...")
print(f"  Metadata: {documents[0]['metadata']}")

### 3.3 Build Both Indexes

Now we'll build both the BM25 (sparse) and Pinecone (dense) indexes. BM25 builds instantly in-memory, while dense indexing requires embedding generation and Pinecone upsert (10-30 seconds in FULL mode).

In [None]:
# Step 1: Build BM25 index (works in all modes)
print("🔨 Building BM25 index...")
engine.add_documents(documents)
print("✓ BM25 index built!\n")

# Step 2: Build dense index (requires API keys)
if MODE == "FULL":
    print("🔨 Building dense index (Pinecone)...")
    print("   This will generate embeddings and upsert to Pinecone...")
    print("   (This may take 10-30 seconds)\n")
    
    try:
        engine.upsert_to_pinecone(batch_size=4)
        print("\n✓ Dense index built!")
    except Exception as e:
        print(f"\n⚠ Dense indexing failed: {e}")
        print("   Falling back to STUB mode for dense search")
        MODE = "STUB"
else:
    print("🔧 STUB mode: Skipping Pinecone upsert")
    print("   Dense search will return empty results")
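
The `batch_size=4` argument above implies that documents are embedded and upserted in small groups. How `upsert_to_pinecone` batches internally is not shown in this notebook, so the helper below (`batched`, a hypothetical name) is only a sketch of the general chunking pattern, not the engine's actual code.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Eight document ids, mirroring the size of the sample dataset.
doc_ids = [f"doc{i}" for i in range(1, 9)]
batches = list(batched(doc_ids, batch_size=4))
print(f"{len(batches)} batches: {batches}")
```

Smaller batches mean more round trips but smaller requests; the right size depends on document length and API limits.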

In [None]:
print("\n📝 SAVED_SECTION: 3")

---
## Section 4: Alpha Tuning (0.2 / 0.5 / 0.8)

Alpha controls the weight between dense and sparse retrieval:
- `alpha = 1.0`: Pure dense (semantic only)
- `alpha = 0.5`: Equal weighting
- `alpha = 0.0`: Pure sparse (BM25 only)

Let's test different alpha values on various query types.
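
As a rough illustration of what alpha weighting does, the sketch below normalizes each score set by its maximum and then blends them as `alpha * dense + (1 - alpha) * sparse`. The function names `normalize_by_max` and `alpha_merge` and the sample scores are made up for this example, and it assumes non-negative scores; the engine's `hybrid_search_alpha` may normalize differently.

```python
from typing import Dict, List, Tuple


def normalize_by_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores into [0, 1] by dividing by the maximum (assumes scores >= 0)."""
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return dict(scores)
    return {doc_id: s / top for doc_id, s in scores.items()}


def alpha_merge(dense: Dict[str, float], sparse: Dict[str, float],
                alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Blend normalized dense and sparse scores (hypothetical helper)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    dense_n = normalize_by_max(dense)
    sparse_n = normalize_by_max(sparse)
    merged = {
        doc_id: alpha * dense_n.get(doc_id, 0.0) + (1 - alpha) * sparse_n.get(doc_id, 0.0)
        for doc_id in set(dense_n) | set(sparse_n)
    }
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up scores on very different scales (cosine similarity vs raw BM25).
dense = {"doc2": 0.82, "doc4": 0.79, "doc3": 0.40}
sparse = {"doc3": 7.1, "doc2": 1.3}

for alpha in (0.0, 0.5, 1.0):
    top_id, top_score = alpha_merge(dense, sparse, alpha)[0]
    print(f"alpha={alpha}: top={top_id} (score={top_score:.3f})")
```

Without the normalization step, the raw BM25 values would swamp the cosine similarities regardless of alpha.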

### 4.1 Smart Alpha Detection

The `smart_alpha()` function analyzes query patterns (SKU codes, technical terms, natural language) and automatically suggests the optimal alpha value. Let's see it in action.

In [None]:
# Test queries with different characteristics
test_queries = [
    {
        "query": "ABC-12345",
        "type": "Exact SKU",
        "expected": "Should favor BM25 (low alpha)"
    },
    {
        "query": "understanding human language",
        "type": "Natural language",
        "expected": "Should favor dense (high alpha)"
    },
    {
        "query": "GPU training models",
        "type": "Mixed technical + concepts",
        "expected": "Balanced alpha"
    }
]

print("🧪 Testing Smart Alpha Detection\n")
print("="*80)

for test in test_queries:
    alpha = engine.smart_alpha(test["query"])
    print(f"\nQuery: '{test['query']}'")
    print(f"Type: {test['type']}")
    print(f"Smart Alpha: {alpha:.2f}")
    print(f"Expected: {test['expected']}")
    
print("\n" + "="*80)
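
The implementation of `engine.smart_alpha()` lives in the `m4_1_hybrid_search` module and is not shown here. To give a feel for how such a heuristic might work, the sketch below uses simple token checks. Everything in it is an assumption for illustration: the function name `guess_alpha`, the rules, and the returned values (only the ~0.3 target for SKU-like queries echoes the troubleshooting guide in Section 7).

```python
def looks_like_code(token: str) -> bool:
    """True for SKU-like tokens such as 'ABC-12345' or long digit runs like '67890'."""
    has_digit = any(c.isdigit() for c in token)
    has_alpha = any(c.isalpha() for c in token)
    if has_digit and (has_alpha or "-" in token):
        return True
    return token.isdigit() and len(token) >= 4


def guess_alpha(query: str) -> float:
    """Hypothetical heuristic: low alpha favors BM25, high alpha favors dense."""
    tokens = [t.strip("#.,;:!?()\"'") for t in query.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.5
    if any(looks_like_code(t) for t in tokens):
        return 0.3  # exact identifiers: lean on keyword matching
    if any(t.isalpha() and t.isupper() and len(t) >= 2 for t in tokens):
        return 0.5  # acronyms such as 'GPU': keep both signals balanced
    if len(tokens) >= 3:
        return 0.8  # plain natural language: lean on embeddings
    return 0.5


for q in ["ABC-12345", "understanding human language", "GPU training models"]:
    print(f"{q!r} -> alpha={guess_alpha(q)}")
```

A production heuristic would need tuning against real query logs; fixed rules like these are only a starting point.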

### 4.2 Alpha Comparison Test

Let's test the same query with different alpha values (0.2, 0.5, 0.8) to see how the weighting affects result rankings. Lower alpha favors BM25, higher favors dense embeddings.

In [None]:
# Compare alpha values on BM25-only query (STUB mode compatible)
print("📊 Alpha Comparison on BM25 Search\n")
print("Query: 'keyboard RGB backlight'\n")

query = "keyboard RGB backlight"
alphas = [0.2, 0.5, 0.8]

# Get BM25 results for comparison
bm25_results = engine.search_bm25(query, top_k=3)

print("BM25 Results (baseline):")
for i, result in enumerate(bm25_results, 1):
    print(f"  {i}. [{result['id']}] {result['text'][:50]}... (score={result['score']:.4f})")

print("\n" + "-"*80)

if MODE == "FULL":
    print("\nHybrid Results with Different Alphas:\n")
    
    for alpha in alphas:
        print(f"Alpha = {alpha} ({'sparse' if alpha < 0.4 else 'balanced' if alpha < 0.7 else 'dense'}):")
        results = engine.hybrid_search_alpha(query, top_k=3, alpha=alpha)
        
        for i, result in enumerate(results, 1):
            print(f"  {i}. [{result['id']}] {result['text'][:50]}... (score={result['score']:.4f})")
        print()
else:
    print("\n⚠ STUB mode: Dense features unavailable")
    print("   In FULL mode, you would see how alpha affects ranking!")

In [None]:
print("\n📝 SAVED_SECTION: 4")

---
## Section 5: RRF Merge Demo

Reciprocal Rank Fusion (RRF) is an alternative to alpha weighting that:
- Doesn't require score normalization
- Is more robust to score scale differences
- Reduces need for tuning

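Formula: `rrf_score = sum(1 / (k + rank + 1))`, summed over every ranked list that contains the document, where `rank` is the document's zero-based position in that list and `k` is a smoothing constant (60 in this notebook).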

### 5.1 Manual RRF Calculation

Let's manually calculate RRF scores to understand the algorithm. RRF uses rank positions (not scores) to combine results, which makes it more robust than alpha weighting.

In [None]:
# Demonstrate RRF calculation manually
print("🔢 RRF Calculation Example\n")
print("="*80)

# Simulate two ranked lists
dense_ranks = ["doc2", "doc4", "doc6", "doc1"]  # Dense retrieval results
sparse_ranks = ["doc3", "doc1", "doc2", "doc5"]  # BM25 results

k = 60  # RRF constant

print("Dense ranking:  ", dense_ranks)
print("Sparse ranking: ", sparse_ranks)
print(f"\nRRF constant k = {k}\n")

# Calculate RRF scores
rrf_scores = {}

print("Calculating RRF scores:\n")

for rank, doc_id in enumerate(dense_ranks):
    score = 1.0 / (k + rank + 1)
    rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + score
    print(f"  {doc_id} (dense rank {rank}): +{score:.6f}")

print()

for rank, doc_id in enumerate(sparse_ranks):
    score = 1.0 / (k + rank + 1)
    old_score = rrf_scores.get(doc_id, 0)
    rrf_scores[doc_id] = old_score + score
    print(f"  {doc_id} (sparse rank {rank}): +{score:.6f} → total={rrf_scores[doc_id]:.6f}")

# Sort by RRF score
sorted_docs = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)

print("\nFinal RRF Ranking:\n")
for i, (doc_id, score) in enumerate(sorted_docs, 1):
    in_both = doc_id in dense_ranks and doc_id in sparse_ranks
    print(f"  {i}. {doc_id}: {score:.6f} {'✓ (in both)' if in_both else ''}")

print("\n" + "="*80)
print("\nKey Insight: Documents in BOTH lists get boosted!")
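
The manual calculation above can be wrapped into a small reusable function. This is a sketch under the same assumptions as the walkthrough (zero-based ranks, `k = 60`); the name `rrf_merge` is hypothetical and is not the engine's `hybrid_search_rrf`.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def rrf_merge(rankings: Sequence[Sequence[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse any number of ranked id lists with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):  # rank is zero-based
            scores[doc_id] += 1.0 / (k + rank + 1)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


dense_list = ["doc2", "doc4", "doc6", "doc1"]
sparse_list = ["doc3", "doc1", "doc2", "doc5"]

merged = rrf_merge([dense_list, sparse_list], k=60)
for position, (doc_id, score) in enumerate(merged, 1):
    print(f"{position}. {doc_id}: {score:.6f}")
```

Because only rank positions enter the formula, the function works unchanged for more than two retrievers and never needs score normalization.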

### 5.2 Alpha vs RRF Side-by-Side

Now let's compare alpha-weighted and RRF merge strategies on the same query. Note how they may produce different rankings despite using the same underlying searches.

In [None]:
# Compare Alpha vs RRF on actual query
print("⚖️  Alpha vs RRF Comparison\n")
print("="*80)

query = "machine learning algorithms"

print(f"Query: '{query}'\n")

# BM25 only
print("BM25 Results:")
bm25_results = engine.search_bm25(query, top_k=3)
for i, r in enumerate(bm25_results, 1):
    print(f"  {i}. [{r['id']}] {r['text'][:55]}...")

if MODE == "FULL":
    # Alpha weighted
    print("\nAlpha Weighted (alpha=0.5):")
    alpha_results = engine.hybrid_search_alpha(query, top_k=3, alpha=0.5)
    for i, r in enumerate(alpha_results, 1):
        print(f"  {i}. [{r['id']}] {r['text'][:55]}...")
    
    # RRF
    print("\nRRF Merged:")
    rrf_results = engine.hybrid_search_rrf(query, top_k=3, k=60)
    for i, r in enumerate(rrf_results, 1):
        print(f"  {i}. [{r['id']}] {r['text'][:55]}...")
    
    print("\n" + "="*80)
    print("\nNote: RRF and Alpha may produce different rankings!")
    print("RRF is generally more stable and requires less tuning.")
else:
    print("\n⚠ STUB mode: RRF requires dense search")
    print("   Enable FULL mode to see RRF in action!")

In [None]:
print("\n📝 SAVED_SECTION: 5")

---
## Section 6: When NOT to Use Hybrid (Cost/Latency/Complexity)

Hybrid search isn't always the answer. Let's analyze when the overhead isn't justified.

### 6.1 Cost-Benefit Scenarios

Let's analyze four realistic scenarios to determine when hybrid search justifies its overhead. Each scenario considers corpus size, traffic, and query characteristics.

In [None]:
# Cost-Benefit Analysis
print("💰 Hybrid Search Cost-Benefit Analysis\n")
print("="*80)

scenarios = [
    {
        "name": "Small Documentation Site",
        "corpus_size": 500,
        "queries_per_day": 100,
        "query_type": "90% natural language",
        "recommendation": "❌ SKIP HYBRID",
        "reason": "Corpus too small, dense-only sufficient"
    },
    {
        "name": "E-commerce Product Catalog",
        "corpus_size": 50000,
        "queries_per_day": 10000,
        "query_type": "Mix: 40% SKU/codes, 60% natural",
        "recommendation": "✅ USE HYBRID",
        "reason": "Exact match critical, high query diversity"
    },
    {
        "name": "Real-time Chat Support",
        "corpus_size": 5000,
        "queries_per_day": 5000,
        "query_type": "80% conversational",
        "recommendation": "❌ SKIP HYBRID",
        "reason": "P99 latency < 50ms required, dense sufficient"
    },
    {
        "name": "Technical Documentation",
        "corpus_size": 20000,
        "queries_per_day": 2000,
        "query_type": "Mix: 50% API names, 50% concepts",
        "recommendation": "✅ USE HYBRID",
        "reason": "Exact API names + semantic understanding needed"
    }
]

for scenario in scenarios:
    print(f"\n{scenario['name']}:")
    print(f"  Corpus: {scenario['corpus_size']:,} docs")
    print(f"  Traffic: {scenario['queries_per_day']:,} queries/day")
    print(f"  Query mix: {scenario['query_type']}")
    print(f"  {scenario['recommendation']}")
    print(f"  → {scenario['reason']}")

print("\n" + "="*80)
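
The rules of thumb from Section 1.3 can be encoded as a small checklist function and applied to the four scenarios. This is a sketch, not a validated decision procedure: the `Workload` class, the function name `hybrid_recommendation`, and the way the percentages are turned into a single `conversational_share` number are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Workload:
    name: str
    corpus_size: int
    conversational_share: float  # fraction of purely conversational queries, 0.0-1.0
    latency_budget_ms: Optional[float] = None  # required P99 latency, if any


def hybrid_recommendation(w: Workload) -> Tuple[bool, str]:
    """Apply the Section 1.3 thresholds in order (hypothetical helper)."""
    if w.corpus_size < 1_000:
        return False, "corpus under 1,000 docs"
    if w.latency_budget_ms is not None and w.latency_budget_ms <= 50:
        return False, "latency budget of 50ms or less"
    if w.conversational_share >= 0.9:
        return False, "90%+ conversational queries"
    return True, "mixed queries on a large enough corpus"


workloads = [
    Workload("Small Documentation Site", 500, 0.9),
    Workload("E-commerce Product Catalog", 50_000, 0.6),
    Workload("Real-time Chat Support", 5_000, 0.8, latency_budget_ms=50),
    Workload("Technical Documentation", 20_000, 0.5),
]

decisions = {w.name: hybrid_recommendation(w) for w in workloads}
for name, (use_hybrid, reason) in decisions.items():
    print(f"{name}: {'USE' if use_hybrid else 'SKIP'} hybrid ({reason})")
```

Treat the thresholds as defaults to revisit once you have measured accuracy and latency on your own data.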

### 6.2 Complexity & Performance Analysis

Hybrid search doubles maintenance complexity (two indexes to sync). Let's break down the latency overhead and cost implications in detail.

In [None]:
# Complexity and Maintenance Burden
print("🔧 Complexity & Maintenance Analysis\n")
print("="*80)

print("\n📈 COMPLEXITY INCREASES:\n")
complexity_factors = [
    ("Two indexes to maintain", "Every doc update → 2 writes"),
    ("Sync challenges", "Consistency between BM25 and vector DB"),
    ("Alpha tuning", "Requires query analysis and testing"),
    ("Performance monitoring", "Track both systems separately"),
    ("Cost optimization", "Balance API calls vs accuracy")
]

for factor, impact in complexity_factors:
    print(f"  • {factor}")
    print(f"    → {impact}")

print("\n⏱️  LATENCY OVERHEAD:\n")
print("  Dense-only:   ~50-80ms")
print("  + BM25:       +5-10ms (negligible)")
print("  + Merge:      +10-20ms (normalization + combine)")
print("  + Network:    +20-30ms (additional variance)")
print("  ─────────────────────────")
print("  Total Hybrid: ~85-140ms")

print("\n💵 COST BREAKDOWN (per 1000 queries):\n")
print("  Dense-only:")
print("    Embeddings:  1000 × $0.0001 = $0.10")
print("    Pinecone:    ~$0.05")
print("    Total:       $0.15")
print("\n  Hybrid:")
print("    Embeddings:  1000 × $0.0001 = $0.10")
print("    Pinecone:    ~$0.05")
print("    BM25 (mem):  $0.00")
print("    Total:       $0.15 (same cost!)")
print("\n  Note: Cost similar, but complexity DOUBLES")

print("\n" + "="*80)
print("\n✅ Decision Rule: Use hybrid ONLY if accuracy gain > 30%")
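
The per-1000-query figures above scale linearly, so a back-of-the-envelope estimator is easy to sketch. The function name `estimate_monthly_cost` and the 30-day month are assumptions; the unit prices are simply the illustrative numbers from the breakdown above, not current vendor pricing.

```python
def estimate_monthly_cost(queries_per_day: int,
                          embedding_cost_per_query: float = 0.0001,
                          vector_db_cost_per_1k: float = 0.05,
                          days: int = 30) -> dict:
    """Rough monthly query cost using the notebook's illustrative unit prices."""
    queries = queries_per_day * days
    embedding_usd = queries * embedding_cost_per_query
    vector_db_usd = queries / 1000 * vector_db_cost_per_1k
    return {
        "queries": queries,
        "embedding_usd": embedding_usd,
        "vector_db_usd": vector_db_usd,
        "total_usd": embedding_usd + vector_db_usd,  # in-memory BM25 adds $0
    }


# Latency components (low, high) in milliseconds, copied from the breakdown above.
LATENCY_COMPONENTS_MS = {
    "dense": (50, 80),
    "bm25": (5, 10),
    "merge": (10, 20),
    "network_variance": (20, 30),
}
latency_range = (
    sum(low for low, _ in LATENCY_COMPONENTS_MS.values()),
    sum(high for _, high in LATENCY_COMPONENTS_MS.values()),
)

est = estimate_monthly_cost(queries_per_day=10_000)
print(f"Monthly queries: {est['queries']:,}")
print(f"Estimated total: ${est['total_usd']:.2f}")
print(f"Hybrid latency range: {latency_range[0]}-{latency_range[1]}ms")
```

Note that the latency sum assumes dense and sparse searches run one after the other; running them in parallel would reduce the total.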

In [None]:
print("\n📝 SAVED_SECTION: 6")

---
## Section 7: Troubleshooting

Common issues and solutions when implementing hybrid search.

### 7.1 Common Issues & Solutions

Here's a comprehensive troubleshooting guide covering the most common hybrid search implementation issues, their symptoms, and proven solutions.

In [None]:
# Common Issues and Solutions
print("🔍 Hybrid Search Troubleshooting Guide\n")
print("="*80)

issues = [
    {
        "problem": "BM25 scores dominating hybrid results",
        "symptoms": "Alpha weighting seems ineffective, all results favor keyword matches",
        "solution": "Normalize scores BEFORE merging. Check max_bm25 != 0 handling.",
        "code": "scores_norm = scores / max(scores) if max(scores) > 0 else scores"
    },
    {
        "problem": "Dense search returns nothing",
        "symptoms": "Hybrid search = pure BM25, Pinecone shows 0 results",
        "solution": "Check: (1) Index exists, (2) Namespace matches, (3) Vectors uploaded",
        "code": "index.describe_index_stats() # Check vector count per namespace"
    },
    {
        "problem": "RRF results same as BM25",
        "symptoms": "RRF merge appears to ignore dense results",
        "solution": "Dense search may be failing silently. Verify API keys and quota.",
        "code": "try: dense_results = search_dense() except Exception as e: log(e)"
    },
    {
        "problem": "Smart alpha always returns same value",
        "symptoms": "Query analysis not detecting patterns correctly",
        "solution": "Review regex patterns. Test with clear examples (SKUs, natural text).",
        "code": "print(smart_alpha('ABC-123'))  # Should be low (~0.3)"
    },
    {
        "problem": "High latency (>200ms P50)",
        "symptoms": "Hybrid search too slow for production",
        "solution": "Parallelize dense + sparse search. Cache embeddings. Use async.",
        "code": "await asyncio.gather(search_dense(), search_sparse())"
    },
    {
        "problem": "Different results every query",
        "symptoms": "Non-deterministic rankings for same query",
        "solution": "Pinecone is eventually consistent after upserts, and tied scores can reorder. Wait for indexing to finish and break ties by id.",
        "code": "results.sort(key=lambda r: (-r['score'], r['id']))"
    }
]

for i, issue in enumerate(issues, 1):
    print(f"\n{i}. {issue['problem']}")
    print(f"   Symptoms: {issue['symptoms']}")
    print(f"   Solution: {issue['solution']}")
    print(f"   Code: {issue['code']}")

print("\n" + "="*80)
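
For the high-latency case, the "parallelize dense + sparse" advice can be sketched with `asyncio.gather`. The two search functions below are stand-in stubs with artificial delays (their names and return values are invented for this example); in practice you would call the engine's own search methods. `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time


def search_dense_stub(query: str) -> list:
    """Stand-in for a blocking embedding + vector DB call."""
    time.sleep(0.05)
    return ["doc2", "doc4"]


def search_sparse_stub(query: str) -> list:
    """Stand-in for an in-memory BM25 lookup."""
    time.sleep(0.01)
    return ["doc3", "doc2"]


async def search_both(query: str):
    # Run both blocking searches in worker threads at the same time,
    # so total latency is close to the slower of the two.
    dense, sparse = await asyncio.gather(
        asyncio.to_thread(search_dense_stub, query),
        asyncio.to_thread(search_sparse_stub, query),
    )
    return dense, sparse


start = time.perf_counter()
dense_hits, sparse_hits = asyncio.run(search_both("machine learning"))
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"Dense: {dense_hits}, Sparse: {sparse_hits}")
print(f"Elapsed: {elapsed_ms:.0f}ms")
```

Inside a Jupyter cell an event loop is already running, so replace `asyncio.run(search_both(...))` with `await search_both(...)`.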

### 7.2 Diagnostic Tests

Let's run a quick diagnostic suite to verify your hybrid search engine is configured correctly. This will check BM25, smart alpha, dense search (if available), and document counts.

In [None]:
# Quick Diagnostic Test
print("🩺 Running Diagnostics...\n")

# Test 1: BM25 working?
try:
    test_bm25 = engine.search_bm25("test", top_k=1)
    print("✓ BM25 index functional")
except Exception as e:
    print(f"✗ BM25 error: {e}")

# Test 2: Smart alpha working?
try:
    alpha_natural = engine.smart_alpha("how does this work")
    alpha_code = engine.smart_alpha("SKU-12345")
    
    if alpha_natural > alpha_code:
        print(f"✓ Smart alpha working (natural={alpha_natural:.2f}, code={alpha_code:.2f})")
    else:
        print(f"⚠ Smart alpha may have issues (natural={alpha_natural:.2f}, code={alpha_code:.2f})")
except Exception as e:
    print(f"✗ Smart alpha error: {e}")

# Test 3: Dense search configured?
if MODE == "FULL":
    try:
        test_dense = engine.search_dense("test", top_k=1)
        if test_dense:
            print(f"✓ Dense search returning results ({len(test_dense)} found)")
        else:
            print("⚠ Dense search returning empty (check namespace/vectors)")
    except Exception as e:
        print(f"✗ Dense search error: {e}")
else:
    print("⊘ Dense search not configured (STUB mode)")

# Test 4: Document count
print(f"\n📊 Indexed documents: {len(engine.documents)}")
print(f"   BM25 tokens: {len(engine.tokenized_docs)} docs tokenized")

if MODE == "FULL" and engine.index:
    try:
        stats = engine.index.describe_index_stats()
        namespace_count = stats.namespaces.get(engine.namespace, {}).get('vector_count', 0)
        print(f"   Pinecone vectors: {namespace_count} in namespace '{engine.namespace}'")
    except Exception as e:
        # Report the cause instead of silently swallowing every exception
        print(f"   Pinecone stats unavailable: {e}")

print("\n✓ Diagnostics complete!")

In [None]:
print("\n📝 SAVED_SECTION: 7")

---
## 🎉 Notebook Complete!

You've completed M4.1 - Hybrid Search!

**Key Takeaways:**
1. ✅ Hybrid combines BM25 + dense for better coverage
2. ✅ Alpha weighting (0.0-1.0) balances sparse vs dense
3. ✅ RRF is more robust and requires less tuning
4. ✅ Use hybrid when query diversity demands both approaches
5. ✅ Skip hybrid if corpus < 1K or queries 90%+ natural language

**Next Steps:**
- Test with your own data
- Monitor accuracy improvements
- Profile latency and costs
- Consider async implementation for production

See `README.md` for architecture details and `tests_hybrid_merge.py` for test examples.