# The Complete Guide to Query Routing in RAG Applications

## What We're Building Today

Imagine you're building an AI assistant for a large company. This company has thousands of documents spread across different departments - technical documentation for developers, product guides for customers, HR policies for employees, and support articles in multiple languages. When someone asks a question, how does your AI know where to look?

That's where query routing comes in. Think of it as building a smart traffic control system for questions. Just like a GPS routes you through the fastest path to your destination, query routing sends each question to the most appropriate knowledge base, making your AI faster, cheaper, and more accurate.

By the end of this tutorial, you'll understand not just how to build these routing systems, but why each approach works the way it does. We'll start simple and gradually build up to production-ready implementations.

## Setting Up Our Environment

Before we dive in, let's get our tools ready. We'll be using three main libraries: LangChain to orchestrate prompts and model calls, the OpenAI API for embeddings and chat models, and ChromaDB as our vector store.

In [None]:
# First, install everything we need
# Run this cell once to install all required packages
!pip install langchain langchain-openai chromadb openai tiktoken numpy scikit-learn

In [None]:
# Import all our tools
import os
import json
import re
import numpy as np
from typing import List, Dict, Any, Tuple, Optional
from enum import Enum
import time

# LangChain components - our AI orchestration toolkit
from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.chains import LLMChain
from langchain.callbacks import get_openai_callback

# ChromaDB - our vector database for storing and searching embeddings
import chromadb
from chromadb.utils import embedding_functions

# OpenAI direct imports for specific features
import openai

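# Set your OpenAI API key here, or export OPENAI_API_KEY before starting the notebook
# Get yours at: https://platform.openai.com/api-keys
# setdefault() keeps a key that is already set in the environment instead of overwriting it
os.environ.setdefault("OPENAI_API_KEY", "<your-api-key-here>")  # Replace with your actual API key
openai.api_key = os.environ["OPENAI_API_KEY"]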

print("✅ Environment ready! Let's build some routers.")

## Building Our Knowledge Bases

Before we can route queries, we need somewhere to route them to. Let's create a realistic scenario with multiple knowledge bases. We'll simulate a company with different types of documentation, each serving a different purpose. This will help us see why routing matters - sending a technical API question to the HR knowledge base would give terrible results!

In [None]:
# Let's create comprehensive sample documents for each knowledge base
# In production, these would come from your actual documentation

# Technical documentation - for developers and technical users
technical_docs = [
    """API Authentication Guide: Every API request must include an Authorization header 
    with your API key in the format 'Bearer YOUR_API_KEY'. Keys can be generated from 
    your dashboard. For enhanced security, you can also use OAuth 2.0 flow for user-specific 
    actions. Rate limits are 100 requests per minute for standard tier, 1000 for premium.""",
    
    """Webhook Configuration: Register webhooks via POST to /api/webhooks with a JSON payload 
    containing 'url' and 'events' array. We'll send POST requests to your endpoint with 
    event data. Always verify webhook signatures using HMAC-SHA256 with your webhook secret 
    to ensure requests are from us.""",
    
    """Error Handling Best Practices: Implement exponential backoff for rate limit errors (429). 
    Start with 1 second delay, double after each retry, max 5 retries. For 5xx errors, 
    retry up to 3 times. Log all errors with correlation IDs for debugging. Common errors: 
    400 (bad request), 401 (unauthorized), 403 (forbidden), 404 (not found).""",
    
    """Database Connection Pooling: Use connection pools to improve performance. Recommended 
    settings: min_size=5, max_size=20, timeout=30s. For PostgreSQL, use SSL mode 'require' 
    in production. Connection string format: postgresql://user:pass@host:5432/dbname?sslmode=require.""",
    
    """SDK Installation: Python: pip install our-sdk. JavaScript: npm install @company/sdk. 
    Java: Add Maven dependency. All SDKs support async operations and automatic retries. 
    Initialize with your API key: client = OurSDK(api_key='YOUR_KEY')."""
]

# Product documentation - for customers and sales
product_docs = [
    """Pricing Plans Overview: Starter Plan ($29/month) - up to 1,000 API calls, 5 team members, 
    basic support. Professional ($99/month) - 10,000 API calls, unlimited team members, 
    priority support, advanced analytics. Enterprise (custom pricing) - unlimited API calls, 
    dedicated support, SLA guarantee, custom integrations.""",
    
    """Key Features: Real-time collaboration with presence indicators, version control for 
    all changes, advanced AI-powered search, customizable dashboards, 50+ integrations 
    including Slack, Jira, GitHub. Mobile apps for iOS and Android with offline mode. 
    End-to-end encryption for all data.""",
    
    """Getting Started Guide: Sign up for a free 14-day trial (no credit card required). 
    Verify your email and complete onboarding. Import existing data via CSV or API. 
    Invite team members. Configure your first workflow. Most users are productive within 
    30 minutes. We offer free onboarding calls for Professional and Enterprise plans.""",
    
    """System Requirements: Browser: Chrome 90+, Firefox 88+, Safari 14+, Edge 90+. 
    Desktop app: Windows 10+, macOS 11+, Ubuntu 20.04+. Minimum 4GB RAM, 2GB disk space. 
    Internet connection required (minimum 1 Mbps). Mobile: iOS 14+ or Android 8+.""",
    
    """Trial and Billing: 14-day free trial includes all Professional features. After trial, 
    choose a plan or continue with limited free tier (100 API calls/month). Annual billing 
    saves 20%. Cancel anytime, data retained for 30 days. Refunds available within 30 days 
    of purchase."""
]

# HR/Policy documentation - for employees
policy_docs = [
    """Vacation Policy: Full-time employees receive 15 days PTO first year, 20 days after 
    two years, 25 days after five years. Unused PTO carries over (max 5 days). Request 
    time off at least 2 weeks in advance via HR portal. Sick leave is separate - 10 days 
    per year. Parental leave: 12 weeks paid.""",
    
    """Remote Work Guidelines: Employees may work remotely up to 3 days per week after 
    90-day probation. Core hours are 10 AM - 3 PM in your local timezone. Provide your 
    own internet (reimbursement available). Annual home office stipend of $500. Must be 
    available for critical meetings. International remote work requires VP approval.""",
    
    """Expense Reimbursement: Submit expenses within 30 days via Expensify. Receipts required 
    for amounts over $25. Pre-approval needed for expenses over $500. Travel meals: 
    Breakfast $20, Lunch $30, Dinner $50. Economy flights for trips under 5 hours, 
    business class for longer. Hotels up to $200/night.""",
    
    """Professional Development: Annual budget of $2,000 per employee for courses, conferences, 
    and books. Additional funding available with manager approval. Company pays for job-relevant 
    certifications. Study leave available: 5 days per year. Tuition reimbursement for 
    degree programs: up to $5,000/year.""",
    
    """Code of Conduct: Treat everyone with respect and professionalism. No discrimination 
    or harassment tolerated. Maintain confidentiality of company and customer information. 
    Disclose conflicts of interest. Report violations to HR or use anonymous hotline. 
    Social media policy: Don't share confidential info, be respectful, clarify personal 
    opinions vs company positions."""
]

# Support documentation in different languages (for language routing demo)
support_docs_english = [
    """Password Reset: Click 'Forgot Password' on the login page. Enter your email address. 
    Check your email for reset link (check spam folder). Link expires in 1 hour. Choose 
    a strong password with 8+ characters, including uppercase, lowercase, and numbers.""",
    
    """Account Security: Enable two-factor authentication in Settings > Security. We support 
    SMS, authenticator apps, and hardware keys. Never share your password. We'll never ask 
    for your password via email or phone. Review login history regularly."""
]

support_docs_spanish = [
    """Restablecimiento de Contraseña: Haga clic en 'Olvidé mi contraseña' en la página de inicio. 
    Ingrese su correo electrónico. Revise su correo para el enlace (verifique spam). 
    El enlace expira en 1 hora. Elija una contraseña segura con 8+ caracteres.""",
    
    """Seguridad de Cuenta: Active autenticación de dos factores en Configuración > Seguridad. 
    Soportamos SMS, aplicaciones de autenticación y llaves de hardware. Nunca comparta su 
    contraseña. Nunca le pediremos su contraseña por correo o teléfono."""
]

print("📚 Knowledge bases created successfully!")
print(f"\nWe now have:")
print(f"  - {len(technical_docs)} technical documents")
print(f"  - {len(product_docs)} product documents")
print(f"  - {len(policy_docs)} policy documents")
print(f"  - {len(support_docs_english) + len(support_docs_spanish)} support documents (multilingual)")

## Setting Up ChromaDB Collections

Now we need to store these documents in a way that makes them searchable. ChromaDB fits well here: given an embedding function, it converts text into embeddings (numerical representations that capture meaning) when documents are added and when queries arrive, and it lets us search by similarity. Think of it as creating smart indexes that understand concepts, not just keywords.

In [None]:
# Initialize ChromaDB with a clean slate
# In production, you'd use chromadb.PersistentClient(path=...) to save the database to disk
chroma_client = chromadb.Client()

# Create an embedding function that will convert text to vectors
# We're using OpenAI's embedding model which understands context and meaning
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small"  # Fast and efficient model
)

# Helper function to create or reset a collection
def create_collection(name: str, documents: List[str]) -> chromadb.Collection:
    """Create a ChromaDB collection and populate it with documents"""
    
    # Delete existing collection if it exists (for clean reruns)
    try:
        chroma_client.delete_collection(name=name)
    except Exception:
        pass  # Collection doesn't exist yet, that's fine
    
    # Create a new collection; the embedding function is stored on the collection
    # and reused for every add() and query()
    collection = chroma_client.create_collection(
        name=name,
        embedding_function=openai_ef  # type: ignore
    )
    
    # Add all documents in one call so they are embedded as a single batch
    collection.add(
        documents=documents,
        ids=[f"{name}_{i}" for i in range(len(documents))],
        metadatas=[{"source": name, "index": i} for i in range(len(documents))]
    )
    
    return collection

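# Alternative approach: create the collection without an embedding function
# and pass precomputed embeddings to add(). Caveat: such a collection falls back
# to ChromaDB's default embedding function for query_texts, so query it with
# query_embeddings=openai_ef([query]) to stay in the same vector space.
def create_collection_alt(name: str, documents: List[str]) -> chromadb.Collection:
    """Alternative: Create collection without embedding function in constructor"""
    
    # Delete existing collection if it exists (for clean reruns)
    try:
        chroma_client.delete_collection(name=name)
    except Exception:
        pass  # Collection doesn't exist, that's fine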
    
    # Create new collection without embedding function
    collection = chroma_client.create_collection(name=name)
    
    # Add documents with the embedding function specified in add()
    for i, doc in enumerate(documents):
        collection.add(
            documents=[doc],
            ids=[f"{name}_{i}"],
            metadatas=[{"source": name, "index": i}],
            embeddings=openai_ef([doc])  # Generate embeddings explicitly
        )
    
    return collection

# Create all our collections using the primary approach
print("🔄 Creating ChromaDB collections...")

technical_collection = create_collection("technical", technical_docs)
print("✅ Technical collection created")

product_collection = create_collection("product", product_docs)
print("✅ Product collection created")

policy_collection = create_collection("policy", policy_docs)
print("✅ Policy collection created")

support_en_collection = create_collection("support_en", support_docs_english)
print("✅ English support collection created")

support_es_collection = create_collection("support_es", support_docs_spanish)
print("✅ Spanish support collection created")

print("\n🎉 All collections ready for routing!")
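
Once a router has picked a route name, the remaining step is to look up the matching collection and run a similarity search against it. The sketch below shows that dispatch step under a few assumptions: `retrieve` and `FakeCollection` are hypothetical names introduced only for illustration, and the fake collection stands in for a real ChromaDB collection so the example runs without an API key. The only ChromaDB behaviour relied on is that `query(query_texts=..., n_results=...)` returns a dict whose `"documents"` entry holds one list of documents per query text.

```python
from typing import Any, Dict, List, Protocol


class QueryableCollection(Protocol):
    """Anything exposing ChromaDB's query(query_texts=..., n_results=...) shape."""

    def query(self, query_texts: List[str], n_results: int) -> Dict[str, Any]: ...


def retrieve(route: str, query: str,
             collections: Dict[str, QueryableCollection],
             default_route: str = "product", k: int = 2) -> List[str]:
    """Look up the collection for `route` and return its top-k documents."""
    # Unknown route names fall back to the default collection
    collection = collections.get(route, collections[default_route])
    results = collection.query(query_texts=[query], n_results=k)
    # One list of documents comes back per query text; we sent exactly one
    return results["documents"][0]


class FakeCollection:
    """Hypothetical in-memory stand-in: returns its first k documents."""

    def __init__(self, docs: List[str]):
        self.docs = docs

    def query(self, query_texts: List[str], n_results: int) -> Dict[str, Any]:
        return {"documents": [self.docs[:n_results] for _ in query_texts]}


collections = {
    "technical": FakeCollection(["API Authentication Guide", "Webhook Configuration"]),
    "product": FakeCollection(["Pricing Plans Overview"]),
}

top_docs = retrieve("technical", "How do I sign API requests?", collections, k=1)
unknown = retrieve("legal", "anything", collections)  # falls back to "product"
print(top_docs, unknown)
```

With the real collections created above, the same function should work by passing a dict such as `{"technical": technical_collection, "product": product_collection, "policy": policy_collection}`.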

## Part 1: Rule-Based Router

Let's start with the simplest approach - rule-based routing. This is like having a checklist: if you see certain keywords, you know exactly where to route the query. While simple, this approach is incredibly fast and predictable, making it perfect for scenarios where you have clear, domain-specific terminology.

In [None]:
class RuleBasedRouter:
    """
    A simple but effective router that uses keyword patterns.
    Think of this as a smart filter that looks for specific terms.
    """
    
    def __init__(self):
        # Define routing rules as patterns with weights
        # Higher weight = stronger signal that this is the right route
        self.rules = {
            "technical": [
                (r"\bAPI\b", 5),          # \b means word boundary - matches "API" but not "RAPID"
                (r"\bwebhook", 4),
                (r"\berror\b", 3),
                (r"\bcode\b", 3),
                (r"\bSDK\b", 4),
                (r"\bauthenticat", 4),    # Matches authentication, authenticate, etc.
                (r"\bintegrat", 3),
                (r"\bdatabase\b", 3),
                (r"\b(GET|POST|PUT|DELETE)\b", 4),  # HTTP methods
                (r"\bendpoint", 3),
                (r"\bOAuth\b", 4),
                (r"\brate limit", 3)
            ],
            "product": [
                (r"\bpric", 5),           # Matches price, pricing
                (r"\bplan\b", 4),
                (r"\btrial\b", 4),
                (r"\bfeature", 3),
                (r"\b(starter|professional|enterprise)\b", 4),
                (r"\bbilling\b", 4),
                (r"\bsubscription", 3),
                (r"\brequirement", 3),
                (r"\brefund", 3),
                (r"\bcancel", 3)
            ],
            "policy": [
                (r"\bvacation\b", 5),
                (r"\bPTO\b", 5),
                (r"\bremote work", 4),
                (r"\bexpense", 4),
                (r"\breimburs", 4),
                (r"\bpolicy\b", 3),
                (r"\bHR\b", 4),
                (r"\bemployee", 3),
                (r"\bsick leave", 4),
                (r"\bconduct\b", 3)
            ]
        }
    
    def route(self, query: str) -> Dict[str, Any]:
        """
        Analyze the query and determine the best route based on keyword patterns.
        Returns route decision with confidence scores.
        """
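        scores = {}
        matched_patterns = {}
        
        # Check each route's patterns
        for route, patterns in self.rules.items():
            total_score = 0
            matches = []
            
            for pattern, weight in patterns:
                # Match case-insensitively against the original query. Several patterns
                # (API, SDK, OAuth, PTO, HR) are uppercase and would never match lowercased text.
                # Side effect: the HTTP-method pattern also matches everyday words like "get".
                if re.search(pattern, query, re.IGNORECASE):
                    total_score += weight
                    matches.append(pattern)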
            
            scores[route] = total_score
            matched_patterns[route] = matches
        
        # Determine the best route (the one with the highest total score)
        best_route = max(scores.keys(), key=lambda k: scores[k])
        
        # If no patterns matched, default to product (most general)
        if scores[best_route] == 0:
            best_route = "product"
            confidence = "low"
        else:
            # Calculate confidence based on score difference
            sorted_scores = sorted(scores.values(), reverse=True)
            if len(sorted_scores) > 1 and sorted_scores[0] > sorted_scores[1] * 2:
                confidence = "high"
            elif len(sorted_scores) > 1 and sorted_scores[0] > sorted_scores[1] * 1.5:
                confidence = "medium"
            else:
                confidence = "low"
        
        return {
            "route": best_route,
            "confidence": confidence,
            "score": scores[best_route],
            "all_scores": scores,
            "matched_patterns": matched_patterns[best_route]
        }

# Let's test our rule-based router
rule_router = RuleBasedRouter()

test_queries = [
    "How do I authenticate my API requests?",
    "What's the price of the enterprise plan?",
    "Can I work from home on Fridays?",
    "My webhook isn't receiving events",
    "How much vacation time do I get?"
]

print("🧪 Testing Rule-Based Router\n" + "="*50)
for query in test_queries:
    result = rule_router.route(query)
    print(f"Query: {query}")
    print(f"→ Route: {result['route']} (confidence: {result['confidence']})")
    print(f"  Matched patterns: {result['matched_patterns']}")
    print(f"  Scores: {result['all_scores']}")
    print("-"*50)
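
Keyword rules depend on two regex details: where `\b` word boundaries fall and whether matching is case-sensitive. The short check below exercises both in isolation with Python's standard `re` module. The sample sentences are made up for illustration.

```python
import re

pattern = r"\bAPI\b"

# \b requires a word boundary on both sides of "API"
standalone = bool(re.search(pattern, "Call the API first"))
embedded = bool(re.search(pattern, "RAPID response"))

# re.search is case-sensitive by default, so lowercased text does not match
lowercased = bool(re.search(pattern, "call the api first"))
ignore_case = bool(re.search(pattern, "call the api first", re.IGNORECASE))

# A pattern without a trailing \b works as a prefix match
prefix = bool(re.search(r"\bauthenticat", "authentication failed"))

print(standalone, embedded, lowercased, ignore_case, prefix)
```

The practical takeaway is that an uppercase pattern and a lowercased query never meet unless the match is made case-insensitive.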

## Part 2: Semantic Router

Now let's build something more sophisticated. Semantic routing understands the meaning behind words, not just the words themselves. It works by comparing the query's embedding (its meaning in mathematical form) with pre-computed embeddings of example questions for each route. This allows it to understand that "system crash" and "application freezing" are related concepts even though they share no keywords.

In [None]:
class SemanticRouter:
    """
    Routes queries based on semantic similarity to example questions.
    This is like having a router that actually understands meaning.
    """
    
    def __init__(self):
        # Initialize the embedding model
        self.embeddings = OpenAIEmbeddings(
            model="text-embedding-3-small"
        )
        
        # Define example queries for each route
        # These serve as "semantic anchors" - representative questions for each category
        self.route_examples = {
            "technical": [
                "How do I integrate your API?",
                "Getting 401 unauthorized error",
                "Webhook signature validation failing",
                "Database connection timeout issues",
                "Need help with SDK installation",
                "Rate limiting best practices",
                "OAuth flow not working correctly",
                "How to handle pagination in API responses"
            ],
            "product": [
                "What features are included?",
                "Difference between plans",
                "How much does it cost?",
                "Can I try before buying?",
                "What are the system requirements?",
                "Do you offer discounts?",
                "How to upgrade my plan",
                "Is there a free tier available?"
            ],
            "policy": [
                "How many vacation days do I have?",
                "Remote work policy details",
                "How to submit expense reports",
                "Professional development budget",
                "Sick leave policy",
                "Company code of conduct",
                "Parental leave benefits",
                "How to request time off"
            ]
        }
        
        # Pre-compute embeddings for all examples
        # This is done once at initialization for efficiency
        print("🧮 Computing semantic embeddings for route examples...")
        self.route_embeddings = {}
        
        for route, examples in self.route_examples.items():
            # Get embeddings for all examples in one API call (more efficient)
            embeddings = self.embeddings.embed_documents(examples)
            self.route_embeddings[route] = embeddings
            print(f"  ✓ Computed {len(embeddings)} embeddings for {route}")
    
    def cosine_similarity(self, vec1: np.ndarray, vec2: np.ndarray) -> float:
        """
        Calculate cosine similarity between two vectors.
        Returns a value between -1 and 1, where 1 means identical direction.
        """
        # vec1 and vec2 are already numpy arrays
        dot_product = np.dot(vec1, vec2)
        norm1 = np.linalg.norm(vec1)
        norm2 = np.linalg.norm(vec2)
        
        # Avoid division by zero
        if norm1 == 0 or norm2 == 0:
            return 0.0
        
        # Convert the NumPy scalar to a plain float to match the return annotation
        return float(dot_product / (norm1 * norm2))
    
    def route(self, query: str) -> Dict[str, Any]:
        """
        Route query based on semantic similarity to example questions.
        """
        # Get embedding for the query
        query_embedding = np.array(self.embeddings.embed_query(query))
        
        # Calculate similarity with each route's examples
        route_scores = {}
        best_matches = {}
        
        for route, example_embeddings in self.route_embeddings.items():
            similarities = []
            
            # Compare with each example
            for i, example_embedding in enumerate(example_embeddings):
                similarity = self.cosine_similarity(
                    query_embedding, np.array(example_embedding)
                )
                similarities.append(similarity)
            
            # Use the maximum similarity as the route score
            # You could also use mean or top-k average
            max_similarity = max(similarities)
            max_index = similarities.index(max_similarity)
            
            route_scores[route] = max_similarity
            best_matches[route] = {
                "example": self.route_examples[route][max_index],
                "similarity": max_similarity
            }
        
        # Select the route with highest similarity
        best_route = max(route_scores.keys(), key=lambda k: route_scores[k])
        
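        # Determine confidence based on similarity score
        # NOTE: these cutoffs are illustrative. Absolute cosine values vary by
        # embedding model, so calibrate them on a sample of real queries.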
        similarity = route_scores[best_route]
        if similarity > 0.85:
            confidence = "high"
        elif similarity > 0.70:
            confidence = "medium"
        else:
            confidence = "low"
        
        return {
            "route": best_route,
            "confidence": confidence,
            "similarity": similarity,
            "all_scores": route_scores,
            "best_match": best_matches[best_route]
        }

# Initialize and test the semantic router
semantic_router = SemanticRouter()

test_queries = [
    "My application keeps crashing when calling your service",  # Similar to technical errors
    "What's included in your most expensive option?",          # Similar to pricing/plans
    "I need to take some days off next month",                # Similar to vacation policy
    "The system is too slow",                                  # Ambiguous - could be technical or product
]

print("\n🧪 Testing Semantic Router\n" + "="*50)
for query in test_queries:
    result = semantic_router.route(query)
    print(f"Query: {query}")
    print(f"→ Route: {result['route']} (confidence: {result['confidence']})")
    print(f"  Best match: '{result['best_match']['example']}'")
    print(f"  Similarity: {result['best_match']['similarity']:.3f}")
    print(f"  All scores: {', '.join(f'{k}:{v:.3f}' for k,v in result['all_scores'].items())}")
    print("-"*50)
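
The router above scores each route by its single most similar example, and the code comments mention mean or top-k averaging as alternatives. The sketch below compares the two on hand-made 2-D vectors so it runs without the embedding API. The vectors, the `route_score` helper, and the choice of `top_k=2` are all assumptions made for illustration, not values from a real model.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity using only the standard library."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route_score(query_vec: Sequence[float],
                example_vecs: List[Sequence[float]],
                top_k: Optional[int] = None) -> float:
    """Max similarity when top_k is None, otherwise the mean of the top_k best."""
    sims = sorted((cosine(query_vec, e) for e in example_vecs), reverse=True)
    if top_k is None:
        return sims[0]
    top = sims[:top_k]
    return sum(top) / len(top)


# Toy "embeddings": technical has one near-identical example and one unrelated
# example, while product has two moderately similar examples.
examples: Dict[str, List[Sequence[float]]] = {
    "technical": [[1.0, 0.0], [0.0, 1.0]],
    "product": [[0.8, 0.6], [0.6, 0.8]],
}
query = [1.0, 0.0]

max_scores = {r: route_score(query, vecs) for r, vecs in examples.items()}
top2_scores = {r: route_score(query, vecs, top_k=2) for r, vecs in examples.items()}

best_by_max = max(max_scores, key=max_scores.get)
best_by_top2 = max(top2_scores, key=top2_scores.get)
print(best_by_max, best_by_top2)
```

Max scoring rewards a route that has one very close example, while top-k averaging favors a route whose examples are consistently close. On this toy data the two strategies disagree, which is a reason to test the aggregation choice against real queries.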

## Part 3: LLM Completion Router

Sometimes you need the full reasoning power of a language model to make routing decisions. This approach uses GPT to analyze the query and decide where it should go. While more expensive and slower than previous methods, it can handle complex, ambiguous queries that would confuse simpler routers.

In [None]:
class LLMCompletionRouter:
    """
    Uses a language model to intelligently route queries.
    This is like having an expert analyst decide where each question should go.
    """
    
    def __init__(self, model: str = "gpt-3.5-turbo"):
        # Initialize the language model
        self.llm = ChatOpenAI(
            model=model,
            temperature=0  # 0 keeps routing decisions as consistent as possible across runs
        )
        
        # Create a detailed prompt template that guides the model's decision
        self.routing_prompt = ChatPromptTemplate.from_template("""
You are a query router for a company's knowledge base system. Your job is to analyze 
incoming queries and determine which knowledge base would best answer them.

Available knowledge bases:
1. "technical" - API documentation, SDKs, webhooks, integration guides, error codes, 
   debugging help, database configurations, authentication methods
   
2. "product" - Pricing plans, features, system requirements, trial information, 
   billing, subscriptions, getting started guides, product capabilities
   
3. "policy" - HR policies, vacation/PTO, remote work, expense reimbursement, 
   code of conduct, employee benefits, professional development

Analyze this query and choose exactly one knowledge base (technical, product, or policy).
Include a confidence level (high, medium, low) and brief reasoning.

Query: {query}

Respond in this exact format:
ROUTE: [knowledge_base]
CONFIDENCE: [high/medium/low]
REASON: [one sentence explanation]
""")
        
        # Alternative prompt for handling ambiguous queries
        self.clarification_prompt = ChatPromptTemplate.from_template("""
This query might be ambiguous or could relate to multiple knowledge bases.
Query: {query}

Could this query potentially relate to multiple knowledge bases? If yes, list all relevant ones.
If the query needs clarification, suggest what information would help route it better.

Respond in this format:
PRIMARY_ROUTE: [most likely knowledge base]
ALTERNATIVE_ROUTES: [other possible knowledge bases, comma-separated, or "none"]
CLARIFICATION_NEEDED: [yes/no]
CLARIFICATION: [what to ask the user, or "none"]
""")
    
    def route(self, query: str, handle_ambiguity: bool = False) -> Dict[str, Any]:
        """
        Route a query using LLM analysis.
        
        Args:
            query: The user's question
            handle_ambiguity: If True, use additional analysis for ambiguous queries
        """
        # Create the routing chain
        routing_chain = LLMChain(
            llm=self.llm,
            prompt=self.routing_prompt
        )
        
        # Get routing decision
        with get_openai_callback() as cb:
            response = routing_chain.run(query=query)
            tokens_used = cb.total_tokens
            cost = cb.total_cost
        
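        # Parse the response
        valid_routes = {"technical", "product", "policy"}
        route = "product"  # default
        confidence = "low"
        reason = "Unable to determine"
        
        for line in response.strip().split('\n'):
            line = line.strip()  # tolerate indentation in the model's reply
            if line.startswith("ROUTE:"):
                # Drop brackets or quotes the model may echo from the format template
                candidate = line.replace("ROUTE:", "").strip().strip('[]"\'').lower()
                if candidate in valid_routes:  # ignore names that are not real routes
                    route = candidate
            elif line.startswith("CONFIDENCE:"):
                confidence = line.replace("CONFIDENCE:", "").strip().strip('[]').lower()
            elif line.startswith("REASON:"):
                reason = line.replace("REASON:", "").strip()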
        
        result = {
            "route": route,
            "confidence": confidence,
            "reason": reason,
            "tokens_used": tokens_used,
            "cost": f"${cost:.4f}" if cost else "$0.0000"
        }
        
        # Handle ambiguous queries if requested
        if handle_ambiguity and confidence == "low":
            clarification_chain = LLMChain(
                llm=self.llm,
                prompt=self.clarification_prompt
            )
            
            clarification_response = clarification_chain.run(query=query)
            
            # Parse clarification response
            lines = clarification_response.strip().split('\n')
            for line in lines:
                if line.startswith("ALTERNATIVE_ROUTES:"):
                    alt_routes = line.replace("ALTERNATIVE_ROUTES:", "").strip()
                    if alt_routes != "none":
                        result["alternative_routes"] = [r.strip() for r in alt_routes.split(',')]
                elif line.startswith("CLARIFICATION:"):
                    clarification = line.replace("CLARIFICATION:", "").strip()
                    if clarification != "none":
                        result["clarification_suggestion"] = clarification
        
        return result

# Initialize and test the LLM router
llm_router = LLMCompletionRouter()

test_queries = [
    "I'm getting errors when trying to connect",  # Ambiguous - could be technical or product
    "Tell me about your REST API authentication",  # Clearly technical
    "Do you offer educational discounts?",        # Clearly product
    "I need help with something",                 # Very ambiguous
]

print("🧪 Testing LLM Completion Router\n" + "="*50)
for query in test_queries:
    print(f"Query: {query}")
    
    # First try normal routing
    result = llm_router.route(query)
    print(f"→ Route: {result['route']} (confidence: {result['confidence']})")
    print(f"  Reason: {result['reason']}")
    print(f"  Cost: {result['cost']}, Tokens: {result['tokens_used']}")
    
    # If low confidence, try with ambiguity handling
    if result['confidence'] == "low":
        result_with_ambiguity = llm_router.route(query, handle_ambiguity=True)
        if 'alternative_routes' in result_with_ambiguity:
            print(f"  Alternative routes: {result_with_ambiguity['alternative_routes']}")
        if 'clarification_suggestion' in result_with_ambiguity:
            print(f"  Suggested clarification: {result_with_ambiguity['clarification_suggestion']}")
    
    print("-"*50)
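
Because this router depends on the model following the `ROUTE / CONFIDENCE / REASON` text format, the parsing step is worth testing offline against sample replies, including slightly malformed ones. The sketch below is one way to do that. `parse_routing_reply` and the sample strings are hypothetical and written for illustration; they are not output from a real model call.

```python
import re
from typing import Dict

VALID_ROUTES = {"technical", "product", "policy"}
VALID_CONFIDENCE = {"high", "medium", "low"}

# Accept leading whitespace and any letter case in the field name
_FIELD = re.compile(r"^\s*(ROUTE|CONFIDENCE|REASON)\s*:\s*(.*)$", re.IGNORECASE)


def parse_routing_reply(text: str, default_route: str = "product") -> Dict[str, str]:
    """Parse the three-line reply format, falling back to safe defaults."""
    parsed = {"route": default_route, "confidence": "low", "reason": "Unable to determine"}
    for line in text.splitlines():
        match = _FIELD.match(line)
        if not match:
            continue
        field, value = match.group(1).upper(), match.group(2).strip()
        if field == "REASON":
            if value:
                parsed["reason"] = value
            continue
        # Drop brackets or quotes the model may copy from the format template
        token = value.strip("[]\"'` ").lower()
        if field == "ROUTE" and token in VALID_ROUTES:
            parsed["route"] = token
        elif field == "CONFIDENCE" and token in VALID_CONFIDENCE:
            parsed["confidence"] = token
    return parsed


clean = parse_routing_reply("ROUTE: technical\nCONFIDENCE: high\nREASON: Mentions API keys.")
messy = parse_routing_reply("  Route: [Policy]\n  confidence: \"Medium\"\nREASON: PTO question")
invalid = parse_routing_reply("ROUTE: billing\nCONFIDENCE: certain")
print(clean, messy, invalid, sep="\n")
```

Validating against the known route names means a reply that names a route that does not exist degrades to the default route with low confidence instead of propagating an unusable value downstream.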

## Part 4: LLM Function Calling Router

OpenAI's function calling feature gives us a more structured way to get routing decisions. Instead of parsing text responses, we define routing as a function that the model can call with specific parameters. This approach is more reliable and can include additional metadata like confidence scores and multi-route suggestions.
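
With function calling, the model's decision arrives as a JSON string in the tool call's `function.arguments` field, so the text parsing step becomes `json.loads` plus validation. The sketch below shows that validation on hand-written argument strings, which means it runs without an API call. `parse_route_arguments` is a hypothetical helper, and the field names are assumed to follow a schema with `primary_route`, `confidence_score`, and `alternative_routes`.

```python
import json
from typing import Any, Dict

VALID_ROUTES = {"technical", "product", "policy"}


def parse_route_arguments(arguments_json: str, default_route: str = "product") -> Dict[str, Any]:
    """Validate the JSON arguments of a routing tool call."""
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError:
        args = {}
    if not isinstance(args, dict):
        args = {}

    # Fall back to the default route if the name is missing or unknown
    route = args.get("primary_route")
    if route not in VALID_ROUTES:
        route = default_route

    # Coerce the confidence to a float and clamp it to [0, 1]
    try:
        score = float(args.get("confidence_score", 0.0))
    except (TypeError, ValueError):
        score = 0.0
    score = min(1.0, max(0.0, score))

    # Keep only known alternatives that differ from the primary route
    raw_alternatives = args.get("alternative_routes")
    if not isinstance(raw_alternatives, list):
        raw_alternatives = []
    alternatives = [r for r in raw_alternatives if r in VALID_ROUTES and r != route]

    return {"route": route, "confidence_score": score, "alternative_routes": alternatives}


good = parse_route_arguments(
    '{"primary_route": "technical", "confidence_score": 0.92, '
    '"alternative_routes": ["product", "technical", "billing"]}'
)
bad = parse_route_arguments("not json at all")
clamped = parse_route_arguments('{"primary_route": "policy", "confidence_score": 1.7}')
print(good, bad, clamped, sep="\n")
```

Schema keywords such as `enum`, `minimum`, and `maximum` guide the model but are worth re-checking in code, since a malformed or out-of-range value would otherwise flow straight into the retrieval step.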

In [None]:
import os
import json
from typing import Dict, Any, cast
from openai import OpenAI, APIError

class LLMFunctionRouter:
    """
    Uses OpenAI's function calling for structured routing decisions.
    Updated to use OpenAI Python SDK 1.x client interface with tools.
    """
    
    def __init__(self, model: str = "gpt-3.5-turbo"):
        self.model = model
        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        
        # Updated to use the new tools format
        self.routing_tool = {
            "type": "function",
            "function": {
                "name": "route_query",
                "description": "Route a user query to the appropriate knowledge base",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "primary_route": {
                            "type": "string",
                            "enum": ["technical", "product", "policy"],
                            "description": "The best knowledge base for this query"
                        },
                        "confidence_score": {
                            "type": "number",
                            "minimum": 0,
                            "maximum": 1,
                            "description": "Confidence in the routing decision (0-1)"
                        },
                        "alternative_routes": {
                            "type": "array",
                            "items": {
                                "type": "string",
                                "enum": ["technical", "product", "policy"]
                            },
                            "description": "Other knowledge bases that might be relevant"
                        },
                        "reasoning": {
                            "type": "string",
                            "description": "Brief explanation of the routing decision"
                        },
                        "query_intent": {
                            "type": "string",
                            "enum": [
                                "troubleshooting",
                                "information_seeking",
                                "purchase_decision",
                                "policy_clarification",
                                "how_to_guide",
                                "complaint",
                                "other"
                            ],
                            "description": "The user's intent behind the query"
                        }
                    },
                    "required": ["primary_route", "confidence_score", "reasoning", "query_intent"]
                }
            }
        }
        
        self.system_message = (
            "You are an intelligent query router for a company knowledge base. Analyze each query and "
            "determine the most appropriate knowledge base:\n\n"
            "- technical: API docs, SDKs, integration guides, debugging, error codes\n"
            "- product: Pricing, features, trials, requirements, billing\n"
            "- policy: HR, vacation, remote work, expenses, employee benefits\n\n"
            "Consider the query's intent and provide alternative routes if the query is ambiguous."
        )
    
    def route(self, query: str) -> Dict[str, Any]:
        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": self.system_message},
                    {"role": "user", "content": f"Route this query: {query}"}
                ],
                tools=cast(Any, [self.routing_tool]),
                tool_choice=cast(Any, {"type": "function", "function": {"name": "route_query"}}),
                temperature=0
            )
            
            # Updated response handling for tools
            tool_call = response.choices[0].message.tool_calls[0] if response.choices[0].message.tool_calls else None
            
            if not tool_call:
                # Fallback if no tool call was made
                return {
                    "route": "product",
                    "confidence": "low",
                    "confidence_score": 0.3,
                    "reasoning": "No function call detected, defaulting to product",
                    "error": "No tool call in response"
                }
            
            function_args = json.loads(tool_call.function.arguments)
            confidence_score = function_args.get("confidence_score", 0.5)
            
            if confidence_score >= 0.8:
                confidence_level = "high"
            elif confidence_score >= 0.6:
                confidence_level = "medium"
            else:
                confidence_level = "low"
            
            # Safe access to usage with fallback
            tokens_used = response.usage.total_tokens if response.usage else 0
            # Rough, illustrative per-token rates; check current pricing before relying on them
            cost_per_token = 0.000002 if "gpt-3.5" in self.model else 0.00003
            cost = tokens_used * cost_per_token
            
            return {
                "route": function_args.get("primary_route"),
                "confidence": confidence_level,
                "confidence_score": confidence_score,
                "alternative_routes": function_args.get("alternative_routes", []),
                "reasoning": function_args.get("reasoning"),
                "query_intent": function_args.get("query_intent"),
                "tokens_used": tokens_used,
                "cost": f"${cost:.4f}"
            }
        except APIError as e:
            return {
                "route": "product",
                "confidence": "low",
                "confidence_score": 0.3,
                "reasoning": "Error in routing, defaulting to product",
                "error": str(e)
            }
        except Exception as e:
            return {
                "route": "product",
                "confidence": "low",
                "confidence_score": 0.3,
                "reasoning": "Unexpected error in routing, defaulting to product",
                "error": str(e)
            }

# Usage example
function_router = LLMFunctionRouter()
test_queries = [
    "My API calls are returning 429 errors",
    "What's the difference between your plans?",
    "Can I expense my home internet?",
    "The dashboard is loading very slowly",
    "I want to build an integration",
]

print("🧪 Testing LLM Function Calling Router\n" + "="*50)
for query in test_queries:
    result = function_router.route(query)
    print(f"Query: {query}")
    print(f"→ Route: {result['route']} (confidence: {result['confidence']} - {result['confidence_score']:.2f})")
    print(f"  Intent: {result.get('query_intent', 'unknown')}")
    print(f"  Reason: {result['reasoning']}")
    if result.get("alternative_routes"):
        print(f"  Alternatives: {result['alternative_routes']}")
    print(f"  Cost: {result.get('cost', 'N/A')}")
    if result.get("error"):
        print(f"  Error: {result['error']}")
    print("-"*50)

## Part 5: Zero-Shot Classification Router

Zero-shot classification is fascinating because it can categorize queries without any training examples. We just describe what each category means, and the model figures out where queries belong based on its understanding of language. This is incredibly useful when you need to add new routes quickly or when you don't have many example queries.
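
To see why adding a route is so cheap, here is a small sketch that only builds the prompt text and makes no model call. The helper `build_zero_shot_prompt` and the `legal` category are assumptions for illustration.

```python
from typing import Dict


def build_zero_shot_prompt(categories: Dict[str, str], query: str) -> str:
    """Hypothetical helper: render category descriptions and a query into one prompt string."""
    category_lines = "\n".join(f"- {name}: {desc}" for name, desc in categories.items())
    return (
        "Classify the query into exactly one of these categories, "
        "based only on their descriptions:\n\n"
        f"{category_lines}\n\n"
        f"Query: {query}\n"
        "Answer with the category name only."
    )


categories = {
    "technical": "APIs, integrations, debugging, error codes",
    "product": "Pricing, plans, trials, billing",
}

# Adding a route is a one-line change: no examples to collect, nothing to retrain
categories["legal"] = "Contracts, terms of service, data processing agreements"  # illustrative

prompt = build_zero_shot_prompt(categories, "Where can I find your terms of service?")
print(prompt)
```

The new category appears in the prompt as soon as it is added to the dictionary, and the description is the only thing the model has to go on.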

In [None]:
class ZeroShotRouter:
    """
    Routes queries using zero-shot classification - no training examples needed!
    The model understands categories from descriptions alone.
    """
    
    def __init__(self, model: str = "gpt-3.5-turbo"):
        self.llm = ChatOpenAI(model=model, temperature=0)
        
        # Define categories with natural language descriptions
        # The model will understand these without any examples
        self.category_descriptions = {
            "technical": "Technical documentation, APIs, programming, integrations, debugging, errors, code",
            "product": "Product features, pricing, plans, trials, billing, subscriptions, capabilities",
            "policy": "Company policies, HR matters, employee benefits, vacation, expenses, workplace rules"
        }
        
        # Create a zero-shot classification prompt
        self.zero_shot_prompt = ChatPromptTemplate.from_template("""
Classify the following query into one of these categories based ONLY on their descriptions:

{categories}

Query: {query}

Think step by step:
1. What is the main topic of this query?
2. Which category description best matches this topic?
3. What is your confidence in this classification?

Respond with:
CATEGORY: [the matching category name]
CONFIDENCE: [0-100]
REASONING: [your step-by-step thinking]
""")
        
        # Alternative prompt for multi-label classification
        self.multi_label_prompt = ChatPromptTemplate.from_template("""
This query might relate to multiple categories. Identify ALL relevant categories:

{categories}

Query: {query}

For each relevant category, provide a relevance score (0-100).
Respond in this format:
CATEGORIES: category1:score1, category2:score2, ...
PRIMARY: [most relevant category]
EXPLANATION: [why these categories are relevant]
""")
    
    def format_categories(self) -> str:
        """Format category descriptions for the prompt"""
        return "\n".join([
            f"- {name}: {description}"
            for name, description in self.category_descriptions.items()
        ])
    
    def route(self, query: str, multi_label: bool = False) -> Dict[str, Any]:
        """
        Route using zero-shot classification.
        
        Args:
            query: The query to classify
            multi_label: If True, allow multiple relevant categories
        """
        categories_str = self.format_categories()
        
        if not multi_label:
            # Single-label classification
            chain = LLMChain(
                llm=self.llm,
                prompt=self.zero_shot_prompt
            )
            
            response = chain.run(categories=categories_str, query=query)
            
            # Parse response
            lines = response.strip().split('\n')
            category = "product"  # default
            confidence = 50
            reasoning = "Unable to classify"
            
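            for line in lines:
                if line.startswith("CATEGORY:"):
                    category = line.replace("CATEGORY:", "").strip().lower()
                elif line.startswith("CONFIDENCE:"):
                    try:
                        # Tolerate answers such as "85%"
                        confidence = int(line.replace("CONFIDENCE:", "").strip().rstrip('%'))
                    except ValueError:
                        confidence = 50
                elif line.startswith("REASONING:"):
                    reasoning = line.replace("REASONING:", "").strip()
            
            # Guard against labels outside the known categories
            if category not in self.category_descriptions:
                category = "product"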
            
            # Convert confidence to level
            if confidence >= 80:
                confidence_level = "high"
            elif confidence >= 60:
                confidence_level = "medium"
            else:
                confidence_level = "low"
            
            return {
                "route": category,
                "confidence": confidence_level,
                "confidence_score": confidence,
                "reasoning": reasoning,
                "method": "zero-shot"
            }
        
        else:
            # Multi-label classification
            chain = LLMChain(
                llm=self.llm,
                prompt=self.multi_label_prompt
            )
            
            response = chain.run(categories=categories_str, query=query)
            
            # Parse multi-label response
            lines = response.strip().split('\n')
            categories_scores = {}
            primary = "product"
            explanation = ""
            
            for line in lines:
                if line.startswith("CATEGORIES:"):
                    cats_str = line.replace("CATEGORIES:", "").strip()
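                    for cat_score in cats_str.split(','):
                        if ':' in cat_score:
                            try:
                                # Split once so an extra colon cannot break unpacking
                                cat, score = cat_score.strip().split(':', 1)
                                categories_scores[cat.strip().lower()] = int(score.strip().rstrip('%'))
                            except ValueError:
                                pass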
                elif line.startswith("PRIMARY:"):
                    primary = line.replace("PRIMARY:", "").strip().lower()
                elif line.startswith("EXPLANATION:"):
                    explanation = line.replace("EXPLANATION:", "").strip()
            
            return {
                "route": primary,
                "all_categories": categories_scores,
                "explanation": explanation,
                "method": "zero-shot-multi-label"
            }

# Test zero-shot routing
zero_shot_router = ZeroShotRouter()

# Test with queries that don't explicitly mention category keywords
test_queries = [
    "Something is broken and I need help",      # Vague technical issue
    "How do I get started with your service?",  # Could be product or technical
    "I'm going on vacation next week",          # Clearly policy
    "Is there a way to reduce my monthly bill?", # Product/billing
]

print("🧪 Testing Zero-Shot Router\n" + "="*50)
print("\nSingle-label classification:")
print("-"*30)

for query in test_queries:
    result = zero_shot_router.route(query)
    print(f"Query: {query}")
    print(f"→ Route: {result['route']} (confidence: {result['confidence_score']}%)")
    print(f"  Reasoning: {result['reasoning']}")
    print("-"*30)

print("\nMulti-label classification:")
print("-"*30)

# Test multi-label on ambiguous queries
ambiguous_queries = [
    "I want to integrate your API but need to know the cost first",
    "Can employees use the premium features from home?",
]

for query in ambiguous_queries:
    result = zero_shot_router.route(query, multi_label=True)
    print(f"Query: {query}")
    print(f"→ Primary: {result['route']}")
    print(f"  All categories: {result.get('all_categories', {})}")
    print(f"  Explanation: {result.get('explanation', '')}")
    print("-"*30)

## Part 6: Language Classification Router

Before routing by content, you often need to route by language, which is essential for global applications. Language detection is a small, well-defined classification task that models generally handle well, so it makes a good first step: once you know the language, you can direct the query to a language-specific knowledge base or support team.
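
Whatever produces the language label, the lookup that follows should tolerate small variations such as capitalization or a trailing period. Here is a minimal sketch of that lookup. The helper `resolve_language_route` is hypothetical, and the shortened mapping follows the `support_<language code>` naming used in this part.

```python
from typing import Dict

# Shortened mapping for illustration, following the support_<language code> pattern
LANGUAGE_ROUTES: Dict[str, str] = {
    "english": "support_en",
    "spanish": "support_es",
    "french": "support_fr",
}


def resolve_language_route(raw_label: str, default: str = "support_en") -> str:
    """Hypothetical helper: clean up a language label and look up its route."""
    # Drop surrounding whitespace, then stray punctuation or quotes, then normalize case
    label = raw_label.strip().strip(".!\"'").lower()
    return LANGUAGE_ROUTES.get(label, default)


print(resolve_language_route("Spanish."))
print(resolve_language_route("  french\n"))
print(resolve_language_route("Klingon"))
```

Unknown languages fall back to the English route, which mirrors the default used throughout this part.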

In [None]:
from typing import Dict, Any
from openai import OpenAI
import os

class LanguageRouter:
    """
    Routes queries based on detected language.
    This typically happens before content-based routing.
    """
    
    def __init__(self):
        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        
        # Map languages to their knowledge bases
        self.language_routes = {
            "english": "support_en",
            "spanish": "support_es",
            "french": "support_fr",
            "german": "support_de",
            "chinese": "support_zh",
            "japanese": "support_ja"
        }
        
        # Simple language detection prompt
        self.language_prompt = """
Detect the language of this text. Respond with ONLY the language name in English (lowercase).

Text: {query}

Language:
"""
        
        # More detailed analysis prompt
        self.detailed_prompt = """
Analyze this text and provide:
1. Primary language
2. Confidence level (0-100)
3. Any mixed languages detected
4. Script/writing system used

Text: {query}

Respond in this format:
LANGUAGE: [primary language]
CONFIDENCE: [0-100]
MIXED_LANGUAGES: [list any other languages detected, or "none"]
SCRIPT: [latin, cyrillic, chinese, arabic, etc.]
"""
    
    def detect_language(self, query: str, detailed: bool = False) -> Dict[str, Any]:
        """
        Detect the language of a query.
        
        Args:
            query: The text to analyze
            detailed: If True, provide detailed language analysis
        """
        try:
            if not detailed:
                # Simple language detection
                response = self.client.chat.completions.create(
                    model="gpt-3.5-turbo",
                    messages=[
                        {"role": "user", "content": self.language_prompt.format(query=query)}
                    ],
                    temperature=0
                )
                
                # content can be None; also drop a trailing period such as "english."
                language = (response.choices[0].message.content or "").strip().lower().rstrip(".")
                
                # Get the appropriate route
                route = self.language_routes.get(language, "support_en")  # Default to English
                
                return {
                    "language": language,
                    "route": route,
                    "method": "simple_detection"
                }
            
            else:
                # Detailed language analysis
                response = self.client.chat.completions.create(
                    model="gpt-3.5-turbo",
                    messages=[
                        {"role": "user", "content": self.detailed_prompt.format(query=query)}
                    ],
                    temperature=0
                )
                
                response_text = response.choices[0].message.content or ""
                
                # Parse detailed response
                lines = response_text.strip().split('\n')
                language = "english"
                confidence = 50
                mixed_languages = []
                script = "latin"
                
                for line in lines:
                    if line.startswith("LANGUAGE:"):
                        language = line.replace("LANGUAGE:", "").strip().lower()
                    elif line.startswith("CONFIDENCE:"):
                        try:
                            confidence = int(line.replace("CONFIDENCE:", "").strip().rstrip('%'))
                        except ValueError:
                            confidence = 50
                    elif line.startswith("MIXED_LANGUAGES:"):
                        mixed = line.replace("MIXED_LANGUAGES:", "").strip()
                        if mixed.lower() != "none":
                            mixed_languages = [lang.strip() for lang in mixed.split(',')]
                    elif line.startswith("SCRIPT:"):
                        script = line.replace("SCRIPT:", "").strip().lower()
                
                # Get route
                route = self.language_routes.get(language, "support_en")
                
                return {
                    "language": language,
                    "route": route,
                    "confidence": confidence,
                    "mixed_languages": mixed_languages,
                    "script": script,
                    "method": "detailed_detection"
                }
        
        except Exception as e:
            # Fallback in case of API errors
            return {
                "language": "english",
                "route": "support_en",
                "confidence": 0,
                "method": "error_fallback",
                "error": str(e)
            }
    
    def route_with_content(self, query: str, content_router) -> Dict[str, Any]:
        """
        First detect language, then route by content within that language.
        This is how production systems typically work.
        """
        # Step 1: Detect language
        lang_result = self.detect_language(query, detailed=True)
        
        # Step 2: Route by content if confidence is high
        if lang_result.get('confidence', 0) > 70:
            content_result = content_router.route(query)
            
            return {
                "language": lang_result['language'],
                "language_confidence": lang_result['confidence'],
                "content_route": content_result['route'],
                "content_confidence": content_result.get('confidence', 'unknown'),
                "final_route": f"{lang_result['route']}_{content_result['route']}",
                # Same value as final_route, so callers can always read 'route'
                "route": f"{lang_result['route']}_{content_result['route']}",
                "method": "language_then_content"
            }
        else:
            # Low confidence in language detection
            return {
                "language": "uncertain",
                "route": "support_en",  # Default to English
                "confidence": lang_result.get('confidence', 0),
                "suggestion": "Please specify your preferred language",
                "method": "language_uncertain"
            }

# Test language routing
language_router = LanguageRouter()

# Test queries in different languages
multilingual_queries = [
    "How do I reset my password?",                    # English
    "¿Cómo restablezco mi contraseña?",              # Spanish
    "Comment réinitialiser mon mot de passe?",       # French
    "Wie kann ich mein Passwort zurücksetzen?",      # German
    "我如何重置密码？",                                # Chinese
    "Can you help me with the API? Gracias!",        # Mixed English/Spanish
]

print("🧪 Testing Language Router\n" + "="*50)
print("\nSimple language detection:")
print("-"*30)

for query in multilingual_queries[:4]:
    result = language_router.detect_language(query)
    print(f"Query: {query[:50]}..." if len(query) > 50 else f"Query: {query}")
    print(f"→ Language: {result['language']}")
    print(f"  Route: {result['route']}")
    print("-"*30)

print("\nDetailed language analysis:")
print("-"*30)

for query in multilingual_queries[4:]:
    result = language_router.detect_language(query, detailed=True)
    print(f"Query: {query}")
    print(f"→ Language: {result['language']} (confidence: {result['confidence']}%)")
    print(f"  Script: {result['script']}")
    if result['mixed_languages']:
        print(f"  Mixed languages: {result['mixed_languages']}")
    print(f"  Route: {result['route']}")
    print("-"*30)

# Test combined language + content routing
print("\nCombined language + content routing:")
print("-"*30)

combined_test = [
    "What are the API rate limits?",
    "¿Cuánto cuesta el plan profesional?",
]

# A stand-in content router keeps this cell self-contained.
# Swap in a real one (for example, rule_router = RuleBasedRouter()) if it is already defined.
class MockRouter:
    def route(self, query):
        return {"route": "technical", "confidence": "medium"}

mock_router = MockRouter()

for query in combined_test:
    try:
        result = language_router.route_with_content(query, mock_router)
        
        print(f"Query: {query}")
        print(f"→ Language: {result['language']} ({result.get('language_confidence', 'N/A')}%)")
        print(f"  Content route: {result.get('content_route', 'N/A')}")
        print(f"  Final route: {result.get('final_route', result.get('route', 'N/A'))}")
        print("-"*30)
    except Exception as e:
        print(f"Error processing query '{query}': {e}")
        print("-"*30)

## Part 7: Hybrid Router - Combining All Strategies

In production, the best approach often combines multiple routing strategies. This gives you speed when possible, accuracy when needed, and the ability to handle edge cases gracefully. Let's build a sophisticated hybrid router that uses all the techniques we've learned.
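
Before the full class, here is the core "fast first, escalate if unsure" idea in miniature. It is a sketch under stated assumptions: the `cascade` function and both stub routers are hypothetical stand-ins, not the routers built earlier in this tutorial.

```python
from typing import Any, Callable, Dict, List, Tuple

RouterFn = Callable[[str], Dict[str, Any]]


def cascade(query: str, routers: List[Tuple[str, RouterFn]]) -> Dict[str, Any]:
    """Try routers in order; stop at the first high-confidence answer."""
    result: Dict[str, Any] = {"route": "product", "confidence": "low", "decided_by": "default"}
    for name, router in routers:
        result = {**router(query), "decided_by": name}
        if result["confidence"] == "high":
            break  # a cheaper router was sure enough, so skip the expensive ones
    return result


# Stub routers standing in for real rule-based and LLM routers (illustrative only)
def keyword_stub(query: str) -> Dict[str, Any]:
    if "api" in query.lower():
        return {"route": "technical", "confidence": "high"}
    return {"route": "product", "confidence": "low"}


def llm_stub(query: str) -> Dict[str, Any]:
    return {"route": "policy", "confidence": "medium"}


routers = [("rule", keyword_stub), ("llm", llm_stub)]
fast = cascade("API authentication", routers)
escalated = cascade("Can I expense my home internet?", routers)
print(fast)
print(escalated)
```

The first query is settled by the cheap stub alone, while the second falls through to the last router in the list, whose answer is accepted whatever its confidence.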

In [None]:
class HybridRouter:
    """
    A production-ready router that combines multiple strategies intelligently.
    This is what you'd actually use in a real application.
    """
    
    def __init__(self, 
                 use_rule_based: bool = True,
                 use_semantic: bool = True,
                 use_llm: bool = True,
                 use_language: bool = True):
        
        # Initialize all routers based on configuration
        self.use_rule_based = use_rule_based
        self.use_semantic = use_semantic
        self.use_llm = use_llm
        self.use_language = use_language
        
        if use_rule_based:
            self.rule_router = RuleBasedRouter()
        if use_semantic:
            self.semantic_router = SemanticRouter()
        if use_llm:
            self.llm_router = LLMCompletionRouter()
        if use_language:
            self.language_router = LanguageRouter()
        
        # Configuration for routing strategy
        self.confidence_thresholds = {
            "high": 0.8,
            "medium": 0.6,
            "low": 0.4
        }
        
        # Weights for combining different router scores
        self.router_weights = {
            "rule": 0.25,
            "semantic": 0.35,
            "llm": 0.40
        }
    
    def route(self, query: str, strategy: str = "cascade") -> Dict[str, Any]:
        """
        Route using specified strategy.
        
        Strategies:
        - cascade: Try fast methods first, escalate if low confidence
        - ensemble: Combine all methods and vote
        - adaptive: Choose strategy based on query characteristics
        """
        
        # First, detect language if enabled
        language_info = None
        if self.use_language:
            language_info = self.language_router.detect_language(query)
            if language_info['language'] != 'english':
                # For non-English, return language-specific route
                return {
                    "route": language_info['route'],
                    "strategy": "language_routing",
                    "language": language_info['language'],
                    "details": language_info
                }
        
        if strategy == "cascade":
            return self._cascade_routing(query)
        elif strategy == "ensemble":
            return self._ensemble_routing(query)
        elif strategy == "adaptive":
            return self._adaptive_routing(query)
        else:
            return self._cascade_routing(query)  # Default
    
    def _cascade_routing(self, query: str) -> Dict[str, Any]:
        """
        Try routers in order of speed/cost, escalating only if needed.
        This is the most efficient strategy for production.
        """
        results = {}
        
        # Step 1: Try rule-based (fastest)
        if self.use_rule_based:
            rule_result = self.rule_router.route(query)
            results['rule'] = rule_result
            
            if rule_result['confidence'] == 'high':
                return {
                    "route": rule_result['route'],
                    "confidence": "high",
                    "strategy": "cascade_rule",
                    "details": results
                }
        
        # Step 2: Try semantic (medium speed)
        if self.use_semantic:
            semantic_result = self.semantic_router.route(query)
            results['semantic'] = semantic_result
            
            if semantic_result['confidence'] == 'high':
                return {
                    "route": semantic_result['route'],
                    "confidence": "high",
                    "strategy": "cascade_semantic",
                    "details": results
                }
        
        # Step 3: Use LLM (slowest but most accurate)
        if self.use_llm:
            llm_result = self.llm_router.route(query)
            results['llm'] = llm_result
            
            return {
                "route": llm_result['route'],
                "confidence": llm_result['confidence'],
                "strategy": "cascade_llm",
                "details": results
            }
        
        # Fallback if no routers are enabled
        return {
            "route": "product",
            "confidence": "low",
            "strategy": "cascade_default",
            "details": results
        }
    
    def _ensemble_routing(self, query: str) -> Dict[str, Any]:
        """
        Run all routers and combine their results.
        More accurate but slower and more expensive.
        """
        results = {}
        weighted_scores = {"technical": 0.0, "product": 0.0, "policy": 0.0}
        
        # Collect votes from all routers
        if self.use_rule_based:
            rule_result = self.rule_router.route(query)
            results['rule'] = rule_result
            route = rule_result['route']
            
            # Add weighted vote
            confidence_multiplier = {"high": 1.0, "medium": 0.7, "low": 0.4}.get(
                rule_result['confidence'], 0.4
            )
            weighted_scores[route] += self.router_weights['rule'] * confidence_multiplier
        
        if self.use_semantic:
            semantic_result = self.semantic_router.route(query)
            results['semantic'] = semantic_result
            route = semantic_result['route']
            
            # Add weighted vote based on similarity score
            weighted_scores[route] += self.router_weights['semantic'] * semantic_result['similarity']
        
        if self.use_llm:
            llm_result = self.llm_router.route(query)
            results['llm'] = llm_result
            route = llm_result['route']
            
            confidence_multiplier = {"high": 1.0, "medium": 0.7, "low": 0.4}.get(
                llm_result['confidence'], 0.4
            )
            weighted_scores[route] += self.router_weights['llm'] * confidence_multiplier
        
        # Determine winner
        best_route = max(weighted_scores.keys(), key=lambda k: weighted_scores[k])
        best_score = weighted_scores[best_route]
        
        # Map the winning weighted score to a confidence level
        # (the score is highest when routers agree with high confidence)
        if best_score > 0.7:
            confidence = "high"
        elif best_score > 0.5:
            confidence = "medium"
        else:
            confidence = "low"
        
        return {
            "route": best_route,
            "confidence": confidence,
            "weighted_scores": weighted_scores,
            "strategy": "ensemble",
            "details": results
        }
    
    def _adaptive_routing(self, query: str) -> Dict[str, Any]:
        """
        Choose routing strategy based on query characteristics.
        This is the smartest but most complex approach.
        """
        query_length = len(query.split())
        has_technical_indicators = any(word in query.lower() for word in 
                                      ['api', 'error', 'code', 'integration', 'webhook'])
        has_product_indicators = any(word in query.lower() for word in 
                                    ['price', 'plan', 'trial', 'feature', 'cost'])
        has_policy_indicators = any(word in query.lower() for word in 
                                   ['vacation', 'policy', 'expense', 'remote', 'pto'])
        
        # Count indicators
        indicator_count = sum([has_technical_indicators, has_product_indicators, has_policy_indicators])
        
        # Choose strategy based on query characteristics
        if indicator_count == 1 and query_length < 10:
            # Clear, simple query - use cascade for speed
            return self._cascade_routing(query)
        elif indicator_count > 1 or query_length > 20:
            # Complex or ambiguous query - use ensemble for accuracy
            return self._ensemble_routing(query)
        else:
            # Medium complexity - use cascade but note it might need ensemble
            result = self._cascade_routing(query)
            if result['confidence'] == 'low':
                # Low confidence, try ensemble
                result = self._ensemble_routing(query)
                result['strategy'] = 'adaptive_escalated'
            else:
                result['strategy'] = 'adaptive_cascade'
            return result

# Initialize the hybrid router with all strategies
hybrid_router = HybridRouter(
    use_rule_based=True,
    use_semantic=True,
    use_llm=True,
    use_language=False  # Disable for these English-only tests
)

# Test different routing strategies
test_queries = [
    "API authentication",                              # Simple, clear
    "How much does the professional plan cost?",      # Clear question
    "I'm having issues connecting to your service",    # Ambiguous
    "I want to integrate your API into my app but I need to know the pricing first and whether you offer educational discounts",  # Complex
]

strategies = ["cascade", "ensemble", "adaptive"]

print("🧪 Testing Hybrid Router with Different Strategies\n" + "="*50)

for strategy in strategies:
    print(f"\n📊 Strategy: {strategy.upper()}")
    print("-"*40)
    
    for query in test_queries:
        # perf_counter is a monotonic clock intended for measuring elapsed time
        start_time = time.perf_counter()
        result = hybrid_router.route(query, strategy=strategy)
        elapsed = time.perf_counter() - start_time
        
        print(f"Query: {query[:50]}..." if len(query) > 50 else f"Query: {query}")
        print(f"→ Route: {result['route']} (confidence: {result['confidence']})")
        print(f"  Strategy used: {result['strategy']}")
        print(f"  Time: {elapsed:.2f}s")
        
        if 'weighted_scores' in result:
            scores_str = ', '.join(f"{k}:{v:.2f}" for k,v in result['weighted_scores'].items())
            print(f"  Scores: {scores_str}")
        
        print("-"*40)
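
To make the ensemble arithmetic concrete, here is a worked example that reuses the weights and confidence multipliers from `HybridRouter` but runs on made-up votes, so no router or API call is involved. The `weighted_vote` function is a hypothetical standalone version of the voting step.

```python
from typing import Dict, List, Tuple

# Weights and multipliers copied from HybridRouter in this part of the tutorial
ROUTER_WEIGHTS = {"rule": 0.25, "semantic": 0.35, "llm": 0.40}
CONFIDENCE_MULTIPLIER = {"high": 1.0, "medium": 0.7, "low": 0.4}


def weighted_vote(votes: List[Tuple[str, str, float]]) -> Tuple[str, Dict[str, float]]:
    """Each vote is (router_name, route, strength in [0, 1])."""
    scores = {"technical": 0.0, "product": 0.0, "policy": 0.0}
    for router_name, route, strength in votes:
        scores[route] += ROUTER_WEIGHTS[router_name] * strength
    winner = max(scores, key=lambda route: scores[route])
    return winner, scores


# Made-up votes for a query that mixes integration and pricing concerns
votes = [
    ("rule", "technical", CONFIDENCE_MULTIPLIER["medium"]),  # 0.25 * 0.7  = 0.175
    ("semantic", "product", 0.82),                           # 0.35 * 0.82 = 0.287
    ("llm", "technical", CONFIDENCE_MULTIPLIER["high"]),     # 0.40 * 1.0  = 0.400
]
winner, scores = weighted_vote(votes)
print(winner, {route: round(score, 3) for route, score in scores.items()})
```

Here `technical` wins with 0.575 against 0.287 for `product`. Under the thresholds in `_ensemble_routing`, a winning score of 0.575 maps to medium confidence, because the routers did not all agree.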

## Part 8: Complete RAG System with Smart Routing

Now let's put everything together into a production-ready RAG system that uses intelligent routing to provide accurate, fast responses. This system will route queries to the appropriate knowledge base and generate comprehensive answers.
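
The overall flow is route, retrieve, then generate. Here is a minimal sketch of that flow with in-memory stand-ins, so it runs without ChromaDB or an API key. The function `answer_with_routing`, the fake collections, and the lambda stubs are all assumptions for illustration.

```python
from typing import Callable, Dict, List


def answer_with_routing(
    query: str,
    route_fn: Callable[[str], str],
    search_fn: Callable[[str, str], List[str]],
    generate_fn: Callable[[str, str, str], str],
) -> Dict[str, object]:
    """Hypothetical glue function: route, retrieve from one collection, then generate."""
    route = route_fn(query)
    documents = search_fn(route, query)
    context = "\n\n".join(documents) if documents else "No relevant documents found."
    answer = generate_fn(query, context, route)
    return {"route": route, "num_documents": len(documents), "answer": answer}


# In-memory stand-ins for the router, the vector store, and the LLM (all illustrative)
FAKE_COLLECTIONS = {
    "technical": ["Rate limits: 100 requests per minute per API key."],
    "product": ["The Pro plan includes priority support."],
}

result = answer_with_routing(
    "What are the API rate limits?",
    route_fn=lambda q: "technical" if "api" in q.lower() else "product",
    search_fn=lambda route, q: FAKE_COLLECTIONS.get(route, []),
    generate_fn=lambda q, context, source: f"[{source}] {context}",
)
print(result)
```

Passing the three steps in as functions keeps each one easy to swap or test on its own.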

In [None]:
class SmartRAGSystem:
    """
    A complete RAG system with intelligent query routing.
    This is what you'd deploy in production.
    """
    
    def __init__(self, collections: Dict[str, chromadb.Collection], 
                 router: HybridRouter):
        self.collections = collections
        self.router = router
        
        # Initialize the answer generation LLM
        self.answer_llm = ChatOpenAI(
            model="gpt-3.5-turbo",
            temperature=0.7  # Slightly higher for more natural answers
        )
        
        # Answer generation prompt
        self.answer_prompt = ChatPromptTemplate.from_template("""
You are a helpful assistant answering questions based on provided context.
Use the context to provide accurate, helpful answers. If the context doesn't 
contain relevant information, say so honestly.

Context from {source} knowledge base:
{context}

User Question: {query}

Please provide a clear, concise answer:
""")
    
    def search_collection(self, collection_name: str, query: str, 
                         k: int = 3) -> List[Dict[str, Any]]:
        """
        Search a specific collection for relevant documents.
        """
        if collection_name not in self.collections:
            return []
        
        collection = self.collections[collection_name]
        
        # Search the collection
        results = collection.query(
            query_texts=[query],
            n_results=k
        )
        
        # Format results
        formatted_results = []
        if results['documents'] and len(results['documents'][0]) > 0:
            for i, doc in enumerate(results['documents'][0]):
                formatted_results.append({
                    "content": doc,
                    "metadata": results['metadatas'][0][i] if results['metadatas'] else {},
                    "distance": results['distances'][0][i] if results['distances'] else 0
                })
        
        return formatted_results
    
    def generate_answer(self, query: str, context: str, source: str) -> str:
        """
        Generate an answer based on retrieved context.
        """
        chain = LLMChain(llm=self.answer_llm, prompt=self.answer_prompt)
        
        with get_openai_callback() as cb:
            answer = chain.run(
                query=query,
                context=context,
                source=source
            )
            
            # Store cost info for monitoring
            self.last_answer_cost = cb.total_cost
            self.last_answer_tokens = cb.total_tokens
        
        return answer.strip()
    
    def process_query(self, query: str, routing_strategy: str = "adaptive",
                     verbose: bool = True) -> Dict[str, Any]:
        """
        Complete pipeline: route, search, and answer a query.
        
        Args:
            query: The user's question
            routing_strategy: Which routing strategy to use
            verbose: Whether to print progress information
        """
        if verbose:
            print(f"\n🔍 Processing query: {query}")
            print("="*50)
        
        # Step 1: Route the query
        start_time = time.time()
        routing_result = self.router.route(query, strategy=routing_strategy)
        routing_time = time.time() - start_time
        
        route = routing_result['route']
        
        if verbose:
            print(f"📍 Routed to: {route} knowledge base")
            print(f"   Confidence: {routing_result['confidence']}")
            print(f"   Strategy: {routing_result['strategy']}")
            print(f"   Routing time: {routing_time:.2f}s")
        
        # Step 2: Search the appropriate collection
        start_time = time.time()
        search_results = self.search_collection(route, query)
        search_time = time.time() - start_time
        
        if verbose:
            print(f"\n📚 Found {len(search_results)} relevant documents")
            print(f"   Search time: {search_time:.2f}s")
        
        # Step 3: Generate answer
        if search_results:
            # Combine context from search results
            context = "\n\n".join([doc['content'] for doc in search_results])
            
            start_time = time.time()
            answer = self.generate_answer(query, context, route)
            answer_time = time.time() - start_time
            
            if verbose:
                print(f"\n💡 Answer generated")
                print(f"   Generation time: {answer_time:.2f}s")
                print(f"   Tokens used: {self.last_answer_tokens}")
                print(f"   Cost: ${self.last_answer_cost:.4f}")
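        else:
            answer = "I couldn't find relevant information to answer your question. Please try rephrasing or contact support."
            answer_time = 0
            # No LLM call was made, so clear any cost left over from a previous query
            self.last_answer_tokens = 0
            self.last_answer_cost = 0.0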
        
        total_time = routing_time + search_time + answer_time
        
        if verbose:
            print(f"\n⏱️  Total processing time: {total_time:.2f}s")
            print("="*50)
        
        return {
            "query": query,
            "answer": answer,
            "route": route,
            "routing_confidence": routing_result['confidence'],
            "routing_strategy": routing_result['strategy'],
            "documents_found": len(search_results),
            "source_documents": search_results,
            "timing": {
                "routing": routing_time,
                "search": search_time,
                "generation": answer_time,
                "total": total_time
            },
            "cost": {
                "tokens": getattr(self, 'last_answer_tokens', 0),
                "amount": getattr(self, 'last_answer_cost', 0)
            }
        }

# Initialize the complete RAG system
collections_dict = {
    "technical": technical_collection,
    "product": product_collection,
    "policy": policy_collection,
    "support_en": support_en_collection,
    "support_es": support_es_collection
}

# Create a hybrid router with all capabilities
production_router = HybridRouter(
    use_rule_based=True,
    use_semantic=True,
    use_llm=True,
    use_language=True
)

# Initialize the RAG system
rag_system = SmartRAGSystem(
    collections=collections_dict,
    router=production_router
)

print("🚀 Smart RAG System initialized and ready!")
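
One refinement worth considering: `process_query` joins every retrieved document into the context, whatever its distance. The sketch below filters by a distance cutoff and a character budget before joining. The helper name `build_context`, the thresholds `max_distance=1.0` and `max_chars=4000`, and the sample documents are all illustrative assumptions, not part of the system above. A sensible cutoff depends on the embedding model and the distance metric your collection uses.

```python
from typing import Any, Dict, List


def build_context(docs: List[Dict[str, Any]],
                  max_distance: float = 1.0,
                  max_chars: int = 4000) -> str:
    """Join the closest documents into one context string.

    `docs` uses the same shape that search_collection returns:
    dicts with "content" and "distance" keys. Both thresholds are
    placeholder values to tune against real data.
    """
    # Keep only documents under the distance cutoff, closest first
    kept = sorted(
        (d for d in docs if d["distance"] <= max_distance),
        key=lambda d: d["distance"],
    )

    parts: List[str] = []
    used = 0
    for doc in kept:
        text = doc["content"]
        # Stop before the context exceeds the character budget
        if used + len(text) > max_chars:
            break
        parts.append(text)
        used += len(text)

    return "\n\n".join(parts)


# Made-up sample results, for illustration only
sample_docs = [
    {"content": "OAuth tokens expire after one hour.", "distance": 0.31},
    {"content": "Unrelated release notes.", "distance": 1.42},
    {"content": "Use the Authorization header.", "distance": 0.18},
]
print(build_context(sample_docs))
```

Inside `process_query`, a helper like this could stand in for the plain `"\n\n".join(...)` step. If it returns an empty string, the "couldn't find relevant information" branch is probably the safer path.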

## Testing the Complete System

Let's test our RAG system with queries that cover clear, ambiguous, and non-English cases to see how routing affects both speed and accuracy. The cell below runs the first three scenarios with detailed output using the adaptive strategy; the strategy comparison follows in the next section.

In [None]:
# Test queries covering different scenarios
test_scenarios = [
    {
        "query": "How do I authenticate API requests with OAuth?",
        "expected_route": "technical",
        "description": "Clear technical question"
    },
    {
        "query": "What's the price difference between Professional and Enterprise?",
        "expected_route": "product",
        "description": "Clear product question"
    },
    {
        "query": "How many days off can I take for my wedding?",
        "expected_route": "policy",
        "description": "Clear policy question"
    },
    {
        "query": "The system is giving me errors",
        "expected_route": "technical",
        "description": "Ambiguous - could be technical or product"
    },
    {
        "query": "¿Cómo restablezco mi contraseña?",
        "expected_route": "support_es",
        "description": "Spanish language query"
    }
]

print("🧪 Testing Complete RAG System\n" + "="*70)

for scenario in test_scenarios[:3]:  # Test first 3 with detailed output
    print(f"\n📝 Test: {scenario['description']}")
    print(f"Expected route: {scenario['expected_route']}")
    
    result = rag_system.process_query(
        scenario['query'],
        routing_strategy="adaptive",
        verbose=True
    )
    
    print(f"\n📋 ANSWER:")
    print("-"*50)
    print(result['answer'])
    print("-"*50)
    
    # Check if routing was correct
    if result['route'] == scenario['expected_route']:
        print("✅ Routing correct!")
    else:
        print(f"⚠️  Routed to {result['route']} instead of {scenario['expected_route']}")
    
    print("\n" + "="*70)
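
The loop above prints a verdict per scenario. For a larger test set, a single accuracy figure plus a list of misses is easier to track over time. The sketch below is one way to compute that. `routing_accuracy` and the stand-in router `keyword_route` are hypothetical names introduced so the cell runs on its own. In this notebook you would pass something like `lambda q: production_router.route(q)['route']` instead.

```python
from typing import Callable, Dict, List, Tuple


def routing_accuracy(route_fn: Callable[[str], str],
                     scenarios: List[Dict[str, str]]) -> Tuple[float, List[Dict[str, str]]]:
    """Return (fraction routed correctly, list of misrouted scenarios)."""
    if not scenarios:
        return 0.0, []
    misses = []
    for scenario in scenarios:
        actual = route_fn(scenario["query"])
        if actual != scenario["expected_route"]:
            misses.append({**scenario, "actual_route": actual})
    correct = len(scenarios) - len(misses)
    return correct / len(scenarios), misses


# Hypothetical stand-in router so this cell runs without the real routers
def keyword_route(query: str) -> str:
    return "technical" if "api" in query.lower() else "product"


demo_scenarios = [
    {"query": "How do I authenticate API requests with OAuth?",
     "expected_route": "technical"},
    {"query": "What's the price difference between Professional and Enterprise?",
     "expected_route": "product"},
    {"query": "How many days off can I take for my wedding?",
     "expected_route": "policy"},
]

accuracy, misses = routing_accuracy(keyword_route, demo_scenarios)
print(f"Accuracy: {accuracy:.0%}")
for miss in misses:
    print(f"  {miss['query']!r}: expected {miss['expected_route']}, "
          f"got {miss['actual_route']}")
```

Keeping the misrouted scenarios alongside the score makes it easier to tell whether a regression comes from one category or is spread across all of them.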

## Performance Comparison: Routing Strategies

Let's compare the performance of different routing strategies to understand when to use each one. This will help you make informed decisions about which strategy to use in production.

In [None]:
def compare_routing_strategies(queries: List[str], strategies: List[str]):
    """
    Compare different routing strategies on the same queries.
    """
    results = {strategy: [] for strategy in strategies}
    
    for query in queries:
        print(f"\nQuery: {query[:60]}..." if len(query) > 60 else f"\nQuery: {query}")
        print("-"*40)
        
        for strategy in strategies:
            start_time = time.time()
            
            # Just route, don't search or generate answers for this comparison
            routing_result = production_router.route(query, strategy=strategy)
            
            elapsed = time.time() - start_time
            
            results[strategy].append({
                "query": query,
                "route": routing_result['route'],
                "confidence": routing_result['confidence'],
                "time": elapsed
            })
            
            print(f"{strategy:12} → {routing_result['route']:10} "
                  f"({routing_result['confidence']:6}) - {elapsed:.3f}s")
    
    # Calculate averages
    print("\n" + "="*60)
    print("SUMMARY STATISTICS")
    print("="*60)
    
    for strategy in strategies:
        times = [r['time'] for r in results[strategy]]
        avg_time = sum(times) / len(times)
        
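        # Count confidence levels; an unexpected label gets its own key
        # instead of raising a KeyError
        confidence_counts = {"high": 0, "medium": 0, "low": 0}
        for r in results[strategy]:
            label = r['confidence']
            confidence_counts[label] = confidence_counts.get(label, 0) + 1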
        
        print(f"\n{strategy.upper()}:")
        print(f"  Average time: {avg_time:.3f}s")
        print(f"  Confidence distribution:")
        print(f"    High:   {confidence_counts['high']}/{len(queries)}")
        print(f"    Medium: {confidence_counts['medium']}/{len(queries)}")
        print(f"    Low:    {confidence_counts['low']}/{len(queries)}")

# Run the comparison
comparison_queries = [
    "API rate limits",
    "How much for enterprise?",
    "Vacation policy details",
    "System performance issues",
    "I need help integrating your API and understanding the pricing"
]

print("📊 Comparing Routing Strategies\n" + "="*60)
compare_routing_strategies(comparison_queries, ["cascade", "ensemble", "adaptive"])
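
Average time can hide tail latency, which matters when a strategy occasionally escalates to an LLM call. The sketch below reports the median and 95th percentile alongside the mean using the standard library's `statistics` module. `latency_summary` is a hypothetical helper, and the sample timings are made up for illustration rather than measured from the routers above.

```python
import statistics
from typing import Dict, List


def latency_summary(times: List[float]) -> Dict[str, float]:
    """Summarize routing latencies (in seconds) with mean, median, and p95."""
    if len(times) < 2:
        raise ValueError("need at least two timings to compute percentiles")
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile
    p95 = statistics.quantiles(times, n=20, method="inclusive")[-1]
    return {
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "p95": p95,
    }


# Made-up timings: mostly fast rule/semantic hits plus one slow escalation
sample_times = [0.02, 0.03, 0.02, 0.04, 0.03, 0.02, 0.03, 0.02, 0.03, 1.20]
summary = latency_summary(sample_times)
print(f"mean={summary['mean']:.3f}s  "
      f"median={summary['median']:.3f}s  "
      f"p95={summary['p95']:.3f}s")
```

In `compare_routing_strategies`, the `times` list built for each strategy could be passed to a helper like this in place of the single average.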

## Best Practices and Production Considerations

Now that you've seen all the routing techniques in action, let's discuss how to implement them effectively in production. These guidelines come from real-world experience building RAG systems at scale.

In [None]:
class ProductionBestPractices:
    """
    A collection of best practices for production query routing.
    These are patterns you should follow in real applications.
    """
    
    @staticmethod
    def add_caching():
        """
        Implement caching to avoid redundant routing decisions.
        """
        import hashlib
        
        class CachedRouter:
            def __init__(self, base_router, cache_size: int = 1000):
                self.base_router = base_router
                self.cache_size = cache_size
                self.cache = {}
            
            def _hash_query(self, query: str) -> str:
                """Create a hash of the query for caching"""
                return hashlib.md5(query.encode()).hexdigest()
            
            def route(self, query: str) -> Dict[str, Any]:
                # Check cache first
                query_hash = self._hash_query(query)
                
                if query_hash in self.cache:
                    # Re-insert the entry so it counts as most recently used
                    self.cache[query_hash] = self.cache.pop(query_hash)
                    cached_result = self.cache[query_hash].copy()
                    cached_result['from_cache'] = True
                    return cached_result
                
                # Route and cache the result
                result = self.base_router.route(query)
                
                # Simple LRU: dicts preserve insertion order, so the first key
                # is the least recently used entry
                if len(self.cache) >= self.cache_size:
                    oldest_key = next(iter(self.cache))
                    del self.cache[oldest_key]
                
                # Store a copy so callers cannot mutate the cached entry
                self.cache[query_hash] = result.copy()
                result['from_cache'] = False
                
                return result
        
        return CachedRouter
    
    @staticmethod
    def add_monitoring():
        """
        Add monitoring and logging for production debugging.
        """
        import logging
        from datetime import datetime
        
        class MonitoredRouter:
            def __init__(self, base_router):
                self.base_router = base_router
                self.metrics = {
                    "total_queries": 0,
                    "routes": {},
                    "avg_confidence": [],
                    "errors": 0
                }
                
                # Set up logging
                logging.basicConfig(level=logging.INFO)
                self.logger = logging.getLogger(__name__)
            
            def route(self, query: str) -> Dict[str, Any]:
                start_time = time.time()
                
                try:
                    result = self.base_router.route(query)
                    
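                    # Update metrics
                    self.metrics['total_queries'] += 1
                    route = result.get('route', 'unknown')
                    self.metrics['routes'][route] = self.metrics['routes'].get(route, 0) + 1
                    # Record each confidence label so the distribution can be inspected later
                    self.metrics['avg_confidence'].append(result.get('confidence', 'unknown'))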
                    
                    # Log the routing decision
                    self.logger.info(
                        f"Query routed | Route: {route} | "
                        f"Confidence: {result.get('confidence', 'N/A')} | "
                        f"Time: {time.time() - start_time:.3f}s | "
                        f"Query: {query[:50]}..."
                    )
                    
                    return result
                    
                except Exception as e:
                    self.metrics['errors'] += 1
                    self.logger.error(f"Routing error: {e} | Query: {query}")
                    
                    # Fallback routing
                    return {
                        "route": "product",
                        "confidence": "low",
                        "error": str(e)
                    }
            
            def get_metrics(self) -> Dict[str, Any]:
                """Get current metrics"""
                return self.metrics
        
        return MonitoredRouter
    
    @staticmethod
    def add_fallback_handling():
        """
        Implement robust fallback mechanisms for production reliability.
        """
        class FallbackRouter:
            def __init__(self, primary_router, fallback_router):
                self.primary_router = primary_router
                self.fallback_router = fallback_router
            
            def route(self, query: str) -> Dict[str, Any]:
                try:
                    # Try primary router first
                    result = self.primary_router.route(query)
                    
                    # If confidence is too low, try fallback
                    if result.get('confidence') == 'low':
                        fallback_result = self.fallback_router.route(query)
                        if fallback_result.get('confidence', 'low') != 'low':
                            fallback_result['used_fallback'] = True
                            return fallback_result
                    
                    return result
                    
                except Exception as e:
                    # Primary failed, use fallback
                    try:
                        result = self.fallback_router.route(query)
                        result['used_fallback'] = True
                        result['primary_error'] = str(e)
                        return result
                    except Exception:
                        # Both failed, return safe default
                        return {
                            "route": "product",
                            "confidence": "low",
                            "error": "All routers failed"
                        }
        
        return FallbackRouter

# Demonstrate best practices
print("📚 Production Best Practices Examples\n" + "="*50)

# Example 1: Caching
print("\n1. CACHING EXAMPLE:")
print("-"*30)

CachedRouter = ProductionBestPractices.add_caching()
cached_router = CachedRouter(rule_router, cache_size=100)

# First call - not cached
result1 = cached_router.route("How do I use the API?")
print(f"First call - From cache: {result1.get('from_cache', False)}")

# Second call - should be cached
result2 = cached_router.route("How do I use the API?")
print(f"Second call - From cache: {result2.get('from_cache', False)}")

# Example 2: Monitoring
print("\n2. MONITORING EXAMPLE:")
print("-"*30)

MonitoredRouter = ProductionBestPractices.add_monitoring()
monitored_router = MonitoredRouter(rule_router)

# Route some queries
test_queries = [
    "API documentation",
    "Pricing information",
    "Vacation policy"
]

for q in test_queries:
    monitored_router.route(q)

# Get metrics
metrics = monitored_router.get_metrics()
print(f"Total queries: {metrics['total_queries']}")
print(f"Route distribution: {metrics['routes']}")
print(f"Errors: {metrics['errors']}")

# Example 3: Fallback handling
print("\n3. FALLBACK EXAMPLE:")
print("-"*30)

FallbackRouter = ProductionBestPractices.add_fallback_handling()
# Use semantic router as primary, rule-based as fallback
robust_router = FallbackRouter(semantic_router, rule_router)

result = robust_router.route("Something about our service")
print(f"Route: {result['route']}")
print(f"Used fallback: {result.get('used_fallback', False)}")
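
The caching wrapper above keys on the exact query string, so "How do I use the API?" and "how do I use the api " are treated as different queries. The sketch below normalizes the text before hashing. `normalize_query` and `cache_key` are hypothetical helpers, and the normalization rules (case folding, whitespace collapsing, trailing punctuation removal) are assumptions to adapt to your own traffic. It uses SHA-256 rather than the MD5 used earlier; for a cache key either works.

```python
import hashlib
import re
import unicodedata


def normalize_query(query: str) -> str:
    """Collapse trivial variations so near-identical queries share a key."""
    text = unicodedata.normalize("NFKC", query)
    text = text.casefold().strip()
    # Collapse runs of whitespace into a single space
    text = re.sub(r"\s+", " ", text)
    # Drop trailing sentence punctuation; leading marks such as "¿" are kept
    return text.rstrip("?!. ")


def cache_key(query: str) -> str:
    """Hash the normalized query to get a fixed-length cache key."""
    normalized = normalize_query(query)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


same = cache_key("How do I use the API?") == cache_key("  how do I use   the api ")
print(f"Variants share a cache key: {same}")
```

Normalizing too aggressively can merge queries that should route differently, so it is worth checking cache hits against real logs before widening the rules.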

## Key Takeaways and Recommendations

After building all these routing systems, here's what you should remember for your own implementations:

### When to Use Each Router Type

**Rule-Based Routing** is perfect when:
- You have clear, domain-specific terminology
- Speed is critical (sub-100ms responses needed)
- You want predictable, explainable behavior
- Budget is tight (no API costs)

**Semantic Routing** shines when:
- Queries use varied language to mean the same thing
- You have good example queries for each category
- You need fast responses but better accuracy than rules
- You want to handle synonyms and related concepts

**LLM-Based Routing** (Completion or Function Calling) is best when:
- Queries are complex or ambiguous
- You need reasoning about intent
- Accuracy is more important than speed
- You can afford the API costs

**Zero-Shot Classification** works great when:
- You're adding new categories frequently
- You don't have training examples
- Categories can be described in natural language
- You need quick prototyping

**Language Routing** is essential when:
- You serve a global audience
- Content exists in multiple languages
- You need to route to regional support teams

### Production Architecture Recommendations

1. **Start Simple**: Begin with rule-based or semantic routing. They're fast, cheap, and often sufficient.

2. **Use Cascading**: Try fast methods first, escalate to expensive ones only when needed.

3. **Cache Aggressively**: Many users ask similar questions. Cache routing decisions for common queries.

4. **Monitor Everything**: Track which routes are used, confidence levels, and response times.

5. **Plan for Failure**: Always have fallback routes. Default to your most general knowledge base.

6. **Test with Real Data**: Your users will surprise you. Collect real queries and continuously improve.
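
Recommendations 2 and 5 can be combined in a few lines. The sketch below tries routers from cheapest to most expensive and stops at the first one that is confident enough, falling back to a default route otherwise. It assumes each router returns a dict with `route` and a `confidence` of `"high"`, `"medium"`, or `"low"`, as the routers in this tutorial do. `cascade_route`, `cheap_rules`, and `expensive_llm` are hypothetical stand-ins, not the `HybridRouter` implementation.

```python
from typing import Any, Callable, Dict, List, Tuple

Router = Callable[[str], Dict[str, Any]]
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def cascade_route(query: str,
                  routers: List[Tuple[str, Router]],
                  min_confidence: str = "medium",
                  default_route: str = "product") -> Dict[str, Any]:
    """Try routers in order (cheapest first); stop at the first confident one."""
    threshold = CONFIDENCE_RANK[min_confidence]
    for name, router in routers:
        try:
            result = router(query)
        except Exception as exc:
            # A failing stage should not take the whole cascade down
            print(f"{name} router failed: {exc}")
            continue
        if CONFIDENCE_RANK.get(result.get("confidence"), 0) >= threshold:
            return {**result, "decided_by": name}
    # Nothing was confident enough: use the most general knowledge base
    return {"route": default_route, "confidence": "low", "decided_by": "default"}


# Hypothetical stand-ins for a rule-based router and an LLM router
def cheap_rules(query: str) -> Dict[str, Any]:
    if "api" in query.lower():
        return {"route": "technical", "confidence": "high"}
    return {"route": "product", "confidence": "low"}


def expensive_llm(query: str) -> Dict[str, Any]:
    return {"route": "policy", "confidence": "medium"}


stages = [("rules", cheap_rules), ("llm", expensive_llm)]
print(cascade_route("API rate limits", stages))
print(cascade_route("Vacation policy details", stages))
```

Recording which stage decided each query (`decided_by` here) also feeds recommendation 4: the share of queries reaching the expensive stage is a useful number to monitor.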

### Cost Optimization Tips

- Semantic routing with cached embeddings costs almost nothing per query
- Rule-based routing has no per-query API cost once it is built
- LLM routing costs roughly $0.001-0.01 per query, depending on the model and prompt length; check current pricing before budgeting
- Batch embed your example queries for semantic routing
- Use smaller models (GPT-3.5) for routing, save GPT-4 for answer generation
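
To make the effect of cascading on spend concrete, here is a back-of-the-envelope estimate. The per-call price comes from the low end of the range above. The traffic volume (10,000 queries per day) and the 20% escalation rate are assumed figures for illustration only, and `monthly_llm_routing_cost` is a hypothetical helper.

```python
def monthly_llm_routing_cost(queries_per_day: int,
                             escalation_rate: float,
                             cost_per_llm_call: float,
                             days: int = 30) -> float:
    """Estimate monthly spend when only a fraction of queries reach the LLM router."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return queries_per_day * days * escalation_rate * cost_per_llm_call


# Assumed workload: 10,000 queries/day at $0.001 per LLM routing call
always_llm = monthly_llm_routing_cost(10_000, 1.0, 0.001)
cascaded = monthly_llm_routing_cost(10_000, 0.2, 0.001)
print(f"LLM on every query:    ${always_llm:,.2f}/month")
print(f"LLM on 20% of queries: ${cascaded:,.2f}/month")
```

The estimate covers routing calls only. Answer generation is a separate cost, and it usually dominates because the prompt carries the retrieved context.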

### Final Thoughts

Query routing is what transforms a basic RAG system into a production-ready application. The right routing strategy can improve response accuracy (by as much as 40-60%) while reducing costs (by roughly 30-50%), though the actual gains depend heavily on your data, models, and traffic, so measure them against your own baseline. Start with the approach that matches your current needs, but build in flexibility to evolve as you learn from your users.

Remember: the best router is the one that works reliably for YOUR specific use case. Test thoroughly, monitor continuously, and iterate based on real user feedback.

Happy routing! 🚀