# Technique 1: Multiple Query Retrieval

## The Problem

Users often phrase queries in ways that don't perfectly match document content:
- Too specific: "How much does it cost to register a construction company?"
- Too vague: "business setup"
- Wrong terminology: "business license" vs "business registration"

**Result:** Relevant documents are missed because of how the query is phrased, not because they are absent from the corpus.

## The Solution

Generate **multiple variations** of the user's query, search with all of them, and combine results.

**How it works:**
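1. An LLM generates 3-5 query variations (the prompt in this notebook asks for 4)
2. Each variation is run against the retriever
3. The combined results are deduplicated (re-ranking the merged set is optional; this notebook only deduplicates)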
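
The three steps above can be sketched without any LLM or vector store. This is only an illustration: `CORPUS`, `toy_retrieve`, and `multi_query_retrieve` are hypothetical names, the retriever is a simple word-overlap scorer rather than an embedding search, and the query variations are written by hand instead of being generated.

```python
from typing import Callable

# Hypothetical toy corpus; the notebook uses a Chroma vector store instead.
CORPUS = {
    "doc1": "business registration fees for construction companies",
    "doc2": "how to obtain a business license",
    "doc3": "loan and grant programs for small enterprises",
}

def toy_retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by the number of words they share with the query."""
    words = set(query.lower().split())
    scored = [(len(words & set(text.split())), doc_id) for doc_id, text in CORPUS.items()]
    hits = [(score, doc_id) for score, doc_id in scored if score > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [doc_id for _, doc_id in hits[:k]]

def multi_query_retrieve(variations: list[str], retrieve: Callable[[str], list[str]]) -> list[str]:
    """Retrieve for every variation, then merge while dropping duplicates."""
    merged: dict[str, None] = {}
    for query in variations:
        for doc_id in retrieve(query):
            merged.setdefault(doc_id, None)  # keeps first-seen order
    return list(merged)

# Step 1 is normally an LLM call; these variations are written by hand.
variations = ["business license cost", "business registration fees"]
print(toy_retrieve(variations[0]))                     # single query
print(multi_query_retrieve(variations, toy_retrieve))  # both variations
```

With `k=1`, the single query surfaces only the "license" document, while the second phrasing also pulls in the "registration" document.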

**Difficulty:** ⭐⭐☆☆☆

## Prerequisites
- Completed Technique 0
- Understanding of query formulation impact

## Step 1: Imports

In [None]:
from utils_openai import (
    setup_openai_api, create_embeddings, create_llm,
    load_msme_data, create_vectorstore, get_baseline_prompt
)
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.load import dumps, loads

print("[OK] Imports successful!")

## Step 2: Setup

In [None]:
api_key = setup_openai_api()
embeddings = create_embeddings(api_key)
llm = create_llm(api_key)
documents, metadatas, ids = load_msme_data("msme.csv")

vectorstore = create_vectorstore(
    documents, metadatas, ids, embeddings,
    collection_name="msme_technique2",
    persist_directory="./chroma_db_technique2"
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
print("[OK] Setup complete!")

## Step 3: Create Query Generation Prompt

This prompt instructs the LLM to generate alternative query formulations:

In [None]:
query_gen_template = """You are an AI assistant helping to improve search results.
Your task is to generate 4 different versions of the given user question.

These variations should:
- Rephrase using different words
- Use different levels of specificity
- Include relevant synonyms
- Maintain the original intent

Provide ONLY the questions, one per line, without numbering or explanation.

Original question: {question}

Alternative questions:"""

query_gen_prompt = ChatPromptTemplate.from_template(query_gen_template)
print("[OK] Query generation prompt ready!")

## Step 4: Build Query Generation Chain

In [None]:
# Chain that generates multiple queries
query_generator = (
    query_gen_prompt
    | llm
    | StrOutputParser()
    | (lambda x: [q.strip() for q in x.split('\n') if q.strip()])
)

print("[OK] Query generator chain created!")
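
The prompt asks for questions "without numbering", but models do not always comply. A slightly more defensive parser, sketched below, strips common list markers and repeated lines. The function name `parse_query_lines` and the marker pattern are assumptions made for illustration, not part of the notebook's utilities.

```python
import re

# Matches leading list markers such as "1.", "2)", "-", "*", or "•".
_LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+")

def parse_query_lines(raw: str) -> list[str]:
    """Split LLM output into one query per line, stripping list markers."""
    queries: list[str] = []
    for line in raw.splitlines():
        cleaned = _LIST_MARKER.sub("", line).strip()
        # Skip blank lines and exact repeats of an earlier query.
        if cleaned and cleaned not in queries:
            queries.append(cleaned)
    return queries

sample = (
    "1. How do I register a company?\n"
    "\n"
    "- Steps to register a company\n"
    "2) How do I register a company?"
)
print(parse_query_lines(sample))
```

A function like this could stand in for the final `lambda` in the chain, since a plain callable can be piped after `StrOutputParser()`.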

## Step 5: Test Query Generation

In [None]:
test_question = "How do I register a construction business in Nigeria?"

print(f"Original: {test_question}\n")
generated_queries = query_generator.invoke({"question": test_question})

print("Generated variations:")
for i, q in enumerate(generated_queries, 1):
    print(f"{i}. {q}")

## Step 6: Create Deduplication Function

Remove duplicate documents from multiple retrievals:

In [None]:
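def get_unique_docs(documents):
    """Flatten per-query results and remove duplicates by serialized content."""
    # retriever.map() returns one list of documents per generated query,
    # so the nested lists must be flattened before comparing documents.
    serialized = [dumps(doc) for sublist in documents for doc in sublist]
    # dict.fromkeys drops duplicates and keeps first-seen order (set() would not).
    return [loads(doc) for doc in dict.fromkeys(serialized)]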

print("[OK] Deduplication function ready!")
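
The deduplication idea (serialize each document to a string, keep one copy of each string, then deserialize) can be tried in isolation. The sketch below uses a hypothetical `ToyDoc` dataclass and the standard `json` module as stand-ins for LangChain's `Document`, `dumps`, and `loads`. The input is assumed to be one result list per query.

```python
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class ToyDoc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    source: str

def unique_docs(results_per_query: list[list[ToyDoc]]) -> list[ToyDoc]:
    """Flatten per-query result lists and keep the first copy of each document."""
    serialized = [
        json.dumps(asdict(doc), sort_keys=True)
        for docs in results_per_query
        for doc in docs
    ]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return [ToyDoc(**json.loads(s)) for s in dict.fromkeys(serialized)]

a = ToyDoc("Business registration steps", "row-1")
b = ToyDoc("Small business loan schemes", "row-2")
results = [[a, b], [b], [a]]  # three queries with overlapping hits
print(unique_docs(results))
```

Keeping first-seen order means documents returned for the earliest variations stay at the front of the context.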

## Step 7: Build Multi-Query RAG Chain

In [None]:
# Complete chain:
# 1. Generate multiple queries
# 2. Retrieve docs for each query (map)
# 3. Deduplicate
# 4. Pass to prompt with original question

multi_query_retrieval = (
    query_generator
    | retriever.map()  # Retrieve for each generated query
    | get_unique_docs  # Remove duplicates
)

prompt = get_baseline_prompt()

multi_query_rag_chain = (
    {"context": multi_query_retrieval, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print("[OK] Multi-query RAG chain ready!")

## Step 8: Test and Compare

In [None]:
question = "What are the financing options for small businesses?"

print(f"Question: {question}\n")
print("="*80)

answer = multi_query_rag_chain.invoke(question)
print(f"\nANSWER:\n{answer}")
print("="*80)

## Results Analysis

### BEFORE (Single Query):
- Retrieves based on ONE query formulation
- Misses docs that use different terminology
- Limited to query's exact phrasing

### AFTER (Multiple Queries):
- Retrieves with the 4 query variations requested by the prompt in Step 3
- Catches different phrasings and synonyms
- More comprehensive document coverage
- Better recall (finds more relevant docs)

### Trade-offs:
- **Pro:** Better recall, catches terminology variations
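- **Con:** More LLM calls (2 per question: 1 for query generation + 1 for the answer, versus 1 for the baseline)
- **Pro:** More robust to poorly phrased queries
- **Con:** Slower, because the query-generation call must finish before retrieval starts and each variation triggers its own retrieval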
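
The overhead can be written out as simple arithmetic. The helper below is a hypothetical sketch that assumes a chain shaped like the one in this notebook: one LLM call to generate queries, one retrieval per generated query, and one LLM call for the final answer.

```python
def multi_query_cost(n_variations: int, k: int) -> dict[str, int]:
    """Count per-question calls for a multi-query chain shaped like this one."""
    return {
        "llm_calls": 2,                        # query generation + final answer
        "retriever_calls": n_variations,       # one search per generated query
        "max_context_docs": n_variations * k,  # upper bound before deduplication
    }

# Values used in this notebook: 4 variations, k=3 documents per retrieval.
print(multi_query_cost(n_variations=4, k=3))
```

With 4 variations and `k=3`, up to 12 documents can reach the prompt before deduplication, compared with 3 for a single-query retriever using the same `k`.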

## When to Use This Technique

**Use when:**
- Queries are often vaguely worded
- Documents use varied terminology
- Recall is more important than speed
- Users might not know exact terms

**Avoid when:**
- Queries are already well-formulated
- Speed is critical
- Documents use consistent terminology
- Extra LLM cost is prohibitive

## Exercise

**Task:**
1. Test these vague queries:
   - "business money"
   - "company rules"
   - "get funding"
2. Compare results with baseline single-query RAG
3. Check which query variations the LLM generates
4. Modify the query generation prompt to generate more/fewer variations

**Expected Outcome:**
- Vague queries should work better with multi-query
- You should see diverse query reformulations
- Recall should improve for ambiguous queries
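
To make the comparison concrete, recall can be computed from document IDs once you have decided which documents count as relevant for a query. The sketch below uses made-up IDs purely for illustration. The `recall` helper is hypothetical, and the relevance judgments would have to come from your own inspection of the data.

```python
def recall(retrieved_ids: list[str], relevant_ids: set[str]) -> float:
    """Fraction of the relevant documents that appear in the retrieved list."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    return len(set(retrieved_ids) & relevant_ids) / len(relevant_ids)

# Hypothetical IDs; replace them with IDs from your own runs and judgments.
relevant = {"d1", "d2", "d3", "d4"}
baseline_ids = ["d1", "d9", "d7"]
multi_query_ids = ["d1", "d2", "d9", "d4", "d7"]

print(f"baseline recall:    {recall(baseline_ids, relevant):.2f}")
print(f"multi-query recall: {recall(multi_query_ids, relevant):.2f}")
```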


In [None]:
# Your exercise code here

my_vague_query = ""  # Try a vague query

# Compare:
# baseline_answer = baseline_rag_chain.invoke(my_vague_query)
# multi_query_answer = multi_query_rag_chain.invoke(my_vague_query)

**Next:** Technique 2 - Contextual Compression