# Hybrid Retrieval Pipeline Demo

This notebook demonstrates the retrieval stage of a RAG (Retrieval-Augmented Generation) pipeline, which combines three steps:
1. Document embedding using SentenceTransformers
2. Vector search using FAISS
3. Re-ranking using TF-IDF + cosine similarity

The pipeline is designed to improve factual accuracy and reduce hallucinations in LLM responses by supplying the model with relevant, well-ranked context.
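
To make step 3 concrete, here is a minimal, dependency-free sketch of TF-IDF reranking with cosine similarity. It is not the project's `Reranker`: the helpers `tokenize`, `tfidf_vectors`, `cosine`, and `rerank`, the `content` key, and the two sample documents are all assumptions made for illustration. It also fits the IDF weights on the query together with the candidates, which is a simplification.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and trim surrounding punctuation."""
    tokens = (tok.strip(".,;:!?()\"'") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict: term -> weight) per text."""
    tokenized = [tokenize(t) for t in texts]
    n = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so a term that appears in every text keeps a positive weight.
    idf = {term: math.log((1 + n) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    return [
        {term: count * idf[term] for term, count in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(query, candidates):
    """Reorder candidate dicts by TF-IDF cosine similarity to the query."""
    vectors = tfidf_vectors([query] + [c["content"] for c in candidates])
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scored = [
        dict(c, tfidf_score=cosine(query_vec, vec))
        for c, vec in zip(candidates, doc_vecs)
    ]
    return sorted(scored, key=lambda c: c["tfidf_score"], reverse=True)


# Hypothetical candidates, standing in for the output of the vector search.
candidates = [
    {"id": "doc_ml", "content": "Machine learning models learn patterns from data."},
    {"id": "doc_nlp", "content": "Applications of NLP include translation and chatbots."},
]
reranked = rerank("What are the applications of NLP?", candidates)
print([c["id"] for c in reranked])
```

The document that shares terms with the query moves to the top, while the one with no lexical overlap gets a score of zero. That lexical signal is what the reranking step adds on top of the purely semantic vector search.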

In [1]:
import sys
import os
import json
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm

# Add the src directory to the path
sys.path.append(os.path.abspath('..'))

# Import our pipeline components
from src.embedder import Embedder
from src.vector_store import VectorStore
from src.reranker import Reranker
from src.rag_pipeline import RAGPipeline

## 1. Initialize the Pipeline

First, we'll create our RAG pipeline with the default components.

In [2]:
# Initialize the pipeline
pipeline = RAGPipeline(embedding_model="all-MiniLM-L6-v2")
print(f"Pipeline initialized with embedding model: {pipeline.embedder.model_name}")

## 2. Load and Index Documents

Now we'll load our sample documents and index them in the vector store.

In [3]:
# Path to sample documents
docs_path = Path("../data/sample_docs")

# Load documents
documents = pipeline.load_documents(docs_path)
print(f"Loaded {len(documents)} documents")

# Display document IDs and titles
for doc in documents:
    # Extract title from first line
    title = doc['content'].split('\n')[0].lstrip('#').strip()
    print(f"Document ID: {doc['id']}, Title: {title}")
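
The cell above relies on each document being a dict with `id` and `content` keys. As a rough sketch of how such records could be produced, the code below reads Markdown files from a directory. The function names `load_markdown_documents` and `extract_title`, the `.md` extension, and the use of the file stem as the ID are assumptions; the real `pipeline.load_documents` may work differently.

```python
import tempfile
from pathlib import Path


def load_markdown_documents(docs_dir):
    """Read every .md file in docs_dir into an {'id', 'content'} dict (hypothetical helper)."""
    documents = []
    for path in sorted(Path(docs_dir).glob("*.md")):
        documents.append({"id": path.stem, "content": path.read_text(encoding="utf-8")})
    return documents


def extract_title(content):
    """Return the first non-empty line with leading Markdown '#' markers removed."""
    for line in content.splitlines():
        if line.strip():
            return line.strip().lstrip("#").strip()
    return ""


# Demonstrate on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "deep_learning.md").write_text("# Deep Learning\n\nBody text.", encoding="utf-8")
    docs = load_markdown_documents(tmp)
    titles = {doc["id"]: extract_title(doc["content"]) for doc in docs}

print(titles)
```

Stripping `#` only from the left matters for titles that end in that character: `'# Intro to C#'.strip('# ')` yields `'Intro to C'`, whereas `extract_title` keeps the trailing `#`.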

In [4]:
# Index the documents
pipeline.index_documents()
print("Documents indexed in vector store")
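
For intuition about what the indexing step enables, the sketch below implements exhaustive inner-product search over unit-length vectors, which is equivalent to ranking by cosine similarity. It is a toy stand-in written in plain Python, not FAISS and not the project's `VectorStore`; the `FlatIndex` class, its methods, and the three-dimensional vectors are hypothetical.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class FlatIndex:
    """Exhaustive inner-product search over unit vectors (toy stand-in for a vector index)."""

    def __init__(self):
        self.ids = []
        self.vectors = []

    def add(self, doc_id, vector):
        self.ids.append(doc_id)
        self.vectors.append(normalize(vector))

    def search(self, query, top_k=2):
        """Return the top_k (doc_id, score) pairs, best first."""
        q = normalize(query)
        scores = [sum(a * b for a, b in zip(q, v)) for v in self.vectors]
        best = heapq.nlargest(top_k, zip(scores, self.ids))
        return [(doc_id, score) for score, doc_id in best]


index = FlatIndex()
index.add("doc_a", [1.0, 0.0, 0.0])
index.add("doc_b", [0.0, 1.0, 0.0])
index.add("doc_c", [1.0, 1.0, 0.0])

hits = index.search([1.0, 0.1, 0.0], top_k=2)
print(hits)
```

Exhaustive search compares the query against every stored vector, so it is exact but scales linearly with the collection size. Approximate index types trade some recall for speed on larger collections.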

## 3. Query the Pipeline

Let's test our pipeline with some sample queries.

In [5]:
# Define some test queries
test_queries = [
    "What is deep learning?",
    "What are the applications of NLP?",
    "How does machine learning work?",
    "What are the challenges in natural language processing?"
]

# Run each query and collect results
results = []
for query in test_queries:
    print(f"\nProcessing query: '{query}'")
    result = pipeline.query(query, top_k=2, rerank=True)
    results.append(result)
    
    # Print top result
    if result['results']:
        top_doc = result['results'][0]
        print(f"Top result: {top_doc['id']} (Score: {top_doc['score']:.3f})")
        print(f"Timing: {result['timing']['total_ms']:.2f}ms total")
    else:
        print("No results found")

## 4. Analyze Results

Let's analyze the performance of our pipeline.

In [6]:
# Compare with and without reranking
query = "What are the applications of natural language processing?"

# Without reranking
result_no_rerank = pipeline.query(query, top_k=2, rerank=False)

# With reranking
result_with_rerank = pipeline.query(query, top_k=2, rerank=True)

# Display comparison
print("\n=== Without Reranking ===")
for i, doc in enumerate(result_no_rerank['results']):
    print(f"{i+1}. {doc['id']} (Score: {doc['score']:.3f})")

print("\n=== With Reranking ===")
for i, doc in enumerate(result_with_rerank['results']):
    print(f"{i+1}. {doc['id']} (Score: {doc['score']:.3f}, TF-IDF: {doc['tfidf_score']:.3f})")
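
Reading two printed lists side by side gets tedious once `top_k` grows. One possible way to summarize the effect of reranking is to map each document ID to its rank before and after. The `rank_changes` helper and the sample lists below are hypothetical; they only assume that each result is a dict with an `id` key, as in the cell above.

```python
def rank_changes(before, after):
    """Map each document ID to (rank_before, rank_after); None means absent from that list."""
    rank_before = {doc["id"]: i + 1 for i, doc in enumerate(before)}
    rank_after = {doc["id"]: i + 1 for i, doc in enumerate(after)}
    # Keep the 'before' order first, then append IDs that only appear after reranking.
    ids = list(rank_before) + [d for d in rank_after if d not in rank_before]
    return {doc_id: (rank_before.get(doc_id), rank_after.get(doc_id)) for doc_id in ids}


# Hypothetical result lists shaped like result['results'].
no_rerank = [{"id": "doc_a"}, {"id": "doc_b"}]
with_rerank = [{"id": "doc_b"}, {"id": "doc_c"}]

changes = rank_changes(no_rerank, with_rerank)
for doc_id, (old, new) in changes.items():
    print(f"{doc_id}: {old} -> {new}")
```

With the real pipeline, the call would be `rank_changes(result_no_rerank['results'], result_with_rerank['results'])`.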

## 5. Visualize Performance

Let's visualize the timing breakdown of our pipeline.

In [7]:
# Extract timing data
timing_data = pd.DataFrame([
    {
        'query': r['query'],
        'embedding_ms': r['timing']['query_embedding_ms'],
        'vector_search_ms': r['timing']['vector_search_ms'],
        'rerank_ms': r['timing']['rerank_ms'],
        'total_ms': r['timing']['total_ms']
    } for r in results
])

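# Plot timing breakdown
fig, ax = plt.subplots(figsize=(10, 6))
# Pass ax explicitly; otherwise DataFrame.plot opens its own figure and this one stays empty
timing_data.set_index('query')[["embedding_ms", "vector_search_ms", "rerank_ms"]].plot(kind="bar", stacked=True, ax=ax)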
plt.title("Query Processing Time Breakdown")
plt.ylabel("Time (ms)")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()
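
The per-stage timings plotted above have to be recorded somewhere inside the pipeline. A minimal sketch of one way to collect them is a small context manager around each stage. The `timed` helper is hypothetical, and the three workloads are placeholders for the real embedding, search, and rerank calls; only the key names are taken from the `timing` dict used in this notebook.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(timing, key):
    """Store the wall-clock duration of the wrapped block, in milliseconds, under timing[key]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timing[key] = (time.perf_counter() - start) * 1000.0


timing = {}
with timed(timing, "total_ms"):
    with timed(timing, "query_embedding_ms"):
        sum(i * i for i in range(10_000))        # placeholder for embedding the query
    with timed(timing, "vector_search_ms"):
        sorted(range(10_000), reverse=True)      # placeholder for the vector search
    with timed(timing, "rerank_ms"):
        sorted(range(1_000), key=lambda x: -x)   # placeholder for reranking

print({key: round(value, 3) for key, value in timing.items()})
```

Because `total_ms` wraps the three stages, it also includes any overhead between them, so the stacked bars may sum to slightly less than the total.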

## 6. Save Results to JSON

Let's save one of our query results to a JSON file.

In [8]:
# Select a query result to save
nlp_query_result = next((r for r in results if "NLP" in r['query']), results[0])

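# Save to JSON, creating the output directory if it does not exist yet
output_path = Path("../outputs/example_output.json")
output_path.parent.mkdir(parents=True, exist_ok=True)
with open(output_path, 'w') as f:
    json.dump(nlp_query_result, f, indent=2)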
    
print(f"Saved query result to {output_path}")

## 7. Save the Pipeline

Finally, let's save our pipeline for future use.

In [9]:
# Save the pipeline
pipeline_dir = Path("../outputs/saved_pipeline")
pipeline.save(pipeline_dir)
print(f"Pipeline saved to {pipeline_dir}")

# Test loading the pipeline
loaded_pipeline = RAGPipeline.load(pipeline_dir)
print(f"Pipeline loaded successfully with {loaded_pipeline.vector_store.index.ntotal} indexed vectors")

## Conclusion

In this notebook, we've demonstrated the complete hybrid retrieval pipeline that combines:

1. **Embedding**: Converting documents and queries to vector representations
2. **Vector Search**: Finding semantically similar documents using FAISS
3. **Reranking**: Improving results with TF-IDF based reranking

This pipeline can be integrated with an LLM to create a full RAG system that reduces hallucinations and improves factual accuracy by providing relevant context for generation.
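
As a sketch of that integration step, the function below turns retrieved passages into a grounded prompt. The `build_prompt` helper, the prompt wording, the `max_chars` limit, and the assumption that each result carries a `content` key are all illustrative; none of them come from the pipeline code, and the call to an actual LLM is left out.

```python
def build_prompt(question, retrieved, max_chars=500):
    """Assemble an LLM prompt that labels each retrieved passage with its document ID."""
    context_blocks = []
    for doc in retrieved:
        snippet = doc["content"][:max_chars].strip()  # truncate long passages
        context_blocks.append(f"[{doc['id']}]\n{snippet}")
    context = "\n\n".join(context_blocks) if context_blocks else "(no context retrieved)"
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical retrieved passage; the 'content' key is an assumption about the result format.
retrieved = [
    {"id": "nlp_overview", "content": "NLP is used for translation, search, and chatbots."},
]
prompt = build_prompt("What are the applications of NLP?", retrieved)
print(prompt)
```

Labeling each passage with its ID lets the model's answer be traced back to a source document, and the explicit instruction to admit insufficient context is one common way to discourage unsupported answers.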