# Production RAG QA System - Comprehensive Guide

**Complete walkthrough of the production-ready RAG QA system for Well Completion Reports**

## Overview

This notebook provides a comprehensive demonstration of the production RAG (Retrieval-Augmented Generation) QA system designed for querying well completion reports.

### What is this RAG QA System?

A RAG system combines document retrieval with language model generation to answer questions based on a knowledge base. Our system:

1. **Retrieves** relevant document sections from well reports using semantic search
2. **Augments** the LLM prompt with retrieved context
3. **Generates** accurate, source-cited answers using Ollama Llama 3.2 3B

### Architecture

```
Question → Embedding → ChromaDB Retrieval → Context + Question → Ollama LLM → Answer + Sources
```

**Components:**
- **Embeddings:** nomic-embed-text-v1.5 (137M params)
- **Vector Store:** ChromaDB (local, no Docker)
- **LLM:** Ollama Llama 3.2 3B (temperature=0.1 for factual answers)
- **Documents:** 5,258 chunks from 8 wells

### Key Features

- **TOC-Aware Indexing:** 93.1% section type coverage for intelligent filtering
- **Section Filtering:** Query specific sections (casing, geology, drilling, etc.)
- **Source Citation:** Every answer includes source metadata
- **Multi-Well Support:** Query across all 8 wells or filter by specific well
- **Fast Queries:** <10s average latency
- **No Docker Required:** Uses local ChromaDB persistence

### Prerequisites

1. Ollama installed with `llama3.2:3b` model
2. Pre-indexed database at `../chroma_db_toc_aware/`
3. Python packages: `langchain-chroma`, `langchain-huggingface`, `langchain-ollama`

**Expected Runtime:** <10 minutes

In [None]:
# Setup and Imports
import sys
import time
from pathlib import Path
from typing import Dict, Any, List

# Add src to path
project_root = Path('.').absolute().parent
sys.path.insert(0, str(project_root / 'src'))

# Import production RAG QA system
from rag_qa_system import WellReportQASystem, QAResult

print("Imports successful!")
print(f"Project root: {project_root}")

## 1. Quick Start

Initialize the RAG QA system and run a simple query.

In [None]:
# Quick Start: Initialize and query in 3 lines
qa_system = WellReportQASystem(verbose=True)

result = qa_system.query("What is the total depth of Well 5?")

print(f"\nANSWER: {result.answer}")

## 2. Database Statistics

Explore the pre-indexed database contents and metadata coverage.

In [None]:
# Get comprehensive database statistics
stats = qa_system.get_statistics()

print("=" * 80)
print("DATABASE STATISTICS")
print("=" * 80)

print(f"\nTotal documents: {stats['total_documents']:,}")
print(f"Number of wells: {stats['num_wells']}")
print(f"\nWells available:")
for i, well in enumerate(stats['wells'], 1):
    print(f"  {i}. {well}")

print(f"\nSource type distribution:")
for source_type, count in sorted(stats['source_types'].items(), key=lambda x: x[1], reverse=True):
    pct = (count / stats['total_documents'] * 100)
    print(f"  {source_type:20s}: {count:6,} ({pct:5.1f}%)")

print(f"\nSection type distribution (top 15):")
section_items = sorted(stats['section_types'].items(), key=lambda x: x[1], reverse=True)
for section_type, count in section_items[:15]:
    pct = (count / stats['total_documents'] * 100)
    section_label = section_type if section_type else "(none)"
    print(f"  {section_label:20s}: {count:6,} ({pct:5.1f}%)")

# Calculate section type coverage
text_chunks = sum(count for st, count in stats['source_types'].items() if st == 'text_chunk')
chunks_with_section = sum(count for st, count in stats['section_types'].items() if st)
coverage_pct = (chunks_with_section / text_chunks * 100) if text_chunks else 0

print(f"\nTOC-Aware Metadata Coverage:")
print(f"  Text chunks with section_type: {chunks_with_section:,} / {text_chunks:,} ({coverage_pct:.1f}%)")

print("\n" + "=" * 80)

## 3. Query Examples

Demonstrate various query patterns and filtering options.

In [None]:
def display_result(result: QAResult, max_sources: int = 3):
    """
    Display query result with formatted output
    
    Args:
        result: QAResult object
        max_sources: Maximum number of sources to display
    """
    print(f"\n{'='*80}")
    print(f"QUESTION: {result.question}")
    if result.metadata.get('filter'):
        print(f"FILTER: {result.metadata['filter']}")
    print(f"{'='*80}\n")
    
    print(f"ANSWER:")
    print(result.answer)
    
    print(f"\n{'-'*80}")
    print(f"SOURCES ({result.metadata['num_sources']} documents retrieved):")
    print(f"{'-'*80}")
    
    for i, source in enumerate(result.sources[:max_sources], 1):
        print(f"\n[Source {i}]")
        print(f"  Well:         {source['well_name']}")
        print(f"  Source Type:  {source['source_type']}")
        print(f"  Section:      {source['section_title']}")
        print(f"  Section Type: {source['section_type']}")
        print(f"  Page:         {source['page']}")
        print(f"  PDF:          {source['pdf_file']}")
        print(f"  Content:      {source['content'][:200]}...")
    
    if len(result.sources) > max_sources:
        print(f"\n  ... and {len(result.sources) - max_sources} more sources")
    
    print(f"\n{'='*80}\n")

In [None]:
# Example 1: Basic query (no filters)
result1 = qa_system.query("What are the main geological formations encountered?")
display_result(result1)

In [None]:
# Example 2: Well-specific query
result2 = qa_system.query(
    "What is the casing program?",
    filter_metadata={"well_name": "well_5"}
)
display_result(result2)

In [None]:
# Example 3: Section-filtered query
result3 = qa_system.query_with_section_filter(
    question="Describe the drilling operations",
    section_type="drilling"
)
display_result(result3)

In [None]:
# Example 4: Combined filters (well + section)
result4 = qa_system.query_with_section_filter(
    question="What are the casing specifications?",
    section_type="casing",
    well_name="well_7"
)
display_result(result4)

## 4. Source Citation Deep Dive

Examine how to access and use source metadata for citation and verification.

In [None]:
# Access source documents for citation
result = qa_system.query("What is the reservoir pressure?")

print("=" * 80)
print("SOURCE METADATA STRUCTURE")
print("=" * 80)

if result.sources:
    print(f"\nExample source metadata fields:")
    source = result.sources[0]
    for key, value in source.items():
        if key != 'content':
            print(f"  {key:15s}: {value}")
    
    print(f"\nFull content length: {len(source['content'])} characters")
    print(f"Content preview: {source['content'][:300]}...")

print(f"\nAll source fields available:")
print(f"  - content: Full text of retrieved chunk")
print(f"  - well_name: Which well the document is from")
print(f"  - source_type: text_chunk, table, or picture")
print(f"  - section_title: TOC section title")
print(f"  - section_type: Categorized section type (casing, geology, etc.)")
print(f"  - page: Page number in PDF")
print(f"  - pdf_file: Source PDF filename")

print("\n" + "=" * 80)

## 5. Performance Testing

Measure query latency and compare performance with/without filters.

In [None]:
# Performance benchmark: Various query types
test_queries = [
    ("What is the well depth?", None),
    ("Describe the drilling program", None),
    ("What is the reservoir geology?", {"section_type": "geology"}),
    ("Describe the casing program", {"well_name": "well_5"}),
    ("What are the completion details?", {"$and": [{"well_name": "well_7"}, {"section_type": "completion"}]}),
]

print("=" * 80)
print("PERFORMANCE BENCHMARK")
print("=" * 80)

results = []
for query, filter_meta in test_queries:
    start = time.time()
    result = qa_system.query(query, filter_metadata=filter_meta)
    latency = time.time() - start
    
    filter_desc = str(filter_meta) if filter_meta else "None"
    results.append((query, filter_desc, latency, result.metadata['num_sources']))
    
    print(f"\nQuery: {query}")
    print(f"  Filter: {filter_desc}")
    print(f"  Latency: {latency:.2f}s")
    print(f"  Sources: {result.metadata['num_sources']}")

latencies = [r[2] for r in results]

print(f"\n{'='*80}")
print(f"PERFORMANCE SUMMARY")
print(f"{'='*80}")
print(f"  Total queries:    {len(results)}")
print(f"  Average latency:  {sum(latencies)/len(latencies):.2f}s")
print(f"  Min latency:      {min(latencies):.2f}s")
print(f"  Max latency:      {max(latencies):.2f}s")
print(f"  Median latency:   {sorted(latencies)[len(latencies)//2]:.2f}s")
print(f"{'='*80}")

## 6. Use Cases

Practical examples of how to use the RAG QA system for real-world tasks.

In [None]:
# Use Case 1: Technical Question Answering
print("="*80)
print("USE CASE 1: Technical Question Answering")
print("="*80)
print("\nScenario: Engineer needs casing diameter information for Well 5")
print()

result = qa_system.query(
    "What are the inner diameters of the casing strings in Well 5?",
    filter_metadata={"well_name": "well_5"}
)

print(f"Answer: {result.answer}")
print(f"\nSources used: {result.metadata['num_sources']} documents")
print(f"Primary source: {result.sources[0]['pdf_file']} (page {result.sources[0]['page']})")

In [None]:
# Use Case 2: Cross-Well Comparison
print("="*80)
print("USE CASE 2: Cross-Well Comparison")
print("="*80)
print("\nScenario: Compare geological formations across multiple wells")
print()

wells = ["well_5", "well_7"]
comparison = {}

for well in wells:
    result = qa_system.query(
        "What are the main geological formations encountered?",
        filter_metadata={"well_name": well}
    )
    comparison[well] = result.answer

for well, answer in comparison.items():
    print(f"{well.upper()}:")
    print(f"{answer}")
    print()

In [None]:
# Use Case 3: Section-Specific Information Retrieval
print("="*80)
print("USE CASE 3: Section-Specific Information Retrieval")
print("="*80)
print("\nScenario: Extract all drilling-related information for safety analysis")
print()

drilling_queries = [
    "What drilling fluid was used?",
    "Were there any drilling problems?",
    "What was the drilling rate?"
]

for query in drilling_queries:
    result = qa_system.query(
        query,
        filter_metadata={"section_type": "drilling"}
    )
    print(f"Q: {query}")
    print(f"A: {result.answer}")
    print()

## 7. Advanced Features

Explore additional capabilities and customization options.

In [None]:
# Advanced Feature 1: List all available wells
wells = qa_system.list_available_wells()
print("Available wells:")
for i, well in enumerate(wells, 1):
    print(f"  {i}. {well}")

# Advanced Feature 2: Custom temperature for creative vs factual responses
print("\nNote: Current system uses temperature=0.1 for factual answers")
print("For more creative responses, you can initialize with higher temperature:")
print("  qa_system_creative = WellReportQASystem(temperature=0.7)")

# Advanced Feature 3: Adjust retrieval count (top_k)
print("\nNote: Current system retrieves top_k=5 documents")
print("For more comprehensive context, initialize with higher top_k:")
print("  qa_system_detailed = WellReportQASystem(top_k=10)")

## Summary

### Production RAG QA System Capabilities

**Features Demonstrated:**
- **Quick Start:** Simple 3-line initialization and query
- **Database Statistics:** Comprehensive view of indexed content
- **Query Patterns:** Basic, well-specific, section-filtered, combined filters
- **Source Citation:** Full metadata for every retrieved document
- **Performance:** <10s average query latency
- **Use Cases:** Technical Q&A, cross-well comparison, section-specific retrieval

### Key Benefits

1. **Accurate Answers:** LLM grounded in actual well completion reports
2. **Source Transparency:** Every answer includes source citations
3. **Intelligent Filtering:** TOC-aware metadata enables precise queries
4. **Multi-Well Support:** Query across all wells or filter by specific well
5. **Fast Performance:** Optimized retrieval and generation pipeline
6. **Easy Deployment:** No Docker, local ChromaDB, pure Python

### Production Statistics

- **Documents:** 5,258 chunks (text, tables, pictures)
- **Wells:** 8 wells indexed
- **TOC Coverage:** 93.1% section type metadata
- **Query Latency:** <10s average
- **Model Size:** Llama 3.2 3B (runs on CPU)
- **Embedding Size:** 137M params (nomic-embed-text-v1.5)

### Next Steps

1. **Sub-Challenge 1 Submission:** Use this system for RAG-based summarization (50% of grade)
2. **Parameter Extraction:** Integrate with Sub-Challenge 2 (20% of grade)
3. **Agentic Workflow:** Combine with Sub-Challenge 3 (30% of grade)

### Related Notebooks

- `04_interactive_rag_demo.ipynb` - Interactive demo with editable queries
- `06_sub_challenge_1_guide.ipynb` - Grading criteria and evaluation
- `demos/08_toc_extraction_demo.ipynb` - TOC extraction details
- `demos/10_build_toc_database.ipynb` - Database construction process

### Architecture Reference

**Code:** `src/rag_qa_system.py` (394 lines)

**Key Classes:**
- `WellReportQASystem`: Main RAG QA system
- `QAResult`: Query result with answer and sources

**Dependencies:**
- LangChain 1.0+ (langchain-chroma, langchain-huggingface, langchain-ollama)
- ChromaDB (vector store)
- Ollama (LLM inference)

### Sub-Challenge 1 Grading Criteria

This system addresses all Sub-Challenge 1 requirements:

- **Answer Quality (40%):** Factual, grounded in sources
- **Source Citation (30%):** Full metadata for every answer
- **Response Time (20%):** <10s average
- **System Robustness (10%):** Production-ready, error handling

---

**System Status:** Production Ready ✓

**Last Updated:** 2025-11-14