# Interactive RAG System Demo

**Production RAG QA System for Well Completion Reports**

This notebook demonstrates the production-ready RAG QA system with:
- Pre-indexed ChromaDB with 5,258 documents
- TOC-aware metadata (93.1% section type coverage)
- Ollama Llama 3.2 3B integration
- Section-filtered queries
- Source citation

**Prerequisites:**
- Ollama installed with `llama3.2:3b` model
- Pre-indexed database at `../chroma_db_toc_aware/`

**Runtime:** <5 minutes (no indexing required)
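
Both prerequisites can be checked before running the rest of the notebook. The following is a minimal sketch, not part of the project: `check_prerequisites` is a hypothetical helper that only confirms the `ollama` executable is on the `PATH` and that the database directory exists. It does not verify that the `llama3.2:3b` model has been pulled.

```python
import shutil
from pathlib import Path


def check_prerequisites(chroma_dir="../chroma_db_toc_aware"):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # The Ollama CLI should be on PATH for the local LLM to be usable.
    if shutil.which("ollama") is None:
        problems.append("ollama executable not found on PATH")
    # The pre-indexed ChromaDB directory must already exist (no indexing happens here).
    if not Path(chroma_dir).is_dir():
        problems.append(f"database directory not found: {chroma_dir}")
    return problems


for problem in check_prerequisites():
    print(f"WARNING: {problem}")
```

An empty result means both checks passed; any warning should be resolved before initializing the system.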

In [None]:
# Setup
import sys
from pathlib import Path

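# Add the project's src/ directory to the import path
# (assumes the notebook is run from a direct subdirectory of the project root)
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root / 'src'))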

# Import production RAG QA system
from rag_qa_system import WellReportQASystem, QAResult

print("Imports successful!")

## 1. Initialize RAG System

Load the production RAG QA system with pre-indexed ChromaDB.

In [None]:
# Initialize RAG system with pre-indexed database
qa_system = WellReportQASystem(
    chroma_dir="../chroma_db_toc_aware",
    collection_name="well_reports_toc_aware",
    llm_model="llama3.2:3b",
    temperature=0.1,
    top_k=5,
    verbose=True
)

print("\n" + "=" * 80)
print("RAG QA System initialized successfully!")
print("=" * 80)

## 2. Database Statistics

View the pre-indexed database contents.

In [None]:
# Get database statistics
stats = qa_system.get_statistics()

print("=" * 80)
print("DATABASE STATISTICS")
print("=" * 80)
print(f"\nTotal documents: {stats['total_documents']}")
print(f"Number of wells: {stats['num_wells']}")
print(f"Wells: {', '.join(stats['wells'])}")

print("\nSource type distribution:")
for source_type, count in sorted(stats['source_types'].items()):
    print(f"  {source_type}: {count}")

print("\nSection type distribution (top 10):")
section_items = sorted(stats['section_types'].items(), key=lambda x: x[1], reverse=True)
for section_type, count in section_items[:10]:
    print(f"  {section_type}: {count}")

print("\n" + "=" * 80)
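
The section type coverage figure quoted in the introduction can in principle be recomputed from these counts. The sketch below assumes that documents without a recognized section type are counted under a placeholder label such as `"unknown"`; that label and the `section_type_coverage` helper are hypothetical, and the counts in the example are illustrative rather than taken from the database.

```python
def section_type_coverage(section_counts, total_documents, untyped_labels=("unknown", "")):
    """Fraction of documents that carry a recognized section type.

    section_counts maps a section type label to a document count.
    untyped_labels lists the labels that mean "no section type" (an assumption).
    """
    if total_documents <= 0:
        return 0.0
    typed = sum(
        count for label, count in section_counts.items()
        if label not in untyped_labels
    )
    return typed / total_documents


# Illustrative counts only; not the real distribution.
example_counts = {"casing": 40, "geology": 50, "unknown": 10}
coverage = section_type_coverage(example_counts, total_documents=100)
print(f"Section type coverage: {coverage:.1%}")
```

With the real data, the call would be `section_type_coverage(stats['section_types'], stats['total_documents'])`, adjusting `untyped_labels` to whatever the index actually uses.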

## 3. Test Standard Queries

Run example queries to test the RAG system.

In [None]:
def query_and_display(question, filter_metadata=None):
    """
    Query the RAG system and display results
    
    Args:
        question: Question to ask
        filter_metadata: Optional metadata filters
    """
    print(f"\n{'='*80}")
    print(f"QUESTION: {question}")
    if filter_metadata:
        print(f"FILTER: {filter_metadata}")
    print(f"{'='*80}\n")
    
    result = qa_system.query(question, filter_metadata=filter_metadata)
    
    print("\nANSWER:")
    print(result.answer)
    
    print(f"\n{'='*80}")
    print(f"SOURCES (showing up to 3 of {result.metadata['num_sources']} documents):")
    print(f"{'='*80}")
    
    for i, source in enumerate(result.sources[:3], 1):
        print(f"\nSource {i}:")
        print(f"  Well: {source['well_name']}")
        print(f"  Section: {source['section_title']} ({source['section_type']})")
        print(f"  Page: {source['page']}")
        print(f"  Content: {source['content'][:150]}...")
    
    print(f"\n{'='*80}\n")
    return result
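
The source metadata shown by `query_and_display` can also be condensed into a one-line citation, which is handy when pasting answers into a report. This is a sketch under the assumption that each source is a dict with the keys used above (`well_name`, `section_title`, `section_type`, `page`); `format_citation` is a hypothetical helper, not part of `rag_qa_system`.

```python
def format_citation(source):
    """Build a one-line citation from a retrieved source's metadata."""
    # .get() with a fallback keeps the helper usable when a field is missing.
    well = source.get("well_name", "unknown well")
    title = source.get("section_title", "untitled section")
    section_type = source.get("section_type", "unknown type")
    page = source.get("page", "?")
    return f"{well}, {title} ({section_type}), p. {page}"


# Illustrative metadata only; not taken from the indexed reports.
example_source = {
    "well_name": "well_5",
    "section_title": "Casing Program",
    "section_type": "casing",
    "page": 12,
}
print(format_citation(example_source))
```

Applied to a real result, `[format_citation(s) for s in result.sources]` would give one citation per retrieved document.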

In [None]:
# Test Query 1: Well depth
result1 = query_and_display("What is the total depth of Well 5?")

In [None]:
# Test Query 2: Casing program
result2 = query_and_display("Describe the casing program for Well 5")

In [None]:
# Test Query 3: Geological formation
result3 = query_and_display("What is the geological formation in Well 7?")

## 4. Section-Filtered Queries

Use TOC-aware metadata to filter queries by section type.

In [None]:
# Query with section type filter: casing
result_casing = query_and_display(
    "Describe the casing program",
    filter_metadata={"section_type": "casing"}
)

In [None]:
# Query with combined filters: well + section type
result_combined = query_and_display(
    "What are the casing specifications?",
    filter_metadata={
        "$and": [
            {"well_name": "well_5"},
            {"section_type": "casing"}
        ]
    }
)
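
The two cells above show the two filter shapes: a plain dict for a single condition and an `$and` list when several conditions must hold together. A small helper can produce either shape from keyword arguments. `build_filter` is a hypothetical convenience function sketched here, not part of the QA system.

```python
def build_filter(**conditions):
    """Build a metadata filter dict from keyword arguments.

    No conditions -> None, one condition -> a plain dict,
    several conditions -> an "$and" list with one dict per condition.
    """
    # Skip arguments passed as None so callers can forward optional values.
    clauses = [{key: value} for key, value in conditions.items() if value is not None]
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}


print(build_filter(section_type="casing"))
print(build_filter(well_name="well_5", section_type="casing"))
```

The result can be passed directly as `filter_metadata` to `query_and_display`, and returning `None` when no condition is given matches that function's default.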

## 5. Interactive Query Interface

Try your own queries!

In [None]:
# EDIT THIS - Try your own query!
custom_question = "What is the reservoir pressure?"
custom_filter = {"well_name": "well_5"}  # Optional filter

result_custom = query_and_display(custom_question, filter_metadata=custom_filter)

## 6. Performance Benchmarking

Measure query latency.

In [None]:
import time

# Benchmark queries
test_queries = [
    "What is the well depth?",
    "Describe the drilling program",
    "What is the reservoir geology?"
]

print("=" * 80)
print("PERFORMANCE BENCHMARK")
print("=" * 80)

latencies = []
for query in test_queries:
    # perf_counter() is a monotonic, high-resolution clock intended for timing
    start = time.perf_counter()
    result = qa_system.query(query)
    latency = time.perf_counter() - start
    latencies.append(latency)
    print(f"\nQuery: {query}")
    print(f"Latency: {latency:.2f}s")

print(f"\n{'='*80}")
print(f"Average latency: {sum(latencies)/len(latencies):.2f}s")
print(f"Min latency: {min(latencies):.2f}s")
print(f"Max latency: {max(latencies):.2f}s")
print(f"{'='*80}")
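
A single run per query is sensitive to warm-up effects such as the model being loaded on the first call. One way to get steadier numbers is to repeat each query and report the median. The helpers below (`time_calls`, `summarize`) are hypothetical and use `len` as a stand-in workload so the sketch runs without the RAG system.

```python
import statistics
import time


def time_calls(func, inputs, repeats=3):
    """Time func(x) for each x in inputs, repeating each call `repeats` times.

    Returns a dict mapping each input to its list of latencies in seconds.
    """
    timings = {}
    for item in inputs:
        runs = []
        for _ in range(repeats):
            start = time.perf_counter()
            func(item)
            runs.append(time.perf_counter() - start)
        timings[item] = runs
    return timings


def summarize(timings):
    """Return the median and maximum latency over all recorded runs."""
    all_runs = [latency for runs in timings.values() for latency in runs]
    return {"median": statistics.median(all_runs), "max": max(all_runs)}


# Stand-in workload; in the notebook, pass qa_system.query instead of len.
demo = time_calls(len, ["What is the well depth?", "Describe the drilling program"])
print(summarize(demo))
```

In the notebook, `time_calls(qa_system.query, test_queries)` would time the real system, at the cost of `repeats` times as many LLM calls.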

## 7. Available Wells

List all wells in the database.

In [None]:
# List available wells
wells = qa_system.list_available_wells()

print("=" * 80)
print(f"AVAILABLE WELLS ({len(wells)} total)")
print("=" * 80)
for i, well in enumerate(wells, 1):
    print(f"{i}. {well}")
print("=" * 80)

## Summary

**Production RAG QA System Features:**

- **Pre-indexed Database:** 5,258 documents from 8 wells
- **TOC-Aware Metadata:** 93.1% section type coverage
- **Section Filtering:** Query specific sections (casing, geology, etc.)
- **Source Citation:** Full metadata for each retrieved document
- **Fast Queries:** <10s average latency (measure it on your own hardware with the benchmark in Section 6)
- **No Docker:** Local ChromaDB, no containers needed

**Next Steps:**

1. Try more complex queries
2. Experiment with section type filtering
3. Compare results across different wells
4. Explore parameter extraction (Sub-Challenge 2)

**Related Notebooks:**

- `07_production_rag_qa_demo.ipynb` - Comprehensive guide
- `06_sub_challenge_1_guide.ipynb` - Grading criteria
- `demos/08_toc_extraction_demo.ipynb` - TOC extraction details