# 🚀 vectorDBpipe: Omni-RAG Demo

**v0.2.1** | [GitHub](https://github.com/vectordbpipe/vectorDBpipe) | [PyPI](https://pypi.org/project/vectordbpipe/)

This notebook demonstrates the full Omni-RAG architecture:
- ✅ **Tri-Processing Ingestion**: Vector, PageIndex, and GraphRAG simultaneously
- ✅ **OmniRouter**: Automatic engine selection per query type
- ✅ **4 RAG Engines**: Vector, Vectorless, GraphRAG, LangChain Extract
- ✅ **15+ Data Sources**: PDF, DOCX, S3, Notion, GitHub, Slack, and more

## 🔧 Step 1: Install the Package

In [None]:
# Install the pinned release used throughout this notebook (0.2.1)
!pip install vectordbpipe==0.2.1 -q
print('✅ vectordbpipe installed!')

## 📦 Step 2: Verify Imports

In [None]:
import warnings
warnings.filterwarnings('ignore')

import torch
print(f'PyTorch version: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')

from vectorDBpipe import VDBpipe
print('✅ VDBpipe imported successfully!')
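
As an extra check, the installed release can be compared against the pinned `0.2.1` using the standard library's `importlib.metadata`. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of vectorDBpipe.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# 'vectordbpipe' is the distribution name used in the pip command above.
found = installed_version("vectordbpipe")
print(f"vectordbpipe version: {found or 'not installed'}")
```

If the printed version differs from the one pinned in Step 1, restarting the runtime after installation is usually worth trying first.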

## 📝 Step 3: Create Demo Data

We create a small sample text file to demonstrate ingestion. In production, point this at a real PDF, S3 bucket, or Notion page.

In [None]:
import os
os.makedirs('demo_data', exist_ok=True)

sample_text = """
# Q3 2024 Financial Report - Acme Corporation

## Executive Summary
Acme Corporation achieved record revenue of $2.5 billion in Q3 2024,
representing a 23% year-over-year growth. The growth was primarily
driven by the acquisition of Startup X in July 2024.

## Key Executives
- CEO: John Smith, who joined in 2019
- CFO: Sarah Johnson, responsible for the Q4 acquisition strategy
- CTO: Michael Chen, leading the AI transformation initiative

## Financial Highlights
- Total Revenue: $2.5 billion (Q3 2024)
- Net Profit: $450 million
- Operating Margin: 18%
- Cash Reserves: $800 million

## Risk Factors
The primary risk factors include supply chain disruptions in Asia,
regulatory changes in the European markets, and competition from
Tech Giant Corp.

## Governance
The Board of Directors is chaired by Dr. Emily Watson. John Smith
reports directly to the board. The penalty for any breach of fiduciary
duty is $5 million as per Section 14 of the corporate charter.
"""

# Write with an explicit encoding so the file is identical on every platform
with open('demo_data/q3_report.txt', 'w', encoding='utf-8') as f:
    f.write(sample_text)

print('✅ Demo data created at demo_data/q3_report.txt')

## ⚙️ Step 4: Initialize VDBpipe with Config Override

Use `config_override` to set providers at runtime: **no `config.yaml` file needed on Colab!**

In [None]:
# ============================================================
# Option A: Use a FREE local configuration (no API keys needed)
# - Embeddings: SentenceTransformers (all-MiniLM-L6-v2)
# - Vector DB: FAISS (local, in-memory)
# - LLM: None (RAG without generation; retrieval only)
# ============================================================

pipeline = VDBpipe(config_override={
    "embedding": {
        "provider": "local",
        "model_name": "all-MiniLM-L6-v2"
    },
    "database": {
        "provider": "faiss",
        "mode": "local",
        "collection_name": "demo_collection"
    },
    "llm": {
        "provider": "null"
    },
    "paths": {
        "logs_dir": "logs/",
        "data_dir": "demo_data/"
    }
})

print('✅ VDBpipe initialized successfully!')
print(f'   Graph: {pipeline.graph}')
print(f'   PageIndex: {pipeline.page_index}')
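
A runtime override like the one above is commonly implemented as a recursive merge over default settings. The sketch below illustrates that idea only: whether vectorDBpipe merges this way internally is an assumption, and `deep_merge` and the `defaults` dictionary are hypothetical names introduced for illustration.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which nested keys from `override` win over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and new keys replace whatever was there.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical defaults, for illustration only (not vectorDBpipe's real ones).
defaults = {
    "embedding": {"provider": "local", "model_name": "all-MiniLM-L6-v2"},
    "database": {"provider": "faiss", "mode": "local"},
}
override = {"database": {"collection_name": "demo_collection"}}

config = deep_merge(defaults, override)
print(config["database"])
```

With a merge of this kind, an override only needs to name the keys that change, and the defaults are left unmodified.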

## 🔄 Step 5: Tri-Processing Ingestion

One call to `ingest()` runs **3 parallel pipelines**:
1. 🗂️ Chunks text and stores embeddings in FAISS
2. 📖 Builds a hierarchical PageIndex JSON structure
3. 🕸️ Extracts entities and relationships into a NetworkX graph

In [None]:
pipeline.ingest('demo_data/')

print('\n✅ Ingestion complete!')
print(f'   Graph nodes: {list(pipeline.graph.nodes())}')
print(f'   PageIndex keys: {list(pipeline.page_index.keys())}')
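
To make the three ingestion outputs concrete, here is a toy, standard-library-only stand-in for each: overlapping word chunks, a heading-keyed section index, and an entity co-occurrence graph. It is not the library's implementation. The real pipeline uses embeddings, FAISS, and NetworkX, and all function names, the sample text, and the entity list below are assumptions made for illustration. The sketch also runs the three steps sequentially, not in parallel.

```python
from collections import defaultdict
from itertools import combinations

TEXT = """## Executive Summary
Acme Corporation achieved record revenue in Q3 2024.

## Governance
The Board of Directors is chaired by Dr. Emily Watson. John Smith reports directly to the board.
"""


def chunk_words(text, size=12, overlap=4):
    """Split text into overlapping word windows (a stand-in for vector chunking)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]


def build_section_index(text):
    """Map each '## ' heading to its body lines (a stand-in for a page index)."""
    index, current = {}, None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            index[current] = []
        elif current is not None and line.strip():
            index[current].append(line.strip())
    return index


def build_cooccurrence_graph(text, entities):
    """Link entities that appear on the same line (a stand-in for graph extraction)."""
    graph = defaultdict(set)
    for line in text.splitlines():
        present = [entity for entity in entities if entity in line]
        for a, b in combinations(present, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


# Hand-picked entity list; a real pipeline would extract these automatically.
ENTITIES = ["Acme Corporation", "John Smith", "Emily Watson", "Board of Directors"]

chunks = chunk_words(TEXT)
index = build_section_index(TEXT)
graph = build_cooccurrence_graph(TEXT, ENTITIES)

print(f"{len(chunks)} chunks")
print(f"sections: {list(index)}")
print(f"John Smith is linked to: {sorted(graph['John Smith'])}")
```

Each structure serves a different kind of question: chunks for local lookups, the section index for reading the document as a whole, and the graph for relationships between named entities.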

## 🤖 Step 6: OmniRouter Query (Retrieval without LLM)

Since we set `llm.provider: null`, we get ranked retrieval results back.
To get LLM-generated answers, set your OpenAI/Groq/Anthropic key in the config override.

In [None]:
# The OmniRouter classifies these queries and picks the right engine:

# Engine 1: Vector RAG (direct factual lookup)
result1 = pipeline.query("What was the total revenue in Q3 2024?")
print('Engine 1 (Vector RAG):')
print(result1)
print()

In [None]:
# Engine 2: Vectorless / PageIndex RAG (holistic reading)
result2 = pipeline.query("Summarize the overall document.")
print('Engine 2 (Vectorless RAG):')
print(result2)
print()

In [None]:
# Engine 3: GraphRAG (relationship reasoning)
result3 = pipeline.query("How is the CEO connected to the board?")
print('Engine 3 (GraphRAG):')
print(result3)
print()
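
The notebook does not describe how the OmniRouter classifies queries, so the following is only a toy keyword heuristic showing what "engine selection per query type" could look like for the three queries above. The function `route_query`, its keyword lists, and the engine labels are hypothetical and should not be read as the library's logic.

```python
def route_query(query: str) -> str:
    """Pick an engine label from simple keyword cues (illustrative heuristic only)."""
    q = query.lower()
    # Whole-document requests suit the section-index ("vectorless") engine.
    if any(cue in q for cue in ("summarize", "summary", "overall", "overview")):
        return "vectorless"
    # Questions about links between entities suit the graph engine.
    if any(cue in q for cue in ("connected", "relationship", "related", "reports to")):
        return "graph"
    # Everything else falls back to plain vector retrieval.
    return "vector"


queries = [
    "What was the total revenue in Q3 2024?",
    "Summarize the overall document.",
    "How is the CEO connected to the board?",
]
for query in queries:
    print(f"{route_query(query):<10} <- {query}")
```

A production router would more plausibly use an LLM or a trained classifier, since keyword rules misfire on paraphrased queries.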

## 🧩 Step 7: (Optional) Use with OpenAI for Full RAG Generation

If you have an OpenAI API key, set it and re-initialize to get full LLM-generated answers.

In [None]:
# Uncomment and set your API key to enable LLM generation:

# import os
# os.environ['OPENAI_API_KEY'] = 'sk-...your-key-here...'

# pipeline_gpt = VDBpipe(config_override={
#     "embedding": {"provider": "local", "model_name": "all-MiniLM-L6-v2"},
#     "database": {"provider": "faiss", "mode": "local", "collection_name": "demo_gpt"},
#     "llm": {"provider": "openai", "model_name": "gpt-4o-mini"},
#     "paths": {"logs_dir": "logs/", "data_dir": "demo_data/"}
# })
# pipeline_gpt.ingest('demo_data/')
# answer = pipeline_gpt.query("What was Q3 revenue and who is CEO?")
# print(answer)
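
One way to avoid editing the cell by hand is to choose the `llm` block based on whether a key is present in the environment. This is a minimal sketch: the helper `llm_config_from_env` is hypothetical, and it only reuses the provider values that already appear in this notebook.

```python
import os


def llm_config_from_env(env=None):
    """Return an OpenAI llm block when a key is set, else the retrieval-only block."""
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return {"provider": "openai", "model_name": "gpt-4o-mini"}
    return {"provider": "null"}


llm_block = llm_config_from_env()
print(f"LLM provider selected: {llm_block['provider']}")
```

The returned dictionary can then be passed as the `"llm"` entry of `config_override`, so the same notebook runs with or without a key.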

## 📊 Step 8: Extract Structured JSON (Engine 4)

In [None]:
# Engine 4 works with an LLM. With llm=null it returns the retrieved context.
# With GPT/Groq configured, it returns type-safe JSON.

schema = {
    "company_name": "string",
    "revenue_usd": "integer",
    "ceo_name": "string",
    "risk_factors": "list of strings"
}

extracted = pipeline.extract(
    query="Extract all key company metrics from the document.",
    schema=schema
)
print('üß© Extracted Data (Engine 4):')
print(extracted)
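
When an LLM is configured, it can be useful to confirm that the returned record actually matches the schema's type labels. The sketch below assumes the result is a plain dictionary keyed by the schema's field names, which this notebook does not confirm. The helper `check_against_schema` is hypothetical, and `sample_record` is hand-written from the demo report, not real output from `extract()`.

```python
def check_against_schema(record: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the record fits the schema."""
    checkers = {
        "string": lambda v: isinstance(v, str),
        # bool is a subclass of int in Python, so exclude it explicitly.
        "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
        "list of strings": lambda v: isinstance(v, list)
        and all(isinstance(item, str) for item in v),
    }
    problems = []
    for field, type_name in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not checkers[type_name](record[field]):
            problems.append(f"wrong type for {field}: expected {type_name}")
    return problems


schema = {
    "company_name": "string",
    "revenue_usd": "integer",
    "ceo_name": "string",
    "risk_factors": "list of strings",
}

# Hand-written example based on the demo report, not actual extract() output.
sample_record = {
    "company_name": "Acme Corporation",
    "revenue_usd": 2_500_000_000,
    "ceo_name": "John Smith",
    "risk_factors": ["supply chain disruptions in Asia"],
}

print(check_against_schema(sample_record, schema))
```

A record that stores revenue as text such as `"2.5 billion"` would be reported as a type problem, which is a common failure mode for LLM extraction.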

## ✅ Summary

| Feature | Status |
|---|---|
| Package Installation | ✅ |
| VDBpipe Initialization | ✅ |
| Tri-Processing Ingestion | ✅ |
| Engine 1: Vector RAG | ✅ |
| Engine 2: Vectorless RAG | ✅ |
| Engine 3: GraphRAG | ✅ |
| Engine 4: LangChain Extract | ✅ (needs LLM for generation) |

---
*vectorDBpipe v0.2.1 | Created by Yash Desai | [GitHub](https://github.com/vectordbpipe/vectorDBpipe)*