# Basic RAG System Example

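This notebook walks through the end-to-end workflow of the unstructured RAG system: loading a document, splitting it into chunks, embedding the chunks, storing them in Milvus, retrieving the chunks most relevant to a query, and generating a response from them.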

## Setup

First, let's import the necessary modules.

In [None]:
import os
import sys

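# Add the parent of the current working directory (the project root) to sys.path
# so that the `app` and `rag` packages can be imported from this notebook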
sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), "..")))

from app.config import config
from rag.data_ingestion.loader import load_document
from rag.processing.chunker import chunk_text
from rag.processing.embedder import get_embedder
from rag.retrieval.milvus_client import get_milvus_client, store_chunks
from rag.retrieval.search import search_documents
from rag.generation.response import generate_response

## Loading and Processing a Document

Let's load a document and inspect the extracted text and metadata.

In [None]:
# Path to the document to ingest; replace with your own file
document_path = "../data/sample.pdf"

# Load document
text, metadata = load_document(document_path)

print(f"Document loaded: {metadata.get('file_name')}")
print(f"Text length: {len(text)} characters")
print(f"Metadata: {metadata}")
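
`load_document` returns a `(text, metadata)` pair; how the project's loader extracts text from formats such as PDF is not shown here. As a rough sketch of that contract only, assuming a plain UTF-8 text file, the hypothetical helper `load_text_document` below returns the same shape. Only the `file_name` key is actually used by this notebook; the other metadata keys are illustrative assumptions.

```python
from pathlib import Path


def load_text_document(path: str) -> tuple[str, dict]:
    """Hypothetical stand-in for load_document that reads a UTF-8 text file.

    Returns the extracted text and a metadata dict. Only 'file_name' is used
    by this notebook; 'file_type' and 'size_bytes' are illustrative assumptions.
    """
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    metadata = {
        "file_name": p.name,                        # later used as the display name
        "file_type": p.suffix.lstrip(".").lower(),  # e.g. "txt"
        "size_bytes": p.stat().st_size,
    }
    return text, metadata
```

A real loader for PDFs would replace `read_text` with a PDF text extractor, but the calling code in this notebook would not need to change as long as it keeps returning `(text, metadata)`.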

## Chunking the Document

Now, let's split the document into chunks.

In [None]:
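# Split the text into chunks
chunks = chunk_text(text)

print(f"Document split into {len(chunks)} chunks")
if chunks:  # guard against an empty document, where chunks[0] would raise IndexError
    print(f"\nSample chunk:\n{chunks[0].text[:200]}...")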
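
The strategy behind `chunk_text` (chunk size, overlap, and whether it splits on characters, sentences, or tokens) is configured inside the project and not visible here. A common baseline is fixed-size character windows with overlap, so that a sentence cut at one boundary still appears whole in the neighbouring chunk. The sketch below assumes that strategy; `Chunk` and `chunk_text_fixed` are hypothetical names, with `Chunk` mirroring the `.text` and `.embedding` attributes this notebook reads, and the default sizes are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    # Hypothetical chunk type mirroring the attributes used in this notebook
    text: str
    index: int
    embedding: Optional[list[float]] = None


def chunk_text_fixed(text: str, chunk_size: int = 500, overlap: int = 50) -> list[Chunk]:
    """Split text into chunk_size-character windows, each overlapping the previous one by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for i, start in enumerate(range(0, len(text), step)):
        chunks.append(Chunk(text=text[start:start + chunk_size], index=i))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `chunk_text_fixed("abcdefghij", chunk_size=4, overlap=2)` yields the chunks `"abcd"`, `"cdef"`, `"efgh"`, `"ghij"`. Larger overlap improves recall at boundaries at the cost of storing more redundant text.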

## Generating Embeddings

Let's generate embeddings for the chunks.

In [None]:
# Get the configured embedder
embedder = get_embedder()

# Generate embeddings; embed_chunks attaches an embedding to each chunk in place
embedder.embed_chunks(chunks)

print(f"Embeddings generated for {len(chunks)} chunks")
if chunks:
    print(f"Embedding dimension: {len(chunks[0].embedding)}")
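
The cell above relies on `embed_chunks` writing an embedding onto each chunk rather than returning a new list. The project's embedding model is not shown here; the hypothetical `HashingEmbedder` below is a toy feature-hashing embedder, not a semantic model, used only to illustrate that in-place contract and the L2 normalisation that makes a dot product equal to cosine similarity. The dimension of 64 is an arbitrary assumption.

```python
import hashlib
import math
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    text: str
    embedding: Optional[list[float]] = None


class HashingEmbedder:
    """Toy embedder: hashes lowercase word tokens into a fixed-size count vector.

    Not a semantic model; it only illustrates embedding chunks in place and
    normalising vectors to unit length.
    """

    def __init__(self, dim: int = 64):
        self.dim = dim

    def embed_text(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for token in re.findall(r"\w+", text.lower()):
            # Stable hash: Python's built-in hash() is salted per process
            h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
            vec[h % self.dim] += 1.0
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else vec

    def embed_chunks(self, chunks: list[Chunk]) -> None:
        # Attach an embedding to each chunk in place; nothing is returned
        for chunk in chunks:
            chunk.embedding = self.embed_text(chunk.text)
```

Whatever model the project uses, the vector length printed above must match the dimension of the Milvus collection's vector field.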

## Storing Chunks in Milvus

Now, let's store the chunks in Milvus.

In [None]:
# Use the file name as the document ID; prefer the loader's file_name metadata as the display name
doc_id = os.path.basename(document_path)
doc_name = metadata.get("file_name", doc_id)

# Store chunks
store_chunks(chunks, doc_id, doc_name, embedder)

print(f"Chunks stored in Milvus collection: {config.milvus.collection}")
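
The schema of the Milvus collection used by `store_chunks` is not shown in this notebook. The sketch below is an in-memory stand-in, not the Milvus API, and assumes one row per chunk holding the document ID, display name, chunk index, text, and vector. Because `doc_id` is derived from the file name, re-running the ingestion cells could insert duplicate rows; the hypothetical `InMemoryVectorStore` avoids this by replacing any rows already stored for the same `doc_id`. Whether the project's `store_chunks` behaves this way is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical row layout: one row per chunk
    doc_id: str
    doc_name: str
    chunk_index: int
    text: str
    vector: list[float]


class InMemoryVectorStore:
    """Illustrative stand-in for a vector collection (not the Milvus API)."""

    def __init__(self) -> None:
        self.records: list[Record] = []

    def store(self, doc_id: str, doc_name: str, texts: list[str], vectors: list[list[float]]) -> int:
        if len(texts) != len(vectors):
            raise ValueError("texts and vectors must have the same length")
        # Drop rows previously stored for this doc_id so re-ingestion does not duplicate them
        self.records = [r for r in self.records if r.doc_id != doc_id]
        for i, (text, vec) in enumerate(zip(texts, vectors)):
            self.records.append(Record(doc_id, doc_name, i, text, vec))
        return len(texts)
```

Keying on the base file name also means two different files both named `sample.pdf` would overwrite each other; a content hash or full path is a safer ID if that matters.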

## Searching for Information

Let's search the stored chunks for passages relevant to a question.

In [None]:
# Define a query
query = "What is the main topic of this document?"

# Search for relevant chunks
results = search_documents(query)

print(f"Found {len(results)} relevant chunks")
for i, result in enumerate(results):
    print(f"\nResult {i+1} (Score: {result.score:.4f}):\n{result.text[:200]}...")
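
Conceptually, vector search embeds the query and ranks stored vectors by similarity. Milvus typically does this with an approximate index, and the meaning of the printed score depends on the collection's metric: Milvus supports metrics such as L2, inner product (IP), and cosine, and with L2 distance a smaller score means a closer match. The configured metric is not shown in this notebook. The sketch below is an exact brute-force version assuming cosine similarity, where higher is better; `SearchResult` and `search_top_k` are hypothetical names, with `SearchResult` mirroring the `.text` and `.score` attributes printed above.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class SearchResult:
    text: str
    score: float


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search_top_k(query_vec: list[float], items: list[tuple[str, list[float]]], k: int = 5) -> list[SearchResult]:
    """Exact (brute-force) top-k search by cosine similarity, highest score first."""
    scored = (SearchResult(text, cosine(query_vec, vec)) for text, vec in items)
    return heapq.nlargest(k, scored, key=lambda r: r.score)
```

Brute force is O(n) per query, which is fine for a single document; an approximate index is what keeps search fast once the collection holds many documents.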

## Generating a Response

Next, let's generate a response grounded in the retrieved chunks.

In [None]:
# Generate response
response = generate_response(query, results)

print(f"Query: {query}")
print(f"\nResponse:\n{response}")
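
`generate_response` receives both the query and the retrieved chunks; how it builds its prompt and which model it calls are not shown here. A common pattern is to place the retrieved passages in the prompt as numbered context and instruct the model to answer only from them. The sketch below illustrates that pattern with a hypothetical `build_prompt` helper and a hypothetical `Retrieved` result type; it assumes higher scores mean more relevant (reverse the sort for an L2 metric), and the 4000-character context budget is an arbitrary assumption. The model call itself is omitted.

```python
from dataclasses import dataclass


@dataclass
class Retrieved:
    # Hypothetical stand-in for a search result (the notebook reads .text and .score)
    text: str
    score: float


def build_prompt(query: str, results: list[Retrieved], max_context_chars: int = 4000) -> str:
    """Assemble a grounded prompt from retrieved chunks, best-scoring first.

    Passages are numbered so the model can cite them; once adding a passage
    would exceed max_context_chars, it and all lower-scoring passages are dropped.
    """
    passages = []
    used = 0
    # Assumes a higher score means a more relevant chunk
    ranked = sorted(results, key=lambda r: r.score, reverse=True)
    for i, r in enumerate(ranked, start=1):
        block = f"[{i}] {r.text.strip()}"
        if used + len(block) > max_context_chars:
            break
        passages.append(block)
        used += len(block)
    context = "\n\n".join(passages) if passages else "(no relevant context found)"
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

Telling the model to admit when the context lacks an answer is a simple guard against it filling gaps from its own training data.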

## Try Another Query

Let's run the search and generation steps again with a different query.

In [None]:
# Define another query
query = "What are the key findings or conclusions?"

# Search and generate response
results = search_documents(query)
response = generate_response(query, results)

print(f"Query: {query}")
print(f"\nResponse:\n{response}")
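
Since each new question repeats the same search-then-generate steps, they can be wrapped in a small helper. The `ask` function below is a hypothetical convenience, not part of the project; it takes the search and generation functions as parameters so it can be tried with stubs, and assumes each result exposes `.score` and `.text` as in the cells above.

```python
from typing import Any, Callable, Sequence


def ask(
    query: str,
    search_fn: Callable[[str], Sequence[Any]],
    generate_fn: Callable[[str, Sequence[Any]], str],
    show_sources: bool = True,
) -> str:
    """Retrieve chunks for a query, generate a response, print both, and return the response."""
    results = search_fn(query)
    response = generate_fn(query, results)
    print(f"Query: {query}")
    print(f"\nResponse:\n{response}")
    if show_sources:
        for i, r in enumerate(results, start=1):
            # Assumes each result has .score and .text attributes
            print(f"  [{i}] score={r.score:.4f} {r.text[:80]!r}")
    return response
```

In this notebook it could be called as `ask(query, search_documents, generate_response)`. Printing the sources next to the answer makes it easier to check whether the response is actually supported by the retrieved chunks.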