## Semantic Node Splitter

In [4]:
from rag.document_loader import DocumentLoader

# Create loader
doc_loader = DocumentLoader()

# Ingest all papers (adjust path as needed)
result = doc_loader.ingest_directory(
    directory_path="Research Paper",
    document_type="research_paper",
    tenant_id="staging",
    recursive=True
)

print(f"✓ Ingested {result['documents_processed']} documents")
print(f"✓ Created {result['nodes_processed']} searchable chunks")

2025-12-01 11:46:47.690 | INFO     | rag.vector_store:_create_indexes:85 - Created MongoDB indexes for filtering
2025-12-01 11:46:47.691 | INFO     | rag.vector_store:__init__:65 - Initialized VectorStore: db=swastya, collection=user_context_vectors, index=vector_search_index
2025-12-01 11:46:47.692 | INFO     | rag.embedding_service:__init__:43 - Initialized EmbeddingService with model: text-embedding-3-small
2025-12-01 11:46:47.693 | INFO     | rag.embedding_service:__init__:43 - Initialized EmbeddingService with model: text-embedding-3-small
2025-12-01 11:46:47.694 | INFO     | rag.semantic_chunking:__init__:63 - Initialized SemanticChunker with buffer_size=1, threshold=95
2025-12-01 11:46:47.695 | INFO     |

✓ Ingested 68 documents
✓ Created 232 searchable chunks
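
The log output above reports a `SemanticChunker` initialized with `buffer_size=1, threshold=95`. The internals of `rag.semantic_chunking` are not shown in this notebook, so the following is only a minimal sketch of the percentile-breakpoint technique those parameter names suggest: embed each sentence together with its neighbours, measure the cosine distance between consecutive windows, and start a new chunk wherever the distance exceeds the given percentile of all distances. The names `semantic_split`, `toy_embed`, and the keyword sets are hypothetical and exist only for illustration; a real pipeline would call an embedding model instead of the toy function.

```python
import math

# Hypothetical keyword sets used only by the toy embedding below.
DIET_WORDS = {"fat", "diet"}
SLEEP_WORDS = {"sleep", "insomnia"}


def toy_embed(text):
    """Stand-in for a real embedding model: counts words per topic."""
    words = text.lower().split()
    return [
        float(sum(w in DIET_WORDS for w in words)),
        float(sum(w in SLEEP_WORDS for w in words)),
    ]


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def percentile(values, pct):
    """Percentile with linear interpolation between closest ranks."""
    ordered = sorted(values)
    rank = (pct / 100.0) * (len(ordered) - 1)
    lo = math.floor(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def semantic_split(sentences, embed, buffer_size=1, threshold=95):
    """Group sentences into chunks, splitting at unusually large topic shifts."""
    if len(sentences) < 2:
        return [" ".join(sentences)] if sentences else []
    # Each sentence is embedded together with `buffer_size` neighbours per side.
    windows = [
        " ".join(sentences[max(0, i - buffer_size): i + buffer_size + 1])
        for i in range(len(sentences))
    ]
    vectors = [embed(w) for w in windows]
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    cutoff = percentile(distances, threshold)
    chunks, current = [], [sentences[0]]
    for i, dist in enumerate(distances):
        if dist > cutoff:  # topic shift: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i + 1])
    chunks.append(" ".join(current))
    return chunks


sentences = ["fat diet", "diet fat", "fat fat diet", "sleep insomnia", "insomnia sleep"]
# buffer_size=0 keeps the toy example easy to follow by hand.
for chunk in semantic_split(sentences, toy_embed, buffer_size=0):
    print(repr(chunk))
```

A higher `threshold` produces fewer, larger chunks because fewer distances exceed the cutoff; a larger `buffer_size` smooths the distances by letting adjacent windows share sentences.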


In [5]:
tenant_id = "staging"
user_id = "research_paper:research_paper"
query = "what are the side effects of a high fat diet?"

In [6]:
from rag.retriever import RAGRetriever

# Initialize retriever
retriever = RAGRetriever()

2025-12-01 11:50:48.764 | INFO     | rag.vector_store:_create_indexes:85 - Created MongoDB indexes for filtering
2025-12-01 11:50:48.766 | INFO     | rag.vector_store:__init__:65 - Initialized VectorStore: db=swastya, collection=user_context_vectors, index=vector_search_index
2025-12-01 11:50:48.767 | INFO     | rag.embedding_service:__init__:43 - Initialized EmbeddingService with model: text-embedding-3-small
2025-12-01 11:50:48.770 | INFO     | rag.retriever:__init__:58 - Initialized RAGRetriever with LlamaIndex components


In [7]:
# Use LlamaIndex's query engine to retrieve context and generate an answer
query_engine = retriever.as_query_engine(
    user_id=user_id,
    tenant_id=tenant_id
)
response = query_engine.query(query)
print(response)

2025-12-01 11:50:51,627 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-01 11:50:54,262 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Some reported side effects of a high-fat diet include bad taste in the mouth, constipation, diarrhea, dizziness, halitosis, headache, insomnia, nausea, thirst, and feelings of tiredness, weakness, or fatigue.
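
The two HTTP requests logged above (one to the embeddings endpoint, then one to chat completions) are consistent with the usual retrieve-then-generate flow: the query is embedded, matching chunks are fetched, and the chunks are handed to the chat model as context. How `as_query_engine` assembles its prompt is not shown in this notebook, so the following is only a hedged sketch of that last step. The function `build_prompt`, its `max_chars` budget, and the template wording are assumptions made for illustration.

```python
def build_prompt(query, chunks, max_chars=2000):
    """Assemble a context-grounded prompt from retrieved chunks.

    `chunks` is assumed to be a list of dicts with a "text" key and an
    optional "metadata" dict, mirroring the shape printed later in the
    notebook. Chunks are added in order until the character budget is hit.
    """
    context_parts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        source = chunk.get("metadata", {}).get("filename", "N/A")
        part = f"[{i}] ({source}) {chunk['text']}"
        if used + len(part) > max_chars:
            break  # stop before exceeding the context budget
        context_parts.append(part)
        used += len(part)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


example_chunks = [
    {"text": "Ketosis is common.", "metadata": {"filename": "a.pdf"}},
]
print(build_prompt("what are the side effects of a high fat diet?", example_chunks))
```

Numbering each chunk and including its filename makes it easier to trace a generated answer back to the source document.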


The side effects of a low-fat diet may include potential nutrient deficiencies, particularly in fat-soluble vitamins like Vitamin D, E, A, and K, as well as essential fatty acids. Additionally, some individuals may experience increased hunger and cravings due to reduced satiety from fat intake, leading to potential overeating or difficulty in maintaining weight loss.

The side effects of a high-fat diet can include an elevated risk of coronary heart disease, potential harmful effects on kidney function, and an increased risk of all-cause mortality, particularly when the diet is based on animal protein sources.
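
The `retrieve` call in the next cell passes `user_id`, `tenant_id`, and `top_k`, and the logs mention MongoDB indexes created for filtering. The actual `RAGRetriever` implementation is not shown, so the following is only an in-memory sketch of the general idea: restrict candidates by metadata first, then rank the survivors by cosine similarity to the query vector. The function `retrieve_top_k` and the record layout of `store` are hypothetical; the returned dictionary mirrors the `result["chunks"]` shape the notebook iterates over.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vector, store, user_id, tenant_id, top_k=5):
    """Filter records by tenant and user, then return the top_k most similar."""
    candidates = [
        r for r in store
        if r["tenant_id"] == tenant_id and r["user_id"] == user_id
    ]
    scored = sorted(
        ((cosine_similarity(query_vector, r["embedding"]), r) for r in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return {
        "chunks": [
            {"text": r["text"], "metadata": r["metadata"], "score": score}
            for score, r in scored[:top_k]
        ]
    }


# Hypothetical records; real embeddings would come from an embedding model.
store = [
    {"tenant_id": "staging", "user_id": "u", "embedding": [1.0, 0.0],
     "text": "a", "metadata": {"filename": "a.pdf"}},
    {"tenant_id": "staging", "user_id": "u", "embedding": [0.6, 0.8],
     "text": "b", "metadata": {"filename": "b.pdf"}},
    {"tenant_id": "prod", "user_id": "u", "embedding": [1.0, 0.0],
     "text": "c", "metadata": {"filename": "c.pdf"}},
]
result = retrieve_top_k([1.0, 0.0], store, user_id="u", tenant_id="staging", top_k=2)
for res in result["chunks"]:
    print(res["metadata"].get("filename", "N/A"), res["text"])
```

Filtering before ranking keeps one tenant's documents out of another tenant's results regardless of how similar their embeddings are.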

In [16]:
result = retriever.retrieve(
    query=query,
    user_id=user_id,
    tenant_id=tenant_id,
    top_k=5
)

for res in result["chunks"]:
    print("---------------------------------------- Chunk Start ------------------------------------------")
    print("Source Document: ", res["metadata"].get("filename", "N/A"))
    print("------------------------------------------------------------------------------------------\n")
    print("Text: ", res["text"])
    print("---------------------------------------- Chunk End ------------------------------------------")
    print("\n\n")

2025-12-01 11:59:33.323 | INFO     | rag.retriever:retrieve:84 - Retrieving context for query: 'what are the side effects of a high fat diet?...'
2025-12-01 11:59:33,844 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-01 11:59:34.110 | INFO     | rag.retriever:retrieve:165 - Retrieved 5 chunks for user research_paper:research_paper


---------------------------------------- Chunk Start ------------------------------------------
Source Document:  Popular-Diets-A-Scientific-Review.pdf
------------------------------------------------------------------------------------------

Text:  A number of different metabolic effects have been re-
ported for high-fat, low-CHO diets. The most common is
ketosis, as measured by increased urinary ketones
(24,57,58,60,63,69,79). Ketogenic diets usually have less
than 20% calories from CHOs (80). Because many of these
are also low calorie, average CHO intake is 50 to 100 g/d.
All popular low-CHO diets recommend <100 g of CHO per
day. Ketogenic diets may cause a significant increase in
blood uric acid concentration (57,60,63,67,78).
Other metabolic effects range from decreased blood
glucose and insulin levels, to altered blood lipid levels
(Table 10). Many of these effects (e.g., decreased LDL
and HDL cholesterol) may be the consequence of weight
loss, rather than diet composition, espec