# Medical QA System: Experimentation and Evaluation Notebook

Use this notebook to develop, test, and evaluate the components of the Medical QA system.

## 1. Setup and Configuration

- Load environment variables.
- Import necessary libraries and project modules.
- Configure paths and model names.

In [None]:
# %load_ext autoreload
# %autoreload 2

import os
from dotenv import load_dotenv
import pandas as pd

# Load environment variables from .env file
load_dotenv(override=True) # Use override=True if you need to reload during notebook execution

# Add project root to Python path (if modules are not found)
# import sys
# sys.path.append(os.path.abspath('..')) # Adjust based on notebook location relative to project root

# Import your project modules
# from document_loader import DocumentLoader
# from vector_store import VectorStoreManager
# from knowledge_graph import KnowledgeGraphManager
# from agents import SimpleVectorStoreAgent, KnowledgeGraphAgent # ... and other agents
# from mcp import MasterControlProgram
# from fallback import FallbackHandler

print(f"DATA_PATH: {os.getenv('DATA_PATH')}")
print(f"PERSIST_DIRECTORY (Vector Store): {os.getenv('PERSIST_DIRECTORY')}")
print(f"KG_FILE_PATH: {os.getenv('KG_FILE_PATH')}")

## 2. Document Loading and Preprocessing

- Initialize `DocumentLoader`.
- Load documents from the `DATA_PATH`.
- Experiment with chunking strategies and text splitting parameters.
- Inspect loaded and split documents.
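
To make the effect of chunk size and overlap concrete, here is a minimal character-based sketch. `split_with_overlap` is a hypothetical helper written for illustration; it is not part of `DocumentLoader`, whose actual splitter may work on tokens or separators instead of raw characters.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Compare two settings on the same toy input
sample = "abcdefghijklmnopqrstuvwxyz"
for size, overlap in [(10, 0), (10, 3)]:
    chunks = split_with_overlap(sample, size, overlap)
    print(f"chunk_size={size}, chunk_overlap={overlap}: {len(chunks)} chunks -> {chunks}")
```

A larger overlap produces more chunks and repeats text across neighbours, which tends to help retrieval when an answer straddles a chunk boundary but increases index size.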

In [None]:
# DATA_DIR = os.getenv("DATA_PATH", "./data") # Default to ./data if not set
# doc_loader = DocumentLoader(data_path=DATA_DIR, chunk_size=1000, chunk_overlap=150)

# # Load and split documents
# processed_documents = doc_loader.load_and_split_documents()

# if processed_documents:
#     print(f"Loaded and split {len(processed_documents)} document chunks.")
#     print("Example chunk metadata:", processed_documents[0].metadata)
#     print("Example chunk content snippet:", processed_documents[0].page_content[:200])
# else:
#     print("No documents processed. Check DATA_PATH.")

## 3. Vector Store Management

- Initialize `VectorStoreManager`.
- Create or load a vector store using the processed documents.
- Test similarity search with sample queries.
- Evaluate different embedding models (if applicable).
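
As a rough picture of what a top-k similarity search does, the sketch below ranks a few toy sentences by cosine similarity. Everything in it is an assumption for illustration: `embed` is a bag-of-words stand-in for a real embedding model, and `top_k` is a hypothetical helper, not the `VectorStoreManager.similarity_search` method.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, documents: list[str], k: int = 3) -> list[tuple[str, float]]:
    """Return the k documents most similar to the query, best first."""
    query_vec = embed(query)
    scored = [(doc, cosine_similarity(query_vec, embed(doc))) for doc in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy corpus, made up for this example
docs = [
    "insulin and metformin are treatments for diabetes",
    "aspirin may relieve a headache",
    "regular exercise supports heart health",
]
for doc, score in top_k("treatments for diabetes", docs, k=2):
    print(f"{score:.4f}  {doc}")
```

Here a higher score means a closer match. Some vector stores report a distance instead, where lower means closer, so it is worth checking which convention `similarity_search` follows before comparing scores across embedding models.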

In [None]:
# EMBEDDING_MODEL_NAME = os.getenv("MODEL_NAME", "sentence-transformers/all-MiniLM-L6-v2")
# VECTOR_DB_PATH = os.getenv("PERSIST_DIRECTORY", "./vector_store_db")

# vs_manager = VectorStoreManager(persist_directory=VECTOR_DB_PATH, embedding_model_name=EMBEDDING_MODEL_NAME)

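# # Create/load the store. force_recreate=True rebuilds it (initial setup, or after changing
# # documents or the embedding model); set it to False to reuse the persisted store.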
# # Ensure 'processed_documents' is available from the previous step
# if 'processed_documents' in locals() and processed_documents:
#     vector_db = vs_manager.create_or_load_store(documents=processed_documents, force_recreate=True)
#     if vector_db:
#         print("Vector store created/loaded successfully.")
#     else:
#         print("Failed to create/load vector store.")
# else:
#     print("Skipping vector store creation as no documents were processed.")

# # Test search
# if vs_manager.vector_store:
#     sample_query = "What are the treatments for diabetes?"
#     search_results = vs_manager.similarity_search(query=sample_query, k=3)
#     print(f"\nSearch results for '{sample_query}':")
#     for doc, score in search_results:
#         print(f"  Source: {doc.metadata.get('source', 'N/A')}, Score: {score:.4f}, Content: {doc.page_content[:100]}...")
# else:
#     print("Vector store not available for search test.")

## 4. Knowledge Graph Construction and Querying

- Initialize `KnowledgeGraphManager`.
- Develop methods or scripts that extract entities and relationships from documents. This is a major sub-task and will likely need NLP tooling or an LLM.
- Add extracted triplets to the KG.
- Test KG querying with sample questions/entities.
- Visualize the KG.
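
The add-then-query flow can be sketched with plain dictionaries. `TinyTripleStore` below is a hypothetical, standard-library-only stand-in and says nothing about how `KnowledgeGraphManager` is implemented; it only assumes triplets shaped as `(subject, subject_type, relation, object, object_type)`.

```python
from collections import defaultdict


class TinyTripleStore:
    """Stores (subject, subject_type, relation, object, object_type) triplets in plain dicts."""

    def __init__(self) -> None:
        self.node_types: dict[str, str] = {}
        # edges[subject][relation] -> list of objects, in insertion order
        self.edges: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))

    def add_triplets(self, triplets) -> None:
        for subj, subj_type, relation, obj, obj_type in triplets:
            self.node_types[subj] = subj_type
            self.node_types[obj] = obj_type
            if obj not in self.edges[subj][relation]:  # skip duplicate edges
                self.edges[subj][relation].append(obj)

    def query(self, start_node: str, relationship: str) -> list[str]:
        # .get() does not trigger the defaultdict factory, so unknown nodes stay absent
        return list(self.edges.get(start_node, {}).get(relationship, []))


store = TinyTripleStore()
store.add_triplets([
    ("Aspirin", "Drug", "treats", "Headache", "Symptom"),
    ("Aspirin", "Drug", "may_cause", "Bleeding", "SideEffect"),
])
print(store.query("Aspirin", "treats"))
print(store.query("Ibuprofen", "treats"))  # unknown node: empty result, no error
```

Returning an empty list for unknown nodes, rather than raising, keeps downstream agents simple: they can treat "no result" as low confidence.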

In [None]:
# KG_PATH = os.getenv("KG_FILE_PATH", "./data/medical_kg.gml")
# kg_manager = KnowledgeGraphManager(kg_file_path=KG_PATH)

# # Example: Manually adding some triplets
# # In a real scenario, these would be extracted from documents or other sources
# example_triplets = [
#     ("Aspirin", "Drug", "treats", "Headache", "Symptom"),
#     ("Aspirin", "Drug", "may_cause", "Bleeding", "SideEffect"),
#     ("Diabetes", "Condition", "associated_with", "HighBloodSugar", "Finding")
# ]
# kg_manager.add_triplets(example_triplets)
# kg_manager.save_graph()
# print(f"KG has {kg_manager.graph.number_of_nodes()} nodes and {kg_manager.graph.number_of_edges()} edges.")

# # Test query
# if kg_manager.graph.number_of_nodes() > 0:
#     query_results = kg_manager.query_graph(start_node="Aspirin", relationship="treats")
#     print(f"\nQuery: What does Aspirin treat? Result: {query_results}")
# else:
#     print("KG is empty, skipping query test.")

# # Visualize (requires matplotlib and a display environment or saves to file)
# # kg_manager.visualize_graph(output_file='./data/kg_notebook_visualization.png')
# # print("KG visualization saved if graph was not empty.")

## 5. Agent Initialization and Testing

- Initialize individual agents (e.g., `SimpleVectorStoreAgent`, `KnowledgeGraphAgent`).
- This may involve loading LLMs; API keys are read from `.env`.
- Test each agent with sample questions.
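
Testing agents one by one is easier when they share a response shape. The sketch below assumes each agent exposes `query(question)` and returns a dict with `answer`, `agent_name`, and `confidence`, the keys the MCP cell in this notebook reads. The `Agent` protocol, `KeywordAgent`, and `smoke_test` are hypothetical names for illustration; the project's real agents may return something different.

```python
from typing import Protocol


class Agent(Protocol):
    """Assumed common interface: a name plus a query() method returning a response dict."""
    name: str

    def query(self, question: str) -> dict: ...


class KeywordAgent:
    """Toy agent: answers from a fixed keyword -> answer table."""

    def __init__(self, name: str, answers: dict[str, str]) -> None:
        self.name = name
        self.answers = answers

    def query(self, question: str) -> dict:
        lowered = question.lower()
        for keyword, answer in self.answers.items():
            if keyword in lowered:
                return {"answer": answer, "agent_name": self.name, "confidence": 1.0}
        # No keyword matched: report zero confidence instead of guessing
        return {"answer": None, "agent_name": self.name, "confidence": 0.0}


def smoke_test(agent: Agent, questions: list[str]) -> list[dict]:
    """Run each question through the agent and collect the raw responses."""
    return [agent.query(q) for q in questions]


toy_agent = KeywordAgent("ToyAgent", {"aspirin": "Aspirin is a pain reliever."})
for response in smoke_test(toy_agent, ["Tell me about Aspirin.", "What is the capital of France?"]):
    print(response)
```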

In [None]:
# # Ensure VectorStoreManager (vs_manager) and KnowledgeGraphManager (kg_manager) are initialized
# 
# # Initialize LLM (example, replace with your actual LLM setup)
# # from langchain.llms import HuggingFaceHub # or other LLM providers (newer LangChain releases expose this under langchain_community.llms)
# # from langchain.chains import RetrievalQA
# # HUGGINGFACEHUB_API_TOKEN = os.getenv("HUGGINGFACEHUB_API_TOKEN")
# # if HUGGINGFACEHUB_API_TOKEN:
# #     llm = HuggingFaceHub(repo_id="google/flan-t5-small", model_kwargs={"temperature":0.7, "max_length":256}, huggingfacehub_api_token=HUGGINGFACEHUB_API_TOKEN)
# # else:
# #     llm = None
# #     print("Warning: HUGGINGFACEHUB_API_TOKEN not set. LLM-based agents might not work fully.")

# # Vector Store Agent
# if 'vs_manager' in locals() and vs_manager.vector_store:
#     # If using RetrievalQA chain, it needs a retriever from the vector store
#     # retriever = vs_manager.vector_store.as_retriever()
#     # qa_chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever) if llm else None
#     # vs_agent = SimpleVectorStoreAgent(vector_store_manager=vs_manager, llm_pipeline=qa_chain)
#     vs_agent = SimpleVectorStoreAgent(vector_store_manager=vs_manager) # Without LLM for now
#     print("SimpleVectorStoreAgent initialized.")
#     vs_query = "What are common side effects of Aspirin?"
#     vs_response = vs_agent.query(vs_query)
#     print(f"\nVectorStoreAgent response for '{vs_query}':\n{vs_response}")
# else:
#     print("Skipping SimpleVectorStoreAgent initialization as vector store is not ready.")

# # Knowledge Graph Agent
# if 'kg_manager' in locals() and kg_manager.graph.number_of_nodes() > 0:
#     kg_agent = KnowledgeGraphAgent(kg_manager=kg_manager)
#     print("KnowledgeGraphAgent initialized.")
#     kg_query = "Aspirin"
#     kg_response = kg_agent.query(kg_query)
#     print(f"\nKnowledgeGraphAgent response for '{kg_query}':\n{kg_response}")
# else:
#     print("Skipping KnowledgeGraphAgent initialization as KG is not ready or empty.")

## 6. Master Control Program (MCP) Testing

- Initialize `FallbackHandler`.
- Initialize `MasterControlProgram` with the configured agents and fallback handler.
- Test the end-to-end QA pipeline with a variety of questions.
- Evaluate how the MCP selects answers or uses fallback.
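
One plausible selection rule, shown here only as a sketch, is to take the highest-confidence agent response and fall back when it does not clear the threshold. `select_answer` is a hypothetical function; the actual logic inside `MasterControlProgram` and `FallbackHandler` may differ.

```python
def select_answer(responses: list[dict], confidence_threshold: float, fallback_answer: str) -> dict:
    """Return the highest-confidence response, or a fallback if none clears the threshold."""
    best = max(responses, key=lambda r: r.get("confidence", 0.0), default=None)
    if best is None or best.get("confidence", 0.0) < confidence_threshold:
        return {"answer": fallback_answer, "agent_name": "Fallback", "confidence": 0.0}
    return best


# Toy responses, made up for this example
candidates = [
    {"answer": "Aspirin treats headache.", "agent_name": "KnowledgeGraphAgent", "confidence": 0.8},
    {"answer": "See retrieved passage.", "agent_name": "SimpleVectorStoreAgent", "confidence": 0.5},
]
print(select_answer(candidates, confidence_threshold=0.6, fallback_answer="I don't know."))
print(select_answer(candidates, confidence_threshold=0.9, fallback_answer="I don't know."))
print(select_answer([], confidence_threshold=0.6, fallback_answer="I don't know."))
```

This rule assumes confidences from different agents are comparable. If they are not calibrated against each other, a per-agent threshold or a normalization step would be needed.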

In [None]:
# agents_list = []
# if 'vs_agent' in locals():
#     agents_list.append(vs_agent)
# if 'kg_agent' in locals():
#     agents_list.append(kg_agent)

# if agents_list:
#     fallback_h = FallbackHandler()
#     mcp = MasterControlProgram(agents=agents_list, fallback_handler=fallback_h, confidence_threshold=0.6)
#     print("MasterControlProgram initialized.")

#     test_questions = [
#         "What are treatments for headaches?", # Likely VS Agent
#         "Tell me about Aspirin.",             # Could be KG or VS
#         "What is the capital of France?"      # Likely Fallback
#     ]

#     for q in test_questions:
#         print(f"\n--- MCP Query: {q} ---")
#         final_answer = mcp.handle_question(q)
#         print(f"MCP Final Answer: {final_answer.get('answer')}")
#         print(f"Chosen Agent: {final_answer.get('agent_name')}, Confidence: {final_answer.get('confidence')}")
# else:
#     print("Skipping MCP initialization as no agents are ready.")

## 7. Evaluation

- Define a set of test questions with known answers/expected outcomes.
- Run these questions through the MCP.
- Calculate metrics (e.g., accuracy, F1 score for retrieval, answer relevance, KG query success rate).
- Analyze failures and identify areas for improvement.
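
Two of the simpler metrics can be computed directly from result rows. The sketch below assumes rows with `answer`, `agent`, `expected_agent`, and `expected_keywords` keys; `keyword_recall` and `summarize` are hypothetical helpers, and the sample rows are invented for illustration, not real system output.

```python
from typing import Optional


def keyword_recall(answer: Optional[str], expected_keywords: list[str]) -> Optional[float]:
    """Fraction of expected keywords found in the answer (case-insensitive); None if no keywords."""
    if not expected_keywords:
        return None
    text = (answer or "").lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)


def summarize(results: list[dict]) -> dict:
    """Compute agent-routing accuracy and mean keyword recall over the rows that define them."""
    routed = [r for r in results if r.get("expected_agent")]
    routing_accuracy = (
        sum(r.get("agent") == r["expected_agent"] for r in routed) / len(routed) if routed else None
    )
    recalls = [keyword_recall(r.get("answer"), r.get("expected_keywords", [])) for r in results]
    recalls = [x for x in recalls if x is not None]
    mean_recall = sum(recalls) / len(recalls) if recalls else None
    return {"routing_accuracy": routing_accuracy, "mean_keyword_recall": mean_recall}


# Invented rows for illustration only
sample_results = [
    {"answer": "May cause nausea.", "agent": "SimpleVectorStoreAgent",
     "expected_agent": "SimpleVectorStoreAgent", "expected_keywords": ["nausea", "dizziness"]},
    {"answer": "Pain", "agent": "SimpleVectorStoreAgent",
     "expected_agent": "KnowledgeGraphAgent", "expected_keywords": []},
]
print(summarize(sample_results))
```

Substring matching is a coarse proxy for answer quality: it misses synonyms and rewards answers that merely mention a keyword, so it is best read alongside a manual review of failures.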

In [None]:
# evaluation_dataset = [
#     {"question": "What are the side effects of Ibuprofen?", "expected_keywords": ["nausea", "dizziness"], "expected_agent": "SimpleVectorStoreAgent"},
#     {"question": "What does Paracetamol treat?", "expected_answer_fragment": "Pain", "expected_agent": "KnowledgeGraphAgent"},
#     # Add more test cases
# ]

# results = []
# if 'mcp' in locals():
#     for item in evaluation_dataset:
#         response = mcp.handle_question(item['question'])
#         results.append({
#             "question": item['question'],
#             "answer": response.get('answer'),
#             "agent": response.get('agent_name'),
#             "confidence": response.get('confidence'),
#             "expected_agent": item.get('expected_agent'),
#             "expected_keywords": item.get('expected_keywords', []),
#             "expected_answer_fragment": item.get('expected_answer_fragment')  # used by some test cases instead of keywords
#         })
#     eval_df = pd.DataFrame(results)
#     print("\nEvaluation Results:")
#     # display(eval_df) # Use display in a Jupyter environment
#     print(eval_df)
# else:
#     print("MCP not initialized. Skipping evaluation.")

# # Further analysis:
# # - Compare 'agent' with 'expected_agent'
# # - Check if 'answer' contains 'expected_keywords'
# # - Analyze low confidence answers

## 8. Further Experiments

- **Hybrid Agents**: Develop agents that combine vector search with KG lookups.
- **Multi-turn Conversations**: Extend agents and MCP to handle follow-up questions and maintain context.
- **LLM Fine-tuning**: Experiment with fine-tuning smaller LLMs on domain-specific medical text for better understanding and generation.
- **Advanced KG Extraction**: Use more sophisticated NLP techniques for building the KG from text.
- **User Feedback Loop**: Design mechanisms to incorporate user feedback for improving answers.