# 🏥 RAG for Healthcare Document QA

This notebook demonstrates how to build a retrieval-augmented generation (RAG) pipeline that answers questions about healthcare documents such as insurance claims or patient summaries.

## 📦 Install Required Libraries

In [None]:
!pip install langchain openai faiss-cpu sentence-transformers pypdf python-dotenv

## 🔑 Environment Setup

In [None]:
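import os
from dotenv import load_dotenv

# Read variables from a local .env file into the process environment.
load_dotenv()

# Fail early with a clear message instead of a TypeError when the key is missing.
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file or environment.")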

## 📄 Load and Split PDF

In [None]:
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

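# Load the PDF; PyPDFLoader returns one Document per page.
loader = PyPDFLoader("data/health_insurance_claim_detailed.pdf")
documents = loader.load()

# Split pages into chunks of up to 500 characters, with up to 100 characters of overlap.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = text_splitter.split_documents(documents)
print(f"Loaded {len(documents)} pages and split them into {len(chunks)} chunks.")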
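To build intuition for `chunk_size` and `chunk_overlap`, here is a minimal sliding-window sketch in plain Python. It is a simplification, not the algorithm `RecursiveCharacterTextSplitter` uses (which prefers to break on separators such as paragraph and line boundaries), and the function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
print(demo_chunks)
```

Each window repeats the last four characters of the previous one, so a fact that straddles a boundary still appears whole in at least one chunk.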

## 🔍 Embed and Store with FAISS

In [None]:
from langchain.vectorstores import FAISS
from langchain.embeddings import SentenceTransformerEmbeddings

embedding_model = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
db = FAISS.from_documents(chunks, embedding_model)
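Conceptually, the vector store answers a query by finding the stored vectors closest to the query vector. The sketch below does this by brute force with Euclidean distance on hand-made 3-dimensional vectors. The vectors, the chunk labels, and the `nearest_chunks` helper are illustrative assumptions; they are not real `all-MiniLM-L6-v2` embeddings (which have 384 dimensions) and not the FAISS API.

```python
import math


def nearest_chunks(query_vec, indexed, k=2):
    """Return the k (distance, text) pairs closest to query_vec by Euclidean distance."""
    scored = [(math.dist(query_vec, vec), text) for text, vec in indexed]
    return sorted(scored)[:k]


# Toy "embeddings": made-up numbers chosen only to illustrate the ranking.
toy_index = [
    ("Claim amount: see invoice", (0.9, 0.1, 0.0)),
    ("Diagnosis section", (0.1, 0.9, 0.0)),
    ("Policyholder details", (0.0, 0.1, 0.9)),
]
toy_query = (0.8, 0.2, 0.0)  # pretend this encodes a question about the claim amount

top = nearest_chunks(toy_query, toy_index, k=2)
for distance, text in top:
    print(f"{distance:.3f}  {text}")
```

A real index does the same ranking over every chunk embedding, typically with optimized data structures rather than a Python loop.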

## 🧠 Set Up Retriever and LLM

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

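# Wrap the FAISS index as a retriever that returns the most similar chunks for a query.
retriever = db.as_retriever()
llm = ChatOpenAI(model_name="gpt-3.5-turbo")
# Combine retrieval and generation: retrieved chunks are passed to the LLM as context.
rag_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)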
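`RetrievalQA.from_chain_type` defaults to the "stuff" chain type, which places all retrieved chunks into a single prompt alongside the question. A rough sketch of that idea follows; the template wording and the `build_stuff_prompt` helper are hypothetical and do not reproduce LangChain's actual prompt.

```python
def build_stuff_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Concatenate retrieved chunks into one context block followed by the question."""
    context = "\n\n".join(chunk.strip() for chunk in retrieved_chunks)
    return (
        "Use the context below to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder strings standing in for retriever output.
example_prompt = build_stuff_prompt(
    "Who is the policyholder?",
    ["<retrieved chunk 1>", "<retrieved chunk 2>"],
)
print(example_prompt)
```

Because every retrieved chunk goes into one prompt, the number of chunks returned and the chunk size together determine how much of the model's context window each question consumes.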

## ❓ Ask Questions

In [None]:
query = "What amount was claimed by the patient?"
response = rag_chain.run(query)
print(response)

## ✅ Try More Questions

In [None]:
questions = [
    "Who is the policyholder?",
    "What is the diagnosis?",
    "What treatment was administered?",
    "When was the claim submitted?"
]

for q in questions:
    print(f"\nQ: {q}\nA: {rag_chain.run(q)}")

## 📌 Summary
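- Loaded a healthcare PDF and split it into overlapping chunks
- Embedded the chunks with SentenceTransformer (`all-MiniLM-L6-v2`)
- Indexed the embeddings in FAISS and retrieved relevant context per question
- Answered questions with GPT-3.5 through a `RetrievalQA` chain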

Ready to customize for other domains or LLMs!