# Medical RAG System with Qdrant Vector Database

This notebook implements a medical Retrieval-Augmented Generation (RAG) system that:
- Processes medical textbooks (PDF) into searchable chunks
- Uses the Qdrant vector database for efficient document retrieval
- Employs biomedical embeddings (S-PubMedBert)
- Implements query decomposition for complex medical questions
- Generates answers grounded in the retrieved passages using the Zephyr-7B model

## System Architecture
1. **Document Processing**: PDF → Chunks with metadata
2. **Vector Storage**: Qdrant with medical embeddings
3. **Query Processing**: FLAN-T5 for question decomposition
4. **Retrieval**: Semantic search + BGE reranking
5. **Generation**: Zephyr-7B for medical answer synthesis

## 📦 Installation and Dependencies

Install all required packages for the medical RAG system. This includes:
- LangChain for document processing
- Qdrant for vector database
- Transformers for language models
- Sentence-transformers for embeddings
- PyMuPDF for PDF processing

In [None]:
!pip install langchain qdrant-client sentence-transformers transformers pymupdf accelerate

## Additional Community Package

Install the LangChain community package for additional integrations and document loaders.

In [None]:
!pip install -U langchain-community

## 🔧 Core Imports and Setup

Import all necessary libraries for the medical RAG system:
- PyTorch for deep learning operations
- Transformers for language models
- LangChain for document processing
- Qdrant for vector database operations

In [None]:
import torch
import re
import numpy as np
from typing import List, Dict
from transformers import (
    AutoTokenizer, AutoModelForCausalLM,
    pipeline, AutoModelForSeq2SeqLM
)
from sentence_transformers import SentenceTransformer, CrossEncoder
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Qdrant
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams
from langchain_community.document_loaders import PyMuPDFLoader

## 📚 Chapter Mapping Configuration

Define the Table of Contents (TOC) mapping for the medical textbook.
This allows us to automatically tag each document chunk with its corresponding chapter,
enabling better organization and filtering during retrieval.

In [None]:
# TOC-based chapter mapping
CHAPTER_MAP = {
    (1, 9): "Introduction and Overview of the Abdomen",
    (10, 23): "Osteology of the Abdomen",
    (24, 45): "Anterior Abdominal Wall",
    (46, 58): "Inguinal Region/Groin",
    (59, 73): "Male External Genital Organs",
    (74, 92): "Abdominal Cavity and Peritoneum",
    (93, 108): "Abdominal Part of Esophagus, Stomach, and Spleen",
    (109, 125): "Liver and Extrahepatic Biliary Apparatus",
    (126, 143): "Duodenum, Pancreas, and Portal Vein",
    (144, 164): "Small and Large Intestines",
    (165, 184): "Kidneys, Ureters, and Suprarenal Glands",
    (185, 201): "Posterior Abdominal Wall and Associated Structures",
    (202, 211): "Pelvis",
    (212, 224): "Pelvic Walls and Associated Soft Tissue Structures",
    (225, 237): "Perineum",
    (238, 250): "Urinary Bladder and Urethra",
    (251, 259): "Male Genital Organs",
    (260, 278): "Female Genital Organs",
    (279, 290): "Rectum and Anal Canal",
    (291, 298): "Introduction to the Lower Limb",
    (299, 327): "Bones of the Lower Limb",
    (328, 343): "Front of the Thigh",
    (344, 352): "Medial Side of the Thigh",
    (353, 363): "Gluteal Region",
    (364, 376): "Back of the Thigh and Popliteal Fossa",
    (377, 385): "Hip Joint",
    (386, 399): "Front of the Leg and Dorsum of the Foot",
    (400, 406): "Lateral and Medial Sides of the Leg",
    (407, 419): "Back of the Leg",
    (420, 431): "Sole of the Foot",
    (432, 438): "Arches of the Foot",
    (439, 457): "Joints of the Lower Limb",
    (458, 466): "Venous and Lymphatic Drainage of the Lower Limb",
    (467, 478): "Innervation of the Lower Limb"
}

def get_chapter_by_page(page_number):
    for (start, end), title in CHAPTER_MAP.items():
        if start <= page_number <= end:
            return title
    return "Unknown Chapter"
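`get_chapter_by_page` scans every range on each call, which is fine for 34 chapters. If the map grew much larger, a binary search over the sorted start pages would do the same lookup in logarithmic time. The sketch below is an optional alternative that assumes the ranges never overlap; `TOC` and `chapter_for_page` are hypothetical names, and `TOC` is only a small excerpt of `CHAPTER_MAP`.

```python
import bisect

# Hypothetical excerpt of the TOC map; the notebook itself uses the full CHAPTER_MAP
TOC = {
    (1, 9): "Introduction and Overview of the Abdomen",
    (10, 23): "Osteology of the Abdomen",
    (24, 45): "Anterior Abdominal Wall",
    (46, 58): "Inguinal Region/Groin",
}

# Sort ranges by start page once, so each lookup is a binary search instead of a linear scan
_ranges = sorted(TOC.items())
_starts = [start for (start, _end), _title in _ranges]

def chapter_for_page(page_number: int) -> str:
    """Return the chapter whose (start, end) range contains page_number."""
    # Index of the last range whose start page is <= page_number
    idx = bisect.bisect_right(_starts, page_number) - 1
    if idx >= 0:
        (start, end), title = _ranges[idx]
        if start <= page_number <= end:
            return title
    return "Unknown Chapter"

print(chapter_for_page(50))   # Inguinal Region/Groin
print(chapter_for_page(100))  # Unknown Chapter (outside this excerpt)
```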

## 🏷️ Content Label Detection

Implement keyword-based labeling to categorize medical text chunks.
This function searches the lower-cased text for fixed cue phrases (plain substring matches) and assigns labels such as:
- @definition: Text containing definitions
- @symptoms: Text describing symptoms
- @diagnosis: Diagnostic information
- @treatment: Treatment procedures
- @anatomy_structure: Anatomical descriptions (relations, borders)
- @supply: Blood supply and innervation
- @general: Fallback when no cue phrase matches

In [None]:
def detect_labels(text):
    labels = []
    t = text.lower()
    if "is defined as" in t or "refers to" in t or "means" in t:
        labels.append("@definition")
    if "symptoms include" in t or "signs are" in t or "manifestations" in t:
        labels.append("@symptoms")
    if "diagnosis is based on" in t or "diagnosed by" in t or "investigations include" in t:
        labels.append("@diagnosis")
    if "treatment includes" in t or "managed by" in t or "therapy" in t:
        labels.append("@treatment")
    if "relations include" in t or "borders are" in t:
        labels.append("@anatomy_structure")
    if "supplied by" in t or "innervated by" in t:
        labels.append("@supply")
    return labels or ["@general"]
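Because `detect_labels` uses plain substring checks, a cue can fire inside a longer word (for example, "designs are" contains "signs are"). A table-driven variant with word boundaries is sketched below; `LABEL_CUES`, `LABEL_PATTERNS` and `detect_labels_regex` are hypothetical names, and the cue phrases are copied from `detect_labels`. The trade-off is that `\b` also stops "therapy" from matching inside "chemotherapy", so cue lists may need extra word forms.

```python
import re

# Cue phrases per label (same phrases as detect_labels); a table makes new labels a one-line change
LABEL_CUES = {
    "@definition": ["is defined as", "refers to", "means"],
    "@symptoms": ["symptoms include", "signs are", "manifestations"],
    "@diagnosis": ["diagnosis is based on", "diagnosed by", "investigations include"],
    "@treatment": ["treatment includes", "managed by", "therapy"],
    "@anatomy_structure": ["relations include", "borders are"],
    "@supply": ["supplied by", "innervated by"],
}

# One precompiled, case-insensitive pattern per label; \b rejects partial-word hits
LABEL_PATTERNS = {
    label: re.compile(r"\b(?:" + "|".join(map(re.escape, cues)) + r")\b", re.IGNORECASE)
    for label, cues in LABEL_CUES.items()
}

def detect_labels_regex(text: str) -> list[str]:
    """Return every label whose cue phrase appears as whole words, else ['@general']."""
    labels = [label for label, pattern in LABEL_PATTERNS.items() if pattern.search(text)]
    return labels or ["@general"]

print(detect_labels_regex("The designs are simple."))  # ['@general']
print(detect_labels_regex("Symptoms include pain; it is managed by surgery."))  # ['@symptoms', '@treatment']
```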

## 📄 PDF Processing and Chunking

Load medical PDFs and split them into chunks with rich metadata.
Each chunk includes:
- Chapter information based on page number
- Content labels for categorization
- Unique chunk ID for tracking
- Book metadata for source attribution

Uses RecursiveCharacterTextSplitter with chunks of at most 250 characters and a 50-character overlap between neighboring chunks. Small chunks keep each retrieved passage focused, and the overlap reduces the chance that a sentence is cut off at a chunk boundary.

In [None]:
def load_and_chunk_pdf(pdf_path):
    loader = PyMuPDFLoader(pdf_path)
    pages = loader.load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=50)
    chunked_docs = []
    for i, page in enumerate(pages):
        page_number = i + 1
        chapter = get_chapter_by_page(page_number)
        chunks = splitter.split_documents([page])
        for j, chunk in enumerate(chunks):
            chunk.metadata = {
                "chapter": chapter,
                "page_number": page_number,
                "labels": detect_labels(chunk.page_content),
                "book_name": pdf_path.split("/")[-1],
                "chunk_id": f"{chapter[:20].replace(' ', '_')}-{page_number}-{j}"
            }
            chunked_docs.append(chunk)
    return chunked_docs
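RecursiveCharacterTextSplitter first tries to split on paragraph breaks, then newlines, then spaces, and only then on single characters, merging pieces up to `chunk_size`. The simplified sketch below ignores separators and uses a plain sliding window, only to show what `chunk_size=250` and `chunk_overlap=50` mean numerically; `sliding_window_chunks` is a hypothetical helper, not LangChain's algorithm.

```python
def sliding_window_chunks(text: str, chunk_size: int = 250, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # with the defaults, each window starts 200 characters after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

sample_text = "".join(chr(65 + i % 26) for i in range(600))  # 600 characters of A..Z
print([len(c) for c in sliding_window_chunks(sample_text)])  # [250, 250, 200]
```

Each chunk after the first repeats the last 50 characters of its predecessor, which is the overlap the real splitter also aims for (its boundaries additionally snap to separators).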

## 🧬 Medical Embeddings Setup

Initialize the medical-specific embedding model.
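Using S-PubMedBert-MS-MARCO: a PubMedBERT model (pretrained on biomedical text) that was fine-tuned on the MS MARCO passage-ranking dataset for retrieval.
Its biomedical pretraining is expected to handle medical terminology better than general-purpose embeddings. It produces 768-dimensional vectors.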

In [None]:
from langchain_community.embeddings import SentenceTransformerEmbeddings

In [None]:
embedding_model = SentenceTransformerEmbeddings(
    model_name="pritamdeka/S-PubMedBert-MS-MARCO"
)

## 📚 Document Loading and Processing

Load the medical textbook PDF and create document chunks.
Update the pdf_path variable to point to your specific medical textbook file.
The system will automatically process all pages and create searchable chunks with metadata.

In [None]:
# Example PDF path (adjust to your file)
pdf_path = "/kaggle/input/vishramsingh/Vishram Singh Textbook of Anatomy Vol 2.pdf"

# ✅ Load and chunk the PDF
chunks = load_and_chunk_pdf(pdf_path)
print(f"[✅] Loaded and chunked: {len(chunks)} chunks")

## 🗄️ Qdrant Vector Database Setup

Initialize Qdrant client and create a collection for medical documents.
Qdrant provides:
- Efficient vector search ranked by cosine similarity
- Rich metadata filtering capabilities
- Scalable performance for large document collections
- Persistent storage (when connecting to a server via host/port, or to a local `path`, instead of `:memory:`)

In [None]:
# Initialize Qdrant client and create collection
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# Initialize Qdrant client (local instance)
qdrant_client = QdrantClient(":memory:")  # Use in-memory for testing, or specify host/port for persistent

# Create collection with appropriate vector size (768 for S-PubMedBert)
collection_name = "medical_documents"
qdrant_client.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=768, distance=Distance.COSINE)
)
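`Distance.COSINE` ranks stored vectors by cosine similarity: the dot product divided by the product of the vector lengths. It depends only on direction, so a scaled copy of a vector scores 1.0 and an orthogonal vector scores 0.0. The toy 3-dimensional vectors below are illustrative assumptions; real S-PubMedBert embeddings have 768 dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [1.0, 0.0, 1.0]
doc_same_direction = [2.0, 0.0, 2.0]  # scaled copy of the query
doc_orthogonal = [0.0, 1.0, 0.0]      # shares no direction with the query

print(round(cosine_similarity(query, doc_same_direction), 6))  # 1.0
print(round(cosine_similarity(query, doc_orthogonal), 6))      # 0.0
```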

## 📊 Vector Store Initialization

Create the Qdrant vector store and populate it with medical document chunks.
This step:
- Generates embeddings for all document chunks
- Stores vectors and metadata in Qdrant
- Enables semantic search capabilities
- Preserves all metadata for filtering and attribution

In [None]:
# ✅ Use Qdrant vector database with medical documents
vectorstore = Qdrant(
    client=qdrant_client,
    collection_name=collection_name,
    embeddings=embedding_model
)

# Add documents to Qdrant
vectorstore.add_documents(chunks)
print(f"[✅] Added {len(chunks)} chunks to Qdrant vector database")
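Conceptually, each Qdrant point pairs an embedding with a payload; LangChain's `Qdrant` wrapper stores the chunk text and metadata there (by default under `page_content` and `metadata` keys), which is what makes filtering by chapter or page possible. The pure-Python toy below only illustrates "filter on payload, then rank by cosine similarity"; it is not how Qdrant is implemented, and `points` and `search` are hypothetical names with made-up 2-dimensional vectors.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Each point pairs a vector with a payload, mirroring how chunk metadata travels with its embedding
points = [
    {"vector": [0.9, 0.1], "payload": {"chapter": "Inguinal Region/Groin", "page_number": 47}},
    {"vector": [0.8, 0.3], "payload": {"chapter": "Anterior Abdominal Wall", "page_number": 30}},
    {"vector": [0.1, 0.9], "payload": {"chapter": "Inguinal Region/Groin", "page_number": 52}},
]

def search(query_vector, k=2, chapter=None):
    """Keep points matching the optional chapter filter, then return the top-k pages by cosine."""
    candidates = [p for p in points if chapter is None or p["payload"]["chapter"] == chapter]
    ranked = sorted(candidates, key=lambda p: cosine(query_vector, p["vector"]), reverse=True)
    return [p["payload"]["page_number"] for p in ranked[:k]]

print(search([1.0, 0.0]))                                   # [47, 30]
print(search([1.0, 0.0], chapter="Inguinal Region/Groin"))  # [47, 52]
```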

## 🧠 Query Decomposition Model

Load the FLAN-T5 (base) model for query decomposition.
This model breaks complex medical questions into simpler, focused sub-questions,
each of which is then retrieved against separately. `device=0` places the pipeline on the first GPU.

In [None]:
from transformers import pipeline
import re

# Load the FLAN-T5 model for text2text-generation
subq_pipe = pipeline("text2text-generation", model="google/flan-t5-base", device=0)

## 🔍 Advanced Query Decomposition Function

Implement query decomposition with:
- Regex-based pronoun replacement (e.g. "its" → "the inguinal canal"), using the subject of a "What is ..." question
- Few-shot prompting with worked examples for consistent sub-question formatting
- Character-level deduplication of near-identical sub-questions
- Keyword-set deduplication that drops sub-questions with the same content words

This breaks complex medical questions into smaller, non-overlapping components.

In [None]:
def extract_subquestionss(query):
    
    # Extract noun phrase for pronoun replacement
    noun_match = re.search(r"what\s+is\s+(.*?)(?:\s*(?:,|\band\b|\.|\?)|$)", query, re.IGNORECASE)
    subject = noun_match.group(1).strip() if noun_match else None

    # Construct prompt
    prompt = f"""
You are a helpful medical assistant.

Break down each medical question into smaller subquestions that cover one clear medical concept at a time.

Examples:

Question: Tell me about lung surfaces, borders and structures surrounding it.
Subquestions:
1. What are the surfaces of the lungs?
2. What are the borders of the lungs?
3. What are the structures surrounding the lungs?

Question: Tell me about liver anatomy and function.
Subquestions:
1. What is the anatomy of the liver?
2. What are the functions of the liver?

Question: Tell me about the surfaces, borders and relations of the liver.
Subquestions:
1. What are the surfaces of the liver?
2. What are the borders of the liver?
3. What are the relations of the liver?

Question: {query}
Subquestions:
"""

    # Generate subquestions with FLAN-T5 (greedy decoding)
    output = subq_pipe(prompt, max_new_tokens=200, do_sample=False)[0]["generated_text"]
    raw = re.findall(r"\d+\.\s*([^0-9]+(?:\?.*?)?)", output)

    # Basic cleanup and pronoun replacement
    cleaned = []
    seen_normalized = set()

    for q in raw:
        q_clean = q.strip()

        if subject:
            q_clean = re.sub(r"\bits\b", f"the {subject}", q_clean, flags=re.IGNORECASE)
            q_clean = re.sub(r"\bit\b", f"the {subject}", q_clean, flags=re.IGNORECASE)
            q_clean = re.sub(r"\btheir\b", f"the {subject}'s", q_clean, flags=re.IGNORECASE)

        # Fix repeated "the the"
        q_clean = re.sub(r"\bthe\s+the\b", "the", q_clean, flags=re.IGNORECASE)

        # Normalize for character-level deduplication
        norm = re.sub(r"[^a-z]", "", q_clean.lower())
        if norm not in seen_normalized:
            cleaned.append(q_clean)
            seen_normalized.add(norm)

    # --- 🔍 Keyword-set deduplication (word overlap, not embedding similarity) ---
    stopwords = {"what", "are", "is", "the", "of", "in", "its", "a", "an", "and", "on"}
    def keyword_set(text):
        tokens = re.findall(r"\w+", text.lower())
        return set(t for t in tokens if t not in stopwords)

    final = []
    seen_keywords = []

    for q in cleaned:
        q_keywords = keyword_set(q)

        if any(q_keywords == existing for existing in seen_keywords):
            continue  # Skip semantically duplicate question

        final.append(q)
        seen_keywords.append(q_keywords)
    return final
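The post-processing half of `extract_subquestionss` can be tried without loading FLAN-T5. The sketch below reuses the same numbered-item regex and stopword list on a hypothetical one-line model output, and drops a reworded duplicate because its content words match an earlier item. `keyword_set`, `parse_and_dedupe` and `sample` are illustrative names; storing frozensets in a set gives the same result as the original list-equality check.

```python
import re

STOPWORDS = {"what", "are", "is", "the", "of", "in", "its", "a", "an", "and", "on"}

def keyword_set(text: str) -> frozenset[str]:
    """Lower-cased word tokens minus stopwords; word order is ignored."""
    return frozenset(t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS)

def parse_and_dedupe(generated: str) -> list[str]:
    # Same numbered-item pattern as extract_subquestionss: "1. ...", "2. ...", ...
    raw = re.findall(r"\d+\.\s*([^0-9]+(?:\?.*?)?)", generated)
    final, seen = [], set()
    for q in (item.strip() for item in raw):
        key = keyword_set(q)
        if key in seen:
            continue  # same content words as an earlier subquestion
        seen.add(key)
        final.append(q)
    return final

# Hypothetical FLAN-T5 output on one line, with a reworded duplicate as item 3
sample = ("1. What is the inguinal canal? 2. What are the contents of the inguinal canal? "
          "3. What are the inguinal canal contents?")
print(parse_and_dedupe(sample))
# ['What is the inguinal canal?', 'What are the contents of the inguinal canal?']
```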

## 🎯 Reranking Model Setup

Initialize BGE reranker for improved document relevance scoring.
This cross-encoder model provides more accurate relevance scores
by considering the interaction between query and document content,
leading to better retrieval quality.

In [None]:
reranker = CrossEncoder("BAAI/bge-reranker-base")

## 🔄 Retrieval and Reranking Pipeline

Implement the core retrieval function that:
1. Decomposes complex queries into sub-questions
2. Retrieves relevant documents for each sub-question
3. Reranks results using cross-encoder for better relevance
4. Returns top documents organized by sub-question

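This two-stage approach (fast vector search for 15 candidates, then a slower but more precise cross-encoder over only those candidates) is intended to improve the quality of the passages handed to answer generation.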

In [None]:
def retrieve_and_rerank_by_subquery(query):
    import time
    start_time = time.time()

    print("[🔍] Extracting subquestions...")
    # Fall back to the original query if no numbered subquestions were parsed
    subqueries = extract_subquestionss(query) or [query]
    print(f"[✅] Found {len(subqueries)} subquestions: {subqueries}")

    # Build the retriever once; it returns the 15 nearest chunks per subquestion
    retriever = vectorstore.as_retriever(search_kwargs={"k": 15})

    subquery_doc_map = {}
    for sub in subqueries:
        print(f"[🔎] Retrieving for subquestion: '{sub}'")
        # Stage 1: dense retrieval. No "query: " prefix: that is an E5-style convention
        # S-PubMedBert was not trained with, and the chunks were embedded without one
        docs = retriever.invoke(sub)
        # Stage 2: score each (subquestion, passage) pair jointly with the cross-encoder
        scores = reranker.predict([(sub, doc.page_content) for doc in docs])
        top_docs = sorted(zip(docs, scores), key=lambda x: x[1], reverse=True)[:2]  # keep top 2
        subquery_doc_map[sub] = [doc for doc, _ in top_docs]

    print("[⏱️] Retrieval + reranking took {:.2f} seconds".format(time.time() - start_time))
    return subquery_doc_map
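The reranking step reduces to "sort the first-stage candidates by cross-encoder score and keep the top k". The sketch below isolates that step with made-up scores standing in for `reranker.predict` output; `top_k_by_score`, `candidates` and `reranker_scores` are hypothetical names, and `k=2` mirrors the notebook's choice.

```python
def top_k_by_score(docs: list[str], scores: list[float], k: int = 2) -> list[str]:
    """Second stage of retrieve-then-rerank: keep the k docs with the highest reranker scores."""
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _score in ranked[:k]]

# Hypothetical first-stage candidates (vector-search order) and made-up reranker scores
candidates = ["chunk A", "chunk B", "chunk C", "chunk D"]
reranker_scores = [0.12, 0.87, -1.3, 0.55]

print(top_k_by_score(candidates, reranker_scores))  # ['chunk B', 'chunk D']
```

The reranker may reorder candidates substantially: here the first vector-search hit ("chunk A") is dropped because the cross-encoder scores two other passages higher.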

## 🤖 Zephyr Language Model Setup

Initialize the Zephyr-7B model for medical answer generation.
Configuration:
- Auto device mapping for multi-GPU setups
- Float16 precision for memory efficiency
- Evaluation mode for inference

Zephyr-7B-β is a chat fine-tune of Mistral-7B trained to follow instructions; it expects prompts in its own chat format.

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "HuggingFaceH4/zephyr-7b-beta"

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",                # Shard layers across all available GPUs (requires accelerate)
    torch_dtype=torch.float16         # Half precision (FP16): 2 bytes per parameter
)

model.eval()
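A quick back-of-the-envelope estimate shows why the model is loaded with `torch_dtype=torch.float16`: weight memory is roughly parameter count times bytes per parameter. The parameter count below (about 7.24 billion, as for Mistral-7B) is an assumption, and the estimate covers weights only; activations and the generation KV cache need extra memory. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the model weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 1024**3

# Approximate parameter count for a Mistral-7B-based model such as Zephyr-7B-beta (assumption)
NUM_PARAMS = 7.24e9

for dtype, nbytes in [("float32", 4), ("float16", 2)]:
    print(f"{dtype}: ~{weight_memory_gib(NUM_PARAMS, nbytes):.1f} GiB")
# float32: ~27.0 GiB
# float16: ~13.5 GiB
```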

## ✍️ Medical Answer Generation Function

Generate accurate medical answers for sub-questions using retrieved documents.
Features:
- Structured context formatting from relevant documents
- Medical-specific prompting for accuracy
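- Controlled generation parameters (sampling with temperature=0.3 and top_p=0.9) for fairly consistent outputs
- A system prompt instructing the model to use only the provided excerpts

The prompt is designed to keep answers grounded in the retrieved medical textbook content; this is an instruction to the model rather than a hard guarantee.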

In [None]:
def generate_answer_per_subquery(subquery, docs):
    import time
    t_start = time.time()

    print(f"[📥] Preparing context for subquery: '{subquery}'")
    
    # Create structured bullet-point context from relevant documents
    context = []
    for i, doc in enumerate(docs):
        passage = doc.page_content.strip().replace("\n", " ")
        context.append(f"- {passage}")

    context_text = "\n".join(context)

    # Build the prompt with Zephyr's own chat template rather than hand-written tags,
    # so turn separators match the format the model was fine-tuned on
    messages = [
        {"role": "system", "content": (
            "You are a helpful and precise medical assistant. Use only the provided medical "
            "textbook excerpts to answer the user's question. Never add outside knowledge. "
            "If information is missing, state that clearly."
        )},
        {"role": "user", "content": (
            f"Question:\n{subquery}\n\n"
            f"Medical Textbook Sources:\n{context_text}\n\n"
            "Provide a clear, concise, and medically accurate answer."
        )},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

    print("[🧠] Tokenizing prompt...")
    # No truncation: right-side truncation could cut off the assistant turn marker
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    print("[🚀] Generating answer with Zephyr...")
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=400,
            temperature=0.3,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens (everything after the prompt)
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    print(f"[✅] Done with subquery: '{subquery}' in {time.time() - t_start:.2f} sec")
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
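`generate_answer_per_subquery` samples with `temperature=0.3` and `top_p=0.9`. The sketch below shows the textbook definitions on a toy next-token distribution: dividing logits by a temperature below 1 sharpens the softmax, and nucleus (top-p) filtering then keeps the smallest set of most-likely tokens whose probability reaches `top_p`. The logits and the helper `temperature_top_p` are hypothetical, and Hugging Face's logits warpers differ in details (for example, a minimum number of tokens to keep).

```python
import math

def temperature_top_p(logits: dict[str, float], temperature: float = 0.3,
                      top_p: float = 0.9) -> dict[str, float]:
    """Return the renormalized distribution that sampling would draw from."""
    # Temperature: divide logits before softmax; values < 1 sharpen the distribution
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    max_logit = max(scaled.values())
    exp = {tok: math.exp(v - max_logit) for tok, v in scaled.items()}  # subtract max for stability
    total = sum(exp.values())
    probs = sorted(((tok, e / total) for tok, e in exp.items()), key=lambda p: p[1], reverse=True)

    # Top-p (nucleus): keep the most likely tokens until their cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}

toy_logits = {"canal": 2.0, "ring": 1.5, "nerve": 0.5}
print(sorted(temperature_top_p(toy_logits, temperature=0.3)))  # ['canal', 'ring']
print(sorted(temperature_top_p(toy_logits, temperature=1.0)))  # ['canal', 'nerve', 'ring']
```

At temperature 1.0 all three tokens stay in the nucleus; at 0.3 the least likely one is filtered out, which is why a low temperature makes answers more repeatable.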

## 🎯 Complete Medical Query Processing Pipeline

The main function that orchestrates the entire medical RAG pipeline:
1. **Query Analysis**: Process the input medical question
2. **Decomposition**: Break complex questions into focused sub-questions
3. **Retrieval**: Find relevant documents for each sub-question
4. **Reranking**: Score and rank documents by relevance
5. **Generation**: Generate comprehensive answers using Zephyr
6. **Collection**: Return a dictionary mapping each sub-question to its answer

This end-to-end pipeline is intended to produce accurate, source-grounded medical answers.

In [None]:
def answer_medical_query(query):
    results = {}

    print(f"\n[🧠] Starting query: '{query}'")

    # ✅ Use your dedicated retrieval + reranking function
    subquery_docs = retrieve_and_rerank_by_subquery(query)

    # ✅ Loop through each subquery and generate answers
    for i, (subq, docs) in enumerate(subquery_docs.items(), start=1):
        print(f"\n🧩 Subquestion {i}/{len(subquery_docs)}: {subq}")

        # Generate answer for each subquery separately
        answer = generate_answer_per_subquery(subq, docs)

        # Print and store the answer
        print(f"\n📘 Answer:\n{answer}\n")
        results[subq] = answer

    return results

# Example
query = "What is the inguinal canal, and what are its contents?"
answers = answer_medical_query(query)
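`answer_medical_query` returns a dictionary of per-sub-question answers rather than one merged response. If a single block of text is preferred, a minimal formatter like the hypothetical `combine_answers` below can stitch them together in sub-question order; it only concatenates and does not ask the model to rewrite anything. The answer strings in the example are placeholders.

```python
def combine_answers(query: str, results: dict[str, str]) -> str:
    """Join per-subquestion answers into one response, one numbered section per subquestion."""
    sections = [f"Question: {query}", ""]
    for i, (subq, answer) in enumerate(results.items(), start=1):
        sections.append(f"{i}. {subq}")
        sections.append(answer.strip())
        sections.append("")
    return "\n".join(sections).rstrip()

example_results = {
    "What is the inguinal canal?": "<answer 1>",
    "What are the contents of the inguinal canal?": "<answer 2>",
}
print(combine_answers("What is the inguinal canal, and what are its contents?", example_results))
```

In the notebook this would be called as `combine_answers(query, answers)` after the example above.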