![i2b2 Logo](images/transmart-logo.png)

# Using ChromaDB + Embeddings to Search Patient Notes (RAG)

This notebook demonstrates how to implement **Retrieval-Augmented Generation (RAG)** using **local text embeddings** and **ChromaDB** to search and analyze clinical notes stored in an i2b2-like format. The workflow decodes hex-encoded notes, embeds them with `MiniLM`, stores and retrieves them via ChromaDB, and uses a local LLM (e.g., Qwen or LLaMA 3 via Ollama) to generate structured, clinically meaningful responses.

### Key Concepts Covered

- Decoding hex-encoded clinical notes from a simulated i2b2 table
- Creating semantic vector embeddings with `MiniLM`
- Storing notes and metadata in a persistent ChromaDB vector store
- Performing similarity search and understanding cosine-based relevance scores
- Filtering results to retain only the **most recent note per patient**
- Using Maximal Marginal Relevance (MMR) to reduce redundancy in search results
- Injecting retrieved context into a reusable prompt template
- Generating structured, AI-powered responses using a local LLM via Ollama

Each notebook cell builds on the previous one to demonstrate a complete RAG workflow tailored to **clinical informatics and patient note analysis**.

> This notebook is part of the workshop: _Using LLMs to Search Patient Notes_.



In [None]:
# -----------------------------------------------------------
# 1. Load and Decode Clinical Notes from i2b2-Mimicking CSV
# -----------------------------------------------------------
# This cell loads visit-level data from a CSV file that mimics the i2b2 `visit_dimension` table
# and decodes the hex-encoded clinical notes into readable text.
#
# Each record contains:
#   - encounter_num: Unique encounter ID
#   - patient_num: Patient identifier
#   - start_date, end_date: Visit timestamps
#   - location_cd, location_path: Clinic/service details
#   - visit_blob: Hex-encoded clinical note text (optionally prefixed with 0x)

import pandas as pd
import binascii
from IPython.display import display, Markdown

# Define path to the input data file
csv_path = "datafiles/i2b2_encounter_table.csv"

# Load the CSV into a pandas DataFrame
df = pd.read_csv(csv_path)

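# Define decoding function for hex-encoded notes
def decode_note(hex_blob):
    """Decode a single hex-encoded string (optionally prefixed with 0x) into plain text."""
    # Strip only a leading 0x/0X prefix; the rest must be pure hex digits
    hex_str = hex_blob[2:] if hex_blob.startswith(("0x", "0X")) else hex_blob
    return binascii.unhexlify(hex_str).decode("utf-8", errors="ignore")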

# Apply decoding to create a plain text column
df["note_text"] = df["visit_blob"].apply(decode_note)

# Display the first 10 decoded notes with key metadata
display(df.head(10))
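
The decoding step above can be checked with a small round trip. The sketch below is illustrative only: `encode_hex_note` and `decode_hex_note` are hypothetical helper names, and the `0x` prefix mirrors what the decoding cell strips, on the assumption that `visit_blob` values are plain hexadecimal strings.

```python
import binascii

def encode_hex_note(text: str) -> str:
    """Hex-encode text and add a 0x prefix (assumed visit_blob layout)."""
    return "0x" + binascii.hexlify(text.encode("utf-8")).decode("ascii")

def decode_hex_note(hex_blob: str) -> str:
    """Reverse encode_hex_note: drop a leading 0x, then decode the hex digits."""
    hex_str = hex_blob[2:] if hex_blob.lower().startswith("0x") else hex_blob
    return binascii.unhexlify(hex_str).decode("utf-8", errors="ignore")

sample = "Patient reports wheezing."
blob = encode_hex_note(sample)
print(decode_hex_note(blob))  # Patient reports wheezing.
```

If the round trip fails on real data, the blob is probably not plain hex, and the decoding function needs to change accordingly.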


## 2. Embed and Store Entire Clinical Notes in ChromaDB

In this step, we process and store clinical notes as **entire documents** in a ChromaDB vector store. This preserves complete patient-level context for semantic search and retrieval.

### Key Steps:
1. **Embed Full Notes**:
   - Each clinical note is transformed into a semantic vector using a transformer-based embedding model.
2. **Store in ChromaDB**:
   - The note and its metadata (patient ID, encounter number, date) are stored together in a persistent vector store.

### Why Use This Approach?

Storing full notes is valuable when:
- You want to retrieve the entire clinical context (not just snippets or chunks)
- The downstream task (e.g., summarization or decision support) benefits from broader information

This method is most useful when each note is concise enough to fit within LLM input limits and clinical completeness is critical.
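
Note length matters for the embedding model as well as for the LLM. The `all-MiniLM-L6-v2` model card states that input longer than 256 word pieces is truncated by default, so text beyond that point does not influence the stored vector. The sketch below flags notes that may be affected. The helper name `flag_long_notes` and the 200-word cutoff are assumptions: whitespace word counts are only a rough proxy for word pieces, so treat the result as a hint rather than an exact check.

```python
def flag_long_notes(notes, max_words=200):
    """Return indices of notes whose whitespace word count exceeds max_words."""
    return [i for i, text in enumerate(notes) if len(text.split()) > max_words]

# Toy inputs: one short note and one artificially long note
notes = ["Short note about an asthma follow-up.", "word " * 300]
print(flag_long_notes(notes))  # [1]
```

For notes that are flagged, chunking before embedding is a common alternative to storing the full note as a single vector.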

<img src="./images/rag_full.png" alt="RAG Full" width="900">


In [None]:
# -----------------------------------------------------------
# 2. Embed Clinical Notes Using Local MiniLM Embeddings and Store in ChromaDB
# -----------------------------------------------------------
# This cell embeds each clinical note using a Hugging Face transformer model
# and stores the results in a ChromaDB vector store for later retrieval.
#
# The model used is `sentence-transformers/all-MiniLM-L6-v2`:
# - Lightweight and optimized for local semantic search
# - Produces 384-dimensional vectors suitable for RAG
#
# Requirements:
#   pip install langchain langchain-huggingface langchain-chroma chromadb

from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

# Initialize local embedding model
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Create or connect to ChromaDB store (persistent directory will be used automatically)
vectorstore = Chroma(
    persist_directory="./datafiles/chroma_db_notes_full",
    embedding_function=embedding_model
)

# Extract clinical notes and metadata
documents = df["note_text"].tolist()
metadata = df[["patient_num", "encounter_num", "start_date"]].to_dict(orient="records")

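# Add text + metadata to the Chroma vector store.
# Using encounter_num (unique per encounter) as the document ID lets re-runs of this
# cell overwrite existing entries instead of adding duplicates to the persistent store.
ids = df["encounter_num"].astype(str).tolist()
vectorstore.add_texts(texts=documents, metadatas=metadata, ids=ids)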

print(f"Successfully embedded and stored {len(documents)} clinical notes using MiniLM and ChromaDB.")


## 3. Defining the Query for Clinical Note Retrieval

In [None]:
# -----------------------------------------------------------
# 3. Define the Query for Clinical Note Retrieval
# -----------------------------------------------------------
# This cell defines a natural language query that will be used to search the embedded clinical notes stored in ChromaDB.

# Key Concepts:
# - The query should reflect a specific clinical information need.
# - The vector store will compare the embedded form of this query to all stored clinical note vectors using semantic similarity.

# Example Query:
# "Who has asthma and is taking Fluticasone and Albuterol?"
# This query aims to retrieve notes describing patients diagnosed with asthma who are also prescribed both Fluticasone and Albuterol.

query = "Who has asthma and is taking Fluticasone and Albuterol?"

print(query)


## 4. Retrieving Clinical Notes with Similarity and MMR Search

This section demonstrates how to retrieve relevant clinical notes from ChromaDB using multiple vector-based search strategies. We compare traditional **similarity search** with more advanced techniques like **Maximal Marginal Relevance (MMR)**.

### Key Retrieval Methods:

1. **Similarity Search with Scores (Step 4.1)**
   - Retrieves clinical notes ranked by semantic similarity to the input query.
   - Returns relevance scores to support sorting and thresholding.

2. **Score Threshold Filtering (Step 4.2)**
   - Filters out low-confidence matches based on a minimum similarity score.
   - Improves retrieval precision by returning only highly aligned documents.

3. **Maximal Marginal Relevance (MMR) Search (Step 4.3)**
   - Balances relevance and diversity in retrieved documents.
   - Reduces redundancy while preserving contextual variety.

### Why Use These Strategies?

Effective retrieval is critical to building high-quality RAG pipelines. These techniques help:
- Improve contextual relevance of the retrieved clinical notes
- Eliminate noisy or marginal matches
- Encourage diversity in content to support more robust, less biased LLM outputs
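
To make "semantic similarity" concrete, the sketch below computes cosine similarity between a query vector and a few note vectors in plain Python. The three-dimensional vectors are toy values, not real MiniLM embeddings, and the score reported by `similarity_search_with_relevance_scores` depends on the collection's distance metric and LangChain's normalization, so it may not equal the raw cosine value.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings
query_vec = [1.0, 0.0, 1.0]
note_vecs = {
    "note_a": [1.0, 0.0, 1.0],  # same direction as the query
    "note_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "note_c": [1.0, 1.0, 0.0],  # partially aligned
}

ranked = sorted(
    note_vecs,
    key=lambda name: cosine_similarity(query_vec, note_vecs[name]),
    reverse=True,
)
print(ranked)  # ['note_a', 'note_c', 'note_b']
```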

<img src="./images/rag_retrieval.png" alt="RAG Retrieval" width="900">


In [None]:
# -----------------------------------------------------------
# 4.1. Performing Similarity Search with Relevance Scores
# -----------------------------------------------------------
# This cell retrieves clinical notes that are semantically similar to a given query.
# Each returned result includes a relevance score that reflects how well the note
# matches the query, enabling more transparent and controllable filtering.

# Key Function:
# - vectorstore.similarity_search_with_relevance_scores(query, k=10)
#   Retrieves the top k most relevant documents with their similarity scores.

# Use Case:
# - This method is ideal for inspecting how well the embedding model is performing.
# - It supports ranked retrieval and post-filtering for building RAG pipelines.

# Score Interpretation (rough guide, not calibrated thresholds; higher is more relevant):
#   0.90 – 1.00 → Highly relevant
#   0.70 – 0.90 → Strong relevance
#   0.50 – 0.70 → Moderate relevance
#   0.30 – 0.50 → Low relevance
#   0.00 – 0.30 → Minimal or no relevance


from IPython.display import display, Markdown

results = vectorstore.similarity_search_with_relevance_scores(query, k=10)

display(Markdown("### Retrieved Clinical Notes with Relevance Scores"))

for idx, (doc, score) in enumerate(results, 1):
    patient = doc.metadata.get("patient_num", "N/A")
    encounter = doc.metadata.get("encounter_num", "N/A")
    date = doc.metadata.get("start_date", "N/A")
    doc_id = doc.id
    excerpt = doc.page_content[:1000].replace("\n", " ")

    display(Markdown(
        f"**Document {idx}**  \n"
        f"- **Relevance Score:** `{score:.4f}`  \n"
        f"- **Patient Num:** `{patient}`  \n"
        f"- **Encounter Num:** `{encounter}`  \n"
        f"- **Start Date:** `{date}`  \n"
        f"- **Document ID:** `{doc_id}`  \n"
        f"- **Excerpt:**\n\n```text\n{excerpt}...\n```"
    ))




In [None]:
# -----------------------------------------------------------
# 4.2. Using a Retriever with a Score Threshold
# -----------------------------------------------------------
# This cell demonstrates how to configure a retriever that returns only documents
# whose similarity scores exceed a defined threshold.

# Key Parameters:
# - search_type="similarity_score_threshold":
#     Instructs the retriever to filter results by a minimum score.
# - search_kwargs={"k": 10, "score_threshold": 0.6}:
#     - k: Number of top-ranked documents to consider.
#     - score_threshold: Minimum relevance score required for inclusion.

# Purpose:
# This approach increases precision by filtering out low-quality matches.
# It is especially useful in clinical settings where retrieval accuracy is essential.

from IPython.display import display, Markdown

retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={
        "k": 10,
        "score_threshold": 0.6
    }
)

results = retriever.invoke(query)

display(Markdown("### Retrieved Clinical Notes (Score ≥ 0.6)"))

for idx, doc in enumerate(results, 1):
    patient = doc.metadata.get("patient_num", "N/A")
    date = doc.metadata.get("start_date", "N/A")
    doc_id = doc.id
    excerpt = doc.page_content[:1000].replace("\n", " ")

    display(Markdown(
        f"**Document {idx}**  \n"
        f"- **Patient Num:** `{patient}`  \n"
        f"- **Start Date:** `{date}`  \n"
        f"- **Document ID:** `{doc_id}`  \n"
        f"- **Excerpt:**\n\n```text\n{excerpt}...\n```"
    ))

display(Markdown(f"**Total relevant results:** {len(results)}"))
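
The threshold retriever above does not expose the scores it filtered on. An equivalent manual filter over `(document, score)` pairs, such as those returned in step 4.1, might look like the sketch below. The helper name `filter_by_score` and the sample scores are made up for illustration, and keeping scores at or above the threshold is this sketch's choice.

```python
def filter_by_score(scored_results, score_threshold):
    """Keep (item, score) pairs with score >= score_threshold, best first."""
    kept = [(item, score) for item, score in scored_results if score >= score_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Strings stand in for Document objects; scores are illustrative only
scored = [("note_a", 0.82), ("note_b", 0.41), ("note_c", 0.63)]
print(filter_by_score(scored, 0.6))  # [('note_a', 0.82), ('note_c', 0.63)]
```

Filtering manually keeps each score available for display or logging next to the note it belongs to.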


In [None]:
# -----------------------------------------------------------
# 4.3. Performing Maximal Marginal Relevance (MMR) Search
# -----------------------------------------------------------
# This cell retrieves clinical notes using Maximal Marginal Relevance (MMR),
# which balances relevance to the query and diversity across the results.

# Key Parameters:
# - max_marginal_relevance_search(): Retrieves results using MMR.
# - fetch_k=500: Number of top documents considered before applying MMR.
# - k=5: Final number of documents returned.
# - lambda_mult:
#     - 0.0 → maximize diversity
#     - 1.0 → maximize relevance
#     - 0.5 → balance between the two

# Purpose:
# MMR reduces redundancy while maintaining relevance, useful when diverse perspectives
# on a clinical topic (e.g., treatment variations) are desired.

from IPython.display import display, Markdown

results = vectorstore.max_marginal_relevance_search(
    query=query,
    k=5,
    fetch_k=500,
    lambda_mult=0.5
)

display(Markdown("### Retrieved Clinical Notes Using MMR Search"))

for idx, doc in enumerate(results, 1):
    patient = doc.metadata.get("patient_num", "N/A")
    date = doc.metadata.get("start_date", "N/A")
    doc_id = getattr(doc, "id", "N/A")
    excerpt = doc.page_content[:1000].replace("\n", " ")

    display(Markdown(
        f"**Document {idx}**  \n"
        f"- **Patient Num:** `{patient}`  \n"
        f"- **Start Date:** `{date}`  \n"
        f"- **Document ID:** `{doc_id}`  \n"
        f"- **Excerpt:**\n\n```text\n{excerpt}...\n```"
    ))

display(Markdown(f"**Total results returned:** `{len(results)}`"))
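
The effect of `lambda_mult` is easier to see on a toy example. The sketch below implements a standard greedy MMR selection in plain Python. It is an illustration of the technique under simple assumptions (two-dimensional toy vectors, cosine similarity for both relevance and redundancy), not the code ChromaDB or LangChain runs internally, and `mmr_select` is a hypothetical helper name.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr_select(query_vec, candidates, k, lambda_mult=0.5):
    """Greedily pick k candidate names, trading query relevance against
    similarity to the names already selected."""
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        best_name, best_score = None, -math.inf
        for name, vec in remaining.items():
            relevance = cosine(query_vec, vec)
            # Redundancy: highest similarity to anything already chosen
            redundancy = max((cosine(vec, candidates[s]) for s in selected), default=0.0)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
        del remaining[best_name]
    return selected

query_vec = [1.0, 0.0]
candidates = {
    "note_a": [1.0, 0.10],   # highly relevant
    "note_b": [1.0, 0.12],   # near-duplicate of note_a
    "note_c": [0.6, -0.80],  # less relevant, but different content
}

print(mmr_select(query_vec, candidates, k=2, lambda_mult=0.5))  # ['note_a', 'note_c']
print(mmr_select(query_vec, candidates, k=2, lambda_mult=1.0))  # ['note_a', 'note_b']
```

With `lambda_mult=1.0` the near-duplicate is returned because only relevance counts. With `lambda_mult=0.5` the redundancy penalty pushes the second pick toward the note with different content.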


## 5. Generating Structured Responses with an LLM

In this section, we take the clinical notes retrieved in the previous step and pass them into a Large Language Model (LLM) for analysis and summarization. This completes the RAG (Retrieval-Augmented Generation) workflow.

### Key Steps:

1. **Creating a Prompt Template for LLM Querying (Step 5.1)**
   - Defines a reusable prompt structure to guide the LLM in analyzing and summarizing clinical notes.
   - Ensures the output is consistent, structured, and clinically useful.

2. **Invoking the LLM with Retrieved Context (Step 5.2)**
   - Inserts the retrieved documents into the prompt.
   - Sends the final prompt to an LLM (e.g., qwen2, qwen3, llama3) for generation.
   - Outputs a structured answer to a medical query.

### Purpose

This final step showcases how LLMs can generate rich, relevant summaries or extractions from retrieved clinical data. It is particularly useful for clinical decision support, patient summarization, or intelligent search.

<img src="./images/rag_generation.png" alt="RAG Generation" width="1250">


In [None]:
# -----------------------------------------------------------
# 5.1. Create a Prompt Template for LLM Querying
# -----------------------------------------------------------
# This prompt template guides the LLM to generate structured, clinically relevant responses from retrieved clinical notes. The template is dynamic and reusable.

# Context:
# - Each clinical note is associated with metadata (patient_num, encounter_num, start_date).
# - These identifiers help structure the output and ensure traceability.

# Key Components:
# - PromptTemplate.from_template(): Allows dynamic substitution of note content and query.
# - {retrieved_docs}: Injects top-matching clinical notes as the context for the model.
# - {query}: Represents the user's clinical question.
# - Output format:
#     - Patient Num, Gender, Age, Race
#     - Encounter, Visit Date
#     - Summary of findings related to the query
#     - Has Asthma (Yes/No)

from langchain_core.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template(
    "You are a medical assistant analyzing clinical notes.\n\n"

    "Answer the following question: {query}\n\n"

    "Based on the following records: {retrieved_docs}\n\n"

    "Provide your response using the following structure:\n"
    "- Patient Num: <patient_num>, Gender: <value>, Age: <value>, Race: <value>\n"
    "- Encounter: <encounter_num>, Visit Date: <start_date>\n"
    "- Summary: One paragraph summarizing the note and one paragraph answering the question.\n"
    "- Has Asthma: <Yes/No>\n\n"
    "Instructions:\n"
    "- Include all patients relevant to the query.\n"
    "- Use only the most recent note for each patient (identified by patient_num).\n"
)
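
The prompt asks the model to use only the most recent note per patient and to report `patient_num`, `encounter_num`, and `start_date`. The model can only do that reliably if those values appear in the context it receives. The sketch below applies the "most recent note" rule in Python and prefixes each note with its metadata. Plain dicts stand in for the retrieved documents (`doc.metadata` plus `doc.page_content`). The helper names, the sample records, and the assumption that `start_date` is an ISO-formatted string are all illustrative.

```python
from datetime import datetime

def latest_note_per_patient(records):
    """Keep one record per patient_num: the one with the latest start_date."""
    latest = {}
    for rec in records:
        current = latest.get(rec["patient_num"])
        if current is None or (
            datetime.fromisoformat(rec["start_date"])
            > datetime.fromisoformat(current["start_date"])
        ):
            latest[rec["patient_num"]] = rec
    return list(latest.values())

def format_context(records):
    """Join notes into one context string, each preceded by a metadata header."""
    blocks = []
    for rec in records:
        header = (
            f"patient_num: {rec['patient_num']} | "
            f"encounter_num: {rec['encounter_num']} | "
            f"start_date: {rec['start_date']}"
        )
        blocks.append(header + "\n" + rec["text"])
    return "\n\n---\n\n".join(blocks)

# Made-up records for illustration only
records = [
    {"patient_num": 1, "encounter_num": 10, "start_date": "2004-01-05", "text": "Older asthma visit."},
    {"patient_num": 1, "encounter_num": 11, "start_date": "2005-06-21", "text": "Recent asthma visit."},
    {"patient_num": 2, "encounter_num": 20, "start_date": "2004-10-28", "text": "Wheezing follow-up."},
]

context = format_context(latest_note_per_patient(records))
print(context)
```

Doing the filtering before the prompt is built makes the rule deterministic and shortens the context. The instruction in the prompt then serves as a backstop rather than the only safeguard.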


In [29]:
# -----------------------------------------------------------
# 5.2. Use Retrieved Context to Invoke LLM and Generate Response
# -----------------------------------------------------------
# This step completes the RAG workflow by injecting the retrieved clinical notes
# into a structured prompt and using an LLM (via Ollama) to generate a response.

# Key Components:
# - prompt_template.format(...): Fills in the template with clinical context and user query.
# - model.invoke(...): Sends the completed prompt to the LLM for inference.
# - display(Markdown(...)): Nicely renders the model's response in the notebook.

from langchain_ollama import ChatOllama
from IPython.display import display, Markdown

# Initialize the local Ollama model (e.g., Qwen 2, LLaMA 3, etc.)
model = ChatOllama(model="llama3:70b")

# Prepare the context text by combining page_content from `results`
# (`results` holds the output of the most recently run retrieval cell; 4.3 if run in order)
retrieved_context = "\n\n---\n\n".join([doc.page_content for doc in results])

# Fill the prompt template with retrieved notes and the user's query
final_prompt = prompt_template.format(
    retrieved_docs=retrieved_context,
    query=query
)

# Generate a structured response using the LLM
response = model.invoke(final_prompt)

# Display the LLM-generated output
display(Markdown("### LLM-Generated Response"))
display(Markdown(response.content))


### LLM-Generated Response

Here are the responses:

**Patient 1**
- Patient Num: 1000000005, Gender: Female, Age: 32, Race: Hispanic
- Encounter: 477663, Visit Date: June 21, 2005
- Summary: The patient presents with continued asthma symptoms despite regular usage of her asthma medications. She reports shortness of breath, wheezing, and a cough that disrupts her sleep.
- Has Asthma: Yes

**Patient 2**
- Patient Num: 1000000088, Gender: Male, Age: 9, Race: Asian
- Encounter: 477031, Visit Date: Oct 28, 2004
- Summary: The patient presents with complications associated with his asthma, including wheezing and shortness of breath. He has an established history of asthma and allergic rhinitis.
- Has Asthma: Yes

**Patient 3**
- Patient Num: 1000000112, Gender: Male, Age: 12, Race: Black
- Encounter: 478135, Visit Date: December 5, 2005
- Summary: The patient presents with poorly controlled asthma, evidenced by worsening symptoms and lung function test results. He reports frequent wheezing, episodes of dyspnea, and a persistent nocturnal cough.
- Has Asthma: Yes

**Patient 4**
- Patient Num: 1000000089, Gender: Male, Age: 37, Race: Black
- Encounter: 475021, Visit Date: September 9, 2002
- Summary: The patient presents with manageable asthma under his current treatment regime. However, he reports occasional exacerbations during physical activity and pollen season.
- Has Asthma: Yes

Note that Patient 4 also has allergic rhinitis and joint pain, but the primary diagnosis is asthma.

Only one patient (Patient 3) has a medication regimen that includes Atrovent (ipratropium bromide), Flovent (fluticasone), Prednisolone, Zantac, and Zithromax.