![i2b2 Logo](images/transmart-logo.png)

# Using ChromaDB + Embeddings to Search Patient Notes (RAG)

This notebook demonstrates how to use **local embeddings** and **retrieval-augmented generation (RAG)** to search and analyze clinical notes stored in an i2b2-like format. It shows how to decode notes, embed them with MiniLM, retrieve similar notes from ChromaDB, and use a local LLM (e.g., LLaMA 3 via Ollama) to generate structured clinical responses.

### 🧠 Key Concepts Covered:

- Decoding hex-encoded ("BinHex") clinical notes
- Creating semantic vector embeddings with `MiniLM`
- Building a local ChromaDB vector store
- Performing similarity search and interpreting relevance scores
- Filtering to only include the **most recent encounter per patient**
- Injecting retrieved context into a structured prompt
- Using a local LLM (Ollama) to generate medical insights

Each cell builds upon the previous one to simulate a full RAG workflow adapted for **clinical informatics** scenarios using i2b2-like data.

> This notebook is part of the workshop: _Using LLMs to Search Patient Notes_.


In [None]:
# -----------------------------------------------------------
# 1. Load and Decode Clinical Notes from i2b2-Mimicking CSV
# -----------------------------------------------------------
# This cell loads visit-level data from a CSV file that mimics the i2b2 `visit_dimension` table
# and decodes the hex-encoded ("BinHex") clinical notes into readable text.
#
# Each record contains:
#   - encounter_num: Unique encounter ID
#   - patient_num: Patient identifier
#   - start_date, end_date: Visit timestamps
#   - location_cd, location_path: Clinic/service details
#   - visit_blob: Hex-encoded clinical note text, optionally prefixed with "0x"

import pandas as pd
import binascii
from IPython.display import display, Markdown

# Define path to the input data file
csv_path = "datafiles/i2b2_encounter_table.csv"

# Load the CSV into a pandas DataFrame
df = pd.read_csv(csv_path)

# Define decoding function for hex-encoded notes
def decode_note(hex_blob):
    """Decode a single hex-encoded string (optional "0x" prefix) into plain text."""
    hex_str = hex_blob.replace("0x", "")
    return binascii.unhexlify(hex_str).decode("utf-8", errors="ignore")

# Apply decoding to create a plain text column
df["note_text"] = df["visit_blob"].apply(decode_note)

# Display the first 10 decoded notes with key metadata
display(df.head(10))
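
The decoding function above assumes every `visit_blob` value is a well-formed hex string. A real export may contain empty cells or malformed values, so a more defensive variant can be useful. The following is a minimal sketch, not part of the original notebook: the name `decode_note_safe` and the choice to return an empty string on failure are assumptions made for illustration.

```python
import binascii

def decode_note_safe(hex_blob):
    """Decode a hex-encoded note; return "" for missing or malformed input."""
    if not isinstance(hex_blob, str):
        return ""  # e.g. NaN produced by an empty CSV cell
    hex_str = hex_blob.strip()
    if hex_str.lower().startswith("0x"):
        hex_str = hex_str[2:]  # drop the optional leading prefix only
    try:
        return binascii.unhexlify(hex_str).decode("utf-8", errors="ignore")
    except (binascii.Error, ValueError):
        return ""  # odd-length or non-hex content

# Round-trip check on a made-up note (not taken from the dataset)
sample = "Patient reports wheezing at night."
encoded = "0x" + binascii.hexlify(sample.encode("utf-8")).decode("ascii")
decoded = decode_note_safe(encoded)
print(decoded)
```

With the DataFrame from the cell above, this variant could be applied the same way, for example `df["visit_blob"].apply(decode_note_safe)`.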


## 2. Embed and Store Entire Clinical Notes in ChromaDB

In this step, we embed each clinical note as a **single, unchunked document** and store it in a ChromaDB vector store. This preserves the complete context of each encounter for semantic search and retrieval.

### Key Steps (2.1):
1. **Embed Full Notes**:
   - Each clinical note is transformed into a semantic vector using a transformer-based embedding model.
2. **Store in ChromaDB**:
   - The note and its metadata (patient ID, encounter number, date) are stored together in a persistent vector store.

### Why Use This Approach?

Storing full notes is valuable when:
- You want to retrieve the entire clinical context (not just snippets or chunks)
- The downstream task (e.g., summarization or decision support) benefits from broader information

This method is most useful when each note is concise enough to fit within LLM input limits and clinical completeness is critical.
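
Whether a note is "concise enough" can be checked before embedding. The sketch below uses a rough characters-per-token heuristic rather than a real tokenizer. The function names, the ratio of 4 characters per token, and the 2000-token budget are all illustrative assumptions. The embedding model has its own limit as well: the model card for `all-MiniLM-L6-v2` states that input beyond 256 word pieces is truncated by default, so the tail of a long note may not influence its vector. This is worth verifying for the model version you use.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate; the chars-per-token ratio is a heuristic, not a tokenizer."""
    if not text:
        return 0
    return max(1, len(text) // chars_per_token)

def split_by_budget(notes, max_tokens=2000):
    """Partition notes into those within the budget and those that may need chunking."""
    fits, too_long = [], []
    for note in notes:
        if estimate_tokens(note) <= max_tokens:
            fits.append(note)
        else:
            too_long.append(note)
    return fits, too_long

# Made-up inputs: one short note and one artificially long string
notes = ["Short follow-up note for asthma.", "x" * 20000]
fits, too_long = split_by_budget(notes, max_tokens=2000)
print(len(fits), len(too_long))
```

Notes that land in `too_long` are candidates for a chunked storage strategy instead of full-document storage.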

<img src="./images/rag_full.png" alt="RAG Full" width="900">


In [63]:
# -----------------------------------------------------------
# 2. Embed Clinical Notes Using Local MiniLM Embeddings and Store in ChromaDB
# -----------------------------------------------------------
# This cell embeds each clinical note using a Hugging Face transformer model
# and stores the results in a ChromaDB vector store for later retrieval.
#
# The model used is `sentence-transformers/all-MiniLM-L6-v2`:
# - Lightweight and optimized for local semantic search
# - Produces 384-dimensional vectors suitable for RAG
#
# Requirements:
#   pip install langchain langchain-huggingface langchain-chroma chromadb sentence-transformers

from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

# Initialize local embedding model
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Create or open a persistent ChromaDB store in the given directory
vectorstore = Chroma(
    persist_directory="./datafiles/chroma_db_notes_full",
    embedding_function=embedding_model
)

# Extract clinical notes and metadata
documents = df["note_text"].tolist()
metadata = df[["patient_num", "encounter_num", "start_date"]].to_dict(orient="records")

# Add text + metadata to the Chroma vector store
vectorstore.add_texts(texts=documents, metadatas=metadata)

print(f"Successfully embedded and stored {len(documents)} clinical notes using MiniLM and ChromaDB.")


Successfully embedded and stored 1128 clinical notes using MiniLM and ChromaDB.
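
Because the store is persistent and the cell above does not pass explicit IDs, re-running it will most likely add a second copy of every note. One way to make the step repeatable is to derive a stable ID from the patient and encounter numbers and pass it through the `ids` argument of `add_texts`. The sketch below is an assumption-laden illustration: `NOTE_NAMESPACE` and `make_note_id` are hypothetical names, and whether an existing ID is overwritten or rejected depends on the installed `langchain-chroma` version.

```python
import uuid

# Hypothetical namespace for this workshop's notes; any fixed UUID would work.
NOTE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "i2b2-workshop/clinical-notes")

def make_note_id(patient_num, encounter_num):
    """Build a stable, repeatable ID from the patient and encounter numbers."""
    return str(uuid.uuid5(NOTE_NAMESPACE, f"{patient_num}:{encounter_num}"))

# Patient and encounter numbers of one note that appears in the results of step 4.1
note_id = make_note_id(1000000005, 477663)
print(note_id)

# With the objects from the cells above, the IDs could be supplied like this
# (assumes `df`, `documents`, `metadata`, and `vectorstore` already exist):
# ids = [make_note_id(p, e) for p, e in zip(df["patient_num"], df["encounter_num"])]
# vectorstore.add_texts(texts=documents, metadatas=metadata, ids=ids)
```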


## 3. Defining the Query for Clinical Note Retrieval

In [75]:
# -----------------------------------------------------------
# 3. Define the Query for Clinical Note Retrieval
# -----------------------------------------------------------
# This cell defines a natural language query that will be used to search the embedded clinical notes stored in ChromaDB.

# Key Concepts:
# - The query should reflect a specific clinical information need.
# - The vector store will compare the embedded form of this query to all stored clinical note vectors using semantic similarity.

# Example Query:
# "Who has asthma and is taking Fluticasone and Albuterol?"
# This query aims to retrieve notes describing patients diagnosed with asthma who are also prescribed both Fluticasone and Albuterol.

query = "Who has asthma and is taking Fluticasone and Albuterol?"

print(query)


Who has asthma and is taking Fluticasone and Albuterol?
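
The comparison between the embedded query and the stored note vectors can be illustrated with cosine similarity on toy vectors. This is a minimal sketch under stated assumptions: the three-dimensional vectors and note names are invented stand-ins for 384-dimensional MiniLM embeddings, and the actual distance metric depends on how the Chroma collection is configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings
query_vec = [1.0, 0.0, 1.0]
note_vecs = {
    "asthma_note": [1.0, 0.0, 1.0],
    "partly_related_note": [1.0, 1.0, 0.0],
    "unrelated_note": [0.0, 1.0, 0.0],
}

# Rank notes from most to least similar to the query
ranked = sorted(
    note_vecs,
    key=lambda name: cosine_similarity(query_vec, note_vecs[name]),
    reverse=True,
)
print(ranked)
```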


## 4. Retrieving Clinical Notes with Similarity and MMR Search

This section demonstrates how to retrieve relevant clinical notes from ChromaDB using three vector-based strategies: plain **similarity search** with scores, the same search with a **score threshold**, and **Maximal Marginal Relevance (MMR)**, which trades some relevance for diversity.

### Key Retrieval Methods:

1. **Similarity Search with Scores (Step 4.1)**
   - Retrieves clinical notes ranked by semantic similarity to the input query.
   - Returns relevance scores to support sorting and thresholding.

2. **Score Threshold Filtering (Step 4.2)**
   - Filters out low-confidence matches based on a minimum similarity score.
   - Improves retrieval precision by returning only highly aligned documents.

3. **Maximal Marginal Relevance (MMR) Search (Step 4.3)**
   - Balances relevance and diversity in retrieved documents.
   - Reduces redundancy while preserving contextual variety.

### Why Use These Strategies?

Effective retrieval is critical to building high-quality RAG pipelines. These techniques help:
- Improve contextual relevance of the retrieved clinical notes
- Eliminate noisy or marginal matches
- Encourage diversity in content to support more robust, less biased LLM outputs
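
The relevance/diversity trade-off behind MMR can be sketched as a greedy loop: at each step, pick the candidate with the best value of `lambda_mult * relevance - (1 - lambda_mult) * redundancy`, where redundancy is the candidate's highest similarity to anything already selected. The code below is a simplified illustration on invented toy vectors, not the library's implementation. `mmr_select` and `cosine` are hypothetical helper names.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedy MMR: return the indices of up to k selected documents, in pick order."""
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            # Highest similarity to any already-selected document (0.0 if none yet)
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected

# Toy data: documents 0 and 1 are near-duplicates; document 2 covers a different aspect
query_vec = [1.0, 1.0, 0.0]
doc_vecs = [
    [1.0, 0.1, 0.0],
    [1.0, 0.2, 0.0],
    [0.0, 1.0, 0.0],
]

balanced = mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5)
relevance_only = mmr_select(query_vec, doc_vecs, k=2, lambda_mult=1.0)
print(balanced, relevance_only)
```

With `lambda_mult=0.5` the second pick skips the near-duplicate in favor of the different document, while `lambda_mult=1.0` reduces to plain relevance ranking.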

<img src="./images/rag_retrieval.png" alt="RAG Retrieval" width="900">


In [65]:
# -----------------------------------------------------------
# 4.1. Performing Similarity Search with Relevance Scores
# -----------------------------------------------------------
# This cell retrieves clinical notes that are semantically similar to a given query.
# Each returned result includes a relevance score that reflects how well the note
# matches the query, enabling more transparent and controllable filtering.

# Key Function:
# - vectorstore.similarity_search_with_relevance_scores(query, k=10)
#   Retrieves the top k most relevant documents with their similarity scores.

# Use Case:
# - This method is ideal for inspecting how well the embedding model is performing.
# - It supports ranked retrieval and post-filtering for building RAG pipelines.

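# Score Interpretation (higher is more relevant). These bands are a rough rule of
# thumb, not calibrated cutoffs; useful ranges depend on the embedding model and the
# store's distance metric (the top hits for this query score about 0.59 to 0.63):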
#   0.90 – 1.00 → Highly relevant
#   0.70 – 0.90 → Strong relevance
#   0.50 – 0.70 → Moderate relevance
#   0.30 – 0.50 → Low relevance
#   0.00 – 0.30 → Minimal or no relevance


from IPython.display import display, Markdown

results = vectorstore.similarity_search_with_relevance_scores(query, k=10)

display(Markdown("### Retrieved Clinical Notes with Relevance Scores"))

for idx, (doc, score) in enumerate(results, 1):
    patient = doc.metadata.get("patient_num", "N/A")
    encounter = doc.metadata.get("encounter_num", "N/A")
    date = doc.metadata.get("start_date", "N/A")
    doc_id = doc.id
    excerpt = doc.page_content[:1000].replace("\n", " ")

    display(Markdown(
        f"**Document {idx}**  \n"
        f"- **Relevance Score:** `{score:.4f}`  \n"
        f"- **Patient Num:** `{patient}`  \n"
        f"- **Encounter Num:** `{encounter}`  \n"
        f"- **Start Date:** `{date}`  \n"
        f"- **Document ID:** `{doc_id}`  \n"
        f"- **Excerpt:**\n\n```text\n{excerpt}...\n```"
    ))




### Retrieved Clinical Notes with Relevance Scores

**Document 1**  
- **Relevance Score:** `0.6289`  
- **Patient Num:** `1000000005`  
- **Encounter Num:** `477663`  
- **Start Date:** `06/21/2005`  
- **Document ID:** `7b43b174-5d83-480d-a166-d5510a9f57a1`  
- **Excerpt:**

```text
**Visit Information:** - Patient ID: 1000000005 - Encounter ID: 477663 - Visit Date: June 21, 2005  **Subjective:**  This is a 32-year-old Hispanic female who has been receiving care at our clinic for approximately 3 months. She speaks English and presents today for a follow-up visit scheduled as part of her ongoing asthma management. The patient reports continued asthma symptoms, including persistent shortness of breath, wheezing particularly at night, and a cough that disrupts her sleep. Over the past two weeks, these symptoms have intensified despite regular usage of her asthma medications: albuterol inhaler for rescue, daily fluticasone inhaler, and nightly montelukast. She states the albuterol provides only brief relief.  She has a medical history of unspecified asthma without mention of status asthmaticus, back sprain from unspecified causes, vaginitis, and a previous high-risk pregnancy requiring special investigations. Her social history includes living in an urban environment ...
```

**Document 2**  
- **Relevance Score:** `0.6143`  
- **Patient Num:** `1000000011`  
- **Encounter Num:** `476139`  
- **Start Date:** `11/20/2003`  
- **Document ID:** `82e011c9-1203-40c0-95de-d1c3757d44b4`  
- **Excerpt:**

```text
**Visit Information:** - Patient ID: 1000000011 - Encounter ID: 476139 - Visit Date: Nov 20, 2003  **Subjective:** This is a 54-year-old Caucasian female who speaks English and has been receiving care at our clinic. She presents today for a follow-up regarding her recurrent asthma and associated symptoms. The patient is experiencing persistent shortness of breath, wheezing, and a nocturnal cough that interrupts her sleep. These symptoms have been particularly troublesome over the last month despite adherence to her current medication regimen. She denies smoking and has reported no recent exposure to known allergens or new environmental triggers.  Her past medical history is significant for recurrent asthma, an acute myocardial infarction, hypertension, hypercholesterolemia, lumbar disc displacement leading to chronic lumbago, cervical dysplasia, and a panic disorder. Her medications include fluticasone and an albuterol inhaler for asthma, along with antihypertensive and lipid-lowering ...
```

**Document 3**  
- **Relevance Score:** `0.6027`  
- **Patient Num:** `1000000123`  
- **Encounter Num:** `475208`  
- **Start Date:** `11/27/2002`  
- **Document ID:** `699bf4d2-7faa-4c3d-bf47-53e1bd00e4e7`  
- **Excerpt:**

```text
**Visit Information:** - Patient ID: 1000000123 - Encounter ID: 475208 - Visit Date: November 27, 2002  **Subjective:** This is a 19-year-old Indian male who has been under our care for asthma management. The patient speaks German and came to the clinic on November 27, 2002, for a follow-up visit. Since his last appointment in March 2001, he has experienced an uptick in both the frequency and severity of his asthma attacks. Recently, his nocturnal asthma episodes have become more disruptive, leading to sleep disturbances and affecting his daily function and academic performance. The patient reports consistent symptoms of wheezing, shortness of breath, and chest tightness that are exacerbated by physical activities and cold weather. He has been reliant on his rescue inhaler, using it three to four times a day with only partial relief.  His medical history includes chronic asthma, which has persisted since childhood. He is currently managed with a fluticasone/salmeterol inhaler taken twi...
```

**Document 4**  
- **Relevance Score:** `0.6026`  
- **Patient Num:** `1000000005`  
- **Encounter Num:** `475726`  
- **Start Date:** `07/02/2003`  
- **Document ID:** `8b168d1f-5e49-436b-b6ba-c208abc4c080`  
- **Excerpt:**

```text
**Visit Information:** - Patient ID: 1000000005 - Encounter ID: 475726 - Visit Date: July 2, 2003  **Subjective:**  This is a 30-year-old Hispanic female who is an English speaker, followed at our clinic for asthma management. Today, she presents for a routine follow-up visit. The patient reports persistent symptoms of asthma, including increased shortness of breath, wheezing, and a nocturnal cough that disrupts her sleep. She mentions that these symptoms have worsened over the past two weeks despite regular use of her asthma medications. She is diligent with her albuterol inhaler (used as needed), fluticasone inhaler (daily), and montelukast (nightly), yet finds that the albuterol offers only temporary relief.  Her medical history includes asthma with multiple past exacerbations usually triggered by allergens or respiratory infections. There are no past surgeries of note. Socially, she resides in an urban environment with high allergen exposure and works as a primary school teacher. S...
```

**Document 5**  
- **Relevance Score:** `0.5959`  
- **Patient Num:** `1000000011`  
- **Encounter Num:** `476451`  
- **Start Date:** `03/16/2004`  
- **Document ID:** `c63140ad-4505-4b32-8e9c-41fb7126a0df`  
- **Excerpt:**

```text
## SOAP Note  **Visit Information:** - Patient ID: 1000000011 - Encounter ID: 476451 - Visit Date: March 16, 2004  **Subjective:** This is a 55-year-old Caucasian female who speaks English and has been receiving care at our clinic for the past several months. She presents today for a follow-up visit primarily concerning her asthma, which has been problematic over the last month. The patient reports persistent shortness of breath, wheezing, and a nocturnal cough, hindering her sleep quality despite adherence to her prescribed medication regimen. She denies recent smoking, exposure to known allergens, or new environmental triggers.  Her past medical history includes recurrent asthma, a previous myocardial infarction, hypertension, hypercholesterolemia, lumbar disc displacement with chronic lumbago, cervical dysplasia, and a panic disorder. She is currently taking fluticasone and an albuterol inhaler for asthma management, along with antihypertensive and lipid-lowering medications. The pa...
```

**Document 6**  
- **Relevance Score:** `0.5955`  
- **Patient Num:** `1000000123`  
- **Encounter Num:** `471535`  
- **Start Date:** `01/05/1998`  
- **Document ID:** `1001ffce-5a9d-4685-b03e-6e70aaefc36e`  
- **Excerpt:**

```text
**Visit Information:** - Patient ID: 1000000123 - Encounter ID: 471535 - Visit Date: January 5, 1998  **Subjective:** This is a 15-year-old Indian male who has been a patient at our clinic, presenting today for a follow-up visit. He speaks German and has a well-documented history of asthma. The patient reports experiencing increased frequency and severity of asthma attacks over the past month, particularly at night, resulting in disturbed sleep and difficulty concentrating at school. He describes his symptoms as wheezing, shortness of breath, and chest tightness, often triggered by exercise and cold air. He has been using his rescue inhaler more frequently, approximately three to four times daily, which provides temporary relief.   Past medical history is significant for persistent asthma since early childhood, managed with inhaled corticosteroids and a rescue inhaler. He has not undergone any surgeries. Currently, he is taking fluticasone/salmeterol inhaler twice daily and albuterol a...
```

**Document 7**  
- **Relevance Score:** `0.5952`  
- **Patient Num:** `1000000005`  
- **Encounter Num:** `481497`  
- **Start Date:** `09/02/2008`  
- **Document ID:** `ec8441dc-db63-4d89-a41c-6751e0223089`  
- **Excerpt:**

```text
Visit Information:    - Patient ID: 1000000005    - Encounter ID: 481497    - Visit Date: 09/02/2008  This is a 35-year-old Hispanic female who has been a patient at our clinic for over a year, presenting today for follow-up on her chronic asthma and routine health examination. The patient speaks English well. She reports her asthma symptoms have worsened over the past two weeks, including increased shortness of breath, constant wheezing, and persistent cough. Despite regular use of an albuterol inhaler, daily fluticasone (an inhaled corticosteroid), and nightly montelukast, her symptom control remains poor with only short-term relief from albuterol.  She has a history of episodic asthma without status asthmaticus, sarcoidosis, pain in limbs, and disturbance of skin sensation. There is no history of smoking or alcohol use. She lives in an urban area with high allergen exposure and works as a primary school teacher, which might contribute to her symptoms.  During the physical examinatio...
```

**Document 8**  
- **Relevance Score:** `0.5939`  
- **Patient Num:** `1000000005`  
- **Encounter Num:** `479401`  
- **Start Date:** `12/29/2006`  
- **Document ID:** `40f1f276-7805-44b3-8138-439d08191300`  
- **Excerpt:**

```text
 **Visit Information:** - Patient ID: 1000000005 - Encounter ID: 479401 - Visit Date: Dec 29, 2006  **Subjective:**  This is a 33-year-old Hispanic female who has been receiving care at our clinic over several visits. She speaks English and presented today for a follow-up visit to manage her asthma. She reports that her asthma symptoms, particularly shortness of breath, wheezing, and a persistent cough, have been worsening over the past two weeks despite adhering to her medication regimen which includes an albuterol inhaler as needed, a daily inhaled corticosteroid (fluticasone), and nightly montelukast. She notes that while the albuterol offers brief symptom relief, it has been insufficient to manage her exacerbations effectively.  Her medical history includes chronic asthma without status asthmaticus, non-specific epigastric and upper left quadrant abdominal pain, irritable bowel syndrome, painful respiration, and periodic vaginal infections. She is a non-smoker and non-drinker, livi...
```

**Document 9**  
- **Relevance Score:** `0.5904`  
- **Patient Num:** `1000000058`  
- **Encounter Num:** `471856`  
- **Start Date:** `09/08/1998`  
- **Document ID:** `accd3c7a-a206-48e1-848c-873e921c716b`  
- **Excerpt:**

```text
**Visit Information:** - Patient ID: 1000000058 - Encounter ID: 471856 - Visit Date: 09/08/1998  This is a 7-year-old Hispanic male who has been receiving care at our clinic for asthma and related conditions since the age of three. He returns today for a follow-up appointment accompanied by his mother, whose primary language is Spanish. They are here to discuss the management of his asthma and associated respiratory issues.  **Subjective:** The patient’s mother reports an increased frequency of coughing and wheezing over the past month, particularly disrupting sleep at night and occurring after physical activity. There have been no recent episodes of fever or respiratory infections. The patient has been using his albuterol inhaler approximately three times per week.  The patient’s past medical history is significant for asthma and seasonal allergies, with several years of documentation. His surgical history includes a unilateral repair of an indirect inguinal hernia and excision of a h...
```

**Document 10**  
- **Relevance Score:** `0.5899`  
- **Patient Num:** `1000000024`  
- **Encounter Num:** `482121`  
- **Start Date:** `01/02/2009`  
- **Document ID:** `e72c835f-d6e4-4974-befb-546939584d85`  
- **Excerpt:**

```text
**Visit Information:** - **Patient ID:** 1000000024 - **Encounter ID:** 482121 - **Visit Date:** January 2, 2009  **Subjective:** This is a 30-year-old Black male who has been receiving care at our clinic for approximately 3 months. He speaks English and presents today for a follow-up of his chronic asthma, which is the primary reason for his visit. He mentions intermittent symptoms of wheezing and shortness of breath, particularly at night and when engaging in physical activity. His asthma symptoms typically respond to his prescribed inhaled corticosteroid (fluticasone) and rescue inhaler (albuterol). However, he experiences symptomatic episodes roughly twice a week. The patient denies any recent respiratory infections or emergency department visits related to asthma exacerbation.   Additionally, he reports a persistent rash that first appeared two weeks ago. He does not attribute this to any known allergens or recent changes in his environment. He notes some mild itching but no signi...
```
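
The ten results above include several encounters for the same patient (patient `1000000005` appears four times). When only the most recent encounter per patient is wanted, the hits can be reduced after retrieval. The sketch below copies the patient number, encounter number, start date, and score from the output above into plain tuples. The helper name `latest_per_patient` and the `%m/%d/%Y` date format are assumptions based on how `start_date` is displayed.

```python
from datetime import datetime

# (patient_num, encounter_num, start_date, relevance_score) copied from the output above
hits = [
    (1000000005, 477663, "06/21/2005", 0.6289),
    (1000000011, 476139, "11/20/2003", 0.6143),
    (1000000123, 475208, "11/27/2002", 0.6027),
    (1000000005, 475726, "07/02/2003", 0.6026),
    (1000000011, 476451, "03/16/2004", 0.5959),
    (1000000123, 471535, "01/05/1998", 0.5955),
    (1000000005, 481497, "09/02/2008", 0.5952),
    (1000000005, 479401, "12/29/2006", 0.5939),
    (1000000058, 471856, "09/08/1998", 0.5904),
    (1000000024, 482121, "01/02/2009", 0.5899),
]

def latest_per_patient(hits, date_format="%m/%d/%Y"):
    """Keep only the hit with the latest start_date for each patient."""
    latest = {}
    for hit in hits:
        patient_num, _, start_date, _ = hit
        visit_date = datetime.strptime(start_date, date_format)
        current = latest.get(patient_num)
        if current is None or visit_date > current[0]:
            latest[patient_num] = (visit_date, hit)
    return [hit for _, hit in latest.values()]

latest_hits = latest_per_patient(hits)
for patient_num, encounter_num, start_date, score in latest_hits:
    print(patient_num, encounter_num, start_date, score)
```

Note that the most recent encounter is not necessarily the highest-scoring one for a patient, so this filter changes which note text reaches the LLM.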

In [69]:
# -----------------------------------------------------------
# 4.2. Using a Retriever with a Score Threshold
# -----------------------------------------------------------
# This cell demonstrates how to configure a retriever that returns only documents
# whose similarity scores exceed a defined threshold.

# Key Parameters:
# - search_type="similarity_score_threshold":
#     Instructs the retriever to filter results by a minimum score.
# - search_kwargs={"k": 10, "score_threshold": 0.6}:
#     - k: Number of top-ranked documents to consider.
#     - score_threshold: Minimum relevance score required for inclusion.

# Purpose:
# This approach increases precision by filtering out low-quality matches.
# It is especially useful in clinical settings where retrieval accuracy is essential.

from IPython.display import display, Markdown

retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={
        "k": 10,
        "score_threshold": 0.6
    }
)

results = retriever.invoke(query)

display(Markdown("### Retrieved Clinical Notes (Score ≥ 0.6)"))

for idx, doc in enumerate(results, 1):
    patient = doc.metadata.get("patient_num", "N/A")
    date = doc.metadata.get("start_date", "N/A")
    doc_id = doc.id
    excerpt = doc.page_content[:1000].replace("\n", " ")

    display(Markdown(
        f"**Document {idx}**  \n"
        f"- **Patient Num:** `{patient}`  \n"
        f"- **Start Date:** `{date}`  \n"
        f"- **Document ID:** `{doc_id}`  \n"
        f"- **Excerpt:**\n\n```text\n{excerpt}...\n```"
    ))

display(Markdown(f"**Total relevant results:** {len(results)}"))


### Retrieved Clinical Notes (Score ≥ 0.6)

**Document 1**  
- **Patient Num:** `1000000005`  
- **Start Date:** `06/21/2005`  
- **Document ID:** `7b43b174-5d83-480d-a166-d5510a9f57a1`  
- **Excerpt:**

```text
**Visit Information:** - Patient ID: 1000000005 - Encounter ID: 477663 - Visit Date: June 21, 2005  **Subjective:**  This is a 32-year-old Hispanic female who has been receiving care at our clinic for approximately 3 months. She speaks English and presents today for a follow-up visit scheduled as part of her ongoing asthma management. The patient reports continued asthma symptoms, including persistent shortness of breath, wheezing particularly at night, and a cough that disrupts her sleep. Over the past two weeks, these symptoms have intensified despite regular usage of her asthma medications: albuterol inhaler for rescue, daily fluticasone inhaler, and nightly montelukast. She states the albuterol provides only brief relief.  She has a medical history of unspecified asthma without mention of status asthmaticus, back sprain from unspecified causes, vaginitis, and a previous high-risk pregnancy requiring special investigations. Her social history includes living in an urban environment ...
```

**Document 2**  
- **Patient Num:** `1000000011`  
- **Start Date:** `11/20/2003`  
- **Document ID:** `82e011c9-1203-40c0-95de-d1c3757d44b4`  
- **Excerpt:**

```text
**Visit Information:** - Patient ID: 1000000011 - Encounter ID: 476139 - Visit Date: Nov 20, 2003  **Subjective:** This is a 54-year-old Caucasian female who speaks English and has been receiving care at our clinic. She presents today for a follow-up regarding her recurrent asthma and associated symptoms. The patient is experiencing persistent shortness of breath, wheezing, and a nocturnal cough that interrupts her sleep. These symptoms have been particularly troublesome over the last month despite adherence to her current medication regimen. She denies smoking and has reported no recent exposure to known allergens or new environmental triggers.  Her past medical history is significant for recurrent asthma, an acute myocardial infarction, hypertension, hypercholesterolemia, lumbar disc displacement leading to chronic lumbago, cervical dysplasia, and a panic disorder. Her medications include fluticasone and an albuterol inhaler for asthma, along with antihypertensive and lipid-lowering ...
```

**Document 3**  
- **Patient Num:** `1000000123`  
- **Start Date:** `11/27/2002`  
- **Document ID:** `699bf4d2-7faa-4c3d-bf47-53e1bd00e4e7`  
- **Excerpt:**

```text
**Visit Information:** - Patient ID: 1000000123 - Encounter ID: 475208 - Visit Date: November 27, 2002  **Subjective:** This is a 19-year-old Indian male who has been under our care for asthma management. The patient speaks German and came to the clinic on November 27, 2002, for a follow-up visit. Since his last appointment in March 2001, he has experienced an uptick in both the frequency and severity of his asthma attacks. Recently, his nocturnal asthma episodes have become more disruptive, leading to sleep disturbances and affecting his daily function and academic performance. The patient reports consistent symptoms of wheezing, shortness of breath, and chest tightness that are exacerbated by physical activities and cold weather. He has been reliant on his rescue inhaler, using it three to four times a day with only partial relief.  His medical history includes chronic asthma, which has persisted since childhood. He is currently managed with a fluticasone/salmeterol inhaler taken twi...
```

**Document 4**  
- **Patient Num:** `1000000005`  
- **Start Date:** `07/02/2003`  
- **Document ID:** `8b168d1f-5e49-436b-b6ba-c208abc4c080`  
- **Excerpt:**

```text
**Visit Information:** - Patient ID: 1000000005 - Encounter ID: 475726 - Visit Date: July 2, 2003  **Subjective:**  This is a 30-year-old Hispanic female who is an English speaker, followed at our clinic for asthma management. Today, she presents for a routine follow-up visit. The patient reports persistent symptoms of asthma, including increased shortness of breath, wheezing, and a nocturnal cough that disrupts her sleep. She mentions that these symptoms have worsened over the past two weeks despite regular use of her asthma medications. She is diligent with her albuterol inhaler (used as needed), fluticasone inhaler (daily), and montelukast (nightly), yet finds that the albuterol offers only temporary relief.  Her medical history includes asthma with multiple past exacerbations usually triggered by allergens or respiratory infections. There are no past surgeries of note. Socially, she resides in an urban environment with high allergen exposure and works as a primary school teacher. S...
```

**Total relevant results:** 4
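
The count of 4 is consistent with the scores printed in step 4.1: only the first four notes score at or above 0.6. The sketch below reproduces that filter on the encounter numbers and scores copied from the earlier output. It assumes the retriever keeps a hit when its score is greater than or equal to the threshold; `filter_by_threshold` is a hypothetical helper, not a library function.

```python
# (encounter_num, relevance_score) pairs copied from the step 4.1 output
scored_hits = [
    (477663, 0.6289),
    (476139, 0.6143),
    (475208, 0.6027),
    (475726, 0.6026),
    (476451, 0.5959),
    (471535, 0.5955),
    (481497, 0.5952),
    (479401, 0.5939),
    (471856, 0.5904),
    (482121, 0.5899),
]

def filter_by_threshold(scored_hits, score_threshold):
    """Keep only the hits whose score is at or above the threshold."""
    return [(enc, score) for enc, score in scored_hits if score >= score_threshold]

kept = filter_by_threshold(scored_hits, 0.6)
print(len(kept), [enc for enc, _ in kept])
```

Because the top scores for this query sit in a narrow band, a small change in the threshold changes the result count sharply, so the value is worth tuning per query set rather than fixing once.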

In [112]:
# -----------------------------------------------------------
# 4.3. Performing Maximal Marginal Relevance (MMR) Search
# -----------------------------------------------------------
# This cell retrieves clinical notes using Maximal Marginal Relevance (MMR),
# which balances relevance to the query and diversity across the results.

# Key Parameters:
# - max_marginal_relevance_search(): Retrieves results using MMR.
# - fetch_k=500: Number of top documents considered before applying MMR.
# - k=5: Final number of documents returned.
# - lambda_mult:
#     - 0.0 → maximize diversity
#     - 1.0 → maximize relevance
#     - 0.5 → balance between the two

# Purpose:
# MMR reduces redundancy while maintaining relevance, useful when diverse perspectives
# on a clinical topic (e.g., treatment variations) are desired.

from IPython.display import display, Markdown

results = vectorstore.max_marginal_relevance_search(
    query=query,
    k=5,
    fetch_k=500,
    lambda_mult=0.5
)

display(Markdown("### Retrieved Clinical Notes Using MMR Search"))

for idx, doc in enumerate(results, 1):
    patient = doc.metadata.get("patient_num", "N/A")
    date = doc.metadata.get("start_date", "N/A")
    doc_id = getattr(doc, "id", "N/A")
    excerpt = doc.page_content[:1000].replace("\n", " ")

    display(Markdown(
        f"**Document {idx}**  \n"
        f"- **Patient Num:** `{patient}`  \n"
        f"- **Start Date:** `{date}`  \n"
        f"- **Document ID:** `{doc_id}`  \n"
        f"- **Excerpt:**\n\n```text\n{excerpt}...\n```"
    ))

display(Markdown(f"**Total results returned:** `{len(results)}`"))


### Retrieved Clinical Notes Using MMR Search

**Document 1**  
- **Patient Num:** `1000000005`  
- **Start Date:** `06/21/2005`  
- **Document ID:** `7b43b174-5d83-480d-a166-d5510a9f57a1`  
- **Excerpt:**

```text
**Visit Information:** - Patient ID: 1000000005 - Encounter ID: 477663 - Visit Date: June 21, 2005  **Subjective:**  This is a 32-year-old Hispanic female who has been receiving care at our clinic for approximately 3 months. She speaks English and presents today for a follow-up visit scheduled as part of her ongoing asthma management. The patient reports continued asthma symptoms, including persistent shortness of breath, wheezing particularly at night, and a cough that disrupts her sleep. Over the past two weeks, these symptoms have intensified despite regular usage of her asthma medications: albuterol inhaler for rescue, daily fluticasone inhaler, and nightly montelukast. She states the albuterol provides only brief relief.  She has a medical history of unspecified asthma without mention of status asthmaticus, back sprain from unspecified causes, vaginitis, and a previous high-risk pregnancy requiring special investigations. Her social history includes living in an urban environment ...
```

**Document 2**  
- **Patient Num:** `1000000088`  
- **Start Date:** `10/28/2004`  
- **Document ID:** `8fbe0ce5-5dc8-4129-b31d-7398336579c3`  
- **Excerpt:**

```text
Visit Information: - Patient ID: 1000000088 - Encounter ID: 477031 - Visit Date: Oct 28 2004  This is a 9-year-old Asian male, a German speaker, who has been receiving care at our clinic for the past two years. He presents today for a follow-up visit primarily to address complications associated with his asthma.  Subjective: The young patient and his mother report episodes of increased wheezing and shortness of breath occurring predominantly at night. The mother notes that the use of the albuterol inhaler has significantly increased, and he experiences more frequent coughing and chest tightness, particularly with physical activity or exposure to allergens like pollen and dust. He has an established history of asthma and allergic rhinitis. Moreover, his medical history is significant for cystic fibrosis, postinflammatory pulmonary fibrosis, and concerns regarding his growth and nutritional status. Current medications include a daily inhaled corticosteroid for asthma, albuterol for acute...
```

**Document 3**  
- **Patient Num:** `1000000105`  
- **Start Date:** `02/04/2009`  
- **Document ID:** `a47ad10f-10dc-4949-9973-b3c84e34a2e0`  
- **Excerpt:**

```text
This is a 65-year-old Hispanic male who has been receiving care at our clinic for the past few months. He speaks German and presented today, February 4, 2009, for a follow-up concerning his asthma and overall health management. The patient has a primary diagnosis of unspecified asthma, experiencing increased asthma attacks over the past two weeks, particularly in the early mornings and late evenings. These episodes are characterized by significant shortness of breath and wheezing. The patient has been using his rescue inhaler, Albuterol, 5-6 times per day but denies any nocturnal symptoms affecting his sleep.  The patient has a comprehensive medical history, which includes a chronic obstructive form of asthma, a history of malignant neoplasm of the prostate, previously treated with radical prostatectomy, and hypertension. Other notable conditions include a history of tobacco use disorder and prior preoperative cardiovascular examinations. He also underwent a simple excision of a lympha...
```

**Document 4**  
- **Patient Num:** `1000000112`  
- **Start Date:** `06/10/2005`  
- **Document ID:** `49873e65-4065-4844-9e4f-c4e96e74c72c`  
- **Excerpt:**

```text
Visit Information: - Patient ID: 1000000112 - Encounter ID: 477645 - Visit Date: 06/10/2005  This is a 12-year-old Black male who has been receiving care at our clinic for approximately three years. He is Spanish-speaking and presented today, June 10, 2005, for a follow-up visit concerning his asthma management.  Subjective: The primary concerns today remain focused on the patient’s chronic asthma. Over the past week, he has been experiencing increased symptoms, including wheezing, dyspnea, particularly during physical activity, and nocturnal cough, which have necessitated an increased use of his Atrovent inhaler. The patient denies any associated fever or chest pain. Despite adherence to his current medication regimen, these symptoms have persisted, indicating suboptimal asthma control.  The patient’s medical history is significant for multiple diagnoses of asthma, with subtypes including acute exacerbations and status asthmaticus, as well as past pneumonia and pulmonary collapse. The...
```

**Document 5**  
- **Patient Num:** `1000000110`  
- **Start Date:** `11/13/2008`  
- **Document ID:** `196978cf-15ed-448b-a696-0d780eb222ad`  
- **Excerpt:**

```text
Visit Information: - Patient ID: 1000000110 - Encounter ID: 481878 - Visit Date: 11/13/2008  Subjective: This is a 36-year-old Hispanic male, fluent in German, who is presenting for a follow-up visit. He has been previously diagnosed with multiple chronic conditions, primarily asthma, which he manages with inhaled corticosteroids. He reports that his asthma symptoms, including intermittent wheezing and shortness of breath, are predominantly triggered by exposure to pollen and dust. There have been no recent severe exacerbations requiring hospitalization.   In addition to asthma, the patient is monitoring several coexisting conditions. He has a history of both rheumatoid arthritis and polymyositis, which contribute to generalized fatigue and periodic limb pain, respectively. These symptoms have been relatively stable with his current medication regimen including Prednisone. He also reports occasional tingling in his extremities secondary to mononeuritis, but no recent exacerbations.  Fo...
```

**Total results returned:** `5`

## 5. Generating Structured Responses with an LLM

In this section, we take the clinical notes retrieved in the previous step and pass them into a Large Language Model (LLM) for analysis and summarization. This completes the RAG (Retrieval-Augmented Generation) workflow.

### Key Steps:

1. **Creating a Prompt Template for LLM Querying (Step 5.1)**
   - Defines a reusable prompt structure to guide the LLM in analyzing and summarizing clinical notes.
   - Ensures the output is consistent, structured, and clinically useful.

2. **Invoking the LLM with Retrieved Context (Step 5.2)**
   - Inserts the retrieved documents into the prompt.
   - Sends the final prompt to an LLM for generation: a local model served through Ollama in this notebook, though a hosted model (e.g., Azure-hosted GPT-4 or GPT-4o) can be substituted.
   - Outputs a structured answer to a medical query.

### Purpose

This final step showcases how LLMs can generate rich, relevant summaries or extractions from retrieved clinical data. It is particularly useful for clinical decision support, patient summarization, or intelligent search.

<img src="./images/rag_generation.png" alt="RAG Generation" width="1250">


In [123]:
# -----------------------------------------------------------
# 5.1. Create a Prompt Template for LLM Querying
# -----------------------------------------------------------
# This prompt template guides the LLM to generate structured, clinically relevant responses
# from retrieved clinical notes. The template is dynamic and reusable.

# Context:
# - Each clinical note is associated with metadata (patient_num, encounter_num, start_date).
# - These identifiers help structure the output and ensure traceability.

# Key Components:
# - PromptTemplate.from_template(): Allows dynamic substitution of note content and query.
# - {retrieved_docs}: Injects top-matching clinical notes as the context for the model.
# - {query}: Represents the user's clinical question.
# - Output format:
#     - Patient Num, Gender, Age, Race
#     - Encounter, Visit Date
#     - Summary of findings related to the query
#     - Has Asthma (Yes/No)

# Purpose:
# This prompt asks the LLM to provide:
# - Patient-specific insights
# - Structured and traceable outputs
# - A clear response aligned with the clinical question
# - Answers based only on the most recent note per patient

from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template(
    "You are a medical assistant analyzing clinical notes.\n\n"

    "Answer the following question: {query}\n\n"

    "Based on the following records: {retrieved_docs}\n\n"

    "Provide your response using the following structure:\n"
    "- Patient Num: <patient_num>, Gender: <value>, Age: <value>, Race: <value>\n"
    "- Encounter: <encounter_num>, Visit Date: <start_date>\n"
    "- Summary: One paragraph summarizing the note and one paragraph answering the question.\n"
    # Each output field ends with a newline so it does not run into the "Instructions:" header
    "- Has Asthma: <Yes/No>\n\n"
    "Instructions:\n"
    "- Include all patients relevant to the query.\n"
    "- Use only the most recent note for each patient (identified by patient_num).\n"
)
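
To make the substitution step concrete, here is a minimal sketch of how the two placeholders are filled. It assumes the default f-string-style formatting of `PromptTemplate.from_template` and imitates it with plain `str.format`, so it runs without LangChain installed. The abbreviated template text, the `render_prompt` helper, and the sample values are illustrative assumptions, not part of the notebook.

```python
# Abbreviated, plain-Python stand-in for the prompt template above (illustrative only).
TEMPLATE = (
    "You are a medical assistant analyzing clinical notes.\n\n"
    "Answer the following question: {query}\n\n"
    "Based on the following records: {retrieved_docs}\n\n"
    "Provide your response using the following structure:\n"
    "- Patient Num: <patient_num>, Gender: <value>, Age: <value>, Race: <value>\n"
)


def render_prompt(query: str, retrieved_docs: str) -> str:
    """Fill both placeholders; a missing argument raises an error immediately."""
    return TEMPLATE.format(query=query, retrieved_docs=retrieved_docs)


# Hypothetical inputs, used only to show the rendered text.
sample_prompt = render_prompt(
    query="Which patients have poorly controlled asthma?",
    retrieved_docs="[hypothetical note text]",
)
print(sample_prompt)
```

Printing the rendered prompt before sending it to the model is a cheap way to catch formatting slips, such as two fields running together on one line.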


In [124]:
# -----------------------------------------------------------
# 5.2. Use Retrieved Context to Invoke LLM and Generate Response
# -----------------------------------------------------------
# This completes the RAG workflow by passing the retrieved clinical notes
# into the prompt template and invoking a local LLM to answer the user query.

# Key Components:
#   - prompt_template.format(...): Populates the prompt with notes and query.
#   - model.invoke(final_prompt): Sends the query to the LLM.
#   - display(Markdown(response.content)): Renders the result as Markdown.

from langchain_ollama import ChatOllama

# Instantiate the local model (assumes it has already been pulled, e.g. `ollama pull qwen2`)
model = ChatOllama(model="qwen2")

# Format the final prompt using the template
final_prompt = prompt_template.format(retrieved_docs=results, query=query)

# Invoke the model (ChatOllama, AzureChatOpenAI, etc.)
response = model.invoke(final_prompt)

# Display the structured, AI-generated response
print("LLM-Generated Response:\n")
display(Markdown(response.content))
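
The cell above passes `results` to the template as-is. If `results` is a list of document objects, the prompt receives the list's Python representation rather than clean note text. The sketch below shows one possible way to build a labeled context string first. `NoteDoc` is a hypothetical stand-in that mirrors the `page_content` / `metadata` shape of a LangChain `Document`, and `format_docs`, `max_chars`, and the sample notes are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NoteDoc:
    """Hypothetical stand-in for a retrieved document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs, max_chars=1500):
    """Join notes into one context string, each prefixed with its identifiers."""
    blocks = []
    for i, doc in enumerate(docs, start=1):
        meta = doc.metadata
        header = (
            f"[Record {i}] "
            f"patient_num={meta.get('patient_num', 'unknown')} "
            f"encounter_num={meta.get('encounter_num', 'unknown')} "
            f"start_date={meta.get('start_date', 'unknown')}"
        )
        # Truncate long notes so the prompt stays within the model's context window.
        blocks.append(header + "\n" + doc.page_content[:max_chars])
    return "\n\n".join(blocks)


# Hypothetical notes; the identifiers echo the search results shown earlier.
docs = [
    NoteDoc("Follow-up visit concerning asthma management.",
            {"patient_num": "1000000112", "encounter_num": "477645",
             "start_date": "06/10/2005"}),
    NoteDoc("Follow-up visit; asthma managed with inhaled corticosteroids.",
            {"patient_num": "1000000110", "encounter_num": "481878",
             "start_date": "11/13/2008"}),
]
context = format_docs(docs)
print(context)
```

The resulting string could then be passed as `retrieved_docs=context`. Putting the identifiers in a header line gives the model an explicit source for the `Patient Num`, `Encounter`, and `Visit Date` fields requested in the output structure.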


LLM-Generated Response:



Patient Num: 1000000112, Gender: Black Male, Age: 12 years, Race: Black

Encounter: 477645, Visit Date: 06/10/2005

Summary:
The patient is a 12-year-old Black male who has been receiving care at the clinic for approximately three years. At his recent follow-up visit on June 10, 2005, he reported increased symptoms of wheezing, dyspnea during physical activity, and nocturnal cough which required an increased use of his Atrovent inhaler. Despite adherence to his current medication regimen, these symptoms had persisted, indicating suboptimal asthma control.

Plan:
Asthma management plan involves increasing the dosage of inhaled corticosteroids in combination with his inhaler therapy. A comprehensive asthma action plan will be created and a follow-up appointment scheduled for two weeks later to reassess symptom improvement and medication adherence. Respiration monitoring is recommended, especially for signs of respiratory infections, and preventive health measures including vaccinations are emphasized.

Patient Num: 1000000110, Gender: Hispanic Male, Age: 36 years, Race: Hispanic

Encounter: 481878, Visit Date: 11/13/2008

Summary:
A 36-year-old Hispanic male presented for a follow-up visit. He manages asthma with inhaled corticosteroids and has been dealing with coexisting conditions such as rheumatoid arthritis, polymyositis, mononeuritis, sinusitis, anxiety, backache, and more. All were well-controlled under the current medication regimen.

Plan:
Continued management includes maintaining the prescribed asthma medication regimen while adding Augmentin for acute sinusitis that persists beyond 1-2 weeks. Non-pharmacological strategies are recommended for anxiety and backache, including Atarax on an as-needed basis and Ibuprofen as needed. There is also a plan to monitor mononeuritis symptoms, potentially requiring further neurologic evaluation if they persist or worsen.

Patient Num: 1000000112 (repeated), Gender: Black Male, Age: 12 years, Race: Black

Encounter: 477645, Visit Date: 06/10/2005

Summary:
A summary is repeated from the previous note as it covers a relevant patient under consideration.

Plan:
As previously stated, this plan involves increasing the dosage of inhaled corticosteroids in combination with his inhaler therapy for asthma management. Respiration monitoring and preventive health measures including vaccinations are emphasized.

Patient Num: 1000000110 (repeated), Gender: Hispanic Male, Age: 36 years, Race: Hispanic

Encounter: 481878, Visit Date: 11/13/2008

Summary:
A summary is repeated from the previous note as it covers a relevant patient under consideration.

Plan:
Continued management includes maintaining the prescribed asthma medication regimen while adding Augmentin for acute sinusitis that persists beyond 1-2 weeks. Non-pharmacological strategies are recommended for anxiety and backache, including Atarax on an as-needed basis and Ibuprofen as needed. There is also a plan to monitor mononeuritis symptoms, potentially requiring further neurologic evaluation if they persist or worsen.

Patient Num: 1000000112 (repeated), Gender: Black Male, Age: 12 years, Race: Black

Encounter: 477645, Visit Date: 06/10/2005

Summary:
A summary is repeated from the previous note as it covers a relevant patient under consideration.

Plan:
As previously stated, this plan involves increasing the dosage of inhaled corticosteroids in combination with his inhaler therapy for asthma management. Respiration monitoring and preventive health measures including vaccinations are emphasized.

Patient Num: 1000000110 (repeated), Gender: Hispanic Male, Age: 36 years, Race: Hispanic

Encounter: 481878, Visit Date: 11/13/2008

Summary:
A summary is repeated from the previous note as it covers a relevant patient under consideration.

Plan:
Continued management includes maintaining the prescribed asthma medication regimen while adding Augmentin for acute sinusitis that persists beyond 1-2 weeks. Non-pharmacological strategies are recommended for anxiety and backache, including Atarax on an as-needed basis and Ibuprofen as needed. There is also a plan to monitor mononeuritis symptoms, potentially requiring further neurologic evaluation if they persist or worsen.

Regarding the instructions provided:

Yes, only the most recent note for each patient (identified by patient_num) has been included in this response. The summaries provide key details pertinent to their conditions, such as asthma management and coexisting conditions, along with specific plans tailored to their health issues.
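
The response above lists patients `1000000112` and `1000000110` several times, so the "most recent note per patient" instruction is not reliably enforced by the prompt alone. A more dependable option is to deduplicate in code before building the prompt. The following is a minimal sketch under stated assumptions: records are plain dictionaries carrying `patient_num` and `start_date`, dates use the `MM/DD/YYYY` format seen in the search results, and the helper name `latest_note_per_patient` and the earlier encounter in the sample data are hypothetical.

```python
from datetime import datetime


def latest_note_per_patient(records, date_format="%m/%d/%Y"):
    """Keep only the record with the latest start_date for each patient_num."""
    latest = {}
    for rec in records:
        visit = datetime.strptime(rec["start_date"], date_format)
        kept = latest.get(rec["patient_num"])
        if kept is None or visit > kept[0]:
            latest[rec["patient_num"]] = (visit, rec)
    # Dictionaries preserve insertion order, so patients stay in first-seen order.
    return [rec for _, rec in latest.values()]


# Sample data: the second entry is a hypothetical earlier visit for the same patient.
records = [
    {"patient_num": "1000000112", "start_date": "06/10/2005", "encounter_num": "477645"},
    {"patient_num": "1000000112", "start_date": "01/15/2004", "encounter_num": "hypothetical-earlier"},
    {"patient_num": "1000000110", "start_date": "11/13/2008", "encounter_num": "481878"},
]
unique_records = latest_note_per_patient(records)
print([r["encounter_num"] for r in unique_records])
```

Filtering before generation keeps the rule deterministic and leaves the model with fewer, non-overlapping records to summarize.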