![i2b2 Logo](images/transmart-logo.png)

# Using LLM + Embeddings to Search Patient Notes (RAG - Basics)

This notebook demonstrates how to use **local embeddings** and **retrieval-augmented generation (RAG)** to search and analyze clinical notes stored in an i2b2-like format. It shows how to decode notes, embed them using MiniLM, retrieve similar cases using FAISS, and use a local LLM (e.g., LLaMA 3 via Ollama) to generate structured, clinical responses.

### 🧠 Key Concepts Covered:

- Decoding hex-encoded clinical notes
- Creating semantic vector embeddings with `MiniLM`
- Building a local FAISS vector store
- Performing similarity search and interpreting the returned scores
- Filtering to only include the **most recent encounter per patient**
- Injecting retrieved context into a structured prompt
- Using a local LLM (Ollama) to generate medical insights

Each cell builds upon the previous one to simulate a full RAG workflow adapted for **clinical informatics** scenarios using i2b2-like data.

> This notebook is part of the workshop: _Using LLMs to Search Patient Notes_.


In [2]:
# -----------------------------------------------------------
# 1. Load and Explore Visit Data from i2b2-Mimicking CSV
# -----------------------------------------------------------
# This cell demonstrates how to load and inspect clinical notes
# stored as hexadecimal strings in a CSV that mimics the i2b2 `visit_dimension` table.
# Each record contains:
#   - encounter_num: Unique encounter ID
#   - patient_num: Patient identifier
#   - start_date, end_date: Visit timestamps
#   - location_cd, location_path: Clinic/service details
#   - visit_blob: Hex-encoded clinical note text (prefixed with 0x)

# Import required library
import pandas as pd

# Define path to the input data file
csv_path = "datafiles/i2b2_encounter_table.csv"

# Load the CSV into a pandas DataFrame
df = pd.read_csv(csv_path)

# Preview the first 10 rows to understand structure
df.head(10)


Unnamed: 0,encounter_num,patient_num,start_date,end_date,inout_cd,location_cd,location_path,visit_blob
0,475303,1000000001,01/16/2003,01/16/2003,O,ASTHMA_CLINIC,\Hospital\Clinic\Pulmonary\Asthma\\,0x2A2A566973697420496E666F726D6174696F6E3A2A2A...
1,479681,1000000001,03/29/2007,03/29/2007,O,ASTHMA_CLINIC,\Hospital\Clinic\Pulmonary\Asthma\\,0x2A2A566973697420496E666F726D6174696F6E3A2A2A...
2,480315,1000000001,09/20/2007,09/20/2007,O,ASTHMA_CLINIC,\Hospital\Clinic\Pulmonary\Asthma\\,0x2A2A566973697420496E666F726D6174696F6E3A2A2A...
3,480903,1000000001,03/04/2008,03/04/2008,O,ASTHMA_CLINIC,\Hospital\Clinic\Pulmonary\Asthma\\,0x5468697320697320612032332D796561722D6F6C6420...
4,481398,1000000001,08/11/2008,08/11/2008,O,ASTHMA_CLINIC,\Hospital\Clinic\Pulmonary\Asthma\\,0x566973697420496E666F726D6174696F6E3A20202020...
5,482655,1000000001,05/18/2009,05/18/2009,O,ASTHMA_CLINIC,\Hospital\Clinic\Pulmonary\Asthma\\,0x566973697420496E666F726D6174696F6E3A0A202020...
6,471658,1000000002,04/17/1998,04/17/1998,O,GEN_MED_OUTPATIENT,\Hospital\Outpatient\GeneralMedicine\\,0x2A2A566973697420496E666F726D6174696F6E3A2A2A...
7,472076,1000000002,01/04/1999,01/04/1999,O,GEN_MED_OUTPATIENT,\Hospital\Outpatient\GeneralMedicine\\,0x2A2A566973697420496E666F726D6174696F6E3A2A2A...
8,472473,1000000002,08/12/1999,08/12/1999,O,GEN_MED_OUTPATIENT,\Hospital\Outpatient\GeneralMedicine\\,0x2A2A566973697420496E666F726D6174696F6E3A2A2A...
9,472692,1000000002,12/01/1999,12/01/1999,O,GEN_MED_OUTPATIENT,\Hospital\Outpatient\GeneralMedicine\\,0x2A2A566973697420496E666F726D6174696F6E3A2A2A...


In [3]:
# -----------------------------------------------------------
# 2. Decode Hex-Encoded Clinical Notes and Prepare Text Corpus
# -----------------------------------------------------------
# This cell decodes the clinical notes stored as hexadecimal strings
# and creates a clean `note_text` column for downstream processing.

import binascii
from IPython.display import display, Markdown

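def decode_note(hex_blob):
    """Decode a single hex-encoded note string (with an optional '0x' prefix)."""
    # Strip only the leading prefix rather than every "0x" occurrence
    hex_str = hex_blob[2:] if hex_blob.startswith("0x") else hex_blob
    return binascii.unhexlify(hex_str).decode("utf-8", errors="ignore")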

# Apply decoding to the entire DataFrame
df["note_text"] = df["visit_blob"].apply(decode_note)

# Preview the first 10 decoded notes
display(df.head(10))

# Show one example decoded note (adjust index as needed)
example_index = 5  # Ensure this index exists in your DataFrame
display(Markdown(f"### Decoded Note Example (Row {example_index}):\n\n```\n{df['note_text'][example_index]}\n```"))


Unnamed: 0,encounter_num,patient_num,start_date,end_date,inout_cd,location_cd,location_path,visit_blob,note_text
0,475303,1000000001,01/16/2003,01/16/2003,O,ASTHMA_CLINIC,\Hospital\Clinic\Pulmonary\Asthma\\,0x2A2A566973697420496E666F726D6174696F6E3A2A2A...,**Visit Information:**\n - Patient ID: 10000...
1,479681,1000000001,03/29/2007,03/29/2007,O,ASTHMA_CLINIC,\Hospital\Clinic\Pulmonary\Asthma\\,0x2A2A566973697420496E666F726D6174696F6E3A2A2A...,**Visit Information:**\n- Patient ID: 10000000...
2,480315,1000000001,09/20/2007,09/20/2007,O,ASTHMA_CLINIC,\Hospital\Clinic\Pulmonary\Asthma\\,0x2A2A566973697420496E666F726D6174696F6E3A2A2A...,**Visit Information:**\n\n - Patient ID: 10...
3,480903,1000000001,03/04/2008,03/04/2008,O,ASTHMA_CLINIC,\Hospital\Clinic\Pulmonary\Asthma\\,0x5468697320697320612032332D796561722D6F6C6420...,This is a 23-year-old Black female who has bee...
4,481398,1000000001,08/11/2008,08/11/2008,O,ASTHMA_CLINIC,\Hospital\Clinic\Pulmonary\Asthma\\,0x566973697420496E666F726D6174696F6E3A20202020...,Visit Information: \n - Patient ID: 1000...
5,482655,1000000001,05/18/2009,05/18/2009,O,ASTHMA_CLINIC,\Hospital\Clinic\Pulmonary\Asthma\\,0x566973697420496E666F726D6174696F6E3A0A202020...,Visit Information:\n - Patient ID: 10000000...
6,471658,1000000002,04/17/1998,04/17/1998,O,GEN_MED_OUTPATIENT,\Hospital\Outpatient\GeneralMedicine\\,0x2A2A566973697420496E666F726D6174696F6E3A2A2A...,**Visit Information:**\n- Patient ID: 10000000...
7,472076,1000000002,01/04/1999,01/04/1999,O,GEN_MED_OUTPATIENT,\Hospital\Outpatient\GeneralMedicine\\,0x2A2A566973697420496E666F726D6174696F6E3A2A2A...,**Visit Information:**\n- Patient ID: 10000000...
8,472473,1000000002,08/12/1999,08/12/1999,O,GEN_MED_OUTPATIENT,\Hospital\Outpatient\GeneralMedicine\\,0x2A2A566973697420496E666F726D6174696F6E3A2A2A...,**Visit Information:**\n- Patient ID: 10000000...
9,472692,1000000002,12/01/1999,12/01/1999,O,GEN_MED_OUTPATIENT,\Hospital\Outpatient\GeneralMedicine\\,0x2A2A566973697420496E666F726D6174696F6E3A2A2A...,**Visit Information:**\n- Patient ID: 10000000...


### Decoded Note Example (Row 5):

```
Visit Information:
    - Patient ID: 1000000001
    - Encounter ID: 482655
    - Visit Date: 05/18/2009

A 24-year-old Black female, who has been receiving care at our clinic for the past several months, presented on 05/18/2009 for a follow-up regarding her asthma management. She reports experiencing increased shortness of breath, persistent coughing, and wheezing, which have become particularly troublesome at night and during physical exertion. These symptoms align with her diagnosed cough variant asthma. Additionally, she mentions experiencing dizziness and occasional irregular menstrual cycles. There are no reports of chest pain or hemoptysis.

Her past medical history is significant for asthma, cardiac dysrhythmia, gastroesophageal reflux disease (GERD), and unspecified vaginitis. She completed a course of Penicillin for a previously diagnosed urinary tract infection but continues to experience urinary frequency. She has no history of smoking or alcohol consumption. Currently, she is on Albuterol Sulfate tablets, Maxair Autohaler, and Prilosec capsules for asthma and GERD, as well as Apri for contraception, Acyclovir for viral management, and Tetracycline for an unspecified reason. She resides with her sister and works part-time in retail, dealing with considerable stress related to her job and studies.

On physical examination, she appeared alert and oriented, with vital signs recorded as follows: blood pressure 110/72 mmHg, heart rate 78 beats per minute, respiratory rate 18 breaths per minute, and temperature 99.0°F. The respiratory exam uncovered bilateral expiratory wheezes with an extended expiratory phase, though there were no signs of accessory muscle use or cyanosis. Cardiovascular exam was unremarkable, demonstrating a regular heart rate and rhythm without murmurs. Abdominal examination showed no tenderness or masses. The musculoskeletal exam noted mild lower back tenderness without significant deformity or neurological deficits. Her skin remained clear of any new lesions, indicating controlled dermatitis. Recent laboratory tests, including complete blood count and urinalysis, returned within standard ranges.

The patient presents with an exacerbation of asthma that is not sufficiently managed by her current treatment regimen. The recurrent urinary symptoms suggest a possible relapse of her urinary tract infection. Mild musculoskeletal back pain was also noted, likely strain-related given her occupation and stress levels.

To improve asthma management, an escalation in the Flovent dosage is recommended alongside the addition of a leukotriene receptor antagonist. Emphasis on inhaler techniques and rigorous adherence to her medication regimen is highlighted. A repeat urinalysis is ordered to investigate the persistent urinary symptoms, and pending results, a culture and sensitivity test may be necessary. For her back pain, a referral to physical therapy for targeted strengthening and stretching exercises is suggested, supplemented with NSAIDs for pain control.

The patient is advised to contact the clinic urgently if her breathing difficulties intensify or do not improve within a week, or if her urinary symptoms worsen. Educational resources on medication usage, stress management strategies, and emergency asthma protocols were provided. A follow-up appointment is set for one month to reassess her asthma control and monitor any new developments in her condition. Immediate medical attention is recommended should she experience severe respiratory distress or notable aggravation of urinary symptoms.
```
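The `visit_blob` values are plain hexadecimal strings with a `0x` prefix, so the decoding step can be checked on a short sample. The sketch below is a minimal, standalone illustration: the helper name `decode_hex_note` is hypothetical, and the two inputs are the leading bytes of `visit_blob` values shown in the preview table (the full blobs are truncated there).

```python
def decode_hex_note(hex_blob: str) -> str:
    """Decode a '0x'-prefixed hexadecimal string into UTF-8 text."""
    hex_str = hex_blob.strip()
    # Remove only a leading prefix, never a "0x" found elsewhere
    if hex_str[:2].lower() == "0x":
        hex_str = hex_str[2:]
    return bytes.fromhex(hex_str).decode("utf-8", errors="ignore")


# Leading bytes of two visit_blob values from the preview table
header_sample = "0x2A2A566973697420496E666F726D6174696F6E3A2A2A"
narrative_sample = "0x5468697320697320612032332D796561722D6F6C6420"

print(decode_hex_note(header_sample))           # **Visit Information:**
print(repr(decode_hex_note(narrative_sample)))  # 'This is a 23-year-old '
```

Both decoded prefixes match the start of the corresponding `note_text` values in the table above.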

In [7]:
# -----------------------------------------------------------
# 3. Embed Clinical Notes Using Local MiniLM Embeddings
# -----------------------------------------------------------
# This cell embeds each clinical note into a numerical vector using a
# transformer-based model from Hugging Face. These embeddings are stored
# in a FAISS index for efficient similarity search.

# The model used here is `sentence-transformers/all-MiniLM-L6-v2`:
# - Lightweight and fast (can run locally)
# - Trained for semantic similarity tasks (e.g., text matching)
# - Produces 384-dimensional vectors

# HuggingFaceEmbeddings loads this model via the `sentence-transformers` library.
# If not already installed, run:
#   pip install sentence-transformers

from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS  # requires: pip install langchain-community faiss-cpu

# Initialize the local embedding model
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Extract plain text notes and relevant metadata
documents = df["note_text"].tolist()
metadata = df[["patient_num", "encounter_num", "start_date"]].to_dict(orient="records")

# Create a FAISS index from the documents
vectorstore = FAISS.from_texts(documents, embedding_model, metadatas=metadata)

print(f"Successfully embedded {len(documents)} clinical notes using MiniLM.")


Successfully embedded 1128 clinical notes using MiniLM.


In [8]:
# -----------------------------------------------------------
# 3a. View a Specific Embedded Document, Metadata, and Vector
# -----------------------------------------------------------
# Select a position (e.g., doc_index = 1) to inspect the stored document.
# This cell shows the document text, associated metadata, and
# the corresponding FAISS embedding vector.

doc_index = 1  # Change this position to view a different record (named to avoid shadowing the built-in `id`)

# Get the (doc_id, Document) pair from LangChain's docstore
# (`_dict` is a private attribute of the in-memory docstore)
doc_id, doc_example = list(vectorstore.docstore._dict.items())[doc_index]

# Get the corresponding vector from FAISS
vector_example = vectorstore.index.reconstruct(doc_index)

# Display the results
print(f"\n-----| Document ID: {doc_id} |-----")
print("---> Metadata:", doc_example.metadata)
print("---> Document Text (First 300 chars):\n\n", doc_example.page_content[:300], "... \n\n" + "-" * 100)
print("\n---> Embedded Vector (Full Length):\n\n", vector_example, "\n" + "-" * 100)



-----| Document ID: d216d080-bb0a-4b4b-91d3-002f0ecb09f2 |-----
---> Metadata: {'patient_num': 1000000001, 'encounter_num': 479681, 'start_date': '03/29/2007'}
---> Document Text (First 300 chars):

 **Visit Information:**
- Patient ID: 1000000001
- Encounter ID: 479681
- Visit Date: 03/29/2007

**Subjective:**
This is a 22-year-old Black female who presents for a follow-up visit regarding her asthma management and other health concerns. She has a history of asthma, which has recently been exace ... 

----------------------------------------------------------------------------------------------------

---> Embedded Vector (Full Length):

 [-1.71678141e-02  2.57554296e-02 -3.02366652e-02  5.90960197e-02
 -5.69125786e-02 -4.99190413e-04  8.61884840e-03  8.68016854e-02
 -6.81851804e-02 -6.68402463e-02  3.30676250e-02 -3.55875865e-02
 -1.23848217e-02  7.65940845e-02 -3.75763103e-02  9.54439640e-02
  6.90067634e-02 -2.38603388e-05 -3.86461765e-02  4.25991453e-02
 -1.85658671e-02  5.6140605

In [9]:
# -----------------------------------------------------------
# 4. Embed a Query and Inspect Its Vector Representation
# -----------------------------------------------------------
# This step uses the same local MiniLM embedding model to convert a
# natural language query into a numerical vector. The vector can be
# used for similarity search within the FAISS vector store.
# We'll also inspect the structure of the resulting embedding.

query = "Who has asthma and is taking Fluticasone and Albuterol?"

# Generate the embedding for the query
query_vector = embedding_model.embed_query(query)

# Display the vector and its length (number of dimensions)
print("--->Vectorized Query:\n\n", query_vector)
print("\n--->Vector length:", len(query_vector))


--->Vectorized Query:

 [0.03887706995010376, -0.042409393936395645, -0.05146171152591705, 0.04121880233287811, -0.025769369676709175, -0.04591357707977295, -0.0103689543902874, 0.08145193755626678, -0.057403627783060074, -0.024037929251790047, -0.03372789919376373, -0.017538271844387054, 0.011136984452605247, 0.026207493618130684, 0.05617568641901016, 0.10241179913282394, -0.010751360096037388, -0.080230213701725, -0.035150207579135895, 0.03298896923661232, -0.054257433861494064, 0.03680410236120224, -0.020248012617230415, -0.014638238586485386, -0.02407461777329445, 0.003959783352911472, -0.06827481091022491, -0.06968778371810913, -0.012535175308585167, -0.02130780555307865, 0.04345591366291046, 0.012301910668611526, 0.034768421202898026, 0.016457173973321915, -0.048210542649030685, -0.06472979485988617, -0.03769080340862274, 0.041117001324892044, -0.019274244084954262, 0.0037662305403500795, -0.021108878776431084, 0.04656617343425751, -0.008450526744127274, -0.049669861793518066, -0

In [11]:
# -----------------------------------------------------------
# 5a. Similarity Search (Top K Results - No Filtering)
# -----------------------------------------------------------
# Demonstrates how to retrieve clinical notes along with their scores,
# allowing for more precise filtering and ranking of results.

# Key Components:
#   - vectorstore.similarity_search_with_score(query, k=top_k):
#     Searches across all embedded documents and returns the top k
#     (5 in this case) full-note matches, each paired with a score.
#   - The loop below displays each matched document along with its score.

# Score Interpretation:
#   - With LangChain's default FAISS settings, the score is an L2 distance,
#     not a cosine similarity: LOWER scores mean MORE similar notes.
#   - Results are returned in ascending order of distance, so Result 1
#     is the closest match.
#   - A score of 0.0 would indicate an identical embedding.

# Purpose:
# Returning scores makes retrieval more transparent and enables threshold-based
# filtering so that only relevant clinical notes are used for AI analysis.

# -----------------------------------------------------------

# Number of top results to retrieve
top_k = 5

# Perform similarity search using embedded query
results = vectorstore.similarity_search_with_score(query, k=top_k)

print(f"Top {top_k} most similar clinical notes to the query:\n")

from IPython.display import Markdown, display

# Display the retrieved documents and their scores
for i, (doc, score) in enumerate(results):
    print(f"------------ Result {i+1} ------------")
    print(f"|--> Similarity Score: {score:.4f} <--|")

    # Show a preview of the matched note
    display(Markdown(f"Note Preview:\n\n{doc.page_content[:1200]}"))


Top 5 most similar clinical notes to the query:

------------ Result 1 ------------
|--> Similarity Score: 0.5248 <--|


Note Preview:

**Visit Information:**
- Patient ID: 1000000005
- Encounter ID: 477663
- Visit Date: June 21, 2005

**Subjective:**

This is a 32-year-old Hispanic female who has been receiving care at our clinic for approximately 3 months. She speaks English and presents today for a follow-up visit scheduled as part of her ongoing asthma management. The patient reports continued asthma symptoms, including persistent shortness of breath, wheezing particularly at night, and a cough that disrupts her sleep. Over the past two weeks, these symptoms have intensified despite regular usage of her asthma medications: albuterol inhaler for rescue, daily fluticasone inhaler, and nightly montelukast. She states the albuterol provides only brief relief.

She has a medical history of unspecified asthma without mention of status asthmaticus, back sprain from unspecified causes, vaginitis, and a previous high-risk pregnancy requiring special investigations. Her social history includes living in an urban environment known for high allergen exposure, working as a primary school teacher, and maintaining a non-smoking and non-drinking lifestyle. She has no known allergies.

**Objective:**

On examination, the patien

------------ Result 2 ------------
|--> Similarity Score: 0.5455 <--|


Note Preview:

**Visit Information:**
- Patient ID: 1000000011
- Encounter ID: 476139
- Visit Date: Nov 20, 2003

**Subjective:**
This is a 54-year-old Caucasian female who speaks English and has been receiving care at our clinic. She presents today for a follow-up regarding her recurrent asthma and associated symptoms. The patient is experiencing persistent shortness of breath, wheezing, and a nocturnal cough that interrupts her sleep. These symptoms have been particularly troublesome over the last month despite adherence to her current medication regimen. She denies smoking and has reported no recent exposure to known allergens or new environmental triggers.

Her past medical history is significant for recurrent asthma, an acute myocardial infarction, hypertension, hypercholesterolemia, lumbar disc displacement leading to chronic lumbago, cervical dysplasia, and a panic disorder. Her medications include fluticasone and an albuterol inhaler for asthma, along with antihypertensive and lipid-lowering agents. She resides independently and leads an active lifestyle with a balanced diet and routine physical activity, adjusted as needed for her medical conditions.

**Objective:**
On examination, the p

------------ Result 3 ------------
|--> Similarity Score: 0.5618 <--|


Note Preview:

**Visit Information:**
- Patient ID: 1000000123
- Encounter ID: 475208
- Visit Date: November 27, 2002

**Subjective:**
This is a 19-year-old Indian male who has been under our care for asthma management. The patient speaks German and came to the clinic on November 27, 2002, for a follow-up visit. Since his last appointment in March 2001, he has experienced an uptick in both the frequency and severity of his asthma attacks. Recently, his nocturnal asthma episodes have become more disruptive, leading to sleep disturbances and affecting his daily function and academic performance. The patient reports consistent symptoms of wheezing, shortness of breath, and chest tightness that are exacerbated by physical activities and cold weather. He has been reliant on his rescue inhaler, using it three to four times a day with only partial relief.

His medical history includes chronic asthma, which has persisted since childhood. He is currently managed with a fluticasone/salmeterol inhaler taken twice daily and albuterol on an as-needed basis. He does not use any herbal supplements. As a high school student, he lives with his parents and younger siblings. He maintains an active lifestyle but has

------------ Result 4 ------------
|--> Similarity Score: 0.5620 <--|


Note Preview:

**Visit Information:**
- Patient ID: 1000000005
- Encounter ID: 475726
- Visit Date: July 2, 2003

**Subjective:**

This is a 30-year-old Hispanic female who is an English speaker, followed at our clinic for asthma management. Today, she presents for a routine follow-up visit. The patient reports persistent symptoms of asthma, including increased shortness of breath, wheezing, and a nocturnal cough that disrupts her sleep. She mentions that these symptoms have worsened over the past two weeks despite regular use of her asthma medications. She is diligent with her albuterol inhaler (used as needed), fluticasone inhaler (daily), and montelukast (nightly), yet finds that the albuterol offers only temporary relief.

Her medical history includes asthma with multiple past exacerbations usually triggered by allergens or respiratory infections. There are no past surgeries of note. Socially, she resides in an urban environment with high allergen exposure and works as a primary school teacher. She asserts a non-smoking and non-drinking lifestyle. No known allergies have been reported.

**Objective:**

Upon examination, the patient is in mild respiratory distress. Vital signs include a blood 

------------ Result 5 ------------
|--> Similarity Score: 0.5715 <--|


Note Preview:

## SOAP Note

**Visit Information:**
- Patient ID: 1000000011
- Encounter ID: 476451
- Visit Date: March 16, 2004

**Subjective:**
This is a 55-year-old Caucasian female who speaks English and has been receiving care at our clinic for the past several months. She presents today for a follow-up visit primarily concerning her asthma, which has been problematic over the last month. The patient reports persistent shortness of breath, wheezing, and a nocturnal cough, hindering her sleep quality despite adherence to her prescribed medication regimen. She denies recent smoking, exposure to known allergens, or new environmental triggers.

Her past medical history includes recurrent asthma, a previous myocardial infarction, hypertension, hypercholesterolemia, lumbar disc displacement with chronic lumbago, cervical dysplasia, and a panic disorder. She is currently taking fluticasone and an albuterol inhaler for asthma management, along with antihypertensive and lipid-lowering medications. The patient lives independently and maintains an active lifestyle, adjusting her routine as necessary due to her health conditions. She follows a balanced diet and engages in regular physical activity. She 
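The scores above increase from Result 1 to Result 5, which is consistent with a distance rather than a similarity. The sketch below shows how such a score could be mapped to a cosine similarity. It rests on two assumptions that are not verified in this notebook: that the embeddings are unit-length, and that the index reports squared L2 distances. Under those assumptions, d² = 2 − 2·cos. The helper names are hypothetical.

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def distance_to_cosine(sq_dist):
    """Convert a squared L2 distance to cosine similarity.

    Only valid when both vectors have unit length (assumption).
    """
    return 1.0 - sq_dist / 2.0


# Sanity check on two toy unit vectors
a = [1.0, 0.0]
b = [0.6, 0.8]
assert math.isclose(distance_to_cosine(squared_l2(a, b)), cosine(a, b))

# Scores copied from the Top 5 output above
for score in [0.5248, 0.5455, 0.5618, 0.5620, 0.5715]:
    print(f"distance {score:.4f} -> cosine ~ {distance_to_cosine(score):.4f}")
```

If the assumptions hold, the closest note (0.5248) corresponds to a cosine similarity of roughly 0.74.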

In [12]:
# -----------------------------------------------------------
# 5b. Filter Search Results by Similarity Score Threshold
# -----------------------------------------------------------
# Filters the top-K search results by comparing each score to a threshold.

# Score Threshold Logic:
#   - The scores come from FAISS and, with LangChain's default settings,
#     are L2 distances: LOWER scores indicate greater similarity.
#   - The comparison below (score >= threshold) therefore keeps the more
#     distant matches; use score <= threshold to keep only the closest notes.

# -----------------------------------------------------------

threshold = 0.55  # Cutoff applied to the FAISS distance score

# Filter results
filtered_results = [(doc, score) for doc, score in results if score >= threshold]

print(f"{len(filtered_results)} out of {top_k} notes passed the similarity threshold (≥ {threshold}):\n")

from IPython.display import Markdown, display

# Display filtered matches
for i, (doc, score) in enumerate(filtered_results):
    print(f"-------- Filtered Match {i+1} --------")
    print(f"|--> Similarity Score: {score:.4f} <--|\n")

    display(Markdown(f"Note Preview:\n\n{doc.page_content[:1200]}"))


3 out of 5 notes passed the similarity threshold (≥ 0.55):

-------- Filtered Match 1 --------
|--> Similarity Score: 0.5618 <--|



Note Preview:

**Visit Information:**
- Patient ID: 1000000123
- Encounter ID: 475208
- Visit Date: November 27, 2002

**Subjective:**
This is a 19-year-old Indian male who has been under our care for asthma management. The patient speaks German and came to the clinic on November 27, 2002, for a follow-up visit. Since his last appointment in March 2001, he has experienced an uptick in both the frequency and severity of his asthma attacks. Recently, his nocturnal asthma episodes have become more disruptive, leading to sleep disturbances and affecting his daily function and academic performance. The patient reports consistent symptoms of wheezing, shortness of breath, and chest tightness that are exacerbated by physical activities and cold weather. He has been reliant on his rescue inhaler, using it three to four times a day with only partial relief.

His medical history includes chronic asthma, which has persisted since childhood. He is currently managed with a fluticasone/salmeterol inhaler taken twice daily and albuterol on an as-needed basis. He does not use any herbal supplements. As a high school student, he lives with his parents and younger siblings. He maintains an active lifestyle but has

-------- Filtered Match 2 --------
|--> Similarity Score: 0.5620 <--|



Note Preview:

**Visit Information:**
- Patient ID: 1000000005
- Encounter ID: 475726
- Visit Date: July 2, 2003

**Subjective:**

This is a 30-year-old Hispanic female who is an English speaker, followed at our clinic for asthma management. Today, she presents for a routine follow-up visit. The patient reports persistent symptoms of asthma, including increased shortness of breath, wheezing, and a nocturnal cough that disrupts her sleep. She mentions that these symptoms have worsened over the past two weeks despite regular use of her asthma medications. She is diligent with her albuterol inhaler (used as needed), fluticasone inhaler (daily), and montelukast (nightly), yet finds that the albuterol offers only temporary relief.

Her medical history includes asthma with multiple past exacerbations usually triggered by allergens or respiratory infections. There are no past surgeries of note. Socially, she resides in an urban environment with high allergen exposure and works as a primary school teacher. She asserts a non-smoking and non-drinking lifestyle. No known allergies have been reported.

**Objective:**

Upon examination, the patient is in mild respiratory distress. Vital signs include a blood 

-------- Filtered Match 3 --------
|--> Similarity Score: 0.5715 <--|



Note Preview:

## SOAP Note

**Visit Information:**
- Patient ID: 1000000011
- Encounter ID: 476451
- Visit Date: March 16, 2004

**Subjective:**
This is a 55-year-old Caucasian female who speaks English and has been receiving care at our clinic for the past several months. She presents today for a follow-up visit primarily concerning her asthma, which has been problematic over the last month. The patient reports persistent shortness of breath, wheezing, and a nocturnal cough, hindering her sleep quality despite adherence to her prescribed medication regimen. She denies recent smoking, exposure to known allergens, or new environmental triggers.

Her past medical history includes recurrent asthma, a previous myocardial infarction, hypertension, hypercholesterolemia, lumbar disc displacement with chronic lumbago, cervical dysplasia, and a panic disorder. She is currently taking fluticasone and an albuterol inhaler for asthma management, along with antihypertensive and lipid-lowering medications. The patient lives independently and maintains an active lifestyle, adjusting her routine as necessary due to her health conditions. She follows a balanced diet and engages in regular physical activity. She 
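The direction of the comparison matters when the score is a distance. The sketch below replays both comparisons on the five scores recorded in the Top 5 output, assuming they are L2 distances (lower means closer). It uses plain lists only, so no vector store is required.

```python
# Scores copied from the Top 5 output (assumed to be L2 distances)
scores = [0.5248, 0.5455, 0.5618, 0.5620, 0.5715]
threshold = 0.55

# Keep the closest matches: distance at or below the cutoff
closest = [s for s in scores if s <= threshold]

# The opposite comparison keeps the more distant matches instead
farthest = [s for s in scores if s >= threshold]

print(closest)   # [0.5248, 0.5455]
print(farthest)  # [0.5618, 0.562, 0.5715]
```

With a distance score, `<=` selects the two notes ranked first and second, while `>=` selects the three notes ranked third to fifth.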

In [13]:
# -----------------------------------------------------------
# 6. Create a Prompt Template for LLM Querying
# -----------------------------------------------------------
# This prompt is used to generate structured, patient-specific summaries
# from the most relevant clinical notes retrieved via similarity search.

# Metadata Integration:
# Each note is stored with the metadata fields patient_num, encounter_num, and start_date.
# Only the note text is passed into the prompt, so the LLM relies on the Visit Information
# header inside each note to provide traceable, organized, and clinically meaningful responses.

# Key Components:
#   - PromptTemplate.from_template(): Creates a reusable prompt with placeholders.
#   - {retrieved_docs}: Will be populated with the top-matching clinical notes.
#   - {query}: The user’s clinical question.
#   - Expected Output Structure:
#       - Patient Num, Gender, Age, Race
#       - Visit Date
#       - Summary (contextual answer to the query)

# Instructions to the model request that:
#   - Only one note per patient is considered (the most recent)
#   - The response includes structured identifiers to support traceability
#   - The summary addresses the query clearly and concisely

from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template(
    "You are a medical assistant analyzing clinical notes. Based on the following records:\n\n"
    "{retrieved_docs}\n\n"
    "Answer the question: {query} using the following structure:\n"
    "   - Patient Num: <value>, Gender: <value>, Age: <value>, Race: <value>\n"
    "   - Visit Date: <value>\n"
    "   - Summary: One paragraph summarizing the patient note and answering the question.\n\n"
    "Instructions:\n"
    "- Show all patients that are relevant to the query.\n"
    "- Only consider the most recent note for each patient (identified by patient_num)."
)
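Because the stored metadata is not part of `page_content`, one option is to prepend it to each note before filling the `{retrieved_docs}` placeholder. The sketch below is a minimal illustration using plain `str.format`: the `Doc` class is a hypothetical stand-in for a retrieved LangChain document, `build_context` is a hypothetical helper, and the note texts are placeholders. The first metadata record is the one printed in step 3a; the second is made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_context(docs):
    """Join notes into one string, each preceded by a metadata header."""
    blocks = []
    for doc in docs:
        m = doc.metadata
        header = (
            f"[patient_num={m.get('patient_num')} | "
            f"encounter_num={m.get('encounter_num')} | "
            f"start_date={m.get('start_date')}]"
        )
        blocks.append(f"{header}\n{doc.page_content}")
    return "\n\n---\n\n".join(blocks)


docs = [
    Doc("Placeholder note text A",
        {"patient_num": 1000000001, "encounter_num": 479681, "start_date": "03/29/2007"}),
    Doc("Placeholder note text B",
        {"patient_num": 42, "encounter_num": 7, "start_date": "01/01/2000"}),  # illustrative values
]

# Simplified template with the same two placeholders as the notebook's prompt
template = "Records:\n\n{retrieved_docs}\n\nQuestion: {query}"
context = build_context(docs)
prompt = template.format(
    retrieved_docs=context,
    query="Who has asthma and is taking Fluticasone and Albuterol?",
)
print(prompt)
```

The header format is a design choice, not a requirement; any consistent layout that the prompt instructions refer to would serve the same purpose.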


In [15]:
# -----------------------------------------------------------
# 7. Use Retrieved Context to Invoke LLM and Generate Response
# -----------------------------------------------------------
# This completes the RAG workflow by passing the retrieved clinical notes
# into the prompt template and invoking a local LLM to answer the user query.

# Key Components:
#   - prompt_template.format(...): Populates the prompt with notes and query.
#   - model.invoke(final_prompt): Sends the populated prompt to the LLM.
#   - display(Markdown(response.content)): Renders the result.

from langchain_ollama import ChatOllama

# Define the model name (assumed to be already pulled)

model = ChatOllama(model="qwen2")

# Join all notes that passed the threshold filter into a single context string
retrieved_context = "\n\n---\n\n".join([doc.page_content for doc, _ in filtered_results])

# Format the final prompt using the template
final_prompt = prompt_template.format(retrieved_docs=retrieved_context, query=query)

# Invoke the model (ChatOllama, AzureChatOpenAI, etc.)
response = model.invoke(final_prompt)

# Display the structured, AI-generated response
print("📋 LLM-Generated Response:\n")
display(Markdown(response.content))


📋 LLM-Generated Response:



Based on the provided clinical notes, there are three patients with asthma taking Fluticasone and Albuterol:

1. Patient Num: 1000000123, Gender: Male, Age: 19 years, Race: Indian  
   Visit Date: November 27, 2002
   Summary: This patient is a 19-year-old Indian male with asthma that has been under their care for management. He experienced an increase in asthma attacks since his last visit three months prior, leading to sleep disturbances and academic difficulties. His symptoms include wheezing, shortness of breath, and chest tightness exacerbated by physical activities and cold weather. He uses a rescue inhaler three to four times daily due to inadequate control with current medications including fluticasone/salmeterol and albuterol.

2. Patient Num: 1000000005, Gender: Female, Age: 30 years, Race: Hispanic  
   Visit Date: July 2, 2003
   Summary: This patient is a 30-year-old Hispanic female with persistent symptoms of asthma that have worsened over the past two weeks despite regular use of her medications. She reports increased shortness of breath and nocturnal coughing which disrupts her sleep. Her condition seems to be exacerbated by allergen exposure, requiring temporary increases in her medication dosage including fluticasone inhaler (daily) and albuterol as needed.

3. Patient Num: 1000000011, Gender: Female, Age: 55 years, Race: Caucasian  
   Visit Date: March 16, 2004
   Summary: This patient is a 55-year-old Caucasian female with persistent shortness of breath and nocturnal cough that hinders her sleep. Despite adhering to her prescribed medication regimen which includes fluticasone inhaler twice daily and albuterol as needed for asthma management, she experiences difficulty in controlling her symptoms. She also has additional health concerns like hypertension, hypercholesterolemia, and cervical dysplasia.
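The prompt asks the model to consider only the most recent note per patient, but the same rule can also be enforced in code before the prompt is built. The sketch below is one possible approach under stated assumptions: `latest_note_per_patient` is a hypothetical helper, the records mimic the stored metadata, and the `start_date` strings are assumed to follow the `MM/DD/YYYY` format seen in the metadata printed in step 3a. The encounter IDs and dates are taken from the note previews in the Top 5 output.

```python
from datetime import datetime


def latest_note_per_patient(records, date_format="%m/%d/%Y"):
    """Return one record per patient_num, keeping the latest start_date."""
    latest = {}
    for rec in records:
        visit = datetime.strptime(rec["start_date"], date_format)
        current = latest.get(rec["patient_num"])
        if current is None or visit > current[0]:
            latest[rec["patient_num"]] = (visit, rec)
    return [rec for _, rec in latest.values()]


# Metadata-like records; start_date format is an assumption
records = [
    {"patient_num": 1000000005, "encounter_num": 477663, "start_date": "06/21/2005"},
    {"patient_num": 1000000005, "encounter_num": 475726, "start_date": "07/02/2003"},
    {"patient_num": 1000000011, "encounter_num": 476139, "start_date": "11/20/2003"},
    {"patient_num": 1000000011, "encounter_num": 476451, "start_date": "03/16/2004"},
]

kept = latest_note_per_patient(records)
print([r["encounter_num"] for r in kept])  # [477663, 476451]
```

Parsing the dates before comparing them avoids the incorrect ordering that a plain string comparison of `MM/DD/YYYY` values would produce across years.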