![i2b2 Logo](images/transmart-logo.png)

# Using LLM + Embeddings to Search Patient Notes (RAG - Basics)

This notebook demonstrates how to use **local embeddings** and a **Retrieval-Augmented Generation (RAG)** approach to search and analyze clinical notes stored in an i2b2-like format. You'll learn how to decode raw notes, embed them using the MiniLM model, perform semantic search with FAISS, and generate structured clinical responses using a local LLM (e.g., Qwen or LLaMA 3 via Ollama).

### 🔍 Key Concepts Covered

- Decoding BinHex-encoded clinical notes
- Creating semantic vector embeddings with `MiniLM`
- Storing embeddings in a FAISS vector store (in memory)
- Performing similarity search and interpreting cosine similarity scores
- Filtering to include only the **most relevant and recent** patient notes
- Injecting retrieved context into a structured prompt template
- Using a local LLM (Ollama) to generate clinically relevant summaries

Each cell builds on the previous one to demonstrate a complete, hands-on RAG pipeline adapted for **clinical informatics** use cases using familiar i2b2-style data.


## 1. Prepare Data for Embedding

Before we can search and analyze clinical notes using vector similarity, we need to prepare the data:

- **1.1**: Load the i2b2-mimicking dataset containing BinHex-encoded clinical notes.
- **1.2**: Decode the notes and add a new column (`note_text`) with plain-text content.

This prepares the dataset for the next step, where we will embed the notes into a vector space using a local MiniLM model.
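
As a minimal sketch of what the decoding step does, the snippet below round-trips a made-up note through hexadecimal encoding using only the standard library. The sample text, the `0x` prefix, and the helper name `decode_hex_note` are assumptions for illustration; the actual `visit_blob` values may be formatted differently.

```python
import binascii


def decode_hex_note(hex_blob: str) -> str:
    """Decode a hex string (optionally prefixed with 0x) into UTF-8 text."""
    hex_str = hex_blob.strip()
    if hex_str.lower().startswith("0x"):
        hex_str = hex_str[2:]  # drop only the leading prefix
    return binascii.unhexlify(hex_str).decode("utf-8", errors="replace")


# Made-up note text, encoded the way the blob column is assumed to be stored
sample_text = "Patient reports wheezing; continue albuterol as needed."
sample_blob = "0x" + binascii.hexlify(sample_text.encode("utf-8")).decode("ascii")

print(sample_blob[:24], "...")
print(decode_hex_note(sample_blob))
```

Encoding and then decoding returns the original string, which is a quick way to sanity-check a decoder before applying it to a whole column.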


In [None]:
# -----------------------------------------------------------
# 1.1. Load and Explore Visit Data from i2b2-Mimicking CSV
# -----------------------------------------------------------
# This cell loads clinical visit data from a CSV that simulates the i2b2
# `visit_dimension` table. Each row contains metadata and a clinical note
# encoded in BinHex format.

# Fields included:
# - encounter_num: Unique visit ID
# - patient_num: Patient identifier
# - start_date, end_date: Visit dates
# - location_cd, location_path: Care location details
# - visit_blob: BinHex-encoded clinical note text

import pandas as pd

# Define the path to the simulated i2b2 CSV file
csv_path = "datafiles/i2b2_encounter_table.csv"

# Load the data into a pandas DataFrame
df = pd.read_csv(csv_path)

# Display the first 10 rows to inspect the structure
df.head(10)


In [None]:
# -----------------------------------------------------------
# 1.2. Decode BinHex Clinical Notes and Prepare Text Corpus
# -----------------------------------------------------------
# This cell decodes the clinical notes stored in BinHex format and
# adds a new `note_text` column containing plain-text notes.
# These will be used for embedding in the next step.

import binascii
from IPython.display import display, Markdown

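# Function to decode a single BinHex (hexadecimal) string, with or without a "0x" prefix
def decode_note(hex_blob):
    hex_str = hex_blob.strip()
    if hex_str.lower().startswith("0x"):
        hex_str = hex_str[2:]  # Drop only the leading prefix
    return binascii.unhexlify(hex_str).decode("utf-8", errors="ignore")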

# Apply decoding to all rows
df["note_text"] = df["visit_blob"].apply(decode_note)

# Display the first 10 decoded records
display(df.head(10))

# Display an example decoded note
example_index = 10
display(Markdown(f"### Decoded Note Example (Row {example_index} of {len(df)}):\n\n```text\n{df['note_text'][example_index]}\n```"))


## 2. Embed and Store Clinical Notes in a FAISS Vector Store (In-Memory)

In this step, we embed full clinical notes and store them in a **FAISS** vector store, which enables efficient similarity search. We use a lightweight transformer model (`MiniLM`) to convert each note into a semantic vector.

### Steps:
- **2.1**: Embed clinical notes using the `all-MiniLM-L6-v2` model from Hugging Face.
- **2.2**: View an embedded document along with its metadata and vector representation.

### Why Use This Approach?

Storing entire notes is useful when:
- You want to preserve the full clinical context for each patient.
- Your downstream use case (e.g., summarization or structured extraction) requires complete narrative input.
- The notes are concise enough to fit within the input limits of an LLM.

This method simplifies retrieval workflows by allowing you to work with whole documents rather than fragmented chunks.
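
Whether notes are "concise enough" can be estimated before any embedding is done. The sketch below uses a rough rule of thumb of about four characters per token, which is an assumption and not a property of any particular model; the helper names and the budget value are hypothetical.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; a real tokenizer will give different counts."""
    return int(len(text) / chars_per_token) + 1


def notes_fitting_budget(notes, max_tokens):
    """Keep notes in order until the estimated token budget would be exceeded."""
    kept, used = [], 0
    for note in notes:
        cost = estimate_tokens(note)
        if used + cost > max_tokens:
            break
        kept.append(note)
        used += cost
    return kept, used


# Placeholder notes of different lengths (400, 800, and 4000 characters)
toy_notes = ["a" * 400, "b" * 800, "c" * 4000]
kept_notes, used_tokens = notes_fitting_budget(toy_notes, max_tokens=500)

print(f"Kept {len(kept_notes)} of {len(toy_notes)} notes, ~{used_tokens} tokens")
```

If many notes fall outside the budget, splitting them into chunks is likely a better fit than the whole-document approach described here.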

<img src="./images/rag_full.png" alt="RAG Full" width="900">




In [None]:
# -----------------------------------------------------------
# 2.1. Embed Clinical Notes Using Local MiniLM Embeddings
# -----------------------------------------------------------
# This cell encodes each clinical note into a vector using a local
# transformer model and stores those embeddings in a FAISS index for
# fast similarity search.

# Model: `sentence-transformers/all-MiniLM-L6-v2`
# - Optimized for semantic similarity tasks
# - Lightweight and fast (384-dimensional vectors)
# - Runs locally (offline once the model weights have been downloaded)

# Requirements:
#   pip install langchain langchain-community langchain-huggingface sentence-transformers faiss-cpu

from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Initialize the embedding model
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Prepare inputs for embedding: note text and relevant metadata
documents = df["note_text"].tolist()
metadata = df[["patient_num", "encounter_num", "start_date"]].to_dict(orient="records")

# Create a FAISS vector store from the documents
vectorstore = FAISS.from_texts(documents, embedding_model, metadatas=metadata)

print(f"✅ Successfully embedded {len(documents)} clinical notes using MiniLM.")


In [None]:
# -----------------------------------------------------------
# 2.2. View a Specific Embedded Document, Metadata, and Vector
# -----------------------------------------------------------
# Select an index (e.g., doc_index = 1) to inspect the stored document.
# This cell shows the document text, associated metadata, and
# the corresponding FAISS embedding vector.

doc_index = 1  # You can change this index to view a different record

# Map the FAISS row index to its docstore ID, then fetch the Document
doc_id = vectorstore.index_to_docstore_id[doc_index]
doc_example = vectorstore.docstore.search(doc_id)

# Retrieve corresponding vector from FAISS
vector_example = vectorstore.index.reconstruct(doc_index)

display(Markdown(f"### 🧾 Document ID: `{doc_id}`"))
display(Markdown(f"**Metadata:** `{doc_example.metadata}`"))

display(Markdown("**Document Text (First 500 characters):**"))
display(Markdown(f"```text\n{doc_example.page_content[:500]}...\n```"))

display(Markdown("**Embedded Vector (First 100):**"))
display(Markdown(f"```text\n{vector_example[:100]}\n```"))

## 3. Retrieving Clinical Notes with Similarity Score (RAG Retrieval)

In this section, we perform semantic search over embedded clinical notes using a FAISS vector store and a locally generated query vector. We use similarity scores to evaluate the relevance of each match to the query.

### Key Retrieval Steps:

1. **Embed a Query (Step 3.1)**
   - Converts a natural language question into a numerical vector using the same MiniLM model used to embed the notes.

2. **Similarity Search with Scores (Step 3.2)**
   - Retrieves the top-k clinical notes ranked by cosine similarity to the query.
   - Includes similarity scores for transparency and ranking.

3. **Score Threshold Filtering (Step 3.3)**
   - Filters out matches with low similarity scores.
   - Helps improve the precision and clinical relevance of the results.

### Why Use These Techniques?

Similarity search helps identify notes most relevant to a user-defined question or condition. Threshold filtering ensures:
- Only strong matches are considered for downstream tasks like summarization
- Noisy or unrelated content is excluded
- Each result can be justified based on a similarity score
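
The ranking and threshold logic can be sketched without any vector library. The toy three-dimensional vectors and note names below are made up for illustration. The final loop checks a general identity that matters when an index reports squared L2 distance instead of cosine similarity: for unit-length vectors, cosine similarity equals `1 - squared_distance / 2`.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    # For unit-length vectors, the dot product is the cosine similarity
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy "embeddings" standing in for note vectors (hypothetical values)
toy_vectors = {
    "note_a": [0.9, 0.1, 0.0],
    "note_b": [0.5, 0.5, 0.5],
    "note_c": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.0, 0.0]

# Rank by cosine similarity, highest first
ranked = sorted(
    ((name, cosine_similarity(toy_query, vec)) for name, vec in toy_vectors.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

# Keep only matches at or above the threshold
toy_threshold = 0.55
kept = [(name, score) for name, score in ranked if score >= toy_threshold]

# Squared L2 distance and cosine similarity carry the same ranking information
for name, vec in toy_vectors.items():
    d2 = squared_l2(normalize(toy_query), normalize(vec))
    assert abs(cosine_similarity(toy_query, vec) - (1 - d2 / 2)) < 1e-9

for name, score in ranked:
    print(f"{name}: {score:.4f}")
print("Kept:", [name for name, _ in kept])
```

Because distance and similarity run in opposite directions, it is worth confirming which one a search call returns before applying a "greater than or equal to" threshold.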

<img src="./images/rag_retrieval.png" alt="RAG Retrieval" width="900">


In [None]:
# -----------------------------------------------------------
# 3.1. Embed a Query and Inspect Its Vector Representation
# -----------------------------------------------------------
# This step encodes a natural language query into a numerical vector
# using the same MiniLM model used for the clinical notes.
# This vector will be used to search for semantically similar notes.

from IPython.display import display, Markdown

# Define a sample clinical query
query = "Who has asthma and is taking Fluticasone and Albuterol?"

# Generate the embedding for the query
query_vector = embedding_model.embed_query(query)

# Display the vector and its shape
display(Markdown("### Vectorized Query"))
display(Markdown(f"`Query:` *{query}*"))
display(Markdown("**Embedding Vector (truncated):**"))
display(Markdown(f"```text\n{query_vector[:100]} ... [{len(query_vector)} dimensions]\n```"))


In [None]:
# -----------------------------------------------------------
# 3.2. Similarity Search (Top-K Results, No Filtering)
# -----------------------------------------------------------
# This cell performs a semantic similarity search using the embedded query,
# returning the top-k most similar clinical notes along with similarity scores.

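# Score interpretation (cosine similarity; rough heuristics, not clinical cutoffs):
# - 0.90 – 1.00: Highly relevant
# - 0.70 – 0.90: Strong match
# - 0.50 – 0.70: Moderate match
# - 0.30 – 0.50: Low match
# - 0.00 – 0.30: Minimal or irrelevant

from IPython.display import Markdown, display

# Define number of top results
top_k = 5

# Run similarity search. LangChain's FAISS store uses an L2 index by default, so
# similarity_search_with_score returns squared L2 distances (lower = more similar).
# MiniLM embeddings are unit-length, so cosine similarity = 1 - distance / 2.
raw_results = vectorstore.similarity_search_with_score(query, k=top_k)
results = [(doc, 1 - float(distance) / 2) for doc, distance in raw_results]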

# Display header
display(Markdown(f"### 🔍 Top {top_k} Most Similar Clinical Notes"))

# Iterate and display each match
for i, (doc, score) in enumerate(results):
    display(Markdown(f"---\n**Result {i+1}**  \n- **Similarity Score:** `{score:.4f}`  \n- **Patient Num:** `{doc.metadata.get('patient_num', 'N/A')}`  \n- **Encounter:** `{doc.metadata.get('encounter_num', 'N/A')}`\n\n**Note Preview:**\n```text\n{doc.page_content[:1200]}\n```"))


In [None]:
# -----------------------------------------------------------
# 3.3. Filter Search Results by Similarity Score Threshold
# -----------------------------------------------------------
# This cell filters the top-K search results to keep only those
# with high similarity scores above a defined threshold.

# Similarity Score Threshold:
# - Only notes with scores ≥ threshold will be retained.
# - Higher scores = greater semantic similarity.

from IPython.display import Markdown, display

threshold = 0.55  # Keep notes with score ≥ 0.55

# Initialize an empty list to hold the filtered results
filtered_results = []

# Loop through each result (a tuple of Document and score)
for doc, score in results:
    # Check if the similarity score meets the threshold
    if score >= threshold:
        # If so, add it to the filtered list
        filtered_results.append((doc, score))


# Summary
display(Markdown(f"### ✅ {len(filtered_results)} of {top_k} notes passed the similarity threshold (≥ {threshold})"))

# Show filtered results
for i, (doc, score) in enumerate(filtered_results):
    display(Markdown(
        f"---\n**Filtered Match {i+1}**  \n"
        f"- **Similarity Score:** `{score:.4f}`  \n"
        f"- **Patient Num:** `{doc.metadata.get('patient_num', 'N/A')}`  \n"
        f"- **Encounter:** `{doc.metadata.get('encounter_num', 'N/A')}`\n\n"
        f"**Note Preview:**\n```text\n{doc.page_content[:1200]}\n```"
    ))


## 4. Generating Structured Responses with an LLM (RAG Generation)

In this section, we take the clinical notes retrieved via semantic search and pass them into a Large Language Model (LLM) to generate structured, clinically meaningful responses. This is the final step in the **Retrieval-Augmented Generation (RAG)** pipeline.

### Key Steps:

1. **Creating a Prompt Template for LLM Querying (Step 4.1)**
   - Defines a reusable prompt structure for analyzing and summarizing clinical notes.
   - Ensures each response includes patient metadata and clear, structured outputs.

2. **Invoking LLM with Retrieved Context (Step 4.2)**
   - Inserts the top-retrieved clinical notes into the prompt.
   - Sends the prompt to a local LLM (e.g., Qwen2 via Ollama) for structured generation.
   - Returns a summary that directly answers the user’s medical query.

### Why This Matters

This step demonstrates how LLMs can synthesize information from the retrieved patient notes to produce:
- Patient-specific summaries
- Answered clinical questions
- Traceable outputs with structured identifiers

This capability is essential for use cases like clinical decision support, patient-facing summaries, or intelligent search interfaces.
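
The prompt in this section asks the model to consider only the most recent note per patient. A hedged alternative is to enforce that rule in code before the prompt is built, so it does not depend on the model following instructions. The records, field names, and simplified template below are made up for illustration.

```python
from datetime import date

# Hypothetical retrieved records; in the notebook these would come from the search results
records = [
    {"patient_num": 1, "start_date": date(2023, 1, 5), "note": "Asthma follow-up, stable."},
    {"patient_num": 1, "start_date": date(2024, 3, 2), "note": "Asthma flare, albuterol refilled."},
    {"patient_num": 2, "start_date": date(2022, 7, 9), "note": "Annual physical, no complaints."},
]


def latest_note_per_patient(rows):
    """Keep only the record with the latest start_date for each patient_num."""
    latest = {}
    for row in rows:
        current = latest.get(row["patient_num"])
        if current is None or row["start_date"] > current["start_date"]:
            latest[row["patient_num"]] = row
    return list(latest.values())


def build_context(rows):
    """Join notes into one context string, each prefixed with its metadata."""
    blocks = [
        f"patient_num: {row['patient_num']} | start_date: {row['start_date'].isoformat()}\n"
        f"{row['note']}"
        for row in rows
    ]
    return "\n\n---\n\n".join(blocks)


# Simplified stand-in for the notebook's prompt template
TEMPLATE = "Based on the following records:\n\n{retrieved_docs}\n\nAnswer the question: {query}"

latest = latest_note_per_patient(records)
prompt = TEMPLATE.format(retrieved_docs=build_context(latest), query="Who has asthma?")
print(prompt)
```

Filtering in code also keeps the prompt shorter, which leaves more of the model's input limit for the notes that matter.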

<img src="./images/rag_generation.png" alt="RAG Generation" width="1250">


In [None]:
# -----------------------------------------------------------
# 4.1. Create a Prompt Template for LLM Querying
# -----------------------------------------------------------
# This prompt template guides the LLM to generate structured summaries
# from clinical notes retrieved via similarity search.

# It includes placeholders for:
# - {retrieved_docs}: Injects the top-matching clinical notes
# - {query}: A user-defined clinical question

# Output Expectations:
# - One structured response per patient
# - Includes metadata for traceability
# - Summarizes and answers the query based on each patient's most recent note

from langchain_core.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template(
    "You are a medical assistant analyzing clinical notes. Based on the following records:\n\n"
    "{retrieved_docs}\n\n"
    "Answer the question: {query} using the following structure:\n"
    "   - Patient Num: <value>, Gender: <value>, Age: <value>, Race: <value>\n"
    "   - Visit Date: <value>\n"
    "   - Summary: One paragraph summarizing the patient note and answering the question.\n"
    "   - Has Asthma: <Yes/No>\n\n"
    "Instructions:\n"
    "- Show all patients that are relevant to the query.\n"
    "- Only consider the most recent note for each patient (identified by patient_num)."
)

display(Markdown(f"```\n{prompt_template.template}\n```"))


In [None]:
# -----------------------------------------------------------
# 4.2. Use Retrieved Context to Invoke LLM and Generate Response
# -----------------------------------------------------------
# This cell completes the RAG workflow by injecting the top-matching clinical notes
# into a prompt template and invoking a local LLM to generate a structured response.

from langchain_ollama import ChatOllama
from IPython.display import display, Markdown

# Initialize the local LLM (ensure this model has been pulled via Ollama)
model = ChatOllama(model="qwen2")

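# Prepare the context by joining the retrieved notes, each prefixed with its
# metadata so the LLM can report patient_num and identify the most recent note
retrieved_context = "\n\n---\n\n".join(
    f"patient_num: {doc.metadata.get('patient_num', 'N/A')} | "
    f"encounter_num: {doc.metadata.get('encounter_num', 'N/A')} | "
    f"start_date: {doc.metadata.get('start_date', 'N/A')}\n{doc.page_content}"
    for doc, _ in filtered_results
)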

# Fill in the prompt template with the retrieved notes and query
final_prompt = prompt_template.format(
    retrieved_docs=retrieved_context,
    query=query
)

# Run inference using the LLM
response = model.invoke(final_prompt)

# Display the generated response
display(Markdown("### 📋 LLM-Generated Response"))
display(Markdown(response.content))
