![i2b2 Logo](./images/transmart-logo.png)

# Using an LLM for Clinical Note Analysis (Zero-Shot)

This notebook demonstrates how to use a **locally hosted large language model (LLM)** with LangChain and Ollama to analyze clinical content in a zero-shot setting, that is, without task-specific training examples or fine-tuning.

You will walk through a structured process for:
- Loading a local LLM (e.g., Qwen or DeepSeek via Ollama)
- Running basic clinical queries using system and user messages
- Creating reusable prompt templates for dynamic disease-specific explanations
- Preparing a prompt for structured clinical note extraction
- Applying the LLM to decode and analyze real patient notes

Each cell builds on the previous one, moving from simple interactions to more advanced, structured information extraction from unstructured clinical text.


In [72]:
# -----------------------------------------------------------
# 1. Load the Ollama Model
# -----------------------------------------------------------
# This cell loads a local Large Language Model (LLM) using LangChain's
# `ChatOllama` wrapper. Make sure the model has been pulled via the Ollama CLI before use.

# To explore available models, visit:
# - Ollama Library: https://ollama.com/library
# - Hugging Face (for embeddings and additional models): https://huggingface.co/models

from langchain_ollama import ChatOllama

# Define the model name. Make sure this model is already downloaded using:
#   ollama pull qwen2
model_name = "qwen2"  # Alternatives: "deepseek-v2:16b", "llama3", etc.

# Initialize the model with LangChain
model = ChatOllama(model=model_name)

print(f"✅ Model '{model_name}' is loaded and ready to use.")


✅ Model 'qwen2' is loaded and ready to use.


## 2. Basic Prompt Interaction - Simple Query

<img src="./images/basic_prompt.png" alt="Basic Prompt" width="900">



In [73]:
# -----------------------------------------------------------
# 2. Run a Simple Clinical Query with System + User Prompts
# -----------------------------------------------------------
# This tests the model by asking a simple clinical question using structured messages.

from langchain_core.messages import HumanMessage, SystemMessage
from IPython.display import display, Markdown


messages = [
    SystemMessage(content=(
        "You are a knowledgeable medical provider. "
        "Provide clear, evidence-based explanations about medical conditions."
    )),
    HumanMessage(content="What is asthma? What are its common symptoms and treatments?")
]

# Run inference
response = model.invoke(messages)

# Display result
display(Markdown("### Model Response:"))
display(Markdown(response.content))


### Model Response:

Asthma is a long-term, chronic condition that affects the airways in the lungs. It involves inflammation and narrowing of the bronchial tubes, which carry air into and out of the lungs. This inflammation causes swelling and mucus production, making it difficult for air to flow through them.

### Common Symptoms

The common symptoms of asthma include:

1. **Recurrent Coughing**: Often occurring at night or early in the morning.
2. **Wheezing**: A whistling sound when breathing, especially during exhalation.
3. **Breathlessness**: Difficulty in breathing both on exertion and at rest.
4. **Chest Tightness or Pressure**: Often described as a sensation of tightness around the chest.
5. **Prolonged Exhalation**: The typical "whooshing" sound that occurs during expiration.

### Causes

Asthma is typically triggered by allergens, exercise, cold weather, air pollution, smoking, and other irritants in the air. It can affect people of all ages but is most commonly diagnosed in children under 12 years old.

### Treatments

#### Medications
Asthma treatment aims to control inflammation and reduce symptoms or prevent asthma attacks:

- **Inhaled Corticosteroids**: These are corticosteroid medications that you inhale through a metered-dose inhaler or a nebulizer. They help decrease swelling in the airways.
  
  - **Short-acting bronchodilators (SABAs)**: These include albuterol, salmeterol, and formoterol. They are used to quickly relieve symptoms of asthma when needed.

#### Lifestyle Changes
Modifying daily activities can also help manage asthma:

1. **Avoiding Triggers**: Identifying and avoiding allergens like dust mites, pet dander, pollen, and mold is crucial.
2. **Physical Activity**: Regular exercise can improve lung function but needs to be carefully planned to avoid triggers.

#### Long-term Management
- **Long-term Control Medications**: Apart from corticosteroids, long-acting bronchodilators like salmeterol or formoterol are used daily to keep the airways open and prevent symptoms.
- **Peak Flow Meter**: Monitoring peak expiratory flow can help identify when asthma is worsening before symptoms appear.

#### Emergency Management
- **Quick-relief Medications**: These are inhaled medications taken during an asthma attack. Examples include albuterol inhalers, which are typically used to relieve symptoms quickly.

### Monitoring and Education
Regular check-ups with a healthcare provider are essential to monitor lung function and adjust treatment as needed. Patients should learn how to recognize their triggers, understand the severity of their condition, and know when to seek emergency care.

### Conclusion
Asthma management is multifaceted, involving both medical treatments and lifestyle changes tailored to each individual's needs. Effective management can significantly improve quality of life by reducing symptoms and preventing asthma attacks.

## 3. Using ChatPromptTemplate for Dynamic Prompts

This section shows how to create reusable prompt templates using LangChain’s `ChatPromptTemplate`.

- **3.1**: Runs a dynamic medical query by filling in a condition (e.g., epilepsy) and a patient type (e.g., child).
- **3.2**: Prepares a prompt template for clinical note extraction. This will be used later in Section 4 when we insert actual patient notes.

Prompt templates make it easier to reuse the same structure with different inputs.
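
To make the placeholder mechanism concrete, here is a minimal plain-Python sketch of what filling a role-based template amounts to. It is not LangChain's implementation: `fill_messages` is a hypothetical helper that only applies `str.format` to each message, whereas `ChatPromptTemplate` also validates inputs and builds message objects.

```python
# Hypothetical helper for illustration only; not part of LangChain.
def fill_messages(messages, **variables):
    """Return (role, text) pairs with {placeholders} replaced by variables."""
    return [(role, text.format(**variables)) for role, text in messages]


template = [
    ("system",
     "You are a knowledgeable medical provider. "
     "Provide explanations appropriate for a {patient_type} patient."),
    ("human",
     "What is {disease}? What are its common symptoms and treatments?"),
]

# The same template structure is reused with different inputs.
for disease in ("epilepsy", "asthma"):
    filled = fill_messages(template, patient_type="5-year-old", disease=disease)
    print(filled[1][1])
```

Only the values passed at call time change; the roles and wording of the template stay fixed, which is what makes the prompt reusable.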

<img src="./images/prompt_template.png" alt="Prompt Template" width="900">


In [74]:
# -----------------------------------------------------------
# 3.1. Create a Reusable Prompt Template (Dynamic Querying)
# -----------------------------------------------------------
# This cell demonstrates how to build a dynamic prompt template using placeholders
# for different medical conditions and patient profiles. The prompt is populated
# with variables at runtime and sent to the LLM for inference.

from langchain_core.prompts import ChatPromptTemplate
from IPython.display import display, Markdown

# Define input variables
patient_type = "5-year-old"
disease = "epilepsy"

# Define a role-based prompt using variable placeholders
messages = [
    ("system",
     "You are a knowledgeable medical provider. Provide clear, evidence-based explanations appropriate for a {patient_type} patient."),

    ("human",
     "What is {disease}? What are its common symptoms and treatments?")
]

# Create a prompt template with dynamic inputs
prompt_template = ChatPromptTemplate.from_messages(messages)

# Fill the template with actual values
prompt = prompt_template.invoke({
    "patient_type": patient_type,
    "disease": disease
})

# Run the model with the constructed prompt
response = model.invoke(prompt)

# Display the generated answer
display(Markdown("### AI Response"))
display(Markdown(response.content))


### AI Response

Epilepsy is a condition where someone has seizures because their brain's electrical signals don't work properly. Imagine your brain as a little town with lots of traffic lights that control when cars can go or stop. In epilepsy, these traffic lights sometimes get confused and change too quickly or randomly, making it hard for the town to function smoothly. When this happens, it might cause seizures.

**Common Symptoms:**
1. **Sudden Seizures:** These are like the town suddenly going into chaos. Some kids may freeze in place, others might have a lot of jerky movements all over their body.
2. **Confusion or Changes in Behavior:** After some seizures, kids might be very confused and not remember things that just happened or seem upset for no reason.
3. **Staring Spells:** Sometimes, during a seizure, kids will just stare into space without moving much.

**Treatments:**
1. **Medications:** Doctors often give medicines to help control the brain's signals so they don't get confused as often. It's like giving the traffic lights special instructions on when to change.
2. **Special Diets:** Sometimes, kids need to follow very specific diets that limit certain foods because they can make seizures happen less often.
3. **Surgery or Devices:** In some cases, doctors might do surgery to fix parts of the brain that are causing seizures or use a device like a pacemaker for the brain (VNS) to help control them.

The goal is to keep kids as healthy and happy as possible while managing their epilepsy. It's important for parents and teachers to learn about what can trigger seizures, so they know how to make safe environments and react when needed.

In [75]:
# -----------------------------------------------------------
# 3.2. Clinical Note Extraction Prompt
# -----------------------------------------------------------
# This cell defines a reusable prompt template to guide the LLM
# in extracting structured clinical information from a free-text note.

from langchain_core.prompts import ChatPromptTemplate

# Define the prompt structure using role-based messages
messages_notes = [
    ("system",
     "You are an advanced medical documentation assistant. "
     "Your task is to extract key clinical details from unstructured notes in a clear and structured format."),

    ("human",
     "Analyze the following clinical note:\n\n"
     "{patient_note}\n\n"
     "Extract the following:\n"
     "1. Patient demographics\n"
     "2. Chief complaints\n"
     "3. Current medications\n"
     "4. Does the patient have asthma? (Yes/No)\n\n"
     "Format your response using clear, structured bullet points.")
]

# Create the prompt template with a placeholder for the clinical note
prompt_template_notes = ChatPromptTemplate.from_messages(messages_notes)


## 4. Extract and Decode Notes from i2b2 Encounter Table

In this section, we explore how to work with real clinical notes stored in a simulated i2b2 `visit_dimension` table. These notes are stored as hexadecimal strings (with a `0x` prefix) and need to be decoded before being used in a prompt.

### What You'll Do:
- **4.1**: Load and inspect the structure of the CSV file containing patient visits and encoded notes.
- **4.2**: Interactively select a patient and encounter to preview the decoded note.
- **4.3**: Use the previously defined prompt template to extract structured clinical information from a real decoded note.

This prepares the data for downstream tasks like summarization or diagnostic tagging using a large language model.
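
The decoding step can be tried in isolation. The sketch below assumes each note is a `0x`-prefixed hexadecimal string holding UTF-8 text; `decode_hex_note` is a hypothetical helper name, and the sample value is only the visible prefix of a `visit_blob` entry from the data preview in 4.1, not a complete note.

```python
def decode_hex_note(blob: str) -> str:
    """Decode a hex string (optionally prefixed with '0x') into UTF-8 text."""
    hex_digits = blob[2:] if blob.lower().startswith("0x") else blob
    # errors="replace" marks undecodable bytes with U+FFFD instead of
    # silently dropping them, which makes encoding problems visible.
    return bytes.fromhex(hex_digits).decode("utf-8", errors="replace")


# Visible prefix of a visit_blob value (the full value is truncated in the preview)
sample = "0x2A2A566973697420496E666F726D6174696F6E3A2A2A"
print(decode_hex_note(sample))  # **Visit Information:**
```

Using `errors="replace"` rather than `errors="ignore"` is a design choice for this sketch: it keeps a visible marker wherever a byte could not be decoded.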


In [76]:
# -----------------------------------------------------------
# 4.1. Load and Explore Visit Data from i2b2-Mimicking CSV
# -----------------------------------------------------------
# This cell loads a simulated `visit_dimension` table similar to what is used in i2b2.
# It includes clinical notes stored as hex-encoded strings and related visit metadata.

# Each record includes:
# - encounter_num: Unique visit ID
# - patient_num: Patient identifier
# - start_date, end_date: Visit timestamps
# - location_cd, location_path: Care setting info
# - visit_blob: Hex-encoded clinical note (prefixed with 0x)

# Purpose:
# Load the data into a pandas DataFrame and inspect the structure before decoding.

import pandas as pd

# Load the visit-level dataset
csv_path = "datafiles/i2b2_encounter_table.csv"
df = pd.read_csv(csv_path)

# Preview the first 10 records
df.head(10)


| Unnamed: 0 | encounter_num | patient_num | start_date | end_date | inout_cd | location_cd | location_path | visit_blob |
|---|---|---|---|---|---|---|---|---|
| 0 | 475303 | 1000000001 | 01/16/2003 | 01/16/2003 | O | ASTHMA_CLINIC | `\Hospital\Clinic\Pulmonary\Asthma\\` | `0x2A2A566973697420496E666F726D6174696F6E3A2A2A...` |
| 1 | 479681 | 1000000001 | 03/29/2007 | 03/29/2007 | O | ASTHMA_CLINIC | `\Hospital\Clinic\Pulmonary\Asthma\\` | `0x2A2A566973697420496E666F726D6174696F6E3A2A2A...` |
| 2 | 480315 | 1000000001 | 09/20/2007 | 09/20/2007 | O | ASTHMA_CLINIC | `\Hospital\Clinic\Pulmonary\Asthma\\` | `0x2A2A566973697420496E666F726D6174696F6E3A2A2A...` |
| 3 | 480903 | 1000000001 | 03/04/2008 | 03/04/2008 | O | ASTHMA_CLINIC | `\Hospital\Clinic\Pulmonary\Asthma\\` | `0x5468697320697320612032332D796561722D6F6C6420...` |
| 4 | 481398 | 1000000001 | 08/11/2008 | 08/11/2008 | O | ASTHMA_CLINIC | `\Hospital\Clinic\Pulmonary\Asthma\\` | `0x566973697420496E666F726D6174696F6E3A20202020...` |
| 5 | 482655 | 1000000001 | 05/18/2009 | 05/18/2009 | O | ASTHMA_CLINIC | `\Hospital\Clinic\Pulmonary\Asthma\\` | `0x566973697420496E666F726D6174696F6E3A0A202020...` |
| 6 | 471658 | 1000000002 | 04/17/1998 | 04/17/1998 | O | GEN_MED_OUTPATIENT | `\Hospital\Outpatient\GeneralMedicine\\` | `0x2A2A566973697420496E666F726D6174696F6E3A2A2A...` |
| 7 | 472076 | 1000000002 | 01/04/1999 | 01/04/1999 | O | GEN_MED_OUTPATIENT | `\Hospital\Outpatient\GeneralMedicine\\` | `0x2A2A566973697420496E666F726D6174696F6E3A2A2A...` |
| 8 | 472473 | 1000000002 | 08/12/1999 | 08/12/1999 | O | GEN_MED_OUTPATIENT | `\Hospital\Outpatient\GeneralMedicine\\` | `0x2A2A566973697420496E666F726D6174696F6E3A2A2A...` |
| 9 | 472692 | 1000000002 | 12/01/1999 | 12/01/1999 | O | GEN_MED_OUTPATIENT | `\Hospital\Outpatient\GeneralMedicine\\` | `0x2A2A566973697420496E666F726D6174696F6E3A2A2A...` |


In [77]:
# -----------------------------------------------------------
# 4.2. Interactive Exploration and Decoding of Clinical Notes
# -----------------------------------------------------------
# This cell allows users to interactively select a patient and encounter
# to preview the decoded clinical note (stored as a hex-encoded string in the dataset).

import binascii
import ipywidgets as widgets
from IPython.display import display, Markdown

# Step 1: Create dropdown for unique patient IDs
distinct_patients = sorted(df['patient_num'].unique())

patient_dropdown = widgets.Dropdown(
    options=distinct_patients,
    description="Patient #:",
    layout=widgets.Layout(width='25%')
)

# Step 2: Encounter dropdown (updates based on patient selection)
encounter_dropdown = widgets.Dropdown(
    options=[],
    description="Encounter #:",
    layout=widgets.Layout(width='25%')
)

# Step 3: Update encounters dynamically when patient is selected
def update_encounters(*args):
    patient_encounters = df[df['patient_num'] == patient_dropdown.value]['encounter_num'].tolist()
    encounter_dropdown.options = patient_encounters

# Step 4: Decode and display the selected clinical note
def decode_and_preview(patient_num, encounter_num):
    selected_row = df[
        (df['patient_num'] == patient_num) &
        (df['encounter_num'] == encounter_num)
    ].iloc[0]

    hex_blob = selected_row["visit_blob"].replace("0x", "")
    decoded_note = binascii.unhexlify(hex_blob).decode("utf-8", errors="ignore")

    display(Markdown(f"### Clinical Note Preview (Patient {patient_num}, Encounter {encounter_num})"))
    display(Markdown(f"```text\n{decoded_note}\n```"))

# Step 5: Set up dropdown linkage and render interface
patient_dropdown.observe(update_encounters, names='value')
update_encounters()

widgets.interact(
    decode_and_preview,
    patient_num=patient_dropdown,
    encounter_num=encounter_dropdown
)



In [78]:
# -----------------------------------------------------------
# 4.3. Analyze a Real Patient Note Using the Prompt
# -----------------------------------------------------------
# This cell selects a decoded clinical note for a given patient and encounter,
# fills the clinical extraction prompt (prepared in Cell 3.2), and runs it through the LLM.

# Define patient and encounter to analyze
patient_num = 1000000002
encounter_num = 475326

# Extract and decode the clinical note
selected_row = df[
    (df['patient_num'] == patient_num) &
    (df['encounter_num'] == encounter_num)
].iloc[0]

hex_blob = selected_row["visit_blob"].replace("0x", "")
decoded_note = binascii.unhexlify(hex_blob).decode("utf-8", errors="ignore")

# Fill the template with the note text
filled_prompt = prompt_template_notes.invoke({"patient_note": decoded_note})

# Invoke the model with the prompt
clinical_response = model.invoke(filled_prompt)

# Display the structured output
from IPython.display import display, Markdown
display(Markdown("### Extracted Clinical Information"))
display(Markdown(clinical_response.content))


### Extracted Clinical Information

### Patient Demographics

- **ID Number**: 1000000002
- **Date of Visit**: January 28, 2003
- **Encounter ID**: 475326
- **Age**: 37 years
- **Sex**: Female
- **Ethnicity**: White
- **Language**: Spanish

### Chief Complaints

- **Hair Loss**
- **Severe Manic Episodes**
- **Headaches (frontal region, throbbing)**
- **Throat Pain** (continuous for several weeks)
- **Ongoing Respiratory Difficulties** (shortness of breath, wheezing)

### Current Medications

- **Claritin Reditab 10mg**: For allergy relief
- **Desipramine Hydrochloride 25mg**: To manage bipolar disorder

### Diagnosis and Management

1. **Alopecia**: Referral to dermatology for further evaluation.
2. **Bipolar Disorder, manic phase**: Adjustment in psychiatric medication regimen.
3. **Chronic Headache**: Differentiation between headache types, management trial with medication.
4. **Chronic Throat Pain**: Potential causes including reflux or tobacco use.

### Asthma Status

- **Yes**

### Follow-Up and Plan of Care

- **Follow-up referral to dermatology** for alopecia.
- **Psychiatric coordination** for adjusting bipolar disorder medications.
- **Headache management trial** with potential lifestyle changes and medication adjustments.
- **Smoking cessation advice** and proton pump inhibitors prescription.
- **Laboratory tests monitoring** (complete blood count, liver function).
- **Lifestyle modifications** including strict adherence to medication regimen and proposed lifestyle changes.
- **Continuous care** through follow-up appointments and comprehensive education on the importance of follow-up visits.
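
For downstream tagging, the Yes/No asthma answer usually has to be pulled out of the free-text response. The following is a minimal sketch, assuming the response mentions "asthma" and then gives a standalone "Yes" or "No" on the same or a later line, as in the output above. `extract_asthma_flag` is a hypothetical helper, and model output that phrases the answer differently would need more careful handling.

```python
import re
from typing import Optional


def extract_asthma_flag(response_text: str) -> Optional[bool]:
    """Return True/False for the first Yes/No after an 'asthma' mention, else None."""
    match = re.search(
        r"asthma.*?\b(yes|no)\b",
        response_text,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if match is None:
        return None
    return match.group(1).lower() == "yes"


# Excerpt shaped like the response above
sample_response = """### Current Medications

- **Claritin Reditab 10mg**: For allergy relief

### Asthma Status

- **Yes**
"""
print(extract_asthma_flag(sample_response))  # True
```

Returning `None` when no answer is found keeps "not stated" distinct from "No", which matters if the result feeds a cohort query.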