Imports

In [1]:
# Imports
from tqdm import tqdm
from transformers import pipeline

import sys
import os

utils_path = os.path.abspath(os.path.join(os.getcwd(), '..', 'utils'))

if utils_path not in sys.path:
    sys.path.insert(0, utils_path)

from llm_extractor import load_prompt_template, make_binary_prompt, llm_extraction, parse_llm_answer
from utils import load_data, prepare_all_samples, get_entity_date_pairs, calculate_metrics

Data Loading

In [2]:
# Load data
df = load_data("../data/medcat_re_dataset.csv")
print(f"Loaded {len(df)} records")
#df

Loaded 5 records


In [3]:
# Prepare all samples
samples = prepare_all_samples(df)
print(f"Prepared {len(samples)} samples")
#samples[0]

Prepared 5 samples


LLM

In [4]:
# Define generator
#generator = pipeline("text-generation", model="../Llama-3.2-3B-Instruct", device=-1)
generator = pipeline("text2text-generation", model="google/flan-t5-small", device=-1)

Device set to use cpu


In [5]:
# Test simple prompt using generator
prompt = "Does the following text indicate a relationship between 'asthma' and '2024-08-02'? Answer YES or NO. Text: Patient diagnosed with asthma on 2024-08-02."
result = generator(prompt)
print(result[0]['generated_text'])

No


In [6]:
# Test simple prompt using llm_extraction() function
prompt = "Does the following text indicate a relationship between 'asthma' and '2024-08-02'? Answer YES or NO. Text: Patient diagnosed with asthma on 2024-08-02."
response = llm_extraction(prompt, generator)
response

'No'
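
The reply comes back as a free-form string ('No' here), so it has to be mapped to a binary label before it can be scored. `parse_llm_answer` is imported from `llm_extractor` and its implementation is not shown in this notebook. The helper below, `parse_yes_no`, is a hypothetical stand-in that only illustrates one plausible normalization. It assumes the answer is the first word of the reply and treats anything unparseable as "no link". It does not attempt the confidence value that `parse_llm_answer` also returns.

```python
import re


def parse_yes_no(response: str) -> int:
    """Map a free-text YES/NO reply to 1 (linked) or 0 (not linked).

    Hypothetical helper for illustration; not the notebook's parse_llm_answer.
    """
    # Accept leading whitespace and any casing, but require a whole word,
    # so that "Yesterday ..." is not mistaken for "yes".
    match = re.match(r"\s*(yes|no)\b", response, flags=re.IGNORECASE)
    if match is None:
        return 0  # assumption: unparseable output counts as "no link"
    return 1 if match.group(1).lower() == "yes" else 0


print(parse_yes_no("No"))
```

Defaulting to 0 keeps false positives down at the cost of recall. A stricter variant could return a separate "unknown" value instead.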

In [7]:
#Prompt to use
prompt_to_use = 'prompt.txt'

In [9]:
# For every entity-date pair: build the prompt, query the LLM, and keep the positive predictions
predictions = []

for sample in tqdm(samples, desc="Samples"):
    pairs = get_entity_date_pairs(sample['entities_list'], sample['dates'])
    #for pair in pairs[:1]:
    for pair in pairs:
        #print(pair)
        prompt = make_binary_prompt(pair['entity'], pair['date_info'], sample['note_text'], prompt_to_use)
        print(prompt)
        response = llm_extraction(prompt, generator)
        #print(response)
        pred, conf = parse_llm_answer(response)
        #print(pred, conf)
        if pred == 1:
            predictions.append({
                'entity_label': pair['entity_label'],
                'date': pair['date'],
                'confidence': conf
            })

Samples:   0%|          | 0/5 [00:00<?, ?it/s]

You are a clinical assistant. You are assisting in a task to construct patient timelines from free-text clinical notes.

To do this you are trying to understand which clinical entities in the note are related to which dates in a positive sense. For example the entity may have been diagnosed, treated, prescribed or mentioned as occurring on that date.

Here are some examples with the correct answers and some reasoning for the answers.

Examples:
Entity: asthma
Date: 2024-08-02
Note: Patient diagnosed with asthma on 2024-08-02.
Answer: Yes
Reasoning: asthma is linked to the date 2024-08-02 since the sentence clearly states the asthma diagnosis was made on that date.

Entity: diabetes
Date: 2024-08-02
Note: Diabetes was ruled out on 2024-08-02.
Answer: No
Reasoning: diabetes and the date are not linked because the diabetes is negated (ruled out).

Entity: hypertension
Date: 2024-08-02
Note: Family history of hypertension, last reviewed in 2022.
Answer: No
Reasoning: the last review if 202

Samples:  20%|██        | 1/5 [04:47<19:09, 287.33s/it]

Samples:  40%|████      | 2/5 [06:33<09:02, 180.68s/it]

Samples:  60%|██████    | 3/5 [07:52<04:28, 134.36s/it]

Samples:  80%|████████  | 4/5 [08:17<01:31, 91.22s/it] 

Samples: 100%|██████████| 5/5 [09:14<00:00, 110.86s/it]


In [10]:
# Look at predictions
predictions

[]

In [11]:
# Calculate metrics
metrics = calculate_metrics(predictions, df)
metrics

{'precision': 0, 'recall': 0.0, 'f1': 0, 'tp': 0, 'fp': 0, 'fn': 37}
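
All three scores are zero because `predictions` is empty. With tp = 0 and fp = 0, precision has an empty denominator, and recall is 0 out of 37 gold links. The implementation of `calculate_metrics` lives in `utils` and is not shown here. The following is only a minimal sketch of the standard definitions. The function name `prf_from_counts` is hypothetical, and reporting 0 for an empty denominator is an assumption.

```python
def prf_from_counts(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall and F1 from raw counts.

    Hypothetical helper for illustration; not the notebook's calculate_metrics.
    """
    # Guard each ratio against an empty denominator and report 0 in that case.
    precision = tp / (tp + fp) if (tp + fp) else 0
    recall = tp / (tp + fn) if (tp + fn) else 0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Counts taken from the metrics output above.
print(prf_from_counts(tp=0, fp=0, fn=37))
```

Under these definitions, a single correct positive prediction would already move recall to 1/37, so an all-zero result points to the model never answering "Yes" rather than to a scoring problem.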