# Experiment 1.1

As in the previous experiment 1, the goals are:

* Demonstrate an example of calling the ChatGPT API 
* Solve some of the problems in formatting output
* Use ChatGPT to (a) pick a medical condition and (b) generate an admission note given some simple patient details
* Evaluate the output for (a) successful generation of text (b) plausibility of admission note

In order to seed the LLM with some patient information, I start with some patient parameters provided by synthetic data. NHS England prepared a synthetic dataset of A&E presentations. A blog post about it is [here](https://open-innovations.org/blog/2019-01-24-exploring-methods-for-creating-synthetic-a-e-data). The dataset can be accessed from [this website](https://data.england.nhs.uk/dataset/a-e-synthetic-data/resource/81b068e5-6501-4840-a880-a8e7aa56890e#). 

A notebook exploring these data is [here](explore-nhse-ae-data.ipynb).

This dataset provides, among other things, the hospital length of stay and whether the patient was admitted. As my interest is in clinical notes generated during hospital visits, I'm more interested in inpatients. I will use the length of stay information from the A&E dataset to ask ChatGPT to pick a medical condition that would merit inpatient admission. I also investigate whether, given the length of stay, the admission note indicates a severity that could plausibly (at least barring any unexpected developments in the patient's trajectory) result in such a stay.

In previous iterations I have already found limitations when evaluating the outputs on:

* their readability as admission notes (although, as I am not medical, I cannot do a full evaluation)
* the plausibility of the medical condition chosen by ChatGPT leading to a hospital visit of this length

However, as I'm currently not able to access ChatGPT v4, I want to save 100 notes using gpt-3.5-turbo-0613, for comparison later. So I will continue with this approach for now.


## Set up

In [2]:
# Reload functions every time
%load_ext autoreload 
%autoreload 2

In [35]:
# Load libraries
import sys
import os
import pandas as pd
from pathlib import Path

import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)


# Import the variables that have been set in the init.py file in the root directory
# These include a constant called PROJECT_ROOT which stores the absolute path to this folder
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), os.pardir)))
import init
PROJECT_ROOT = os.getenv("PROJECT_ROOT")

# Add the src folder to sys path, so that the application knows to look there for libraries
sys.path.append(str(Path(PROJECT_ROOT) / 'src'))

# Import function to load data
from data_ingestion.load_data import load_nhse_data

from utils.write_to_json import write_to_json
from utils.load_from_json import load_from_json

from functions.patient_class import Patient

from functions.verify_admission_decision import verify_admission_decision




## Load data

Here the function called load_nhse_data():

* checks whether the NHS England dataset has already been saved in a local folder called data_store in parquet format; if so, the function returns it
* if not, it checks for the zip file from the NHSE website and, if that is also missing, downloads it
* the zip file is then unzipped to csv, read as a pandas dataframe and saved to parquet format
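
The cache-first order of checks described above can be sketched as follows. This is a minimal illustration, not the project's actual loader: the function name `plan_load`, the file stem `ae_synthetic` and the returned step labels are all assumptions made for the example.

```python
from pathlib import Path


def plan_load(data_dir: Path, stem: str = "ae_synthetic") -> list[str]:
    """Return the steps still needed to obtain the dataframe.

    Hypothetical helper: file names and step labels are illustrative only.
    """
    if (data_dir / f"{stem}.parquet").exists():
        # Fast path: cached parquet already on disk (e.g. pandas.read_parquet)
        return ["read_parquet"]

    steps = []
    if not (data_dir / f"{stem}.zip").exists():
        # No local zip either, so it has to be fetched first
        steps.append("download_zip")
    # Unzip to csv, read it (e.g. pandas.read_csv) and cache it as parquet
    # (e.g. DataFrame.to_parquet) so that the next call takes the fast path
    steps += ["unzip_to_csv", "read_csv", "save_parquet"]
    return steps
```

The point of the ordering is that the expensive steps (download, unzip, csv parsing) run at most once; every later call only reads the parquet file.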

In [4]:
ed = load_nhse_data()
# ed.head()

## Generate 100 instances of a patient with admission notes

Here I pick rows from the A&E dataset at random, and generate admission notes for them. I chose to save these to a JSON file, rather than to individual text files or to SQL, as this is human readable. You can view the full output [here](../src/data_exports/note_dict_20231002_2120.json).

This uses a class definition called Patient(). To see the script, go to [../src/functions/patient_class.py](../src/functions/patient_class.py). An instance of this class, here referred to as a persona, is a single patient. The instance is populated with the variables retrieved from a single row of the ED dataset loaded above.

The steps are the following:
* select a row from the A&E data
* pass the row information to the class definition
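
Unpacking a row into a class can be illustrated with a small stand-in. The dataclass below is a hypothetical sketch, not the real `Patient` class in patient_class.py: the four field names are borrowed from the `attributes` list used in this notebook, while the example values and the assumption that a row's values arrive in exactly this order are made up for the illustration.

```python
from dataclasses import dataclass


@dataclass
class PatientSketch:
    """Hypothetical stand-in for Patient; the real class has more fields."""
    Age_Band: str
    AE_Arrive_HourOfDay: str
    AE_Time_Mins: int
    Length_Of_Stay_Days: int


def row_to_patient_sketch(row):
    # *row unpacks the row's values positionally, so the order of the
    # values must match the order of the fields declared above
    return PatientSketch(*row)


# Made-up example values, not taken from the NHSE dataset
example_row = ("65-84", "13-16", 210, 3)
persona_sketch = row_to_patient_sketch(example_row)
print(persona_sketch.Length_Of_Stay_Days)
```

Positional unpacking is compact but fragile: if the column order of the dataset changes, values land in the wrong fields without raising an error.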

In addition, as part of creating the persona, a call to ChatGPT is made using the OpenAI API. Certain details about the patient (listed below) are embedded into the prompt to be passed in. ChatGPT is asked to generate a medical condition, and an admission note. 

A sub-function [pick_medical_condition()](../src/functions/pick_medical_condition.py) is called to populate three additional attributes of the persona: the medical condition, their admission note, and their most recent note (which in this case is the admission note, but later this attribute could be something else, such as a progress note).

The script [pick_medical_condition()](../src/functions/pick_medical_condition.py)

* calls a function [generate_prompt_presenting_condition()](../src/functions/pick_medical_condition.py) (scroll down the file) which populates a ChatGPT prompt with details about the patient. The prompt contains ChatGPT's instructions and requests the output in JSON format. To see the text of the prompt, go to [pick_medical_condition.txt](../templates/prompt_templates/pick_medical_condition.txt)
* calls ChatGPT with the prompt. Functions used to call ChatGPT are in [prompt_functions.py](../src/functions/prompt_functions.py)
* attempts to parse the json output
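
The three steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the code in pick_medical_condition.py: the template text, the placeholder names, the JSON keys and the stubbed `fake_llm_call` are all hypothetical, and no request is sent to the OpenAI API. The `'failed on json'` marker mirrors the string that is checked for later in this notebook.

```python
import json
from string import Template

# Hypothetical prompt template; the real one lives in pick_medical_condition.txt
PROMPT_TEMPLATE = Template(
    "The patient is aged $age_band and stayed $los_days days in hospital. "
    "Pick a plausible medical condition and write an admission note. "
    'Reply in JSON with keys "medical_condition" and "admission_note".'
)


def build_prompt(age_band, los_days):
    # substitute() raises KeyError if a placeholder is left unfilled
    return PROMPT_TEMPLATE.substitute(age_band=age_band, los_days=los_days)


def fake_llm_call(prompt):
    # Stub standing in for the API call; returns a canned JSON string
    return '{"medical_condition": "pneumonia", "admission_note": "Chief Complaint: ..."}'


def parse_response(raw_text):
    """Return (condition, note), or a failure marker if parsing fails."""
    try:
        data = json.loads(raw_text)
        return data["medical_condition"], data["admission_note"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # Malformed JSON, a missing key, or a payload that is not a dict
        return "failed on json", "failed on json"


condition, note = parse_response(fake_llm_call(build_prompt("65-84", 3)))
```

Returning a marker string instead of raising keeps a long generation loop running when one response is malformed, at the cost of having to filter the marker out afterwards.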

In [5]:
attributes = ['Age_Band', 'AE_Arrive_HourOfDay', 'AE_Time_Mins',  'Length_Of_Stay_Days', 'ICD10_Chapter_Code', 'Title', 'Medical_Condition', 'Admission_Note']
note_dict = {}

def row_to_patient(row):
    return Patient(*row)

In [13]:
# Keep sampling until 100 usable admission notes have been collected
while len(note_dict) < 100:

    # Sample one row at random; iterrows() yields (index, row) pairs
    for index, persona in ed.sample(1).iterrows():

        Pat = row_to_patient(persona)

        # Keep the note only if it is non-empty and the JSON parsing did not fail
        if Pat.Admission_Note != '' and 'failed on json' not in str(Pat.Admission_Note):

            print("Successful admission note for patient with id " + str(index))

            note_dict[Pat.id] = {attr: getattr(Pat, attr) for attr in attributes if hasattr(Pat, attr)}

            # Checkpoint to disk after every 10 successful notes
            if len(note_dict) % 10 == 0:
                write_to_json(note_dict, 'experiment_1.1')

        else:

            print("Failed admission note for patient with id " + str(index))

# Final save once all 100 notes have been generated
write_to_json(note_dict, 'experiment_1.1')

Failed admission note for patient with id 3209564
Successful admission note for patient with id 5154297
Failed admission note for patient with id 3476926
Failed admission note for patient with id 4071698
Failed admission note for patient with id 2833307
Failed admission note for patient with id 121764
Successful admission note for patient with id 2486033
Failed admission note for patient with id 3435584
Failed admission note for patient with id 2005326
Successful admission note for patient with id 4780609
Failed admission note for patient with id 3847653
Successful admission note for patient with id 4757775
Failed admission note for patient with id 707137
Successful admission note for patient with id 4925218
Failed admission note for patient with id 2551064
Successful admission note for patient with id 741336
Successful admission note for patient with id 5230991
Failed admission note for patient with id 839359
Successful admission note for patient with id 3606052
Successful admission n

## Evaluate output

In [19]:
note_dict = load_from_json('experiment_1.1')

### Use LLM for evaluation

In [Experiment 1.0](experiment-1.0.ipynb) I found that the agent wanted to admit people with conditions that did not merit admission. Here I use the agent first tried in [Experiment 1.0.1](experiment-1.0.1.ipynb) to evaluate the admission decision. The text prompt is [verify_admission_decision.txt](../templates/prompt_templates/verify_admission_decision.txt). The agent is given the admission note, and asked to act as a senior colleague appraising a junior colleague's decision to admit the patient. The agent returns an agree/disagree decision, a reason, and a discharge note. (The reason for adding the discharge note was to encourage the LLM to think about how the patient's condition could be managed without admission.)

The code below first shows an example for a single admission note, then loops through the 100 admission notes to evaluate whether the admission decision was the right call.

In [42]:
opinion, reason, discharge_note = verify_admission_decision(persona['Admission_Note'])
print(f"Admission note:\n{persona['Admission_Note'].strip()}\n")
print(f"Senior colleague's opinion:\n{opinion}\n")
print(f"Reason:\n{reason}\n")
print(f"Discharge Note:\n{discharge_note}\n")

Admission note:
Chief Complaint:
The patient presented to the Accident & Emergency Department with severe abdominal pain and nausea. 

History of Present Illness:
The patient reports experiencing intermittent upper abdominal pain for the past week. The pain is typically provoked by fatty meals and radiates towards the right shoulder. The patient noticed yellowing of the skin and eyes, along with dark-colored urine. There are no associated fevers, chills, or changes in bowel habits. 

Past Medical History:
The patient has a history of obesity, but there are no previous gastrointestinal issues. There is no known family history of gallbladder disease. 

Physical Examination:
Upon examination, the patient appears uncomfortable and pale. Vital signs reveal a mildly elevated temperature of 37.5°C, heart rate of 90 bpm, blood pressure of 130/80 mmHg, and respiratory rate of 18 breaths per minute. Abdominal examination reveals tenderness in the right upper quadrant with guarding. Murphy's sign

In [27]:
agree_count = 0
failed_count = 0

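# patient_id is used as the loop variable to avoid shadowing the built-in id()
for patient_id, persona in note_dict.items():
    opinion, reason, discharge_note = verify_admission_decision(persona['Admission_Note'])
    print(opinion)
    # Only store the evaluation when the agent returned a valid decision
    if opinion in ['agree', 'disagree']:
        note_dict[patient_id]['Admission_Note_agreement'] = opinion
        note_dict[patient_id]['Admission_Note_agreement_reason'] = reason
        note_dict[patient_id]['Admission_Note_if_discharged'] = discharge_note

        if opinion == 'agree':
            agree_count += 1
    else:
        # Anything else is a failure message, e.g. 'failed on json argument'
        failed_count += 1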

print("Number of records where senior colleague agrees: " + str(agree_count))
print("Number of records where agent evaluation failed: " + str(failed_count))



disagree
disagree
disagree
disagree
agree
agree
agree
agree
failed on json argument
disagree
agree
disagree
agree
agree
agree
agree
agree
agree
agree
agree
agree
agree
agree
agree
agree
agree
agree
agree
agree
agree
agree
agree
agree
failed on GPT response
disagree
agree
agree
disagree
agree
agree
agree
agree
disagree
agree
agree
agree
disagree
disagree
disagree
agree
disagree
agree
disagree
agree
agree
disagree
disagree
disagree
agree
agree
agree
agree
agree
agree
agree
agree
agree
agree
agree
agree
disagree
disagree
disagree
agree
agree
disagree
agree
disagree
disagree
agree
disagree
agree
agree
agree
agree
agree
agree
disagree
agree
agree
agree
agree
agree
agree
agree
agree
disagree
agree
agree
agree
Number of records where senior colleague agrees: 72
Number of records where agent evaluation failed: 2


In [43]:
write_to_json(note_dict, 'experiment_1.1')

Viewing the output, I note:
* in 72 of 98 cases (excluding the two failed requests), the senior colleague agreed with the junior colleague. However, in some cases the 'agree' decision is agreeing with a decision to discharge (e.g. 1609445), i.e. the agent has not fully understood the instruction.
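
The 72 of 98 figure can be reproduced from the stored opinions. The sketch below is illustrative: the `opinions` list is rebuilt from the counts printed above (72 'agree', 26 'disagree' and the two failure messages) rather than loaded from the JSON file, and the helper name `agreement_rate` is an assumption.

```python
from collections import Counter


def agreement_rate(opinions):
    """Share of 'agree' among valid decisions, ignoring failed evaluations."""
    counts = Counter(opinions)
    valid = counts["agree"] + counts["disagree"]
    if valid == 0:
        # No valid decisions, so the rate is undefined
        return None
    return counts["agree"] / valid


opinions = (
    ["agree"] * 72
    + ["disagree"] * 26
    + ["failed on json argument", "failed on GPT response"]
)
rate = agreement_rate(opinions)
print(f"{rate:.1%}")  # 72 / 98, about 73.5%
```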

Specific cases
* 450178 and 1609445 both have UTIs. One is a young adult, for whom the admission is overruled. The other is elderly, for whom admission is agreed. This seems appropriate.
* There are two cases where hypertension is the presenting condition: 2585524, a middle-aged person, and 2631064, an elderly person. The senior colleague agrees with the admission for the elderly person, for whom the symptoms are more severe. Another case, 2777094, also elderly, is suggested to be more suited to discharge; a harsh decision perhaps given their age of 85+ (although note that the time of day of arrival was not given to the senior colleague in the prompt). Two other hypertension cases, 3543113 and 3546013, are deemed suitable for admission.