# Experiment 1

## Goal
* Learn how to call the ChatGPT API and format its output
* Use ChatGPT to generate some clinical notes to accompany synthetic A&E data

## Dataset 

NHS England prepared a synthetic dataset of A&E presentations. 

A blog post exploring methods for creating the synthetic A&E data is available at https://open-innovations.org/blog/2019-01-24-exploring-methods-for-creating-synthetic-a-e-data

The dataset can be accessed from this website https://data.england.nhs.uk/dataset/a-e-synthetic-data/resource/81b068e5-6501-4840-a880-a8e7aa56890e#

Unfortunately, there is no data dictionary, so we initially use only the columns whose meaning can be understood intuitively:

* Age_Band
* AE_Arrive_HourOfDay
* AE_Time_Mins
* Admitted_Flag - assuming 1 means yes
* ICD10_Chapter_Code - for more see https://en.wikipedia.org/wiki/ICD-10
* Length_Of_Stay_Days
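A minimal sketch of this column selection, assuming the column names as they appear in the loaded data. `INITIAL_COLUMNS` and `select_initial_columns` are hypothetical names, and the boolean `Admitted` column only encodes the assumption above that `Admitted_Flag == 1` means the patient was admitted; any other value, including a missing one, is treated as "not admitted or unknown".

```python
import pandas as pd

# Columns used initially, as listed above
INITIAL_COLUMNS = [
    "Age_Band",
    "AE_Arrive_HourOfDay",
    "AE_Time_Mins",
    "Admitted_Flag",
    "ICD10_Chapter_Code",
    "Length_Of_Stay_Days",
]


def select_initial_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df restricted to INITIAL_COLUMNS plus an Admitted flag.

    Assumption (there is no data dictionary): Admitted_Flag == 1 means "admitted".
    Any other value, including NaN, becomes False ("not admitted or unknown").
    """
    out = df[INITIAL_COLUMNS].copy()
    out["Admitted"] = out["Admitted_Flag"].eq(1)
    return out


# Made-up rows for illustration only; the extra Sex column is dropped by the selection.
toy = pd.DataFrame(
    {
        "Age_Band": ["85+", "25-44", "65-84"],
        "Sex": [2.0, 1.0, 1.0],
        "AE_Arrive_HourOfDay": ["13-16", "13-16", "09-12"],
        "AE_Time_Mins": [240, 420, 140],
        "Admitted_Flag": [1.0, 0.0, None],
        "ICD10_Chapter_Code": ["IX", "IX", "XVIII"],
        "Length_Of_Stay_Days": [16.0, 2.0, 8.0],
    }
)
print(select_initial_columns(toy))
```

Keeping the raw `Admitted_Flag` next to the derived `Admitted` column makes it easy to revisit the assumption if a data dictionary turns up.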


## Set up

In [15]:
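```python
# Reload all imported modules before each cell runs, so edits to src/ are
# picked up without restarting the kernel
%load_ext autoreload
%autoreload 2
```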

In [32]:
```python
# Load libraries
import sys
import os
import pandas as pd
from pathlib import Path

# Add the parent directory (the project root) to the import path so that
# init.py and the src package can be imported from this notebook's folder
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), os.pardir)))

# Importing init.py from the project root sets project-wide variables; it is
# expected to set PROJECT_ROOT, the absolute path to the project root, which
# is then read back from the environment
import init
PROJECT_ROOT = os.getenv("PROJECT_ROOT")

# Import function to load data
from src.data_ingestion.load_data import load_file
```

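The `os.pardir` line above assumes the notebook runs from a folder exactly one level below the project root. A slightly more robust sketch walks up from the current directory until it finds the folder containing `init.py`. `find_project_root` is a hypothetical helper, not part of this project's `src` package.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "init.py") -> Path:
    """Return the nearest ancestor of `start` (inclusive) that contains `marker`.

    Raises FileNotFoundError if no ancestor contains the marker file.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker!r} found in {start} or its parents")


# Possible use in the notebook, in place of the os.pardir line:
# root = find_project_root(Path.cwd())
# sys.path.append(str(root))
```

This keeps working if the notebook is later moved into a deeper subfolder, at the cost of relying on `init.py` staying in the project root.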
## Load data

Here `load_file()`:

* checks whether the NHS England dataset has already been saved in Parquet format in a local folder called `data_store`; if so, it returns it
* if not, it checks whether the zip file from the NHSE website is available locally and downloads it if it is missing
* unzips it to CSV, reads the CSV into a pandas DataFrame and saves that in Parquet format
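The caching steps above can be sketched roughly as follows. This is not the project's actual `src.data_ingestion.load_data.load_file`: `load_file_sketch`, the file names and `zip_url` are hypothetical, `data_dir` would be the `data_store` folder, and the first CSV inside the zip is assumed to be the dataset. Writing Parquet needs `pyarrow` or `fastparquet`; the download, read and write functions are parameters so the flow can be tried without network access or a Parquet engine.

```python
import urllib.request
import zipfile
from pathlib import Path

import pandas as pd


def load_file_sketch(
    data_dir,
    zip_url,
    zip_name="ae_synthetic.zip",
    cache_name="ae_synthetic.parquet",
    fetch=urllib.request.urlretrieve,
    read_cache=pd.read_parquet,
    write_cache=lambda df, path: df.to_parquet(path),
):
    """Return the A&E data, using a local cache when possible (hypothetical sketch)."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    cache_path = data_dir / cache_name

    # 1. Cached copy already saved? Return it straight away.
    if cache_path.exists():
        return read_cache(cache_path)

    # 2. Otherwise make sure the zip is available locally, downloading it if missing.
    zip_path = data_dir / zip_name
    if not zip_path.exists():
        fetch(zip_url, str(zip_path))

    # 3. Extract the first CSV from the zip and read it into a DataFrame.
    with zipfile.ZipFile(zip_path) as zf:
        csv_members = [n for n in zf.namelist() if n.lower().endswith(".csv")]
        if not csv_members:
            raise ValueError(f"No CSV file found in {zip_path}")
        csv_path = Path(zf.extract(csv_members[0], path=data_dir))
    df = pd.read_csv(csv_path)

    # 4. Save the cache so the next call takes the fast path in step 1.
    write_cache(df, cache_path)
    return df
```

In normal use the defaults download with `urllib.request.urlretrieve` and cache as Parquet; swapping in, for example, `pd.read_pickle` and `DataFrame.to_pickle` is mainly useful for testing the control flow.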

In [35]:
```python
ed = load_file()
ed.head()
```

Loaded parquet file, shape: (5256619, 21)


| | IMD_Decile_From_LSOA | Age_Band | Sex | AE_Arrive_Date | AE_Arrive_HourOfDay | AE_Time_Mins | AE_HRG | AE_Num_Diagnoses | AE_Num_Investigations | AE_Num_Treatments | ... | Provider_Patient_Distance_Miles | ProvID | Admitted_Flag | Admission_Method | ICD10_Chapter_Code | Treatment_Function_Code | Length_Of_Stay_Days | Chapter | Block | Title |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4.0 | 85+ | 2.0 | 2015-09-30 00:00:00 | 13-16 | 240 | Low | 1 | 10 | 10 | ... | 2.0 | 15310 | 1.0 | 21 | IX | OTHER | 16.0 | IX | I00–I99 | Diseases of the circulatory system |
| 1 | 9.0 | 65-84 | 1.0 | 2016-07-23 00:00:00 | 17-20 | 770 | Low | 1 | 2 | 2 | ... | 4.0 | 15288 | 1.0 | 21 | IX | 300 | 7.0 | IX | I00–I99 | Diseases of the circulatory system |
| 2 | 8.0 | 45-64 | 2.0 | 2017-05-26 00:00:00 | 09-12 | 90 | High | 0 | 8 | 5 | ... | 3.0 | 15321 | 1.0 | 21 | IX | OTHER | 15.0 | IX | I00–I99 | Diseases of the circulatory system |
| 3 | 10.0 | 25-44 | 1.0 | 2014-04-02 00:00:00 | 13-16 | 420 | High | 2 | 4 | 10 | ... | 3.0 | 15297 | 1.0 | 21 | IX | OTHER | 2.0 | IX | I00–I99 | Diseases of the circulatory system |
| 4 | 2.0 | 65-84 | 1.0 | 2014-06-08 00:00:00 | 13-16 | 230 | Low | 1 | 3 | 2 | ... | 2.0 | 15336 | 1.0 | 21 | IX | 300 | 5.0 | IX | I00–I99 | Diseases of the circulatory system |


As noted above, we will only use the following columns initially, plus `Title`, which holds the name of the ICD-10 chapter:

In [37]:
```python
ed[['Age_Band', 'AE_Arrive_HourOfDay', 'AE_Time_Mins', 'Admitted_Flag', 'Length_Of_Stay_Days', 'ICD10_Chapter_Code', 'Title']]
```

| | Age_Band | AE_Arrive_HourOfDay | AE_Time_Mins | Admitted_Flag | Length_Of_Stay_Days | ICD10_Chapter_Code | Title |
|---|---|---|---|---|---|---|---|
| 0 | 85+ | 13-16 | 240 | 1.0 | 16.0 | IX | Diseases of the circulatory system |
| 1 | 65-84 | 17-20 | 770 | 1.0 | 7.0 | IX | Diseases of the circulatory system |
| 2 | 45-64 | 09-12 | 90 | 1.0 | 15.0 | IX | Diseases of the circulatory system |
| 3 | 25-44 | 13-16 | 420 | 1.0 | 2.0 | IX | Diseases of the circulatory system |
| 4 | 65-84 | 13-16 | 230 | 1.0 | 5.0 | IX | Diseases of the circulatory system |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 5256614 | 45-64 | 17-20 | 220 | 1.0 | 43.0 | XVIII | Symptoms, signs and abnormal clinical and labo... |
| 5256615 | 65-84 | 09-12 | 140 | 1.0 | 8.0 | XVIII | Symptoms, signs and abnormal clinical and labo... |
| 5256616 | 85+ | 21-24 | 240 | 1.0 | 4.0 | XVIII | Symptoms, signs and abnormal clinical and labo... |
| 5256617 | 65-84 | 13-16 | 510 | 1.0 | 21.0 | XVIII | Symptoms, signs and abnormal clinical and labo... |
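Calling the ChatGPT API once per row for all 5,256,619 rows would be slow and costly, so a small reproducible sample is a natural starting point for generating notes. A minimal sketch, assuming the column subset used above; `NOTE_COLUMNS` and `sample_for_notes` are hypothetical names, and the defaults `n=5` and `seed=42` are arbitrary.

```python
import pandas as pd

# Column subset used above, including the ICD-10 chapter Title
NOTE_COLUMNS = [
    "Age_Band",
    "AE_Arrive_HourOfDay",
    "AE_Time_Mins",
    "Admitted_Flag",
    "Length_Of_Stay_Days",
    "ICD10_Chapter_Code",
    "Title",
]


def sample_for_notes(df: pd.DataFrame, n: int = 5, seed: int = 42) -> list[dict]:
    """Draw a reproducible random sample of rows and return them as dicts.

    Each dict maps column name to value and can later be formatted into a prompt.
    If n exceeds the number of rows, every row is returned (in shuffled order).
    """
    sample = df[NOTE_COLUMNS].sample(n=min(n, len(df)), random_state=seed)
    return sample.to_dict(orient="records")


# In the notebook:
# records = sample_for_notes(ed, n=5)
```

Fixing `random_state` means the same rows come back on every run, which makes it easier to compare outputs across prompt variations.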
