# Modelling probability of admission to specialty, if admitted

This notebook demonstrates the second stage of prediction: generating, for each patient in the ED, a probability of admission to each specialty if they are admitted.

Here, consult sequences provide the input to prediction, and the model is trained only on visits by adult patients that ended in admission. Patients under 18 at the time of arrival at the ED are assumed to be admitted to paediatric wards. This assumption could be relaxed by including children in the training data and changing how inference is done.

This approach assumes that, if admitted, a patient's probability of admission to any particular specialty is independent of their probability of admission to hospital. 
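
Under this independence assumption, the two stages can be combined by multiplication. The following is a minimal sketch using made-up numbers rather than model output; `p_admission` and `p_specialty_given_admission` are hypothetical names, not part of `patientflow`.

```python
# Illustrative values only; not taken from any trained model
p_admission = 0.3  # stage 1: probability that the patient is admitted
p_specialty_given_admission = {  # stage 2: probability of each specialty, if admitted
    "medical": 0.6,
    "surgical": 0.25,
    "haem/onc": 0.1,
    "paediatric": 0.05,
}

# Under the independence assumption, the unconditional probability of
# admission to each specialty is the product of the two stages
p_specialty = {
    spec: p_admission * p for spec, p in p_specialty_given_admission.items()
}

# The specialty probabilities sum to the overall probability of admission
total = sum(p_specialty.values())
print(p_specialty)
print(round(total, 6))
```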

## Set up the notebook environment

In [1]:
# Reload functions every time
%load_ext autoreload 
%autoreload 2

In [12]:
from patientflow.load import set_project_root
project_root = set_project_root()




Inferred project root: /Users/zellaking/Repos/patientflow


## Load parameters and set file paths, and load data

In [3]:
import pandas as pd
from patientflow.load import load_data
from patientflow.load import set_file_paths

# set file paths
data_folder_name = 'data-public'
data_file_path = project_root / data_folder_name

data_file_path, media_file_path, model_file_path, config_path = set_file_paths(project_root, 
               data_folder_name=data_folder_name)

# load data
ed_visits = load_data(data_file_path, 
                    file_name='ed_visits.csv', 
                    index_column = 'snapshot_id',
                    sort_columns = ["visit_number", "snapshot_date", "prediction_time"], 
                    eval_columns = ["prediction_time", "consultation_sequence", "final_sequence"])

# load params
from patientflow.load import load_config_file
params = load_config_file(config_path)

start_training_set = params["start_training_set"]
start_validation_set = params["start_validation_set"]
start_test_set = params["start_test_set"]
end_test_set = params["end_test_set"]

Configuration will be loaded from: /Users/zellaking/Repos/patientflow/config.yaml
Data files will be loaded from: /Users/zellaking/Repos/patientflow/data-public
Trained models will be saved to: /Users/zellaking/Repos/patientflow/trained-models/public
Images will be saved to: /Users/zellaking/Repos/patientflow/trained-models/public/media


## Train the model

This is the function that trains the specialty model, loaded from a file. Below we will break it down step-by-step.

In [4]:
from patientflow.train.emergency_demand import train_specialty_model, get_default_visits
??train_specialty_model

Signature:
train_specialty_model(
    train_visits: pandas.core.frame.DataFrame,
    model_name: str,
    model_metadata: Dict[str, Any],
    uclh: bool,
    visit_col: str,
    input_var: str,
    grouping_var: str,
    outcome_var: str,
) -> Tuple[Dict[str, Any]

The first step in the function above is to handle the fact that there are multiple snapshots per visit and we only want one for each visit in the training set. 

In [31]:
from patientflow.prepare import select_one_snapshot_per_visit

visits_single = select_one_snapshot_per_visit(ed_visits, visit_col = 'visit_number')

print(ed_visits.shape)
print(visits_single.shape)

(79814, 69)
(64497, 68)
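
How `select_one_snapshot_per_visit` chooses among the snapshots of a visit is not shown in this notebook. One plausible approach, sketched here on toy data and assuming a snapshot is drawn at random for each visit, uses a pandas group-wise sample. This is an illustration, not the library's implementation.

```python
import pandas as pd

# Toy data: visit 1 has two snapshots, visit 2 has one
snapshots = pd.DataFrame(
    {
        "visit_number": [1, 1, 2],
        "prediction_time": ["(9, 30)", "(12, 0)", "(9, 30)"],
    }
)

# Draw one snapshot at random from each visit;
# fixing random_state keeps the draw reproducible
one_per_visit = snapshots.groupby("visit_number").sample(n=1, random_state=42)

print(snapshots.shape)
print(one_per_visit.shape)
```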


To train the specialty model, we only use a subset of the columns. The relevant columns are shown below.

In [6]:
display(visits_single[['consultation_sequence', 'final_sequence', 'specialty', 'is_admitted', 'age_group']].head(10))


| snapshot_id | consultation_sequence | final_sequence | specialty | is_admitted | age_group |
| --- | --- | --- | --- | --- | --- |
| 0 | [] | [] | medical | False | 55-64 |
| 2 | [] | [] | surgical | False | 75-102 |
| 3 | [] | [] | medical | False | 35-44 |
| 5 | ['haem_onc'] | ['haem_onc'] | haem/onc | False | 65-74 |
| 7 | ['surgical'] | ['surgical'] | surgical | False | 25-34 |
| 10 | [] | ['haem_onc'] | medical | False | 65-74 |
| 11 | ['haem_onc'] | ['haem_onc'] | medical | False | 75-102 |
| 12 | ['haem_onc'] | ['haem_onc'] | haem/onc | False | 75-102 |
| 13 | [] | [] | haem/onc | False | 75-102 |
| 15 | ['ambulatory'] | ['ambulatory'] | | False | 0-17 |


We filter down to only include admitted patients, and remove any with a null value in the `specialty` column, since this is what the model aims to predict.

In [7]:
admitted = visits_single[
    (visits_single.is_admitted) & ~(visits_single.specialty.isnull())
]

Note that some visits that ended in admission had no consult request at the time they were sampled. These appear below as rows with an empty `consultation_sequence` (`[]`).

In [82]:
display(admitted[['consultation_sequence', 'final_sequence', 'specialty', 'is_admitted', 'age_group']].head(10))
    


| snapshot_id | consultation_sequence | final_sequence | specialty | is_admitted | age_group |
| --- | --- | --- | --- | --- | --- |
| 20 | ['surgical'] | ['surgical', 'surgical'] | surgical | True | 45-54 |
| 58 | ['surgical'] | ['surgical'] | surgical | True | 35-44 |
| 77 | [] | ['acute'] | medical | True | 65-74 |
| 117 | [] | ['surgical'] | surgical | True | 35-44 |
| 125 | ['surgical'] | ['surgical'] | surgical | True | 25-34 |
| 128 | ['surgical'] | ['surgical'] | surgical | True | 75-102 |
| 141 | [] | ['surgical'] | surgical | True | 65-74 |
| 163 | ['acute'] | ['acute'] | medical | True | 65-74 |
| 176 | [] | ['surgical'] | medical | True | 75-102 |
| 227 | [] | ['paeds'] | paediatric | True | 0-17 |
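
To quantify how common this is, the share of admitted visits with no consult at snapshot time can be computed from the length of each sequence. The sketch below uses a small toy dataframe (`toy_admitted`, a hypothetical stand-in for `admitted`), so the resulting share says nothing about the real data.

```python
import pandas as pd

# Toy stand-in for the `admitted` dataframe
toy_admitted = pd.DataFrame(
    {
        "consultation_sequence": [["surgical"], [], ["acute"], []],
        "specialty": ["surgical", "medical", "medical", "surgical"],
    }
)

# A sequence of length zero means no consult had been requested at the snapshot
no_consult_yet = toy_admitted["consultation_sequence"].apply(len) == 0
share_no_consult = no_consult_yet.mean()
print(f"{share_no_consult:.0%} of toy admitted visits had no consult at snapshot time")
```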


The UCLH data (not shared publicly) includes more detailed data on consult type, as shown in the `code` column in the dataset below. The public data has been simplified to a higher level (identified in the mapping below as `type`). 

In [83]:
model_input_path = project_root / 'src' / 'patientflow' / 'model-input'
name_mapping = pd.read_csv(model_input_path / 'consults-mapping.csv')
name_mapping

| | id | code | name | type |
| --- | --- | --- | --- | --- |
| 0 | 1 | CON124 | Inpatient consult to Neuro Ophthalmology | neuro |
| 1 | 2 | CON9 | Inpatient consult to Neurology | neuro |
| 2 | 3 | CON34 | Inpatient consult to Dietetics (N&D) - Not TPN | allied |
| 3 | 4 | CON134 | Inpatient consult to PERRT | icu |
| 4 | 5 | CON163 | IP Consult to MCC Complementary Therapy Team | pain |
| ... | ... | ... | ... | ... |
| 111 | 112 | CON77 | Inpatient consult to Paediatric Allergy | paeds |
| 112 | 113 | CON168 | Inpatient consult to Acute Oncology Service | haem_onc |
| 113 | 114 | CON84 | Inpatient consult to Paediatric Hematology - C... | haem_onc |
| 114 | 115 | CON122 | Inpatient consult to Paediatric Epilepsy Service | paeds |


For example, the code for a consult with Acute Medicine is converted to a more general category in the public dataset.

In [84]:
name_mapping[name_mapping.code == 'CON157']

| | id | code | name | type |
| --- | --- | --- | --- | --- |
| 14 | 15 | CON157 | Inpatient consult to Acute Medicine | acute |


The `medical` type groups together many of the more specific consult types.

In [85]:
name_mapping[name_mapping.type == 'medical']

| | id | code | name | type |
| --- | --- | --- | --- | --- |
| 7 | 8 | CON165 | Inpatient consult to Nutrition Team (TPN) | medical |
| 10 | 11 | CON54 | Inpatient consult to Respiratory Medicine | medical |
| 12 | 13 | CON43 | Inpatient consult to Cardiology | medical |
| 15 | 16 | CON5 | Inpatient consult to Infectious Diseases | medical |
| 17 | 18 | CON132 | Inpatient consult to Adult Diabetes CNS | medical |
| 33 | 34 | CON68 | Inpatient consult to Gastroenterology | medical |
| 37 | 38 | CON60 | Inpatient consult to Endocrinology | medical |
| 48 | 49 | CON156 | Inpatient consult to Adult Endocrine & Diabetes | medical |
| 62 | 63 | CON44 | Inpatient consult to Rheumatology | medical |
| 66 | 67 | CON147 | Inpatient consult to Cardiac Rehabilitation | medical |
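
The simplification from detailed codes to types amounts to a lookup. How the public dataset was actually produced is not shown in this notebook; the sketch below only illustrates the idea, using code-to-type pairs copied from the mapping rows above. `simplify_sequence` is a hypothetical helper, not part of `patientflow`.

```python
# Code-to-type pairs copied from the mapping rows shown above
code_to_type = {
    "CON157": "acute",
    "CON54": "medical",
    "CON43": "medical",
    "CON9": "neuro",
}


def simplify_sequence(codes):
    """Map a sequence of detailed consult codes to their simplified types."""
    return tuple(code_to_type[code] for code in codes)


# A consult to Acute Medicine followed by a consult to Cardiology
simplified = simplify_sequence(["CON157", "CON43"])
print(simplified)
```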


## Separate into training, validation and test sets

As part of preparing the data, each visit is allocated to one of three sets: training, validation and test.


In [86]:
from patientflow.prepare import create_temporal_splits

# note that we derive the training set from visits_single, as the SequencePredictor() does the preprocessing mentioned above
train_visits, _, _ = create_temporal_splits(
    visits_single,
    start_training_set,
    start_validation_set,
    start_test_set,
    end_test_set,
    col_name="snapshot_date",
)

Split sizes: [42852, 5405, 16240]
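
The internals of `create_temporal_splits` are not shown here. A temporal split of this kind can be sketched with date comparisons, as below. The function `temporal_split_sketch`, the boundary dates and the choice to treat each end date as exclusive are all assumptions made for illustration.

```python
import pandas as pd


def temporal_split_sketch(df, start_train, start_valid, start_test, end_test, col_name):
    """Split df into train/validation/test by date, treating each end date as exclusive."""
    dates = pd.to_datetime(df[col_name])
    bounds = [pd.Timestamp(d) for d in (start_train, start_valid, start_test, end_test)]
    train = df[(dates >= bounds[0]) & (dates < bounds[1])]
    valid = df[(dates >= bounds[1]) & (dates < bounds[2])]
    test = df[(dates >= bounds[2]) & (dates < bounds[3])]
    return train, valid, test


# Toy snapshot dates and made-up boundaries
toy = pd.DataFrame(
    {"snapshot_date": ["2031-03-01", "2031-08-15", "2031-09-20", "2031-10-05"]}
)
train, valid, test = temporal_split_sketch(
    toy, "2031-03-01", "2031-09-01", "2031-10-01", "2032-01-01", col_name="snapshot_date"
)
print([len(train), len(valid), len(test)])
```

Splitting by date rather than at random means the model is evaluated on visits that occur after those it was trained on.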


## Train the model

Here, we load `SequencePredictor`, a class that takes a sequence as input (in this case `consultation_sequence`), a grouping variable (in this case `final_sequence`) and an outcome variable (in this case `specialty`), and uses the grouping variable to create a rooted directed tree. Each new consult in the sequence is a branching node of the tree. The grouping variable, `final_sequence`, provides the terminal nodes of the tree. The model maps each part-complete sequence of consults, via each `final_sequence` it may end in, to a probability of admission to each specialty.

In [129]:
from patientflow.predictors.sequence_predictor import SequencePredictor

spec_model = SequencePredictor(
    input_var="consultation_sequence",
    grouping_var="final_sequence",
    outcome_var="specialty",
    apply_special_category_filtering=False,
)

spec_model.fit(train_visits)
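
The calculation described above can be sketched with plain counting: estimate the probability of each final sequence given a partial sequence, estimate the probability of each specialty given a final sequence, and sum their product over final sequences. This is a simplified illustration on toy records, not the `SequencePredictor` implementation; `fit_sequence_sketch` is a hypothetical name.

```python
from collections import Counter, defaultdict


def fit_sequence_sketch(records):
    """records: iterable of (input_seq, final_seq, specialty) tuples."""
    final_to_spec = defaultdict(Counter)   # final sequence -> specialty counts
    input_to_final = defaultdict(Counter)  # partial sequence -> final sequence counts
    for input_seq, final_seq, specialty in records:
        final_to_spec[final_seq][specialty] += 1
        input_to_final[input_seq][final_seq] += 1

    weights = {}
    for input_seq, finals in input_to_final.items():
        n_input = sum(finals.values())
        probs = defaultdict(float)
        for final_seq, n in finals.items():
            p_final = n / n_input  # P(final sequence | partial sequence)
            n_final = sum(final_to_spec[final_seq].values())
            for spec, m in final_to_spec[final_seq].items():
                # accumulate P(final | partial) * P(specialty | final)
                probs[spec] += p_final * m / n_final
        weights[input_seq] = dict(probs)
    return weights


# Toy records: (consultation_sequence, final_sequence, specialty)
records = [
    ((), ("acute",), "medical"),
    ((), ("acute",), "medical"),
    ((), ("surgical",), "surgical"),
    (("acute",), ("acute",), "medical"),
    (("acute",), ("acute", "surgical"), "surgical"),
]
toy_weights = fit_sequence_sketch(records)
print(toy_weights[()])
print(toy_weights[("acute",)])
```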



Metadata about the model can be viewed in the `metrics` attribute.

In [133]:
spec_model.metrics

{'train_dttm': '2025-03-20 16:30',
 'train_set_no': 42852,
 'start_date': '3/1/2031',
 'end_date': '8/9/2031'}

Passing an empty tuple to the trained model shows the probability of ending in each specialty, if a visit has had no consults yet. 

In [130]:
print("For a visit which has no consult at the time of a snapshot, the probabilities of ending up under each specialty are shown below")
print({k: round(v, 3) for k, v in spec_model.predict(tuple()).items()})

For a visit which has no consult at the time of a snapshot, the probabilities of ending up under each specialty are shown below
{'surgical': 0.258, 'medical': 0.574, 'paediatric': 0.078, 'haem/onc': 0.09}


The probabilities for each consult sequence ending in a given observed specialty have been saved in the model. These can be accessed as follows: 

In [89]:
weights = spec_model.weights
print("For a visit which has one consult to acute medicine at the time of a snapshot, the probabilities of ending up under each specialty are shown below")
print({k: round(v, 3) for k, v in weights[('acute',)].items()})

For a visit which has one consult to acute medicine at the time of a snapshot, the probabilities of ending up under each specialty are shown below
{'surgical': 0.013, 'medical': 0.946, 'paediatric': 0.002, 'haem/onc': 0.039}


The intermediate mapping of `consultation_sequence` to `final_sequence` can be accessed from the trained model as shown below. The first row shows the probability of an empty sequence (i.e. no consults yet) ending in each of the `final_sequence` options.

In [47]:
spec_model.input_to_grouping_probs

consultation_sequence (rows) / final_sequence (columns),(),"('acute',)","('acute', 'acute')","('acute', 'acute', 'medical')","('acute', 'acute', 'medical', 'surgical')","('acute', 'acute', 'mental_health')","('acute', 'acute', 'palliative')","('acute', 'acute', 'surgical')","('acute', 'allied')","('acute', 'allied', 'acute')",...,"('surgical', 'surgical')","('surgical', 'surgical', 'acute')","('surgical', 'surgical', 'acute', 'mental_health', 'discharge', 'discharge')","('surgical', 'surgical', 'acute', 'surgical')","('surgical', 'surgical', 'icu')","('surgical', 'surgical', 'medical')","('surgical', 'surgical', 'obs_gyn')","('surgical', 'surgical', 'other')","('surgical', 'surgical', 'surgical')",probability_of_grouping_sequence
(),0.009819,0.458837,0.013218,0.000755,0.000378,0.000755,0.000000,0.000378,0.005665,0.000378,...,0.007553,0.000755,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000378,0.534194
"('acute',)",0.000000,0.830409,0.005848,0.000000,0.000000,0.000000,0.000000,0.000000,0.014620,0.000000,...,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.206980
"('acute', 'acute')",0.000000,0.000000,0.909091,0.045455,0.000000,0.000000,0.045455,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.004438
"('acute', 'allied')",0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,1.000000,0.000000,...,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000202
"('acute', 'ambulatory')",0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000403
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"('surgical', 'medical')",0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000403
"('surgical', 'obs_gyn')",0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000403
"('surgical', 'surgical')",0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.814815,0.000000,0.037037,0.0,0.0,0.037037,0.037037,0.037037,0.037037,0.005447
"('surgical', 'surgical', 'acute')",0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.500000,0.000000,0.5,0.0,0.000000,0.000000,0.000000,0.000000,0.000403


In [91]:
# save models and metadata
from patientflow.train.emergency_demand import save_model, save_metadata

save_model(spec_model, "specialty_no_filtering", model_file_path)
print(f"Model has been saved to {model_file_path}")

Model has been saved to /Users/zellaking/Repos/patientflow/trained-models/public


## Handle special categories

At UCLH, we assume that all under 18s will be admitted to a paediatric specialty. Their visits are therefore excluded when training the specialty predictor. An `apply_special_category_filtering` parameter can be set in the `SequencePredictor` to handle certain categories differently. When this is set, the `SequencePredictor` retrieves the relevant logic, which is defined in a class called `SpecialCategoryParams`.

In [127]:
train_visits.snapshot_date.min()

'3/1/2031'

In [128]:
spec_model = SequencePredictor(
    input_var="consultation_sequence",
    grouping_var="final_sequence",
    outcome_var="specialty",
    apply_special_category_filtering=True,
)

spec_model.fit(train_visits)

weights = spec_model.weights
print("For a visit which has no consult at the time of a snapshot, the probabilities of ending up under each specialty are shown below")
print({k: round(v, 3) for k, v in spec_model.predict(tuple()).items()})
print("For a visit which has one consult to acute medicine at the time of a snapshot, the probabilities of ending up under each specialty are shown below")
print({k: round(v, 3) for k, v in weights[('acute',)].items()})

save_model(spec_model, "specialty", model_file_path)



TypeError: unhashable type: 'list'
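
Python raises `TypeError: unhashable type: 'list'` whenever a list is used where a hashable value is required, for example as a dictionary key. Whether that is what happened in the cell above cannot be determined from the output alone. If the sequence columns do hold lists, one possible workaround is to convert them to tuples before fitting. `sequences_to_tuples` is a hypothetical helper, not part of `patientflow`.

```python
from collections import Counter

import pandas as pd


def sequences_to_tuples(df, columns):
    """Return a copy of df with list-valued sequence columns converted to tuples."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].apply(tuple)
    return out


# Toy data with list-valued sequences
toy = pd.DataFrame(
    {
        "consultation_sequence": [[], ["acute"]],
        "final_sequence": [["acute"], ["acute", "medical"]],
    }
)
toy_hashable = sequences_to_tuples(toy, ["consultation_sequence", "final_sequence"])

# Tuples are hashable, so they can be counted or used as dictionary keys
sequence_counts = Counter(toy_hashable["consultation_sequence"])
print(sequence_counts)
```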

The handling of special categories is saved as an attribute of the trained model as shown below.

In [95]:
spec_model.special_params

{'special_category_func': <bound method SpecialCategoryParams.special_category_func of <patientflow.prepare.SpecialCategoryParams object at 0x28a867ef0>>,
 'special_category_dict': {'medical': 0.0,
  'surgical': 0.0,
  'haem/onc': 0.0,
  'paediatric': 1.0},
 'special_func_map': {'paediatric': <bound method SpecialCategoryParams.special_category_func of <patientflow.prepare.SpecialCategoryParams object at 0x28a867ef0>>,
  'default': <bound method SpecialCategoryParams.opposite_special_category_func of <patientflow.prepare.SpecialCategoryParams object at 0x28a867ef0>>}}
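
The attributes above suggest how a special category could be applied at inference: visits that match the special category receive the fixed probabilities in `special_category_dict`, and all other visits receive the model's output. The sketch below illustrates that idea only. The functions `is_paediatric` and `specialty_probs_sketch`, and the use of the `0-17` age group as the test, are assumptions rather than the logic in `SpecialCategoryParams`.

```python
# Values copied from `special_category_dict` in the output above
special_category_dict = {
    "medical": 0.0,
    "surgical": 0.0,
    "haem/onc": 0.0,
    "paediatric": 1.0,
}


def is_paediatric(visit):
    """Hypothetical check: treat the '0-17' age group as the special category."""
    return visit["age_group"] == "0-17"


def specialty_probs_sketch(visit, model_probs):
    """Return fixed probabilities for special-category visits, model output otherwise."""
    if is_paediatric(visit):
        return dict(special_category_dict)
    return model_probs


child = {"age_group": "0-17"}
adult = {"age_group": "65-74"}
# Made-up model output for an adult visit
adult_probs = {"medical": 0.9, "surgical": 0.05, "haem/onc": 0.05, "paediatric": 0.0}

print(specialty_probs_sketch(child, adult_probs))
print(specialty_probs_sketch(adult, adult_probs))
```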