# Predict yet to arrive 

This notebook prepares a model that predicts the number of patients yet to arrive.

Inputs:
- A series of times of day at which we want to make these predictions
- A series of dates on which we want to make these predictions
- A time window after the prediction time, within which we want to predict the number of arriving patients (e.g. 8 hours)
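
To make these inputs concrete, the sketch below builds every prediction moment from a list of times of day and a range of dates, and pairs each moment with the end of its prediction window. The variable names, the example dates and the helper `prediction_moments` are illustrative assumptions, not values read from the notebook's configuration.

```python
from datetime import date, datetime, time, timedelta, timezone

# Illustrative inputs (assumed values, not loaded from the config file)
prediction_times = [(6, 0), (9, 30), (12, 0), (15, 30), (22, 0)]  # (hour, minute)
first_date, last_date = date(2031, 1, 1), date(2031, 1, 3)
prediction_window = timedelta(hours=8)


def prediction_moments(first_date, last_date, prediction_times, window):
    """Return (start, end) pairs, one per combination of date and time of day."""
    moments = []
    day = first_date
    while day <= last_date:
        for hour, minute in prediction_times:
            start = datetime.combine(day, time(hour, minute), tzinfo=timezone.utc)
            moments.append((start, start + window))
        day += timedelta(days=1)
    return moments


moments = prediction_moments(first_date, last_date, prediction_times, prediction_window)
print(len(moments))  # number of dates multiplied by number of times of day
print(moments[0])
```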

## Set up the notebook environment

In [1]:
# Reload functions every time
%load_ext autoreload 
%autoreload 2

In [2]:
from pathlib import Path
import sys
import json
import pandas as pd

PROJECT_ROOT = Path.home()
USER_ROOT = Path.home() / 'work'

sys.path.append(str(USER_ROOT / 'patientflow' / 'src' / 'patientflow'))
sys.path.append(str(USER_ROOT / 'patientflow' / 'functions'))
sys.path.append(str(USER_ROOT / 'ed-predictor' / 'functions'))


model_file_path = PROJECT_ROOT /'data' / 'ed-predictor' / 'trained-models'
data_path = USER_ROOT / 'patientflow' / 'data-raw'
media_file_path = USER_ROOT / 'patientflow' / 'notebooks' / 'img'
media_file_path.mkdir(parents=True, exist_ok=True)

## Load parameters

These are set in config.yaml. You can change them for your own purposes, but the times of day must match those in the provided dataset for this notebook to run successfully.

In [3]:
uclh = False

In [4]:
from load_config import load_config_file

if uclh:
    config_path = Path(USER_ROOT / 'patientflow' / 'config-uclh.yaml')
else:
    config_path = Path(USER_ROOT / 'patientflow' / 'config.yaml')

params = load_config_file(config_path)

prediction_times = params[0]
start_training_set, start_validation_set, start_test_set, end_test_set = params[1:5]



## Load data

In [5]:
import pandas as pd
from load_data_utils import set_file_locations

if uclh:
    visits_path, visits_csv_path, yta_path, yta_csv_path = set_file_locations(uclh, data_path, config_path)
else:
    visits_csv_path, yta_csv_path = set_file_locations(uclh, data_path)

yta = pd.read_csv(yta_csv_path)



In [36]:
yta.head()

,training_validation_test,admission_datetime,sex,specialty,is_child
0,train,2030-06-13 14:33:22+00:00,F,haem/onc,False
1,train,2030-04-03 10:43:56+00:00,F,haem/onc,False
2,train,2030-04-12 13:47:06+00:00,F,haem/onc,False
3,train,2030-04-12 12:33:22+00:00,M,haem/onc,False
4,train,2030-03-29 16:39:00+00:00,F,surgical,False


## Separate into training, validation and test sets

As part of preparing the data, each visit has already been allocated to one of three sets: training, validation or test.

In [37]:
yta.training_validation_test.value_counts()

training_validation_test
train    14071
test      4919
valid     1684
Name: count, dtype: int64
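
The allocation itself happened during data preparation and is not shown in this notebook. As a rough sketch of how a chronological split could be derived from the four boundary dates loaded from the config (`start_training_set`, `start_validation_set`, `start_test_set`, `end_test_set`), the hypothetical helper `allocate_set` below labels a date by the period it falls in. The boundary values are invented for illustration, and treating each period as a half-open interval is an assumption.

```python
from datetime import date

# Invented boundary dates, for illustration only
start_training_set = date(2030, 1, 1)
start_validation_set = date(2031, 6, 1)
start_test_set = date(2031, 9, 1)
end_test_set = date(2032, 1, 1)


def allocate_set(d, start_train, start_valid, start_test, end_test):
    """Label a date as 'train', 'valid' or 'test'; return None if it is outside all periods."""
    if start_train <= d < start_valid:
        return "train"
    if start_valid <= d < start_test:
        return "valid"
    if start_test <= d < end_test:
        return "test"
    return None


bounds = (start_training_set, start_validation_set, start_test_set, end_test_set)
labels = [allocate_set(d, *bounds) for d in (date(2030, 6, 13), date(2031, 6, 1), date(2031, 12, 31))]
print(labels)
```

With a pandas Series of dates, the same function could be applied element-wise using `Series.map`.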

In [38]:
# Copy each subset so that later column assignments do not trigger SettingWithCopyWarning
train_yta = yta[yta.training_validation_test == 'train'].copy()
valid_yta = yta[yta.training_validation_test == 'valid'].copy()
test_yta = yta[yta.training_validation_test == 'test'].copy()

# Parse the admission timestamps as UTC and use them as the index
train_yta['admission_datetime'] = pd.to_datetime(train_yta['admission_datetime'], utc=True)
train_yta.set_index('admission_datetime', inplace=True)


In [39]:
isinstance(train_yta.index, pd.DatetimeIndex)

True

## Train the Poisson-Binomial model

In [27]:
from predict.emergency_demand.poisson_binomial_predictor import PoissonBinomialPredictor
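
The internals of `PoissonBinomialPredictor` are not shown in this notebook. To give a feel for the kind of calculation involved, here is a deliberately simplified sketch: it estimates an average arrival rate for each 15-minute slot of the day, then treats the number of arrivals in the prediction window as Poisson with a mean equal to the sum of the slot rates. This is a plain Poisson approximation under stated assumptions, not the package's Poisson-Binomial method, and all function names are hypothetical.

```python
import math
from collections import Counter
from datetime import datetime


def slot_rates(arrivals, num_days, interval_minutes=15):
    """Average number of arrivals per day in each time-of-day slot."""
    counts = Counter((a.hour * 60 + a.minute) // interval_minutes for a in arrivals)
    slots_per_day = 24 * 60 // interval_minutes
    return [counts.get(s, 0) / num_days for s in range(slots_per_day)]


def window_mean(rates, prediction_time, window_minutes=480, interval_minutes=15):
    """Expected arrivals in the window that starts at prediction_time (hour, minute)."""
    first = (prediction_time[0] * 60 + prediction_time[1]) // interval_minutes
    n_slots = window_minutes // interval_minutes
    # The modulo lets a window that starts late in the day wrap past midnight
    return sum(rates[(first + i) % len(rates)] for i in range(n_slots))


def poisson_pmf(k, mean):
    """Probability of exactly k arrivals under a Poisson distribution."""
    return math.exp(-mean) * mean**k / math.factorial(k)


# Toy data covering two days: arrivals at 10:05 and 13:20 on both days, plus one at 23:50
arrivals = [
    datetime(2030, 4, 1, 10, 5), datetime(2030, 4, 1, 13, 20),
    datetime(2030, 4, 2, 10, 5), datetime(2030, 4, 2, 13, 20),
    datetime(2030, 4, 2, 23, 50),
]
rates = slot_rates(arrivals, num_days=2)
mean = window_mean(rates, prediction_time=(9, 30))
print(mean)
print([round(poisson_pmf(k, mean), 4) for k in range(5)])
```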

### Train a model for all admissions, irrespective of specialty

In [40]:
from predict.emergency_demand.poisson_binomial_predictor import PoissonBinomialPredictor
from joblib import dump, load

model = PoissonBinomialPredictor()

# prediction_window and time_interval are in minutes (480 and 15 in the run logged below)
model.fit(train_yta, prediction_window, time_interval, prediction_times)

MODEL__ED_YETTOARRIVE__NAME = 'ed_yet_to_arrive_all_' + str(int(prediction_window/60)) + '_hours'
full_path = model_file_path / MODEL__ED_YETTOARRIVE__NAME 
full_path = full_path.with_suffix('.joblib')

dump(model, full_path)

Calculating time-varying arrival rates for data provided, which spans 520 days
Poisson Binomial Predictor trained for these times: [(6, 0), (9, 30), (12, 0), (15, 30), (22, 0)]
using prediction window of 480 minutes after the time of prediction
and time interval of 15 minutes within the prediction window.
The error value for prediction will be 1e-07
To see the weights saved by this model, used the get_weights() method


['/home/jovyan/data/ed-predictor/trained-models/ed_yet_to_arrive_all_8_hours.joblib']
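
The model file name is assembled the same way in several cells of this notebook. A small helper could keep that logic in one place. `model_path` and its parameters are hypothetical names introduced here; the naming pattern is copied from the cells in this notebook.

```python
from pathlib import Path


def model_path(model_dir, scope, prediction_window_minutes):
    """Build a path such as <model_dir>/ed_yet_to_arrive_all_8_hours.joblib."""
    hours = int(prediction_window_minutes / 60)
    return Path(model_dir) / f"ed_yet_to_arrive_{scope}_{hours}_hours.joblib"


all_path = model_path("trained-models", "all", 480)
by_spec_path = model_path("trained-models", "by_spec", 480)
print(all_path.name)
print(by_spec_path.name)
```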

In [41]:
weights = model.get_weights()


In [42]:
prediction_context = {
    'default': {
        'prediction_time': (7, 0)
    }
}

x1 = float(config['x1'])
y1 = float(config['y1'])
x2 = float(config['x2'])
y2 = float(config['y2'])

MODEL__ED_YETTOARRIVE__NAME = 'ed_yet_to_arrive_all_' + str(int(prediction_window/60)) + '_hours'
full_path = model_file_path / MODEL__ED_YETTOARRIVE__NAME 
full_path = full_path.with_suffix('.joblib')

model = load(full_path)

preds = model.predict(prediction_context, x1, y1, x2, y2)
preds



{'default':      agg_proba
 sum           
 0     0.014505
 1     0.061405
 2     0.129970
 3     0.183398
 4     0.194092
 ..         ...
 220   0.000000
 221   0.000000
 222   0.000000
 223   0.000000
 224   0.000000
 
 [225 rows x 1 columns]}
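
A predicted distribution like `agg_proba` can be summarised in a few ways. The sketch below uses a short made-up distribution (not the model output above) to compute the expected number of arrivals, and the largest number of arrivals that will be reached or exceeded with at least a chosen probability. The helper names are hypothetical.

```python
def expected_value(probs):
    """Mean of a distribution given as probs[k] = P(X == k)."""
    return sum(k * p for k, p in enumerate(probs))


def at_least_with_probability(probs, threshold):
    """Largest k such that P(X >= k) is at least `threshold`."""
    tail = 1.0  # P(X >= 0)
    best = 0
    for k, p in enumerate(probs):
        if tail >= threshold:
            best = k
        tail -= p  # tail now holds P(X >= k + 1)
    return best


# Made-up distribution over 0 to 4 arrivals
probs = [0.1, 0.2, 0.4, 0.2, 0.1]
print(expected_value(probs))
print(at_least_with_probability(probs, 0.8))
print(at_least_with_probability(probs, 0.5))
```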

### Predict within specialty

In [47]:
train_yta

admission_datetime,training_validation_test,sex,specialty,is_child
2030-06-13 14:33:22+00:00,train,F,haem/onc,False
2030-04-03 10:43:56+00:00,train,F,haem/onc,False
2030-04-12 13:47:06+00:00,train,F,haem/onc,False
2030-04-12 12:33:22+00:00,train,M,haem/onc,False
2030-03-29 16:39:00+00:00,train,F,surgical,False
...,...,...,...,...
2030-11-27 01:57:00+00:00,train,M,,False
2030-11-27 01:50:00+00:00,train,M,,False
2031-02-17 04:33:00+00:00,train,M,,False
2031-03-05 02:55:00+00:00,train,F,,False


In [53]:
from predict.emergency_demand.poisson_binomial_predictor import PoissonBinomialPredictor

specialty_filters = {
    'medical': {'specialty': 'medical', 'is_child': False},
    'surgical': {'specialty': 'surgical', 'is_child': False},
    'haem/onc': {'specialty': 'haem/onc', 'is_child': False},
    'paediatric': {'is_child': True}  # Paediatric patients are selected by is_child alone, not by specialty
}

model_by_spec = PoissonBinomialPredictor(filters=specialty_filters)

model_by_spec.fit(train_yta, prediction_window, time_interval, prediction_times)


MODEL__ED_YETTOARRIVE__NAME = 'ed_yet_to_arrive_by_spec_' + str(int(prediction_window/60)) + '_hours'
full_path = model_file_path / MODEL__ED_YETTOARRIVE__NAME 
full_path = full_path.with_suffix('.joblib')

dump(model_by_spec, full_path)

{'medical': {'specialty': 'medical', 'is_child': False}, 'surgical': {'specialty': 'surgical', 'is_child': False}, 'haem/onc': {'specialty': 'haem/onc', 'is_child': False}, 'paediatric': {'is_child': True}}
Calculating time-varying arrival rates for data provided, which spans 519 days
Calculating time-varying arrival rates for data provided, which spans 520 days
Calculating time-varying arrival rates for data provided, which spans 519 days
Calculating time-varying arrival rates for data provided, which spans 519 days
Poisson Binomial Predictor trained for these times: [(6, 0), (9, 30), (12, 0), (15, 30), (22, 0)]
using prediction window of 480 minutes after the time of prediction
and time interval of 15 minutes within the prediction window.
The error value for prediction will be 1e-07
To see the weights saved by this model, used the get_weights() method


['/home/jovyan/data/ed-predictor/trained-models/ed_yet_to_arrive_by_spec_8_hours.joblib']
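
How `PoissonBinomialPredictor` applies `filters` internally is not shown in this notebook. One plausible reading, offered only as an assumption, is that each entry selects the rows where every listed column equals the given value. The hypothetical `apply_filter` helper below illustrates that reading on a toy DataFrame.

```python
import pandas as pd


def apply_filter(df, conditions):
    """Keep the rows where every column in `conditions` equals its value."""
    mask = pd.Series(True, index=df.index)
    for column, value in conditions.items():
        mask &= df[column] == value
    return df[mask]


# Toy data, not taken from the dataset used in this notebook
toy = pd.DataFrame({
    "specialty": ["medical", "surgical", "medical", None],
    "is_child": [False, False, True, True],
})
toy_filters = {
    "medical": {"specialty": "medical", "is_child": False},
    "paediatric": {"is_child": True},
}
subsets = {name: apply_filter(toy, cond) for name, cond in toy_filters.items()}
print({name: len(rows) for name, rows in subsets.items()})
```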

In [55]:
MODEL__ED_YETTOARRIVE__NAME = 'ed_yet_to_arrive_by_spec_' + str(int(prediction_window/60)) + '_hours'
full_path = model_file_path / MODEL__ED_YETTOARRIVE__NAME 
full_path = full_path.with_suffix('.joblib')

model_by_spec = load(full_path)

x1 = float(config['x1'])
y1 = float(config['y1'])
x2 = float(config['x2'])
y2 = float(config['y2'])

prediction_context = {
    'medical': {
        'prediction_time': (7, 0)
    }
}

preds = model_by_spec.predict(prediction_context, x1, y1, x2, y2)
preds['medical']



sum,agg_proba
0,0.145917
1,0.280849
2,0.270277
3,0.173402
4,0.083438
...,...
188,0.000000
189,0.000000
190,0.000000
191,0.000000
