tufts-ml/single-hospital-count-forecasting

Welcome!

This repository hosts code and datasets for forecasting counts of resource utilization at a single hospital site, given a past time-series history of counts at that site. This code may be a starting point for answering questions such as: "How many people will be admitted on each day for the next two weeks? How many beds will be in use?" This is a collaborative research project between researchers in the Department of Computer Science at Tufts University and at Tufts Medical Center.

See our manuscript.

Alexandra Hope Lee, Panagiotis Lymperopoulos, Joshua T. Cohen, John B. Wong, and Michael C. Hughes. Forecasting COVID-19 counts at a single hospital: A Hierarchical Bayesian approach. In ICLR 2021 Workshop on Machine Learning for Preventing and Combating Pandemics, 2021. PDF URL: https://arxiv.org/pdf/2104.09327.pdf

Jump to: Project Goals   Limitations   Quickstart Guide   Repository Contents   Installation Guide   Datasets Guide   How to Run Experiments

Project Goals

We consider the problem of forecasting the daily number of hospitalized COVID-19 patients at a single hospital site, in order to help administrators with logistics and planning.

We develop several candidate hierarchical Bayesian models which can:

  • capture the count nature of data via either the generalized Poisson likelihood (recommended) or the standard Poisson likelihood
  • model time-series dependencies via two kinds of latent process: autoregressive and Gaussian process
  • share statistical strength across related sites
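As a rough illustration of the first bullet, the sketch below evaluates the log-probability of a count under one common form of the generalized Poisson distribution (Consul's parameterization). The function names and the parameter names theta and lam are assumptions made for this example; the PyMC3 distribution in src/GenPoisson.py may be parameterized differently.

```python
import math


def gen_poisson_logpmf(y, theta, lam):
    """Log-probability of count y under a generalized Poisson distribution.

    Assumed parameterization (Consul's form):
        P(y) = theta * (theta + lam * y)**(y - 1) * exp(-theta - lam * y) / y!
    theta > 0 plays the role of a rate and 0 <= lam < 1 controls
    overdispersion. Negative lam (underdispersion) is not handled here.
    """
    if y < 0 or theta <= 0 or not (0 <= lam < 1):
        return -math.inf
    return (math.log(theta)
            + (y - 1) * math.log(theta + lam * y)
            - (theta + lam * y)
            - math.lgamma(y + 1))


def poisson_logpmf(y, rate):
    """Log-probability of count y under a standard Poisson distribution."""
    return y * math.log(rate) - rate - math.lgamma(y + 1)


# With lam = 0 the generalized form reduces to the standard Poisson.
difference = gen_poisson_logpmf(4, 2.5, 0.0) - poisson_logpmf(4, 2.5)
```

Under this parameterization the mean is theta / (1 - lam) and the variance is theta / (1 - lam)**3, so any lam > 0 gives a variance larger than the mean, which the standard Poisson cannot represent.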

In our workshop paper we demonstrated our approach on several public datasets:

  • 8 hospitals in Massachusetts, U.S.A.
  • 10 hospitals in the United Kingdom

Further prospective evaluation in our manuscript compares our approach favorably to baselines currently used by stakeholders at 3 related hospitals to forecast 2-week-ahead demand by rescaling state-level forecasts.

Limitations

There are some serious limitations to our approach, which are thoroughly reviewed in our manuscript. Assuming the future will resemble the past is always a fragile assumption, and the data we assume are available for forecasting are quite limited. Nevertheless, we believe our probabilistic models can help communicate uncertainty and serve as a starting point for making challenging decisions from limited data.

Quickstart Guide

To create a set of forecasts for a single site:

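  1. Check out this repository on your local machine
  2. Install and activate the conda environment (see Installation Guide)
  3. Create a subdirectory called gar_samples in the directory you will run the script from (by default, sampled forecasts are written to gar_samples/output-*.csv)
  4. To produce a forecast for Tufts Medical Center given our available data from summer 2020, run the following (the dataset path is relative, so it assumes you are running from the src/ directory):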
python run_simple_forecast.py -a ../datasets/mass_dot_gov/tufts_medical_center_2020-04-29_to_2020-07-06.csv

Expected output from running forecasts for Tufts Medical Center:

Example forecast plot for Tufts Medical Center

Repository Contents

Files and directories in this repository:

  • notebooks/ - Jupyter notebooks with detailed specifications of the models and how they translate to code.
  • datasets/ - CSV files with data used for experiments.
  • src/
    • arg_types.py - Checks that filenames specified by command-line arguments have proper suffixes.
    • gar_forecast.py - Makes future forecasts using single-site GAR model.
    • gar_grid_search.py - Performs grid search for single-site GAR model.
    • GenPoisson.py - Defines a Generalized Poisson distribution as a PyMC3 custom Discrete distribution.
    • GenPoissonAutoregression.py - Defines a generalized autoregressive (GAR) model with Generalized Poisson likelihood.
    • GenPoissonGaussianProcess.py - Defines a generalized Gaussian Process (GGP) model with Generalized Poisson likelihood.
    • ggp_forecast.py - Makes future forecasts using single-site GGP model.
    • ggp_grid_search.py - Performs grid search for GGP model.
    • grid_search.py - Launches grid searches for single-site models.
    • multi_site_gar.py - Trains, evaluates, and makes forecasts for multi-site GAR model.
    • plot_forecasts.py - Plots summary statistics of forecasts against true observed counts.
    • poisson_vs_genpoisson.py - Compares standard and generalized Poisson likelihoods on our model.
    • run_simple_forecast.py - Launches forecasting for single-site models.

Installation Guide

Requirements

  • Anaconda 4.8 or higher
  • Python 3.6+

Follow the two steps below to install everything on your local system.

These have been successfully tested on TODO LIST PLATFORMS as of 2021-04-06.

Step 1) Install conda to manage your environment and packages

Links for installation of "minimal" version of conda:

https://docs.conda.io/en/latest/miniconda.html

Make sure the installer updates your PATH (this should be automatic on OS X and Linux).

Step 2) Using conda, create the project-specific environment (includes all Python packages needed)

We use the included YAML specification file: site_level_forecaster.yml

To install, just open any terminal, then do:

$ conda env create -f site_level_forecaster.yml

How We Ran Our Experiments

Required Libraries

Experiment #1: Standard vs Generalized Poisson

On each dataset, trains and scores a GAR model with window size W=1, first using the standard Poisson likelihood and then using the generalized Poisson likelihood.

Reads all files in the directory specified on line 16 of the script. To change the directory name and target column name, modify lines 16, 18, and 28.

Command

python poisson_vs_genpoisson.py

Dependencies

  • GenPoisson.py
  • GenPoissonAutoregression.py

Output

  • Heldout log likelihood for each model, printed to standard output

Experiment #2: Single-Site GGP vs GAR

Divides the sequence of counts into training, validation, and test windows. Runs a grid search over a predefined set of hyperparameters for each model, evaluating each setting on the validation window. Takes the best hyperparameters, retrains on the training and validation windows combined, then makes forecasts on the test window.
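The windowing described above can be sketched as follows. This is a minimal illustration, not the code in grid_search.py: the function name split_windows and the window lengths are hypothetical, and the actual scripts may choose window sizes differently.

```python
def split_windows(counts, n_valid, n_test):
    """Split a chronological sequence into train, validation, and test windows.

    The last n_test entries form the test window, the n_valid entries before
    them form the validation window, and everything earlier is training data.
    """
    if n_valid < 0 or n_test < 0 or n_valid + n_test >= len(counts):
        raise ValueError("window sizes must leave at least one training point")
    n_train = len(counts) - n_valid - n_test
    train = counts[:n_train]
    valid = counts[n_train:n_train + n_valid]
    test = counts[n_train + n_valid:]
    return train, valid, test


# Hypothetical daily counts, oldest first.
counts = [12, 15, 14, 18, 21, 19, 23, 25, 24, 28]
train, valid, test = split_windows(counts, n_valid=2, n_test=3)

# After hyperparameters are chosen on the validation window, the final model
# is refit on training and validation data together before forecasting.
refit_data = train + valid
```

Keeping the windows in chronological order means no validation or test day ever precedes a training day, so the split itself cannot leak future counts into model fitting.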

Command

python grid_search.py <input_csv_file>

  • <input_csv_file> must have a column date with dates in ISO format, and a target column with integer counts.
  • Use the flag -a to only run the GAR, or the flag -g to only run the GGP. Otherwise defaults to running both models and producing side-by-side plots.
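Before launching a grid search, it can help to confirm that an input file meets the format requirements above. The check below is a sketch using only the standard library; the function name read_counts and the sample rows are hypothetical, and the scripts in src/ perform their own loading.

```python
import csv
import io
from datetime import datetime

# Default target column name, taken from the optional arguments listed below.
TARGET_COL = "hospitalized_total_covid_patients_suspected_and_confirmed_including_icu"


def read_counts(handle, target_col=TARGET_COL):
    """Return a list of (date, count) pairs, raising ValueError on bad rows."""
    rows = []
    for row in csv.DictReader(handle):
        # ISO format dates look like 2020-04-29.
        day = datetime.strptime(row["date"], "%Y-%m-%d").date()
        count = int(row[target_col])  # raises ValueError for non-integers
        if count < 0:
            raise ValueError("counts must be non-negative, got %d" % count)
        rows.append((day, count))
    return rows


# Made-up example data; a real run would pass an open file instead.
sample = io.StringIO(
    "date," + TARGET_COL + "\n"
    "2020-04-29,41\n"
    "2020-04-30,44\n"
)
rows = read_counts(sample)
```

To check a file on disk, pass `open(path, newline="")` in place of the `io.StringIO` object.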

Optional arguments and their defaults

-c, --target_col_name       'hospitalized_total_covid_patients_suspected_and_confirmed_including_icu'
-m, --gar_model_file        'gar_model.json'
-o, --ggp_model_file        'ggp_model.json'
-p, --performance_plot_file 'performance.pdf'
-f, --forecast_plot_file    'heldout_forecasts.pdf'

Dependencies

  • gar_grid_search.py
  • ggp_grid_search.py
  • GenPoissonAutoregression.py
  • GenPoissonGaussianProcess.py
  • GenPoisson.py
  • arg_types.py
  • plot_forecasts.py

Output

  • Plot of heldout log likelihood for each window size (GAR)
  • Plot of heldout log likelihood for each timescale prior mean (GGP)
  • JSON file for each model with best model parameters found
  • Plot of summary statistics of forecasts against true observed counts

Experiment #3: Multi-Site GAR

Trains and scores multi-site GAR model and makes forecasts on test window.

Reads all files in the directory specified on line 20 of the script. To change the directory name and target column name, modify lines 20, 23, and 33. CSV files must have a column named date with dates in ISO format, and a target column with integer counts. Saves the traceplot and forecast plots in the directory specified on lines 80 and 125 of the script.

Command

python multi_site_gar.py

Dependencies

  • GenPoisson.py
  • plot_forecasts.py

Output

  • Heldout log likelihood for each dataset to standard output
  • Traceplot of MCMC chains for key variables
  • Forecast plot for each dataset

Single-Site Future Forecasting

After running Experiment #2, use the resulting JSON files, which hold the best model parameters found, to make future forecasts.

Command

python run_simple_forecast.py <input_csv_file>

  • <input_csv_file> must have a column date with dates in ISO format, and a target column with integer counts.
  • Use the flag -a to only run the GAR, or the flag -g to only run the GGP. Otherwise defaults to running both models and producing side-by-side plots.

Optional arguments and their defaults

-c, --target_col_name           'hospitalized_total_covid_patients_suspected_and_confirmed_including_icu'
-m, --gar_model_file            'gar_model.json'
-f, --ggp_model_file            'ggp_model.json'
-o, --gar_csv_file_pattern      'gar_samples/output-*.csv'
-u, --ggp_csv_file_pattern      'ggp_samples/output-*.csv'
-s, --day_forecasts_start       day after last day of data
-d, --n_days_ahead              14
-p, --plot_file                 'forecasts.pdf'
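The defaults for --day_forecasts_start and --n_days_ahead imply a forecast horizon that begins the day after the last observed date and spans 14 days. The sketch below shows that date arithmetic; the function name forecast_dates is hypothetical and is not part of run_simple_forecast.py.

```python
from datetime import date, timedelta


def forecast_dates(last_observed_day, n_days_ahead=14, start=None):
    """List the calendar days covered by a forecast.

    If start is None, forecasts begin the day after the last observed day,
    mirroring the documented default for --day_forecasts_start.
    """
    first = start if start is not None else last_observed_day + timedelta(days=1)
    return [first + timedelta(days=i) for i in range(n_days_ahead)]


# The quickstart dataset ends on 2020-07-06.
days = forecast_dates(date(2020, 7, 6))
first_day, last_day = days[0], days[-1]
```

For data ending on 2020-07-06, the default horizon therefore runs from 2020-07-07 through 2020-07-20.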

Dependencies

  • gar_forecast.py
  • ggp_forecast.py
  • GenPoissonAutoregression.py
  • GenPoissonGaussianProcess.py
  • GenPoisson.py
  • arg_types.py
  • plot_forecasts.py

Output

  • Plot of summary statistics of forecasts
  • CSV file for each sampled set of forecasts
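One way to turn a collection of sampled forecasts into summary statistics is to take per-day percentiles across samples. The sketch below assumes each sample is a list of daily counts covering the same days; the layout of the output CSV files, the function names, and the choice of the 2.5th, 50th, and 97.5th percentiles are all assumptions for illustration and may differ from what plot_forecasts.py computes.

```python
import math


def percentile(sorted_vals, q):
    """Nearest-rank percentile of an already sorted list, for q in (0, 100]."""
    rank = max(1, math.ceil(q / 100.0 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def summarize(samples, lower_q=2.5, upper_q=97.5):
    """Return one (lower, median, upper) tuple per forecast day.

    samples is a list of trajectories; samples[s][d] is the count that
    sample s predicts for day d.
    """
    summary = []
    for day_values in zip(*samples):  # iterate over days, not samples
        vals = sorted(day_values)
        summary.append((percentile(vals, lower_q),
                        percentile(vals, 50),
                        percentile(vals, upper_q)))
    return summary


# Four made-up sampled trajectories over two days.
samples = [[1, 10], [2, 20], [3, 30], [4, 40]]
summary = summarize(samples)
```

Reporting an interval alongside the median keeps the uncertainty in the sampled forecasts visible instead of collapsing it to a single number.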
