
zhenchenwang/latent_model


Latent Model Supplementary Code

This repository contains the supplementary code for the paper: "Generating High-Fidelity Synthetic Patient Data for Assessing Machine Learning Healthcare Software" by Allan Tucker, Zhenchen Wang, Ylenia Rotalinti, and Puja Myles.

The code fits a latent variable model, generates synthetic patient data from it, and compares the synthetic data with the original ground truth data.

Project Structure

The project is organized into the following directories:

  • data/: This directory should contain the input data files. You need to place the cvdgt.txt file here.
  • results/: This directory will store the output files generated by the scripts, including intermediate data samples, tables, and figures.
  • scripts/: This directory contains the R scripts for running the analysis.
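As a minimal sketch of checking this layout before a run, the snippet below creates results/ if it is missing and reports whether the other folders and the input file are in place. The helper name check_layout and its root argument are illustrative assumptions and are not part of the repository's scripts.

```r
# Hypothetical helper (not in the repository): verify the expected layout
# under `root` and create results/ when it does not exist yet.
check_layout <- function(root = ".") {
  results_dir <- file.path(root, "results")
  if (!dir.exists(results_dir)) {
    dir.create(results_dir, recursive = TRUE)
  }
  # Named logical vector: TRUE means the item was found (or created).
  c(
    data_dir    = dir.exists(file.path(root, "data")),
    scripts_dir = dir.exists(file.path(root, "scripts")),
    results_dir = dir.exists(results_dir),
    input_file  = file.exists(file.path(root, "data", "cvdgt.txt"))
  )
}
```

Calling check_layout() from the project root returns one TRUE or FALSE per item, so a FALSE for input_file indicates that cvdgt.txt still needs to be copied into data/.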

Requirements

The following R packages are required to run the scripts:

  • bnlearn
  • pcalg
  • LaplacesDemon
  • Rgraphviz
  • ggplot2
  • gridExtra
  • pracma
  • missForest
  • gRain
  • cluster
  • arules
  • RevoScaleR
  • summarytools
  • kernlab
  • dplyr
  • SuperLearner
  • precrec
  • lmtest

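Most of these packages can be installed from CRAN using the install.packages() function in R. For example:

install.packages(c("bnlearn", "pcalg", "LaplacesDemon", "ggplot2", "gridExtra", "pracma", "missForest", "gRain", "cluster", "arules", "summarytools", "kernlab", "dplyr", "SuperLearner", "precrec", "lmtest"))

Note: Two packages are not available from CRAN and are left out of the call above. Rgraphviz is distributed through Bioconductor and can be installed with BiocManager::install("Rgraphviz"); please refer to the Bioconductor website for details. RevoScaleR ships with Microsoft's R distributions (Microsoft R Client and Machine Learning Server) and cannot be installed with install.packages().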
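To see which of the required packages are still missing on a given machine, a check along the following lines may help. The helper name missing_packages is an illustrative assumption; requireNamespace() is the base R call it relies on.

```r
# Package names copied from the Requirements list.
required <- c("bnlearn", "pcalg", "LaplacesDemon", "Rgraphviz", "ggplot2",
              "gridExtra", "pracma", "missForest", "gRain", "cluster",
              "arules", "RevoScaleR", "summarytools", "kernlab", "dplyr",
              "SuperLearner", "precrec", "lmtest")

# Hypothetical helper: return the subset of `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}
```

Running missing_packages(required) returns a character vector of the packages that still need to be installed; an empty vector means everything is available.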

How to Run

  1. Place your data: Put the cvdgt.txt file into the data/ directory.
  2. Configure the analysis: Open the scripts/config.R file and adjust the parameters if needed. The default settings are based on the original study.
  3. Run the latent model script: Execute the scripts/latentModel.R script. This script will perform the main analysis and generate the ground truth and synthetic data samples in the results/ directory.
    Rscript scripts/latentModel.R
  4. Run the tables and figures script: Execute the scripts/tables_figures.R script. This script will perform the comparison analysis and generate the tables and figures. The function calls at the end of the script are commented out. You can uncomment them to run the specific experiments you are interested in.
    Rscript scripts/tables_figures.R
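The two Rscript calls above can also be chained from a single R session. The wrapper below is only a sketch under that assumption: run_pipeline is a hypothetical name, and the function simply runs each script in order and stops at the first non-zero exit status.

```r
# Hypothetical wrapper (not in the repository): run the analysis scripts
# in order and stop as soon as one of them fails.
run_pipeline <- function(scripts = c("scripts/latentModel.R",
                                     "scripts/tables_figures.R"),
                         rscript = file.path(R.home("bin"), "Rscript")) {
  for (s in scripts) {
    if (!file.exists(s)) {
      stop("Script not found: ", s)
    }
    # system2() returns the exit status of the child process.
    status <- system2(rscript, shQuote(s))
    if (status != 0) {
      stop("Script failed with exit status ", status, ": ", s)
    }
  }
  invisible(TRUE)
}
```

Calling run_pipeline() from the project root mirrors steps 3 and 4; the second script is only started when the first one finishes without error.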

Testing

This project uses the testthat package for unit testing. The tests are located in the tests/testthat/ directory.

Installing testthat

If you don't have testthat installed, you can install it from CRAN:

install.packages("testthat")

Running Tests

To run all the unit tests, execute the tests/run_tests.R script from the project root directory:

Rscript tests/run_tests.R
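For readers new to testthat, the sketch below shows the general shape of a test file. The function rescale01 is a toy example invented for illustration and is not taken from this repository; the actual contents of tests/testthat/ and run_tests.R are not reproduced here.

```r
library(testthat)

# Toy function under test; purely illustrative, not part of the repository.
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

test_that("rescale01 maps values onto [0, 1]", {
  out <- rescale01(c(2, 4, 6))
  expect_equal(out, c(0, 0.5, 1))
  expect_true(all(out >= 0 & out <= 1))
})

# A runner script could then execute every test file in a folder, e.g.:
# testthat::test_dir("tests/testthat")
```

testthat::test_dir() is the package's standard entry point for running a directory of tests; whether run_tests.R uses it is an assumption.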

Scripts Description

  • scripts/config.R: This file contains all the configuration parameters for the analysis, such as file paths and model parameters.
  • scripts/latentModel.R: This is the main script for the latent variable model analysis. It loads the data, learns the model, and generates synthetic data.
  • scripts/tables_figures.R: This script is used to generate the tables and figures that compare the synthetic data with the ground truth data. It includes various statistical tests and visualizations.
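The specific tests used by scripts/tables_figures.R are not listed here. As one illustration of the kind of comparison such a script can make, the sketch below applies a two-sample Kolmogorov-Smirnov test to a single numeric variable. The function name compare_variable is hypothetical, and the values are simulated toy numbers, not patient data or results from the paper.

```r
# Toy data only: two simulated samples standing in for one numeric variable
# in the ground truth data and in the synthetic data.
set.seed(1)
ground_truth <- rnorm(500, mean = 130, sd = 15)
synthetic    <- rnorm(500, mean = 130, sd = 15)

# Hypothetical helper: compare the two distributions with a two-sample
# Kolmogorov-Smirnov test from base R's stats package.
compare_variable <- function(real, synth) {
  test <- ks.test(real, synth)
  c(ks_statistic = unname(test$statistic), p_value = test$p.value)
}

result <- compare_variable(ground_truth, synthetic)
print(result)
```

A small KS statistic with a large p-value means the test found no evidence that the two samples come from different distributions; it does not prove that the synthetic variable matches the original.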
