Data and code for "Predictive Modeling on Data with Severe Class Imbalance: Applications on Electronic Health Records"

topepo/ICHPS2015_Class_Imbalance

This repository contains data and code for Workshop 4, "Predictive Modeling on Data with Severe Class Imbalance: Applications on Electronic Health Records." The course was held at the 2015 International Conference on Health Policy Statistics (ICHPS) on Wednesday, October 7, from 10:15 AM to 12:15 PM.

Instructors: Birol Emir, Pfizer Inc and Columbia University; Max Kuhn, Pfizer Inc

Abstract:

Healthcare records are increasingly used to make health care decisions and policies. In particular, Electronic Health Record (EHR) data are collected either by specialized private companies, such as Humedica (US) and Cegedim THIN (UK), or through publicly available sources, such as the Behavioral Risk Factor Surveillance System (BRFSS) and the Health and Retirement Study (HRS). EHR data can provide insights into patient management. As these data have become more readily available, companies and institutions want to harness them for predictive purposes. Predicting undiagnosed fibromyalgia (FM) patients, for example, means uncovering the relationships between predictors (such as demographics and healthcare resources) and FM. In many cases the event of interest occurs with relatively low frequency, leading to a class imbalance that can confound modelers: a model can reach high overall accuracy simply by predicting the majority class while rarely identifying the events of interest. This workshop discussed ways to mitigate the effects of severe class imbalance. The course outline is:

  • Description of the problem with class imbalances (with illustrative data)
  • A short refresher on predictive models, parameter tuning and resampling
  • A description of tree-based classification models (single models and ensembles)
  • Sampling methods for combating class imbalances
  • Cost-sensitive learning methods
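As a rough illustration of the first and fourth outline items, the Python sketch below shows why accuracy alone is misleading under severe imbalance and applies random down-sampling, one of several sampling methods for imbalanced data. It is not the workshop's R code; the toy labels (20 events among 1,000 patients) and the helper names `accuracy_and_sensitivity` and `down_sample` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def accuracy_and_sensitivity(y_true, y_pred, positive="event"):
    """Return (accuracy, sensitivity) for a binary outcome."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    # Predictions made for the rows whose true label is the positive class.
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    hits = sum(p == positive for p in preds_for_positives)
    sensitivity = hits / len(preds_for_positives) if preds_for_positives else float("nan")
    return correct / len(y_true), sensitivity


def down_sample(rows, labels, seed=None):
    """Randomly discard rows so every class keeps as many rows as the
    smallest class. Returns new (rows, labels) lists."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample without replacement within each class.
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


if __name__ == "__main__":
    # Hypothetical toy data: 20 events among 1,000 patients (2% prevalence).
    labels = ["event"] * 20 + ["no_event"] * 980
    rows = list(range(len(labels)))  # placeholder for predictor rows

    # A trivial "model" that always predicts the majority class.
    always_no = ["no_event"] * len(labels)
    acc, sens = accuracy_and_sensitivity(labels, always_no)
    print(f"accuracy={acc:.2f} sensitivity={sens:.2f}")  # accuracy=0.98 sensitivity=0.00

    ds_rows, ds_labels = down_sample(rows, labels, seed=1)
    print(Counter(ds_labels))  # 20 rows of each class
```

Sampling of this kind is normally applied only to the training data (or inside each resampling iteration), while performance is measured on data that keep the original class distribution.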

Participants should have some experience with classification models (e.g., logistic regression, linear discriminant analysis). Although the workshop does not focus on specific software for addressing class imbalance, participants will receive a copy of the illustrative data as well as R code to reproduce all of the analyses shown in the workshop.

The slides were distributed during the session and will not be posted here.