Addressing batch effect with ensemble learning

This repository stores all scripts to reproduce results and figures in the following manuscript:

Zhang, Y., Johnson, W. E., & Parmigiani, G. (2019). Robustifying genomic classifiers to batch effects via ensemble learning. bioRxiv, 703587.

Folders in this directory

Scripts are stored under ./code
Data used in our simulation and real data examples are stored as R data files under ./data
./figures contains all figures in the main article and in supplementary materials
./results_* contain result files generated by the pipelines for the simulations and for the real data application using 6 or 4 studies. They can be used to reproduce the figures under ./figures

Reproduce results in the paper

Download this GitHub repository
In R, set the current working directory to the GitHub repository: setwd("<parent path>/bea_ensemble/")
Run ./code/make_pub_figures_mainpaper.R to reproduce figures in the main paper. Result files from the pipeline, which were used for the figures in the paper, are provided in ./results_*
Run ./code/make_pub_figures_supplement.R to reproduce results in the supplementary materials
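
The steps above can also be run from a single R session. The following is a minimal sketch, not part of the repository: the helper `run_figure_scripts()` and its `dry_run` argument are hypothetical, the scripts are assumed to use paths relative to the repository root, and `<parent path>` must be replaced with the location where the repository was downloaded.

```r
# Hypothetical helper (not part of the repository): checks that the two
# figure scripts named in this README exist, then sources them in order.
run_figure_scripts <- function(repo_dir, dry_run = FALSE) {
  # Resolve to an absolute path; fails early if the directory is missing.
  repo_dir <- normalizePath(repo_dir, mustWork = TRUE)
  scripts <- file.path(
    repo_dir, "code",
    c("make_pub_figures_mainpaper.R", "make_pub_figures_supplement.R")
  )
  missing <- scripts[!file.exists(scripts)]
  if (length(missing) > 0) {
    stop("Script(s) not found: ", paste(missing, collapse = ", "))
  }
  if (!dry_run) {
    # Assumption: the scripts expect the repository root as the working
    # directory, so switch to it for the duration of this call only.
    old_wd <- setwd(repo_dir)
    on.exit(setwd(old_wd), add = TRUE)
    for (s in scripts) source(s)
  }
  invisible(scripts)
}

# Example call; replace <parent path> with the real location:
# run_figure_scripts("<parent path>/bea_ensemble")
```

Calling the helper with `dry_run = TRUE` only verifies that both scripts are present, without running them.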

Simulation

To run the simulation pipeline, execute the following on the command line:

Rscript 1_simpipe.R <sample size per batch> <mean batch effect> <variance batch effect>

This will output performance metrics of the models to a sub-directory named ./results, which will be created if it does not exist. The result files can be used to generate the figures in the paper. The scripts can also be run in an HPC environment, using the bash scripts included in the code directory.
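
To run the simulation over several settings, the command line shown above can be generated once per combination of arguments. A minimal sketch follows. The helper `build_sim_commands()` is hypothetical, the three arguments are assumed to be numeric, and the values used here are placeholders chosen for illustration, not the settings used in the paper.

```r
# Hypothetical helper: builds one "Rscript 1_simpipe.R ..." command line per
# combination of the three positional arguments described above.
build_sim_commands <- function(n_per_batch, mean_batch, var_batch,
                               script = "1_simpipe.R") {
  grid <- expand.grid(
    n = n_per_batch, mean = mean_batch, var = var_batch,
    KEEP.OUT.ATTRS = FALSE
  )
  sprintf("Rscript %s %s %s %s", script, grid$n, grid$mean, grid$var)
}

# Placeholder values for illustration only (not the paper's settings).
cmds <- build_sim_commands(
  n_per_batch = c(20, 50),
  mean_batch  = c(0, 3),
  var_batch   = 1
)
print(cmds)

# To launch the runs, assuming the working directory is ./code, uncomment:
# for (cmd in cmds) system(cmd)
```

Generating the commands as strings first makes it easy to inspect them, or to write them to a job file for a scheduler, before anything is executed.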

Real data application

./code/2_TB_getdata.R generates the data under ./data. The real data are downloaded from GEO using GEOquery, then annotated and cleaned by this script.

./code/3_real_data_pipe.R performs bootstrap sampling on the test data and evaluates the trained model on each bootstrap sample. Execute the pipeline with

Rscript 3_real_data_pipe.R

Model performance metrics will be output to a sub-directory named ./results_real (created automatically if it does not exist).
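
The idea of evaluating a trained model on bootstrap samples of the test data can be illustrated generically. The sketch below is not the code in 3_real_data_pipe.R: the simulated labels and scores, the function names `auc()` and `bootstrap_metric()`, the number of bootstrap replicates, and the choice of AUC as the metric are all assumptions made for illustration.

```r
# Rank-based AUC (Mann-Whitney statistic). Labels are 0/1 and a higher
# score is taken to indicate class 1.
auc <- function(scores, labels) {
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  if (n_pos == 0 || n_neg == 0) return(NA_real_)
  r <- rank(scores)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Resample the test samples with replacement and compute the metric on
# each resample. The model's predictions are fixed; only the test set varies.
bootstrap_metric <- function(scores, labels, n_boot = 100) {
  n <- length(labels)
  vapply(seq_len(n_boot), function(b) {
    idx <- sample.int(n, n, replace = TRUE)
    auc(scores[idx], labels[idx])
  }, numeric(1))
}

set.seed(1)
labels <- rep(c(0, 1), each = 30)     # simulated test labels
scores <- rnorm(60, mean = labels)    # simulated predicted scores
boot_auc <- bootstrap_metric(scores, labels, n_boot = 200)
summary(boot_auc)
```

The spread of `boot_auc` gives a rough sense of how much the metric varies with the composition of the test set.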
