This repository implements supervised clustering that integrates weighting by both Gene Ontology (GO) semantic similarity and statistical association effect, and mines mutational clusters that strongly stratify survival outcomes.
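The description above does not spell out the exact algorithm. As a rough, hypothetical illustration of its two ideas (blend a GO-based similarity with an association-based similarity, then check whether each resulting mutational cluster separates survival), the R sketch below uses simulated data, a simple linear blend with an assumed weight `alpha`, average-linkage `hclust()`, and a log-rank test from the `survival` package. These are assumptions chosen for illustration, not the package's actual method; `blend_distance()` and `cluster_logrank()` are hypothetical helpers.

```r
library(survival)  # Surv() and survdiff() for the log-rank test

# Hypothetical helper: blend two gene-by-gene similarity matrices (entries in [0, 1])
# into one distance; `alpha` is an assumed mixing weight, not a package option.
blend_distance <- function(go_sim, assoc_sim, alpha = 0.5) {
  stopifnot(identical(dim(go_sim), dim(assoc_sim)))
  as.dist(1 - (alpha * go_sim + (1 - alpha) * assoc_sim))
}

# Hypothetical helper: for each gene cluster, compare patients carrying any
# mutation in that cluster against non-carriers with a log-rank test.
cluster_logrank <- function(mut, time, status, clusters) {
  sapply(sort(unique(clusters)), function(k) {
    genes <- names(clusters)[clusters == k]
    carrier <- rowSums(mut[, genes, drop = FALSE]) > 0
    if (length(unique(carrier)) < 2) return(NA_real_)  # no contrast to test
    fit <- survdiff(Surv(time, status) ~ carrier)
    pchisq(fit$chisq, df = 1, lower.tail = FALSE)      # log-rank p-value
  })
}

# Simulated toy data (random, for illustration only)
set.seed(42)
n_pat <- 60
genes <- paste0("G", 1:6)
rand_sim <- function(nm) {
  m <- matrix(runif(length(nm)^2), length(nm), dimnames = list(nm, nm))
  m <- (m + t(m)) / 2  # make it symmetric
  diag(m) <- 1         # each gene is fully similar to itself
  m
}
mut <- matrix(rbinom(n_pat * length(genes), 1, 0.3), n_pat,
              dimnames = list(NULL, genes))  # patients x genes, 0/1 mutation calls
time <- rexp(n_pat, rate = 0.1)
status <- rbinom(n_pat, 1, 0.7)

hc <- hclust(blend_distance(rand_sim(genes), rand_sim(genes)), method = "average")
clusters <- cutree(hc, k = 3)  # named vector: gene -> cluster id
pvals <- cluster_logrank(mut, time, status, clusters)
print(round(pvals, 3))
```

In a supervised setting, one would presumably tune the blend weight (here `alpha`) or the number of clusters so that the retained clusters give small log-rank p-values under cross-validation; that selection loop is omitted here.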
# Docker installation
```bash
docker build -t supervised-clustering-survival:main .
```
# R installation
```r
library(devtools)
install_github("tzhang-nmdp/Supervised-clustering-survival@main")
```
# Usage

```bash
# Options:
#   -i  input matrix of genomic data (.RData)
#   -o  running model option: 'variant' for variant-level analysis of common variants,
#       'gene' for gene-level analysis of rare variants
#   -d  output directory
#   -k  number of folds for cross-validation
Rscript Supervised-clustering-survival/R/SCCW_supervised_clustering.R \
  -i "${genomic_data}.RData" \
  -o ***variant/gene \
  -d "${outdir}" \
  -k 5
```
# Example

```bash
# Demo run on the bundled simulated genomic data (for demonstration only);
# options are the same as in the general command above.
Rscript Supervised-clustering-survival/R/SCCW_supervised_clustering.R \
  -i Supervised-clustering-survival/Example/genomic_data_vcf.RData \
  -o test_gene \
  -d Supervised-clustering-survival/Example \
  -k 5
```
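The `-k` option sets the number of cross-validation folds. How the script assigns samples to folds is not documented here, so the generic R sketch below (hypothetical helper `assign_folds()`, not the script's internals) only illustrates what `-k 5` means: every sample receives one of five fold labels, and fold sizes differ by at most one.

```r
# Hypothetical illustration of generic k-fold assignment; not the script's internals.
assign_folds <- function(n, k = 5, seed = 1) {
  set.seed(seed)                           # reproducible shuffling
  sample(rep(seq_len(k), length.out = n))  # balanced labels 1..k, randomly permuted
}

folds <- assign_folds(23, k = 5)
print(table(folds))  # each fold holds 4 or 5 of the 23 samples
# In round i, fold i is held out for evaluation and the other k - 1 folds are used for fitting.
```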
# Output

The following files are written to the output directory (`-d`):

- Clustering model file: `***.model.RData`
- Model metrics file: `***.csv`
- Survival plot file: `***.tiff`
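The objects stored in the model file and the columns of the metrics file are not described here. A hypothetical helper such as `inspect_outputs()` below, written only with base R (`list.files()`, `load()`, `read.csv()`), can locate the three output types in the output directory by file extension and report what each contains; the file-name patterns are assumptions based on the list of outputs.

```r
# Hypothetical helper: find the documented outputs in an output directory and
# report their contents. Object names inside *.model.RData are not documented,
# so they are discovered with load(), which returns the names it restored.
inspect_outputs <- function(outdir) {
  model_files  <- list.files(outdir, pattern = "\\.model\\.RData$", full.names = TRUE)
  metric_files <- list.files(outdir, pattern = "\\.csv$", full.names = TRUE)
  plot_files   <- list.files(outdir, pattern = "\\.tiff$", full.names = TRUE)

  models <- lapply(model_files, function(f) {
    env <- new.env()        # keep restored objects out of the global environment
    load(f, envir = env)    # character vector of object names
  })
  metrics <- lapply(metric_files, read.csv)

  list(
    models  = setNames(models, basename(model_files)),
    metrics = setNames(metrics, basename(metric_files)),
    plots   = basename(plot_files)
  )
}

# Example call on the demo output directory:
# out <- inspect_outputs("Supervised-clustering-survival/Example")
```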
# Citation

Please cite our paper in the *Journal of Hematology & Oncology*: "Whole-genome sequencing identifies novel predictors for hematopoietic cell transplant outcomes for patients with myelodysplastic syndrome: a CIBMTR study."