How to use iGenSig-Rx R modules to build multi-omics models for predicting therapeutic responses based on GDSC dataset

title	author	date	output
iGenSig-Rx	SanghoonLee	2024-03-26	html_document

knitr::opts_chunk$set(echo = TRUE)

How to use iGenSig-Rx R modules to build multi-omics models for predicting therapeutic responses based on GDSC dataset

iGenSig-Rx: an integral genomic signature based white-box method for modeling clinical therapeutic responses using genome-wide sequencing data

• The current version of iGenSig-Rx is beta 3.2.3 (March 11th, 2024).

• The iGenSig-Rx was built on R version 4.1.0 and tested on Linux and Windows environment.

A. Introduction

• Here we developed an integral genomic signature (iGenSig-Rx) analysis as a new class of transparent, interpretable, and resilient modeling methods for big data-based precision medicine with improved cross-dataset applicability and tolerance to sequencing bias.

• We define genomic features that significantly correlate with a clinical phenotype (such as therapeutic response) as genomic correlates, and an integral genomic signature as the integral set of redundant genomic correlates for a given clinical phenotype such as therapeutic response.

• We postulate that the redundant genomic features, which are usually eliminated through dimensionality reduction or feature removal during multi-omics modeling, may help overcome the sequencing bias.

• Using genomic dataset for chemical perturbations, we developed the iGenSig-Rx models predicting clinical response to HER2-targeted therapy and tested the cross-dataset performance of selected models based on independent multi-omics datasets for breast cancer HER2-positive patients assessing their therapeutic responses.

B. How to build iGenSig-Rx models based on CALGB dataset and predict the therapeutic response in ACOSOG and NOAH datasets.

Find "iGenSigRx_RunCalculation_b3.2.3.R" file in your folder. This file contains the script to perform iGenSig-Rx modeling.
the iGenSig-Rx module has been tested on the latest R version 4.3.2 with MacOS Sonoma 14.4
The iGenSig-Rx runs for one permutation and whole tranining set, instead of all 10 permutations.
Total running time on Windows 10 was ~ 2 hours. It can vary according to your computer's spec.

C. Installing R packages

install.packages("devtools")
library(devtools)

devtools::install_github("wangxlab/iGenSig-Rx")
library(iGenSig-Rx)

D. Running iGenSig-Rx

Step1. Setting parameters and output directory

the most important parameter is the sen.weightcut and res.weightcut. This parameter represents the signal to noise threshold. We have tested sen.weightcut and res.weightcut in the range 0.05~2.5. We found that 0.13 was the best to obtain signal against noise.
the power can be set between 1 or 2, and ECNpenalty can be set as 0.5 or 1. We used an universal parameter root=1, power=1, and ECNpenalty=1 in the manuscript.

sen.weightcut<- 0.13; res.weightcut<- 0.13; root<- 1; ECNpenalty<- 1; power=1

#Define the parameters dictionary
parameters=list(genSig.outprefix="pearson",feature.select=NULL, confound=NULL,fold.assign=fold.assign,
                 FoldAssignFile=FoldAssignFile,method="pearson",
                 redun.method="ochiai",redun.cut=0.1,minsize=10,sen.weightcut=sen.weightcut,
                 res.weightcut=res.weightcut,sen.pcut=1,res.pcut=1,power=power,root=root,
                 rm.equivocal=TRUE,highlevel.minsize=15,ECNpenalty=ECNpenalty,by.cluster=FALSE,
                 dgensig.method="math") 
list2env(parameters, .GlobalEnv)
OutTopDir<-"YourOutputDirectory"     ## <<<=========== You should set this up.
dir.create(OutTopDir) 
SubFolder<-paste0(OutTopDir,"/",sen.weightcut,"_RWC",res.weightcut,"_rt",root,"_ENCpn",ECNpenalty,"_Pw", power)
dir.create(SubFolder)
folder.prefix=paste0("iGenSig_CALGB")  
CALGB.gensigdir=paste(SubFolder, "/",  folder.prefix,sep="")
dir.create(CALGB.gensigdir)  # This is output sub-directory

#Save the parameters to a file
save(parameters,file=paste0(CALGB.gensigdir,"/iGenSig.parameters.rda"))

Step2. Calculate iGenSig-Rx score based on similarity method: CALGB clinical trial

CALGB clinical data is restricted by data use agreement, thus we can't release the clincal data to the public. Users can obtain the data from dbGaP using accession: phs001570**

Perform weighted K-S tests for each permuted training/testing set or all CALGB subjects as training set

If you want to calculate iGenSig-Rx scores for 10 permutations, please run the for look like "for (i in 1:nrow(fold.assign)) {"

for (i in c(1)) {
      outprefix=paste0("trainset_", genSig.outprefix,"_Fold",rownames(fold.assign)[i])
      trainset=colnames(fold.assign)[which(fold.assign[i,]==0)]
      trainset=trainset[trainset %in% CALGB.phenoData$SUBJECT_ID]
      weightfile=paste0(CALGB.gensigdir,"/",outprefix,"_Weight.xls")
      if (file.exists(weightfile)) {
          print(paste("calcluating Gensig using",weightfile))
          batchCal.GenSig.similarity(subject.sensitive=NULL,feature.select=feature.select,weightfile=weightfile,
            trainset=trainset,outprefix=outprefix, subject.genotype.list=CALGB.genotype.list,
            preCalmatrix=CALGB.preCalmatrix,tcga.genotype.list=TCGA.genotype.list,
            subject.gensigdir=CALGB.gensigdir,method=method,redun.method=redun.method,redun.cut=redun.cut,
            minsize=minsize,sen.weightcut=sen.weightcut,res.weightcut=res.weightcut,power=power,root=root,
            ECNpenalty=ECNpenalty,rm.equivocal=rm.equivocal,highlevel.minsize=highlevel.minsize,
            by.cluster=by.cluster) 
            #p.cut should be assigned NULL if similarity index is used instead of correlation statistic
      }else{
        subject.sensitive=CALGB.phenoData$SUBJECT_ID[(CALGB.phenoData$SUBJECT_ID %in% trainset) &
                                                       CALGB.phenoData$pCR_Breast==1]
        batchCal.GenSig.similarity(subject.sensitive=subject.sensitive,feature.select=feature.select,
            trainset=trainset,outprefix=outprefix,subject.genotype.list=CALGB.genotype.list,
            preCalmatrix=CALGB.preCalmatrix,tcga.genotype.list=TCGA.genotype.list,
            subject.gensigdir=CALGB.gensigdir,method=method,redun.method=redun.method,
            redun.cut=redun.cut,minsize=minsize,sen.weightcut=sen.weightcut,res.weightcut=res.weightcut,
            power=power,root=root,ECNpenalty=ECNpenalty,rm.equivocal=rm.equivocal,
            highlevel.minsize=highlevel.minsize,confound.factor=confound,by.cluster=by.cluster) 
            #p.cut should be assigned NULL if similarity index is used instead of correlation statistics
  }
}

Step3. CALGB: dGenSig and benchmark test

batchCal.dGenSig.trial2trial.train(gensigdir=CALGB.gensigdir,test.matrix=fold.assign[,colnames(fold.assign) %in%
       CALGB.phenoData$SUBJECT_ID],phenoData = CALGB.phenoData, method=dgensig.method)

batchbenchmark.dGenSig.trial(gensig.dir=subject.gensigdir, phenoData=CALGB.phenoData, by.cluster=by.cluster)

Step4. Preparation for ACOSOG analysis

train.gensigdir=CALGB.gensigdir
ACOSOG.gensigdir=paste0(train.gensigdir,"_ACOSOG")
dir.create(ACOSOG.gensigdir)
dgensig.model=mget(load(paste(train.gensigdir,"/iGenSig.parameters.rda",sep=""), 
                        envir=(tmp<- new.env())), envir=tmp)
list2env(dgensig.model$parameters, .GlobalEnv)

Step5. Option 1: calculate iGenSig-Rx scores for Validation dataset: ACOSOG clinical trial data # this takes about 2 min.

for (i in c(1)) {
    outprefix=paste0("trainset_", genSig.outprefix,"_Fold",rownames(fold.assign)[i])
    weightfile=paste0(train.gensigdir,"/",outprefix,"_Weight.xls")
    print(paste("calcluating Gensig using",weightfile))
    batchCal.GenSig.similarity(subject.sensitive=NULL,feature.select=feature.select,weightfile=weightfile,
              trainset=NULL,outprefix=outprefix,subject.genotype.list=ACOSOG.genotype.list,
              preCalmatrix=ACOSOG.preCalmatrix,tcga.genotype.list=TCGA.genotype.list,
              subject.gensigdir=ACOSOG.gensigdir,method=method,redun.method=redun.method,
              redun.cut=redun.cut,minsize=minsize,sen.weightcut=sen.weightcut,res.weightcut=res.weightcut,
              power=power,root=root,ECNpenalty=ECNpenalty,rm.equivocal=rm.equivocal,
              highlevel.minsize=highlevel.minsize,by.cluster=by.cluster,validationset=TRUE) 
              # p.cut should be assigned NULL if similarity index is used instead of correlation statistic
}

Step6. Calculate dGenSig scores and benchmark for Validation dataset: ACOSOG clinical trial data

batchCal.dGenSig.trial2trial.validation (train.gensigdir=train.gensigdir, validation.gensigdir=ACOSOG.gensigdir,
                 method=dgensig.method)
batchbenchmark.dGenSig.trial(gensig.dir=ACOSOG.gensigdir, phenoData=ACOSOG.phenoData, by.cluster=by.cluster, 
                event.col=15, time.col=16, validationset=TRUE)

Step7. preparation for NOAH analysis

train.gensigdir=CALGB.gensigdir
NOAH.gensigdir=paste0(train.gensigdir,"_NOAH")
dir.create(NOAH.gensigdir)
dgensig.model=mget(load(paste(train.gensigdir,"/iGenSig.parameters.rda",sep=""), 
                        envir=(tmp<- new.env())), envir=tmp)
list2env(dgensig.model$parameters, .GlobalEnv)

Step8. Calculated GenSig scores for Validation data: NOAH clinical trial data

If you want to calculate iGenSig-Rx scores for 10 permutations, please run the for look like "for (i in 1:nrow(fold.assign)) {"

for (i in c(1)) {
   outprefix=paste0("trainset_", genSig.outprefix,"_Fold",rownames(fold.assign)[i])
   weightfile=paste0(train.gensigdir,"/",outprefix,"_Weight.xls")
   print(paste("calcluating Gensig using",weightfile))
   batchCal.GenSig.similarity(subject.sensitive=NULL,feature.select=feature.select,weightfile=weightfile,
            trainset=NULL,outprefix=outprefix,subject.genotype.list=NOAH.genotype.list,
            preCalmatrix=NOAH.preCalmatrix,tcga.genotype.list=TCGA.genotype.list,
            subject.gensigdir=NOAH.gensigdir,method=method,redun.method=redun.method,
            redun.cut=redun.cut,minsize=minsize,sen.weightcut=sen.weightcut,res.weightcut=res.weightcut,
            power=power,root=root,ECNpenalty=ECNpenalty,rm.equivocal=rm.equivocal,
            highlevel.minsize=highlevel.minsize,by.cluster=by.cluster,validationset=TRUE) 
            #p.cut should be assigned NULL if similarity index is used instead of correlation statistic
 }

Step9. calculate dGenSig and benchmark for Validation data: NOAH clinical trial data

batchCal.dGenSig.trial2trial.validation (train.gensigdir=train.gensigdir, 
                             validation.gensigdir=NOAH.gensigdir, method=dgensig.method)
batchbenchmark.dGenSig.trial(gensig.dir=NOAH.gensigdir, phenoData=NOAH.phenoData, 
                             by.cluster=by.cluster, validationset=TRUE)

E. Expected outcome

Please find "./Result_iGenSigRx/iGenSig_CALGB_ACOSOG/2024-03-26.dGenSig_45_benchmark.result.xls" file. The file has the summary of the prediction performance AUROC.

Name		Name	Last commit message	Last commit date
Latest commit History 54 Commits
R		R
Result_iGenSigRx		Result_iGenSigRx
data-raw		data-raw
data		data
man		man
._.Rproj.user		._.Rproj.user
._.git		._.git
._.gitignore		._.gitignore
._DESCRIPTION		._DESCRIPTION
._LICENSE		._LICENSE
._NAMESPACE		._NAMESPACE
._R		._R
._README.html		._README.html
._README.md		._README.md
._Result_iGenSigRx		._Result_iGenSigRx
._data		._data
._data-raw		._data-raw
._iGenSigRx_v1.3.Rproj		._iGenSigRx_v1.3.Rproj
._man		._man
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
NAMESPACE		NAMESPACE
README.html		README.html
README.md		README.md
iGenSigRx_v1.3.Rproj		iGenSigRx_v1.3.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

How to use iGenSig-Rx R modules to build multi-omics models for predicting therapeutic responses based on GDSC dataset

iGenSig-Rx: an integral genomic signature based white-box method for modeling clinical therapeutic responses using genome-wide sequencing data

A. Introduction

B. How to build iGenSig-Rx models based on CALGB dataset and predict the therapeutic response in ACOSOG and NOAH datasets.

C. Installing R packages

D. Running iGenSig-Rx

Step1. Setting parameters and output directory

Step2. Calculate iGenSig-Rx score based on similarity method: CALGB clinical trial

Step3. CALGB: dGenSig and benchmark test

Step4. Preparation for ACOSOG analysis

Step5. Option 1: calculate iGenSig-Rx scores for Validation dataset: ACOSOG clinical trial data # this takes about 2 min.

Step6. Calculate dGenSig scores and benchmark for Validation dataset: ACOSOG clinical trial data

Step7. preparation for NOAH analysis

Step8. Calculated GenSig scores for Validation data: NOAH clinical trial data

Step9. calculate dGenSig and benchmark for Validation data: NOAH clinical trial data

E. Expected outcome

About

Releases

Packages

Contributors 2

Languages

License

wangxlab/iGenSig-Rx

Folders and files

Latest commit

History

Repository files navigation

How to use iGenSig-Rx R modules to build multi-omics models for predicting therapeutic responses based on GDSC dataset

iGenSig-Rx: an integral genomic signature based white-box method for modeling clinical therapeutic responses using genome-wide sequencing data

A. Introduction

B. How to build iGenSig-Rx models based on CALGB dataset and predict the therapeutic response in ACOSOG and NOAH datasets.

C. Installing R packages

D. Running iGenSig-Rx

Step1. Setting parameters and output directory

Step2. Calculate iGenSig-Rx score based on similarity method: CALGB clinical trial

Step3. CALGB: dGenSig and benchmark test

Step4. Preparation for ACOSOG analysis

Step5. Option 1: calculate iGenSig-Rx scores for Validation dataset: ACOSOG clinical trial data # this takes about 2 min.

Step6. Calculate dGenSig scores and benchmark for Validation dataset: ACOSOG clinical trial data

Step7. preparation for NOAH analysis

Step8. Calculated GenSig scores for Validation data: NOAH clinical trial data

Step9. calculate dGenSig and benchmark for Validation data: NOAH clinical trial data

E. Expected outcome

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages