Here, we provide an analysis pipeline for selecting the best drug combination in combination therapy:
- Apply the missing indicator method (MIM) to handle missing values in the data;
- Apply virtual multiple matching (VMM) to adjust for possible confounding across multiple treatment groups (SMOTE is applied when class imbalance is present);
- Conduct a permutation test of overall treatment efficacy (i.e., a test of the sharp null hypothesis);
- Determine a patient stratification strategy from effect modifiers identified in a data-driven manner;
- Perform multiple comparisons with the best (MCB) to select the best drug combination strategy within each subgroup.
The original data include: Y (treatment failure or recurrence status), DRUG (combinatorial drug categories), X1 (feature, e.g., age), X2 (feature, e.g., BMI), X3 (feature, e.g., disease severity) ...
Run the `MIM` function (in `missing_indicator_method.R`) to add missing indicators. The data structure after MIM is: Y, DRUG, X1, I.X1, X2, I.X2, X3, I.X3 ...
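
The column layout described above can be illustrated with a minimal base R sketch of the missing indicator method. This is not the repository's `MIM` function: the helper name `add_missing_indicators`, its arguments, and the choice to fill missing values with 0 are assumptions made for illustration only.

```r
# Hypothetical helper (NOT the repository's MIM function): for each feature,
# replace missing values with a constant and add an indicator column
# I.<feature> right after it (1 = value was missing, 0 = observed).
add_missing_indicators <- function(data, features, fill = 0) {
  # Start from the non-feature columns (e.g., Y and DRUG).
  out <- data[, setdiff(names(data), features), drop = FALSE]
  for (f in features) {
    miss <- is.na(data[[f]])
    x <- data[[f]]
    x[miss] <- fill                       # assumed fill value
    out[[f]] <- x
    out[[paste0("I.", f)]] <- as.integer(miss)
  }
  out
}

# Toy data with the column names used in this README.
toy <- data.frame(
  Y    = c(0, 1, 0, 1),
  DRUG = c("A", "B", "A", "C"),
  X1   = c(63, NA, 58, 71),
  X2   = c(24.1, 30.5, NA, 27.0)
)
toy_mim <- add_missing_indicators(toy, features = c("X1", "X2"))
print(names(toy_mim))
```

The indicator columns let a downstream model separate "value was missing" from the filled-in value itself.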
Run the `VMM` function (in `virtual_multiple_matching.R`) to adjust for possible confounding across different treatment groups. Set `smote = TRUE` when class imbalance is present. The `VMM` function is based on the R functions `randomForest` and `SMOTE`.
Select the top features with the largest prognostic value (i.e., candidate effect modifiers) using the variable importance ranking from `randomForest` in Step 3 (VMM). Then select the final effect modifiers from these candidates based on point estimates of the treatment failure rate.
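
The README does not spell out how the point estimates are compared, so the following is only a sketch under an assumption: a candidate feature is treated as a plausible effect modifier when the drug with the lowest estimated failure rate differs between its subgroups. The helper `failure_rate_by_subgroup` and the toy column `severe` are hypothetical names, not part of the repository.

```r
# Hypothetical helper: point estimate of the failure rate (mean of Y) for each
# drug within each level of a candidate effect modifier.
# Rows are drugs, columns are subgroups.
failure_rate_by_subgroup <- function(data, modifier) {
  tapply(data$Y, list(DRUG = data$DRUG, subgroup = data[[modifier]]), mean)
}

# Toy data: drug A does better in non-severe patients, drug B in severe ones.
toy <- data.frame(
  Y      = c(0, 0, 1, 1, 1, 0, 0, 0),
  DRUG   = rep(c("A", "B"), each = 4),
  severe = rep(c("no", "no", "yes", "yes"), times = 2)
)
rates <- failure_rate_by_subgroup(toy, "severe")
print(rates)

# Drug with the lowest point estimate in each subgroup.
best <- apply(rates, 2, function(r) names(which.min(r)))
print(best)
```

Point estimates alone carry no uncertainty, so a table like this is a screening aid rather than a test.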
Run the `sharp.null.test` function (in `sharp_null_test.R`) to test whether there is an overall treatment effect.
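
To make the idea of a sharp null permutation test concrete, here is a minimal base R sketch. It is not the repository's `sharp.null.test`: the helper name `perm_test_sharp_null`, its arguments, and the test statistic (the spread between the highest and lowest group failure rates) are assumptions chosen for illustration.

```r
# Hypothetical helper (NOT the repository's sharp.null.test function).
# Under the sharp null, no drug changes any patient's outcome, so the observed
# outcomes stay fixed and the drug labels can be shuffled freely.
perm_test_sharp_null <- function(y, drug, n_perm = 2000, seed = 1) {
  set.seed(seed)  # makes the sketch reproducible; resets the global RNG state
  # Assumed test statistic: highest minus lowest group failure rate.
  rate_range <- function(y, g) diff(range(tapply(y, g, mean)))
  observed <- rate_range(y, drug)
  permuted <- replicate(n_perm, rate_range(y, sample(drug)))
  # The add-one correction keeps the Monte Carlo p-value strictly positive.
  p_value <- (1 + sum(permuted >= observed)) / (n_perm + 1)
  list(statistic = observed, p_value = p_value)
}

# Toy data: drug C has a lower failure probability by construction.
set.seed(42)
drug <- rep(c("A", "B", "C"), each = 100)
y <- rbinom(300, size = 1, prob = rep(c(0.5, 0.5, 0.2), each = 100))
res <- perm_test_sharp_null(y, drug)
print(res)
```

A small p-value suggests that at least one drug group differs in failure rate, which motivates the subgroup-level comparisons that follow.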
Run the `MCB` function (in `multiple_comparison_with_best.R`) on the whole dataset, as well as on the patient subgroups stratified by the effect modifiers selected in Step 4, to select the best drug combination strategy.
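
As a rough illustration of comparing each drug with the best of the others, the sketch below computes, for every drug, its failure rate minus the lowest failure rate among the remaining drugs, with bootstrap percentile intervals. This is a swapped-in approximation and not the repository's `MCB` function: the intervals are not simultaneous, and the helper name `mcb_bootstrap` and its arguments are assumptions.

```r
# Hypothetical helper (NOT the repository's MCB function): bootstrap percentile
# intervals for rate_i - min(rate_j, j != i). Lower failure rate is better, so
# a negative difference means drug i beats every other drug.
mcb_bootstrap <- function(y, drug, n_boot = 1000, level = 0.95, seed = 1) {
  set.seed(seed)
  drug <- factor(drug)
  diff_from_best <- function(rates) {
    sapply(seq_along(rates), function(i) rates[i] - min(rates[-i]))
  }
  point <- diff_from_best(tapply(y, drug, mean))
  # Resample patients within each drug group so group sizes stay fixed.
  idx_by_drug <- split(seq_along(y), drug)
  boot <- replicate(n_boot, {
    idx <- unlist(lapply(idx_by_drug, function(i) {
      i[sample.int(length(i), replace = TRUE)]
    }))
    diff_from_best(tapply(y[idx], drug[idx], mean))
  })
  alpha <- 1 - level
  ci <- apply(boot, 1, quantile, probs = c(alpha / 2, 1 - alpha / 2))
  data.frame(drug = levels(drug), diff = as.numeric(point),
             lower = ci[1, ], upper = ci[2, ], row.names = NULL)
}

# Toy data: drug C has the lowest failure probability by construction.
set.seed(7)
drug <- rep(c("A", "B", "C"), each = 150)
y <- rbinom(450, size = 1, prob = rep(c(0.5, 0.45, 0.2), each = 150))
res <- mcb_bootstrap(y, drug)
print(res)
```

Under these assumptions, a drug whose interval lies entirely above 0 can be set aside as not the best, while a drug whose interval lies entirely below 0 is the apparent best. To mirror the per-subgroup analysis, the same call can be applied to each subset of the data.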