-
Notifications
You must be signed in to change notification settings - Fork 6
/
moss_intro.rmd
171 lines (143 loc) · 6.07 KB
/
moss_intro.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
---
title: "How to use MOSS package"
author: "[Weixin Cai](https://wilsoncai1992.github.io/)"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Model survival curves using TMLE}
%\VignetteEngine{knitr::rmarkdown}
\usepackage[utf8]{inputenc}
---
# tl;dr
Using MOSS to analyze survival data and estimate survival curve falls into the following steps:
1. clean survival data into right-censored format
2. perform SuperLearner fit of the conditional survival function of failure event, conditional survival function of censoring event, propensity scores (`initial_sl_fit`)
3. perform TMLE adjustment of the conditional survival fit (`MOSS_hazard`)
4. simultaneous confidence band (`compute_simultaneous_ci`)
# clean survival data into right-censored format
You will need a matrix of baseline covariates W, a binary treatment A, $\widetilde T \triangleq \min(T_A, C_A)$ is the last measurement time of the subject, and $\Delta \triangleq I(T_A \leqslant C_A)$ is the censoring indicator.
```{r}
library(simcausal)
D <- DAG.empty()
D <- D +
node("W1", distr = "rbinom", size = 1, prob = .5) +
node("W", distr = "runif", min = 0, max = 1.5) +
node("A", distr = "rbinom", size = 1, prob = .15 + .5 * as.numeric(W > .75)) +
node("Trexp", distr = "rexp", rate = 1 + .7 * W^2 - .8 * A) +
node("Cweib", distr = "rweibull", shape = 1 + .5 * W, scale = 75) +
node("T", distr = "rconst", const = round(Trexp * 2)) +
node("C", distr = "rconst", const = round(Cweib * 2)) +
# Observed random variable (follow-up time):
node("T.tilde", distr = "rconst", const = ifelse(T <= C, T, C)) +
# Observed random variable (censoring indicator, 1 - failure event, 0 - censored):
node("Delta", distr = "rconst", const = ifelse(T <= C, 1, 0))
setD <- set.DAG(D)
dat <- sim(setD, n = 2e2)
# only grab ID, W's, A, T.tilde, Delta
Wname <- grep("W", colnames(dat), value = TRUE)
df <- dat[, c("ID", Wname, "A", "T.tilde", "Delta")]
# The simulator will generate death at time 0.
# our package only allow positive integer time, so I add one to all times
df$T.tilde <- df$T.tilde + 1
```
Here I simulate a survival data using `simcausal` package. The baseline covariate
# perform SuperLearner fit of the conditional survival function of failure event, conditional survival function of censoring event, and propensity scores (`initial_sl_fit`)
- T_tilde: vector of last follow up time
- Delta: vector of censoring indicator
- A: vector of treatment
- W: data.frame of baseline covariates
- t_max: you always set as the maximum time
The following three can take a vector of strings in the following sets:
https://github.com/ecpolley/SuperLearner/tree/master/R
- sl_failure: SuperLearner library for failure event hazard
- sl_censoring: SuperLearner library for censoring event hazard
- sl_treatment: SuperLearner library for propensity score
```{r}
library(MOSS)
sl_lib_g <- c("SL.mean", "SL.glm")
sl_lib_censor <- c("SL.mean", "SL.glm")
sl_lib_failure <- c("SL.mean", "SL.glm", "SL.step.forward")
sl_fit <- initial_sl_fit(
T_tilde = df$T.tilde,
Delta = df$Delta,
A = df$A,
W = data.frame(df[, c("W", "W1")]),
t_max = max(df$T.tilde),
sl_treatment = sl_lib_g,
sl_censoring = sl_lib_censor,
sl_failure = sl_lib_failure
)
```
```{r}
print(names(sl_fit))
```
the `sl_fit` will contain the fitted conditional densities for the failure events (`density_failure_1` for treatment group, `density_failure_0` for control group), censoring events (`density_censor_1` for treatment, `density_censor_0` for control), and propensity scores (a vector `g1W`)
```{r}
sl_fit$density_failure_1$hazard_to_survival()
sl_fit$density_failure_0$hazard_to_survival()
# a quick hack in case there is no data where T_tilde = 1 (time start from 1)
k_grid <- 1:max(df$T.tilde)
sl_fit$density_failure_1$t <- k_grid
sl_fit$density_failure_0$t <- k_grid
```
We need to call `hazard_to_survival` method to always do a tranformation from conditional hazard to conditional survival probabilities (one-to-one transformation).
# perform TMLE adjustment of the conditional survival fit (`MOSS_hazard`)
First we set the inputs
- T_tilde: same as before
- Delta: same as before
- A: same as before
- density_failure: use `sl_fit$density_failure_1` if you want to estimate treatment group survival curve; use `sl_fit$density_failure_0` for control group
- density_censor: use `sl_fit$density_censor_1` or `sl_fit$density_censor_0`
- g1W: use `sl_fit$g1W`
- A_intervene: set `1` if you want to estimate treatment group survival curve; set `0` for control group
- k_grid: `1:max(T_tilde)`
```{r}
moss_hazard_fit <- MOSS_hazard$new(
A = df$A,
T_tilde = df$T.tilde,
Delta = df$Delta,
density_failure = sl_fit$density_failure_1,
density_censor = sl_fit$density_censor_1,
g1W = sl_fit$g1W,
A_intervene = 1,
k_grid = k_grid
)
```
Perform TMLE step.
```{r}
psi_moss_hazard_1 <- moss_hazard_fit$iterate_onestep(
epsilon = 1e-2, max_num_interation = 1e1, verbose = FALSE, method = "l2"
)
```
TIPS:
- set `epsilon` smaller if the stopping criteria fluctuation is noisy; should smoothly decrease
```{r}
moss_hazard_fit_1 <- survival_curve$new(t = k_grid, survival = psi_moss_hazard_1)
moss_hazard_fit_1$display(type = 'survival')
```
You don't have to, but this wraps the estimated survival curve `psi_moss_hazard_1` nicely with its corresponding time.
# simultaneous confidence band (`compute_simultaneous_ci`)
use the following script to compute the standard error for each t on the survival curve.
```{r}
survival_curve_estimate <- as.vector(moss_hazard_fit_1$survival)
eic_fit <- eic$new(
A = df$A,
T_tilde = df$T.tilde,
Delta = df$Delta,density_failure = moss_hazard_fit$density_failure,
density_censor = moss_hazard_fit$density_censor,
g1W = moss_hazard_fit$g1W,
psi = survival_curve_estimate,
A_intervene = 1
)
eic_matrix <- eic_fit$all_t(k_grid = k_grid)
std_err <- compute_simultaneous_ci(eic_matrix)
upper_bound <- survival_curve_estimate + 1.96 * std_err
lower_bound <- survival_curve_estimate - 1.96 * std_err
print(survival_curve_estimate)
print(upper_bound)
print(lower_bound)
```
## Session Information
```{r sessionInfo, echo=FALSE}
sessionInfo()
```