-
Notifications
You must be signed in to change notification settings - Fork 0
/
logisticRR.Rmd
388 lines (262 loc) · 21.9 KB
/
logisticRR.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
---
title: "Deriving relative risk from logistic regression"
author: "Youjin Lee"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{logisticRR}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## Relative risk v.s. odds ratio
Relative risk and odds ratio are often confused or misinterpreted. Especially while coefficients in logistic regression are directly interpreted as (adjusted) odds ratio, they are unwittingly translated as (adjusted) relative risks in many public health studies. In that relative risks are useful in many thousands of applications, along with odds ratio, we propose a software tool to easily convert from odds ratio to relative risks under logistic regression. Unlike adjusted odds ratio conditional on other confounders, adjusted relative risks may vary depending on other confounders in the logistic model so we also analytically examine the effect of those confounders on the adjusted relative risk.
### Binary or continuous exposure variable
Let us first define adjusted relative risks of binary exposure $X$ on binary outcome $Y$ conditional on $\mathbf{Z}$.
$$\frac{p(Y = 1 \mid X = 1, \mathbf{Z} )}{p(Y = 1 \mid X = 0, \mathbf{Z})}$$
Generally speaking, when exposure variable of $X$ is continuous or ordinal, we can define adjusted relative risks as ratio between probability of observing $Y = 1$ when $X = x+1$ over $X = x$ conditional on $\mathbf{Z}$. Unlike adjusted odds ratio, these ratio depend on baseline value of exposure $x$ under logistic regression.
$$\frac{p(Y = 1 \mid X = x+1, \mathbf{Z} )}{p(Y = 1 \mid X = x, \mathbf{Z})}$$
### Nominal exposure variable
On the other hand, when exposure variable is nominal, it is impossible to compare the probabilities in one unit change. Therefore, when a type of exposure variable is `factor`, we allow users to specify two values of exposure variable including baseline ($x_{0}$) and comparative level ($x_{1}$) and derive the relative risks given those two exposure levels.
$$\frac{p(Y = 1 \mid X = x_{1}, \mathbf{Z} )}{p(Y = 1 \mid X = x_{0}, \mathbf{Z})}$$
The above is more generalized version. By setting $x_{1} = 1$ and $x_{0} = 0$ we can go back to binary case.
## Multinomial logistic regression
Consider nominal outcome of interest that could take more than two values. Denote a value of outcome of $Y$ as $0, 1, 2, \ldots, K$ and treat $Y=0$ as reference.
$$\ln\left( \frac{p(Y_{i} = j | X_{i} = x, \mathbf{Z})}{p(Y_{i} = 0 | X_{i} = x, \mathbf{Z})} \right) = \alpha_{j} + \beta_{j} x + \mathbf{\gamma}^{T} \mathbf{z}_{i}$$
Multinomial logistic regression is widely used for studies from diverse disciplines but unfortunately, we have commonly found the literatures that used relative risk from multinomial logistic regression without full discussion of its derivation or its varying value of conditioning covariates.
In this case, relative risks of reference group ($Y_{j} = 0$) for subject $i$ is :
$$\frac{p( Y_{i} = 0 | X_{i} = x_{1}, \mathbf{Z} = \mathbf{z}_{i} ) }{p(Y_{i} = 0 | X_{i} = x_{0}, \mathbf{Z} = \mathbf{z}_{i} )} = \frac{1}{ 1 + \sum_{k=1}^{K} \exp(a_{k} + \beta_{k} x_{1} + \boldsymbol{\gamma}^{T}_{k} \mathbf{z}_{i} ) } \left( \frac{1}{ 1 + \sum_{k=1}^{K} \exp(a_{k} + \beta_{k} x_{0} + \boldsymbol{\gamma}^{T}_{k} \mathbf{z}_{i} ) } \right)^{-1} \\ = \left( 1 + \sum\limits_{k=1}^{K} \exp( \alpha_{k} + \beta_{k} x_{0} + \boldsymbol{\gamma}^{T}_{k} \mathbf{z}_{i} ) \right) \big/ \left(1 + \sum\limits_{k=1}^{K} \exp(\alpha_{k} + \beta_{k}x_{1} + \boldsymbol{\gamma}^{T}_{k} \mathbf{z}_{i} ) \right)$$
In this case, relative risks of $Y_{j} = j$ ($j=1,2,\ldots, K$)
$$\frac{p( Y_{i} = j | X_{i} = x_{1}, \mathbf{Z} = \mathbf{z}_{i} ) }{p(Y_{i} = j | X_{i} = x_{0}, \mathbf{Z} = \mathbf{z}_{i} )} = \frac{\exp(a_{j} + \beta_{j} x_{1} + \boldsymbol{\gamma}^{T}_{k} \mathbf{z}_{i} ) }{ 1 + \sum_{k=1}^{K} \exp(a_{k} + \beta_{k} x_{1} + \boldsymbol{\gamma}^{T}_{k} \mathbf{z}_{i} ) } \left( \frac{\exp(a_{j} + \beta_{j} x_{0} + \boldsymbol{\gamma}^{T}_{k} \mathbf{z}_{i} ) }{ 1 + \sum_{k=1}^{K} \exp(a_{k} + \beta_{k} x_{0} + \boldsymbol{\gamma}^{T}_{k} \mathbf{z}_{i} ) } \right)^{-1} \\ = \frac{\exp( \beta_{j} (x_{1} - x_{0} )) \left( 1 + \sum\limits_{k=1}^{K} \exp( \alpha_{k} + \beta_{k} x_{0} + \boldsymbol{\gamma}^{T}_{k} \mathbf{z}_{i} ) \right) }{ 1 + \sum\limits_{k=1}^{K} \exp(\alpha_{k} + \beta_{k}x_{1} + \boldsymbol{\gamma}^{T}_{k} \mathbf{z}_{i} ) }$$
Observe that relative risks for each of $K+1$ possible outcomes are all dependent on the regression coefficients of other groups and conditioning coefficient values ($\mathbf{z}_{i}$).
Other than relative risks, relative risk ratio (RRR) between response of j and response of 0 is often of interest.
\[
\frac{p( Y_{i} = j | X_{i} = x_{1}, \mathbf{Z} = \mathbf{z}_{i} ) }{p(Y_{i} = j | X_{i} = x_{0}, \mathbf{Z} = \mathbf{z}_{i} )} \big/ \frac{p( Y_{i} = 0 | X_{i} = x_{1}, \mathbf{Z} = \mathbf{z}_{i} ) }{p(Y_{i} = 0 | X_{i} = x_{0}, \mathbf{Z} = \mathbf{z}_{i} )} = \exp\left( \beta_{j}(x_{1} - x_{0}) \right)
\]
## Estimated variance of relative risk under binary response
In case of (adjusted) odds ratio derived from logistic regression, we can directly obtain variance-covariance matrix for coefficients using `glm` function in `R`. However, deriving variance of adjusted relative risks, as a function of those coefficients, is more challenging.
We first provide a estimated variance of relative risk using Delta method upon estimated variance of odds ratio from `glm`. The second method to estimate variance is using sampling variance of bootstrap samples.
### Delta method
Let $\boldsymbol{\beta}$ be a vector of coefficients used in logistic regression and among them $\beta_{1}$ is a coefficient associated with an exposure variable of interest taking a value of $x_{0}$ as baseline level and $x_{1}$ as comparative level. Then we can represent the adjusted relative risk as a function of $\boldsymbol{\beta}$ conditional on $\mathbf{Z} = \mathbf{z}$:
$$g(\boldsymbol{\beta}) = \frac{1 + \exp(-\beta_{0} - \beta_{1} x_{0} - \boldsymbol{\beta}^{T}_{2:p} \mathbf{z}) }{ 1 + \exp (-\beta_{0} - \beta_{1} x_{1} - \boldsymbol{\beta}^{T}_{2:p} \mathbf{z}) }$$
Then by Delta method,
$$var[g(\boldsymbol{\beta})] = \left\{\frac{\partial g(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} \right\}^{T} var(\boldsymbol{\beta}) \left\{\frac{\partial g(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} \right\}$$
Note that $\frac{\partial g(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}$ is $p \times 1$ and $var(\boldsymbol{\beta})$ is $p \times p$, so $var[g(\boldsymbol{\beta})]$ is a scalar value.
A $p \times p$ matrix of $var{(\boldsymbol{\beta})}$ is obtained by `summary(fit)$cov.unscaled` when `fit` is a `glm` object.
Let $e_{0} = \exp(-\beta_{0} - \beta_{1} x_{0} - \boldsymbol{\beta}^{T}_{2:p} \mathbf{z})$ and $e_{1} = \exp (-\beta_{0} - \beta_{1} x_{1} - \boldsymbol{\beta}^{T}_{2:p} \mathbf{z})$.
$$\frac{\partial g(\boldsymbol{\beta})}{\partial \beta_{0}} = \frac{- e_{1} + e_{0}}{(1 + e_{1})^2 } = \frac{e_{0}(1 - \exp(-\beta_{1}(x_{1} - x_{0}) ) ) }{(1 + e_{1})^2}$$
$$\frac{\partial g(\boldsymbol{\beta})}{\partial \beta_{1}} = \frac{-x_{1} e_{1}( 1 + e_{0}) + x_{0} e_{0}(1 + e_{1}) }{(1 + e_{1})^2 }$$
For any $j = 2,3,\ldots, p$ where $z_{j}$ is a covariate of which effect is associated with $\beta_{j}$:
$$\frac{\partial g(\boldsymbol{\beta})}{\partial \beta_{j}} = \frac{z_{j}(e_{0} - e_{1} ) }{ (1+e_{1})^2} = \frac{1 - \exp(-(x_{1} - x_{0})\beta_{1}) }{(1 + e_{1})^2}$$
By combining information of estimated $var{(\boldsymbol{\beta})}$ and $\frac{\partial g(\boldsymbol{\hat{\beta}})}{\partial \boldsymbol{\hat{\beta}}}$, we can derive the estimated variance of $g(\boldsymbol{\beta})$.
## Estimated variance of relative risk under nominal response
Similar to `glm()`, `multinom()` also produces Hessian matrix (`fit$Hessian` for `multinom` object of `fit`) of the coefficients of which inverse is covariance matrix. Therefore, similar to binary logistic regression, we can use Delta method to estimate the variance of relative risk.
Let us consider the response variable of $Y$ taking $(K+1)$ categorical values, $Y=0,1,2, \ldots, K$, and having one binary exposure variable $X$ and $p$ conditional covariates $\mathbf{Z}$. Then we can represent the adjusted relative risk of $k \in \{1,2, \ldots, K\}$ response as a function of $\boldsymbol{\Theta} = (\boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\Gamma})$ where $\boldsymbol{\alpha} = (\alpha_{1}, \alpha_{2}, \cdots, \alpha_{K})$; $\boldsymbol{\beta} = (\beta_{1}, \beta_{2}, \cdots, \beta_{K})$; and $\Gamma$ is $(K \times p)$ matrix of which $k^{th}$ row has $\boldsymbol{\gamma}_{k} = (\beta_{1}, \beta_{2}, \cdots, \beta_{K})$.
For $j \in \{1,2,\ldots, K\}$, let $g_{j}(\boldsymbol{\Theta})$ be a relative risk of $Y = j$ conditional on $\mathbf{Z} = \mathbf{z}$.
$$g_{j}(\boldsymbol{\Theta}) = \frac{p( Y = j | X = x_{1}, \mathbf{Z} = \mathbf{z}) }{p(Y = j| X = x_{0}, \mathbf{Z} = \mathbf{z})}$$
For simplicity, $e_{1k}(\mathbf{z}) = \exp(\alpha_{k} + \beta_{k}x_{1} + \boldsymbol{\gamma}^{T}_{k} \mathbf{z})$ and $e_{0k} = \exp(\alpha_{k} + \beta_{k}x_{1} + \boldsymbol{\gamma}^{T}_{k} \mathbf{z})$.
$$\frac{\partial g_{j}(\boldsymbol{\Theta})}{\partial \alpha_{i}} = (1 + \sum\limits_{k=1}^{K} e_{1k}(\mathbf{z}))^{-2} \exp(\beta_{j}(x_{1} - x_{0}) ) \left\{ e_{1i}(\mathbf{z}) (1 + \sum\limits_{k=1}^{K} e_{0k}(\mathbf{z}) ) - e_{0i}(\mathbf{z}) ( 1 + \sum\limits_{k=1}^{K} e_{1k}(\mathbf{Z}) ) \right\} $$
If $i \neq j$:
$$\frac{\partial g_{j}(\boldsymbol{\Theta})}{\partial \beta_{i}} = (1 + \sum\limits_{k=1}^{K} e_{1k}(\mathbf{z}))^{-2} \exp(\beta_{j}(x_{1} - x_{0}) ) \left\{ x_{1} e_{1i}(\mathbf{z}) (1 + \sum\limits_{k=1}^{K} e_{0k}(\mathbf{z}) ) - x_{0} e_{0i}(\mathbf{z}) ( 1 + \sum\limits_{k=1}^{K} e_{1k}(\mathbf{Z}) ) \right\} $$
If $i=j$:
$$\frac{\partial g(\boldsymbol{\Theta})}{\partial \beta_{j}} =(1 + \sum\limits_{k=1}^{K} e_{1k}(\mathbf{z}))^{-2} \exp(\beta_{j}(x_{1} - x_{0}) ) \left[ x_{1} e_{1i}(\mathbf{z}) (1 + \sum\limits_{k=1}^{K} e_{0k}(\mathbf{z}) ) - \left\{ (x_{1} - x_{0}) ( 1 + \sum\limits_{k=1}^{K} e_{0k}(\mathbf{Z})) + x_{0} e_{0i}(\mathbf{z}) \right\}( 1 + \sum\limits_{k=1}^{K} e_{1k}(\mathbf{Z}) ) \right]$$
For $l = 1,2,\ldots,p$:
$$\frac{\partial g_{j}(\boldsymbol{\Theta})}{\partial \gamma_{jl}} = (1 + \sum\limits_{k=1}^{K} e_{1k}(\mathbf{z}))^{-2} \exp(\beta_{j}(x_{1} - x_{0}) ) z_{l} \left\{ e_{1i}(\mathbf{z}) (1 + \sum\limits_{k=1}^{K} e_{0k}(\mathbf{z}) ) - e_{0i}(\mathbf{z}) ( 1 + \sum\limits_{k=1}^{K} e_{1k}(\mathbf{Z}) ) \right\} $$
On the other hand, relative risks of $Y = 0$, $g_{0}(\boldsymbol{\theta})$ can be also derived as similar way without $\exp(\beta_{j}(x_{1} - x_{0}) )$ in the nominator.
Then a total of $q = (K + K + K\times p)$ vector of $\frac{\partial g_{j}(\boldsymbol{\Theta}) }{\partial \boldsymbol{\Theta}}$ for $j=0,1,2,\ldots, K$ can be constructed and let $\mathbf{G}$ be a $(K \times q)$ matrix of each row is $\frac{\partial g_{j}(\boldsymbol{\Theta}) }{\partial \boldsymbol{\Theta}}$.
Then by Delta method,
$$\begin{bmatrix} var\left[ g_{0}(\boldsymbol{\Theta}) \right] \\ var\left[ g_{1}(\boldsymbol{\Theta}) \right] \\ \vdots \\ var\left[ g_{K}(\boldsymbol{\Theta}) \right] \end{bmatrix} = \mathbf{G}~var[\boldsymbol{\Theta}]~ \mathbf{G}^{-1}$$,
where $var[\boldsymbol{\Theta}]$ is from Hessian matrix of `fit$Hessian`.
In both of logistic regression and multinomial logistic regression, having nominal exposure variable makes derivation more complicated but we can extend the binary exposure variable case.
## Bootstrap
All of `logisticRR`, `nominalRR`, `multiRR`, and `multinRR`, we add a logical input of `boot`: by setting `boot = TRUE` those functions print out a vector of `n.boot` number of (adjusted) relative risks.
## Examplary Data
As a first example, we generate hypothetical data of size $n=500$.
```{r}
library(logisticRR)
n <- 500
set.seed(1234)
X <- rbinom(n, 1, 0.3)
W <- rbinom(n, 1, 0.3); W[sample(1:n, n/3)] = 2
Z <- rep(0, n)
Z[sample(1:n, n/2)] <- "female"; Z <- ifelse(Z == 0, "male", Z)
dummyZ <- ifelse(Z == "female", 1, 0)
Y <- rbinom(n, 1, plogis(X - W + 2*dummyZ))
dat <- as.data.frame(cbind(Y, X, W, Z))
dat$X <- as.numeric(dat$X); dat$X <- ifelse(dat$X == 2, 1, 0)
dat$Y <- as.numeric(dat$Y); dat$Y <- ifelse(dat$Y == 2, 1, 0)
dat$W <- as.factor(dat$W)
dat$Z <- as.factor(dat$Z)
head(dat)
```
The code below estimates variance of adjusted relative risks of binary $X$ on binary outcome of $Y$ by generating `n.boot = 200` bootstrap samples. Because we do not specify baseline level of exposure variable (`basecov`) nor the value of conditioning covariates of `W` and `Z` (`fixcov`), baseline exposure level is set to `0` as default. Since `W` and `Z` are
both factor, they are fixed to their first level which are `0` and `female`.
```{r}
simresult200 <- logisticRR(Y ~ X + W + Z, data = dat, boot = TRUE, n.boot = 200)
simresult200$RR
var(simresult200$boot.rr)
simresult200$delta.var
## print out conditioning
simresult200$fix.cov
```
This time we increase the number of bootstrap samples to `n.boot = 1000`. Note that sampling variance gets closer to the estimated variance using Delta method (`delta.var`).
```{r}
simresult1000 <- logisticRR(Y ~ X + W + Z, data = dat, boot = TRUE, n.boot = 1000)
var(simresult1000$boot.rr)
simresult1000$delta.var
```
### Adjusted relative risks depending on confounders
We have a total of six combination of confounder variables. By the assumption made in logistic regression, adjusted odds ratio is consistent against of these confounders but adjusted relative risk is not.
```{r}
levels(dat$W)
levels(dat$Z)
adjusted <- cbind(rep(levels(dat$W), 2), rep(levels(dat$Z), each = 3))
adjusted <- as.data.frame(adjusted)
names(adjusted) <- c("W", "Z")
```
Adjusted relative risk tends to be higher for male and for higher level of `W`.
```{r}
## compare with odds ratio
results <- list()
for(i in 1:nrow(adjusted)){
results[[i]] <- logisticRR(Y ~ X + W + Z, data = dat, fixcov = adjusted[i,], boot = FALSE)
}
## adjusted relative risk
# female
print(c(results[[1]]$RR, results[[2]]$RR, results[[3]]$RR))
# male
print(c(results[[4]]$RR, results[[5]]$RR, results[[6]]$RR))
## adjusted odds ratio
## all the same : by the assumption of logistic regression
print(exp(coefficients(results[[1]]$fit)[2]))
# betas <- coefficients(fit)
# exposed <- exp(-predict(fit, expose.cov, type = "link"))
# unexposed <- exp(-predict(fit, unexpose.cov, type = "link"))
# RR <- (1 + unexposed) / (1 + exposed)
```
Let us change the prevalence of exposure variable ($X$).
```{r}
dat2 <- dat
dat2$Y <- ifelse(dat$Y == 1, rbinom(n, 1, 0.2), rbinom(n, 1, 0.01))
## compare with odds ratio
results2 <- list()
for(i in 1:nrow(adjusted)){
results2[[i]] <- logisticRR(Y ~ X + W + Z, data = dat2, fixcov = adjusted[i,], boot = TRUE, n.boot = 1000)
}
## adjusted relative risk
# female
print(c(results2[[1]]$RR, results2[[2]]$RR, results2[[3]]$RR))
# male
print(c(results2[[4]]$RR, results2[[5]]$RR, results2[[6]]$RR))
## adjusted odds ratio
## all the same : by the assumption of logistic regression
print(exp(coefficients(results2[[1]]$fit)[2]))
# betas <- coefficients(fit)
# exposed <- exp(-predict(fit, expose.cov, type = "link"))
# unexposed <- exp(-predict(fit, unexpose.cov, type = "link"))
# RR <- (1 + unexposed) / (1 + exposed)
```
```{r, fig.width=8, fig.height=6, fig.cap = "\\label{fig:figs} Plotting the distribution of adjusted relative risks for each combination of confounders. Each black dot denotes point estimate of adjusted relative risk and red horizontal line denotes adjusted odds ratio which is independent of the levels of confounders.", echo = FALSE}
RRboot = data.frame(rr = c(results2[[1]]$boot.rr, results2[[2]]$boot.rr, results2[[3]]$boot.rr,
results2[[4]]$boot.rr, results2[[5]]$boot.rr, results2[[6]]$boot.rr),
Z = c(rep("female", 3000), rep("male", 3000)),
W = c( rep(c(rep("0", 1000), rep("1", 1000), rep("2", 1000)), 2)))
#pdf("Figure/results2.pdf", width = 19, height = 9)
par(mfrow = c(1,1), mar = c(3,3,3,3), cex.lab = 2,
cex.main = 1, cex.axis = 1, tcl = 0.5, omi = c(0.5,0.5,0,0))
bx = boxplot(rr ~ interaction(Z, W), data = RRboot, ylim = c(0,2.5),
col = rep(c("palegreen", "dodgerblue"), 3),
outcol = rep(c("palegreen", "dodgerblue"), 3),
main= "Adjusted relative risks",
xlab="", ylab= "", mgp = c(4,1,0),
yaxt = 'n', at = seq(1,6,1), outline = FALSE)
mtext("(Z,W)", 1, 4, cex = 1)
mtext("Adjusted relative risk", 2, 4, cex = 1)
axis(2, at = seq(0, 2.5, 0.5), labels = seq(0, 2.5, 0.5))
for(p in 1:6){
points(p, results2[[p]]$RR, type = "p", pch = 20, cex = 3)
}
abline(h = exp(coefficients(results2[[1]]$fit)[2]), col = "red", lwd = 2)
```
### multinomial logistic regression
When reponse variable takes more than two values, multinomial logistic regression is widely used to reveal association between the response variable and exposure variable. In that case, relative risk of each category compared to the reference category can be considered, conditional on other fixed covariates. Other than (adjusted) relative risk, relative risks ratio (RRR) is often of interest in multinomial logistic regression.
```{r}
library(logisticRR)
dat$multiY <- ifelse(dat$X == 1, rbinom(n, 1, 0.8) + dat$Y, rbinom(n, 1, 0.2) + dat$Y)
multiresult <- multiRR(multiY ~ X + W + Z, data = dat, boot = TRUE, n.boot = 1000)
apply(multiresult$boot.rr, 2, sd)
sqrt(multiresult$delta.var)
## relative risk ratio (RRR)
print(multiresult$RRR)
## relative risk (RR)
print(multiresult$RR)
```
Similar to the binary reponse, in multinomial logistic regression model, categorical exposure variable can be introduced; in this case, baseline value and comparative value of exposure variable should be specified.
```{r}
multinresult <- multinRR(multiY ~ W + X + Z, data = dat, basecov = 0, comparecov = 1, boot = TRUE, n.boot = 1000)
apply(multinresult$boot.rr, 2, sd)
sqrt(multinresult$delta.var)
## relative risk ratio (RRR)
print(multinresult$RRR)
## relative risk (RR)
print(multinresult$RR)
```
## Airquality example
We introduce [`airquality`](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/airquality.html) data to illustrate the use of `logisticRR`. You can download the data set by :
```{r}
data("airquality")
```
Because outcome variable of `Ozone` is continuous, we are going to binarize this variable into `ozone1` (top 10\% take 1 and 0 otherwise), `ozone2` (top 20\% take 1 and 0 otherwise), and `ozone3` (top 30\% take 1 and 0 otherwise).
```{r}
# delete observations having NAs
ozonedat <- na.omit(airquality)
# define binary ozone level
ozonedat$ozone1 <- ifelse(ozonedat$Ozone < quantile(ozonedat$Ozone, prob = 0.1), 1, 0)
ozonedat$ozone2 <- ifelse(ozonedat$Ozone < quantile(ozonedat$Ozone, prob = 0.2), 1, 0)
ozonedat$ozone3 <- ifelse(ozonedat$Ozone < quantile(ozonedat$Ozone, prob = 0.3), 1, 0)
```
As an exposure variable of main interest, we use numerical `Temp` (temperature). Because a range of `Temp` is wide we devide it by 10 and use `Temp2` in the model so that one unit change of `Temp2` is ten unit (10 Fahrenheit) in `Temp`.
```{r}
summary(ozonedat$Temp)
ozonedat$Temp2 <- ozonedat$Temp / 10
summary(ozonedat$Temp2)
```
As other confounding variables, we chose solar radiation (`Solar.R`) and average wind speed (`Wind`) so that `formula` used for `glm` is `ozone1 ~ Temp2 + Solar.R + Wind`, for example. We specify conditioning confounder values as an average of each variable by `fixcov = data.frame(Solar.R = mean(ozonedat$Solar.R), Wind = mean(ozonedat$Wind))`.
```{r}
ozone.fit1 <- logisticRR(ozone1 ~ Temp2 + Solar.R + Wind, data = ozonedat, basecov = min(ozonedat$Temp2),
fixcov = data.frame(Solar.R = mean(ozonedat$Solar.R), Wind = mean(ozonedat$Wind)),
boot = FALSE)
ozone.fit2 <- logisticRR(ozone2 ~ Temp2 + Solar.R + Wind, data = ozonedat, basecov = min(ozonedat$Temp2),
fixcov = data.frame(Solar.R = mean(ozonedat$Solar.R), Wind = mean(ozonedat$Wind)),
boot = FALSE)
ozone.fit3 <- logisticRR(ozone3 ~ Temp2 + Solar.R + Wind, data = ozonedat, basecov = min(ozonedat$Temp2),
fixcov = data.frame(Solar.R = mean(ozonedat$Solar.R), Wind = mean(ozonedat$Wind)),
boot = FALSE)
```
As prevalence of outcome is smaller (`ozone1` < `ozone2` < `ozone3`), estimated adjusted relative risk is closer to adjusted odds ratio.
```{r}
print(c(ozone.fit1$RR, ozone.fit2$RR, ozone.fit3$RR))
## odds ratio
exp(ozone.fit1$fit$coefficients[2])
```
Next we are going to use `nominalRR` when an exposure variable is converted into nominal variable of `Temp.factor` having three categories -- `low`, `medium`, and `high`.
Note that adjusted relative risk when `basecov = "low", comparecov = "medium"` is the reciprocal of that when `basecov = "medium", comparecov = "low"`.
```{r}
# define binary ozone level
ozonedat$ozone1 <- ifelse(ozonedat$Ozone < quantile(ozonedat$Ozone, prob = 0.1), 1, 0)
ozonedat$Temp.factor <- ifelse(ozonedat$Temp <= quantile(ozonedat$Temp, prob = 0.25), "low",
ifelse(ozonedat$Temp > quantile(ozonedat$Temp, prob = 0.8), "high", "medium"))
ozonedat$Temp.factor <- as.factor(ozonedat$Temp.factor)
ozone.fit.factor <- nominalRR(ozone1 ~ Temp.factor + Solar.R + Wind, data = ozonedat,
basecov = "low", comparecov = "medium",
fixcov = data.frame(Solar.R = mean(ozonedat$Solar.R), Wind = mean(ozonedat$Wind)), boot = FALSE)
ozone.fit.factor2 <- nominalRR(ozone1 ~ Temp.factor + Solar.R + Wind, data = ozonedat,
basecov = "medium", comparecov = "low",
fixcov = data.frame(Solar.R = mean(ozonedat$Solar.R), Wind = mean(ozonedat$Wind)), boot = FALSE)
print(c(ozone.fit.factor$RR, ozone.fit.factor2$RR))
print(1/ozone.fit.factor2$RR)
```