---
title: "MASH v FLASH detailed simulation study"
output:
workflowr::wflow_html:
code_folding: hide
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Introduction
Here I study simulations from a MASH model that extends the "model with independent, unique, and shared effects" from my larger [simulation study](MASHvFLASHsims.html). For the code used in this analysis, see [below](#code).
## Simulations
I run 10 simulations, each of which generates data for 20 conditions and 1200 tests. I use 17 different covariance structures, each of which is used to simulate 25 tests; the remaining 775 tests are null across all conditions.
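As a rough sketch of this layout (the object names and the assumption of standard normal observation noise are mine, not the study's code), a single simulated dataset might look like:

```r
# Illustrative sketch of the simulation layout (not the study's actual code).
set.seed(1)
n <- 20        # conditions
p <- 1200      # tests
per.type <- 25 # tests per covariance type

X <- matrix(0, nrow = n, ncol = p)  # true effects; unfilled tests stay null
# Fill the first block with one covariance type, e.g. small independent
# effects (sigma = 2):
X[, 1:per.type] <- matrix(rnorm(n * per.type, sd = 2), nrow = n)
Y <- X + rnorm(n * p)  # observations: true effects plus N(0, 1) noise
```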
### Independent effects
Effects were nonnull for all conditions and generated independently from a $N(0, \sigma^2)$ distribution. I simulated $(1)$ small independent effects ($\sigma^2 = 2^2$), $(2)$ large independent effects ($\sigma^2 = 5^2$), and $(3)$ independent effects of varying sizes (with $\sigma^2$ ranging from $1^2$ to $5^2$).
Notice that cases 1 and 2 are covered by "canonical" covariance matrices in MASH, but 3 is not.
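One plausible reading of case 3 (the study's code may differ, and whether $\sigma$ varies by condition or by test is an assumption here) draws each condition's effect with its own standard deviation:

```r
# Sketch of type 3: independent effects with per-condition sd from 1 to 5.
sigma <- seq(1, 5, length.out = 20)
# rnorm recycles sd; since matrices fill column-major, row i gets sigma[i]:
X.type3 <- matrix(rnorm(20 * 25, sd = sigma), nrow = 20)
```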
### Identical effects
Effects were nonnull for all conditions, with an effect size that was identical across conditions. This common effect size was generated from a $N(0, \sigma^2)$ distribution. Similar to the above, I simulated $(4)$ small identical effects ($\sigma^2 = 2^2$) and $(5)$ large identical effects ($\sigma^2 = 5^2$).
Both of these cases are covered by canonical covariance matrices in MASH.
### Rank-one effects
These are similar to "identical effects" in that the covariance matrix has rank one (so that conditions 2-20 are always fixed multiples of condition 1), but here, the effect sizes vary.
$(6)$ The effect size for condition 1 was generated from a $N(0, 1)$ distribution, and the effects for conditions 1-20 were multiples of this value, with multipliers evenly spaced between 1 and 5.
Unlike identical effects, rank-one effects are not directly modeled by canonical covariance matrices.
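A minimal sketch of such a rank-one effect matrix (object names are illustrative):

```r
# Sketch of type 6: condition 1's effect is N(0, 1); each condition scales
# it by a multiplier evenly spaced between 1 and 5.
mult <- seq(1, 5, length.out = 20)
base <- rnorm(25)              # one draw per test for condition 1
X.rank1 <- outer(mult, base)   # 20 x 25; row j equals mult[j] * base
```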
### Unique effects
Effects were nonnull in one condition only, with the nonnull effect simulated from a $N(0, \sigma^2)$ distribution. I simulated $(7)$ small effects ($\sigma^2 = 3^2$) unique to condition 1 and $(8)$ large effects ($\sigma^2 = 8^2$) unique to condition 2.
These effects are directly modeled by canonical covariance matrices.
### Shared effects
Effects were nonnull in several conditions. The nonnull effects were either identical across conditions or fixed multiples of one another. I included $(9)$ medium effects ($\sigma^2 = 3^2$) identical across 3 conditions (3-5), $(10)$ small effects ($\sigma^2 = 2^2$) identical across 10 conditions (1-10), and $(11)$ effects of differing sizes over 5 conditions (6-10), with variances ranging from $2^2$ to $4^2$.
None of these types of effects are modeled by canonical covariance matrices.
### Random covariance
I included three random covariance structures in which effects were nonnull across all conditions. In each case the random covariance matrix $A^T A$ was generated by sampling the entries of $A$ independently from a $N(0, 2^2)$ distribution. I included $(12)$ a rank-5 random covariance matrix (with $A \in \mathbb{R}^{5 \times 20}$), $(13)$ a rank-10 random covariance matrix $(A \in \mathbb{R}^{10 \times 20})$, and $(14)$ a full-rank random covariance matrix $(A \in \mathbb{R}^{20 \times 20})$.
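As a sketch of this construction (assuming 20 conditions; names are illustrative), one can draw effects from $N(0, A^T A)$ with a plain linear transformation, using the fact that if $z \sim N(0, I_k)$ then $A^T z \sim N(0, A^T A)$, which avoids a `MASS::mvrnorm` dependency:

```r
# Sketch: rank-k random covariance Sigma = A^T A, entries of A from N(0, 2^2),
# then effects drawn from MVN(0, Sigma) via a linear transformation.
k <- 5
n.cond <- 20
A <- matrix(rnorm(k * n.cond, sd = 2), nrow = k)  # k x n.cond
Sigma <- t(A) %*% A                               # n.cond x n.cond, rank k
Z <- matrix(rnorm(25 * k), ncol = k)              # 25 tests, standard normal
X.rand <- Z %*% A                                 # each row ~ N(0, A^T A)
```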
### Combinations of independent, identical, and unique effects
Finally, I included several combinations of the above types of effects. In particular, I simulated $(15)$ small identical effects (covariance type 4) plus large independent effects (type 2), $(16)$ small independent effects (type 1) plus a large unique effect (type 8), and $(17)$ small identical effects (type 4) plus a large unique effect (type 8).
## Fitting methods
The fitting methods are identical to those described in my [study of GTEx data](MASHvFLASHgtex.html).
## Results
I pre-run the script [below](#code) and load the results from file.
```{r load_res}
rrmses <- sqrt(readRDS("./output/MASHvFLASHsims2/mses.rds"))
pr <- readRDS("./output/MASHvFLASHsims2/pr.rds")
```
### RRMSE (nulls)
First I give a breakdown of the relative root mean-squared errors (that is, the RMSE for each fit object, divided by the RMSE that would be obtained by simply using the observed data $Y$ to estimate the "true effects" $X$). In addition to calculating the RRMSE separately for each covariance type, I also separately consider null effects and nonnull effects.
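The RRMSE just described can be written in a few lines (names are illustrative: `X` is the truth, `Y` the observations, `X.hat` a fit's estimates):

```r
# RRMSE: RMSE of the fitted values divided by the RMSE of using Y directly.
rmse <- function(est, truth) sqrt(mean((est - truth)^2))
rrmse <- function(X.hat, X, Y) rmse(X.hat, X) / rmse(Y, X)
```

An RRMSE below 1 means the fit improves on the raw observations; using `Y` itself as the estimate gives an RRMSE of exactly 1.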
MASH does much better in shrinking null effects towards zero for tests that are null across all conditions or that are unique to a single condition (covariance types 7-8). However, the OHF method does much better on nulls when the nonnull effects are of medium size and are shared across several conditions (types 9 and 11). Here, it is possibly relevant to recall that cases 7 and 8 are covered by canonical covariance matrices in MASH but cases 9 and 11 are not.
```{r mse_null}
method.names <- c("pn.zero", "pn.OHF", "ash.zero", "ash.OHF", "mash")
plot.order <- c(5, 2, 4, 1, 3)
legend.args <- list(x="right", bty="n", inset=c(-0.25,0), xpd=T)
main.null <- list()
idx.null <- list()
names.null <- list()
main.null[[1]] <- "RRMSE for nulls (null/unique effects)"
idx.null[[1]] <- c(23, 7, 9)
names.null[[1]] <- c("null", "small (7)", "large (8)")
main.null[[2]] <- "RRMSE for nulls (shared effects)"
idx.null[[2]] <- c(11, 13, 15)
names.null[[2]] <- c("3 cond. (9)", "10 cond. (10)", "5 cond. (11)")
par(mar = c(5,4,4,6))
for (i in 1:length(main.null)) {
barplot(rrmses[plot.order, idx.null[[i]]],
names.arg = names.null[[i]], beside=T,
ylim=c(0, 1), ylab="RRMSE",
main=main.null[[i]], legend.text=method.names[plot.order],
args.legend=legend.args)
}
```
### RRMSE (nonnulls)
Results for nonnull effects are more even. MASH generally does a bit better than FLASH, with the notable exception of type 17 (a combination of identical and unique effects) and the possible exception of unique effects (types 7-8).
```{r mse_nonnull}
main.nonnull <- list()
idx.nonnull <- list()
names.nonnull <- list()
main.nonnull[[1]] <- "RRMSE for independent effects"
idx.nonnull[[1]] <- 1:3
names.nonnull[[1]] <- c("small (1)", "large (2)", "various (3)")
main.nonnull[[2]] <- "RRMSE for identical and rank-1 effects"
idx.nonnull[[2]] <- 4:6
names.nonnull[[2]] <- c("sm. ident. (4)", "lg. ident. (5)", "rank-1 (6)")
main.nonnull[[3]] <- "RRMSE for unique effects"
idx.nonnull[[3]] <- c(8, 10)
names.nonnull[[3]] <- c("small (7)", "large (8)")
main.nonnull[[4]] <- "RRMSE for shared effects"
idx.nonnull[[4]] <- c(12, 14, 16)
names.nonnull[[4]] <- c("3 cond. (9)", "10 cond. (10)", "5 cond. (11)")
main.nonnull[[5]] <- "RRMSE for random covariance"
idx.nonnull[[5]] <- 17:19
names.nonnull[[5]] <- c("rank-5 (12)", "rank-10 (13)", "full-rank (14)")
main.nonnull[[6]] <- "RRMSE for combinations of effects"
idx.nonnull[[6]] <- 20:22
names.nonnull[[6]] <- c("ident. + ind.", "ind. + uniq.", "ident. + uniq.")
par(mar = c(5,4,4,6))
for (i in 1:length(main.nonnull)) {
barplot(rrmses[plot.order, idx.nonnull[[i]]],
names.arg = names.nonnull[[i]], beside=T,
ylim=c(0, 2), ylab="RRMSE",
main=main.nonnull[[i]], legend.text=method.names[plot.order],
args.legend=legend.args)
}
```
### FPR/TPR
As in my previous [simulation study](MASHvFLASHsims.html), I evaluate true and false positive rates using the built-in function `get_lfsr()` for MASH and by simulating from the posterior for FLASH. For each covariance structure, I plot the true positive rate for a given covariance structure against the *overall* false positive rate.
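A minimal sketch of how such curves can be traced, assuming an `lfsr` matrix of local false sign rates and a logical mask of true nulls (both names illustrative):

```r
# Sweep an lfsr threshold and record (FPR, TPR) at each cutoff.
roc_points <- function(lfsr, is.null, thresholds = seq(0, 1, by = 0.05)) {
  t(sapply(thresholds, function(t) {
    signif <- lfsr <= t  # call an effect significant at this cutoff
    c(fpr = mean(signif[is.null]),   # fraction of true nulls called
      tpr = mean(signif[!is.null]))  # fraction of true effects called
  }))
}
```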
Results are again fairly even, with a few exceptions where MASH clearly dominates FLASH. In particular, FLASH does poorly on independent effects (types 1-3) and on randomly generated covariance structures (types 12-14).
```{r tpr}
get_fpr <- function(pr) {
  # Columns 7, 9, 11, 13, 15 index the null effects for covariance types 7-11;
  # column 23 indexes the tests that are null across all conditions.
  nullidx <- c(7, 9, 11, 13, 15)
  fp <- 25 * rowSums(pr[, nullidx]) + 775 * (pr[, 23])
  fp / (25 * length(nullidx) + 775)
}
plot_fprvtpr <- function(idx, typename) {
colors <- c("orangered", "blue", "orange", "skyblue", "seagreen")
plot.order <- c(5, 1, 3, 2, 4)
plot(get_fpr(pr$mash), pr$mash[, idx], type='l',
col=colors[length(method.names)], lty=1,
xlab="FPR", ylab="TPR", ylim=c(0, 1), main=typename)
for (i in 1:(length(method.names) - 1)) {
next.pr <- pr[[method.names[i]]]
lines(get_fpr(next.pr), next.pr[, idx], col=colors[i], lty=1)
}
legend("bottomright", legend=method.names[plot.order],
col=colors[plot.order], lty=1)
}
par(mfrow=c(1, 2))
plot_fprvtpr(1, "Small independent (1)")
plot_fprvtpr(2, "Large independent (2)")
plot_fprvtpr(3, "Independent of varying size (3)")
plot_fprvtpr(6, "Rank-one (6)")
plot_fprvtpr(4, "Small identical (4)")
plot_fprvtpr(5, "Large identical (5)")
plot_fprvtpr(8, "Small unique (7)")
plot_fprvtpr(10, "Large unique (8)")
plot_fprvtpr(12, "Shared (3 conditions) (9)")
plot_fprvtpr(14, "Shared (10 conditions) (10)")
plot_fprvtpr(16, "Shared (varying sizes) (11)")
plot_fprvtpr(17, "Random rank-5 (12)")
plot_fprvtpr(18, "Random rank-10 (13)")
plot_fprvtpr(19, "Random full-rank (14)")
plot_fprvtpr(20, "Identical plus independent (15)")
plot_fprvtpr(21, "Independent plus unique (16)")
plot_fprvtpr(22, "Identical plus unique (17)")
```
## Code
Click "Code" to view the code used to obtain the above results.
```{r code, echo=F, include=F}
knitr::read_chunk("./code/MASHvFLASHsims2.R")
```
```{r sims2, eval=F}
```