-
Notifications
You must be signed in to change notification settings - Fork 0
/
baci-n0.Rmd
308 lines (228 loc) · 12.5 KB
/
baci-n0.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
---
title: "BACI analysis for taxonomic richness (N0)"
author: "Valentin Stefan"
date: "21 Feb 2021"
output:
html_document:
toc: true
toc_float: true
toc_collapsed: false
toc_depth: 3
number_sections: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# Set root directory as two directories above the current location (where the
# *.Rproj is located)
knitr::opts_knit$set(root.dir = '../../')
```
# Load packages and helper functions
```{r packages, message=FALSE}
# Source scripts that contains various helper functions for reading and plotting
# data.
source(file = "analysis/baci_glmm/helper-functions.r")
load_install_packages()
```
# Prepare date
```{r prepare-data}
# Read data
baci_dt <- fread("data/variables_for_BACI.csv", stringsAsFactors = TRUE)
# Rename some columns:
setnames(baci_dt,
old = c("BACItime", "BACIpos", "Site", "Ind./m2", "% EPT", "Total organisms", "Total EPT organisms"),
new = c("period_ba", "treatment_ci", "site_f", "ind_m2", "ept_prc", "N", "N_EPT"))
# Create a factor column `year_f` where I rounded the year to integer and then
# converted to factor type.
baci_dt[, year_f := Year %>% round(digits = 0) %>% factor]
# This will be needed for testing for overdispersation
baci_dt[, obs := 1:.N]
# Variable of interest
varb <- "N0"
```
# Exploratory plots
## Histogram of the explained variable
```{r}
hist(baci_dt[[varb]],
main = paste("Histogram of" , varb),
xlab = varb)
```
## Preliminary plot
```{r fig-1, fig.cap = "Fig. 1 - Observed values and trends. Thicker lines represent mean lines. Thinner lines stand for each site. There is only one control site which actually overlaps with the mean line for control."}
preliminary_plot <- exploratory_plot(varb)
preliminary_plot
```
We can observe already that there is no clear difference between "impact" and "control". The two mean lines are mostly parallel.
# GLMM fitting
The taxonomic richness (N0) represents counts. We can use the Poisson error distribution.
## Fixed and random effects
Specify fixed and random effects and test for random effects structure.
The general model formula with fixed effects is `varb ~ period_ba + treatment_ci + period_ba:treatment_ci`
Note the interaction term `period_ba:treatment_ci` which is the "BACI effect" (see Schwarz, 2015). Testing for its statistical significance is equivalent to testing for an environmental impact (Schwarz, 2015).
To the fixed effects model, random effects are added: `site_f` and `year_f`.
Below, it was tested if `site_f:year_f` interaction should be kept in the model.
```{r warning=FALSE}
# Create an empty list to be populated with models
models <- list()
# Fit model without the interaction of random effects (site_f and year_f)
models[[1]] <- glmmTMB(N0 ~ period_ba * treatment_ci + (1|site_f) + (1|year_f),
data = baci_dt, family = poisson(link = "log"))
# Model with interaction in the random effects structure
models[[2]] <- glmmTMB(N0 ~ period_ba * treatment_ci + (1|site_f) + (1|year_f) + (1|site_f:year_f),
data = baci_dt, family = poisson(link = "log"))
```
Note that the warnings `'giveCsparse' has been deprecated; setting 'repr = "T"' for you` are expected - see comment of Ben Bolker [here](https://github.com/glmmTMB/glmmTMB/issues/615#issuecomment-763183844).
Model summary:
```{r}
summary(models[[1]])
summary(models[[2]])
```
The models are compared using the Akaike Information Criterion (AIC). The preferred model should be the one with a smaller AIC value. A model selection approach similar to the one proposed by Burnham & Anderson (2002) is used. The authors suggest the rule of thumb according to which models with AIC difference smaller or equal than 2 (delta-AIC ≤ 2) have substantial empirical support, those with 4 ≤ delta-AIC ≤ 7 have considerably less and those with delta-AIC > 10 have essentially none. On the other hand, they also suggest that the model with delta-AIC > 10 might still be used for inference if the sample size is large.
```{r}
# AICc comparison
aictab(cand.set = models,
modnames = c("1-without site_f:year_f",
"2-with site_f:year_f"),
second.ord = TRUE)
```
Additionally to the AIC test, we can also run the likelihood ratio test (LRT) as per Bates (2015).
```{r}
anova(models[[1]], models[[2]])
```
The AIC test favors that the less complex model (without `site_f:year_f`). The LRT test indicates that there are no significant differences between the two models. According to the rule of parsimony, we pick the simpler model for further analysis.
## Overdispersion
Testing for overdispersion in the mixed model.
"Overdispersion: the occurrence of more variance in the data than predicted by a statistical model." (Bolker, 2009). In case of overdispersed then we can fit a model with observation-level random effects see Harrison XA (2014) or Harrison XA (2015). This approach was also suggested by Ben Bolker [here](https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#overdispersion), where further references are mentioned.
The tests below indicate no significant overdispersion.
```{r warning=FALSE}
# Test for overdispersion with DHARMa residuals diagnostics
simulation <- DHARMa::simulateResiduals(fittedModel = models[[1]])
plot(simulation)
# A non-parametric test on the simulated residuals
DHARMa::testDispersion(simulation)
```
We keep the model without any overdispersion adjustments for further analysis.
```{r}
model <- models[[1]]
```
## Diagnostic plots
```{r warning=FALSE}
# Residuals
qqnorm(residuals(model), main = "Q-Q plot - residuals")
qqline(residuals(model), col = "red")
# inspecting the random effects (see also Bolker, 2009 - supp 1)
qqnorm(unlist(ranef(model)), main = "Q-Q plot, random effects")
qqline(unlist(ranef(model)), col = "red")
# fitted vs residuals
scatter.smooth(fitted(model),
residuals(model, type = "pearson"),
main = "fitted vs residuals",
xlab = "Fitted Values",
ylab = "Residuals")
abline(h = 0, col = "red")
# fitted vs observed
scatter.smooth(fitted(model), baci_dt[[varb]],
xlab = "Fitted Values",
ylab = "Observed")
abline(0, 1, col = "red")
```
## Significance of BACI
We will test for statistical significance of BACI interaction term using the likelihood ratio test (LRT).
The nested models that we compare are:
- the model without the interaction between the fixed effects (the BACI effect)
- model with the BACI effect
```{r warning=FALSE}
# Model without the BACI interaction term
model.no.interaction <- glmmTMB(N0 ~ period_ba + treatment_ci + (1|site_f) + (1|year_f),
data = baci_dt, family = poisson(link = "log"))
# LRT
lrt <- anova(model, model.no.interaction)
lrt
```
The likelihood ratio (LRT) gives no significant p-value. There are no significant differences between the two models. Therefore, the interaction term (BACI effect) is not statistically significant (there is no environmental impact).
## Model coefficients
Extract coefficients from final model.
Remove the intercept for extracting group means and their confidence intervals as suggested in Schielzeth H (2010).
```{r warning=FALSE}
final.model <- model
final.model.noIntercept <- update(final.model, . ~ . -1)
```
To get the real proportions (Least-squares means/predicted marginal means/treatment means) one can do:
```{r}
estimates <- lsmeans::lsmeans(final.model.noIntercept, ~ treatment_ci:period_ba, type = "response")
# https://stats.stackexchange.com/questions/192062/issue-calculating-adjusted-means-for-glmer-model
# Confidence level used: 0.95
estimates
```
As indicated in Schwarz CJ (2015), the BACI effect is computed as BACI = avgCA - avgCB - (avgIA - avgIB):
```{r}
# get the estimates only (without CI)
est <- predict(ref.grid(final.model.noIntercept), type = "response")
# give names to the estimates vector
names(est) <- c("CA","CB","IA","IB"); est
baci <- est["CA"]-est["CB"]-(est["IA"]-est["IB"])
baci
```
One can also get the BACI effect like:
```{r}
contrast(regrid(estimates), list(baci=c(1,-1,-1,1)))
```
Or with asymptotic CI-s:
```{r}
baci_ci <- confint(contrast(regrid(estimates), list(baci=c(1,-1,-1,1))))
baci_ci
# Confidence level used: 0.95
# https://stats.stackexchange.com/questions/241523/testing-for-pairwise-proportion-differences
```
## Coefficient of determination (R^2)
Below is presented the coefficient of determination of the correlation between the fitted and observed values as suggested by Ben Bolker [here](https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#simplecrude-solutions).
```{r warning=FALSE}
r2_corr <- function(m) {
lmfit <- lm(model.response(model.frame(m)) ~ fitted(m))
summary(lmfit)$r.squared
}
R2 <- r2_corr(final.model)
R2
```
## Interaction plot
This plot will confirm as well visually that there is no BACI effect. The confidence intervals of the means overlap and the two lines are almost parallel.
```{r fig-2, fig.cap = "Fig. 2 - Interaction plot for the model. Least square means values. Error bars show the ±95% CI."}
interaction_plot(estimates, varb)
```
We can compare the observed mean values (marked with x symbol in the graph below) with the estimated mean values (model coefficients, marked with a plain dot and connected with lines). The estimated means for `r varb` are close to the observed mean values.
```{r fig-3, fig.cap = "Fig. 3 - Interaction plot depicting both the observed mean values (marked with x) and the estimated mean values (model coefficents, marked with plain dot). Error bars show the ±95% CI."}
interaction_plot(estimates, varb) +
# Add observed means as dots with bootstrapped CIs
stat_summary(data = baci_dt,
aes(x = period_ba,
y = get(varb),
group = treatment_ci,
color = treatment_ci),
fun.data = "mean_cl_boot",
# geom = "point",
size = 0.5,
shape = 4,
position = position_dodge(width = 0.1),
show.legend = TRUE)
```
## Model summary
```{r, echo=FALSE}
# This chunk is helpful for preparing results to report them automatically R
# Markdown below
es_dt <- summary(estimates) %>% as.data.table()
# Depending on the link function, the 3rd column can be called "rate" or
# "response", so rename it to "pred" (from predicted)
colnames(es_dt)[3] <- "pred"
```
The final GLMM model explained `r (R2*100) %>% round(2)` % of the variance in `r varb`.
There was no significant BACI period × treatment effect (LRT, p = `r lrt[['Pr(>Chisq)']][2] %>% round(3)`).
The BACI effect estimated from the model, which is the difference of the two changes (control after − control before) − (impact after − impact before]), was `r round(baci, 2)` ± `r round(baci_ci$SE, 2)` standard error.
The estimated mean `r varb` in the control sites varied from `r es_dt[treatment_ci == "control"][period_ba == "before"][, pred] %>% round(2)` to `r es_dt[treatment_ci == "control"][period_ba == "after"][, pred] %>% round(2)`, and in the impact sites varied from `r es_dt[treatment_ci == "impact"][period_ba == "before"][, pred] %>% round(2)` to `r es_dt[treatment_ci == "impact"][period_ba == "after"][, pred] %>% round(2)`, before and after the construction of the dam (Fig 2 & Fig 3).
# References
Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious mixed models. arXiv preprint arXiv:1506.04967.
Bolker BM et al. (2009) Generalized linear mixed models: a practical guide for ecology and evolution. Trends in ecology & evolution 24:127–135 at http://www.sciencedirect.com/science/article/pii/S0169534709000196
Burnham, K. & Anderson, D., 2002. Model selection and multimodel inference: a practical information-theoretic approach, New York: Springer.
Harrison XA (2014) Using observation-level random effects to model overdispersion in count data in ecology and evolution. PeerJ 2:e616 https://doi.org/10.7717/peerj.616
Harrison XA (2015) A comparison of observation-level random effect and Beta-Binomial models for modelling overdispersion in Binomial data in ecology & evolution. PeerJ 3:e1114 at https://peerj.com/articles/1114.pdf
Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305-315.
Schwarz CJ (2015) Analysis of BACI experiments. In Course Notes for Beginning and Intermediate Statistics. at http://people.stat.sfu.ca/~cschwarz/Stat-650/Notes/PDFbigbook-R/R-part013.pdf
Schielzeth H (2010) Simple means to improve the interpretability of regression coefficients. Methods in Ecology and Evolution, 1(2), 103-113.