Sven Neulinger reported a problem with anova.cca with tied values. Here is an excerpt of his email:
"Since anova.cca() uses permutation tests, there is a lower limit for the
P-value with small numbers of replicates per treatment group (n) even if
the F-value is very large [e.g., Clarke, K.R. 1993. Non-parametric
multivariate analysis of changes in community structure. Australian
Journal of Ecology 18: 117-143]. Specifically, with n=4 and 2 treatment
levels, the minimal P-value produced by a permutation-based ANOVA is
expected to be ~0.02857.
This outcome should be independent of the number of species (OTUs) in
the count table provided to anova.cca(). However, I noticed that
anova.cca() sometimes gives P-values far lower than what could be
expected."
The following code reproduces the problem:
library(vegan) # also attaches permute, which provides shuffleSet() and how()
A <- gl(2, 4)
set.seed(2)
y <- c(rnorm(4, mean=0), rnorm(4, mean=10))/7
mod <- rda(y ~ A)
# permutation matrix
p <- shuffleSet(8, control=how(complete = TRUE, nperm=40320, maxperm=50000))
anova(mod, permutations = p)
##Permutation test for rda under reduced model
##Permutation: free
##Number of permutations: 40319
##
##Model: rda(formula = y ~ A)
##         Df Variance      F   Pr(>F)
##Model     1  0.60590 242.77 2.48e-05 ***
##Residual  6  0.01497
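The reported P-value is the lowest possible value, 1/(40319+1), meaning that every permutation F-value is lower than the observed statistic, including those from permutations that only reorder observations within groups and should tie with it: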
> out <- permutest(mod, permutations = p)
> max(out$F.perm) - out$F.0
Model
-9.947598e-13
> sum(out$F.perm >= out$F.0)
[1] 0
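For comparison, here is a minimal base-R sketch of what the test should find, assuming the same simulated data as above. It enumerates all choose(8, 4) = 70 ways of assigning four observations to the first group and counts the splits whose F-value ties with the observed one. The helper `f_stat` and the tolerance `tol = 1e-8` are hypothetical choices made for this illustration only; they are not taken from vegan.

```r
set.seed(2)
y <- c(rnorm(4, mean = 0), rnorm(4, mean = 10)) / 7

# One-way ANOVA F statistic for two groups; idx gives the members of group 1
# (hypothetical helper, not a vegan function)
f_stat <- function(y, idx) {
  a <- y[idx]
  b <- y[-idx]
  ss_between <- length(a) * (mean(a) - mean(y))^2 +
    length(b) * (mean(b) - mean(y))^2
  ss_within <- sum((a - mean(a))^2) + sum((b - mean(b))^2)
  ss_between / (ss_within / (length(y) - 2))
}

splits <- combn(8, 4)                      # all 70 distinct groupings
f_all  <- apply(splits, 2, function(idx) f_stat(y, idx))
f_obs  <- f_stat(y, 1:4)                   # the observed grouping

tol    <- 1e-8                             # assumed tolerance for ties
n_tied <- sum(f_all >= f_obs - tol)        # observed split plus its mirror image
p_min  <- n_tied / ncol(splits)            # 2/70, about 0.02857
```

Only the observed grouping and the one with the group labels swapped reach the observed F-value, which gives the 2/70 limit quoted in the email. In the full set of 8! orderings this corresponds to 2 * 4! * 4! = 1152 of 40320 permutations.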
We round to 12 decimal places in permutest.cca to avoid false breaking of ties, but obviously we need coarser rounding. The question is how much: we do not want to falsely equalize different values. I think we should also use signif (which keeps the requested number of significant digits) instead of round (which keeps the requested number of decimals).
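The difference between the two functions can be sketched with numbers of the same magnitude as in the output above. The values `f_obs` and `f_perm` are illustrative stand-ins, not the actual contents of `out$F.0` and `out$F.perm`, and the choice of 12 digits is only an example, not a proposal for the final setting.

```r
f_obs  <- 242.77                    # stand-in for the observed F
f_perm <- f_obs - 9.947598e-13      # stand-in for a permutation F that should tie

raw_tie    <- f_perm >= f_obs                           # FALSE: the tie is broken
round_tie  <- round(f_perm, 12) >= round(f_obs, 12)     # 12 decimals
signif_tie <- signif(f_perm, 12) >= signif(f_obs, 12)   # TRUE: 12 significant digits
```

With three digits before the decimal point, signif(x, 12) keeps only nine decimals, so a discrepancy of about 1e-12 disappears, whereas round(x, 12) works on a grid that is too fine to absorb it. Either function can still separate two nearly equal values that fall on opposite sides of a rounding boundary.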
The fix should be simple, but I made this a public issue to raise awareness: permutest.cca is not the only function that is similarly affected.