Error in tmp[, seq_len(p) + p + 1] : subscript out of bounds #276

kdyson · 2018-05-02T20:04:03Z

A recent update (between 2.4.1 I think and current) has created an error in permutest.cca under specific conditions.

Minimum reproducible example:

troubleshoot <- as.data.frame(matrix(nrow = 40, ncol = 70,
                                               data = rnbinom(n = 40*70, size = 4, prob = .5)))
colnames(troubleshoot)[1:2] <- c("gc.simple", "acres")
species.site <- as.data.frame(matrix(nrow = 40, ncol = 10800,
                                               data = rnbinom(n = 40*10800, size = 4, prob = .5)))

adonis2(
              formula = sqrt(species.site) ~  gc.simple + acres,
              data = troubleshoot,
              parallel = 2,
              permutations = perm,
              method = "bray",
              by = "terms"
)

Changing by to "margin" or explicitly calling ~ troubleshoot$gc.simple etc. seems to fix the problem. Smaller tables and one variable models also work fine.

The text was updated successfully, but these errors were encountered:

gavinsimpson · 2018-05-02T20:09:24Z

Thanks @kdyson. What's perm? Without it we can't run your minimal example (without guessing).

kdyson · 2018-05-02T20:11:47Z

Oh shoot, yeah--that's just my var for how many permutations I want things to do. It's set at 99999 for the example.

kdyson · 2018-05-02T20:19:25Z

Also--this may end up being a parallel error? removing that argument also seems to fix things.

jarioksa · 2018-05-02T21:34:34Z

No error with me, but copy-pasting the example (and with parallel=2 and perm=99999) gives

Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 99999

adonis2(formula = sqrt(species.site) ~ gc.simple + acres, data = troubleshoot, permutations = perm, method = "bray", by = "terms", parallel = 2)
          Df SumOfSqs      R2      F Pr(>F)
gc.simple  1  0.02657 0.02485 0.9688 0.9729
acres      1  0.02773 0.02594 1.0110 0.2491
Residual  37  1.01469 0.94921              
Total     39  1.06898 1.00000

This was with vegan 2.5-1 using R 3.4.4 on macOS.

Obviously we need more detail. However, the error line reported in the header of this message is line 170 of R/permutest.cca.R.

jarioksa · 2018-05-02T21:47:33Z

@kdyson : are you on Windows? We use different parallel processing there than in Unix-like systems (Linux, macOS), and I have not tested that extensively since I have no Windows. It could be a bug in the code that we use when we fork to Windows-compatible type of parallel processing (which we can replicate in Unix-like systems if we set up the clusters before processing). However, it is past midnight over here, and I'll wait till sunrise before having a look at that.

kdyson · 2018-05-02T21:58:00Z

Yep, I am on Windows, v. 10.0.16299.371, vegan 2.5-1, parallel 3.4.4, R 3.4.4.

Thanks so much for looking into this, I appreciate it! Easy enough to turn off parallel for now but with next-gen sequencing data sets it's nice to have :)

psolymos · 2018-05-02T22:58:55Z

I could reproduce it on Mac as well by explicitly defining the cluster (which happens inside permutest.cca on Windows):

cl <- makeCluster(2)
adonis2(
              formula = sqrt(species.site) ~  gc.simple + acres,
              data = troubleshoot,
              parallel = cl,
              permutations = perm,
              method = "bray",
              by = "terms"
)
# Error in tmp[, seq_len(p) + p + 1] : subscript out of bounds
stopCluster(cl)

I suspect the dimensions of tmp are wrong and permutest.cca fails at F.perm <- tmp[, seq_len(p) + p + 1].

jarioksa · 2018-05-03T04:24:25Z

@psolymos that was my suspicion, too. Should happen in the parallel processing block well before crash line 170. Go to check this (and if this is a problem, we probably need to check the parallel code blocks in other functions, too).

psolymos · 2018-05-03T04:49:54Z

@jarioksa : in debug, dim(tmp) is 1665 x 3, but seq_len(p) + p + 1 is 4:5 for the MRE. getF(i) returns a vector of length 5, but the next line casts it into a matrix with 3 columns: tmp <- matrix(tmp, ncol = 3, byrow = TRUE). It needs to be the right dimension, maybe using parLapply and do.call as in the forking case would be safer, not needing to guess the dimension.

It is not a cluster vs. forking issue, but rather wrong dimension assumed in the the cluster case only.

With by="terms" the permutest.cca evaluates all contrasts within compiled code and the number of columns in return matrix depends on the number of contrasts. Socket clusters in parallel processing where not adapted to this, but assumed only tests of one term. Socket clusters are used as the only alternative in Windows, and optional with pre-set clusters in unix-like systems. Addresses github issue #276 where this was reported for adonis2. Obviously also influences by="onedf" in permutest, although this was not tested.

This may be a better fix to issue #276 than reverted ac12f62. The results are the same, but this may be slightly faster and avoids transpose.

jarioksa · 2018-05-04T06:47:46Z

I think this really is a bug in handling socket clusters in parallel processing. Socket clusters are the only available alternative in Windows, and they can be invoked in Linux and macOS as predefined clusters generated by parallel::makeCluster(). I think I have now fixed this in branch issue-#276. There are actually two alternative fixes with identical results, but I think I'll stick to the latter one with smaller change in the code. If possible, test the code. If nothing odd crops out, I'll merge it within 24 hours.

jarioksa · 2018-05-05T09:17:06Z

Probably fixed with 8003a4f, but needs testing also in Windows.

This may be a better fix to issue #276 than reverted ac12f62. The results are the same, but this may be slightly faster and avoids transpose. (cherry picked from commit 8003a4f)

psolymos added the bug label May 3, 2018

jarioksa pushed a commit that referenced this issue May 4, 2018

permutest.cca by=... failed with socket clusters: alternative fix

8003a4f

This may be a better fix to issue #276 than reverted ac12f62. The results are the same, but this may be slightly faster and avoids transpose.

jarioksa closed this as completed May 5, 2018

jarioksa mentioned this issue May 15, 2018

vegan 2.5-2 release #280

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error in tmp[, seq_len(p) + p + 1] : subscript out of bounds #276

Error in tmp[, seq_len(p) + p + 1] : subscript out of bounds #276

kdyson commented May 2, 2018

gavinsimpson commented May 2, 2018

kdyson commented May 2, 2018

kdyson commented May 2, 2018

jarioksa commented May 2, 2018 •

edited

Loading

jarioksa commented May 2, 2018

kdyson commented May 2, 2018 •

edited

Loading

psolymos commented May 2, 2018

jarioksa commented May 3, 2018

psolymos commented May 3, 2018 •

edited

Loading

jarioksa commented May 4, 2018

jarioksa commented May 5, 2018

Error in tmp[, seq_len(p) + p + 1] : subscript out of bounds #276

Error in tmp[, seq_len(p) + p + 1] : subscript out of bounds #276

Comments

kdyson commented May 2, 2018

gavinsimpson commented May 2, 2018

kdyson commented May 2, 2018

kdyson commented May 2, 2018

jarioksa commented May 2, 2018 • edited Loading

jarioksa commented May 2, 2018

kdyson commented May 2, 2018 • edited Loading

psolymos commented May 2, 2018

jarioksa commented May 3, 2018

psolymos commented May 3, 2018 • edited Loading

jarioksa commented May 4, 2018

jarioksa commented May 5, 2018

jarioksa commented May 2, 2018 •

edited

Loading

kdyson commented May 2, 2018 •

edited

Loading

psolymos commented May 3, 2018 •

edited

Loading