Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in tmp[, seq_len(p) + p + 1] : subscript out of bounds #276

Closed
kdyson opened this issue May 2, 2018 · 11 comments
Closed

Error in tmp[, seq_len(p) + p + 1] : subscript out of bounds #276

kdyson opened this issue May 2, 2018 · 11 comments
Labels

Comments

@kdyson
Copy link

kdyson commented May 2, 2018

A recent update (between 2.4.1 I think and current) has created an error in permutest.cca under specific conditions.

Minimum reproducible example:

troubleshoot <- as.data.frame(matrix(nrow = 40, ncol = 70,
                                               data = rnbinom(n = 40*70, size = 4, prob = .5)))
colnames(troubleshoot)[1:2] <- c("gc.simple", "acres")
species.site <- as.data.frame(matrix(nrow = 40, ncol = 10800,
                                               data = rnbinom(n = 40*10800, size = 4, prob = .5)))

adonis2(
              formula = sqrt(species.site) ~  gc.simple + acres,
              data = troubleshoot,
              parallel = 2,
              permutations = perm,
              method = "bray",
              by = "terms"
)

Changing by to "margin" or explicitly calling ~ troubleshoot$gc.simple etc. seems to fix the problem. Smaller tables and one variable models also work fine.

@gavinsimpson
Copy link
Contributor

Thanks @kdyson. What's perm? Without it we can't run your minimal example (without guessing).

@kdyson
Copy link
Author

kdyson commented May 2, 2018

Oh shoot, yeah--that's just my var for how many permutations I want things to do. It's set at 99999 for the example.

@kdyson
Copy link
Author

kdyson commented May 2, 2018

Also--this may end up being a parallel error? removing that argument also seems to fix things.

@jarioksa
Copy link
Contributor

jarioksa commented May 2, 2018

No error with me, but copy-pasting the example (and with parallel=2 and perm=99999) gives

Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 99999

adonis2(formula = sqrt(species.site) ~ gc.simple + acres, data = troubleshoot, permutations = perm, method = "bray", by = "terms", parallel = 2)
          Df SumOfSqs      R2      F Pr(>F)
gc.simple  1  0.02657 0.02485 0.9688 0.9729
acres      1  0.02773 0.02594 1.0110 0.2491
Residual  37  1.01469 0.94921              
Total     39  1.06898 1.00000    

This was with vegan 2.5-1 using R 3.4.4 on macOS.

Obviously we need more detail. However, the error line reported in the header of this message is line 170 of R/permutest.cca.R.

@jarioksa
Copy link
Contributor

jarioksa commented May 2, 2018

@kdyson : are you on Windows? We use different parallel processing there than in Unix-like systems (Linux, macOS), and I have not tested that extensively since I have no Windows. It could be a bug in the code that we use when we fork to Windows-compatible type of parallel processing (which we can replicate in Unix-like systems if we set up the clusters before processing). However, it is past midnight over here, and I'll wait till sunrise before having a look at that.

@kdyson
Copy link
Author

kdyson commented May 2, 2018

Yep, I am on Windows, v. 10.0.16299.371, vegan 2.5-1, parallel 3.4.4, R 3.4.4.

Thanks so much for looking into this, I appreciate it! Easy enough to turn off parallel for now but with next-gen sequencing data sets it's nice to have :)

@psolymos
Copy link
Contributor

psolymos commented May 2, 2018

I could reproduce it on Mac as well by explicitly defining the cluster (which happens inside permutest.cca on Windows):

cl <- makeCluster(2)
adonis2(
              formula = sqrt(species.site) ~  gc.simple + acres,
              data = troubleshoot,
              parallel = cl,
              permutations = perm,
              method = "bray",
              by = "terms"
)
# Error in tmp[, seq_len(p) + p + 1] : subscript out of bounds
stopCluster(cl)

I suspect the dimensions of tmp are wrong and permutest.cca fails at F.perm <- tmp[, seq_len(p) + p + 1].

@jarioksa
Copy link
Contributor

jarioksa commented May 3, 2018

@psolymos that was my suspicion, too. Should happen in the parallel processing block well before crash line 170. Go to check this (and if this is a problem, we probably need to check the parallel code blocks in other functions, too).

@psolymos
Copy link
Contributor

psolymos commented May 3, 2018

@jarioksa : in debug, dim(tmp) is 1665 x 3, but seq_len(p) + p + 1 is 4:5 for the MRE. getF(i) returns a vector of length 5, but the next line casts it into a matrix with 3 columns: tmp <- matrix(tmp, ncol = 3, byrow = TRUE). It needs to be the right dimension, maybe using parLapply and do.call as in the forking case would be safer, not needing to guess the dimension.

It is not a cluster vs. forking issue, but rather wrong dimension assumed in the the cluster case only.

@psolymos psolymos added the bug label May 3, 2018
jarioksa pushed a commit that referenced this issue May 3, 2018
With by="terms" the permutest.cca evaluates all contrasts within
compiled code and the number of columns in return matrix depends
on the number of contrasts. Socket clusters in parallel processing
where not adapted to this, but assumed only tests of one term.
Socket clusters are used as the only alternative in Windows, and
optional with pre-set clusters in unix-like systems.

Addresses github issue #276 where this was reported for adonis2.
Obviously also influences by="onedf" in permutest, although this
was not tested.
jarioksa pushed a commit that referenced this issue May 4, 2018
This may be a better fix to issue #276 than reverted ac12f62.
The results are the same, but this may be slightly faster and avoids
transpose.
@jarioksa
Copy link
Contributor

jarioksa commented May 4, 2018

I think this really is a bug in handling socket clusters in parallel processing. Socket clusters are the only available alternative in Windows, and they can be invoked in Linux and macOS as predefined clusters generated by parallel::makeCluster(). I think I have now fixed this in branch issue-#276. There are actually two alternative fixes with identical results, but I think I'll stick to the latter one with smaller change in the code. If possible, test the code. If nothing odd crops out, I'll merge it within 24 hours.

@jarioksa
Copy link
Contributor

jarioksa commented May 5, 2018

Probably fixed with 8003a4f, but needs testing also in Windows.

@jarioksa jarioksa closed this as completed May 5, 2018
jarioksa pushed a commit that referenced this issue May 7, 2018
This may be a better fix to issue #276 than reverted ac12f62.
The results are the same, but this may be slightly faster and avoids
transpose.

(cherry picked from commit 8003a4f)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants