adding parallel permutations for envfit? #348
Open
nick-youngblut opened this issue Mar 11, 2020 · 7 comments

@nick-youngblut
Are there any plans to add parallel running of the permutations for envfit()? With a large env matrix (e.g., 3500 x 3000), 999 permutations can take hours, if not days.

@gavinsimpson
Contributor

I can take a look at this and model the parallel permutations on how we do it in permutest.cca and anova.cca()-related functions.

@gavinsimpson
Contributor

What would you want to parallelize over here? We could do the permutations in parallel with some changes to envfit.default, but each permutation would still work over all 3000 environmental variables (in your example). I'm assuming that's what you're interested in?

@nick-youngblut
Author

I ended up just writing a wrapper script that runs envfit() on each individual variable in parallel. It would be nice if one did not have to write such a wrapper, but writing that extra bit of code is not very hard.
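
For readers looking for something similar, a minimal sketch of such a per-variable wrapper (not Nick's actual script) might look like the following; `ord` (an existing ordination, e.g. from metaMDS) and `env` (a data frame of continuous environmental variables) are assumed here:

```r
library(vegan)
library(parallel)

## fit each environmental variable separately, in parallel over cores
## (mclapply forks, so this runs serially on Windows; use parLapply there)
fits <- mclapply(seq_len(ncol(env)), function(i) {
  envfit(ord, env[, i, drop = FALSE], permutations = 999)
}, mc.cores = detectCores() - 1)
names(fits) <- colnames(env)

## collect r2 and permutation p-values (continuous variables live under $vectors)
res <- t(sapply(fits, function(f) c(r2 = f$vectors$r, p = f$vectors$pvals)))
```

Each worker gets one column of `env`, so the per-call matrix work is small; the trade-off is that every call repeats the full set of 999 permutations for its single variable.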

@heronoh

heronoh commented Apr 28, 2023

Hi @gavinsimpson! Thanks for this wonderful package!
Do you have any update on envfit parallelization?

If not, @nick-youngblut, could you share your wrapper?

I have only a few environmental variables but lots of species, and it would be great if I could use my server's full potential in this time-consuming step.

Thanks a lot!

@gavinsimpson
Contributor

@heronoh No updates; I guess this stalled when I realised that, the way envfit is implemented, it wouldn't have led to much improvement (I assumed), because the parallelisation I had in mind would simply chunk the permutations over the available cores/cluster. The internal code would still be doing matrix operations on the thousands of variables in Nick's example/use-case.

I'll see about implementing what I had in mind so it works like anova.cca: chunk the set of permutations and distribute those chunks over cores.

@jarioksa
Contributor

@gavinsimpson it looks like a very simple thing (caveat!) to implement parallelization as you outlined. I only had a look at vectorfit, but there you have one sapply that goes through the permutations, and all you need to do is use a parallel *apply for that line. However, have a look at the current permutest.cca to avoid pitfalls that can actually slow down calculations. The key point is to use splitIndices to send a chunk of permutations in one call. See commit 882f48b for the case in permutest.cca.

I haven't analysed the code, but another option is to split the variables: this could have a smaller memory footprint than splitting the permutations while keeping all variables.
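
As a rough illustration of the splitIndices/permutation-chunking idea from the first paragraph (done outside envfit as a user-level sketch, not vegan's internal code), one could generate the whole permutation set up front, split its rows into one chunk per worker, and let each worker run envfit() with only its chunk. `ord` and `env` are assumed as in the earlier sketch, with `env` holding several continuous variables:

```r
library(vegan)
library(permute)
library(parallel)

nperm  <- 999
perms  <- shuffleSet(nrow(env), nperm)          # full permutation matrix
cl     <- makeCluster(4)
chunks <- splitIndices(nrow(perms), length(cl)) # one block of rows per worker

## each worker runs envfit() with only its chunk of the permutation matrix
fits <- parLapply(cl, chunks, function(rows, ord, env, perms) {
  vegan::envfit(ord, env, permutations = perms[rows, , drop = FALSE])
}, ord = ord, env = env, perms = perms)
stopCluster(cl)

## observed statistics are identical across chunks; recover per-chunk exceedance
## counts from the chunk p-values (p = (hits + 1) / (n + 1)) and pool them
r2   <- fits[[1]]$vectors$r
hits <- mapply(function(f, rows) f$vectors$pvals * (length(rows) + 1) - 1,
               fits, chunks)
pval <- (rowSums(hits) + 1) / (nperm + 1)
```

Doing this inside vectorfit, as suggested above, would avoid the pooling step because the permuted statistics themselves would be gathered before the p-values are computed.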

@gavinsimpson
Contributor

@jarioksa Yeah; after initially misunderstanding how mclapply works (as I was following permutest.cca too closely; that code actually ends up iterating over perms in C, AFAICT), I have vectorfit() working with both kinds of parallel processing. I'll look at factorfit() shortly.
