adding parallel permutations for envfit? #348
Open
nick-youngblut opened this issue Mar 11, 2020 · 7 comments

@nick-youngblut
Are there any plans to add parallel running of the permutations for envfit()? With a large env matrix (e.g., 3500 x 3000), 999 permutations can take hours, if not days.

@gavinsimpson
Contributor

I can take a look at this and model the parallel permutations on how we do it in permutest.cca and anova.cca()-related functions.

@gavinsimpson
Contributor

What would you want to parallelize over here? We could do the permutations in parallel with some changes to envfit.default, but each permutation would still work over all 3000 environmental variables (in your example). I'm assuming that's what you're interested in?

@nick-youngblut
Author

I ended up just writing a wrapper script that runs envfit() on each individual variable in parallel. It would be nice if one did not have to write such a wrapper, but writing that extra bit of code is not very hard.
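
For readers looking for something similar, a minimal sketch of such a per-variable wrapper (not Nick's actual script) might look like the following; `ord` (an existing ordination, e.g. from metaMDS) and `env` (a data frame of continuous environmental variables) are assumed here:

```r
library(vegan)
library(parallel)

## fit each environmental variable separately, in parallel over cores
## (mclapply forks, so this runs serially on Windows; use parLapply there)
fits <- mclapply(seq_len(ncol(env)), function(i) {
  envfit(ord, env[, i, drop = FALSE], permutations = 999)
}, mc.cores = detectCores() - 1)
names(fits) <- colnames(env)

## collect r2 and permutation p-values (continuous variables live under $vectors)
res <- t(sapply(fits, function(f) c(r2 = f$vectors$r, p = f$vectors$pvals)))
```

Each worker gets one column of `env`, so the per-call matrix work is small; the trade-off is that every call repeats the full set of 999 permutations for its single variable.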

@heronoh

heronoh commented Apr 28, 2023

Hi @gavinsimpson! Thanks for this wonderful package!
Do you have any update on envfit parallelization?

If not, @nick-youngblut, could you share your wrapper?

I have only a few environmental variables but lots of species, and it would be great if I could use my server's full potential in this time-consuming step.

Thanks a lot!

@gavinsimpson
Contributor

@heronoh No updates; I guess this stalled when I realised that, the way envfit is implemented, it wouldn't have led to much improvement (I assumed), because the parallelisation I had in mind would simply chunk the permutations over the available cores/cluster. The internal code would still be doing matrix operations on the thousands of variables in Nick's example/use-case.

I'll see about implementing what I had in mind so it works like anova.cca: chunk the set of permutations and distribute those chunks over cores.

@jarioksa
Contributor

@gavinsimpson it looks like a very simple thing (caveat!) to implement parallelization as you outlined. I only had a look at vectorfit, but there you have one sapply that goes through the permutations, and all you need to do is use a parallel *apply for that line. However, have a look at the current permutest.cca to avoid pitfalls that can actually slow down calculations. The key point is to use splitIndices to send a chunk of permutations in one call. See commit 882f48b for the case in permutest.cca.

I haven't analysed the code, but another option is to split the variables: this could have a smaller memory footprint than splitting the permutations while keeping all variables.
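
As a rough illustration of the splitIndices/permutation-chunking idea from the first paragraph (done outside envfit as a user-level sketch, not vegan's internal code), one could generate the whole permutation set up front, split its rows into one chunk per worker, and let each worker run envfit() with only its chunk. `ord` and `env` are assumed as in the earlier sketch, with `env` holding several continuous variables:

```r
library(vegan)
library(permute)
library(parallel)

nperm  <- 999
perms  <- shuffleSet(nrow(env), nperm)          # full permutation matrix
cl     <- makeCluster(4)
chunks <- splitIndices(nrow(perms), length(cl)) # one block of rows per worker

## each worker runs envfit() with only its chunk of the permutation matrix
fits <- parLapply(cl, chunks, function(rows, ord, env, perms) {
  vegan::envfit(ord, env, permutations = perms[rows, , drop = FALSE])
}, ord = ord, env = env, perms = perms)
stopCluster(cl)

## observed statistics are identical across chunks; recover per-chunk exceedance
## counts from the chunk p-values (p = (hits + 1) / (n + 1)) and pool them
r2   <- fits[[1]]$vectors$r
hits <- mapply(function(f, rows) f$vectors$pvals * (length(rows) + 1) - 1,
               fits, chunks)
pval <- (rowSums(hits) + 1) / (nperm + 1)
```

Doing this inside vectorfit, as suggested above, would avoid the pooling step because the permuted statistics themselves would be gathered before the p-values are computed.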

@gavinsimpson
Contributor

@jarioksa Yeah; after initially misunderstanding how mclapply works (as I was following permutest.cca too closely; that code actually ends up iterating over perms in C, AFAICT), I have vectorfit() working with both kinds of parallel processing. I'll look at factorfit() shortly.
