some basics on doing permutations #4

mesamuels · 2023-06-18T21:35:42Z

Hi, I have just started to use your permutation test package. I am not a programmer, so it took a bit of time to figure out how to download and install on my system, where it is not standardly present. I am using the Digital Research Alliance of Canada system, used to be called Compute Canada. I have a couple of questions:

First, I was able to install the wheel version, but for some reason not the tar.gz version, my system had Path troubles it seems. Anyway, I can import the program in Python and run the test. Question 1, where is the package actually? I do not see any obvious new directories or files in my home directory, I wouldn't think it could be installed more generally. Does the installation update some path variables in my home account so python knows where to look for permutations_stats.permutations ?
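For readers with the same question, here is a minimal sketch of how to ask Python where it imports a package from. In general, pip does not edit path variables: it copies files into a `site-packages` directory that Python already searches, and for a per-user or virtual-environment install that directory often sits under a hidden folder, which would explain not seeing anything new in the home directory. The helper name `locate_package` is made up for this illustration, and the import name `permutations_stats` is taken from the question above.

```python
import importlib.util


def locate_package(name):
    """Return the file Python would import `name` from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Import name taken from the question; prints None if the package is not
# installed in the environment this script runs in.
print(locate_package("permutations_stats"))

# A standard-library package, shown only for comparison.
print(locate_package("json"))
```

Running `python -m pip show permutations_stats` from the shell should report the same location, assuming that is also the name the distribution was installed under.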

More importantly, is there some general documentation on setting some parameters in the system, like the number of permutations, whether to permute both X and Y (in my case I am doing a Brunner Munzel test with cases and controls, so just two data variables)? I was able to run the test with default parameters when X and Y both had around a dozen values, but when I tried to use my real data, which has over 100 values for one of the two, the system "killed" the program so presumably I exceeded memory allocation. I can request more memory, but need to have some rough idea of how much memory is required for a certain large number of data points. Is it possible to permute only one of the two data variables? Is it meaningful to do that?

Some of this info can be found on the web for other permutation test packages (like for R), but your implementation may be different so it's best to ask at the source.

You don't seem to have a literature citation posted here, is there one now that I can cite when I write this up? Otherwise do I just mention it as software used in my Methods section with a link to here?

FYI, this is for a human genetics project. We sequenced a bunch of cases and controls to look for mutations causing a rare endocrine phenotype, focused on very rare genomic variants. As a result we are doing a kind of gene burden test, and want to compare the burden of variants in the cases vs controls. For particular reasons, a Fisher or chi-square 2x2 test is not appropriate here, rather we are using Mann-Whitney or Brunner-Munzel, essentially to compare the average per sample mutation burden by ranking the samples. Because the number of cases is small (12), I wanted to run a permutation version just in case the regular tests (BM or MW) are too forgiving. BM is probably better generally as we don't know whether the variances are the same in the cases and controls, although they likely are.

Thanks and cheers from Montreal!
Mark Samuels
Associate Professor in Medicine
University of Montreal

mesamuels · 2023-06-18T22:46:13Z

Aha, I figured out from your documentation (sorry, it was a bit tricky to find) how to set the number of permutations in "simulation" mode; I guess exact mode doesn't have limits. There was no problem running with my set of 12 cases and 300 controls doing 1 million permutations, with something like 64 gigabytes of memory. There is not a huge difference between the basic and permuted Brunner-Munzel p-values, so I guess it is quite a good test. Still, it's great to have the validation. I'm assuming a million permutations is enough. BTW, I don't know how to use "numba". I imported numpy as recommended, but we don't seem to have numba on our system unless it is hiding somewhere. Maybe with that I could run an exact test, but it's not really necessary.

Thanks,
Mark

trevismd · 2024-10-28T15:37:51Z

Thank you for reaching out and for posting your reply, @mesamuels.

Please don't hesitate to ask if anything else comes up that I could help you with; I'll try to answer faster this time.

To help the next reader, if not you, since I am so late: while numba or more resources could help run the simulations faster, I don't think either would make it feasible to compute all permutations. There are just too many ways to select 12 out of 312 records.
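To put a number on that, here is a small sketch that counts the distinct ways to pick 12 cases out of 312 records. The throughput figure is a made-up assumption, used only to give a sense of scale.

```python
import math

n_cases, n_controls = 12, 300

# Number of distinct ways to choose which 12 of the 312 records are "cases".
n_splits = math.comb(n_cases + n_controls, n_cases)
print(f"{n_splits:.3e} distinct case/control splits")

# Hypothetical throughput, purely for scale: one billion statistics per second.
rate_per_second = 1e9
years = n_splits / rate_per_second / (3600 * 24 * 365)
print(f"about {years:,.0f} years at {rate_per_second:.0e} evaluations per second")
```

The count is on the order of 10^21, so an exact test over every split is out of reach regardless of memory, and simulation mode is the practical option for these group sizes.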

Disclaimer: I am a trained physician and data science enthusiast converted to full-time software engineering, not a statistician. I recommend checking with a professional how the following comment applies to your specific case. To evaluate whether 1 million permutations are enough, you could run the simulation multiple times and assess the variability of the p-values you get. If they stay close together, you can be reasonably confident that the true value is close too. If they don't, you can repeat the exercise with more permutations per simulation.
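The repeat-and-compare check described above could be sketched as follows. This is not the package's API: it is a hand-rolled Monte Carlo permutation test written with numpy only, the function name `mc_permutation_pvalue` is made up, the data are simulated, and the statistic is a plain difference in means rather than Brunner-Munzel, just to keep the example short.

```python
import numpy as np


def mc_permutation_pvalue(x, y, n_permutations, rng):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    pooled = np.concatenate([x, y])
    n_x = len(x)
    observed = abs(x.mean() - y.mean())
    hits = 0
    for _ in range(n_permutations):
        # Shuffle the pooled sample and split it back into two groups.
        shuffled = rng.permutation(pooled)
        stat = abs(shuffled[:n_x].mean() - shuffled[n_x:].mean())
        if stat >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Simulated data with the group sizes from this thread (12 cases, 300 controls).
data_rng = np.random.default_rng(0)
cases = data_rng.poisson(3.0, size=12).astype(float)
controls = data_rng.poisson(2.0, size=300).astype(float)

# Repeat the same simulation with different seeds and look at the spread.
p_values = np.array([
    mc_permutation_pvalue(cases, controls, 2000, np.random.default_rng(seed))
    for seed in range(1, 6)
])
spread = p_values.max() - p_values.min()
print("p-values:", p_values)
print("spread:", spread)
```

As a rough guide, a Monte Carlo p-value estimated from B permutations has a standard error of about sqrt(p(1-p)/B). With one million permutations and a p-value near 0.05, the run-to-run noise should therefore be around 0.0002, so a much larger spread than that would be a reason to look more closely.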
