Closed
Labels: feature (a feature request or enhancement)
Description
Hi,
I propose adding a function for row-wise permutation-based resampling, and I'm happy to discuss whether it fits into the vision of `rsample`. Basically, I want to consider resurrecting #13. `recipes::step_shuffle()` is helpful but not quite sufficient for large-scale permutation resampling. `infer::generate()` is closer to what is desired, but it doesn't quite follow the `rsample` paradigm. I should add that, if there is interest, I would be happy to contribute this feature via a PR.
Motivation
- I think this type of resampling function fits naturally into the `rsample` ecosystem.
- I am a proponent of using permutation-based resampling to generate null distributions for statistics (e.g., to compute p-values), particularly when the null distribution is unknown or when the distributional assumptions of parametric tests may be violated.
- Counterpoint: The other resampling methods in `rsample` generate samples that look like new draws from the same underlying data-generating mechanism (or the alternative distribution). Permutation resampling, in contrast, generates samples under the null. Despite this difference, I still think it could fit here.
- Issue: Permutation resamples don't have defined splits, so what would `training()`/`testing()` or `analysis()`/`assessment()` return? This may be a sticking point.
Considerations
- The function could be named `permutations()` to have the same feel as `bootstraps()`, but I'm open to suggestions.
- It could work in one of two ways:
  - Return a fixed number of permutations, akin to `bootstraps()`.
  - Return all possible permutations, as is common for exact permutation tests. Because n rows have n! orderings, this could be an enormous number of permutations for large data or multiple columns. In most cases one would only need to permute the response variable, which simplifies things a great deal.
- I think the function should also include a `strata` argument to perform stratified permutation, and it should probably accept all the same arguments as `bootstraps()`.
- The function should maybe also take a `cols` argument specifying which columns are permuted, while all other columns remain in their natural order. `cols` could accept `tidyselect` functions to select the columns to permute.
- We could add some helper functions to:
  - Summarize permutation-based statistics with the mean, SE, and confidence intervals (similar to `int_pctl()`).
  - Plot the permutation distribution of a chosen statistic and show where the apparent value falls in relation to it.
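The proposed summary helper could look roughly like the base-R sketch below. `summarize_permutation_stat()` is a hypothetical name and not an existing `rsample` function. It assumes the statistic is centred at zero under the null (e.g., a correlation or a difference in means), so it reports a two-sided p-value based on absolute values. The "interval" here is a percentile interval of the null distribution, in the spirit of `int_pctl()`.

```r
# Hypothetical helper (not part of rsample): summarize statistics computed on
# permuted data. `perm_stats` holds one statistic per permutation; `apparent`
# is the statistic computed on the original, unpermuted data.
summarize_permutation_stat <- function(perm_stats, apparent, alpha = 0.05) {
  stopifnot(is.numeric(perm_stats), length(perm_stats) > 1, length(apparent) == 1)
  B <- length(perm_stats)
  # Percentile interval of the permutation (null) distribution
  bounds <- unname(stats::quantile(perm_stats, probs = c(alpha / 2, 1 - alpha / 2)))
  # Two-sided p-value, assuming the null is centred at zero. The +1 terms count
  # the observed data as one of the permutations, so p is never exactly 0.
  p_value <- (sum(abs(perm_stats) >= abs(apparent)) + 1) / (B + 1)
  data.frame(
    mean = mean(perm_stats),
    std_err = stats::sd(perm_stats),
    lower = bounds[1],
    upper = bounds[2],
    apparent = apparent,
    p_value = p_value
  )
}

# Usage: permutation null distribution of a correlation, obtained by shuffling y
set.seed(1)
x <- rnorm(30)
y <- x + rnorm(30)
perm_stats <- replicate(999, cor(x, sample(y)))
summarize_permutation_stat(perm_stats, apparent = cor(x, y))
```

A plotting helper could then draw a histogram of `perm_stats` with a vertical line at the apparent value.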
Proposed API
For bootstrap-like functionality:
`permutations(data, times = 25, strata = NULL, cols = everything(), breaks = 4, apparent = FALSE, ...)`
For permutation-test-like functionality:
`permutations(data, strata = NULL, cols = everything(), breaks = 4, apparent = FALSE, ...)`
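The bootstrap-like mode could be sketched in base R as follows. This is an illustration under stated assumptions and not a proposed implementation. `permutations_sketch()` is a hypothetical name. It takes `cols` as a character vector rather than `tidyselect` expressions, and it returns a named list of data frames instead of an `rset`. It only supports a discrete `strata` column, whereas `rsample` uses `breaks` to bin numeric strata. The same row shuffle is applied to all selected columns, so they keep their joint structure while their link to the remaining columns is broken.

```r
# Hypothetical sketch (not rsample's API): `times` permuted copies of `data`.
# Columns in `cols` are shuffled jointly, optionally within levels of `strata`.
# All other columns keep their original row order.
permutations_sketch <- function(data, times = 25, cols = names(data),
                                strata = NULL, apparent = FALSE) {
  stopifnot(is.data.frame(data), all(cols %in% names(data)), times >= 1)
  n <- nrow(data)
  # Groups of row indices within which shuffling happens (one group if unstratified)
  groups <- if (is.null(strata)) list(seq_len(n)) else split(seq_len(n), data[[strata]])
  permute_once <- function() {
    idx <- seq_len(n)
    for (g in groups) {
      # g[sample.int(length(g))] avoids sample()'s special case for length-1 input
      idx[g] <- g[sample.int(length(g))]
    }
    out <- data
    # One shuffled row order for all selected columns
    out[cols] <- data[idx, cols, drop = FALSE]
    out
  }
  resamples <- lapply(seq_len(times), function(i) permute_once())
  names(resamples) <- sprintf("Permutations%02d", seq_len(times))
  # Optionally append the unpermuted data, mirroring `apparent` in bootstraps()
  if (apparent) resamples$Apparent <- data
  resamples
}

# Usage: permute the response `y` within each level of `grp`
d <- data.frame(grp = rep(c("a", "b"), each = 4), x = 1:8, y = 101:108)
set.seed(2024)
perms <- permutations_sketch(d, times = 5, cols = "y", strata = "grp", apparent = TRUE)
names(perms)
```

The sketch deliberately omits the "all possible permutations" mode. Unstratified, n rows have n! orderings (already 3,628,800 for n = 10). With strata, the count is the product of the factorials of the stratum sizes. Exhaustive enumeration is therefore only practical for very small data.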