Commit
More Efficient `group_by(...) %>% sample_*(...)`
Improves `df %>% group_by(...) %>% sample_*(...)` performance by 10-100x for datasets with a large number of groups. The motivation: stratified sampling with `group_by %>% sample_n` on 100k+ strata can take minutes or longer. A toy example shows that every additional 1k groups increases runtime by ~2s. Quick profiling shows that most of the time is spent in `eval_tidy(weight, ...)`, which is called once per group. This PR computes the weights with `mutate` instead, which preserves the semantics while eliminating the repeated overscope lookup across groups.
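The idea can be illustrated at the user level with a minimal sketch. This is not the code changed by the commit (the diff is not shown here); it only mirrors the described approach of evaluating the weight expression once through a grouped `mutate` and then sampling per group from the precomputed column. The data frame `df`, the columns `g` and `x`, and the helper column `.w` are hypothetical names chosen for illustration.

```r
library(dplyr)

set.seed(42)

# Hypothetical toy data: 1000 groups (strata) with 5 rows each.
df <- tibble(
  g = rep(seq_len(1000), each = 5),
  x = runif(5000)
)

# The usual user-facing call would be:
#   df %>% group_by(g) %>% sample_n(1, weight = x)

# Step 1: evaluate the weight expression once for the whole grouped frame.
# A grouped mutate() keeps per-group semantics (sum(x) is computed per group).
weighted <- df %>%
  group_by(g) %>%
  mutate(.w = x / sum(x)) %>%
  ungroup()

# Step 2: draw one row index per group, reading the precomputed weights
# instead of re-evaluating an expression inside every group.
row_ids <- split(seq_len(nrow(weighted)), weighted$g)
picked <- vapply(
  row_ids,
  function(idx) idx[sample.int(length(idx), size = 1, prob = weighted$.w[idx])],
  integer(1)
)

sampled <- weighted[unname(picked), c("g", "x")]
```

The result contains exactly one row per group, with the per-group work reduced to plain index sampling.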
Forest Fang committed Nov 13, 2017
1 parent a1cbc89 commit 9fc9696
Showing 2 changed files with 5 additions and 5 deletions.