Need to be able to sample groups #361

hadley · 2014-03-28T13:25:49Z

As well as individuals within groups

hadley · 2014-03-28T13:29:56Z

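```r
library(dplyr)

# Draw 5 species with replacement, weighted by total Sepal.Length
species <- iris %>%
  group_by(Species) %>%
  summarise(wt = sum(Sepal.Length)) %>%
  sample_n(5, replace = TRUE, weight = wt) %>%
  select(-wt)

# Expand each drawn species back to all of its rows
inner_join(species, iris, by = "Species")
```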
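The same weighted idea can also be sketched without a join: split the grouped data into one tibble per group, draw group indices, and stack the chosen pieces. This is a minimal sketch assuming a current dplyr (`group_split()`, `bind_rows(.id =)`); the variable names are illustrative, and `.id` has the side benefit of labelling each drawn copy.

```r
library(dplyr)

by_species <- iris %>% group_by(Species)

# One tibble per group, in the same (sorted) order as group_keys(by_species)
pieces <- group_split(by_species)

# Per-group weight, as `wt` in the snippet above; summarise() returns
# one row per group in that same sorted order
wt <- by_species %>% summarise(wt = sum(Sepal.Length)) %>% pull(wt)

# Draw 5 group indices with replacement, weighted by wt
picked <- sample(length(pieces), 5, replace = TRUE, prob = wt)

# Stack the chosen groups; the "draw" column labels each copy 1..5
species_sample <- bind_rows(pieces[picked], .id = "draw")
```

Because every row of a drawn group comes along, a species drawn twice appears as two separately labelled blocks of 50 rows.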

rcorty · 2015-04-21T04:12:49Z

I wonder why this was closed? It seems like a potentially useful feature, e.g. for

```r
iris %>%
  group_by(Species) %>%
  sample_n(1)
```

to pick all the data from one random species (as it stands, this returns one random row from each species).
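To make the distinction concrete: on grouped data, `sample_n()` samples rows within each group, whereas sampling a group means keeping every row of the chosen group. A minimal sketch of both, assuming dplyr >= 1.0 for `cur_group_id()`; the variable names are illustrative.

```r
library(dplyr)

set.seed(1)  # only to make the draw reproducible
by_species <- iris %>% group_by(Species)

# Row-level sampling on grouped data: one row within each species (3 rows)
one_row_each <- by_species %>% sample_n(1)

# Group-level sampling: draw one group number, then keep every row of it (50 rows)
chosen <- sample(n_groups(by_species), 1)
one_species <- by_species %>% filter(cur_group_id() == chosen)
```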

MarcusWalz · 2016-06-29T15:20:27Z

I don't think that sample_n's behavior should change for groups because sampling within groups is its intuitive behavior. However it's often handy to be able to sample groups as a whole. This should be a second function. Here is my implementation:

```r
sample_n_groups = function(tbl, size, replace = FALSE, weight = NULL) {
  # grouping variables, used to regroup when done
  grps = tbl %>% groups %>% unlist %>% as.character
  # one row per group (the distinct group keys), then sample among them
  keep = tbl %>% summarise() %>% sample_n(size, replace, weight)
  # keep only the selected groups; regroup because joins change the grouping
  # regrouping may be unnecessary, but joins do something funky to the grouping variables
  tbl %>% semi_join(keep) %>% group_by_(grps)
}
```

The example by @rcorty works as expected:

```r
iris %>% group_by(Species) %>% sample_n_groups(1)
```
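For comparison, a sketch of the same whole-group sampling without replacement on a current dplyr, assuming version 1.0 or later for `group_keys()`, `group_vars()` and `slice_sample()`. The name `sample_groups` is hypothetical, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: samples `size` whole groups *without* replacement
sample_groups <- function(tbl, size) {
  keys <- group_keys(tbl) %>% slice_sample(n = size)  # `size` distinct groups
  semi_join(tbl, keys, by = group_vars(tbl))          # every row of those groups
}

iris %>% group_by(Species) %>% sample_groups(1)
```

A filtering join is enough here because each group is drawn at most once.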

kendonB · 2016-07-04T16:29:25Z

+1

drhagen · 2016-08-30T19:25:18Z

Edit: A change to dplyr broke this solution; scroll down for an updated version.

For those of you who arrived here via a search engine looking for this functionality: the implementation by @MarcusWalz does not keep duplicate groups when replace = TRUE, because semi_join returns each matching row of tbl only once, so a group drawn twice appears only once. The implementation needs to use right_join (or left_join or inner_join) to keep the duplicates:

```r
sample_n_groups = function(tbl, size, replace = FALSE, weight = NULL) {
  # grouping variables, used to regroup when done
  grps = tbl %>% groups %>% unlist %>% as.character
  # one row per group (the distinct group keys), then sample among them
  keep = tbl %>% summarise() %>% sample_n(size, replace, weight)
  # keep the selected groups (duplicates included); regroup because joins change the grouping
  # regrouping may be unnecessary, but joins do something funky to the grouping variables
  tbl %>% right_join(keep, by = grps) %>% group_by_(grps)
}
```
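A small illustration of the difference between the two joins, using toy data (`tbl` and `keep` are illustrative names). Since dplyr 1.1.0 the `right_join()` call may warn about a many-to-many relationship; that is expected here, since both sides repeat the key.

```r
library(dplyr)

tbl  <- tibble(g = c("a", "a", "b"), x = 1:3)
keep <- tibble(g = c("a", "a"))  # group "a" drawn twice, as replace = TRUE allows

kept_once  <- semi_join(tbl, keep, by = "g")   # 2 rows: each matching row of tbl once
kept_twice <- right_join(tbl, keep, by = "g")  # 4 rows: group "a" repeated per draw
```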

kendonB · 2016-12-13T00:10:53Z

Cluster bootstrapping is a common use case for this feature.
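Cluster bootstrapping resamples whole clusters with replacement and recomputes a statistic on each resample. A minimal sketch, assuming dplyr >= 1.0 and using the built-in `ChickWeight` data with each `Chick` as a cluster (an illustrative choice, not from this thread); `boot_mean` is a hypothetical helper.

```r
library(dplyr)

set.seed(2016)  # reproducible draws only
# Treat each chick in ChickWeight as one cluster: one tibble per chick
chicks <- as.data.frame(ChickWeight) %>% group_by(Chick) %>% group_split()

boot_mean <- function() {
  drawn <- sample(length(chicks), replace = TRUE)  # resample whole clusters
  mean(bind_rows(chicks[drawn])$weight)            # statistic on the resampled data
}
boot <- replicate(200, boot_mean())
quantile(boot, c(0.025, 0.975))  # percentile interval for the mean weight
```

Resampling chicks rather than rows keeps each chick's repeated measurements together, which is the point of a cluster bootstrap.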

kendonB · 2016-12-13T00:40:50Z

@drhagen, in your implementation, do you have any suggestions for how to generate a new unique id for each sampled group, so that a group drawn more than once can be told apart?

kendonB · 2016-12-13T00:52:07Z

Actually, this is quite easy:

```r
sample_n_groups = function(tbl, size, replace = FALSE, weight = NULL) {
  # grouping variables, used to regroup when done
  grps = tbl %>% groups %>% unlist %>% as.character
  # sample among the distinct group keys and number each draw
  keep = tbl %>% summarise() %>% sample_n(size, replace, weight) %>%
    mutate(unique_id = 1:NROW(.))
  # keep the selected groups (duplicates included); regroup because joins change the grouping
  # regrouping may be unnecessary, but joins do something funky to the grouping variables
  tbl %>% right_join(keep, by = grps) %>% group_by_(grps)
}
```

kendonB · 2017-03-06T05:19:03Z

The answer above by @drhagen looks like it's out of date. This seems to work now:

```r
sample_n_groups = function(tbl, size, replace = FALSE, weight = NULL) {
  # grouping variables, used to regroup when done
  grps = tbl %>% groups %>% lapply(as.character) %>% unlist
  # one row per group (the distinct group keys), ungrouped, then sample among them
  keep = tbl %>% summarise() %>% ungroup() %>% sample_n(size, replace, weight)
  # keep the selected groups (duplicates included); regroup because joins change the grouping
  # regrouping may be unnecessary, but joins do something funky to the grouping variables
  tbl %>% right_join(keep, by = grps) %>% group_by_(.dots = grps)
}
```
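A hypothetical rewrite of the function above for current dplyr, not taken from this thread: it assumes dplyr >= 1.1.0 (for the `relationship` argument of joins), passes the weight with tidy-eval embracing, and numbers each draw in a `.draw` column so repeated groups stay distinguishable. Here `weight` is assumed to be a per-group summary expression such as `n()` or `sum(Sepal.Length)`; the default `1` samples groups uniformly. The result is grouped by `.draw` rather than by the original grouping variables, which is a design choice of this sketch.

```r
library(dplyr)

# Hypothetical sketch; `weight` is evaluated once per group inside summarise()
sample_n_groups <- function(tbl, size, replace = FALSE, weight = 1) {
  grps <- group_vars(tbl)
  # One row per group with its sampling weight
  keys <- tbl %>% summarise(.w = {{ weight }}, .groups = "drop")
  # Draw groups, then number each draw so repeated groups stay distinguishable
  keys <- keys %>%
    slice_sample(n = size, replace = replace, weight_by = .w) %>%
    select(-.w) %>%
    mutate(.draw = row_number())
  # Expand back to rows; many-to-many is expected when replace = TRUE
  tbl %>%
    ungroup() %>%
    inner_join(keys, by = grps, relationship = "many-to-many") %>%
    group_by(.draw)
}

iris %>%
  group_by(Species) %>%
  sample_n_groups(5, replace = TRUE, weight = sum(Sepal.Length))
```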

hadley added this to the v0.2 milestone Mar 28, 2014

hadley modified the milestones: 0.3, v0.2 Apr 7, 2014

hadley closed this as completed Jul 28, 2014

cperk mentioned this issue Apr 5, 2018 in "sample_n until there are no more groups" #3482 (closed)

lock bot locked as resolved and limited conversation to collaborators Jun 8, 2018
