Thinking about bootstrap grouping #269
hadley commented Feb 19, 2014
Might be useful to have:
So that internally we can potentially do something different. Although I guess we can use the For Alternatively we could materialise directly, but then I'm not sure what the grouping would mean. We could also materialise the data for each group adjacently in memory, which could be interesting too.
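To make the two options above concrete, here is a minimal base R sketch. It is not dplyr's implementation: `bootstrap_indices()` and `materialise_bootstrap()` are hypothetical names invented for this illustration. The first keeps only the resampled row positions (a lazy representation), the second copies the rows so that each replicate sits adjacently in memory.

```r
# Hypothetical helper: one vector of resampled row positions per replicate.
# Nothing is copied; the data frame itself is left untouched.
bootstrap_indices <- function(n, m) {
  replicate(m, sample.int(n, n, replace = TRUE), simplify = FALSE)
}

# Hypothetical helper: materialise the resampled rows so that the rows of
# each replicate are stored contiguously, labelled by a `replicate` column.
materialise_bootstrap <- function(df, m) {
  n <- nrow(df)
  idx <- bootstrap_indices(n, m)
  out <- df[unlist(idx), , drop = FALSE]
  rownames(out) <- NULL
  data.frame(replicate = rep(seq_len(m), each = n), out)
}

set.seed(1)
idx <- bootstrap_indices(nrow(mtcars), 3)   # lazy: list of 3 index vectors
boot_df <- materialise_bootstrap(mtcars, 3) # materialised: 3 * 32 rows
```

The lazy form costs `m * n` integers, while the materialised form costs `m` copies of the data, which is presumably why doing "something different" internally is attractive.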
I'm not certain about this, but in answering this Stack Overflow question, I believe that I have discovered a bug in this function. Here are the steps to reproduce the bug:

library(dplyr)
mboot <- bootstrap(mtcars, 10)
bootstrap(mtcars, 3) %>% do(data.frame(x=1:2))
# Error: index out of bounds

And here is the proposed fix:

bootstrap <- function(df, m) {
  n <- nrow(df)
  attr(df, "indices") <- replicate(m, sample(n, replace = TRUE),
                                   simplify = FALSE)
  attr(df, "drop") <- TRUE
  attr(df, "group_sizes") <- rep(n, m)
  attr(df, "biggest_group_size") <- n
  attr(df, "labels") <- data.frame(replicate = 1:m)
  attr(df, "vars") <- list(quote(replicate)) # Change
  # attr(df, "vars") <- list(quote(boot)) # list(substitute(bootstrap(m)))
  class(df) <- c("grouped_df", "tbl_df", "tbl", "data.frame")
  df
}

Which fixes the above case:

bootstrap(mtcars, 3) %>% do(data.frame(x=1:2))
# Source: local data frame [6 x 2]
# Groups: replicate
#   replicate x
# 1         1 1
# 2         1 2
# 3         2 1
# 4         2 2
# 5         3 1
# 6         3 2
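For readers without the 2014 internals to hand, the shape of that result can be reproduced in base R. This is only a sketch of the expected behaviour, not how `do()` works internally, and `do_per_replicate()` is a hypothetical name made up for the example: it resamples the rows once per replicate, applies a function to each resample, and stacks the results with a `replicate` label.

```r
# Hypothetical helper (base R only): apply `f` to each bootstrap resample
# of `df` and row-bind the results, labelled by replicate number.
do_per_replicate <- function(df, m, f) {
  n <- nrow(df)
  pieces <- lapply(seq_len(m), function(i) {
    resampled <- df[sample.int(n, n, replace = TRUE), , drop = FALSE]
    res <- f(resampled)
    data.frame(replicate = rep(i, nrow(res)), res)
  })
  do.call(rbind, pieces)
}

# Same toy function as in the report: every replicate returns x = 1:2,
# so the result has 3 * 2 = 6 rows and the columns `replicate` and `x`.
res <- do_per_replicate(mtcars, 3, function(d) data.frame(x = 1:2))
```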
It looks like there's another small bug: since a grouped_df's indices are 0-indexed, not 1-indexed,
should be
Otherwise
and
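The code snippets from this comment are not preserved here, so the following is an assumption about the kind of change being described rather than the commenter's actual patch. If the stored indices are 0-based, as the comment says, then positions drawn with `sample()`, which are 1-based, would need shifting down by one before being stored:

```r
n <- nrow(mtcars)
m <- 3

set.seed(42)
# sample.int() returns 1-based positions in 1..n
one_based <- replicate(m, sample.int(n, n, replace = TRUE), simplify = FALSE)

# Assumed correction: shift to 0-based positions in 0..(n - 1)
zero_based <- lapply(one_based, function(i) i - 1L)

range(unlist(one_based))   # lies within 1..n
range(unlist(zero_based))  # lies within 0..(n - 1)
```

Without such a shift, a draw of `n` would point one past the last row under 0-based indexing, and the first row could never be selected.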
Now think this should go in a separate partition package.