Grouped K Fold #540
|
Here is some preliminary code. You would run this before
library(caret)  # provides createFolds()

## Returns, for each fold, the row indices to train on. All rows of a
## group stay together, so no group is split between training and holdout.
group_cv <- function(x, k = length(unique(x))) {
  dat <- data.frame(index = seq(along = x), group = x)
  groups <- data.frame(group = unique(dat$group))
  ## fold the unique groups, not the individual rows
  group_folds <- createFolds(groups$group, returnTrain = TRUE, k = k)
  group_folds <- lapply(group_folds, function(x, y) y[x, , drop = FALSE], y = groups)
  ## map each fold's training groups back to their row indices
  dat_folds <- lapply(group_folds, function(x, y) merge(x, y), y = dat)
  lapply(dat_folds, function(x) sort(x$index))
}

groups <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
set.seed(242)
folds <- group_cv(groups)

## check: no group should appear in both the training and holdout rows
for_mod <- lapply(folds, function(x, y) y[x], y = groups)
holdout <- lapply(folds, function(x, y) y[-unique(x)], y = groups)
for (i in seq(along = folds)) {
  if (any(unique(for_mod[[i]]) %in% unique(holdout[[i]])))
    cat("didn't work!")
}

## Test with more groups and smaller # folds
set.seed(91906)
other_groups <- sort(sample(letters[1:15], size = 50, replace = TRUE))
folds_1 <- group_cv(other_groups, k = 5)
for_mod <- lapply(folds_1, function(x, y) y[x], y = other_groups)
holdout <- lapply(folds_1, function(x, y) y[-unique(x)], y = other_groups)
for (i in seq(along = folds_1)) {
  if (any(unique(for_mod[[i]]) %in% unique(holdout[[i]])))
    cat("didn't work!")
}

Please test this since I just wrote it on the fly. |
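The two checks above repeat the same loop, so one way to test more cases is to wrap the check in a small helper. This is only a sketch: `no_group_leak` is a hypothetical name, not a caret function, and the folds below are written by hand so the snippet runs in base R without caret.

```r
# Hypothetical helper: TRUE when no group appears in both the training rows
# (folds[[i]]) and the held-out rows (everything else) of any fold.
no_group_leak <- function(folds, group) {
  all(vapply(folds, function(train_idx) {
    holdout_idx <- setdiff(seq_along(group), train_idx)
    !any(group[train_idx] %in% group[holdout_idx])
  }, logical(1)))
}

groups <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)

# Hand-written training indices: each fold holds out one whole group.
good_folds <- list(Fold1 = 4:10, Fold2 = c(1:3, 7:10), Fold3 = 1:6)

# Row 3 (group 1) is held out while rows 1-2 (also group 1) are trained on.
bad_folds <- list(Fold1 = c(1:2, 4:10))

no_group_leak(good_folds, groups)
no_group_leak(bad_folds, groups)
```

The same call should work on the output of `group_cv()`, since that function also returns a list of training-row indices.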
|
That's amazing. Thank you so much. If you ever need anything let me know. I love love your library. Grouped K Fold to me is so important for machine learning. Every interesting problem I've ever done and lots on Kaggle require it. Would it be possible to make this an official feature? |
|
After some more testing and documentation I’ll include it as a function but not one that you could call as an option in `trainControl(method = "groupCV")` or similar. The reason for that is that there is no argument for a grouping variable.
A few people have asked for it so I will add it in.
On December 1, 2016 at 1:28:48 PM, trebuchet90 (notifications@github.com) wrote:
That's amazing. Thank you so much. If you ever need anything let me know. I
love love your library.
Grouped K Fold to me is so important for machine learning. every
interesting problem I've ever done and lots on Kaggle require it. Would it
be possible to make this an official feature?
|
|
I think the way you did it as an index was perfect. That's what I was looking for anyway. |
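For readers following along, here is a minimal sketch of what "as an index" could look like in practice, assuming caret is installed. The toy data, the hand-written folds, and the choice of `method = "lm"` are illustrative assumptions and not from the thread; in real use the folds would come from a function such as `group_cv()` above.

```r
library(caret)

# Toy data (assumed for illustration): 10 rows in 3 groups, using the same
# grouping vector as the example earlier in the thread.
set.seed(242)
groups <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
toy <- data.frame(x = rnorm(10), y = rnorm(10))

# Training-row indices per resample, written by hand here so that each
# fold holds out one whole group.
folds <- list(Fold1 = 4:10, Fold2 = c(1:3, 7:10), Fold3 = 1:6)

# `index` tells train() which rows to fit on in each resample; the rows
# not listed are used as the holdout for that resample.
ctrl <- trainControl(method = "cv", index = folds)
fit <- train(y ~ x, data = toy, method = "lm", trControl = ctrl)

# One row of performance estimates per fold.
fit$resample
```

Passing the folds through `index` keeps the grouping logic outside `trainControl()`, which fits the point made above that there is no argument for a grouping variable.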
|
Go ahead and leave it open so I can reference commits to it. Thanks |
Is there a GroupKFold feature in caret or R?
http://scikit-learn.org/stable/modules/cross_validation.html