
Lrnr_cv predictions broken #404

Open
rachaelvp opened this issue Jan 2, 2023 · 2 comments

rachaelvp (Member) commented Jan 2, 2023

Learners wrapped in Lrnr_cv and then trained do not return valid predictions from the predict method, regardless of how full_fit is set.

The predict_fold method partially works: it works when fold_number is set to a valid fold number (with respect to the total number of folds) and when it is set to "full" (assuming Lrnr_cv was specified with full_fit = TRUE). It does not work when fold_number is set to "validation".

Here is an example:

data("mtcars")
mtcars_task <- make_sl3_Task(
  data = mtcars[1:10,], outcome = "mpg", 
  covariates = c( "cyl", "disp", "hp", "drat", "wt"), folds = 3
)
mtcars_task2 <- make_sl3_Task(
  data = mtcars[11:30,], outcome = "mpg", 
  covariates = c( "cyl", "disp", "hp", "drat", "wt")
)

lrnr_cv_glm <- Lrnr_cv$new(Lrnr_glm$new(), full_fit = TRUE)
cv_glm_fit <- lrnr_cv_glm$train(mtcars_task)

cv_glm_fit$predict(mtcars_task2)
# predictions1 predictions2 predictions3 predictions4 predictions5 predictions6 
# 13.35737     16.83348     24.41874     23.01480     22.83049     14.21186 

cv_glm_fit$predict_fold(mtcars_task2, 1)
# [1] 20.41417 17.08732 17.08732 17.08732 16.57757 16.05975 15.25448 23.92517 24.41874 23.92261
# [11] 22.67881 18.67844 18.56438 14.17586 17.71928 23.92614 22.83049 21.83903 13.20562 17.96716

cv_glm_fit$predict_fold(mtcars_task2, "validation")
# predictions1 predictions2 predictions3 predictions4 predictions5 predictions6 
# 19.16572     17.08732     14.59736     17.71928     25.60515     22.84417 

 cv_glm_fit$predict_fold(mtcars_task2, "full")
# [1] 19.22617 13.17804 13.81506 13.72138 17.68930 16.43985 15.62778 24.44795 29.84917 25.42954 22.21242 16.84889
# [13] 17.97999 15.40558 19.64627 24.95815 26.24525 22.08010 17.46842 14.75411
@rachaelvp rachaelvp changed the title Lrnr_cv predictions broken, except if specific fold number provided in predict_fold Lrnr_cv predictions broken, except with predict_fold where fold_number is "full" or a number Jan 2, 2023
@rachaelvp rachaelvp changed the title Lrnr_cv predictions broken, except with predict_fold where fold_number is "full" or a number Lrnr_cv predictions broken Jan 2, 2023
jeremyrcoyle (Collaborator) commented Jan 3, 2023 via email

Larsvanderlaan (Contributor) commented

Is the issue here not that the prediction and training tasks have different fold objects? By default, Lrnr_cv produces cross-validated predictions based on the folds of the prediction task, so for predicting out of sample it needs a task with compatible folds. I think the following would work:

mtcars_task2 <- make_sl3_Task(
  data = mtcars[11:30, ], outcome = "mpg",
  covariates = c("cyl", "disp", "hp", "drat", "wt"),
  folds = 3
)
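
If that diagnosis is right, a minimal sketch of the follow-up (assuming the re-specified mtcars_task2 above and the cv_glm_fit object from the original example; untested) would be:

# With folds defined on the prediction task, Lrnr_cv should be able to map each
# row of mtcars_task2 to a fold-specific fit, so predict() would return one
# prediction per row rather than the 6 values seen in the original example.
cv_preds <- cv_glm_fit$predict(mtcars_task2)
length(cv_preds)  # expected: 20, one per row of mtcars_task2

# Fold-specific and full-fit predictions are requested the same way as before:
cv_glm_fit$predict_fold(mtcars_task2, "validation")
cv_glm_fit$predict_fold(mtcars_task2, "full")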
