
Lrnr_cv predictions broken #404

Open
rachaelvp opened this issue Jan 2, 2023 · 2 comments

rachaelvp (Member) commented Jan 2, 2023

Learners wrapped in Lrnr_cv and then trained do not return valid predictions from the predict method, regardless of how full_fit is set.

The predict_fold method partially works: it works when fold_number is set to a valid fold number (with respect to the total number of folds) and when it is set to "full" (assuming Lrnr_cv was specified with full_fit = TRUE). It does not work when fold_number is set to "validation".

Here is an example:

data("mtcars")
mtcars_task <- make_sl3_Task(
  data = mtcars[1:10,], outcome = "mpg", 
  covariates = c( "cyl", "disp", "hp", "drat", "wt"), folds = 3
)
mtcars_task2 <- make_sl3_Task(
  data = mtcars[11:30,], outcome = "mpg", 
  covariates = c( "cyl", "disp", "hp", "drat", "wt")
)

lrnr_cv_glm <- Lrnr_cv$new(Lrnr_glm$new(), full_fit = TRUE)
cv_glm_fit <- lrnr_cv_glm$train(mtcars_task)

cv_glm_fit$predict(mtcars_task2)
# predictions1 predictions2 predictions3 predictions4 predictions5 predictions6 
# 13.35737     16.83348     24.41874     23.01480     22.83049     14.21186 

cv_glm_fit$predict_fold(mtcars_task2, 1)
# [1] 20.41417 17.08732 17.08732 17.08732 16.57757 16.05975 15.25448 23.92517 24.41874 23.92261
# [11] 22.67881 18.67844 18.56438 14.17586 17.71928 23.92614 22.83049 21.83903 13.20562 17.96716

cv_glm_fit$predict_fold(mtcars_task2, "validation")
# predictions1 predictions2 predictions3 predictions4 predictions5 predictions6 
# 19.16572     17.08732     14.59736     17.71928     25.60515     22.84417 

 cv_glm_fit$predict_fold(mtcars_task2, "full")
# [1] 19.22617 13.17804 13.81506 13.72138 17.68930 16.43985 15.62778 24.44795 29.84917 25.42954 22.21242 16.84889
# [13] 17.97999 15.40558 19.64627 24.95815 26.24525 22.08010 17.46842 14.75411
@rachaelvp rachaelvp changed the title Lrnr_cv predictions broken, except if specific fold number provided in predict_fold Lrnr_cv predictions broken, except with predict_fold where fold_number is "full" or a number Jan 2, 2023
@rachaelvp rachaelvp changed the title Lrnr_cv predictions broken, except with predict_fold where fold_number is "full" or a number Lrnr_cv predictions broken Jan 2, 2023
jeremyrcoyle (Collaborator) commented Jan 3, 2023 via email

Larsvanderlaan (Contributor) commented

Is the issue here not that the prediction and training tasks have different fold objects? By default, Lrnr_cv produces cross-validated predictions based on the folds of the prediction task, so for predicting out of sample it needs a task with compatible folds. I think the following would work:

mtcars_task2 <- make_sl3_Task(
  data = mtcars[11:30, ], outcome = "mpg",
  covariates = c("cyl", "disp", "hp", "drat", "wt"),
  folds = 3
)
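
If that diagnosis is right, a minimal sketch of the follow-up (assuming the re-specified mtcars_task2 above and the cv_glm_fit object from the original example; untested) would be:

# With folds defined on the prediction task, Lrnr_cv should be able to map each
# row of mtcars_task2 to a fold-specific fit, so predict() would return one
# prediction per row rather than the 6 values seen in the original example.
cv_preds <- cv_glm_fit$predict(mtcars_task2)
length(cv_preds)  # expected: 20, one per row of mtcars_task2

# Fold-specific and full-fit predictions are requested the same way as before:
cv_glm_fit$predict_fold(mtcars_task2, "validation")
cv_glm_fit$predict_fold(mtcars_task2, "full")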
