collect_notes() for workflow_set #135

Closed
jkylearmstrong opened this issue Dec 21, 2023 · 4 comments · Fixed by #138
Labels
feature a feature request or enhancement

Comments

@jkylearmstrong

I noticed that `workflow_set` objects do not appear to have a method for collecting notes or errors:

> class(grid_results)
[1] "workflow_set" "tbl_df"       "tbl"          "data.frame"
> collect_notes(grid_results)
Error in `collect_notes()`:
! No `collect_notes()` exists for this type of object.
Run `rlang::last_trace()` to see where the error occurred.

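In contrast, it does have methods for metrics and predictions (`collect_metrics()` and `collect_predictions()`), as shown in a screenshot in the original issue.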

It would be handy to have something like `collect_notes()` implemented so that you can assess which preprocessors and models are causing errors in a pipeline.

@jkylearmstrong
Author

jkylearmstrong commented Dec 21, 2023

I do see that there is a function within collect.R:

collect_notes <- function(x, show = 1) {
  y <- purrr::map_dfr(x$.notes, I)
  show <- min(show, nrow(y))
  y <- paste0(y$.notes[1:show])
  gsub("[\r\n]", "", y)
}

However, this might be from an old version of the code.

If

grid_results <-
   all_workflows %>%
   workflow_map(
      seed = 1503,
      resamples = data_folds,
      grid = 25,
      control = grid_ctrl
   )

then

> glimpse(grid_results)
Rows: 65
Columns: 4
$ wflow_id <chr> "raw_data_with_na_XGBoost", "raw_data_NAomit_glmnet_spec",…
$ info     <list> [<tbl_df[1 x 4]>], [<tbl_df[1 x 4]>], [<tbl_df[1 x 4]>], …
$ option   <list> [[<vfold_cv[15 x 3]>], 25, [FALSE, TRUE, NULL, TRUE, NULL…
$ result   <list> [<tune_results[15 x 6]>], [<tune_results[15 x 6]>], [<tun…

However, the `collect_notes()` function defined in collect.R does not reference the `result` column of `grid_results`, nor is it exported from the package.

@jkylearmstrong
Author

I have made a prototype function that solves this issue, in case anyone else would like to test it or add it to the package:

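library(dplyr)
library(purrr)

# Collect the notes (warnings and errors) from every workflow in a fitted
# workflow set, returning one tibble with a `workflow_id` column.
my_collect_notes <- function(wflow_set_results){
  # Keep only the workflow IDs and their tuning results
  wflow_set_rsl <- wflow_set_results %>%
    select(wflow_id, result)
  
  # Unique workflow IDs to iterate over
  distinct_list_of_workflows <- wflow_set_rsl %>%
    select(wflow_id) %>%
    distinct() %>%
    pull(wflow_id)
  
  # Collect the notes for a single workflow ID
  collect_note_helper <- function(x){
    wflow_set_rsl %>%
      filter(wflow_id == x) %>%
      pull(result) %>%
      pluck(1) %>%
      tune::collect_notes()
  }
  
  # Row-bind the per-workflow notes, labelling each row with its workflow ID
  list_dfs_notes <- map_dfr(
    set_names(distinct_list_of_workflows),
    collect_note_helper,
    .id = "workflow_id")
  
  list_dfs_notes
}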

my_collect_notes(grid_results)
> my_collect_notes(grid_results) %>%
+     glimpse()
Rows: 3,343
Columns: 6
$ workflow_id <chr> "raw_data_with_na_XGBoost", "raw_data_with_na_XGBoost", "raw_data_with_na…
$ id          <chr> "Repeat1", "Repeat1", "Repeat1", "Repeat1", "Repeat1", "Repeat1", "Repeat…
$ id2         <chr> "Fold1", "Fold1", "Fold1", "Fold1", "Fold1", "Fold1", "Fold1", "Fold1", "…
$ location    <chr> "internal", "internal", "internal", "internal", "internal", "internal", "…
$ type        <chr> "warning", "warning", "warning", "warning", "warning", "warning", "warnin…
$ note        <chr> "A correlation computation is required, but `estimate` is constant and ha…
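
With more than three thousand rows of notes, a tally per workflow is often easier to scan than the raw table. The following is a minimal sketch, assuming the notes tibble has the columns shown in the output above; `summarise_notes()` and the `n_occurrences` column are hypothetical names and are not part of workflowsets or tune.

```r
library(dplyr)

# Hypothetical helper (not part of workflowsets or tune): count how often each
# distinct note occurs per workflow and note type, most frequent first.
# `notes` is assumed to have at least the columns `workflow_id`, `type`,
# and `note`, as in the output above.
summarise_notes <- function(notes) {
  notes %>%
    count(workflow_id, type, note, name = "n_occurrences", sort = TRUE)
}

# Possible usage with the prototype above (needs a fitted workflow set):
# my_collect_notes(grid_results) %>%
#   summarise_notes() %>%
#   filter(type == "error")
```

Filtering on `type` afterwards narrows the tally to errors only, which is the case that prompted this issue.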

simonpcouch added the "feature" (a feature request or enhancement) label on Dec 22, 2023
@simonpcouch
Contributor

Thanks for the issue, @jkylearmstrong! I agree with you that workflowsets ought to have this functionality; I'll explore before the next release.

@jkylearmstrong
Author

Thank you, @simonpcouch! I was getting some errors the other night, and tracking down which error came from which process was a little painful. The `verbose = TRUE` option also really helps with debugging, so it might be worth mentioning in the documentation.
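
The comment does not say which `verbose` option is meant. Both `tune::control_grid()` and `workflowsets::workflow_map()` accept a `verbose` argument, so the sketch below shows both. The other settings of the original `grid_ctrl` are not given in this thread and are left at their defaults here, which is an assumption.

```r
library(tune)

# Control object for grid tuning with progress logging switched on.
# Only `verbose` is set; the remaining arguments of the original `grid_ctrl`
# are unknown and stay at their defaults.
grid_ctrl <- control_grid(verbose = TRUE)

# workflow_map() has its own `verbose` argument, which logs each workflow as
# it is processed. The call is commented out because `all_workflows` and
# `data_folds` are defined outside this thread:
# grid_results <-
#   all_workflows %>%
#   workflow_map(
#     verbose = TRUE,
#     seed = 1503,
#     resamples = data_folds,
#     grid = 25,
#     control = grid_ctrl
#   )
```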
