`collect_notes()` for workflow_set #135

jkylearmstrong · 2023-12-21T01:53:18Z

I noticed that workflowsets do not appear to have a method for collecting notes or errors:

> class(grid_results)
[1] "workflow_set" "tbl_df"       "tbl"          "data.frame"

> collect_notes(grid_results)
Error in `collect_notes()`:
! No `collect_notes()` exists for this type of object.
Run `rlang::last_trace()` to see where the error occurred.

whereas it does have methods for `collect_metrics()` and `collect_predictions()`.

It would be handy to have something like `collect_notes()` implemented, so that you can assess which preprocessors and models are causing errors in a pipeline.


jkylearmstrong · 2023-12-21T02:12:28Z

I do see that there is a function within collect.R

collect_notes <- function(x, show = 1) {
  y <- purrr::map_dfr(x$.notes, I)
  show <- min(show, nrow(y))
  y <- paste0(y$.notes[1:show])
  gsub("[\r\n]", "", y)
}

however, this might be from an old version of the code.

If

grid_results <-
   all_workflows %>%
   workflow_map(
      seed = 1503,
      resamples = data_folds,
      grid = 25,
      control = grid_ctrl
   )

then

> glimpse(grid_results)
Rows: 65
Columns: 4
$ wflow_id <chr> "raw_data_with_na_XGBoost", "raw_data_NAomit_glmnet_spec",…
$ info     <list> [<tbl_df[1 x 4]>], [<tbl_df[1 x 4]>], [<tbl_df[1 x 4]>], …
$ option   <list> [[<vfold_cv[15 x 3]>], 25, [FALSE, TRUE, NULL, TRUE, NULL…
$ result   <list> [<tune_results[15 x 6]>], [<tune_results[15 x 6]>], [<tun…

However, the `collect_notes()` function defined in collect.R does not reference the `result` column of `grid_results`, nor is `collect_notes()` exported from the package.

jkylearmstrong · 2023-12-21T03:12:55Z

I have made a prototype function that solves this issue, in case anyone else would like to test it or add it to the package:

library(dplyr)
library(purrr)

my_collect_notes <- function(wflow_set_results) {
  # Keep only the workflow IDs and their tuning results
  wflow_set_rsl <- wflow_set_results %>%
    select(wflow_id, result)

  # Unique workflow IDs to iterate over
  distinct_list_of_workflows <- wflow_set_rsl %>%
    distinct(wflow_id) %>%
    pull(wflow_id)

  # Collect the notes from a single workflow's tuning result
  collect_note_helper <- function(x) {
    wflow_set_rsl %>%
      filter(wflow_id == x) %>%
      pull(result) %>%
      pluck(1) %>%
      tune::collect_notes()
  }

  # Row-bind the notes, labelling each row with the workflow it came from
  map_dfr(
    set_names(distinct_list_of_workflows),
    collect_note_helper,
    .id = "workflow_id"
  )
}

my_collect_notes(grid_results)

> my_collect_notes(grid_results) %>%
+     glimpse()
Rows: 3,343
Columns: 6
$ workflow_id <chr> "raw_data_with_na_XGBoost", "raw_data_with_na_XGBoost", "raw_data_with_na…
$ id          <chr> "Repeat1", "Repeat1", "Repeat1", "Repeat1", "Repeat1", "Repeat1", "Repeat…
$ id2         <chr> "Fold1", "Fold1", "Fold1", "Fold1", "Fold1", "Fold1", "Fold1", "Fold1", "…
$ location    <chr> "internal", "internal", "internal", "internal", "internal", "internal", "…
$ type        <chr> "warning", "warning", "warning", "warning", "warning", "warning", "warnin…
$ note        <chr> "A correlation computation is required, but `estimate` is constant and ha…
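
With notes in this shape, a quick tally by workflow and note type can show which workflows are the noisiest. The following is only a minimal sketch: the `notes` tibble is a made-up stand-in for the output of `my_collect_notes()`, and the workflow names, note text, and the `n_notes` column name are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for my_collect_notes(grid_results);
# the rows below are invented for illustration only
notes <- tibble::tibble(
  workflow_id = c("wf_a", "wf_a", "wf_b", "wf_b", "wf_b"),
  type        = c("warning", "error", "warning", "warning", "error"),
  note        = c("note 1", "note 2", "note 1", "note 1", "note 3")
)

# Count notes per workflow and type, most frequent first
note_counts <- notes %>%
  count(workflow_id, type, name = "n_notes") %>%
  arrange(desc(n_notes))

note_counts
```

Adding `note` to the `count()` call would break the tally down by individual message instead.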

simonpcouch · 2023-12-22T13:58:08Z

Thanks for the issue, @jkylearmstrong! I agree with you that workflowsets ought to have this functionality; I'll explore before the next release.

jkylearmstrong · 2023-12-22T18:05:03Z

Thank you @simonpcouch! I was getting some errors the other night, and it was a little painful to track down which error was coming from which process. The `verbose = TRUE` option also really helps with debugging, so it might be worth mentioning in the documentation.
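
For anyone following along, one way to switch on verbose logging might look like the sketch below. It assumes the option meant here is the `verbose` argument of `tune::control_grid()`; the other settings in the original `grid_ctrl` are not shown in this thread, so they are left out.

```r
library(tune)

# Assumption: only `verbose` is set here; the real grid_ctrl
# in this thread may have carried other options as well
grid_ctrl <- control_grid(verbose = TRUE)

# The control object would then be passed along as before, e.g.
# grid_results <- all_workflows %>%
#   workflow_map(seed = 1503, resamples = data_folds,
#                grid = 25, control = grid_ctrl)
```

`workflow_map()` also has its own `verbose` argument, which reports progress per workflow rather than per resample.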

simonpcouch added the "feature" label (a feature request or enhancement) on Dec 22, 2023

simonpcouch mentioned this issue Jan 9, 2024

add collect_notes() method for workflow sets #138

Merged

simonpcouch closed this as completed in #138 Jan 16, 2024
