finalize_estimator fails in metric summarizers when any column is named name #382

Closed
mikemahoney218 opened this issue Mar 2, 2023 · 1 comment · Fixed by #383
Labels
bug an unexpected problem or unintended behavior

Comments

@mikemahoney218
Member

The problem

If any column of the data frame passed as `data` is named `name`, `finalize_estimator()` fails. This appears to happen because `name` is evaluated inside `summarise()` with data masking, so the `name` column takes precedence over the `name` function argument:

yardstick/R/template.R

Lines 119 to 120 in 3099e99

.metric = name,
.estimator = finalize_estimator(.data[[truth]], metric_class = name),
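The masking described above can be reproduced with dplyr alone. The following is a minimal sketch, not yardstick code: `masked()` and `unmasked()` are hypothetical helpers, and `df` is a made-up data frame with a column called `name`. A bare `name` inside `summarise()` picks up the column, while `.env[["name"]]` picks up the function argument. This is consistent with the backtrace, where `structure(list(), class = metric_class)` receives something that is not a valid class.

```r
library(dplyr)
library(rlang)

# Made-up data frame with a column that shares its name with an argument.
df <- data.frame(name = c(1, 2, 3), x = c(4, 5, 6))

# Hypothetical helper: bare `name` is looked up in the data mask first,
# so it resolves to the 3-element column rather than the argument.
masked <- function(data, name) {
  summarise(data, n_values = length(name))
}

# Hypothetical helper: `.env` forces lookup in the calling environment,
# so it resolves to the length-1 argument.
unmasked <- function(data, name) {
  summarise(data, n_values = length(.env[["name"]]))
}

masked_n <- masked(df, "rmse")$n_values
unmasked_n <- unmasked(df, "rmse")$n_values
```

Here `masked_n` is 3 (the column length) and `unmasked_n` is 1 (the length of the string argument).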

Changing both of those lines to use .env[["name"]] seems to fix this behavior:

Orange2 <- Orange
Orange2$name <- Orange$circumference
library(rlang)
assignInNamespace(
  "numeric_metric_summarizer", 
  function(name,
           fn,
           data,
           truth,
           estimate,
           ...,
           na_rm = TRUE,
           case_weights = NULL,
           fn_options = list(),
           error_call = caller_env()) {
    rlang::check_dots_empty()
    
    truth <- enquo(truth)
    estimate <- enquo(estimate)
    case_weights <- enquo(case_weights)
    
    truth <- yardstick:::yardstick_eval_select(
      expr = truth,
      data = data,
      arg = "truth",
      error_call = error_call
    )
    estimate <- yardstick:::yardstick_eval_select(
      expr = estimate,
      data = data,
      arg = "estimate",
      error_call = error_call
    )
    
    if (!quo_is_null(case_weights)) {
      case_weights <- yardstick:::yardstick_eval_select(
        expr = case_weights,
        data = data,
        arg = "case_weights",
        error_call = error_call
      )
      
      case_weights <- expr(.data[[!!case_weights]])
    }
    
    out <- dplyr::summarise(
      data,
      .metric = .env[["name"]], # change 1
      .estimator = yardstick::finalize_estimator(
        .data[[truth]], 
        metric_class = .env[["name"]]), # change 2
      .estimate = fn(
        truth = .data[[truth]],
        estimate = .data[[estimate]],
        case_weights = !!case_weights,
        na_rm = na_rm,
        !!!fn_options
      )
    )
    
    dplyr::as_tibble(out)
  },
  "yardstick"
)
yardstick::rmse(Orange2, name, circumference)
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 rmse    standard           0

Created on 2023-03-02 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 Patched (2022-11-10 r83330)
#>  os       Ubuntu 22.04.2 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2023-03-02
#>  pandoc   2.19.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  cli           3.6.0      2023-01-09 [1] CRAN (R 4.2.2)
#>  digest        0.6.31     2022-12-11 [1] CRAN (R 4.2.2)
#>  dplyr         1.1.0      2023-01-29 [1] CRAN (R 4.2.2)
#>  evaluate      0.20       2023-01-17 [1] CRAN (R 4.2.2)
#>  fansi         1.0.4      2023-01-22 [1] CRAN (R 4.2.2)
#>  fastmap       1.1.1      2023-02-24 [1] CRAN (R 4.2.2)
#>  fs            1.6.0      2023-01-23 [1] CRAN (R 4.2.2)
#>  generics      0.1.3      2022-07-05 [1] CRAN (R 4.2.2)
#>  glue          1.6.2      2022-02-24 [1] CRAN (R 4.2.2)
#>  htmltools     0.5.4      2022-12-07 [1] CRAN (R 4.2.2)
#>  knitr         1.42       2023-01-25 [1] CRAN (R 4.2.2)
#>  lifecycle     1.0.3      2022-10-07 [1] CRAN (R 4.2.2)
#>  magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.2.2)
#>  pillar        1.8.1      2022-08-19 [1] CRAN (R 4.2.2)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.2.2)
#>  purrr         1.0.1      2023-01-10 [1] CRAN (R 4.2.2)
#>  R.cache       0.16.0     2022-07-21 [1] CRAN (R 4.2.2)
#>  R.methodsS3   1.8.2      2022-06-13 [1] CRAN (R 4.2.2)
#>  R.oo          1.25.0     2022-06-12 [1] CRAN (R 4.2.2)
#>  R.utils       2.12.2     2022-11-11 [1] CRAN (R 4.2.2)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.2.2)
#>  reprex        2.0.2      2022-08-17 [1] CRAN (R 4.2.2)
#>  rlang       * 1.0.6      2022-09-24 [1] CRAN (R 4.2.2)
#>  rmarkdown     2.20       2023-01-19 [1] CRAN (R 4.2.2)
#>  rstudioapi    0.14       2022-08-22 [1] CRAN (R 4.2.2)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.2.2)
#>  styler        1.8.1      2022-11-07 [1] CRAN (R 4.2.2)
#>  tibble        3.1.8      2022-07-22 [1] CRAN (R 4.2.2)
#>  tidyselect    1.2.0      2022-10-10 [1] CRAN (R 4.2.2)
#>  utf8          1.2.3      2023-01-31 [1] CRAN (R 4.2.2)
#>  vctrs         0.5.2      2023-01-23 [1] CRAN (R 4.2.2)
#>  withr         2.5.0      2022-03-03 [1] CRAN (R 4.2.2)
#>  xfun          0.36       2022-12-21 [1] CRAN (R 4.2.2)
#>  yaml          2.3.7      2023-01-23 [1] CRAN (R 4.2.2)
#>  yardstick     1.1.0.9000 2023-03-02 [1] Github (tidymodels/yardstick@3099e99)
#> 
#>  [1] /home/mikemahoney218/R/x86_64-pc-linux-gnu-library/4.2
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

I only tested with numeric metrics, but I believe the other summarizers behave the same way.
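One way to check the other summarizers is sketched below. It assumes the `two_class_example` dataset shipped with yardstick and uses a hypothetical helper, `runs_ok()`, that only records whether a call completes. The outcome depends on the installed yardstick version, so no expected result is shown.

```r
library(yardstick)

# Dataset shipped with yardstick, plus an extra column called `name`.
data("two_class_example", package = "yardstick")
df <- two_class_example
df$name <- seq_len(nrow(df))

# Hypothetical helper: TRUE if the expression evaluates without error.
runs_ok <- function(expr) {
  tryCatch({
    expr
    TRUE
  }, error = function(e) FALSE)
}

results <- c(
  accuracy = runs_ok(accuracy(df, truth, predicted)), # class metric
  roc_auc  = runs_ok(roc_auc(df, truth, Class1))      # class probability metric
)
results
```

A `FALSE` entry would suggest that the corresponding summarizer is affected by the `name` column in the same way.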

Reproducible example

Orange2 <- Orange
Orange2$name <- Orange$circumference
yardstick::rmse(Orange2, circumference, name)
#> Error in `dplyr::summarise()`:
#> ℹ In argument: `.estimator =
#>   finalize_estimator(.data[["circumference"]], metric_class = name)`.
#> Caused by error in `attributes(.Data) <- c(attributes(.Data), attrib)`:
#> ! attempt to set invalid 'class' attribute

#> Backtrace:
#>      ▆
#>   1. ├─yardstick::rmse(Orange2, circumference, name)
#>   2. ├─yardstick:::rmse.data.frame(Orange2, circumference, name)
#>   3. │ └─yardstick::numeric_metric_summarizer(...)
#>   4. │   ├─dplyr::summarise(...)
#>   5. │   └─dplyr:::summarise.data.frame(...)
#>   6. │     └─dplyr:::summarise_cols(.data, dplyr_quosures(...), by, "summarise")
#>   7. │       ├─base::withCallingHandlers(...)
#>   8. │       └─dplyr:::map(quosures, summarise_eval_one, mask = mask)
#>   9. │         └─base::lapply(.x, .f, ...)
#>  10. │           └─dplyr (local) FUN(X[[i]], ...)
#>  11. │             └─mask$eval_all_summarise(quo)
#>  12. │               └─dplyr (local) eval()
#>  13. ├─yardstick::finalize_estimator(.data[["circumference"]], metric_class = name)
#>  14. │ └─yardstick:::make_dummy(metric_class)
#>  15. │   └─base::structure(list(), class = metric_class)
#>  16. └─base::.handleSimpleError(...)
#>  17.   └─dplyr (local) h(simpleError(msg, call))
#>  18.     └─dplyr (local) handler(cnd)
#>  19.       └─rlang::abort(message, class = error_class, parent = parent, call = error_call)
yardstick::rmse(Orange2, name, circumference)
#> Error in `dplyr::summarise()`:
#> ℹ In argument: `.estimator = finalize_estimator(.data[["name"]],
#>   metric_class = name)`.
#> Caused by error in `attributes(.Data) <- c(attributes(.Data), attrib)`:
#> ! attempt to set invalid 'class' attribute

#> Backtrace:
#>      ▆
#>   1. ├─yardstick::rmse(Orange2, name, circumference)
#>   2. ├─yardstick:::rmse.data.frame(Orange2, name, circumference)
#>   3. │ └─yardstick::numeric_metric_summarizer(...)
#>   4. │   ├─dplyr::summarise(...)
#>   5. │   └─dplyr:::summarise.data.frame(...)
#>   6. │     └─dplyr:::summarise_cols(.data, dplyr_quosures(...), by, "summarise")
#>   7. │       ├─base::withCallingHandlers(...)
#>   8. │       └─dplyr:::map(quosures, summarise_eval_one, mask = mask)
#>   9. │         └─base::lapply(.x, .f, ...)
#>  10. │           └─dplyr (local) FUN(X[[i]], ...)
#>  11. │             └─mask$eval_all_summarise(quo)
#>  12. │               └─dplyr (local) eval()
#>  13. ├─yardstick::finalize_estimator(.data[["name"]], metric_class = name)
#>  14. │ └─yardstick:::make_dummy(metric_class)
#>  15. │   └─base::structure(list(), class = metric_class)
#>  16. └─base::.handleSimpleError(...)
#>  17.   └─dplyr (local) h(simpleError(msg, call))
#>  18.     └─dplyr (local) handler(cnd)
#>  19.       └─rlang::abort(message, class = error_class, parent = parent, call = error_call)
# The error occurs even when the "name" column is not passed to the function:
yardstick::rmse(Orange2, age, circumference)
#> Error in `dplyr::summarise()`:
#> ℹ In argument: `.estimator = finalize_estimator(.data[["age"]],
#>   metric_class = name)`.
#> Caused by error in `attributes(.Data) <- c(attributes(.Data), attrib)`:
#> ! attempt to set invalid 'class' attribute

#> Backtrace:
#>      ▆
#>   1. ├─yardstick::rmse(Orange2, age, circumference)
#>   2. ├─yardstick:::rmse.data.frame(Orange2, age, circumference)
#>   3. │ └─yardstick::numeric_metric_summarizer(...)
#>   4. │   ├─dplyr::summarise(...)
#>   5. │   └─dplyr:::summarise.data.frame(...)
#>   6. │     └─dplyr:::summarise_cols(.data, dplyr_quosures(...), by, "summarise")
#>   7. │       ├─base::withCallingHandlers(...)
#>   8. │       └─dplyr:::map(quosures, summarise_eval_one, mask = mask)
#>   9. │         └─base::lapply(.x, .f, ...)
#>  10. │           └─dplyr (local) FUN(X[[i]], ...)
#>  11. │             └─mask$eval_all_summarise(quo)
#>  12. │               └─dplyr (local) eval()
#>  13. ├─yardstick::finalize_estimator(.data[["age"]], metric_class = name)
#>  14. │ └─yardstick:::make_dummy(metric_class)
#>  15. │   └─base::structure(list(), class = metric_class)
#>  16. └─base::.handleSimpleError(...)
#>  17.   └─dplyr (local) h(simpleError(msg, call))
#>  18.     └─dplyr (local) handler(cnd)
#>  19.       └─rlang::abort(message, class = error_class, parent = parent, call = error_call)

Created on 2023-03-02 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 Patched (2022-11-10 r83330)
#>  os       Ubuntu 22.04.2 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2023-03-02
#>  pandoc   2.19.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  cli           3.6.0      2023-01-09 [1] CRAN (R 4.2.2)
#>  digest        0.6.31     2022-12-11 [1] CRAN (R 4.2.2)
#>  dplyr         1.1.0      2023-01-29 [1] CRAN (R 4.2.2)
#>  evaluate      0.20       2023-01-17 [1] CRAN (R 4.2.2)
#>  fansi         1.0.4      2023-01-22 [1] CRAN (R 4.2.2)
#>  fastmap       1.1.1      2023-02-24 [1] CRAN (R 4.2.2)
#>  fs            1.6.0      2023-01-23 [1] CRAN (R 4.2.2)
#>  generics      0.1.3      2022-07-05 [1] CRAN (R 4.2.2)
#>  glue          1.6.2      2022-02-24 [1] CRAN (R 4.2.2)
#>  htmltools     0.5.4      2022-12-07 [1] CRAN (R 4.2.2)
#>  knitr         1.42       2023-01-25 [1] CRAN (R 4.2.2)
#>  lifecycle     1.0.3      2022-10-07 [1] CRAN (R 4.2.2)
#>  magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.2.2)
#>  pillar        1.8.1      2022-08-19 [1] CRAN (R 4.2.2)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.2.2)
#>  purrr         1.0.1      2023-01-10 [1] CRAN (R 4.2.2)
#>  R.cache       0.16.0     2022-07-21 [1] CRAN (R 4.2.2)
#>  R.methodsS3   1.8.2      2022-06-13 [1] CRAN (R 4.2.2)
#>  R.oo          1.25.0     2022-06-12 [1] CRAN (R 4.2.2)
#>  R.utils       2.12.2     2022-11-11 [1] CRAN (R 4.2.2)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.2.2)
#>  reprex        2.0.2      2022-08-17 [1] CRAN (R 4.2.2)
#>  rlang         1.0.6      2022-09-24 [1] CRAN (R 4.2.2)
#>  rmarkdown     2.20       2023-01-19 [1] CRAN (R 4.2.2)
#>  rstudioapi    0.14       2022-08-22 [1] CRAN (R 4.2.2)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.2.2)
#>  styler        1.8.1      2022-11-07 [1] CRAN (R 4.2.2)
#>  tibble        3.1.8      2022-07-22 [1] CRAN (R 4.2.2)
#>  tidyselect    1.2.0      2022-10-10 [1] CRAN (R 4.2.2)
#>  utf8          1.2.3      2023-01-31 [1] CRAN (R 4.2.2)
#>  vctrs         0.5.2      2023-01-23 [1] CRAN (R 4.2.2)
#>  withr         2.5.0      2022-03-03 [1] CRAN (R 4.2.2)
#>  xfun          0.36       2022-12-21 [1] CRAN (R 4.2.2)
#>  yaml          2.3.7      2023-01-23 [1] CRAN (R 4.2.2)
#>  yardstick     1.1.0.9000 2023-03-02 [1] Github (tidymodels/yardstick@3099e99)
#> 
#>  [1] /home/mikemahoney218/R/x86_64-pc-linux-gnu-library/4.2
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@EmilHvitfeldt EmilHvitfeldt added the bug an unexpected problem or unintended behavior label Mar 3, 2023
mikemahoney218 added a commit that referenced this issue Mar 7, 2023
EmilHvitfeldt added a commit that referenced this issue Mar 13, 2023
@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 28, 2023