
Join warnings generated during model fitting #526

Closed
mattwarkentin opened this issue Jul 14, 2022 · 5 comments
Labels
upkeep maintenance, infrastructure, and similar

Comments

@mattwarkentin
Contributor

mattwarkentin commented Jul 14, 2022

Hi,

I'm not actually sure whether this belongs in {tune} or not, but anyway, a new warning is popping up when fitting models.

I think the issue happens because the newer versions of the *_join() functions from {dplyr} now have a `multiple` argument, which controls what happens when there are one-to-many matches. By default, multiple matches now emit a warning. I'm not sure why the warning is emitted in the example below, but it is.

library(tidymodels)
workflow(
  preprocessor = mpg ~ .,
  spec = linear_reg(engine = 'glmnet', penalty = tune(), mixture = 1)
) %>% 
  tune_grid(resamples = vfold_cv(mtcars))
#> ! Fold01: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> ! Fold02: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> ! Fold03: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> ! Fold04: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> ! Fold05: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> ! Fold06: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> ! Fold07: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> ! Fold08: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> ! Fold09: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> ! Fold10: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> # Tuning results
#> # 10-fold cross-validation 
#> # A tibble: 10 × 4
#>    splits         id     .metrics          .notes          
#>    <list>         <chr>  <list>            <list>          
#>  1 <split [28/4]> Fold01 <tibble [20 × 5]> <tibble [1 × 3]>
#>  2 <split [28/4]> Fold02 <tibble [20 × 5]> <tibble [1 × 3]>
#>  3 <split [29/3]> Fold03 <tibble [20 × 5]> <tibble [1 × 3]>
#>  4 <split [29/3]> Fold04 <tibble [20 × 5]> <tibble [1 × 3]>
#>  5 <split [29/3]> Fold05 <tibble [20 × 5]> <tibble [1 × 3]>
#>  6 <split [29/3]> Fold06 <tibble [20 × 5]> <tibble [1 × 3]>
#>  7 <split [29/3]> Fold07 <tibble [20 × 5]> <tibble [1 × 3]>
#>  8 <split [29/3]> Fold08 <tibble [20 × 5]> <tibble [1 × 3]>
#>  9 <split [29/3]> Fold09 <tibble [20 × 5]> <tibble [1 × 3]>
#> 10 <split [29/3]> Fold10 <tibble [20 × 5]> <tibble [1 × 3]>
#> 
#> There were issues with some computations:
#> 
#>   - Warning(s) x10: Each row in `x` should match at most 1 row in `y`.
#> 
#> Run `show_notes(.Last.tune.result)` for more information.

library(dplyr)

x <- tibble(id = 1)
y <- tibble(id = c(1, 1))

left_join(x, y)
#> Joining, by = "id"
#> Warning: Each row in `x` should match at most 1 row in `y`.
#> ℹ Row 1 of `x` matches multiple rows.
#> ℹ If multiple matches are expected, specify `multiple = "all"` in the join call
#>   to silence this warning.
#> # A tibble: 2 × 1
#>      id
#>   <dbl>
#> 1     1
#> 2     1

left_join(x, y, multiple = 'all')
#> Joining, by = "id"
#> # A tibble: 2 × 1
#>      id
#>   <dbl>
#> 1     1
#> 2     1
@EmilHvitfeldt
Member

Hello @mattwarkentin 👋 Thanks for the small reprex!

What version of {tune} do you have installed? I'm not able to reproduce your warning with the most recent CRAN version of {tune}.

library(tidymodels)
workflow(
  preprocessor = mpg ~ .,
  spec = linear_reg(engine = 'glmnet', penalty = tune(), mixture = 1)
) %>% 
  tune_grid(resamples = vfold_cv(mtcars))
#> # Tuning results
#> # 10-fold cross-validation 
#> # A tibble: 10 × 4
#>    splits         id     .metrics          .notes          
#>    <list>         <chr>  <list>            <list>          
#>  1 <split [28/4]> Fold01 <tibble [20 × 5]> <tibble [0 × 3]>
#>  2 <split [28/4]> Fold02 <tibble [20 × 5]> <tibble [0 × 3]>
#>  3 <split [29/3]> Fold03 <tibble [20 × 5]> <tibble [0 × 3]>
#>  4 <split [29/3]> Fold04 <tibble [20 × 5]> <tibble [0 × 3]>
#>  5 <split [29/3]> Fold05 <tibble [20 × 5]> <tibble [0 × 3]>
#>  6 <split [29/3]> Fold06 <tibble [20 × 5]> <tibble [0 × 3]>
#>  7 <split [29/3]> Fold07 <tibble [20 × 5]> <tibble [0 × 3]>
#>  8 <split [29/3]> Fold08 <tibble [20 × 5]> <tibble [0 × 3]>
#>  9 <split [29/3]> Fold09 <tibble [20 × 5]> <tibble [0 × 3]>
#> 10 <split [29/3]> Fold10 <tibble [20 × 5]> <tibble [0 × 3]>

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23)
#>  os       macOS Monterey 12.2.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Los_Angeles
#>  date     2022-07-14
#>  pandoc   2.17.1.1 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version    date (UTC) lib source
#>  assertthat     0.2.1      2019-03-21 [1] CRAN (R 4.2.0)
#>  backports      1.4.1      2021-12-13 [1] CRAN (R 4.2.0)
#>  broom        * 1.0.0      2022-07-01 [1] CRAN (R 4.2.0)
#>  class          7.3-20     2022-01-16 [1] CRAN (R 4.2.1)
#>  cli            3.3.0      2022-04-25 [1] CRAN (R 4.2.0)
#>  codetools      0.2-18     2020-11-04 [1] CRAN (R 4.2.1)
#>  colorspace     2.0-3      2022-02-21 [1] CRAN (R 4.2.0)
#>  crayon         1.5.1      2022-03-26 [1] CRAN (R 4.2.0)
#>  DBI            1.1.3      2022-06-18 [1] CRAN (R 4.2.0)
#>  dials        * 1.0.0      2022-06-14 [1] CRAN (R 4.2.0)
#>  DiceDesign     1.9        2021-02-13 [1] CRAN (R 4.2.0)
#>  digest         0.6.29     2021-12-01 [1] CRAN (R 4.2.0)
#>  dplyr        * 1.0.9      2022-04-28 [1] CRAN (R 4.2.0)
#>  ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate       0.15       2022-02-18 [1] CRAN (R 4.2.0)
#>  fansi          1.0.3      2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap        1.1.0      2021-01-25 [1] CRAN (R 4.2.0)
#>  foreach        1.5.2      2022-02-02 [1] CRAN (R 4.2.0)
#>  fs             1.5.2      2021-12-08 [1] CRAN (R 4.2.0)
#>  furrr          0.3.0      2022-05-04 [1] CRAN (R 4.2.0)
#>  future         1.26.1     2022-05-27 [1] CRAN (R 4.2.0)
#>  future.apply   1.9.0      2022-04-25 [1] CRAN (R 4.2.0)
#>  generics       0.1.3      2022-07-05 [1] CRAN (R 4.2.0)
#>  ggplot2      * 3.3.6      2022-05-03 [1] CRAN (R 4.2.0)
#>  glmnet       * 4.1-4      2022-04-15 [1] CRAN (R 4.2.0)
#>  globals        0.15.1     2022-06-24 [1] CRAN (R 4.2.0)
#>  glue           1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
#>  gower          1.0.0      2022-02-03 [1] CRAN (R 4.2.0)
#>  GPfit          1.0-8      2019-02-08 [1] CRAN (R 4.2.0)
#>  gtable         0.3.0      2019-03-25 [1] CRAN (R 4.2.0)
#>  hardhat        1.2.0      2022-06-30 [1] CRAN (R 4.2.1)
#>  highr          0.9        2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools      0.5.2      2021-08-25 [1] CRAN (R 4.2.0)
#>  infer        * 1.0.2      2022-06-10 [1] CRAN (R 4.2.0)
#>  ipred          0.9-13     2022-06-02 [1] CRAN (R 4.2.0)
#>  iterators      1.0.14     2022-02-05 [1] CRAN (R 4.2.0)
#>  knitr          1.39       2022-04-26 [1] CRAN (R 4.2.0)
#>  lattice        0.20-45    2021-09-22 [1] CRAN (R 4.2.1)
#>  lava           1.6.10     2021-09-02 [1] CRAN (R 4.2.0)
#>  lhs            1.1.5      2022-03-22 [1] CRAN (R 4.2.0)
#>  lifecycle      1.0.1      2021-09-24 [1] CRAN (R 4.2.0)
#>  listenv        0.8.0      2019-12-05 [1] CRAN (R 4.2.0)
#>  lubridate      1.8.0      2021-10-07 [1] CRAN (R 4.2.0)
#>  magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
#>  MASS           7.3-57     2022-04-22 [1] CRAN (R 4.2.0)
#>  Matrix       * 1.4-1      2022-03-23 [1] CRAN (R 4.2.1)
#>  modeldata    * 1.0.0      2022-07-01 [1] CRAN (R 4.2.0)
#>  munsell        0.5.0      2018-06-12 [1] CRAN (R 4.2.0)
#>  nnet           7.3-17     2022-01-16 [1] CRAN (R 4.2.1)
#>  parallelly     1.32.0     2022-06-07 [1] CRAN (R 4.2.0)
#>  parsnip      * 1.0.0      2022-06-16 [1] CRAN (R 4.2.0)
#>  pillar         1.7.0      2022-02-01 [1] CRAN (R 4.2.0)
#>  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
#>  prodlim        2019.11.13 2019-11-17 [1] CRAN (R 4.2.0)
#>  purrr        * 0.3.4      2020-04-17 [1] CRAN (R 4.2.0)
#>  R.cache        0.15.0     2021-04-30 [1] CRAN (R 4.2.0)
#>  R.methodsS3    1.8.2      2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo           1.25.0     2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils        2.12.0     2022-06-28 [1] CRAN (R 4.2.0)
#>  R6             2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
#>  Rcpp           1.0.8.3    2022-03-17 [1] CRAN (R 4.2.0)
#>  recipes      * 1.0.1      2022-07-07 [1] CRAN (R 4.2.1)
#>  reprex         2.0.1      2021-08-05 [1] CRAN (R 4.2.0)
#>  rlang          1.0.4      2022-07-12 [1] CRAN (R 4.2.1)
#>  rmarkdown      2.14       2022-04-25 [1] CRAN (R 4.2.0)
#>  rpart          4.1.16     2022-01-24 [1] CRAN (R 4.2.1)
#>  rsample      * 1.0.0      2022-06-24 [1] CRAN (R 4.2.0)
#>  rstudioapi     0.13       2020-11-12 [1] CRAN (R 4.2.0)
#>  scales       * 1.2.0      2022-04-13 [1] CRAN (R 4.2.0)
#>  sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
#>  shape          1.4.6      2021-05-19 [1] CRAN (R 4.2.0)
#>  stringi        1.7.8      2022-07-11 [1] CRAN (R 4.2.1)
#>  stringr        1.4.0      2019-02-10 [1] CRAN (R 4.2.0)
#>  styler         1.7.0      2022-03-13 [1] CRAN (R 4.2.0)
#>  survival       3.3-1      2022-03-03 [1] CRAN (R 4.2.1)
#>  tibble       * 3.1.7      2022-05-03 [1] CRAN (R 4.2.0)
#>  tidymodels   * 1.0.0      2022-07-13 [1] CRAN (R 4.2.1)
#>  tidyr        * 1.2.0      2022-02-01 [1] CRAN (R 4.2.0)
#>  tidyselect     1.1.2      2022-02-21 [1] CRAN (R 4.2.0)
#>  timeDate       3043.102   2018-02-21 [1] CRAN (R 4.2.0)
#>  tune         * 1.0.0      2022-07-07 [1] CRAN (R 4.2.1)
#>  utf8           1.2.2      2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs          0.4.1      2022-04-13 [1] CRAN (R 4.2.0)
#>  withr          2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
#>  workflows    * 1.0.0      2022-07-05 [1] CRAN (R 4.2.0)
#>  workflowsets * 1.0.0      2022-07-12 [1] CRAN (R 4.2.1)
#>  xfun           0.31       2022-05-10 [1] CRAN (R 4.2.0)
#>  yaml           2.3.5      2022-02-21 [1] CRAN (R 4.2.0)
#>  yardstick    * 1.0.0      2022-06-06 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2022-07-14 by the reprex package (v2.0.1)

@simonpcouch
Contributor

Seconding Emil, thanks for the small reprex. :)

Just dropping a note that I'm able to reproduce with dev dplyr:

library(tidymodels)

workflow(
  preprocessor = mpg ~ .,
  spec = linear_reg(engine = 'glmnet', penalty = tune(), mixture = 1)
) %>% 
  tune_grid(resamples = vfold_cv(mtcars))
#> ! Fold01: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> ! Fold02: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> ! Fold03: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> ! Fold04: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> ! Fold05: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> ! Fold06: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> ! Fold07: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> ! Fold08: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> ! Fold09: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> ! Fold10: preprocessor 1/1, model 1/1 (predictions): Each row in `x` should match at most 1 row in `y`.
#> # Tuning results
#> # 10-fold cross-validation 
#> # A tibble: 10 × 4
#>    splits         id     .metrics          .notes          
#>    <list>         <chr>  <list>            <list>          
#>  1 <split [28/4]> Fold01 <tibble [20 × 5]> <tibble [1 × 3]>
#>  2 <split [28/4]> Fold02 <tibble [20 × 5]> <tibble [1 × 3]>
#>  3 <split [29/3]> Fold03 <tibble [20 × 5]> <tibble [1 × 3]>
#>  4 <split [29/3]> Fold04 <tibble [20 × 5]> <tibble [1 × 3]>
#>  5 <split [29/3]> Fold05 <tibble [20 × 5]> <tibble [1 × 3]>
#>  6 <split [29/3]> Fold06 <tibble [20 × 5]> <tibble [1 × 3]>
#>  7 <split [29/3]> Fold07 <tibble [20 × 5]> <tibble [1 × 3]>
#>  8 <split [29/3]> Fold08 <tibble [20 × 5]> <tibble [1 × 3]>
#>  9 <split [29/3]> Fold09 <tibble [20 × 5]> <tibble [1 × 3]>
#> 10 <split [29/3]> Fold10 <tibble [20 × 5]> <tibble [1 × 3]>
#> 
#> There were issues with some computations:
#> 
#>   - Warning(s) x10: Each row in `x` should match at most 1 row in `y`.
#> 
#> Run `show_notes(.Last.tune.result)` for more information.

packageVersion("dplyr")
#> [1] '1.0.99.9000'

Created on 2022-07-15 by the reprex package (v2.0.1)

This behavior comes from dplyr PRs #5910 and #6269.

@EmilHvitfeldt EmilHvitfeldt added the upkeep maintenance, infrastructure, and similar label Jul 15, 2022
@simonpcouch
Contributor

A partial fix is at tidymodels/parsnip#772! Unfortunately, this doesn't actually arise from the tune grid code paths, so this bug might live in many places. It comes from multi_predict methods when predicting across submodels. 🫠
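A minimal sketch of how a one-to-many join can arise when predicting across submodels, assuming (hypothetically) that a multi_predict method reshapes a matrix of per-submodel predictions to long format and then joins a one-row-per-submodel key table onto it. The object names `pred`, `param_key`, and `pred_long` are illustrative, not parsnip's actual internals, and the sketch assumes dplyr >= 1.1.0 (where `multiple` exists) and tidyr >= 1.0.0 (for `pivot_longer()`):

```r
library(dplyr)
library(tidyr)

# Hypothetical submodel predictions: 3 observations (rows) x 2 penalty values (columns).
pred <- tibble(
  .row = 1:3,
  s1 = c(20.1, 21.3, 19.8),
  s2 = c(20.4, 21.0, 19.5)
)

# Hypothetical key table: one row per submodel, mapping column name to penalty.
param_key <- tibble(group = c("s1", "s2"), penalty = c(0.01, 0.1))

# Reshape to long format: one row per (observation, submodel) pair.
pred_long <- pivot_longer(pred, -.row, names_to = "group", values_to = ".pred")

# Each row of `param_key` matches several rows of `pred_long` (one per observation).
# That one-to-many match is expected here, so declare it explicitly.
res <- full_join(param_key, pred_long, by = "group", multiple = "all")
res
```

Each submodel key matches every observation, so the join is one-to-many by design. Declaring that with `multiple = "all"` is exactly what the hint in the dplyr warning suggests for cases where multiple matches are expected.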

Thanks for pointing this out, @mattwarkentin.

@simonpcouch
Contributor

simonpcouch commented Jul 22, 2022

Some notes are at https://gist.github.com/simonpcouch/b47e618fa6ebac6ed4995765169a87bb. This round of join warnings came up in multi_predict methods rather than tuning code paths. The needed PRs have been filed in parsnip, censored, and poissonreg, so I'll go ahead and close here. :)

Related issue at #528.

@github-actions

github-actions bot commented Aug 6, 2022

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Aug 6, 2022

3 participants