juice() should be able to return a 0 column data frame rather than abort() #298

@DavisVaughan

I think it would be appropriate for juice() to return a 0 column tibble rather than abort when you try to use a selector that doesn't match any columns. This would match the behavior of dplyr::select() and would be useful for me in hardhat.

suppressPackageStartupMessages(library(recipes))

rec <- recipe(~ Sepal.Width, iris) %>%
  prep(iris)

juice(rec, all_predictors())
#> # A tibble: 150 x 1
#>    Sepal.Width
#>          <dbl>
#>  1         3.5
#>  2         3  
#>  3         3.2
#>  4         3.1
#>  5         3.6
#>  6         3.9
#>  7         3.4
#>  8         3.4
#>  9         2.9
#> 10         3.1
#> # … with 140 more rows

# should return tibble with 0 cols and 150 rows
juice(rec, all_outcomes())
#> Error: No variables or terms were selected.

If you look at dplyr::select(), a misspelled column name is an error, but a selector that matches 0 columns is fine.

dplyr::select(iris, "x")
#> Error: Unknown column `x`

dplyr::select(iris, dplyr::matches("x"))
#> data frame with 0 columns and 150 rows

juice() would still error if someone called juice(rec, non_existent_column).

I think we can get this behavior by simply removing the abort from terms_select() here:
https://github.com/tidymodels/recipes/blob/master/R/selections.R#L196

That is a pretty commonly used function though, so maybe we'd want a version that is strict (the current behavior), and a version that isn't as strict (the suggested behavior).
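
To make the strict / non-strict idea concrete, here is a rough base R sketch. It is not the recipes implementation: `select_matching()` and its `strict` argument are hypothetical names invented for illustration, and it matches column names with a regular expression rather than going through `terms_select()`.

```r
# Hypothetical helper, not part of recipes: selects the columns of `data`
# whose names match `pattern`. The `strict` flag controls what happens
# when nothing matches.
select_matching <- function(data, pattern, strict = TRUE) {
  hits <- grep(pattern, names(data), value = TRUE)

  if (length(hits) == 0L && strict) {
    # Strict mode mirrors the current error message
    stop("No variables or terms were selected.", call. = FALSE)
  }

  # drop = FALSE keeps a data frame even with zero or one column,
  # so the row count survives an empty selection
  data[, hits, drop = FALSE]
}

# Non-strict: an empty selection gives 0 columns but keeps all rows
lenient <- select_matching(iris, "^x$", strict = FALSE)
dim(lenient)

# Strict (the default): the same empty selection is an error
strict_result <- tryCatch(
  select_matching(iris, "^x$"),
  error = function(e) conditionMessage(e)
)
strict_result
```

With something along these lines, juice() could call the non-strict variant while other callers keep the strict default.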
