Unexpectedly different behavior for factors/dummy variables between parsnip and workflows #326

Closed · juliasilge opened this issue on Feb 6, 2020 · 5 comments
Labels: feature (a feature request or enhancement)

@juliasilge (Member)

When training what seems like the same model (same model specification, same formula, same data) with parsnip versus with workflows, it is surprising to get different results. I found this behavior quite unexpected, especially the choice that workflows made.

Some options to reduce user surprise 😮 would be clearer documentation or changed behavior in parsnip, in workflows, or both.

## base R lm() for comparison
lm(Sepal.Length ~ ., iris)
#> 
#> Call:
#> lm(formula = Sepal.Length ~ ., data = iris)
#> 
#> Coefficients:
#>       (Intercept)        Sepal.Width       Petal.Length        Petal.Width  
#>            2.1713             0.4959             0.8292            -0.3152  
#> Speciesversicolor   Speciesvirginica  
#>           -0.7236            -1.0235

library(parsnip)
lm_spec <- linear_reg() %>%
  set_engine(engine = "lm") 

## parsnip version looks the same as lm
lm_spec %>%
  fit(Sepal.Length ~ ., data = iris)
#> parsnip model object
#> 
#> Fit time:  2ms 
#> 
#> Call:
#> stats::lm(formula = formula, data = data)
#> 
#> Coefficients:
#>       (Intercept)        Sepal.Width       Petal.Length        Petal.Width  
#>            2.1713             0.4959             0.8292            -0.3152  
#> Speciesversicolor   Speciesvirginica  
#>           -0.7236            -1.0235

## workflows version has made a different choice about dummy variables
library(workflows)
workflow() %>%
  add_model(lm_spec) %>%
  add_formula(Sepal.Length ~ .) %>%
  fit(data = iris)
#> ══ Workflow [trained] ═══════════════════════════════════════════════════════════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: linear_reg()
#> 
#> ── Preprocessor ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> Sepal.Length ~ .
#> 
#> ── Model ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> 
#> Call:
#> stats::lm(formula = formula, data = data)
#> 
#> Coefficients:
#>       (Intercept)        Sepal.Width       Petal.Length        Petal.Width  
#>            1.1478             0.4959             0.8292            -0.3152  
#>     Speciessetosa  Speciesversicolor   Speciesvirginica  
#>            1.0235             0.2999                 NA

Created on 2020-02-06 by the reprex package (v0.3.0)
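
For reference, the workflows coefficients above look like what you would get by one-hot encoding Species yourself and then calling lm() with an intercept: the design matrix becomes rank-deficient, which is where the NA coefficient comes from. A rough sketch of that equivalence (not part of the reprex above):

## one-hot encode Species (all three levels), then fit with an intercept;
## the design matrix is rank-deficient, so lm() aliases one Species column
iris_onehot <- cbind(
  iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")],
  model.matrix(~ Species + 0, data = iris)
)
lm(Sepal.Length ~ ., data = iris_onehot)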

@LaugeGregers

This comment has been minimized.

@juliasilge (Member, Author)

Yep, we are aware of where the difference arises. Current thoughts on resolving the unexpected differences are in #290.

@juliasilge transferred this issue from tidymodels/parsnip on Jun 5, 2020
@juliasilge (Member, Author) commented on Jun 5, 2020

We addressed the issue with indicator/dummy variables in #319 and tidymodels/workflows#51. Next, we need to address the difference between one-hot encoding and indicator/dummy variables (i.e., the intercept handling).
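
For context, the two encodings differ in how they interact with the intercept: with an intercept present, R's default treatment contrasts drop one factor level (indicator/dummy coding), while one-hot encoding keeps a column for every level. A quick base R illustration of the two design matrices:

## indicator/dummy coding: with an intercept, one Species level is dropped
model.matrix(~ Species, data = iris[c(1, 51, 101), ])

## one-hot coding: with the intercept removed, every Species level gets a column
model.matrix(~ Species + 0, data = iris[c(1, 51, 101), ])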

@juliasilge changed the title from "Document (or change) unexpectedly different behavior for factors/dummy variables between parsnip and workflows" to "Unexpectedly different behavior for factors/dummy variables between parsnip and workflows" on Jun 5, 2020
@juliasilge transferred this issue from tidymodels/hardhat on Jun 5, 2020
@juliasilge added the "feature (a feature request or enhancement)" label on Jun 5, 2020
@juliasilge (Member, Author)

This has been closed in #332 and tidymodels/workflows#53.

library(parsnip)

lm_spec <- linear_reg() %>%
  set_engine(engine = "lm") 

lm_spec %>%
  fit(Sepal.Length ~ ., data = iris)
#> parsnip model object
#> 
#> Fit time:  5ms 
#> 
#> Call:
#> stats::lm(formula = Sepal.Length ~ ., data = data)
#> 
#> Coefficients:
#>       (Intercept)        Sepal.Width       Petal.Length        Petal.Width  
#>            2.1713             0.4959             0.8292            -0.3152  
#> Speciesversicolor   Speciesvirginica  
#>           -0.7236            -1.0235

library(workflows)

workflow() %>%
  add_model(lm_spec) %>%
  add_formula(Sepal.Length ~ .) %>%
  fit(data = iris)
#> ══ Workflow [trained] ═══════════════════════════════════════
#> Preprocessor: Formula
#> Model: linear_reg()
#> 
#> ── Preprocessor ─────────────────────────────────────────────
#> Sepal.Length ~ .
#> 
#> ── Model ────────────────────────────────────────────────────
#> 
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#> 
#> Coefficients:
#>       (Intercept)        Sepal.Width       Petal.Length        Petal.Width  
#>            2.1713             0.4959             0.8292            -0.3152  
#> Speciesversicolor   Speciesvirginica  
#>           -0.7236            -1.0235

Created on 2020-07-02 by the reprex package (v0.3.0.9001)
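
As a quick sanity check that the two fits now agree, the underlying lm coefficients can be compared directly; one way, using workflows' pull_workflow_fit() to get the parsnip fit back out of the trained workflow:

parsnip_fit <- lm_spec %>%
  fit(Sepal.Length ~ ., data = iris)

wf_fit <- workflow() %>%
  add_model(lm_spec) %>%
  add_formula(Sepal.Length ~ .) %>%
  fit(data = iris)

## both $fit slots hold the underlying stats::lm object
all.equal(
  coef(parsnip_fit$fit),
  coef(pull_workflow_fit(wf_fit)$fit)
)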

@github-actions bot commented on Mar 6, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions bot locked and limited the conversation to collaborators on Mar 6, 2021