Unexpectedly different behavior for factors/dummy variables between parsnip and workflows #326
Yep, we are aware of where the difference arises. Current thoughts on resolving the unexpected differences are in #290.

We addressed the issue with indicator/dummy variables in #319 and tidymodels/workflows#51. Next, we need to address the difference between one-hot encoding and indicator/dummy variables (i.e., the intercept handling).
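The difference between one-hot encoding and indicator (dummy) variables comes down to the intercept. The base-R sketch below is illustrative only and is not code taken from parsnip, workflows, or the linked PRs. It builds both encodings of `Species` in `iris` with `model.matrix()` and shows that combining a full one-hot encoding with an intercept makes the design matrix rank-deficient.

```r
# Indicator (dummy) variables: with an intercept, R's default treatment
# contrasts drop the baseline level ("setosa"), giving k - 1 columns.
dummy <- model.matrix(~ Species, data = iris)
print(colnames(dummy))

# One-hot encoding: removing the intercept gives one column per level (k columns).
one_hot <- model.matrix(~ Species - 1, data = iris)
print(colnames(one_hot))

# Fitting one-hot columns *with* an intercept is rank-deficient: the three
# indicator columns sum to the intercept column, so lm() marks one
# coefficient as aliased and reports it as NA.
one_hot_df <- data.frame(Sepal.Length = iris$Sepal.Length, one_hot)
fit_one_hot <- lm(Sepal.Length ~ ., data = one_hot_df)
print(coef(fit_one_hot))
```

With treatment contrasts, `lm()` estimates `Speciesversicolor` and `Speciesvirginica` relative to the baseline level `setosa`, which matches the two Species coefficients in the reprex output for this issue.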
This has been closed in #332 and tidymodels/workflows#53.

library(parsnip)
lm_spec <- linear_reg() %>%
set_engine(engine = "lm")
lm_spec %>%
fit(Sepal.Length ~ ., data = iris)
#> parsnip model object
#>
#> Fit time: 5ms
#>
#> Call:
#> stats::lm(formula = Sepal.Length ~ ., data = data)
#>
#> Coefficients:
#> (Intercept) Sepal.Width Petal.Length Petal.Width
#> 2.1713 0.4959 0.8292 -0.3152
#> Speciesversicolor Speciesvirginica
#> -0.7236 -1.0235
library(workflows)
workflow() %>%
add_model(lm_spec) %>%
add_formula(Sepal.Length ~ .) %>%
fit(data = iris)
#> ══ Workflow [trained] ═══════════════════════════════════════
#> Preprocessor: Formula
#> Model: linear_reg()
#>
#> ── Preprocessor ─────────────────────────────────────────────
#> Sepal.Length ~ .
#>
#> ── Model ────────────────────────────────────────────────────
#>
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#>
#> Coefficients:
#> (Intercept) Sepal.Width Petal.Length Petal.Width
#> 2.1713 0.4959 0.8292 -0.3152
#> Speciesversicolor Speciesvirginica
#> -0.7236 -1.0235

Created on 2020-07-02 by the reprex package (v0.3.0.9001)
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
When training what seems like the same model (same model specification, same formula, same data) with parsnip and with workflows, it is surprising to get different results. I found this behavior quite unexpected, especially the output from workflows.

Options to reduce user surprise 😮 include making the behavior of these functions clearer in parsnip, in workflows, or both.
Created on 2020-02-06 by the reprex package (v0.3.0)