A more pipeable fit() interface #33
I think I would call …
That's funny, the two alternatives that Hadley and I discussed were:

fit(model, data, formula)
fit_xy(model, x, y)

# and

fit(model, data, recipe/formula)
fit(model, data_xy(x, y))

I think that I'm leaning towards adding … One issue is that … In any case, I'll probably make two branches with each and see what is best (or least worse).
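For concreteness, here is a sketch of how each pair of calls above might look with mtcars. None of this is settled parsnip API: the argument orders are taken from the comment as written, and data_xy() is a hypothetical helper that does not exist.

library(parsnip)

spec <- rand_forest(mode = "regression")

# alternative 1: a dedicated fit_xy() verb for the x/y interface
fit(spec, mtcars, mpg ~ .)
fit_xy(spec, x = mtcars[, -1], y = mtcars$mpg)

# alternative 2: a single fit() verb; x/y input is wrapped by a
# hypothetical data_xy() helper
fit(spec, mtcars, mpg ~ .)
fit(spec, data_xy(x = mtcars[, -1], y = mtcars$mpg))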
In the case of …

An alternative design would be to require specification of input and output columns in the model object, and require the transformations needed to be encoded in the data processing, so the user would call …

This is related to #19 -- I think this design decision needs to be clearly justified and articulated.
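Purely as an illustration of that alternative (nothing like it exists in parsnip), the calls might look something like the sketch below; set_columns() and ames_processed are hypothetical names invented here.

library(parsnip)

# hypothetical: the model specification names the columns it will consume
# and produce, and every transformation lives in the preprocessing step
spec <- rand_forest(mode = "regression") %>%
  set_columns(outcome = "log_sale_price",              # set_columns() is hypothetical
              predictors = c("Latitude", "Longitude"))

# preprocessing happens up front (log transform already applied), so the
# fit call needs no formula or recipe at all
fit(spec, data = ames_processed)                       # ames_processed: preprocessed ames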
> library(tidymodels)
> library(parsnip)
>
> rf_fit <- rand_forest(mode = "regression") %>%
+ fit(engine = "randomForest", mpg ~ ., data = mtcars)
> names(rf_fit)
[1] "lvl" "spec" "fit" "preproc"
> names(rf_fit$preproc)
[1] "terms" "xlevels" "offset_expr" "options"
I agree that it is neither of these things. We generally don't tune the formula per se.
maybe we could call that a recipe :-)
Re: "An alternative design would be to require specification of input and output columns in the …" -- I had a thought while looking at this. Are recipes going to be supported as an input to fit()?

fit(model_spec, log(Sales_Price) ~ Latitude + Longitude, data = ames)

recip <- recipe(Sales_Price ~ Latitude + Longitude, data = ames) %>%
  step_log(Sales_Price, base = 10)

fit(model_spec, recip)
Originally, the three interfaces for … I decoupled them because, if you were doing model tuning, you would want to train the recipe once and then apply the results to many models (or submodels when tuning) without remaking the recipe. So, once you have a trained recipe, the …

recip <- recipe(Sales_Price ~ Latitude + Longitude, data = ames) %>%
  step_log(Sales_Price, base = 10) %>%
  prep(training = ames, retain = TRUE)

rf_fit <-
  rf_model_spec %>%
  fit(x = juice(recip, all_predictors()), y = juice(recip, all_outcomes()))

glm_fit <-
  glm_model_spec %>%
  fit(x = juice(recip, all_predictors()), y = juice(recip, all_outcomes()))

# etc.

That was the reason for: …
You can still get to a fit object from a recipe (as above), but that will be made really easy using a more general data structure (= pipeline or workflow or protocol or whatever we call it). It is in my nature to try to have a single call to do many things (a la …).

# something like
my_analysis <- workflow() %>%
  add_recipe(recip) %>%
  add_model(rf_model_spec) %>%
  # pipe in other elements
  fit(training = training_set)
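Sticking with that sketch (workflow(), add_recipe(), add_model(), and fit(training = ...) are all still hypothetical verbs here), the decoupling argument above carries over: the same recipe could be reused with a second model spec. The predict() method and the testing_set object in the last lines are assumptions as well.

# hypothetical: reuse the same preprocessing with a different model spec
my_glm_analysis <- workflow() %>%
  add_recipe(recip) %>%
  add_model(glm_model_spec) %>%
  fit(training = training_set)

# assumed: score new data with the fitted object
predict(my_glm_analysis, new_data = testing_set)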
😲 very cool! Definitely a balance of wanting to make that … I'm like 80% convinced that …
Yeah. That'll change for the next version. I'm tired of typing it.
There is a …
+1 for this one.
I am a fan of …
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
I gave a little thought to the fit() interface problem and this is what I came up with. I don't really like the interface arg name but that's just a naming thing.
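The code that comment refers to is not preserved in this thread. Purely as a hypothetical illustration of what an interface argument could select between (not the author's actual proposal), the calls might look like:

library(parsnip)

# hypothetical sketch only -- the original proposal is not shown above
spec <- rand_forest(mode = "regression")

# dispatch chosen explicitly via an interface argument
fit(spec, mpg ~ ., data = mtcars, interface = "formula")
fit(spec, x = mtcars[, -1], y = mtcars$mpg, interface = "matrix")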