A more pipeable fit() interface #33
Comments
|
I think I would call |
|
That's funny, the two alternatives that Hadley and I discussed were: fit(model, data, formula)
fit_xy(model, x, y)
# and
fit(model, data, recipe/formula)
fit(model, data_xy(x, y))
I think that I'm leaning towards adding One issue is that In any case, I'll probably make two branches with each and see what is best (or least worse). |
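To make the first alternative concrete, here is a minimal base-R sketch of a `fit()` / `fit_xy()` pair where the model specification is always the first argument, so both calls pipe. This is not parsnip's implementation: `toy_spec`, `toy_fit`, and both methods are hypothetical names made up for illustration, and the native pipe `|>` assumes R 4.1 or later.

```r
# Hypothetical minimal model specification; not parsnip's actual classes.
toy_spec <- function() structure(list(), class = "toy_spec")

# Two generics: one for the formula interface, one for the x/y interface.
fit <- function(object, ...) UseMethod("fit")
fit_xy <- function(object, ...) UseMethod("fit_xy")

# Formula interface: the spec comes first, so the call is pipeable.
fit.toy_spec <- function(object, formula, data, ...) {
  structure(list(spec = object, fit = lm(formula, data = data)),
            class = "toy_fit")
}

# x/y interface: build the design matrix by hand and call lm.fit().
fit_xy.toy_spec <- function(object, x, y, ...) {
  X <- cbind(`(Intercept)` = 1, as.matrix(x))
  structure(list(spec = object, fit = lm.fit(X, y)),
            class = "toy_fit")
}

f1 <- toy_spec() |> fit(mpg ~ wt + hp, data = mtcars)
f2 <- toy_spec() |> fit_xy(x = mtcars[, c("wt", "hp")], y = mtcars$mpg)
```

Keeping two verbs means neither function has to guess whether its second argument is a formula, a data frame, or a matrix; the cost is one more exported name.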
|
In the case of An alternative design would be to require specification of input and output columns in the model object, and require the transformations needed to be encoded in the data processing, so the user would call This is related to #19 -- I think this design decision needs to be clearly justified and articulated. |
> library(tidymodels)
> library(parsnip)
>
> rf_fit <- rand_forest(mode = "regression") %>%
+ fit(engine = "randomForest", mpg ~ ., data = mtcars)
> names(rf_fit)
[1] "lvl" "spec" "fit" "preproc"
> names(rf_fit$preproc)
[1] "terms" "xlevels" "offset_expr" "options"
I agree that it is neither of these things. We generally don't tune the formula per se.
maybe we could call that a recipe :-)
|
|
Re) An alternative design would be to require specification of input and output columns in the I had a thought while looking at this. Are recipes going to be supported as an input to
fit(model_spec, log(Sales_Price) ~ Latitude + Longitude, data = ames)
recip <- recipe(Sales_Price ~ Latitude + Longitude, data = ames) %>%
step_log(Sales_Price, base = 10)
fit(model_spec, recip) |
|
Originally, the three interfaces for I decoupled them because, if you were doing model tuning, you would want to train the recipe once and then apply the results to many models (or submodels when tuning) without remaking the recipe. So, once you have a trained recipe, the
recip <- recipe(Sales_Price ~ Latitude + Longitude, data = ames) %>%
step_log(Sales_Price, base = 10) %>%
prep(training = ames, retain = TRUE)
rf_fit <-
rf_model_spec %>%
fit(x = juice(recip, all_predictors()), y = juice(recip, all_outcomes()))
glm_fit <-
glm_model_spec %>%
fit(x = juice(recip, all_predictors()), y = juice(recip, all_outcomes()))
# etc.
That was the reason for:
You can still get to a fit object from a recipe (as above), but that will be made really easy using a more general data structure (=pipeline or workflow or protocol or whatever we call it). It is in my nature to try to have a single call to do many things (a la
# something like
my_analysis <- workflow() %>%
add_recipe(recip) %>%
add_model(rf_model_spec) %>%
# pipe in other elements
fit(training = training_set)
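
The "more general data structure" idea above can be sketched in base R as a small container that carries a preprocessing step and a model step and is fitted in one call. Everything here is an assumption for illustration: `toy_workflow`, `add_prep`, `add_model`, and `fit_workflow` are hypothetical names, not functions from parsnip or recipes, and `|>` assumes R 4.1 or later.

```r
# Hypothetical container holding a preprocessing function and a model function.
toy_workflow <- function() {
  structure(list(prep = NULL, model = NULL), class = "toy_workflow")
}

# Each add_* helper returns the modified container, so the calls chain.
add_prep <- function(wf, prep) { wf$prep <- prep; wf }
add_model <- function(wf, model) { wf$model <- model; wf }

# Apply the preprocessing to the training data, then fit the model on the result.
fit_workflow <- function(wf, training) {
  stopifnot(is.function(wf$prep), is.function(wf$model))
  processed <- wf$prep(training)
  wf$model(processed)
}

# Illustrative steps: log-transform the outcome, then fit a linear model.
log_outcome <- function(d) transform(d, mpg = log10(mpg))
lm_model <- function(d) lm(mpg ~ wt + hp, data = d)

my_analysis <- toy_workflow() |>
  add_prep(log_outcome) |>
  add_model(lm_model) |>
  fit_workflow(training = mtcars)
```

Because the preprocessing and the model are stored separately, the same `log_outcome` step could be paired with a different model function without being rewritten.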
|
|
I'm like 80% convinced that |
Yeah. That'll change for the next version. I'm tired of typing it. |
|
There is a |
+1 for this one. |
|
I am a fan of |
I gave a little thought to the fit() interface problem and this is what I came up with. I don't really like the interface arg name but that's just a naming thing.