Invertible transformations #264
I don't know that
My thought is to have a specific step type that can be used to undo a previous step, referring back to its `id`:

```r
recipe(y ~ ., data = dat) %>%
  step_center(a, b, id = "center a and b") %>%
  step_scale(a, b, id = "scale a and b") %>%
  step_impute_method(a, b) %>%
  step_undo(id = "scale a and b") %>%
  step_undo(id = "center a and b") %>%
  step_blab_blah_blah()
```

My thinking is that we create S3 methods for steps that can be inverted, using something along the lines of:

```r
invert <- function(x, ...) {
  UseMethod("invert")
}

invert.step <- function(x, ...) {
  stop("This type of step cannot be inverted.", call. = FALSE)
}

invert.step_sqrt <- function(x, ...) {
  # stuff
}
```
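To make the idea concrete, here is a hypothetical sketch of what one such method could look like for a centering step. The `means` field and the `new_data` argument are assumptions about how the step stores its learned statistics and how the generic would be called, not actual recipes internals:

```r
# Hypothetical sketch: undo centering by adding the stored means
# back to the affected columns. Assumes the prepped step keeps a
# named numeric vector of column means in x$means.
invert.step_center <- function(x, new_data, ...) {
  for (col in names(x$means)) {
    new_data[[col]] <- new_data[[col]] + x$means[[col]]
  }
  new_data
}
```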
Can you give an example where those two pieces of information would be different?
I'm on board. Are you thinking of having a single recipe that different generics can be applied to, or a forward recipe and an undo recipe?
I thought this was the case for PCA but have since realized this was a brain fart.

It would be done at the step level (I think) with the usual recipes/prep/bake workflow.
I think you need a separate undo recipe if this is trying to also solve the problem of back-transforming predictions.

```r
suppressPackageStartupMessages({
  library(AmesHousing)
  library(recipes)
  library(rsample)
  library(parsnip)
})

ames <- make_ames()

split <- initial_split(ames)
train <- training(split)
test <- testing(split)

rec <- recipe(Sale_Price ~ Longitude, data = train) %>%
  step_log(Sale_Price, id = "log")

p_rec <- prep(rec, training = train, retain = TRUE)

model <- linear_reg() %>%
  set_engine("lm") %>%
  fit(Sale_Price ~ Longitude, data = juice(p_rec))

pred <- predict(model, new_data = bake(p_rec, test))
pred
#> # A tibble: 732 x 1
#>    .pred
#>    <dbl>
#>  1  11.9
#>  2  11.9
#>  3  12.0
#>  4  12.0
#>  5  12.0
#>  6  12.0
#>  7  12.0
#>  8  11.9
#>  9  11.9
#> 10  11.9
#> # ... with 722 more rows

# need to undo the log transform on the predictions
```

Once you have
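For this particular reprex the inverse is known, so the predictions can be back-transformed by hand with `exp()`; this manual step is exactly what an undo mechanism would automate (a sketch continuing from the code above):

```r
library(dplyr)

# Manually invert the log transform applied by step_log(Sale_Price, id = "log").
# This is the step that a step_undo()/unbake() mechanism would automate,
# by looking up the inverse of the step with id = "log".
pred_original_scale <- pred %>%
  mutate(.pred = exp(.pred))
```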
I find that the most intuitive as well.

Has there been any progress on this issue?

A little. We have step ID values that we can refer back to. Otherwise no. There probably won't be for a few months.

@topepo, would you review a PR which began to implement

Hi, has this already been implemented? This seems like a crucial step in every modelling framework...

Will this include things like back-transforming an outcome variable if it was log-transformed? This would seem important for assessing models using yardstick, right? You would want metrics in the original scale of the outcome, not the log-transformed scale.
Side note: naive back-transformation of predictions is not in general consistent for conditional means, although it is potentially useful. For back-transformed predictions in the general case you really should use Duan's smearing estimator (which requires the ability to back-transform!).
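For reference, a minimal self-contained sketch of Duan's smearing estimator on simulated data (the data and variable names are purely illustrative): the naive `exp()` of a log-scale prediction estimates the conditional median under log-normal errors and is biased low for the conditional mean, so the smearing estimator rescales it by the average of the exponentiated residuals.

```r
set.seed(1)
x <- runif(500, 1, 10)
y <- exp(1 + 0.5 * x + rnorm(500, sd = 0.4))  # multiplicative errors

fit <- lm(log(y) ~ x)

# Naive back-transformation: biased low for the conditional mean
naive <- exp(predict(fit))

# Duan's smearing estimator: rescale by the mean of the exponentiated
# residuals; requires no distributional assumption on the errors
smear <- mean(exp(residuals(fit)))
corrected <- exp(predict(fit)) * smear
```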
This will eventually be a post-processor in
In essence recipes currently define a forward map. You `fit()` the forward map with `prep()` and apply it to data with `bake()`. The backwards direction should work the same way, i.e. you should just apply some generic like `unbake()` to the data.

But now there are these issues of fidelity in the recovery. Three possibilities: The last two should definitely error or give a warning. This would also require modifying all the `prep()` methods to learn the backwards map at `prep()` time (i.e. when you `prep()` a PCA step, if you support inversion, you now need to save the information associated with the back transform, which is distinct from the information associated with the forward transform).

Side note: these generics have nice analogues in the `sklearn` world: `prep()` is `fit()`, `juice()` is `fit_transform()`, `bake()` is `transform()`, and `unbake()` would be `untransform()`, although I'm not sure how many inverse transformations (if any) are supported in `sklearn`.
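One way an `unbake()` generic along these lines could be wired together with the per-step `invert()` idea from earlier in the thread (every name here is hypothetical, none of it is actual recipes API):

```r
# Hypothetical sketch: unbake() walks a prepped recipe's steps in
# reverse order and applies each step's inverse via an invert()
# generic, which errors for steps that do not support inversion.
unbake <- function(object, new_data, ...) {
  for (step in rev(object$steps)) {
    new_data <- invert(step, new_data)
  }
  new_data
}
```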