forge.R
#' Forge prediction-ready data
#'
#' @description
#'
#' `forge()` applies the transformations requested by the specific `blueprint`
#' on a set of `new_data`. This `new_data` contains new predictors
#' (and potentially outcomes) that will be used to generate predictions.
#'
#' All blueprints return values with the same structure, but each is
#' unique enough to have its own help page. Follow the links below to learn
#' how to use each one in conjunction with `forge()`.
#'
#' * XY Method - [default_xy_blueprint()]
#'
#' * Formula Method - [default_formula_blueprint()]
#'
#' * Recipes Method - [default_recipe_blueprint()]
#'
#' @details
#'
#' If the outcomes are present in `new_data`, they can optionally be processed
#' and returned in the `outcomes` slot of the returned list by setting
#' `outcomes = TRUE`. This is useful when doing cross-validation, where
#' you need to preprocess the outcomes of a test set before computing
#' performance.
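#'
#' As a minimal sketch of this behavior, assuming the default formula
#' blueprint and the built-in `iris` data (the object names `processed`,
#' `forged`, and `forged_with_y` are illustrative only):
#'
#' ```r
#' library(hardhat)
#'
#' train <- iris[1:100, ]
#' test <- iris[101:150, ]
#'
#' # Learn the preprocessing from the training set
#' processed <- mold(Sepal.Width ~ Species, train)
#'
#' # Default: only the predictors are processed, so per the return value
#' # documented below the `outcomes` element is `NULL`
#' forged <- forge(test, processed$blueprint)
#' is.null(forged$outcomes)
#'
#' # With `outcomes = TRUE`, the outcomes found in `test` are processed too
#' forged_with_y <- forge(test, processed$blueprint, outcomes = TRUE)
#' forged_with_y$outcomes
#' ```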
#'
#' @param new_data A data frame or matrix of predictors to process. If
#' `outcomes = TRUE`, this should also contain the outcomes to process.
#'
#' @param blueprint A preprocessing `blueprint`.
#'
#' @param outcomes A logical. Should the outcomes be processed and returned
#' as well?
#'
#' @param ... Not used.
#'
#' @return
#'
#' A named list with 3 elements:
#'
#' - `predictors`: A tibble containing the preprocessed
#'   `new_data` predictors.
#'
#' - `outcomes`: If `outcomes = TRUE`, a tibble containing the preprocessed
#'   outcomes found in `new_data`. Otherwise, `NULL`.
#'
#' - `extras`: Either `NULL` if the blueprint returns no extra information,
#'   or a named list containing the extra information.
#'
#' @examples
#' # See the blueprint specific documentation linked above
#' # for various ways to call forge with different
#' # blueprints.
#'
#' train <- iris[1:100, ]
#' test <- iris[101:150, ]
#'
#' # Formula
#' processed <- mold(
#'   log(Sepal.Width) ~ Species,
#'   train,
#'   blueprint = default_formula_blueprint(indicators = "none")
#' )
#'
#' forge(test, processed$blueprint, outcomes = TRUE)
#' @export
forge <- function(new_data, blueprint, ..., outcomes = FALSE) {
  UseMethod("forge")
}

#' @export
forge.default <- function(new_data, blueprint, ..., outcomes = FALSE) {
  glubort("The class of `new_data`, '{class1(new_data)}', is not recognized.")
}

#' @export
forge.data.frame <- function(new_data, blueprint, ..., outcomes = FALSE) {
  check_dots_empty0(...)
  check_blueprint(blueprint)

  run_forge(
    blueprint,
    new_data = new_data,
    outcomes = outcomes
  )
}

#' @export
forge.matrix <- forge.data.frame

# ------------------------------------------------------------------------------

#' `forge()` according to a blueprint
#'
#' @description
#' This is a developer-facing function that is _only_ used if you are creating
#' your own blueprint subclass. It is called from [forge()] and dispatches off
#' the S3 class of the `blueprint`. This gives you an opportunity to forge the
#' new data in a way that is specific to your blueprint.
#'
#' `run_forge()` is always called from `forge()` with the same arguments, unlike
#' [run_mold()], because there aren't different interfaces for calling
#' `forge()`. `run_forge()` is always called as:
#'
#' `run_forge(blueprint, new_data = new_data, outcomes = outcomes)`
#'
#' If you write a blueprint subclass for [new_xy_blueprint()],
#' [new_recipe_blueprint()], [new_formula_blueprint()], or [new_blueprint()],
#' then your `run_forge()` method signature must match this.
#'
#' @inheritParams forge
#'
#' @return
#' `run_forge()` methods return the object that is then immediately returned
#' from `forge()`. See the return value section of [forge()] for the required
#' structure of that object.
#'
#' @name run-forge
#' @order 1
#' @export
#' @examples
#' bp <- default_xy_blueprint()
#'
#' outcomes <- mtcars["mpg"]
#' predictors <- mtcars
#' predictors$mpg <- NULL
#'
#' mold <- run_mold(bp, x = predictors, y = outcomes)
#'
#' run_forge(mold$blueprint, new_data = predictors)
run_forge <- function(blueprint,
                      new_data,
                      ...,
                      outcomes = FALSE) {
  UseMethod("run_forge")
}

#' @export
run_forge.default <- function(blueprint,
                              new_data,
                              ...,
                              outcomes = FALSE) {
  class <- class(blueprint)[[1L]]
  message <- glue("No `run_forge()` method provided for an object of type <{class}>.")
  abort(message)
}