3 changes: 1 addition & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -29,7 +29,7 @@ Imports:
tidyr,
globals
Roxygen: list(markdown = TRUE)
RoxygenNote: 6.1.0.9000
RoxygenNote: 6.1.1
Suggests:
testthat,
knitr,
@@ -39,4 +39,3 @@ Suggests:
xgboost,
covr,
sparklyr

6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,5 +1,9 @@
# parsnip 0.0.1.9000

## New Features

* A "null model" is now available that fits a predictor-free model (using the mean of the outcome for regression or the mode for classification).
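
As a rough base-R illustration of what "predictor-free" means here (this is not parsnip's implementation; `null_fit()` and `null_predict()` are hypothetical helpers invented for this sketch), the fitted value is the outcome mean for a numeric outcome and the most frequent level otherwise:

```r
# Hypothetical helpers for illustration only; not part of parsnip.
null_fit <- function(y) {
  if (is.numeric(y)) {
    # Regression: the single fitted value is the mean of the outcome
    list(mode = "regression", value = mean(y, na.rm = TRUE))
  } else {
    # Classification: the single fitted value is the most frequent level
    counts <- table(y)
    list(mode = "classification", value = names(counts)[which.max(counts)])
  }
}

# Every new observation receives the same prediction
null_predict <- function(fit, n) rep(fit$value, n)

reg <- null_fit(mtcars$mpg)
cls <- null_fit(factor(c("a", "b", "b")))
null_predict(cls, 3)
```

A model like this is mainly useful as a baseline: any model that uses predictors should outperform it.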

## Other Changes

* `varying_args()` now has a `full` argument to control whether the full set
@@ -26,6 +30,8 @@ column names once (#107).
* For multinomial regression using glmnet, `multi_predict()` now pulls the
correct default penalty (#108).



# parsnip 0.0.1

First CRAN release
10 changes: 5 additions & 5 deletions R/README.md
@@ -33,27 +33,27 @@ list(

`func` describes the function call (instead of having it in open code). `protect` identifies the arguments that the user should _not_ be allowed to modify, and `defaults` is a list of values that should be set but the user _can_ override.

To create the model fit call, the `protect` arguments are populated with the appropriate objects (usually from the data set), and `rlang::call2` is used to create a call that can be executed. The `translate` function can be used to show the call prototype if there is a need to see it (e.g., for debugging).
To create the model fit call, the `protect` arguments are populated with the appropriate objects (usually from the data set), and `rlang::call2` is used to create a call that can be executed. The `translate()` function can be used to show the call prototype if there is a need to see it (e.g., for debugging).

In the chunk above, the value of the `family` object is quoted (i.e., `expr(binomial)`). If it is not quoted, R will evaluate the option when the package is compiled, and the full function definition of the binomial family object will be embedded into the model call. Arguments are frequently quoted when making the call so that data objects, or objects that don't exist when the package is compiled, are not embedded (see also the Environments section below).
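
The difference between a quoted and an unquoted default can be sketched with `rlang::call2()` directly. The `spec` list below is a hypothetical stand-in that only mirrors the `func`/`protect`/`defaults` structure described above; it is not copied from the package:

```r
library(rlang)

# Hypothetical specification mirroring the structure described above
spec <- list(
  func = c(pkg = "stats", fun = "glm"),
  protect = c("formula", "data"),
  defaults = list(family = expr(binomial))  # quoted: stored as a symbol
)

# Protected arguments are filled with symbols, not with the objects themselves
protected <- list(formula = expr(formula), data = expr(data))

quoted_call <- call2(
  spec$func[["fun"]],
  !!!protected,
  !!!spec$defaults,
  .ns = spec$func[["pkg"]]
)

# Without quoting, the function object itself is embedded in the call
unquoted_call <- call2("glm", family = binomial, .ns = "stats")

is.symbol(quoted_call[["family"]])      # the call holds only the name
is.function(unquoted_call[["family"]])  # the call holds the full definition
```

Printing `unquoted_call` shows the entire body of `binomial()` inlined into the call, which is the behavior the quoting is meant to avoid.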

Additional notes:

* In cases where the model fit function is not a single function call, a wrapper function can be written to deal with this. See `parsnip::keras_mlp` and `parsnip::xgb_train`. This usually triggers package dependencies though.
* The `defaults` argument is not the only place to set defaults. The `translate` method for a model specification gets the last word on arguments. It is also a good place to deal with common argument errors and to make defaults based on the _mode_ of the model (e.g. classification or regression).
* In cases where the model fit function is not a single function call, a wrapper function can be written to deal with this. See `parsnip::keras_mlp()` and `parsnip::xgb_train()`. This usually triggers package dependencies though.
* The `defaults` argument is not the only place to set defaults. The `translate()` method for a model specification gets the last word on arguments. It is also a good place to deal with common argument errors and to make defaults based on the _mode_ of the model (e.g. classification or regression).
* Users can also pass in quoted arguments

## Environments

One of the first things that the `fit` function does is to make a new environment and store the data set and associated objects. For example:
One of the first things that the `fit()` function does is to make a new environment and store the data set and associated objects. For example:

```r
eval_env <- rlang::env()
eval_env$data <- data
eval_env$formula <- formula
```

This is designed to avoid any issues when executing the call object on the data using `eval_tidy`.
This is designed to avoid any issues when executing the call object on the data using `eval_tidy()`.

Any quoted arguments (such as the `family` example given above) are evaluated in this environment just before the model call is evaluated. If a user passes in an argument such as `floor(nrow(data)/3)`, it will be evaluated at this time in the captured environment.
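
A minimal sketch of this evaluation order, assuming a plain `stats::lm()` call and the built-in `mtcars` data as stand-ins (the names `user_arg`, `min_n`, and `fit_call` are illustrative and do not come from the package):

```r
library(rlang)

# Store the data and formula in a dedicated environment, as described above
eval_env <- env()
eval_env$data <- mtcars
eval_env$formula <- mpg ~ wt

# A user-supplied argument captured as an expression rather than a value
user_arg <- expr(floor(nrow(data) / 3))

# The expression is resolved against the objects stored in eval_env
min_n <- eval_tidy(user_arg, env = eval_env)

# The model call refers to `formula` and `data` by name only
fit_call <- call2("lm", formula = expr(formula), data = expr(data), .ns = "stats")
fit <- eval_tidy(fit_call, env = eval_env)
```

Because `data` is looked up in `eval_env`, the same call object can be reused against a different data set simply by replacing `eval_env$data`.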

6 changes: 3 additions & 3 deletions R/arguments.R
@@ -67,15 +67,15 @@ check_eng_args <- function(args, obj, core_args) {

#' Change elements of a model specification
#'
#' `set_args` can be used to modify the arguments of a model specification while
#' `set_mode` is used to change the model's mode.
#' `set_args()` can be used to modify the arguments of a model specification while
#' `set_mode()` is used to change the model's mode.
#'
#' @param object A model specification.
#' @param ... One or more named model arguments.
#' @param mode A character string for the model type (e.g. "classification" or
#' "regression")
#' @return An updated model object.
#' @details `set_args` will replace existing values of the arguments.
#' @details `set_args()` will replace existing values of the arguments.
#'
#' @examples
#' rand_forest()
14 changes: 7 additions & 7 deletions R/boost_tree.R
@@ -2,7 +2,7 @@

#' General Interface for Boosted Trees
#'
#' `boost_tree` is a way to generate a _specification_ of a model
#' `boost_tree()` is a way to generate a _specification_ of a model
#' before fitting and allows the model to be created using
#' different packages in R or via Spark. The main arguments for the
#' model are:
@@ -22,9 +22,9 @@
#' }
#' These arguments are converted to their specific names at the
#' time that the model is fit. Other options and argument can be
#' set using the `set_engine` function. If left to their defaults
#' set using the `set_engine()` function. If left to their defaults
#' here (`NULL`), the values are taken from the underlying model
#' functions. If parameters need to be modified, `update` can be used
#' functions. If parameters need to be modified, `update()` can be used
#' in lieu of recreating the object from scratch.
#'
#' @param mode A single character string for the type of model.
@@ -48,7 +48,7 @@
#' each iteration while `C5.0` samples once during traning.
#' @details
#' The data given to the function are not saved and are only used
#' to determine the _mode_ of the model. For `boost_tree`, the
#' to determine the _mode_ of the model. For `boost_tree()`, the
#' possible modes are "regression" and "classification".
#'
#' The model can be created using the `fit()` function using the
@@ -87,13 +87,13 @@
#'
#' @note For models created using the spark engine, there are
#' several differences to consider. First, only the formula
#' interface to via `fit` is available; using `fit_xy` will
#' interface to via `fit()` is available; using `fit_xy()` will
#' generate an error. Second, the predictions will always be in a
#' spark table format. The names will be the same as documented but
#' without the dots. Third, there is no equivalent to factor
#' columns in spark tables so class predictions are returned as
#' character columns. Fourth, to retain the model object for a new
#' R session (via `save`), the `model$fit` element of the `parsnip`
#' R session (via `save()`), the `model$fit` element of the `parsnip`
#' object should be serialized via `ml_save(object$fit)` and
#' separately saved to disk. In a new session, the object can be
#' reloaded and reattached to the `parsnip` object.
@@ -149,7 +149,7 @@ print.boost_tree <- function(x, ...) {
#' @export
#' @inheritParams boost_tree
#' @param object A boosted tree model specification.
#' @param ... Not used for `update`.
#' @param ... Not used for `update()`.
#' @param fresh A logical for whether the arguments should be
#' modified in-place of or replaced wholesale.
#' @return An updated model specification.
4 changes: 2 additions & 2 deletions R/convert_data.R
@@ -64,7 +64,7 @@ convert_form_to_xy_fit <-function(

w <- as.vector(model.weights(mod_frame))
if (!is.null(w) && !is.numeric(w))
stop("'weights' must be a numeric vector", call. = FALSE)
stop("`weights` must be a numeric vector", call. = FALSE)

offset <- as.vector(model.offset(mod_frame))
if (!is.null(offset)) {
@@ -219,7 +219,7 @@ convert_xy_to_form_fit <- function(x, y, weights = NULL, y_name = "..y") {

if (!is.null(weights)) {
if (!is.numeric(weights))
stop("'weights' must be a numeric vector", call. = FALSE)
stop("`weights` must be a numeric vector", call. = FALSE)
if (length(weights) != nrow(x))
stop("`weights` should have ", nrow(x), " elements", call. = FALSE)
}
12 changes: 6 additions & 6 deletions R/decision_tree.R
@@ -2,7 +2,7 @@

#' General Interface for Decision Tree Models
#'
#' `decision_tree` is a way to generate a _specification_ of a model
#' `decision_tree()` is a way to generate a _specification_ of a model
#' before fitting and allows the model to be created using
#' different packages in R or via Spark. The main arguments for the
#' model are:
@@ -16,9 +16,9 @@
#' }
#' These arguments are converted to their specific names at the
#' time that the model is fit. Other options and argument can be
#' set using `set_engine`. If left to their defaults
#' set using `set_engine()`. If left to their defaults
#' here (`NULL`), the values are taken from the underlying model
#' functions. If parameters need to be modified, `update` can be used
#' functions. If parameters need to be modified, `update()` can be used
#' in lieu of recreating the object from scratch.
#'
#' @inheritParams boost_tree
@@ -72,13 +72,13 @@
#'
#' @note For models created using the spark engine, there are
#' several differences to consider. First, only the formula
#' interface to via `fit` is available; using `fit_xy` will
#' interface to via `fit()` is available; using `fit_xy()` will
#' generate an error. Second, the predictions will always be in a
#' spark table format. The names will be the same as documented but
#' without the dots. Third, there is no equivalent to factor
#' columns in spark tables so class predictions are returned as
#' character columns. Fourth, to retain the model object for a new
#' R session (via `save`), the `model$fit` element of the `parsnip`
#' R session (via `save()`), the `model$fit` element of the `parsnip`
#' object should be serialized via `ml_save(object$fit)` and
#' separately saved to disk. In a new session, the object can be
#' reloaded and reattached to the `parsnip` object.
@@ -112,7 +112,7 @@ decision_tree <-

#' @export
print.decision_tree <- function(x, ...) {
cat("Random Forest Model Specification (", x$mode, ")\n\n", sep = "")
cat("Decision Tree Model Specification (", x$mode, ")\n\n", sep = "")
model_printer(x, ...)

if(!is.null(x$method$fit$args)) {
2 changes: 1 addition & 1 deletion R/engines.R
@@ -77,7 +77,7 @@ load_libs <- function(x, quiet, attach = FALSE) {

#' Declare a computational engine and specific arguments
#'
#' `set_engine` is used to specify which package or system will be used
#' `set_engine()` is used to specify which package or system will be used
#' to fit the model, along with any arguments specific to that software.
#'
#' @param object A model specification.
28 changes: 14 additions & 14 deletions R/fit.R
@@ -5,7 +5,7 @@

#' Fit a Model Specification to a Dataset
#'
#' `fit` and `fit_xy` take a model specification, translate the required
#' `fit()` and `fit_xy()` take a model specification, translate the required
#' code by substituting arguments, and execute the model fit
#' routine.
#'
@@ -22,25 +22,25 @@
#' `catch`. See [fit_control()].
#' @param ... Not currently used; values passed here will be
#' ignored. Other options required to fit the model should be
#' passed using `set_engine`.
#' @details `fit` and `fit_xy` substitute the current arguments in the model
#' passed using `set_engine()`.
#' @details `fit()` and `fit_xy()` substitute the current arguments in the model
#' specification into the computational engine's code, checks them
#' for validity, then fits the model using the data and the
#' engine-specific code. Different model functions have different
#' interfaces (e.g. formula or `x`/`y`) and these functions translate
#' between the interface used when `fit` or `fit_xy` were invoked and the one
#' between the interface used when `fit()` or `fit_xy()` were invoked and the one
#' required by the underlying model.
#'
#' When possible, these functions attempt to avoid making copies of the
#' data. For example, if the underlying model uses a formula and
#' `fit` is invoked, the original data are references
#' `fit()` is invoked, the original data are references
#' when the model is fit. However, if the underlying model uses
#' something else, such as `x`/`y`, the formula is evaluated and
#' the data are converted to the required format. In this case, any
#' calls in the resulting model objects reference the temporary
#' objects used to fit the model.
#' @examples
#' # Although `glm` only has a formula interface, different
#' # Although `glm()` only has a formula interface, different
#' # methods for specifying the model can be used
#'
#' library(dplyr)
@@ -94,10 +94,10 @@ fit.model_spec <-
) {
dots <- quos(...)
if (any(names(dots) == "engine"))
stop("Use `set_engine` to supply the engine.", call. = FALSE)
stop("Use `set_engine()` to supply the engine.", call. = FALSE)

if (all(c("x", "y") %in% names(dots)))
stop("`fit.model_spec` is for the formula methods. Use `fit_xy` instead.",
stop("`fit.model_spec()` is for the formula methods. Use `fit_xy()` instead.",
call. = FALSE)
cl <- match.call(expand.dots = TRUE)
# Create an environment with the evaluated argument objects. This will be
@@ -111,7 +111,7 @@ fit.model_spec <-

if (object$engine == "spark" && !inherits(eval_env$data, "tbl_spark"))
stop(
"spark objects can only be used with the formula interface to `fit` ",
"spark objects can only be used with the formula interface to `fit()` ",
"with a spark data object.", call. = FALSE
)

@@ -178,7 +178,7 @@ fit_xy.model_spec <-
) {
dots <- quos(...)
if (any(names(dots) == "engine"))
stop("Use `set_engine` to supply the engine.", call. = FALSE)
stop("Use `set_engine()` to supply the engine.", call. = FALSE)

cl <- match.call(expand.dots = TRUE)
eval_env <- rlang::env()
@@ -188,7 +188,7 @@ fit_xy.model_spec <-

if (object$engine == "spark")
stop(
"spark objects can only be used with the formula interface to `fit` ",
"spark objects can only be used with the formula interface to `fit()` ",
"with a spark data object.", call. = FALSE
)

@@ -305,7 +305,7 @@ check_interface <- function(formula, data, cl, model) {
inher(formula, "formula", cl)
inher(data, c("data.frame", "tbl_spark"), cl)

# Determine the `fit` interface
# Determine the `fit()` interface
form_interface <- !is.null(formula) & !is.null(data)

if (form_interface)
@@ -322,10 +322,10 @@ check_xy_interface <- function(x, y, cl, model) {

# rule out spark data sets that don't use the formula interface
if (inherits(x, "tbl_spark") | inherits(y, "tbl_spark"))
stop("spark objects can only be used with the formula interface via `fit` ",
stop("spark objects can only be used with the formula interface via `fit()` ",
"with a spark data object.", call. = FALSE)

# Determine the `fit` interface
# Determine the `fit()` interface
matrix_interface <- !is.null(x) & !is.null(y) && is.matrix(x)
df_interface <- !is.null(x) & !is.null(y) && is.data.frame(x)

4 changes: 2 additions & 2 deletions R/fit_helpers.R
@@ -1,5 +1,5 @@
# These functions are the go-betweens between parsnip::fit (or parsnip::fit_xy)
# and the underlying model function (such as ranger::ranger). So if `fit_xy` is
# and the underlying model function (such as ranger::ranger). So if `fit_xy()` is
# used to fit a ranger model, there needs to be a conversion from x/y format
# data to formula/data objects and so on.

@@ -66,7 +66,7 @@ form_form <-
xy_xy <- function(object, env, control, target = "none", ...) {

if (inherits(env$x, "tbl_spark") | inherits(env$y, "tbl_spark"))
stop("spark objects can only be used with the formula interface to `fit`",
stop("spark objects can only be used with the formula interface to `fit()`",
call. = FALSE)

object <- check_mode(object, levels(env$y))
16 changes: 8 additions & 8 deletions R/linear_reg.R
@@ -1,6 +1,6 @@
#' General Interface for Linear Regression Models
#'
#' `linear_reg` is a way to generate a _specification_ of a model
#' `linear_reg()` is a way to generate a _specification_ of a model
#' before fitting and allows the model to be created using
#' different packages in R, Stan, keras, or via Spark. The main
#' arguments for the model are:
@@ -12,9 +12,9 @@
#' }
#' These arguments are converted to their specific names at the
#' time that the model is fit. Other options and argument can be
#' set using `set_engine`. If left to their defaults
#' set using `set_engine()`. If left to their defaults
#' here (`NULL`), the values are taken from the underlying model
#' functions. If parameters need to be modified, `update` can be used
#' functions. If parameters need to be modified, `update()` can be used
#' in lieu of recreating the object from scratch.
#' @inheritParams boost_tree
#' @param mode A single character string for the type of model.
@@ -30,7 +30,7 @@
#' (the lasso) (`glmnet` and `spark` only).
#' @details
#' The data given to the function are not saved and are only used
#' to determine the _mode_ of the model. For `linear_reg`, the
#' to determine the _mode_ of the model. For `linear_reg()`, the
#' mode will always be "regression".
#'
#' The model can be created using the `fit()` function using the
@@ -71,11 +71,11 @@
#' When using `glmnet` models, there is the option to pass
#' multiple values (or no values) to the `penalty` argument.
#' This can have an effect on the model object results. When using
#' the `predict` method in these cases, the return object type
#' the `predict()` method in these cases, the return object type
#' depends on the value of `penalty`. If a single value is
#' given, the results will be a simple numeric vector. When
#' multiple values or no values for `penalty` are used in
#' `linear_reg`, the `predict` method will return a data frame with
#' `linear_reg()`, the `predict()` method will return a data frame with
#' columns `values` and `lambda`.
#'
#' For prediction, the `stan` engine can compute posterior
@@ -87,13 +87,13 @@
#'
#' @note For models created using the spark engine, there are
#' several differences to consider. First, only the formula
#' interface to via `fit` is available; using `fit_xy` will
#' interface to via `fit()` is available; using `fit_xy()` will
#' generate an error. Second, the predictions will always be in a
#' spark table format. The names will be the same as documented but
#' without the dots. Third, there is no equivalent to factor
#' columns in spark tables so class predictions are returned as
#' character columns. Fourth, to retain the model object for a new
#' R session (via `save`), the `model$fit` element of the `parsnip`
#' R session (via `save()`), the `model$fit` element of the `parsnip`
#' object should be serialized via `ml_save(object$fit)` and
#' separately saved to disk. In a new session, the object can be
#' reloaded and reattached to the `parsnip` object.