diff --git a/DESCRIPTION b/DESCRIPTION
index 56455e72a..61788129c 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -29,7 +29,7 @@ Imports:
     tidyr,
     globals
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 6.1.0.9000
+RoxygenNote: 6.1.1
 Suggests:
     testthat,
     knitr,
@@ -39,4 +39,3 @@ Suggests:
     xgboost,
     covr,
     sparklyr
-
diff --git a/NEWS.md b/NEWS.md
index 3ab24e94c..b2e05433a 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,5 +1,9 @@
 # parsnip 0.0.1.9000

+## New Features
+
+* A "null model" is now available that fits a predictor-free model (using the mean of the outcome for regression or the mode for classification).
+
 ## Other Changes

 * `varying_args()` now has a `full` argument to control whether the full set
@@ -26,6 +30,8 @@
 column names once (#107).
 * For multinomial regression using glmnet, `multi_predict()` now pulls the
 correct default penalty (#108).
+
+
 # parsnip 0.0.1

 First CRAN release
diff --git a/R/README.md b/R/README.md
index 3d4e0c7b9..8fd2a7aae 100644
--- a/R/README.md
+++ b/R/README.md
@@ -33,19 +33,19 @@ list(

 `func` describes the function call (instead of having it in open code). `protect` identifies the arguments that the user should _not_ be allowed to modify, and `defaults` is a list of values that should be set but the user _can_ override.

-To create the model fit call, the `protect` arguments are populated with the appropriate objects (usually from the data set), and `rlang::call2` is used to create a call that can be executed. The `translate` function can be used to show the call prototype if there is need to see it (or debugging).
+To create the model fit call, the `protect` arguments are populated with the appropriate objects (usually from the data set), and `rlang::call2` is used to create a call that can be executed. The `translate()` function can be used to show the call prototype if there is need to see it (or debugging).

 In the chunk above, the value of the `family` object is quoted (i.e., `expr(binomial)`). If this is not quotes, R will execute the value of the option when the package is compiled. In this case, the full function definition of the binomial family object will be embedded into the model call. Arguments are frequently quoted when making the call so that data objects or objects that don't exist when the package is compiled will not be embedded. (also see the enviromnets section below)

 Additional notes:

- * In cases where the model fit function is not a single function call, a wrapper function can be written to deal with this. See `parsnip::keras_mlp` and `parsnip::xgb_train`. this usually triggers package dependencies though.
- * The `defaults` argument is not the only place to set defaults. The `translate` method for an model specification gets the last word on arguments. It is also a good place to deal with common argument errors and to make defaults based on the _mode_ of the model (e.g. classification or regression).
+ * In cases where the model fit function is not a single function call, a wrapper function can be written to deal with this. See `parsnip::keras_mlp()` and `parsnip::xgb_train()`. this usually triggers package dependencies though.
+ * The `defaults` argument is not the only place to set defaults. The `translate()` method for an model specification gets the last word on arguments. It is also a good place to deal with common argument errors and to make defaults based on the _mode_ of the model (e.g. classification or regression).
  * Users can also pass in quoted arguments

 ## Environments

-One of the first things that the `fit` function does is to make a new environment and store the data set and associated objects. For example:
+One of the first things that the `fit()` function does is to make a new environment and store the data set and associated objects. For example:

 ```r
 eval_env <- rlang::env()
@@ -53,7 +53,7 @@ eval_env$data <- data
 eval_env$formula <- formula
 ```

-This is designed to avoid any issues when executing the call object on the data using `eval_tidy`.
+This is designed to avoid any issues when executing the call object on the data using `eval_tidy()`.

 Any quoted arguments (such as the `family` example given above) are evaluated in this environment just before the model call is evaluated. For a user passes in an argument that is `floor(nrow(data)/3)`, this will be evaluated at this time in the captured environment.

diff --git a/R/arguments.R b/R/arguments.R
index 4db44be42..f32c741c3 100644
--- a/R/arguments.R
+++ b/R/arguments.R
@@ -67,15 +67,15 @@ check_eng_args <- function(args, obj, core_args) {

 #' Change elements of a model specification
 #'
-#' `set_args` can be used to modify the arguments of a model specification while
-#' `set_mode` is used to change the model's mode.
+#' `set_args()` can be used to modify the arguments of a model specification while
+#' `set_mode()` is used to change the model's mode.
 #'
 #' @param object A model specification.
 #' @param ... One or more named model arguments.
 #' @param mode A character string for the model type (e.g. "classification" or
 #' "regression")
 #' @return An updated model object.
-#' @details `set_args` will replace existing values of the arguments.
+#' @details `set_args()` will replace existing values of the arguments.
 #'
 #' @examples
 #' rand_forest()
diff --git a/R/boost_tree.R b/R/boost_tree.R
index d2253701f..0547b9bab 100644
--- a/R/boost_tree.R
+++ b/R/boost_tree.R
@@ -2,7 +2,7 @@

 #' General Interface for Boosted Trees
 #'
-#' `boost_tree` is a way to generate a _specification_ of a model
+#' `boost_tree()` is a way to generate a _specification_ of a model
 #' before fitting and allows the model to be created using
 #' different packages in R or via Spark. The main arguments for the
 #' model are:
@@ -22,9 +22,9 @@
 #' }
 #' These arguments are converted to their specific names at the
 #' time that the model is fit. Other options and argument can be
-#' set using the `set_engine` function. If left to their defaults
+#' set using the `set_engine()` function. If left to their defaults
 #' here (`NULL`), the values are taken from the underlying model
-#' functions. If parameters need to be modified, `update` can be used
+#' functions. If parameters need to be modified, `update()` can be used
 #' in lieu of recreating the object from scratch.
 #'
 #' @param mode A single character string for the type of model.
@@ -48,7 +48,7 @@
 #' each iteration while `C5.0` samples once during traning.
 #' @details
 #' The data given to the function are not saved and are only used
-#' to determine the _mode_ of the model. For `boost_tree`, the
+#' to determine the _mode_ of the model. For `boost_tree()`, the
 #' possible modes are "regression" and "classification".
 #'
 #' The model can be created using the `fit()` function using the
@@ -87,13 +87,13 @@
 #'
 #' @note For models created using the spark engine, there are
 #' several differences to consider. First, only the formula
-#' interface to via `fit` is available; using `fit_xy` will
+#' interface to via `fit()` is available; using `fit_xy()` will
 #' generate an error. Second, the predictions will always be in a
 #' spark table format. The names will be the same as documented but
 #' without the dots. Third, there is no equivalent to factor
 #' columns in spark tables so class predictions are returned as
 #' character columns. Fourth, to retain the model object for a new
-#' R session (via `save`), the `model$fit` element of the `parsnip`
+#' R session (via `save()`), the `model$fit` element of the `parsnip`
 #' object should be serialized via `ml_save(object$fit)` and
 #' separately saved to disk. In a new session, the object can be
 #' reloaded and reattached to the `parsnip` object.
@@ -149,7 +149,7 @@ print.boost_tree <- function(x, ...) {
 #' @export
 #' @inheritParams boost_tree
 #' @param object A boosted tree model specification.
-#' @param ... Not used for `update`.
+#' @param ... Not used for `update()`.
 #' @param fresh A logical for whether the arguments should be
 #' modified in-place of or replaced wholesale.
 #' @return An updated model specification.
diff --git a/R/convert_data.R b/R/convert_data.R
index dbd6603cf..382be08ec 100644
--- a/R/convert_data.R
+++ b/R/convert_data.R
@@ -64,7 +64,7 @@ convert_form_to_xy_fit <-function(

   w <- as.vector(model.weights(mod_frame))
   if (!is.null(w) && !is.numeric(w))
-    stop("'weights' must be a numeric vector", call. = FALSE)
+    stop("`weights` must be a numeric vector", call. = FALSE)

   offset <- as.vector(model.offset(mod_frame))
   if (!is.null(offset)) {
@@ -219,7 +219,7 @@ convert_xy_to_form_fit <- function(x, y, weights = NULL, y_name = "..y") {

   if (!is.null(weights)) {
     if (!is.numeric(weights))
-      stop("'weights' must be a numeric vector", call. = FALSE)
+      stop("`weights` must be a numeric vector", call. = FALSE)
     if (length(weights) != nrow(x))
       stop("`weights` should have ", nrow(x), " elements", call. = FALSE)
   }
diff --git a/R/decision_tree.R b/R/decision_tree.R
index 7f14e5c30..67f597e6c 100644
--- a/R/decision_tree.R
+++ b/R/decision_tree.R
@@ -2,7 +2,7 @@

 #' General Interface for Decision Tree Models
 #'
-#' `decision_tree` is a way to generate a _specification_ of a model
+#' `decision_tree()` is a way to generate a _specification_ of a model
 #' before fitting and allows the model to be created using
 #' different packages in R or via Spark. The main arguments for the
 #' model are:
@@ -16,9 +16,9 @@
 #' }
 #' These arguments are converted to their specific names at the
 #' time that the model is fit. Other options and argument can be
-#' set using `set_engine`. If left to their defaults
+#' set using `set_engine()`. If left to their defaults
 #' here (`NULL`), the values are taken from the underlying model
-#' functions. If parameters need to be modified, `update` can be used
+#' functions. If parameters need to be modified, `update()` can be used
 #' in lieu of recreating the object from scratch.
 #'
 #' @inheritParams boost_tree
@@ -72,13 +72,13 @@
 #'
 #' @note For models created using the spark engine, there are
 #' several differences to consider. First, only the formula
-#' interface to via `fit` is available; using `fit_xy` will
+#' interface to via `fit()` is available; using `fit_xy()` will
 #' generate an error. Second, the predictions will always be in a
 #' spark table format. The names will be the same as documented but
 #' without the dots. Third, there is no equivalent to factor
 #' columns in spark tables so class predictions are returned as
 #' character columns. Fourth, to retain the model object for a new
-#' R session (via `save`), the `model$fit` element of the `parsnip`
+#' R session (via `save()`), the `model$fit` element of the `parsnip`
 #' object should be serialized via `ml_save(object$fit)` and
 #' separately saved to disk. In a new session, the object can be
 #' reloaded and reattached to the `parsnip` object.
@@ -112,7 +112,7 @@ decision_tree <-

 #' @export
 print.decision_tree <- function(x, ...) {
-  cat("Random Forest Model Specification (", x$mode, ")\n\n", sep = "")
+  cat("Decision Tree Model Specification (", x$mode, ")\n\n", sep = "")
   model_printer(x, ...)

   if(!is.null(x$method$fit$args)) {
diff --git a/R/engines.R b/R/engines.R
index 049b4e85c..a1b32de5f 100644
--- a/R/engines.R
+++ b/R/engines.R
@@ -77,7 +77,7 @@ load_libs <- function(x, quiet, attach = FALSE) {

 #' Declare a computational engine and specific arguments
 #'
-#' `set_engine` is used to specify which package or system will be used
+#' `set_engine()` is used to specify which package or system will be used
 #' to fit the model, along with any arguments specific to that software.
 #'
 #' @param object A model specification.
diff --git a/R/fit.R b/R/fit.R
index 2f42309e0..594b9d21a 100644
--- a/R/fit.R
+++ b/R/fit.R
@@ -5,7 +5,7 @@

 #' Fit a Model Specification to a Dataset
 #'
-#' `fit` and `fit_xy` take a model specification, translate the required
+#' `fit()` and `fit_xy()` take a model specification, translate the required
 #' code by substituting arguments, and execute the model fit
 #' routine.
 #'
@@ -22,25 +22,25 @@
 #' `catch`. See [fit_control()].
 #' @param ... Not currently used; values passed here will be
 #' ignored. Other options required to fit the model should be
-#' passed using `set_engine`.
-#' @details `fit` and `fit_xy` substitute the current arguments in the model
+#' passed using `set_engine()`.
+#' @details `fit()` and `fit_xy()` substitute the current arguments in the model
 #' specification into the computational engine's code, checks them
 #' for validity, then fits the model using the data and the
 #' engine-specific code. Different model functions have different
 #' interfaces (e.g. formula or `x`/`y`) and these functions translate
-#' between the interface used when `fit` or `fit_xy` were invoked and the one
+#' between the interface used when `fit()` or `fit_xy()` were invoked and the one
 #' required by the underlying model.
 #'
 #' When possible, these functions attempt to avoid making copies of the
 #' data. For example, if the underlying model uses a formula and
-#' `fit` is invoked, the original data are references
+#' `fit()` is invoked, the original data are references
 #' when the model is fit. However, if the underlying model uses
 #' something else, such as `x`/`y`, the formula is evaluated and
 #' the data are converted to the required format. In this case, any
 #' calls in the resulting model objects reference the temporary
 #' objects used to fit the model.
 #' @examples
-#' # Although `glm` only has a formula interface, different
+#' # Although `glm()` only has a formula interface, different
 #' # methods for specifying the model can be used
 #'
 #' library(dplyr)
@@ -94,10 +94,10 @@ fit.model_spec <-
   ) {
     dots <- quos(...)
     if (any(names(dots) == "engine"))
-      stop("Use `set_engine` to supply the engine.", call. = FALSE)
+      stop("Use `set_engine()` to supply the engine.", call. = FALSE)

     if (all(c("x", "y") %in% names(dots)))
-      stop("`fit.model_spec` is for the formula methods. Use `fit_xy` instead.",
+      stop("`fit.model_spec()` is for the formula methods. Use `fit_xy()` instead.",
            call. = FALSE)
     cl <- match.call(expand.dots = TRUE)
     # Create an environment with the evaluated argument objects. This will be
@@ -111,7 +111,7 @@ fit.model_spec <-

     if (object$engine == "spark" && !inherits(eval_env$data, "tbl_spark"))
       stop(
-        "spark objects can only be used with the formula interface to `fit` ",
+        "spark objects can only be used with the formula interface to `fit()` ",
         "with a spark data object.",
         call. = FALSE
       )
@@ -178,7 +178,7 @@ fit_xy.model_spec <-
   ) {
     dots <- quos(...)
     if (any(names(dots) == "engine"))
-      stop("Use `set_engine` to supply the engine.", call. = FALSE)
+      stop("Use `set_engine()` to supply the engine.", call. = FALSE)

     cl <- match.call(expand.dots = TRUE)
     eval_env <- rlang::env()
@@ -188,7 +188,7 @@ fit_xy.model_spec <-

     if (object$engine == "spark")
       stop(
-        "spark objects can only be used with the formula interface to `fit` ",
+        "spark objects can only be used with the formula interface to `fit()` ",
         "with a spark data object.",
         call. = FALSE
       )
@@ -305,7 +305,7 @@ check_interface <- function(formula, data, cl, model) {
   inher(formula, "formula", cl)
   inher(data, c("data.frame", "tbl_spark"), cl)

-  # Determine the `fit` interface
+  # Determine the `fit()` interface
   form_interface <- !is.null(formula) & !is.null(data)

   if (form_interface)
@@ -322,10 +322,10 @@ check_xy_interface <- function(x, y, cl, model) {

   # rule out spark data sets that don't use the formula interface
   if (inherits(x, "tbl_spark") | inherits(y, "tbl_spark"))
-    stop("spark objects can only be used with the formula interface via `fit` ",
+    stop("spark objects can only be used with the formula interface via `fit()` ",
          "with a spark data object.", call. = FALSE)

-  # Determine the `fit` interface
+  # Determine the `fit()` interface
   matrix_interface <- !is.null(x) & !is.null(y) && is.matrix(x)
   df_interface <- !is.null(x) & !is.null(y) && is.data.frame(x)

diff --git a/R/fit_helpers.R b/R/fit_helpers.R
index fce3d77bf..23f8cb9e0 100644
--- a/R/fit_helpers.R
+++ b/R/fit_helpers.R
@@ -1,5 +1,5 @@
 # These functions are the go-betweens between parsnip::fit (or parsnip::fit_xy)
-# and the underlying model function (such as ranger::ranger). So if `fit_xy` is
+# and the underlying model function (such as ranger::ranger). So if `fit_xy()` is
 # used to fit a ranger model, there needs to be a conversion from x/y format
 # data to formula/data objects and so on.

@@ -66,7 +66,7 @@ form_form <-
 xy_xy <- function(object, env, control, target = "none", ...) {

   if (inherits(env$x, "tbl_spark") | inherits(env$y, "tbl_spark"))
-    stop("spark objects can only be used with the formula interface to `fit`",
+    stop("spark objects can only be used with the formula interface to `fit()`",
          call. = FALSE)

   object <- check_mode(object, levels(env$y))
diff --git a/R/linear_reg.R b/R/linear_reg.R
index b4b716212..456f4bc9c 100644
--- a/R/linear_reg.R
+++ b/R/linear_reg.R
@@ -1,6 +1,6 @@
 #' General Interface for Linear Regression Models
 #'
-#' `linear_reg` is a way to generate a _specification_ of a model
+#' `linear_reg()` is a way to generate a _specification_ of a model
 #' before fitting and allows the model to be created using
 #' different packages in R, Stan, keras, or via Spark. The main
 #' arguments for the model are:
@@ -12,9 +12,9 @@
 #' }
 #' These arguments are converted to their specific names at the
 #' time that the model is fit. Other options and argument can be
-#' set using `set_engine`. If left to their defaults
+#' set using `set_engine()`. If left to their defaults
 #' here (`NULL`), the values are taken from the underlying model
-#' functions. If parameters need to be modified, `update` can be used
+#' functions. If parameters need to be modified, `update()` can be used
 #' in lieu of recreating the object from scratch.
 #' @inheritParams boost_tree
 #' @param mode A single character string for the type of model.
@@ -30,7 +30,7 @@
 #' (the lasso) (`glmnet` and `spark` only).
 #' @details
 #' The data given to the function are not saved and are only used
-#' to determine the _mode_ of the model. For `linear_reg`, the
+#' to determine the _mode_ of the model. For `linear_reg()`, the
 #' mode will always be "regression".
 #'
 #' The model can be created using the `fit()` function using the
@@ -71,11 +71,11 @@
 #' When using `glmnet` models, there is the option to pass
 #' multiple values (or no values) to the `penalty` argument.
 #' This can have an effect on the model object results. When using
-#' the `predict` method in these cases, the return object type
+#' the `predict()` method in these cases, the return object type
 #' depends on the value of `penalty`. If a single value is
 #' given, the results will be a simple numeric vector. When
 #' multiple values or no values for `penalty` are used in
-#' `linear_reg`, the `predict` method will return a data frame with
+#' `linear_reg()`, the `predict()` method will return a data frame with
 #' columns `values` and `lambda`.
 #'
 #' For prediction, the `stan` engine can compute posterior
@@ -87,13 +87,13 @@
 #'
 #' @note For models created using the spark engine, there are
 #' several differences to consider. First, only the formula
-#' interface to via `fit` is available; using `fit_xy` will
+#' interface to via `fit()` is available; using `fit_xy()` will
 #' generate an error. Second, the predictions will always be in a
 #' spark table format. The names will be the same as documented but
 #' without the dots. Third, there is no equivalent to factor
 #' columns in spark tables so class predictions are returned as
 #' character columns. Fourth, to retain the model object for a new
-#' R session (via `save`), the `model$fit` element of the `parsnip`
+#' R session (via `save()`), the `model$fit` element of the `parsnip`
 #' object should be serialized via `ml_save(object$fit)` and
 #' separately saved to disk. In a new session, the object can be
 #' reloaded and reattached to the `parsnip` object.
diff --git a/R/logistic_reg.R b/R/logistic_reg.R
index 01cbb0334..10a4df1db 100644
--- a/R/logistic_reg.R
+++ b/R/logistic_reg.R
@@ -1,6 +1,6 @@
 #' General Interface for Logistic Regression Models
 #'
-#' `logistic_reg` is a way to generate a _specification_ of a model
+#' `logistic_reg()` is a way to generate a _specification_ of a model
 #' before fitting and allows the model to be created using
 #' different packages in R, Stan, keras, or via Spark. The main
 #' arguments for the model are:
@@ -12,9 +12,9 @@
 #' }
 #' These arguments are converted to their specific names at the
 #' time that the model is fit. Other options and argument can be
-#' set using `set_engine`. If left to their defaults
+#' set using `set_engine()`. If left to their defaults
 #' here (`NULL`), the values are taken from the underlying model
-#' functions. If parameters need to be modified, `update` can be used
+#' functions. If parameters need to be modified, `update()` can be used
 #' in lieu of recreating the object from scratch.
 #' @inheritParams boost_tree
 #' @param mode A single character string for the type of model.
@@ -29,7 +29,7 @@
 #' L2 penalty (i.e. weight decay, or ridge regression) versus L1
 #' (the lasso) (`glmnet` and `spark` only).
 #' @details
-#' For `logistic_reg`, the mode will always be "classification".
+#' For `logistic_reg()`, the mode will always be "classification".
 #'
 #' The model can be created using the `fit()` function using the
 #' following _engines_:
@@ -69,11 +69,11 @@
 #' When using `glmnet` models, there is the option to pass
 #' multiple values (or no values) to the `penalty` argument.
 #' This can have an effect on the model object results. When using
-#' the `predict` method in these cases, the return object type
+#' the `predict()` method in these cases, the return object type
 #' depends on the value of `penalty`. If a single value is
 #' given, the results will be a simple numeric vector. When
 #' multiple values or no values for `penalty` are used in
-#' `logistic_reg`, the `predict` method will return a data frame with
+#' `logistic_reg()`, the `predict()` method will return a data frame with
 #' columns `values` and `lambda`.
 #'
 #' For prediction, the `stan` engine can compute posterior
@@ -86,13 +86,13 @@
 #'
 #' @note For models created using the spark engine, there are
 #' several differences to consider. First, only the formula
-#' interface to via `fit` is available; using `fit_xy` will
+#' interface to via `fit()` is available; using `fit_xy()` will
 #' generate an error. Second, the predictions will always be in a
 #' spark table format. The names will be the same as documented but
 #' without the dots. Third, there is no equivalent to factor
 #' columns in spark tables so class predictions are returned as
 #' character columns. Fourth, to retain the model object for a new
-#' R session (via `save`), the `model$fit` element of the `parsnip`
+#' R session (via `save()`), the `model$fit` element of the `parsnip`
 #' object should be serialized via `ml_save(object$fit)` and
 #' separately saved to disk. In a new session, the object can be
 #' reloaded and reattached to the `parsnip` object.
diff --git a/R/mars.R b/R/mars.R
index c9171d9a5..1d85ec4fb 100644
--- a/R/mars.R
+++ b/R/mars.R
@@ -2,7 +2,7 @@
 #'
 #' General Interface for MARS
 #'
-#' `mars` is a way to generate a _specification_ of a model before
+#' `mars()` is a way to generate a _specification_ of a model before
 #' fitting and allows the model to be created using R. The main
 #' arguments for the
 #' model are:
@@ -17,9 +17,9 @@
 #' }
 #' These arguments are converted to their specific names at the
 #' time that the model is fit. Other options and argument can be
-#' set using `set_engine`. If left to their defaults
+#' set using `set_engine()`. If left to their defaults
 #' here (`NULL`), the values are taken from the underlying model
-#' functions. If parameters need to be modified, `update` can be used
+#' functions. If parameters need to be modified, `update()` can be used
 #' in lieu of recreating the object from scratch.
 #'
 #' @inheritParams boost_tree
diff --git a/R/misc.R b/R/misc.R
index ff1d77f77..ef59119d9 100644
--- a/R/misc.R
+++ b/R/misc.R
@@ -18,7 +18,7 @@ make_classes <- function(prefix) {
 check_empty_ellipse <- function (...) {
   terms <- quos(...)
   if (!is_empty(terms))
-    stop("Please pass other arguments to the model function via `set_engine`", call. = FALSE)
+    stop("Please pass other arguments to the model function via `set_engine()`", call. = FALSE)
   terms
 }

diff --git a/R/mlp.R b/R/mlp.R
index 8706a46b6..9d037b138 100644
--- a/R/mlp.R
+++ b/R/mlp.R
@@ -1,6 +1,6 @@
 #' General Interface for Single Layer Neural Network
 #'
-#' `mlp`, for multilayer perceptron, is a way to generate a _specification_ of
+#' `mlp()`, for multilayer perceptron, is a way to generate a _specification_ of
 #' a model before fitting and allows the model to be created using
 #' different packages in R or via keras The main arguments for the
 #' model are:
@@ -18,13 +18,13 @@
 #'
 #' These arguments are converted to their specific names at the
 #' time that the model is fit. Other options and argument can be
-#' set using `set_engine`. If left to their defaults
+#' set using `set_engine()`. If left to their defaults
 #' here (see above), the values are taken from the underlying model
 #' functions. One exception is `hidden_units` when `nnet::nnet` is used; that
 #' function's `size` argument has no default so a value of 5 units will be
 #' used. Also, unless otherwise specified, the `linout` argument to
-#' `nnet::nnet` will be set to `TRUE` when a regression model is created.
-#' If parameters need to be modified, `update` can be used
+#' `nnet::nnet()` will be set to `TRUE` when a regression model is created.
+#' If parameters need to be modified, `update()` can be used
 #' in lieu of recreating the object from scratch.
 #'
 #' @inheritParams boost_tree
diff --git a/R/mlp_data.R b/R/mlp_data.R
index 4eac89dcc..724035e7e 100644
--- a/R/mlp_data.R
+++ b/R/mlp_data.R
@@ -143,7 +143,7 @@ mlp_nnet_data <-

 class2ind <- function (x, drop2nd = FALSE) {
   if (!is.factor(x))
-    stop("'x' should be a factor")
+    stop("`x` should be a factor")
   y <- model.matrix( ~ x - 1)
   colnames(y) <- gsub("^x", "", colnames(y))
   attributes(y)$assign <- NULL
diff --git a/R/multinom_reg.R b/R/multinom_reg.R
index 444f5667e..33fc37918 100644
--- a/R/multinom_reg.R
+++ b/R/multinom_reg.R
@@ -1,6 +1,6 @@
 #' General Interface for Multinomial Regression Models
 #'
-#' `multinom_reg` is a way to generate a _specification_ of a model
+#' `multinom_reg()` is a way to generate a _specification_ of a model
 #' before fitting and allows the model to be created using
 #' different packages in R, keras, or Spark. The main arguments for the
 #' model are:
@@ -12,9 +12,9 @@
 #' }
 #' These arguments are converted to their specific names at the
 #' time that the model is fit. Other options and argument can be
-#' set using `set_engine`. If left to their defaults
+#' set using `set_engine()`. If left to their defaults
 #' here (`NULL`), the values are taken from the underlying model
-#' functions. If parameters need to be modified, `update` can be used
+#' functions. If parameters need to be modified, `update()` can be used
 #' in lieu of recreating the object from scratch.
 #' @inheritParams boost_tree
 #' @param mode A single character string for the type of model.
@@ -29,7 +29,7 @@
 #' L2 penalty (i.e. weight decay, or ridge regression) versus L1
 #' (the lasso) (`glmnet` only).
 #' @details
-#' For `multinom_reg`, the mode will always be "classification".
+#' For `multinom_reg()`, the mode will always be "classification".
 #'
 #' The model can be created using the `fit()` function using the
 #' following _engines_:
@@ -60,22 +60,22 @@
 #' When using `glmnet` models, there is the option to pass
 #' multiple values (or no values) to the `penalty` argument.
 #' This can have an effect on the model object results. When using
-#' the `predict` method in these cases, the return object type
+#' the `predict()` method in these cases, the return object type
 #' depends on the value of `penalty`. If a single value is
 #' given, the results will be a simple numeric vector. When
 #' multiple values or no values for `penalty` are used in
-#' `multinom_reg`, the `predict` method will return a data frame with
+#' `multinom_reg()`, the `predict()` method will return a data frame with
 #' columns `values` and `lambda`.
 #'
 #' @note For models created using the spark engine, there are
 #' several differences to consider. First, only the formula
-#' interface to via `fit` is available; using `fit_xy` will
+#' interface to via `fit()` is available; using `fit_xy()` will
 #' generate an error. Second, the predictions will always be in a
 #' spark table format. The names will be the same as documented but
 #' without the dots. Third, there is no equivalent to factor
 #' columns in spark tables so class predictions are returned as
 #' character columns. Fourth, to retain the model object for a new
-#' R session (via `save`), the `model$fit` element of the `parsnip`
+#' R session (via `save()`), the `model$fit` element of the `parsnip`
 #' object should be serialized via `ml_save(object$fit)` and
 #' separately saved to disk. In a new session, the object can be
 #' reloaded and reattached to the `parsnip` object.
@@ -223,7 +223,7 @@ predict._multnet <-

     if (length(penalty) != 1)
       stop("`penalty` should be a single numeric value. ",
-           "`multi_predict` can be used to get multiple predictions ",
+           "`multi_predict()` can be used to get multiple predictions ",
            "per row of data.", call. = FALSE)
     object$spec <- eval_args(object$spec)
     res <- predict.model_fit(
@@ -299,9 +299,9 @@ multi_predict._multnet <-
 check_glmnet_lambda <- function(dat, object) {
   if (length(object$fit$lambda) > 1)
     stop(
-      "`predict` doesn't work with multiple penalties (i.e. lambdas). ",
+      "`predict()` doesn't work with multiple penalties (i.e. lambdas). ",
       "Please specify a single value using `penalty = some_value` or use ",
-      "`multi_predict` to get multiple predictions per row of data.",
+      "`multi_predict()` to get multiple predictions per row of data.",
       call. = FALSE
     )
   dat
diff --git a/R/nearest_neighbor.R b/R/nearest_neighbor.R
index b85c16a9c..816e0843d 100644
--- a/R/nearest_neighbor.R
+++ b/R/nearest_neighbor.R
@@ -19,7 +19,7 @@
 #' }
 #' These arguments are converted to their specific names at the
 #' time that the model is fit. Other options and argument can be
-#' set using `set_engine`. If left to their defaults
+#' set using `set_engine()`. If left to their defaults
 #' here (`NULL`), the values are taken from the underlying model
 #' functions. If parameters need to be modified, `update()` can be used
 #' in lieu of recreating the object from scratch.
diff --git a/R/nullmodel.R b/R/nullmodel.R
index 84ec41cc6..c37ecfead 100644
--- a/R/nullmodel.R
+++ b/R/nullmodel.R
@@ -1,8 +1,9 @@
 #' Fit a simple, non-informative model
 #'
-#' Fit a single mean or largest class model
+#' Fit a single mean or largest class model. `nullmodel()` is the underlying
+#' computational function for the `null_model()` specification.
 #'
-#' \code{nullmodel} emulates other model building functions, but returns the
+#' `nullmodel()` emulates other model building functions, but returns the
 #' simplest model possible given a training set: a single mean for numeric
 #' outcomes and the most prevalent class for factor outcomes. When class
 #' probabilities are requested, the percentage of the training set samples with
@@ -19,7 +20,7 @@
 #' the number of predictions to return)
 #' @param type Either "raw" (for regression), "class" or "prob" (for
 #' classification)
-#' @return The output of \code{nullmodel} is a list of class \code{nullmodel}
+#' @return The output of `nullmodel()` is a list of class \code{nullmodel}
 #' with elements \item{call }{the function call} \item{value }{the mean of
 #' \code{y} or the most prevalent class} \item{levels }{when \code{y} is a
 #' factor, a vector of levels. \code{NULL} otherwise} \item{pct }{when \code{y}
@@ -28,7 +29,7 @@
 #' the training samples with that class (the other columns are zero). } \item{n
 #' }{the number of elements in \code{y}}
 #'
-#' \code{predict.nullmodel} returns a either a factor or numeric vector
+#' `predict.nullmodel()` returns a either a factor or numeric vector
 #' depending on the class of \code{y}. All predictions are always the same.
 #' @keywords models
 #' @examples
@@ -110,7 +111,7 @@ predict.nullmodel <- function (object, new_data = NULL, type = NULL, ...) {
       out <- factor(rep(object$value, n), levels = object$levels)
     }
   } else {
-    if(type %in% c("prob", "class")) stop("ony raw predicitons are applicable to regression models")
+    if(type %in% c("prob", "class")) stop("Only numeric predicitons are applicable to regression models")
     if(length(object$value) == 1) {
       out <- rep(object$value, n)
     } else {
@@ -125,7 +126,7 @@ predict.nullmodel <- function (object, new_data = NULL, type = NULL, ...) {

 #' General Interface for null models
 #'
-#' `null_model` is a way to generate a _specification_ of a model before
+#' `null_model()` is a way to generate a _specification_ of a model before
 #' fitting and allows the model to be created using R. It doens't have any
 #' main arguments.
 #'
diff --git a/R/predict.R b/R/predict.R
index 6e3656bca..7b498bc6b 100644
--- a/R/predict.R
+++ b/R/predict.R
@@ -1,14 +1,14 @@
 #' Model predictions
 #'
 #' Apply a model to create different types of predictions.
-#' `predict` can be used for all types of models and used the
+#' `predict()` can be used for all types of models and used the
 #' "type" argument for more specificity.
 #'
 #' @param object An object of class `model_fit`
 #' @param new_data A rectangular data object, such as a data frame.
 #' @param type A single character value or `NULL`. Possible values
 #' are "numeric", "class", "prob", "conf_int", "pred_int", "quantile",
-#' or "raw". When `NULL`, `predict` will choose an appropriate value
+#' or "raw". When `NULL`, `predict()` will choose an appropriate value
 #' based on the model's mode.
 #' @param opts A list of optional arguments to the underlying
 #' predict function that will be used when `type = "raw"`. The
@@ -17,11 +17,11 @@
 #' @param ... Ignored. To pass arguments to pass to the underlying
 #' function when `predict.model_fit(type = "raw")`,
 #' use the `opts` argument.
-#' @details If "type" is not supplied to `predict`, then a choice
+#' @details If "type" is not supplied to `predict()`, then a choice
 #' is made (`type = "numeric"` for regression models and
 #' `type = "class"` for classification).
 #'
-#' `predict` is designed to provide a tidy result (see "Value"
+#' `predict()` is designed to provide a tidy result (see "Value"
 #' section below) in a tibble output format.
 #'
 #' When using `type = "conf_int"` and `type = "pred_int"`, the options
@@ -29,7 +29,7 @@
 #' extra column of standard error values (if available).
 #'
 #' @return With the exception of `type = "raw"`, the results of
-#' `predict.model_fit` will be a tibble as many rows in the output
+#' `predict.model_fit()` will be a tibble as many rows in the output
 #' as there are rows in `new_data` and the column names will be
 #' predictable.
 #'
@@ -49,8 +49,8 @@
 #' a list-column. Each list element contains a tibble with columns
 #' `.pred` and `.quantile` (and perhaps other columns).
 #'
-#' Using `type = "raw"` with `predict.model_fit` (or using
-#' `predict_raw`) will return the unadulterated results of the
+#' Using `type = "raw"` with `predict.model_fit()` (or using
+#' `predict_raw()`) will return the unadulterated results of the
 #' prediction function.
 #'
 #' In the case of Spark-based models, since table columns cannot
@@ -131,10 +131,10 @@ check_pred_type <- function(object, type) {
       switch(object$spec$mode,
              regression = "numeric",
              classification = "class",
-             stop("Type should be 'regression' or 'classification'.", call. = FALSE))
+             stop("`type` should be 'regression' or 'classification'.", call. = FALSE))
   }
   if (!(type %in% pred_types))
-    stop("'type' should be one of: ",
+    stop("`type` should be one of: ",
          glue_collapse(pred_types, sep = ", ", last = " and "),
          call. = FALSE)
   if (type == "numeric" & object$spec$mode != "regression")
diff --git a/R/predict_class.R b/R/predict_class.R
index c586b1039..b1cd24d8e 100644
--- a/R/predict_class.R
+++ b/R/predict_class.R
@@ -10,7 +10,7 @@
 #' @export
 predict_class.model_fit <- function (object, new_data, ...) {
   if(object$spec$mode != "classification")
-    stop("`predict.model_fit` is for predicting factor outcomes.",
+    stop("`predict.model_fit()` is for predicting factor outcomes.",
          call. = FALSE)

   if (!any(names(object$spec$method) == "class"))
diff --git a/R/predict_classprob.R b/R/predict_classprob.R
index 8f2f79b8d..de816c190 100644
--- a/R/predict_classprob.R
+++ b/R/predict_classprob.R
@@ -7,7 +7,7 @@
 #' @importFrom tibble as_tibble is_tibble tibble
 predict_classprob.model_fit <- function (object, new_data, ...) {
   if(object$spec$mode != "classification")
-    stop("`predict.model_fit` is for predicting factor outcomes.",
+    stop("`predict.model_fit()` is for predicting factor outcomes.",
          call.
= FALSE) if (!any(names(object$spec$method) == "classprob")) diff --git a/R/predict_numeric.R b/R/predict_numeric.R index 69c5e7f84..054fc3eb5 100644 --- a/R/predict_numeric.R +++ b/R/predict_numeric.R @@ -7,8 +7,8 @@ predict_numeric.model_fit <- function (object, new_data, ...) { if (object$spec$mode != "regression") - stop("`predict_numeric` is for predicting numeric outcomes. ", - "Use `predict_class` or `predict_classprob` for ", + stop("`predict_numeric()` is for predicting numeric outcomes. ", + "Use `predict_class()` or `predict_classprob()` for ", "classification models.", call. = FALSE) if (!any(names(object$spec$method) == "numeric")) diff --git a/R/rand_forest.R b/R/rand_forest.R index 4dc26ea5d..3d131e936 100644 --- a/R/rand_forest.R +++ b/R/rand_forest.R @@ -2,7 +2,7 @@ #' General Interface for Random Forest Models #' -#' `rand_forest` is a way to generate a _specification_ of a model +#' `rand_forest()` is a way to generate a _specification_ of a model #' before fitting and allows the model to be created using #' different packages in R or via Spark. The main arguments for the #' model are: @@ -15,9 +15,9 @@ #' } #' These arguments are converted to their specific names at the #' time that the model is fit. Other options and argument can be -#' set using `set_engine`. If left to their defaults +#' set using `set_engine()`. If left to their defaults #' here (`NULL`), the values are taken from the underlying model -#' functions. If parameters need to be modified, `update` can be used +#' functions. If parameters need to be modified, `update()` can be used #' in lieu of recreating the object from scratch. #' #' @inheritParams boost_tree @@ -75,7 +75,7 @@ #' #' @note For models created using the spark engine, there are #' several differences to consider. First, only the formula -#' interface to via `fit` is available; using `fit_xy` will +#' interface to via `fit()` is available; using `fit_xy()` will #' generate an error. 
Second, the predictions will always be in a #' spark table format. The names will be the same as documented but #' without the dots. Third, there is no equivalent to factor diff --git a/R/surv_reg.R b/R/surv_reg.R index 65c86b416..05388cd67 100644 --- a/R/surv_reg.R +++ b/R/surv_reg.R @@ -1,6 +1,6 @@ #' General Interface for Parametric Survival Models #' -#' `surv_reg` is a way to generate a _specification_ of a model +#' `surv_reg()` is a way to generate a _specification_ of a model #' before fitting and allows the model to be created using #' R. The main argument for the #' model is: @@ -9,12 +9,12 @@ #' } #' This argument is converted to its specific names at the #' time that the model is fit. Other options and argument can be -#' set using `set_engine`. If left to its default +#' set using `set_engine()`. If left to its default #' here (`NULL`), the value is taken from the underlying model #' functions. #' #' The data given to the function are not saved and are only used -#' to determine the _mode_ of the model. For `surv_reg`,the +#' to determine the _mode_ of the model. For `surv_reg()`,the #' mode will always be "regression". #' #' Since survival models typically involve censoring (and require the use of @@ -31,7 +31,7 @@ #' @param dist A character string for the outcome distribution. "weibull" is #' the default. #' @details -#' For `surv_reg`, the mode will always be "regression". +#' For `surv_reg()`, the mode will always be "regression". #' #' The model can be created using the `fit()` function using the #' following _engines_: diff --git a/R/svm_poly.R b/R/svm_poly.R index adb6d043d..70aa6886e 100644 --- a/R/svm_poly.R +++ b/R/svm_poly.R @@ -1,6 +1,6 @@ #' General interface for polynomial support vector machines #' -#' `svm_poly` is a way to generate a _specification_ of a model +#' `svm_poly()` is a way to generate a _specification_ of a model #' before fitting and allows the model to be created using #' different packages in R or via Spark. 
The main arguments for the #' model are: @@ -14,9 +14,9 @@ #' } #' These arguments are converted to their specific names at the #' time that the model is fit. Other options and argument can be -#' set using `set_engine`. If left to their defaults +#' set using `set_engine()`. If left to their defaults #' here (`NULL`), the values are taken from the underlying model -#' functions. If parameters need to be modified, `update` can be used +#' functions. If parameters need to be modified, `update()` can be used #' in lieu of recreating the object from scratch. #' #' @inheritParams boost_tree diff --git a/R/svm_rbf.R b/R/svm_rbf.R index 7670dfc75..713279728 100644 --- a/R/svm_rbf.R +++ b/R/svm_rbf.R @@ -1,6 +1,6 @@ #' General interface for radial basis function support vector machines #' -#' `svm_rbf` is a way to generate a _specification_ of a model +#' `svm_rbf()` is a way to generate a _specification_ of a model #' before fitting and allows the model to be created using #' different packages in R or via Spark. The main arguments for the #' model are: @@ -14,9 +14,9 @@ #' } #' These arguments are converted to their specific names at the #' time that the model is fit. Other options and argument can be -#' set using `set_engine`. If left to their defaults +#' set using `set_engine()`. If left to their defaults #' here (`NULL`), the values are taken from the underlying model -#' functions. If parameters need to be modified, `update` can be used +#' functions. If parameters need to be modified, `update()` can be used #' in lieu of recreating the object from scratch. 
#' #' @inheritParams boost_tree diff --git a/R/translate.R b/R/translate.R index 2c342e360..bd2be1edb 100644 --- a/R/translate.R +++ b/R/translate.R @@ -1,15 +1,15 @@ #' Resolve a Model Specification for a Computational Engine #' -#' `translate` will translate a model specification into a code +#' `translate()` will translate a model specification into a code #' object that is specific to a particular engine (e.g. R package). #' It translates generic parameters to their counterparts. #' #' @param x A model specification. #' @param ... Not currently used. #' @details -#' `translate` produces a _template_ call that lacks the specific +#' `translate()` produces a _template_ call that lacks the specific #' argument values (such as `data`, etc). These are filled in once -#' `fit` is called with the specifics of the data for the model. +#' `fit()` is called with the specifics of the data for the model. #' The call may also include `varying` arguments if these are in #' the specification. #' diff --git a/man/boost_tree.Rd b/man/boost_tree.Rd index 6ad9a2bde..5f6789103 100644 --- a/man/boost_tree.Rd +++ b/man/boost_tree.Rd @@ -46,13 +46,13 @@ each iteration while \code{C5.0} samples once during traning.} \item{fresh}{A logical for whether the arguments should be modified in-place of or replaced wholesale.} -\item{...}{Not used for \code{update}.} +\item{...}{Not used for \code{update()}.} } \value{ An updated model specification. } \description{ -\code{boost_tree} is a way to generate a \emph{specification} of a model +\code{boost_tree()} is a way to generate a \emph{specification} of a model before fitting and allows the model to be created using different packages in R or via Spark. The main arguments for the model are: @@ -72,14 +72,14 @@ to split further. } These arguments are converted to their specific names at the time that the model is fit. Other options and argument can be -set using the \code{set_engine} function. 
If left to their defaults +set using the \code{set_engine()} function. If left to their defaults here (\code{NULL}), the values are taken from the underlying model -functions. If parameters need to be modified, \code{update} can be used +functions. If parameters need to be modified, \code{update()} can be used in lieu of recreating the object from scratch. } \details{ The data given to the function are not saved and are only used -to determine the \emph{mode} of the model. For \code{boost_tree}, the +to determine the \emph{mode} of the model. For \code{boost_tree()}, the possible modes are "regression" and "classification". The model can be created using the \code{fit()} function using the @@ -92,13 +92,13 @@ following \emph{engines}: \note{ For models created using the spark engine, there are several differences to consider. First, only the formula -interface to via \code{fit} is available; using \code{fit_xy} will +interface to via \code{fit()} is available; using \code{fit_xy()} will generate an error. Second, the predictions will always be in a spark table format. The names will be the same as documented but without the dots. Third, there is no equivalent to factor columns in spark tables so class predictions are returned as character columns. Fourth, to retain the model object for a new -R session (via \code{save}), the \code{model$fit} element of the \code{parsnip} +R session (via \code{save()}), the \code{model$fit} element of the \code{parsnip} object should be serialized via \code{ml_save(object$fit)} and separately saved to disk. In a new session, the object can be reloaded and reattached to the \code{parsnip} object. 
diff --git a/man/decision_tree.Rd b/man/decision_tree.Rd index f6e7d2937..165e9d59c 100644 --- a/man/decision_tree.Rd +++ b/man/decision_tree.Rd @@ -29,10 +29,10 @@ in a node that are required for the node to be split further.} \item{fresh}{A logical for whether the arguments should be modified in-place of or replaced wholesale.} -\item{...}{Not used for \code{update}.} +\item{...}{Not used for \code{update()}.} } \description{ -\code{decision_tree} is a way to generate a \emph{specification} of a model +\code{decision_tree()} is a way to generate a \emph{specification} of a model before fitting and allows the model to be created using different packages in R or via Spark. The main arguments for the model are: @@ -46,9 +46,9 @@ that are required for the node to be split further. } These arguments are converted to their specific names at the time that the model is fit. Other options and argument can be -set using \code{set_engine}. If left to their defaults +set using \code{set_engine()}. If left to their defaults here (\code{NULL}), the values are taken from the underlying model -functions. If parameters need to be modified, \code{update} can be used +functions. If parameters need to be modified, \code{update()} can be used in lieu of recreating the object from scratch. } \details{ @@ -68,13 +68,13 @@ machines. \note{ For models created using the spark engine, there are several differences to consider. First, only the formula -interface to via \code{fit} is available; using \code{fit_xy} will +interface to via \code{fit()} is available; using \code{fit_xy()} will generate an error. Second, the predictions will always be in a spark table format. The names will be the same as documented but without the dots. Third, there is no equivalent to factor columns in spark tables so class predictions are returned as character columns. 
Fourth, to retain the model object for a new -R session (via \code{save}), the \code{model$fit} element of the \code{parsnip} +R session (via \code{save()}), the \code{model$fit} element of the \code{parsnip} object should be serialized via \code{ml_save(object$fit)} and separately saved to disk. In a new session, the object can be reloaded and reattached to the \code{parsnip} object. diff --git a/man/fit.Rd b/man/fit.Rd index 54b1abcf9..c282a01f0 100644 --- a/man/fit.Rd +++ b/man/fit.Rd @@ -29,7 +29,7 @@ outcome(s), predictors, case weights, etc). Note: when needed, a \item{...}{Not currently used; values passed here will be ignored. Other options required to fit the model should be -passed using \code{set_engine}.} +passed using \code{set_engine()}.} \item{x}{A matrix or data frame of predictors.} @@ -53,22 +53,22 @@ The return value will also have a class related to the fitted model (e.g. \code{"_glm"}) before the base class of \code{"model_fit"}. } \description{ -\code{fit} and \code{fit_xy} take a model specification, translate the required +\code{fit()} and \code{fit_xy()} take a model specification, translate the required code by substituting arguments, and execute the model fit routine. } \details{ -\code{fit} and \code{fit_xy} substitute the current arguments in the model +\code{fit()} and \code{fit_xy()} substitute the current arguments in the model specification into the computational engine's code, checks them for validity, then fits the model using the data and the engine-specific code. Different model functions have different interfaces (e.g. formula or \code{x}/\code{y}) and these functions translate -between the interface used when \code{fit} or \code{fit_xy} were invoked and the one +between the interface used when \code{fit()} or \code{fit_xy()} were invoked and the one required by the underlying model. When possible, these functions attempt to avoid making copies of the data. 
For example, if the underlying model uses a formula and -\code{fit} is invoked, the original data are references +\code{fit()} is invoked, the original data are references when the model is fit. However, if the underlying model uses something else, such as \code{x}/\code{y}, the formula is evaluated and the data are converted to the required format. In this case, any @@ -76,7 +76,7 @@ calls in the resulting model objects reference the temporary objects used to fit the model. } \examples{ -# Although `glm` only has a formula interface, different +# Although `glm()` only has a formula interface, different # methods for specifying the model can be used library(dplyr) diff --git a/man/linear_reg.Rd b/man/linear_reg.Rd index 5c00e5f60..c4539fd37 100644 --- a/man/linear_reg.Rd +++ b/man/linear_reg.Rd @@ -30,10 +30,10 @@ L2 penalty (i.e. weight decay, or ridge regression) versus L1 \item{fresh}{A logical for whether the arguments should be modified in-place of or replaced wholesale.} -\item{...}{Not used for \code{update}.} +\item{...}{Not used for \code{update()}.} } \description{ -\code{linear_reg} is a way to generate a \emph{specification} of a model +\code{linear_reg()} is a way to generate a \emph{specification} of a model before fitting and allows the model to be created using different packages in R, Stan, keras, or via Spark. The main arguments for the model are: @@ -45,14 +45,14 @@ the model. Note that this will be ignored for some engines. } These arguments are converted to their specific names at the time that the model is fit. Other options and argument can be -set using \code{set_engine}. If left to their defaults +set using \code{set_engine()}. If left to their defaults here (\code{NULL}), the values are taken from the underlying model -functions. If parameters need to be modified, \code{update} can be used +functions. If parameters need to be modified, \code{update()} can be used in lieu of recreating the object from scratch. 
} \details{ The data given to the function are not saved and are only used -to determine the \emph{mode} of the model. For \code{linear_reg}, the +to determine the \emph{mode} of the model. For \code{linear_reg()}, the mode will always be "regression". The model can be created using the \code{fit()} function using the @@ -67,13 +67,13 @@ following \emph{engines}: \note{ For models created using the spark engine, there are several differences to consider. First, only the formula -interface to via \code{fit} is available; using \code{fit_xy} will +interface to via \code{fit()} is available; using \code{fit_xy()} will generate an error. Second, the predictions will always be in a spark table format. The names will be the same as documented but without the dots. Third, there is no equivalent to factor columns in spark tables so class predictions are returned as character columns. Fourth, to retain the model object for a new -R session (via \code{save}), the \code{model$fit} element of the \code{parsnip} +R session (via \code{save()}), the \code{model$fit} element of the \code{parsnip} object should be serialized via \code{ml_save(object$fit)} and separately saved to disk. In a new session, the object can be reloaded and reattached to the \code{parsnip} object. @@ -108,11 +108,11 @@ model, the template of the fit calls are: When using \code{glmnet} models, there is the option to pass multiple values (or no values) to the \code{penalty} argument. This can have an effect on the model object results. When using -the \code{predict} method in these cases, the return object type +the \code{predict()} method in these cases, the return object type depends on the value of \code{penalty}. If a single value is given, the results will be a simple numeric vector. 
When multiple values or no values for \code{penalty} are used in -\code{linear_reg}, the \code{predict} method will return a data frame with +\code{linear_reg()}, the \code{predict()} method will return a data frame with columns \code{values} and \code{lambda}. For prediction, the \code{stan} engine can compute posterior diff --git a/man/logistic_reg.Rd b/man/logistic_reg.Rd index 0a3416c0c..e7ac89673 100644 --- a/man/logistic_reg.Rd +++ b/man/logistic_reg.Rd @@ -30,10 +30,10 @@ L2 penalty (i.e. weight decay, or ridge regression) versus L1 \item{fresh}{A logical for whether the arguments should be modified in-place of or replaced wholesale.} -\item{...}{Not used for \code{update}.} +\item{...}{Not used for \code{update()}.} } \description{ -\code{logistic_reg} is a way to generate a \emph{specification} of a model +\code{logistic_reg()} is a way to generate a \emph{specification} of a model before fitting and allows the model to be created using different packages in R, Stan, keras, or via Spark. The main arguments for the model are: @@ -45,13 +45,13 @@ the model. Note that this will be ignored for some engines. } These arguments are converted to their specific names at the time that the model is fit. Other options and argument can be -set using \code{set_engine}. If left to their defaults +set using \code{set_engine()}. If left to their defaults here (\code{NULL}), the values are taken from the underlying model -functions. If parameters need to be modified, \code{update} can be used +functions. If parameters need to be modified, \code{update()} can be used in lieu of recreating the object from scratch. } \details{ -For \code{logistic_reg}, the mode will always be "classification". +For \code{logistic_reg()}, the mode will always be "classification". 
The model can be created using the \code{fit()} function using the following \emph{engines}: @@ -65,13 +65,13 @@ following \emph{engines}: \note{ For models created using the spark engine, there are several differences to consider. First, only the formula -interface to via \code{fit} is available; using \code{fit_xy} will +interface to via \code{fit()} is available; using \code{fit_xy()} will generate an error. Second, the predictions will always be in a spark table format. The names will be the same as documented but without the dots. Third, there is no equivalent to factor columns in spark tables so class predictions are returned as character columns. Fourth, to retain the model object for a new -R session (via \code{save}), the \code{model$fit} element of the \code{parsnip} +R session (via \code{save()}), the \code{model$fit} element of the \code{parsnip} object should be serialized via \code{ml_save(object$fit)} and separately saved to disk. In a new session, the object can be reloaded and reattached to the \code{parsnip} object. @@ -106,11 +106,11 @@ model, the template of the fit calls are: When using \code{glmnet} models, there is the option to pass multiple values (or no values) to the \code{penalty} argument. This can have an effect on the model object results. When using -the \code{predict} method in these cases, the return object type +the \code{predict()} method in these cases, the return object type depends on the value of \code{penalty}. If a single value is given, the results will be a simple numeric vector. When multiple values or no values for \code{penalty} are used in -\code{logistic_reg}, the \code{predict} method will return a data frame with +\code{logistic_reg()}, the \code{predict()} method will return a data frame with columns \code{values} and \code{lambda}. 
For prediction, the \code{stan} engine can compute posterior diff --git a/man/mars.Rd b/man/mars.Rd index 090e0b77f..0dd3fe908 100644 --- a/man/mars.Rd +++ b/man/mars.Rd @@ -28,10 +28,10 @@ final model, including the intercept.} \item{fresh}{A logical for whether the arguments should be modified in-place of or replaced wholesale.} -\item{...}{Not used for \code{update}.} +\item{...}{Not used for \code{update()}.} } \description{ -\code{mars} is a way to generate a \emph{specification} of a model before +\code{mars()} is a way to generate a \emph{specification} of a model before fitting and allows the model to be created using R. The main arguments for the model are: @@ -46,9 +46,9 @@ in \code{?earth}. } These arguments are converted to their specific names at the time that the model is fit. Other options and argument can be -set using \code{set_engine}. If left to their defaults +set using \code{set_engine()}. If left to their defaults here (\code{NULL}), the values are taken from the underlying model -functions. If parameters need to be modified, \code{update} can be used +functions. If parameters need to be modified, \code{update()} can be used in lieu of recreating the object from scratch. 
} \details{ diff --git a/man/mlp.Rd b/man/mlp.Rd index 11a32e343..70fdae041 100644 --- a/man/mlp.Rd +++ b/man/mlp.Rd @@ -38,10 +38,10 @@ function between the hidden and output layers is automatically set to either \item{fresh}{A logical for whether the arguments should be modified in-place of or replaced wholesale.} -\item{...}{Not used for \code{update}.} +\item{...}{Not used for \code{update()}.} } \description{ -\code{mlp}, for multilayer perceptron, is a way to generate a \emph{specification} of +\code{mlp()}, for multilayer perceptron, is a way to generate a \emph{specification} of a model before fitting and allows the model to be created using different packages in R or via keras The main arguments for the model are: @@ -63,13 +63,13 @@ in lieu of recreating the object from scratch. \details{ These arguments are converted to their specific names at the time that the model is fit. Other options and argument can be -set using \code{set_engine}. If left to their defaults +set using \code{set_engine()}. If left to their defaults here (see above), the values are taken from the underlying model functions. One exception is \code{hidden_units} when \code{nnet::nnet} is used; that function's \code{size} argument has no default so a value of 5 units will be used. Also, unless otherwise specified, the \code{linout} argument to -\code{nnet::nnet} will be set to \code{TRUE} when a regression model is created. -If parameters need to be modified, \code{update} can be used +\code{nnet::nnet()} will be set to \code{TRUE} when a regression model is created. +If parameters need to be modified, \code{update()} can be used in lieu of recreating the object from scratch. The model can be created using the \code{fit()} function using the diff --git a/man/multinom_reg.Rd b/man/multinom_reg.Rd index 0de9c5ee1..ebcc03f49 100644 --- a/man/multinom_reg.Rd +++ b/man/multinom_reg.Rd @@ -30,10 +30,10 @@ L2 penalty (i.e. 
weight decay, or ridge regression) versus L1 \item{fresh}{A logical for whether the arguments should be modified in-place of or replaced wholesale.} -\item{...}{Not used for \code{update}.} +\item{...}{Not used for \code{update()}.} } \description{ -\code{multinom_reg} is a way to generate a \emph{specification} of a model +\code{multinom_reg()} is a way to generate a \emph{specification} of a model before fitting and allows the model to be created using different packages in R, keras, or Spark. The main arguments for the model are: @@ -45,13 +45,13 @@ the model. Note that this will be ignored for some engines. } These arguments are converted to their specific names at the time that the model is fit. Other options and argument can be -set using \code{set_engine}. If left to their defaults +set using \code{set_engine()}. If left to their defaults here (\code{NULL}), the values are taken from the underlying model -functions. If parameters need to be modified, \code{update} can be used +functions. If parameters need to be modified, \code{update()} can be used in lieu of recreating the object from scratch. } \details{ -For \code{multinom_reg}, the mode will always be "classification". +For \code{multinom_reg()}, the mode will always be "classification". The model can be created using the \code{fit()} function using the following \emph{engines}: @@ -64,13 +64,13 @@ following \emph{engines}: \note{ For models created using the spark engine, there are several differences to consider. First, only the formula -interface to via \code{fit} is available; using \code{fit_xy} will +interface to via \code{fit()} is available; using \code{fit_xy()} will generate an error. Second, the predictions will always be in a spark table format. The names will be the same as documented but without the dots. Third, there is no equivalent to factor columns in spark tables so class predictions are returned as character columns. 
Fourth, to retain the model object for a new -R session (via \code{save}), the \code{model$fit} element of the \code{parsnip} +R session (via \code{save()}), the \code{model$fit} element of the \code{parsnip} object should be serialized via \code{ml_save(object$fit)} and separately saved to disk. In a new session, the object can be reloaded and reattached to the \code{parsnip} object. @@ -97,11 +97,11 @@ model, the template of the fit calls are: When using \code{glmnet} models, there is the option to pass multiple values (or no values) to the \code{penalty} argument. This can have an effect on the model object results. When using -the \code{predict} method in these cases, the return object type +the \code{predict()} method in these cases, the return object type depends on the value of \code{penalty}. If a single value is given, the results will be a simple numeric vector. When multiple values or no values for \code{penalty} are used in -\code{multinom_reg}, the \code{predict} method will return a data frame with +\code{multinom_reg()}, the \code{predict()} method will return a data frame with columns \code{values} and \code{lambda}. } diff --git a/man/nearest_neighbor.Rd b/man/nearest_neighbor.Rd index 2caf7b251..a2a596d86 100644 --- a/man/nearest_neighbor.Rd +++ b/man/nearest_neighbor.Rd @@ -39,7 +39,7 @@ and the Euclidean distance with \code{dist_power = 2}. } These arguments are converted to their specific names at the time that the model is fit. Other options and argument can be -set using \code{set_engine}. If left to their defaults +set using \code{set_engine()}. If left to their defaults here (\code{NULL}), the values are taken from the underlying model functions. If parameters need to be modified, \code{update()} can be used in lieu of recreating the object from scratch. 
diff --git a/man/null_model.Rd b/man/null_model.Rd index d1b709214..b0930770b 100644 --- a/man/null_model.Rd +++ b/man/null_model.Rd @@ -12,7 +12,7 @@ Possible values for this model are "unknown", "regression", or "classification".} } \description{ -\code{null_model} is a way to generate a \emph{specification} of a model before +\code{null_model()} is a way to generate a \emph{specification} of a model before fitting and allows the model to be created using R. It doens't have any main arguments. } diff --git a/man/nullmodel.Rd b/man/nullmodel.Rd index d13596d93..8efe1af67 100644 --- a/man/nullmodel.Rd +++ b/man/nullmodel.Rd @@ -33,7 +33,7 @@ the number of predictions to return)} classification)} } \value{ -The output of \code{nullmodel} is a list of class \code{nullmodel} +The output of \code{nullmodel()} is a list of class \code{nullmodel} with elements \item{call }{the function call} \item{value }{the mean of \code{y} or the most prevalent class} \item{levels }{when \code{y} is a factor, a vector of levels. \code{NULL} otherwise} \item{pct }{when \code{y} @@ -42,14 +42,15 @@ otherwise). The column for the most prevalent class has the proportion of the training samples with that class (the other columns are zero). } \item{n }{the number of elements in \code{y}} -\code{predict.nullmodel} returns a either a factor or numeric vector +\code{predict.nullmodel()} returns a either a factor or numeric vector depending on the class of \code{y}. All predictions are always the same. } \description{ -Fit a single mean or largest class model +Fit a single mean or largest class model. \code{nullmodel()} is the underlying +computational function for the \code{null_model()} specification. } \details{ -\code{nullmodel} emulates other model building functions, but returns the +\code{nullmodel()} emulates other model building functions, but returns the simplest model possible given a training set: a single mean for numeric outcomes and the most prevalent class for factor outcomes. 
When class probabilities are requested, the percentage of the training set samples with diff --git a/man/predict.model_fit.Rd b/man/predict.model_fit.Rd index 418102ae5..78999cbd0 100644 --- a/man/predict.model_fit.Rd +++ b/man/predict.model_fit.Rd @@ -20,7 +20,7 @@ predict_raw(object, ...) \item{type}{A single character value or \code{NULL}. Possible values are "numeric", "class", "prob", "conf_int", "pred_int", "quantile", -or "raw". When \code{NULL}, \code{predict} will choose an appropriate value +or "raw". When \code{NULL}, \code{predict()} will choose an appropriate value based on the model's mode.} \item{opts}{A list of optional arguments to the underlying @@ -34,7 +34,7 @@ use the \code{opts} argument.} } \value{ With the exception of \code{type = "raw"}, the results of -\code{predict.model_fit} will be a tibble as many rows in the output +\code{predict.model_fit()} will be a tibble as many rows in the output as there are rows in \code{new_data} and the column names will be predictable. @@ -54,8 +54,8 @@ Quantile predictions return a tibble with a column \code{.pred}, which is a list-column. Each list element contains a tibble with columns \code{.pred} and \code{.quantile} (and perhaps other columns). -Using \code{type = "raw"} with \code{predict.model_fit} (or using -\code{predict_raw}) will return the unadulterated results of the +Using \code{type = "raw"} with \code{predict.model_fit()} (or using +\code{predict_raw()}) will return the unadulterated results of the prediction function. In the case of Spark-based models, since table columns cannot @@ -65,15 +65,15 @@ type-specific prediction functions. } \description{ Apply a model to create different types of predictions. -\code{predict} can be used for all types of models and used the +\code{predict()} can be used for all types of models and used the "type" argument for more specificity. 
} \details{ -If "type" is not supplied to \code{predict}, then a choice +If "type" is not supplied to \code{predict()}, then a choice is made (\code{type = "numeric"} for regression models and \code{type = "class"} for classification). -\code{predict} is designed to provide a tidy result (see "Value" +\code{predict()} is designed to provide a tidy result (see "Value" section below) in a tibble output format. When using \code{type = "conf_int"} and \code{type = "pred_int"}, the options diff --git a/man/rand_forest.Rd b/man/rand_forest.Rd index c7f5ab20e..5d804b4fe 100644 --- a/man/rand_forest.Rd +++ b/man/rand_forest.Rd @@ -30,10 +30,10 @@ in a node that are required for the node to be split further.} \item{fresh}{A logical for whether the arguments should be modified in-place of or replaced wholesale.} -\item{...}{Not used for \code{update}.} +\item{...}{Not used for \code{update()}.} } \description{ -\code{rand_forest} is a way to generate a \emph{specification} of a model +\code{rand_forest()} is a way to generate a \emph{specification} of a model before fitting and allows the model to be created using different packages in R or via Spark. The main arguments for the model are: @@ -46,9 +46,9 @@ that are required for the node to be split further. } These arguments are converted to their specific names at the time that the model is fit. Other options and argument can be -set using \code{set_engine}. If left to their defaults +set using \code{set_engine()}. If left to their defaults here (\code{NULL}), the values are taken from the underlying model -functions. If parameters need to be modified, \code{update} can be used +functions. If parameters need to be modified, \code{update()} can be used in lieu of recreating the object from scratch. } \details{ @@ -62,7 +62,7 @@ following \emph{engines}: \note{ For models created using the spark engine, there are several differences to consider. 
First, only the formula -interface to via \code{fit} is available; using \code{fit_xy} will +interface to via \code{fit()} is available; using \code{fit_xy()} will generate an error. Second, the predictions will always be in a spark table format. The names will be the same as documented but without the dots. Third, there is no equivalent to factor diff --git a/man/set_args.Rd b/man/set_args.Rd index 0ea2d3656..57e59a60c 100644 --- a/man/set_args.Rd +++ b/man/set_args.Rd @@ -21,11 +21,11 @@ set_mode(object, mode) An updated model object. } \description{ -\code{set_args} can be used to modify the arguments of a model specification while -\code{set_mode} is used to change the model's mode. +\code{set_args()} can be used to modify the arguments of a model specification while +\code{set_mode()} is used to change the model's mode. } \details{ -\code{set_args} will replace existing values of the arguments. +\code{set_args()} will replace existing values of the arguments. } \examples{ rand_forest() diff --git a/man/set_engine.Rd b/man/set_engine.Rd index 754077732..889451351 100644 --- a/man/set_engine.Rd +++ b/man/set_engine.Rd @@ -20,7 +20,7 @@ engine. These are captured as quosures and can be \code{varying()}.} An updated model specification. } \description{ -\code{set_engine} is used to specify which package or system will be used +\code{set_engine()} is used to specify which package or system will be used to fit the model, along with any arguments specific to that software. 
} \examples{ diff --git a/man/surv_reg.Rd b/man/surv_reg.Rd index 095d771a8..8ff6dc582 100644 --- a/man/surv_reg.Rd +++ b/man/surv_reg.Rd @@ -21,10 +21,10 @@ the default.} \item{fresh}{A logical for whether the arguments should be modified in-place of or replaced wholesale.} -\item{...}{Not used for \code{update}.} +\item{...}{Not used for \code{update()}.} } \description{ -\code{surv_reg} is a way to generate a \emph{specification} of a model +\code{surv_reg()} is a way to generate a \emph{specification} of a model before fitting and allows the model to be created using R. The main argument for the model is: @@ -33,7 +33,7 @@ model is: } This argument is converted to its specific names at the time that the model is fit. Other options and argument can be -set using \code{set_engine}. If left to its default +set using \code{set_engine()}. If left to its default here (\code{NULL}), the value is taken from the underlying model functions. @@ -42,7 +42,7 @@ in lieu of recreating the object from scratch. } \details{ The data given to the function are not saved and are only used -to determine the \emph{mode} of the model. For \code{surv_reg},the +to determine the \emph{mode} of the model. For \code{surv_reg()},the mode will always be "regression". Since survival models typically involve censoring (and require the use of @@ -53,7 +53,7 @@ Also, for the \code{flexsurv::flexsurvfit} engine, the typical \code{strata} function cannot be used. To achieve the same effect, the extra parameter roles can be used (as described above). -For \code{surv_reg}, the mode will always be "regression". +For \code{surv_reg()}, the mode will always be "regression". 
The model can be created using the \code{fit()} function using the following \emph{engines}: diff --git a/man/svm_poly.Rd b/man/svm_poly.Rd index 66a0f361d..0e431e849 100644 --- a/man/svm_poly.Rd +++ b/man/svm_poly.Rd @@ -31,10 +31,10 @@ loss function (regression only)} \item{fresh}{A logical for whether the arguments should be modified in-place of or replaced wholesale.} -\item{...}{Not used for \code{update}.} +\item{...}{Not used for \code{update()}.} } \description{ -\code{svm_poly} is a way to generate a \emph{specification} of a model +\code{svm_poly()} is a way to generate a \emph{specification} of a model before fitting and allows the model to be created using different packages in R or via Spark. The main arguments for the model are: @@ -48,9 +48,9 @@ wrong side of the margin. } These arguments are converted to their specific names at the time that the model is fit. Other options and argument can be -set using \code{set_engine}. If left to their defaults +set using \code{set_engine()}. If left to their defaults here (\code{NULL}), the values are taken from the underlying model -functions. If parameters need to be modified, \code{update} can be used +functions. If parameters need to be modified, \code{update()} can be used in lieu of recreating the object from scratch. } \details{ diff --git a/man/svm_rbf.Rd b/man/svm_rbf.Rd index 76fa58d61..1afe3a48b 100644 --- a/man/svm_rbf.Rd +++ b/man/svm_rbf.Rd @@ -29,10 +29,10 @@ loss function (regression only)} \item{fresh}{A logical for whether the arguments should be modified in-place of or replaced wholesale.} -\item{...}{Not used for \code{update}.} +\item{...}{Not used for \code{update()}.} } \description{ -\code{svm_rbf} is a way to generate a \emph{specification} of a model +\code{svm_rbf()} is a way to generate a \emph{specification} of a model before fitting and allows the model to be created using different packages in R or via Spark. The main arguments for the model are: @@ -46,9 +46,9 @@ function. 
} These arguments are converted to their specific names at the time that the model is fit. Other options and argument can be -set using \code{set_engine}. If left to their defaults +set using \code{set_engine()}. If left to their defaults here (\code{NULL}), the values are taken from the underlying model -functions. If parameters need to be modified, \code{update} can be used +functions. If parameters need to be modified, \code{update()} can be used in lieu of recreating the object from scratch. } \details{ diff --git a/man/translate.Rd b/man/translate.Rd index 82442df66..bd53050ec 100644 --- a/man/translate.Rd +++ b/man/translate.Rd @@ -12,14 +12,14 @@ translate(x, ...) \item{...}{Not currently used.} } \description{ -\code{translate} will translate a model specification into a code +\code{translate()} will translate a model specification into a code object that is specific to a particular engine (e.g. R package). It translates generic parameters to their counterparts. } \details{ -\code{translate} produces a \emph{template} call that lacks the specific +\code{translate()} produces a \emph{template} call that lacks the specific argument values (such as \code{data}, etc). These are filled in once -\code{fit} is called with the specifics of the data for the model. +\code{fit()} is called with the specifics of the data for the model. The call may also include \code{varying} arguments if these are in the specification. 
diff --git a/tests/testthat/test_boost_tree_C50.R b/tests/testthat/test_boost_tree_C50.R
index 9babd9335..81e30fd62 100644
--- a/tests/testthat/test_boost_tree_C50.R
+++ b/tests/testthat/test_boost_tree_C50.R
@@ -112,7 +112,7 @@ test_that('submodel prediction', {
   vars <- c("female", "tenure", "total_charges", "phone_service", "monthly_charges")
   class_fit <-
     boost_tree(trees = 20, mode = "classification") %>%
-    set_engine("C5.0", control = C5.0Control(earlyStopping = FALSE)) %>%
+    set_engine("C5.0", control = C50::C5.0Control(earlyStopping = FALSE)) %>%
     fit(churn ~ ., data = wa_churn[-(1:4), c("churn", vars)])
 
   pred_class <- predict(class_fit$fit, wa_churn[1:4, vars], trials = 4, type = "prob")