tidymodels · topepo · Feb 25, 2019
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -29,7 +29,7 @@ Imports:
     tidyr,
     globals
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 6.1.0.9000
+RoxygenNote: 6.1.1
 Suggests: 
     testthat,
     knitr,
@@ -39,4 +39,3 @@ Suggests:
     xgboost,
     covr,
     sparklyr
-
diff --git a/NEWS.md b/NEWS.md
@@ -1,5 +1,9 @@
 # parsnip 0.0.1.9000
 
+## New Features
+
+* A "null model" is now available that fits a predictor-free model (using the mean of the outcome for regression or the mode for classification).  
+
 ## Other Changes
 
 * `varying_args()` now has a `full` argument to control whether the full set
@@ -26,6 +30,8 @@ column names once (#107).
 * For multinomial regression using glmnet, `multi_predict()` now pulls the 
 correct default penalty (#108).
 
+
+
 # parsnip 0.0.1
 
 First CRAN release

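The NEWS entry above describes the null model only in words. As a rough illustration of the idea (not parsnip's implementation), a predictor-free fit can be sketched in base R. The helper names `fit_null()` and `predict_null()` are hypothetical and exist only for this example.

```r
# Hypothetical helpers, not part of parsnip: "fit" a predictor-free model
# by summarising the outcome alone.
fit_null <- function(y) {
  if (is.numeric(y)) {
    # Regression: the prediction is the mean of the outcome
    list(mode = "regression", value = mean(y))
  } else {
    # Classification: the prediction is the most frequent class (the mode)
    counts <- table(y)
    list(mode = "classification", value = names(counts)[which.max(counts)])
  }
}

# Every new observation receives the same prediction
predict_null <- function(fit, n) {
  rep(fit$value, n)
}

reg_fit <- fit_null(mtcars$mpg)
cls_fit <- fit_null(factor(c("a", "b", "b")))

predict_null(reg_fit, 3)
predict_null(cls_fit, 3)
```

Such a model is mainly useful as a baseline: any model that uses predictors should do better than it.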
diff --git a/R/README.md b/R/README.md
@@ -33,27 +33,27 @@ list(
 
 `func` describes the function call (instead of having it in open code). `protect` identifies the arguments that the user should _not_ be allowed to modify, and `defaults` is a list of values that should be set but the user _can_ override.
 
-To create the model fit call, the `protect` arguments are populated with the appropriate objects (usually from the data set), and `rlang::call2` is used to create a call that can be executed. The `translate` function can be used to show the call prototype if there is need to see it (or debugging). 
+To create the model fit call, the `protect` arguments are populated with the appropriate objects (usually from the data set), and `rlang::call2()` is used to create a call that can be executed. The `translate()` function can be used to show the call prototype if there is a need to see it (e.g., for debugging). 
 
 In the chunk above, the value of the `family` object is quoted (i.e., `expr(binomial)`). If this is not quotes, R will execute the value of the option when the package is compiled. In this case, the full function definition of the binomial family object will be embedded into the model call. Arguments are frequently quoted when making the call so that data objects or objects that don't exist when the package is compiled will not be embedded. (also see the enviromnets section below)
 
 Additional notes:
 
- * In cases where the model fit function is not a single function call, a wrapper function can be written to deal with this. See `parsnip::keras_mlp` and `parsnip::xgb_train`. this usually triggers package dependencies though. 
- * The `defaults` argument is not the only place to set defaults. The `translate` method for an model specification gets the last word on arguments. It is also a good place to deal with common argument errors and to make defaults based on the _mode_ of the model (e.g. classification or regression). 
+ * In cases where the model fit function is not a single function call, a wrapper function can be written to deal with this. See `parsnip::keras_mlp()` and `parsnip::xgb_train()`. This usually triggers package dependencies, though. 
+ * The `defaults` argument is not the only place to set defaults. The `translate()` method for a model specification gets the last word on arguments. It is also a good place to deal with common argument errors and to make defaults based on the _mode_ of the model (e.g. classification or regression). 
  * Users can also pass in quoted arguments
 
 ## Environments
 
-One of the first things that the `fit` function does is to make a new environment and store the data set and associated objects. For example:
+One of the first things that the `fit()` function does is to make a new environment and store the data set and associated objects. For example:
 
 ```r
 eval_env <- rlang::env()
 eval_env$data <- data
 eval_env$formula <- formula
 ```    
 
-This is designed to avoid any issues when executing the call object on the data using `eval_tidy`. 
+This is designed to avoid any issues when executing the call object on the data using `eval_tidy()`. 
 
 Any quoted arguments (such as the `family` example given above) are evaluated in this environment just before the model call is evaluated. For a user passes in an argument that is `floor(nrow(data)/3)`, this will be evaluated at this time in the captured environment. 
 

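The quoting and environment steps in the README hunk above can be sketched with rlang directly. This is a minimal illustration under stated assumptions, not the code parsnip runs: the toy data frame `dat` and the object names `fit_call` and `glm_fit` are made up for the example, while `call2()`, `expr()`, `env()`, and `eval_tidy()` are the rlang functions the README refers to.

```r
library(rlang)

# Toy data standing in for the user's data set (made up for this example)
dat <- data.frame(
  y = factor(c("a", "b", "a", "b", "b", "a", "b", "a")),
  x = c(1.2, 3.4, 0.5, 2.1, 2.2, 2.9, 1.8, 1.1)
)

# Build the call without evaluating its arguments. `family` is quoted with
# expr(), so the symbol `binomial` is stored rather than the family object.
fit_call <- call2(
  "glm",
  formula = expr(formula),
  data = expr(data),
  family = expr(binomial),
  .ns = "stats"
)

# Store the data objects in a fresh environment, as the README describes
eval_env <- env()
eval_env$data <- dat
eval_env$formula <- y ~ x

# The call still contains symbols, not embedded objects
print(fit_call)

# Evaluate the call; `data` and `formula` are looked up in eval_env
glm_fit <- eval_tidy(fit_call, env = eval_env)
```

Because the arguments stay as symbols until evaluation, the printed call stays short and nothing from the data set is baked into it.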
diff --git a/R/arguments.R b/R/arguments.R
@@ -67,15 +67,15 @@ check_eng_args <- function(args, obj, core_args) {
 
 #' Change elements of a model specification
 #'
-#' `set_args` can be used to modify the arguments of a model specification while
-#'  `set_mode` is used to change the model's mode.
+#' `set_args()` can be used to modify the arguments of a model specification while
+#'  `set_mode()` is used to change the model's mode.
 #'
 #' @param object A model specification.
 #' @param ... One or more named model arguments.
 #' @param mode A character string for the model type (e.g. "classification" or
 #'  "regression")
 #' @return An updated model object.
-#' @details `set_args` will replace existing values of the arguments.
+#' @details `set_args()` will replace existing values of the arguments.
 #'
 #' @examples
 #' rand_forest()

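For reviewers unfamiliar with the two functions documented in this hunk, a short usage sketch follows. It assumes parsnip is installed and that `mtry` and `trees` are main arguments of `rand_forest()`; the values are arbitrary.

```r
library(parsnip)

# Start from an empty specification
spec <- rand_forest()

# set_args() replaces existing argument values; set_mode() changes the mode
spec <- set_args(spec, mtry = 3, trees = 500)
spec <- set_mode(spec, "regression")

spec
```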
diff --git a/R/boost_tree.R b/R/boost_tree.R
@@ -2,7 +2,7 @@
 
 #' General Interface for Boosted Trees
 #'
-#' `boost_tree` is a way to generate a _specification_ of a model
+#' `boost_tree()` is a way to generate a _specification_ of a model
 #'  before fitting and allows the model to be created using
 #'  different packages in R or via Spark. The main arguments for the
 #'  model are:
@@ -22,9 +22,9 @@
 #' }
 #' These arguments are converted to their specific names at the
 #'  time that the model is fit. Other options and argument can be
-#'  set using the  `set_engine` function. If left to their defaults
+#'  set using the  `set_engine()` function. If left to their defaults
 #'  here (`NULL`), the values are taken from the underlying model
-#'  functions.  If parameters need to be modified, `update` can be used
+#'  functions.  If parameters need to be modified, `update()` can be used
 #'  in lieu of recreating the object from scratch.
 #'
 #' @param mode A single character string for the type of model.
@@ -48,7 +48,7 @@
 #'  each iteration while `C5.0` samples once during traning.
 #' @details
 #' The data given to the function are not saved and are only used
-#'  to determine the _mode_ of the model. For `boost_tree`, the
+#'  to determine the _mode_ of the model. For `boost_tree()`, the
 #'  possible modes are "regression" and "classification".
 #'
 #' The model can be created using the `fit()` function using the
@@ -87,13 +87,13 @@
 #'
 #' @note For models created using the spark engine, there are
 #'  several differences to consider. First, only the formula
-#'  interface to via `fit` is available; using `fit_xy` will
+#'  interface via `fit()` is available; using `fit_xy()` will
 #'  generate an error. Second, the predictions will always be in a
 #'  spark table format. The names will be the same as documented but
 #'  without the dots. Third, there is no equivalent to factor
 #'  columns in spark tables so class predictions are returned as
 #'  character columns. Fourth, to retain the model object for a new
-#'  R session (via `save`), the `model$fit` element of the `parsnip`
+#'  R session (via `save()`), the `model$fit` element of the `parsnip`
 #'  object should be serialized via `ml_save(object$fit)` and
 #'  separately saved to disk. In a new session, the object can be
 #'  reloaded and reattached to the `parsnip` object.
@@ -149,7 +149,7 @@ print.boost_tree <- function(x, ...) {
 #' @export
 #' @inheritParams boost_tree
 #' @param object A boosted tree model specification.
-#' @param ... Not used for `update`.
+#' @param ... Not used for `update()`.
 #' @param fresh A logical for whether the arguments should be
 #'  modified in-place of or replaced wholesale.
 #' @return An updated model specification.

diff --git a/R/convert_data.R b/R/convert_data.R
@@ -64,7 +64,7 @@ convert_form_to_xy_fit <-function(
 
   w <- as.vector(model.weights(mod_frame))
   if (!is.null(w) && !is.numeric(w))
-    stop("'weights' must be a numeric vector", call. = FALSE)
+    stop("`weights` must be a numeric vector", call. = FALSE)
 
   offset <- as.vector(model.offset(mod_frame))
   if (!is.null(offset)) {
@@ -219,7 +219,7 @@ convert_xy_to_form_fit <- function(x, y, weights = NULL, y_name = "..y") {
 
   if (!is.null(weights)) {
     if (!is.numeric(weights))
-      stop("'weights' must be a numeric vector", call. = FALSE)
+      stop("`weights` must be a numeric vector", call. = FALSE)
     if (length(weights) != nrow(x))
       stop("`weights` should have ", nrow(x), " elements", call. = FALSE)
   }

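The two `weights` checks touched in this file can be exercised in isolation. The sketch below is a hypothetical stand-alone validator (`check_weights()` is not a parsnip function) that mirrors the conditions and messages shown in the second hunk.

```r
# Hypothetical validator mirroring the checks in convert_xy_to_form_fit()
check_weights <- function(weights, x) {
  if (is.null(weights))
    return(invisible(NULL))
  if (!is.numeric(weights))
    stop("`weights` must be a numeric vector", call. = FALSE)
  if (length(weights) != nrow(x))
    stop("`weights` should have ", nrow(x), " elements", call. = FALSE)
  invisible(weights)
}

x <- mtcars[1:5, c("wt", "hp")]

# Valid weights pass silently
check_weights(rep(1, 5), x)

# Non-numeric weights raise the reworded error; capture its message
msg <- tryCatch(
  check_weights(letters[1:5], x),
  error = function(e) conditionMessage(e)
)
msg
```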
diff --git a/R/decision_tree.R b/R/decision_tree.R
@@ -2,7 +2,7 @@
 
 #' General Interface for Decision Tree Models
 #'
-#' `decision_tree` is a way to generate a _specification_ of a model
+#' `decision_tree()` is a way to generate a _specification_ of a model
 #'  before fitting and allows the model to be created using
 #'  different packages in R or via Spark. The main arguments for the
 #'  model are:
@@ -16,9 +16,9 @@
 #' }
 #' These arguments are converted to their specific names at the
 #'  time that the model is fit. Other options and argument can be
-#'  set using `set_engine`. If left to their defaults
+#'  set using `set_engine()`. If left to their defaults
 #'  here (`NULL`), the values are taken from the underlying model
-#'  functions. If parameters need to be modified, `update` can be used
+#'  functions. If parameters need to be modified, `update()` can be used
 #'  in lieu of recreating the object from scratch.
 #'
 #' @inheritParams boost_tree
@@ -72,13 +72,13 @@
 #'
 #' @note For models created using the spark engine, there are
 #'  several differences to consider. First, only the formula
-#'  interface to via `fit` is available; using `fit_xy` will
+#'  interface via `fit()` is available; using `fit_xy()` will
 #'  generate an error. Second, the predictions will always be in a
 #'  spark table format. The names will be the same as documented but
 #'  without the dots. Third, there is no equivalent to factor
 #'  columns in spark tables so class predictions are returned as
 #'  character columns. Fourth, to retain the model object for a new
-#'  R session (via `save`), the `model$fit` element of the `parsnip`
+#'  R session (via `save()`), the `model$fit` element of the `parsnip`
 #'  object should be serialized via `ml_save(object$fit)` and
 #'  separately saved to disk. In a new session, the object can be
 #'  reloaded and reattached to the `parsnip` object.
@@ -112,7 +112,7 @@ decision_tree <-
 
 #' @export
 print.decision_tree <- function(x, ...) {
-  cat("Random Forest Model Specification (", x$mode, ")\n\n", sep = "")
+  cat("Decision Tree Model Specification (", x$mode, ")\n\n", sep = "")
   model_printer(x, ...)
 
   if(!is.null(x$method$fit$args)) {

diff --git a/R/engines.R b/R/engines.R
@@ -77,7 +77,7 @@ load_libs <- function(x, quiet, attach = FALSE) {
 
 #' Declare a computational engine and specific arguments
 #'
-#' `set_engine` is used to specify which package or system will be used
+#' `set_engine()` is used to specify which package or system will be used
 #'  to fit the model, along with any arguments specific to that software.
 #'
 #' @param object A model specification.

diff --git a/R/fit.R b/R/fit.R
@@ -5,7 +5,7 @@
 
 #' Fit a Model Specification to a Dataset
 #'
-#' `fit` and `fit_xy` take a model specification, translate the required
+#' `fit()` and `fit_xy()` take a model specification, translate the required
 #'  code by substituting arguments, and execute the model fit
 #'  routine.
 #'
@@ -22,25 +22,25 @@
 #'  `catch`. See [fit_control()].
 #' @param ... Not currently used; values passed here will be
 #'  ignored. Other options required to fit the model should be
-#'  passed using `set_engine`.
-#' @details  `fit` and `fit_xy` substitute the current arguments in the model
+#'  passed using `set_engine()`.
+#' @details  `fit()` and `fit_xy()` substitute the current arguments in the model
 #'  specification into the computational engine's code, checks them
 #'  for validity, then fits the model using the data and the
 #'  engine-specific code. Different model functions have different
 #'  interfaces (e.g. formula or `x`/`y`) and these functions translate
-#'  between the interface used when `fit` or `fit_xy` were invoked and the one
+#'  between the interface used when `fit()` or `fit_xy()` were invoked and the one
 #'  required by the underlying model.
 #'
 #' When possible, these functions attempt to avoid making copies of the
 #'  data. For example, if the underlying model uses a formula and
-#'  `fit` is invoked, the original data are references
+#'  `fit()` is invoked, the original data are referenced
 #'  when the model is fit. However, if the underlying model uses
 #'  something else, such as `x`/`y`, the formula is evaluated and
 #'  the data are converted to the required format. In this case, any
 #'  calls in the resulting model objects reference the temporary
 #'  objects used to fit the model.
 #' @examples
-#' # Although `glm` only has a formula interface, different
+#' # Although `glm()` only has a formula interface, different
 #' # methods for specifying the model can be used
 #'
 #' library(dplyr)
@@ -94,10 +94,10 @@ fit.model_spec <-
   ) {
     dots <- quos(...)
     if (any(names(dots) == "engine"))
-      stop("Use `set_engine` to supply the engine.", call. = FALSE)
+      stop("Use `set_engine()` to supply the engine.", call. = FALSE)
 
     if (all(c("x", "y") %in% names(dots)))
-      stop("`fit.model_spec` is for the formula methods. Use `fit_xy` instead.",
+      stop("`fit.model_spec()` is for the formula methods. Use `fit_xy()` instead.",
            call. = FALSE)
     cl <- match.call(expand.dots = TRUE)
     # Create an environment with the evaluated argument objects. This will be
@@ -111,7 +111,7 @@ fit.model_spec <-
 
     if (object$engine == "spark" && !inherits(eval_env$data, "tbl_spark"))
       stop(
-        "spark objects can only be used with the formula interface to `fit` ",
+        "spark objects can only be used with the formula interface to `fit()` ",
         "with a spark data object.", call. = FALSE
       )
 
@@ -178,7 +178,7 @@ fit_xy.model_spec <-
   ) {
     dots <- quos(...)
     if (any(names(dots) == "engine"))
-      stop("Use `set_engine` to supply the engine.", call. = FALSE)
+      stop("Use `set_engine()` to supply the engine.", call. = FALSE)
 
     cl <- match.call(expand.dots = TRUE)
     eval_env <- rlang::env()
@@ -188,7 +188,7 @@ fit_xy.model_spec <-
 
     if (object$engine == "spark")
       stop(
-        "spark objects can only be used with the formula interface to `fit` ",
+        "spark objects can only be used with the formula interface to `fit()` ",
         "with a spark data object.", call. = FALSE
       )
 
@@ -305,7 +305,7 @@ check_interface <- function(formula, data, cl, model) {
   inher(formula, "formula", cl)
   inher(data, c("data.frame", "tbl_spark"), cl)
 
-  # Determine the `fit` interface
+  # Determine the `fit()` interface
   form_interface <- !is.null(formula) & !is.null(data)
 
   if (form_interface)
@@ -322,10 +322,10 @@ check_xy_interface <- function(x, y, cl, model) {
 
   # rule out spark data sets that don't use the formula interface
   if (inherits(x, "tbl_spark") | inherits(y, "tbl_spark"))
-    stop("spark objects can only be used with the formula interface via `fit` ",
+    stop("spark objects can only be used with the formula interface via `fit()` ",
          "with a spark data object.", call. = FALSE)
 
-  # Determine the `fit` interface
+  # Determine the `fit()` interface
   matrix_interface <- !is.null(x) & !is.null(y) && is.matrix(x)
   df_interface <- !is.null(x) & !is.null(y) && is.data.frame(x)
 

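The documentation in this file distinguishes the formula interface (`fit()`) from the `x`/`y` interface (`fit_xy()`). A minimal sketch of both, assuming parsnip is installed and using the `lm` engine with `mtcars` as example data:

```r
library(parsnip)

# One specification, fit through both interfaces
spec <- set_engine(linear_reg(), "lm")

# Formula interface
form_fit <- fit(spec, mpg ~ wt + hp, data = mtcars)

# x/y interface; lm() needs a formula, so parsnip converts the data internally
xy_fit <- fit_xy(spec, x = mtcars[, c("wt", "hp")], y = mtcars$mpg)

# Both routes should give the same coefficient estimates
coef(form_fit$fit)
coef(xy_fit$fit)
```

Passing `engine` directly to either function is rejected by the checks in this diff; the engine goes through `set_engine()` instead.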
diff --git a/R/fit_helpers.R b/R/fit_helpers.R
@@ -1,5 +1,5 @@
 # These functions are the go-betweens between parsnip::fit (or parsnip::fit_xy)
-# and the underlying model function (such as ranger::ranger). So if `fit_xy` is
+# and the underlying model function (such as ranger::ranger). So if `fit_xy()` is
 # used to fit a ranger model, there needs to be a conversion from x/y format
 # data to formula/data objects and so on.
 
@@ -66,7 +66,7 @@ form_form <-
 xy_xy <- function(object, env, control, target = "none", ...) {
 
   if (inherits(env$x, "tbl_spark") | inherits(env$y, "tbl_spark"))
-    stop("spark objects can only be used with the formula interface to `fit`",
+    stop("spark objects can only be used with the formula interface to `fit()`",
          call. = FALSE)
 
   object <- check_mode(object, levels(env$y))

diff --git a/R/linear_reg.R b/R/linear_reg.R
@@ -1,6 +1,6 @@
 #' General Interface for Linear Regression Models
 #'
-#' `linear_reg` is a way to generate a _specification_ of a model
+#' `linear_reg()` is a way to generate a _specification_ of a model
 #'  before fitting and allows the model to be created using
 #'  different packages in R, Stan, keras, or via Spark. The main
 #'  arguments for the model are:
@@ -12,9 +12,9 @@
 #' }
 #' These arguments are converted to their specific names at the
 #'  time that the model is fit. Other options and argument can be
-#'  set using `set_engine`. If left to their defaults
+#'  set using `set_engine()`. If left to their defaults
 #'  here (`NULL`), the values are taken from the underlying model
-#'  functions. If parameters need to be modified, `update` can be used
+#'  functions. If parameters need to be modified, `update()` can be used
 #'  in lieu of recreating the object from scratch.
 #' @inheritParams boost_tree
 #' @param mode A single character string for the type of model.
@@ -30,7 +30,7 @@
 #'  (the lasso) (`glmnet` and `spark` only).
 #' @details
 #' The data given to the function are not saved and are only used
-#'  to determine the _mode_ of the model. For `linear_reg`, the
+#'  to determine the _mode_ of the model. For `linear_reg()`, the
 #'  mode will always be "regression".
 #'
 #' The model can be created using the `fit()` function using the
@@ -71,11 +71,11 @@
 #' When using `glmnet` models, there is the option to pass
 #'  multiple values (or no values) to the `penalty` argument.
 #'  This can have an effect on the model object results. When using
-#'  the `predict` method in these cases, the return object type
+#'  the `predict()` method in these cases, the return object type
 #'  depends on the value of `penalty`. If a single value is
 #'  given, the results will be a simple numeric vector. When
 #'  multiple values or no values for `penalty` are used in
-#'  `linear_reg`, the `predict` method will return a data frame with
+#'  `linear_reg()`, the `predict()` method will return a data frame with
 #'  columns `values` and `lambda`.
 #'
 #' For prediction, the `stan` engine can compute posterior
@@ -87,13 +87,13 @@
 #'
 #' @note For models created using the spark engine, there are
 #'  several differences to consider. First, only the formula
-#'  interface to via `fit` is available; using `fit_xy` will
+#'  interface via `fit()` is available; using `fit_xy()` will
 #'  generate an error. Second, the predictions will always be in a
 #'  spark table format. The names will be the same as documented but
 #'  without the dots. Third, there is no equivalent to factor
 #'  columns in spark tables so class predictions are returned as
 #'  character columns. Fourth, to retain the model object for a new
-#'  R session (via `save`), the `model$fit` element of the `parsnip`
+#'  R session (via `save()`), the `model$fit` element of the `parsnip`
 #'  object should be serialized via `ml_save(object$fit)` and
 #'  separately saved to disk. In a new session, the object can be
 #'  reloaded and reattached to the `parsnip` object.