diff --git a/R/predict.R b/R/predict.R index 72a0733d0..ab27c3450 100644 --- a/R/predict.R +++ b/R/predict.R @@ -4,45 +4,42 @@ #' `predict()` can be used for all types of models and uses the #' "type" argument for more specificity. #' -#' @param object An object of class `model_fit` +#' @param object An object of class `model_fit`. #' @param new_data A rectangular data object, such as a data frame. #' @param type A single character value or `NULL`. Possible values -#' are "numeric", "class", "prob", "conf_int", "pred_int", "quantile", "time", -#' "hazard", "survival", or "raw". When `NULL`, `predict()` will choose an -#' appropriate value based on the model's mode. +#' are `"numeric"`, `"class"`, `"prob"`, `"conf_int"`, `"pred_int"`, +#' `"quantile"`, `"time"`, `"hazard"`, `"survival"`, or `"raw"`. When `NULL`, +#' `predict()` will choose an appropriate value based on the model's mode. #' @param opts A list of optional arguments to the underlying #' predict function that will be used when `type = "raw"`. The #' list should not include options for the model object or the #' new data being predicted. -#' @param ... Arguments to the underlying model's prediction -#' function cannot be passed here (see `opts`). There are some -#' `parsnip` related options that can be passed, depending on the -#' value of `type`. Possible arguments are: +#' @param ... Additional `parsnip`-related options, depending on the +#' value of `type`. Arguments to the underlying model's prediction +#' function cannot be passed here (use the `opts` argument instead). +#' Possible arguments are: #' \itemize{ -#' \item `interval`: for `type`s of "survival" and "quantile", should +#' \item `interval`: for `type` equal to `"survival"` or `"quantile"`, should #' interval estimates be added, if available? Options are `"none"` #' and `"confidence"`. 
-#' \item `level`: for `type`s of "conf_int", "pred_int", and "survival" +#' \item `level`: for `type` equal to `"conf_int"`, `"pred_int"`, or `"survival"`, #' this is the parameter for the tail area of the intervals #' (e.g. confidence level for confidence intervals). -#' Default value is 0.95. -#' \item `std_error`: add the standard error of fit or prediction (on -#' the scale of the linear predictors) for `type`s of "conf_int" -#' and "pred_int". Default value is `FALSE`. -#' \item `quantile`: the quantile(s) for quantile regression -#' (not implemented yet) -#' \item `time`: the time(s) for hazard and survival probability estimates. +#' Default value is `0.95`. +#' \item `std_error`: for `type` equal to `"conf_int"` or `"pred_int"`, add +#' the standard error of fit or prediction (on the scale of the +#' linear predictors). Default value is `FALSE`. +#' \item `quantile`: for `type` equal to `"quantile"`, the quantiles of the +#' distribution. Default is `(1:9)/10`. +#' \item `time`: for `type` equal to `"survival"` or `"hazard"`, the +#' time points at which the survival probability or hazard is estimated. #' } -#' @details If "type" is not supplied to `predict()`, then a choice -#' is made: +#' @details For `type = NULL`, `predict()` uses #' #' * `type = "numeric"` for regression models, #' * `type = "class"` for classification, and #' * `type = "time"` for censored regression. #' -#' `predict()` is designed to provide a tidy result (see "Value" -#' section below) in a tibble output format. -#' #' ## Interval predictions #' #' When using `type = "conf_int"` and `type = "pred_int"`, the options @@ -58,37 +55,42 @@ #' have the opposite sign as what the underlying model's `predict()` method #' produces. Set `increasing = FALSE` to suppress this behavior.
#' -#' @return With the exception of `type = "raw"`, the results of -#' `predict.model_fit()` will be a tibble as many rows in the output -#' as there are rows in `new_data` and the column names will be -#' predictable. +#' @return With the exception of `type = "raw"`, the result of +#' `predict.model_fit()` +#' +#' * is a tibble +#' * has as many rows as there are rows in `new_data` +#' * has standardized column names, see below: +#' +#' For `type = "numeric"`, the tibble has a `.pred` column for a single +#' outcome and `.pred_Yname` columns for a multivariate outcome. #' -#' For numeric results with a single outcome, the tibble will have -#' a `.pred` column and `.pred_Yname` for multivariate results. +#' For `type = "class"`, the tibble has a `.pred_class` column. #' -#' For hard class predictions, the column is named `.pred_class` -#' and, when `type = "prob"`, the columns are `.pred_classlevel`. +#' For `type = "prob"`, the tibble has `.pred_classlevel` columns. #' -#' `type = "conf_int"` and `type = "pred_int"` return tibbles with -#' columns `.pred_lower` and `.pred_upper` with an attribute for -#' the confidence level. In the case where intervals can be -#' produces for class probabilities (or other non-scalar outputs), -#' the columns will be named `.pred_lower_classlevel` and so on. +#' For `type = "conf_int"` and `type = "pred_int"`, the tibble has +#' `.pred_lower` and `.pred_upper` columns with an attribute for +#' the confidence level. In the case where intervals can be +#' produces for class probabilities (or other non-scalar outputs), +#' the columns are named `.pred_lower_classlevel` and so on. #' -#' Quantile predictions return a tibble with a column `.pred`, which is +#' For `type = "quantile"`, the tibble has a `.pred` column, which is #' a list-column. Each list element contains a tibble with columns #' `.pred` and `.quantile` (and perhaps other columns). 
#' -#' Using `type = "raw"` with `predict.model_fit()` will return -#' the unadulterated results of the prediction function. +#' For `type = "time"`, the tibble has a `.pred_time` column. #' -#' For censored regression: +#' For `type = "survival"`, the tibble has a `.pred` column, which is +#' a list-column. Each list element contains a tibble with columns +#' `.time` and `.pred_survival` (and perhaps other columns). +#' +#' For `type = "hazard"`, the tibble has a `.pred` column, which is +#' a list-column. Each list element contains a tibble with columns +#' `.time` and `.pred_hazard` (and perhaps other columns). #' -#' * `type = "time"` produces a column `.pred_time`. -#' * `type = "hazard"` results in a list column `.pred` containing tibbles -#' with a column `.pred_hazard`. -#' * `type = "survival"` results in a list column `.pred` containing tibbles -#' with a `.pred_survival` column. +#' Using `type = "raw"` with `predict.model_fit()` will return +#' the unadulterated results of the prediction function. #' #' In the case of Spark-based models, since table columns cannot #' contain dots, the same convention is used except 1) no dots diff --git a/man/bart-internal.Rd b/man/bart-internal.Rd index 690f715f7..2665bd380 100644 --- a/man/bart-internal.Rd +++ b/man/bart-internal.Rd @@ -20,9 +20,9 @@ dbart_predict_calc(obj, new_data, type, level = 0.95, std_err = FALSE) \item{level}{Confidence level.} \item{type}{A single character value or \code{NULL}. Possible values -are "numeric", "class", "prob", "conf_int", "pred_int", "quantile", "time", -"hazard", "survival", or "raw". When \code{NULL}, \code{predict()} will choose an -appropriate value based on the model's mode.} +are \code{"numeric"}, \code{"class"}, \code{"prob"}, \code{"conf_int"}, \code{"pred_int"}, +\code{"quantile"}, \code{"time"}, \code{"hazard"}, \code{"survival"}, or \code{"raw"}. 
When \code{NULL}, +\code{predict()} will choose an appropriate value based on the model's mode.} \item{std_err}{Attach column for standard error of prediction or not.} } diff --git a/man/other_predict.Rd b/man/other_predict.Rd index ee49f70d1..04442d2da 100644 --- a/man/other_predict.Rd +++ b/man/other_predict.Rd @@ -53,28 +53,29 @@ predict_survival(object, ...) predict_time(object, ...) } \arguments{ -\item{object}{An object of class \code{model_fit}} +\item{object}{An object of class \code{model_fit}.} \item{new_data}{A rectangular data object, such as a data frame.} -\item{...}{Arguments to the underlying model's prediction -function cannot be passed here (see \code{opts}). There are some -\code{parsnip} related options that can be passed, depending on the -value of \code{type}. Possible arguments are: +\item{...}{Additional \code{parsnip}-related options, depending on the +value of \code{type}. Arguments to the underlying model's prediction +function cannot be passed here (use the \code{opts} argument instead). +Possible arguments are: \itemize{ -\item \code{interval}: for \code{type}s of "survival" and "quantile", should +\item \code{interval}: for \code{type} equal to \code{"survival"} or \code{"quantile"}, should interval estimates be added, if available? Options are \code{"none"} and \code{"confidence"}. -\item \code{level}: for \code{type}s of "conf_int", "pred_int", and "survival" +\item \code{level}: for \code{type} equal to \code{"conf_int"}, \code{"pred_int"}, or \code{"survival"}, this is the parameter for the tail area of the intervals (e.g. confidence level for confidence intervals). -Default value is 0.95. -\item \code{std_error}: add the standard error of fit or prediction (on -the scale of the linear predictors) for \code{type}s of "conf_int" -and "pred_int". Default value is \code{FALSE}. 
-\item \code{quantile}: the quantile(s) for quantile regression -(not implemented yet) -\item \code{time}: the time(s) for hazard and survival probability estimates. +Default value is \code{0.95}. +\item \code{std_error}: for \code{type} equal to \code{"conf_int"} or \code{"pred_int"}, add +the standard error of fit or prediction (on the scale of the +linear predictors). Default value is \code{FALSE}. +\item \code{quantile}: for \code{type} equal to \code{"quantile"}, the quantiles of the +distribution. Default is \code{(1:9)/10}. +\item \code{time}: for \code{type} equal to \code{"survival"} or \code{"hazard"}, the +time points at which the survival probability or hazard is estimated. }} \item{level}{A single numeric value between zero and one for the diff --git a/man/predict.model_fit.Rd b/man/predict.model_fit.Rd index 4632bb205..df30a6f12 100644 --- a/man/predict.model_fit.Rd +++ b/man/predict.model_fit.Rd @@ -13,74 +13,80 @@ predict_raw(object, ...) } \arguments{ -\item{object}{An object of class \code{model_fit}} +\item{object}{An object of class \code{model_fit}.} \item{new_data}{A rectangular data object, such as a data frame.} \item{type}{A single character value or \code{NULL}. Possible values -are "numeric", "class", "prob", "conf_int", "pred_int", "quantile", "time", -"hazard", "survival", or "raw". When \code{NULL}, \code{predict()} will choose an -appropriate value based on the model's mode.} +are \code{"numeric"}, \code{"class"}, \code{"prob"}, \code{"conf_int"}, \code{"pred_int"}, +\code{"quantile"}, \code{"time"}, \code{"hazard"}, \code{"survival"}, or \code{"raw"}. When \code{NULL}, +\code{predict()} will choose an appropriate value based on the model's mode.} \item{opts}{A list of optional arguments to the underlying predict function that will be used when \code{type = "raw"}.
The list should not include options for the model object or the new data being predicted.} -\item{...}{Arguments to the underlying model's prediction -function cannot be passed here (see \code{opts}). There are some -\code{parsnip} related options that can be passed, depending on the -value of \code{type}. Possible arguments are: +\item{...}{Additional \code{parsnip}-related options, depending on the +value of \code{type}. Arguments to the underlying model's prediction +function cannot be passed here (use the \code{opts} argument instead). +Possible arguments are: \itemize{ -\item \code{interval}: for \code{type}s of "survival" and "quantile", should +\item \code{interval}: for \code{type} equal to \code{"survival"} or \code{"quantile"}, should interval estimates be added, if available? Options are \code{"none"} and \code{"confidence"}. -\item \code{level}: for \code{type}s of "conf_int", "pred_int", and "survival" +\item \code{level}: for \code{type} equal to \code{"conf_int"}, \code{"pred_int"}, or \code{"survival"}, this is the parameter for the tail area of the intervals (e.g. confidence level for confidence intervals). -Default value is 0.95. -\item \code{std_error}: add the standard error of fit or prediction (on -the scale of the linear predictors) for \code{type}s of "conf_int" -and "pred_int". Default value is \code{FALSE}. -\item \code{quantile}: the quantile(s) for quantile regression -(not implemented yet) -\item \code{time}: the time(s) for hazard and survival probability estimates. +Default value is \code{0.95}. +\item \code{std_error}: for \code{type} equal to \code{"conf_int"} or \code{"pred_int"}, add +the standard error of fit or prediction (on the scale of the +linear predictors). Default value is \code{FALSE}. +\item \code{quantile}: for \code{type} equal to \code{"quantile"}, the quantiles of the +distribution. Default is \code{(1:9)/10}.
+\item \code{time}: for \code{type} equal to \code{"survival"} or \code{"hazard"}, the +time points at which the survival probability or hazard is estimated. }} } \value{ -With the exception of \code{type = "raw"}, the results of -\code{predict.model_fit()} will be a tibble as many rows in the output -as there are rows in \code{new_data} and the column names will be -predictable. +With the exception of \code{type = "raw"}, the result of +\code{predict.model_fit()} +\itemize{ +\item is a tibble +\item has as many rows as there are rows in \code{new_data} +\item has standardized column names, see below: +} + +For \code{type = "numeric"}, the tibble has a \code{.pred} column for a single +outcome and \code{.pred_Yname} columns for a multivariate outcome. -For numeric results with a single outcome, the tibble will have -a \code{.pred} column and \code{.pred_Yname} for multivariate results. +For \code{type = "class"}, the tibble has a \code{.pred_class} column. -For hard class predictions, the column is named \code{.pred_class} -and, when \code{type = "prob"}, the columns are \code{.pred_classlevel}. +For \code{type = "prob"}, the tibble has \code{.pred_classlevel} columns. -\code{type = "conf_int"} and \code{type = "pred_int"} return tibbles with -columns \code{.pred_lower} and \code{.pred_upper} with an attribute for +For \code{type = "conf_int"} and \code{type = "pred_int"}, the tibble has +\code{.pred_lower} and \code{.pred_upper} columns with an attribute for the confidence level. In the case where intervals can be produces for class probabilities (or other non-scalar outputs), -the columns will be named \code{.pred_lower_classlevel} and so on. +the columns are named \code{.pred_lower_classlevel} and so on. -Quantile predictions return a tibble with a column \code{.pred}, which is +For \code{type = "quantile"}, the tibble has a \code{.pred} column, which is a list-column. 
Each list element contains a tibble with columns \code{.pred} and \code{.quantile} (and perhaps other columns). +For \code{type = "time"}, the tibble has a \code{.pred_time} column. + +For \code{type = "survival"}, the tibble has a \code{.pred} column, which is +a list-column. Each list element contains a tibble with columns +\code{.time} and \code{.pred_survival} (and perhaps other columns). + +For \code{type = "hazard"}, the tibble has a \code{.pred} column, which is +a list-column. Each list element contains a tibble with columns +\code{.time} and \code{.pred_hazard} (and perhaps other columns). + Using \code{type = "raw"} with \code{predict.model_fit()} will return the unadulterated results of the prediction function. -For censored regression: -\itemize{ -\item \code{type = "time"} produces a column \code{.pred_time}. -\item \code{type = "hazard"} results in a list column \code{.pred} containing tibbles -with a column \code{.pred_hazard}. -\item \code{type = "survival"} results in a list column \code{.pred} containing tibbles -with a \code{.pred_survival} column. -} - In the case of Spark-based models, since table columns cannot contain dots, the same convention is used except 1) no dots appear in names and 2) vectors are never returned but @@ -97,16 +103,12 @@ Apply a model to create different types of predictions. "type" argument for more specificity. } \details{ -If "type" is not supplied to \code{predict()}, then a choice -is made: +For \code{type = NULL}, \code{predict()} uses \itemize{ \item \code{type = "numeric"} for regression models, \item \code{type = "class"} for classification, and \item \code{type = "time"} for censored regression. } - -\code{predict()} is designed to provide a tidy result (see "Value" -section below) in a tibble output format. \subsection{Interval predictions}{ When using \code{type = "conf_int"} and \code{type = "pred_int"}, the options