Merged
5 changes: 5 additions & 0 deletions .travis.yml
@@ -8,11 +8,16 @@ sudo: true
warnings_are_errors: false

r:
- 3.1
- 3.2
- oldrel
- release
- devel

env:
- KERAS_BACKEND="tensorflow"
global:
- MAKEFLAGS="-j 2"

r_binary_packages:
- rstan
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,5 +1,5 @@
Package: parsnip
Version: 0.0.0.9003
Version: 0.0.0.9004
Title: A Common API to Modeling and Analysis Functions
Description: A common interface is provided to allow users to specify a model without having to remember the different argument names across different functions or computational engines (e.g. R, spark, stan, etc).
Authors@R: c(
10 changes: 5 additions & 5 deletions NAMESPACE
@@ -58,12 +58,12 @@ S3method(varying_args,model_spec)
S3method(varying_args,recipe)
S3method(varying_args,step)
export("%>%")
export(.cols)
export(.dat)
export(.n_cols)
export(.n_facts)
export(.n_levs)
export(.n_obs)
export(.n_preds)
export(.facts)
export(.lvls)
export(.obs)
export(.preds)
export(.x)
export(.y)
export(C5.0_train)
8 changes: 7 additions & 1 deletion NEWS.md
@@ -1,7 +1,13 @@
# parsnip 0.0.0.9004

* Arguments to modeling functions are now captured as quosures.
* `others` has been replaced by `...`
* Data descriptor names have been changed and are now functions. The descriptor definitions for "cols" and "preds" have been switched.

# parsnip 0.0.0.9003

* `regularization` was changed to `penalty` in a few models to be consistent with [this change](tidymodels/model-implementation-principles@08d3afd).
* if a mode is not chosen in the model specification, it is assigned at the time of fit. [51](https://github.com/topepo/parsnip/issues/51)
* If a mode is not chosen in the model specification, it is assigned at the time of fit. [51](https://github.com/topepo/parsnip/issues/51)
* The underlying modeling packages now are loaded by namespace. There will be some exceptions noted in the documentation for each model. For example, in some `predict` methods, the `earth` package will need to be attached to be fully operational.

# parsnip 0.0.0.9002
150 changes: 75 additions & 75 deletions R/descriptors.R
@@ -1,21 +1,21 @@
#' @name descriptors
#' @aliases descriptors .n_obs .n_cols .n_preds .n_facts .n_levs .x .y .dat
#' @aliases descriptors .obs .cols .preds .facts .lvls .x .y .dat
#' @title Data Set Characteristics Available when Fitting Models
#' @description When using the `fit()` functions there are some
#' variables that will be available for use in arguments. For
#' example, if the user would like to choose an argument value
#' based on the current number of rows in a data set, the `.n_obs()`
#' based on the current number of rows in a data set, the `.obs()`
#' function can be used. See Details below.
#' @details
#' Existing functions:
#' \itemize{
#' \item `.n_obs()`: The current number of rows in the data set.
#' \item `.n_cols()`: The number of columns in the data set that are
#' \item `.obs()`: The current number of rows in the data set.
#' \item `.preds()`: The number of columns in the data set that are
#' associated with the predictors prior to dummy variable creation.
#' \item `.n_preds()`: The number of predictors after dummy variables
#' are created (if any).
#' \item `.n_facts()`: The number of factor predictors in the dat set.
#' \item `.n_levs()`: If the outcome is a factor, this is a table
#' \item `.cols()`: The number of predictor columns available after dummy
#' variables are created (if any).
#' \item `.facts()`: The number of factor predictors in the data set.
#' \item `.lvls()`: If the outcome is a factor, this is a table
#' with the counts for each level (and `NA` otherwise).
#' \item `.x()`: The predictors returned in the format given. Either a
#' data frame or a matrix.
@@ -29,26 +29,26 @@
#' For example, if you use the model formula `Sepal.Width ~ .` with the `iris`
#' data, the values would be
#' \preformatted{
#' .n_cols() = 4 (the 4 columns in `iris`)
#' .n_preds() = 5 (3 numeric columns + 2 from Species dummy variables)
#' .n_obs() = 150
#' .n_levs() = NA (no factor outcome)
#' .n_facts() = 1 (the Species predictor)
#' .y() = <vector> (Sepal.Width as a vector)
#' .x() = <data.frame> (The other 4 columns as a data frame)
#' .dat() = <data.frame> (The full data set)
#' .preds() = 4 (the 4 predictor columns in `iris`)
#' .cols() = 5 (3 numeric columns + 2 from Species dummy variables)
#' .obs() = 150
#' .lvls() = NA (no factor outcome)
#' .facts() = 1 (the Species predictor)
#' .y() = <vector> (Sepal.Width as a vector)
#' .x() = <data.frame> (The other 4 columns as a data frame)
#' .dat() = <data.frame> (The full data set)
#' }
#'
#' If the formula `Species ~ .` were used:
#' \preformatted{
#' .n_cols() = 4 (the 4 numeric columns in `iris`)
#' .n_preds() = 4 (same)
#' .n_obs() = 150
#' .n_levs() = c(setosa = 50, versicolor = 50, virginica = 50)
#' .n_facts() = 0
#' .y() = <vector> (Species as a vector)
#' .x() = <data.frame> (The other 4 columns as a data frame)
#' .dat() = <data.frame> (The full data set)
#' .preds() = 4 (the 4 numeric columns in `iris`)
#' .cols() = 4 (same)
#' .obs() = 150
#' .lvls() = c(setosa = 50, versicolor = 50, virginica = 50)
#' .facts() = 0
#' .y() = <vector> (Species as a vector)
#' .x() = <data.frame> (The other 4 columns as a data frame)
#' .dat() = <data.frame> (The full data set)
#' }
#'
#' To use these in a model fit, pass them to a model specification.
@@ -60,7 +60,7 @@
#'
#' data("lending_club")
#'
#' rand_forest(mode = "classification", mtry = .n_cols() - 2)
#' rand_forest(mode = "classification", mtry = .cols() - 2)
#' }
#'
#' When no descriptors are found, the computation of the descriptor values
@@ -70,23 +70,23 @@ NULL

#' @export
#' @rdname descriptors
.n_cols <- function() descr_env$.n_cols()
.cols <- function() descr_env$.cols()

#' @export
#' @rdname descriptors
.n_preds <- function() descr_env$.n_preds()
.preds <- function() descr_env$.preds()

#' @export
#' @rdname descriptors
.n_obs <- function() descr_env$.n_obs()
.obs <- function() descr_env$.obs()

#' @export
#' @rdname descriptors
.n_levs <- function() descr_env$.n_levs()
.lvls <- function() descr_env$.lvls()

#' @export
#' @rdname descriptors
.n_facts <- function() descr_env$.n_facts()
.facts <- function() descr_env$.facts()

#' @export
#' @rdname descriptors
@@ -116,24 +116,24 @@ get_descr_df <- function(formula, data) {
tmp_dat <- convert_form_to_xy_fit(formula, data, indicators = FALSE)

if(is.factor(tmp_dat$y)) {
.n_levs <- function() {
.lvls <- function() {
table(tmp_dat$y, dnn = NULL)
}
} else .n_levs <- function() { NA }
} else .lvls <- function() { NA }

.n_cols <- function() {
.preds <- function() {
ncol(tmp_dat$x)
}

.n_preds <- function() {
.cols <- function() {
ncol(convert_form_to_xy_fit(formula, data, indicators = TRUE)$x)
}

.n_obs <- function() {
.obs <- function() {
nrow(data)
}

.n_facts <- function() {
.facts <- function() {
sum(vapply(tmp_dat$x, is.factor, logical(1)))
}

@@ -150,11 +150,11 @@ get_descr_df <- function(formula, data) {
}

list(
.n_cols = .n_cols,
.n_preds = .n_preds,
.n_obs = .n_obs,
.n_levs = .n_levs,
.n_facts = .n_facts,
.cols = .cols,
.preds = .preds,
.obs = .obs,
.lvls = .lvls,
.facts = .facts,
.dat = .dat,
.x = .x,
.y = .y
@@ -233,23 +233,23 @@ get_descr_spark <- function(formula, data) {

obs <- dplyr::tally(data) %>% dplyr::pull()

.n_cols <- function() length(f_term_labels)
.n_preds <- function() all_preds
.n_obs <- function() obs
.n_levs <- function() y_vals
.n_facts <- function() factor_pred
.cols <- function() all_preds
.preds <- function() length(f_term_labels)
.obs <- function() obs
.lvls <- function() y_vals
.facts <- function() factor_pred
.x <- function() abort("Descriptor `.x()` not defined for Spark.")
.y <- function() abort("Descriptor `.y()` not defined for Spark.")
.dat <- function() abort("Descriptor `.dat()` not defined for Spark.")

# still need .x(), .y(), .dat() ?

list(
.n_cols = .n_cols,
.n_preds = .n_preds,
.n_obs = .n_obs,
.n_levs = .n_levs,
.n_facts = .n_facts,
.cols = .cols,
.preds = .preds,
.obs = .obs,
.lvls = .lvls,
.facts = .facts,
.dat = .dat,
.x = .x,
.y = .y
@@ -258,25 +258,25 @@ get_descr_spark <- function(formula, data) {

get_descr_xy <- function(x, y) {

.n_levs <- if (is.factor(y)) {
.lvls <- if (is.factor(y)) {
function() table(y, dnn = NULL)
} else {
function() NA
}

.n_cols <- function() {
.cols <- function() {
ncol(x)
}

.n_preds <- function() {
.preds <- function() {
ncol(x)
}

.n_obs <- function() {
.obs <- function() {
nrow(x)
}

.n_facts <- function() {
.facts <- function() {
if(is.data.frame(x))
sum(vapply(x, is.factor, logical(1)))
else
@@ -296,11 +296,11 @@ get_descr_xy <- function(x, y) {
}

list(
.n_cols = .n_cols,
.n_preds = .n_preds,
.n_obs = .n_obs,
.n_levs = .n_levs,
.n_facts = .n_facts,
.cols = .cols,
.preds = .preds,
.obs = .obs,
.lvls = .lvls,
.facts = .facts,
.dat = .dat,
.x = .x,
.y = .y
@@ -363,11 +363,11 @@ has_any_descrs <- function(x) {
is_descr <- function(x) {

descrs <- list(
".n_cols",
".n_preds",
".n_obs",
".n_levs",
".n_facts",
".cols",
".preds",
".obs",
".lvls",
".facts",
".x",
".y",
".dat"
@@ -378,7 +378,7 @@ is_descr <- function(x) {

# Helpers for overwriting descriptors temporarily ------------------------------

# descrs = list of functions that actually eval to .n_cols()
# descrs = list of functions that actually eval to .cols()
poke_descrs <- function(descrs) {

descr_names <- names(descr_env)
@@ -414,13 +414,13 @@ scoped_descrs <- function(descrs, frame = caller_env()) {
# with their actual implementations
descr_env <- rlang::new_environment(
data = list(
.n_cols = function() abort("Descriptor context not set"),
.n_preds = function() abort("Descriptor context not set"),
.n_obs = function() abort("Descriptor context not set"),
.n_levs = function() abort("Descriptor context not set"),
.n_facts = function() abort("Descriptor context not set"),
.x = function() abort("Descriptor context not set"),
.y = function() abort("Descriptor context not set"),
.dat = function() abort("Descriptor context not set")
.cols = function() abort("Descriptor context not set"),
.preds = function() abort("Descriptor context not set"),
.obs = function() abort("Descriptor context not set"),
.lvls = function() abort("Descriptor context not set"),
.facts = function() abort("Descriptor context not set"),
.x = function() abort("Descriptor context not set"),
.y = function() abort("Descriptor context not set"),
.dat = function() abort("Descriptor context not set")
)
)
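
Review note: the renamed descriptors can be sanity-checked outside the package with a small base R sketch. This is an illustration only, not parsnip's implementation. `count_descriptors()`, `descr_sketch_env`, `.obs_sketch()`, and `with_descr()` are hypothetical names introduced here, and `model.frame()`/`model.matrix()` stand in for the internal `convert_form_to_xy_fit()`, which is assumed to behave similarly for these formulas. The first part recomputes the counts documented for `iris`; the second mimics the "placeholder until a fit sets the context" pattern used by `descr_env`.

```r
# Sketch only: base R stand-ins, not parsnip internals.

# Hypothetical helper that recomputes the documented descriptor values.
count_descriptors <- function(formula, data) {
  mf <- stats::model.frame(formula, data)
  y <- stats::model.response(mf)

  # Predictor columns before dummy variables are created (like `.preds()`).
  x_raw <- mf[, -1, drop = FALSE]

  # Predictor columns after dummy variables are created (like `.cols()`);
  # the intercept column is dropped so only predictors are counted.
  x_dummy <- stats::model.matrix(formula, data)
  x_dummy <- x_dummy[, colnames(x_dummy) != "(Intercept)", drop = FALSE]

  list(
    preds = ncol(x_raw),
    cols  = ncol(x_dummy),
    obs   = nrow(data),
    facts = sum(vapply(x_raw, is.factor, logical(1))),
    lvls  = if (is.factor(y)) table(y, dnn = NULL) else NA
  )
}

# Hypothetical environment holding a placeholder that errors until a
# descriptor context is supplied (analogous in spirit to `descr_env`).
descr_sketch_env <- new.env()
descr_sketch_env$.obs <- function() stop("Descriptor context not set")

# User-facing wrapper: always dispatches to whatever is currently stored.
.obs_sketch <- function() descr_sketch_env$.obs()

# Temporarily install real descriptor functions, evaluate `expr`, and
# restore the placeholders on exit (even if `expr` fails).
with_descr <- function(descrs, expr) {
  old <- mget(names(descrs), envir = descr_sketch_env)
  on.exit(list2env(old, envir = descr_sketch_env), add = TRUE)
  list2env(descrs, envir = descr_sketch_env)
  force(expr)  # `expr` is a promise, so it is evaluated only now
}
```

For example, `count_descriptors(Sepal.Width ~ ., iris)` gives `preds = 4` and `cols = 5`, matching the documentation in this diff, and `with_descr(list(.obs = function() nrow(iris)), .obs_sketch() / 2)` evaluates the argument only while the real descriptor is installed.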