A preprocessing engine to generate design matrices
Latest commit d72d576 Oct 15, 2018


The recipes package is an alternative method for creating and preprocessing design matrices that can be used for modeling or visualization. From Wikipedia:

In statistics, a design matrix (also known as regressor matrix or model matrix) is a matrix of values of explanatory variables of a set of objects, often denoted by X. Each row represents an individual object, with the successive columns corresponding to the variables and their specific values for that object.

While R already has long-standing methods for creating these matrices (e.g. formulas and model.matrix), there are some limitations to what the existing infrastructure can do.
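
For reference, the long-standing base R approach mentioned above can be sketched as follows. The built-in iris data set is an assumption made purely for illustration; it does not appear elsewhere in this README.

```r
# Base R: build a design matrix from a formula.
# iris is only an illustrative data set, not one used by this README.
X <- model.matrix(Sepal.Length ~ ., data = iris)

# One row per flower; an intercept column, the three remaining numeric
# predictors, and two dummy columns encoding the three-level Species factor.
dim(X)
colnames(X)
```

Here the encoding of Species into dummy variables is decided by the formula machinery in a single call, rather than being declared as a separate, reusable preprocessing step.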

The idea of the recipes package is to define a recipe, or blueprint, that specifies the encodings and preprocessing of the data (i.e. "feature engineering") as a sequence of steps. For example, to create a simple recipe containing only an outcome and predictors and have the predictors centered and scaled:

library(recipes)
library(mlbench)
data(Sonar)
sonar_rec <- recipe(Class ~ ., data = Sonar) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors())
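
A recipe like this only declares the steps; no statistics have been estimated yet. As a minimal sketch of how it might then be used, assuming the prep() and bake() functions exported by recipes, the steps can be trained on a data set and applied to data. The data passed to bake() is given positionally because the name of that argument may differ between releases; the object names sonar_trained and sonar_processed are chosen here for illustration.

```r
library(recipes)
library(mlbench)
data(Sonar)

sonar_rec <- recipe(Class ~ ., data = Sonar) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors())

# Estimate the means and standard deviations needed by the steps
sonar_trained <- prep(sonar_rec, training = Sonar)

# Apply the trained steps to a data set (here, the training data itself)
sonar_processed <- bake(sonar_trained, Sonar)

# The predictors should now be centered and scaled
numeric_cols <- sonar_processed[sapply(sonar_processed, is.numeric)]
summary(colMeans(numeric_cols))
summary(sapply(numeric_cols, sd))
```

Separating the estimation step from the application step means the same trained recipe could be applied to new samples without re-estimating the centering and scaling values.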

To install it, use:

install.packages("recipes")

## for the development version (requires the devtools package):
# install.packages("devtools")
devtools::install_github("tidymodels/recipes")