
large breaking changes

Updated Dec 4, 2019
  

For an eventual recipes 1.0.0 release, what would we like to change that would have major implications?

non-step steps

Updated Dec 9, 2017

So far, steps are defined as data transformation operations that

  • add, subtract, or modify variables in the data using statistical/mathematical transformations of the data
  • do not modify the number of rows
  • are executed both on the training set (during prep) and on any new data sets (via bake)
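To make the prep/bake contract concrete, here is a minimal base R sketch assuming a single centering operation. The functions prep_center() and bake_center() are hypothetical names for illustration, not the recipes API: the statistics are estimated once on the training set and then reused, unchanged, on any new data, and the number of rows is never altered.

```r
# Hypothetical centering "step" (not part of recipes) illustrating prep/bake.
# prep_center() estimates the column means on the training set only.
prep_center <- function(training, cols) {
  list(cols = cols, means = colMeans(training[, cols, drop = FALSE]))
}

# bake_center() applies the stored training means to any data set.
bake_center <- function(prepped, new_data) {
  for (col in prepped$cols) {
    new_data[[col]] <- new_data[[col]] - prepped$means[[col]]
  }
  new_data
}

train <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))
test  <- data.frame(x = c(4, 5), y = c(40, 50))

prepped <- prep_center(train, cols = c("x", "y"))
baked_test <- bake_center(prepped, test)
# The training means (2 and 20) are used; the row count is unchanged.
print(baked_test)
```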

There are a few scenarios where recipes could benefit from operations that are not steps per se:

  • checks on data characteristics. You might want a step that stops execution if certain conditions are (or are not) met. Otherwise, the check can return the data unaltered.
  • dplyr operations: it would be helpful to be able to mutate, filter, or possibly summarize the data. step_rm is basically dplyr::select.
  • another class of operations that might affect the rows of the training data. For example, down-sampling for class imbalances might remove rows during prep but should only be applied to the training set; it should not affect the new data processed by bake. Another procedure for imbalances, called SMOTE, both down-samples the data and creates new instances from the existing data set.
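The first and third kinds of operation could look roughly like the base R sketch below. check_no_missing() and down_sample() are hypothetical helpers, not recipes functions: the check returns its input untouched unless a condition fails, and the down-sampler removes rows only when training = TRUE (the prep side), passing new data through as-is (the bake side).

```r
# Hypothetical check (not part of recipes): stop if any listed column has
# missing values, otherwise return the data unaltered.
check_no_missing <- function(data, cols) {
  bad <- cols[vapply(cols, function(col) anyNA(data[[col]]), logical(1))]
  if (length(bad) > 0) {
    stop("Missing values found in: ", paste(bad, collapse = ", "), call. = FALSE)
  }
  data
}

# Hypothetical training-only down-sampling: every class is reduced to the
# size of the smallest class, but only when training = TRUE.
down_sample <- function(data, outcome, training = TRUE, seed = 1) {
  # On new data (the bake side) the rows pass through unchanged.
  if (!training) {
    return(data)
  }
  set.seed(seed)  # note: this resets the global RNG state
  classes <- as.character(data[[outcome]])
  counts <- table(classes)
  n_min <- min(counts)
  keep <- unlist(lapply(names(counts), function(lvl) {
    idx <- which(classes == lvl)
    # Index via sample.int() so a class with a single row is not mis-sampled.
    idx[sample.int(length(idx), n_min)]
  }))
  data[sort(keep), , drop = FALSE]
}

train <- data.frame(
  x = 1:10,
  class = factor(c(rep("a", 8), rep("b", 2)))
)
new_data <- data.frame(x = 11:13, class = factor(c("a", "a", "b")))

balanced <- down_sample(check_no_missing(train, c("x", "class")), "class")
print(table(balanced$class))
passed <- down_sample(new_data, "class", training = FALSE)
```

Gating on a training flag mirrors the split described above: the row-removing work happens once while preparing, and anything baked afterwards is left alone.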

fun_calls throws errors when the formula is long.

terms(): it would be good to have an alternative that can take a standard R formula with . and minus signs and

  • returns the expanded list of individual terms (as calls)
  • returns which terms were subtracted
  • works with simple formulas on the lhs (e.g. y1 + y2 + y3 ~ x without cbind)
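A rough base R sketch of such a helper follows, assuming a hypothetical expand_formula(formula, data) that relies on stats::terms() to expand the dot. It splits a +-separated left-hand side into separate outcomes, removes those outcomes from the columns the dot may expand to, and finds subtracted terms by comparing the term labels with and without the minus signs. Only top-level + and - are inspected, so arguments of inline functions such as log(x - 1) are left alone.

```r
# Hypothetical helper (not part of recipes or stats). Requires R >= 3.6.0
# for str2lang().
expand_formula <- function(formula, data) {
  # Split an expression such as y1 + y2 + y3 into its operands.
  split_plus <- function(expr) {
    if (is.call(expr) && identical(expr[[1]], as.name("+")) && length(expr) == 3) {
      c(split_plus(expr[[2]]), split_plus(expr[[3]]))
    } else {
      list(expr)
    }
  }
  # Turn every top-level `-` into `+`; calls to other functions are not entered.
  flip_minus <- function(expr) {
    if (is.call(expr) && is.name(expr[[1]]) &&
        as.character(expr[[1]]) %in% c("+", "-", "(")) {
      if (identical(expr[[1]], as.name("-"))) expr[[1]] <- as.name("+")
      for (i in seq_along(expr)[-1]) expr[[i]] <- flip_minus(expr[[i]])
    }
    expr
  }

  two_sided <- length(formula) == 3
  lhs <- if (two_sided) split_plus(formula[[2]]) else list()
  rhs <- formula[[length(formula)]]

  # The dot should only expand to columns that are not outcomes.
  lhs_vars <- if (two_sided) all.vars(formula[[2]]) else character(0)
  rhs_data <- data[setdiff(names(data), lhs_vars)]
  labels_of <- function(expr) {
    attr(terms(eval(call("~", expr)), data = rhs_data), "term.labels")
  }

  kept <- labels_of(rhs)
  subtracted <- setdiff(labels_of(flip_minus(rhs)), kept)
  list(
    outcomes = lhs,                       # language objects, one per outcome
    terms = lapply(kept, str2lang),       # expanded terms as language objects
    subtracted = subtracted               # term labels removed with `-`
  )
}

dat <- data.frame(y1 = 1:3, y2 = 4:6, x1 = 7:9, x2 = 10:12, x3 = 13:15)
res <- expand_formula(y1 + y2 ~ . - x3, data = dat)
str(res)
```

Single variables come back as symbols and interactions or function calls as calls; both can be spliced back into expressions. Reporting a removed intercept (- 1) is not handled in this sketch.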