recipes (development version)
Other Changes
- prep() gained an option to print a summary of which columns were added and/or removed during execution.
- To reduce confusion between bake() and juice(), the latter is superseded in favor of using bake(object, new_data = NULL). The new_data argument now has no default, so a NULL value must be explicitly used in order to emulate the results of juice(). juice() will remain in the package (and be used internally), but most communication and training will use bake(object, new_data = NULL). (#543)
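As a minimal sketch of the equivalence described above (assuming a small recipe on the built-in mtcars data, which is chosen here only for illustration):

```r
library(recipes)

# Illustrative recipe; the data set and the centering step are assumptions
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_center(all_predictors()) %>%
  prep(training = mtcars)

# Superseded way of getting the processed training set
old <- juice(rec)

# Preferred way: new_data has no default, so NULL must be spelled out
new <- bake(rec, new_data = NULL)
```

Both calls should return the same processed training set.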
recipes 0.1.13
Breaking Changes
- step_filter(), step_slice(), step_sample(), and step_naomit() had their defaults for skip changed to TRUE. In the vast majority of applications, these steps should not be applied to the test or assessment sets.
- tidyr version 1.0.0 or later is now required.
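The effect of the new skip default can be sketched as follows (the mtcars data and the filter condition are assumptions made for illustration):

```r
library(recipes)

rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_filter(cyl > 4) %>%   # skip = TRUE is now the default for this step
  prep(training = mtcars)

# The filter is applied when the recipe is prepped on the training data
n_train <- nrow(bake(rec, new_data = NULL))

# The filter is skipped when new data are baked
n_new <- nrow(bake(rec, new_data = mtcars))
```

Here n_train counts only the rows with more than four cylinders, while n_new keeps every row of mtcars. Setting skip = FALSE in the step would restore the previous behavior.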
Other Changes
- step_pls() was changed so that it uses the Bioconductor mixOmics package. Objects created with previous versions of recipes can still use juice() and bake(). With the current version, categorical outcomes can now be used, but multivariate models cannot. Also, the new method allows for sparse results.
- As suggested by @StefanBRas, step_ica() now defaults to the C engine (#518).
- Avoided partial matching on seq() arguments in internal functions.
- Improved error messaging, for example when a user tries to prep() a tuneable recipe.
- step_upsample() and step_downsample() are soft deprecated in recipes as they are now available in the themis package. They will be removed in the next version.
- step_zv() now handles NA values so that variables with zero variance plus NA values are removed.
- The selectors all_of() and any_of() can now be used in step selections (#477).
- The tune package can now use recipes with check operations (but this also requires tune >= 0.1.0.9000).
- The tidy method for step_pca() now has an option for returning the variance statistics for each component.
recipes 0.1.12
- Some S3 methods were not being registered previously. This caused issues in R 4.0.
recipes 0.1.11
Other Changes
- While recipes does not directly depend on dials, it has several S3 methods for generics in dials. Version 0.0.5 of dials added stricter validation for these methods, so changes were required for recipes.
New Operations
- step_cut() enables you to create a factor from a numeric vector based on provided breaks (contributed by Edwin Thoen).
recipes 0.1.10
Breaking Changes
- Renamed yj_trans() to yj_transform() to avoid conflicts.
Other Changes
- Added flexible naming options for new columns created by step_depth() and step_classdist() (#262).
- Small changes for base R's stringsAsFactors change.
recipes 0.1.9
- Delayed S3 method registration for tune::tunable() methods that live in recipes will now work correctly on R >= 4.0.0 (#439, tidymodels/tune#146).
- step_relevel() added.
recipes 0.1.8
Breaking Changes
- The imputation steps do not change the data type being imputed now. Previously, if the data were integer, the data would be changed to numeric (for some step types). The change is breaking since the underlying data of imputed values are now saved as a list instead of a vector (for some step types).
- The data sets were moved to the new modeldata package.
- step_num2factor() was rewritten due to a bug that ignored the user-supplied levels (#425). The transform argument is now required to be a function and levels must now be supplied.
Other Changes
- Using a minus in the formula to recipe() is no longer allowed (it didn't remove variables anyway). step_rm() or update_role() can be used instead.
- When using a selector that returns no columns, juice() and bake() will now return a tibble with as many rows as the original template data or the new_data, respectively. This is more consistent with how selectors work in dplyr (#411).
- Code was added to explicitly register tunable methods when recipes is loaded. This is required because of changes occurring in R 4.0.
- check_class() checks if a variable is of the designated class. The class is either learned from the train set or provided in the check. (contributed by Edwin Thoen)
- step_normalize() and step_scale() gained a factor argument with values of 1 or 2 that can scale the standard deviations used to transform the data. (#380)
- bake() now produces a tibble with columns in the same order as juice() (#365).
recipes 0.1.7
Release driven by changes in tidyr (v 1.0.0).
Breaking Changes
- format_selector()'s wdth argument has been renamed to width (#250).
New Operations
- step_mutate_at(), step_rename(), and step_rename_at() were added.
Other Changes
- The use of varying() will be deprecated in favor of an upcoming function tune(). No changes are needed in this version, but subsequent versions will work with tune().
- format_ch_vec() and format_selector() are now exported (#250).
- check_new_values breaks bake if a variable contains values that were not observed in the train set (contributed by Edwin Thoen).
- When no outcomes are in the recipe, using juice(object, all_outcomes()) and bake(object, new_data, all_outcomes()) will return a tibble with zero rows and zero columns (instead of failing). (#298). This will also occur when the selectors select no columns.
- As alternatives to step_kpca(), two separate steps were added called step_kpca_rbf() and step_kpca_poly(). The use of step_kpca() will print a deprecation message that it will be going away.
- step_nzv() and step_poly() had arguments promoted out of their options slot. options can be used in the short term but is deprecated.
- step_downsample() will replace the ratio argument with under_ratio and step_upsample() will replace it with over_ratio. ratio still works (for now) but issues a deprecation message.
- step_discretize() has arguments moved out of options too; the main arguments are now num_breaks (instead of cuts) and min_unique. Again, deprecation messages are issued with the old argument structure.
- Models using the dimRed package (step_kpca(), step_isomap(), and step_nnmf()) would silently fail if the projection method failed. An error is issued now.
- Methods were added for a future generic called tunable(). This outlines which parameters in a step can/could be tuned.
recipes 0.1.6
Release driven by changes in rlang.
Breaking Changes
- Since 2018, a warning has been issued when the wrong argument was used in bake(recipe, newdata). The deprecation period is over and new_data is officially required.
- Previously, if step_other() did not collapse any levels, it would still add an "other" level to the factor. This would lump new factor levels into "other" when data were baked (as step_novel() does). This no longer occurs since it was inconsistent with ?step_other, which said that "If no pooling is done the data are unmodified".
New Operations
- step_normalize() centers and scales the data (if you are, like Max, too lazy to use two separate steps).
- step_unknown() will convert missing data in categorical columns to "unknown" and update factor levels.
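A rough sketch of how step_normalize() relates to the two separate steps, written against the current recipes API and assuming the built-in mtcars data for illustration:

```r
library(recipes)

# One step: center and scale together
one_step <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_predictors()) %>%
  prep(training = mtcars) %>%
  bake(new_data = NULL)

# Two steps: center first, then scale
two_steps <- recipe(mpg ~ ., data = mtcars) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors()) %>%
  prep(training = mtcars) %>%
  bake(new_data = NULL)

# The processed predictors should agree up to floating point tolerance
same <- isTRUE(all.equal(one_step, two_steps))
```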
Other Changes
- If the threshold argument of step_other is greater than one, then it specifies the minimum sample size before the levels of the factor are collapsed into the "other" category. #289
- step_knnimpute() can now pass two options to the underlying knn code, including the number of threads (#323).
- Due to changes by CRAN, step_nnmf() only works on versions of R >= 3.6.0 because of dependency issues.
- step_dummy() and step_other() are now tolerant to cases where that step's selectors do not capture any columns. In this case, no modifications to the data are made. (#290, #348)
- step_dummy() can now retain the original columns that are used to make the dummy variables. (#328)
- step_other()'s print method only reports the variables with collapsed levels (as opposed to any column that was tested to see if it needed collapsing). (#338)
- step_pca(), step_kpca(), step_ica(), step_nnmf(), step_pls(), and step_isomap() now accept zero components. In this case, the original data are returned.
recipes 0.1.5
Small release driven by changes in sample() in the current r-devel.
Other Changes
- A new vignette discussing roles has been added.
- To provide infrastructure for finalizing varying parameters, an update() method for recipe steps has been added. This allows users to alter information in steps that have not yet been trained.
- step_interact will no longer fail if an interaction uses a column that has been previously filtered from the data. A warning is issued when this happens and no interaction terms will be created.
- step_corr was made more fault tolerant for cases where the data contain a zero-variance column or columns with missing values.
- Set the embedded environment to NULL in prep.step_dummy to reduce the file size of serialized recipe class objects when using saveRDS.
Breaking Changes
- The tidy method for step_dummy now returns the original variable and the levels of the future dummy variables.
Bug Fixes
- Updating the role of new columns generated by a recipe step no longer also updates NA roles of existing columns (#296).
recipes 0.1.4
Breaking Changes
- Several argument names were changed to be consistent with other tidymodels packages (e.g. dials) and the general tidyverse naming conventions.
  - K in step_knnimpute was changed to neighbors.
  - step_isomap had the number of neighbors promoted to a main argument called neighbors.
  - step_pca, step_pls, step_kpca, and step_ica now use num_comp instead of num; step_isomap uses num_terms instead of num.
  - step_bagimpute moved nbagg out of the options and into a main argument trees.
  - step_bs and step_ns had degrees of freedom promoted to a main argument with name deg_free. Also, step_bs had degree promoted to a main argument.
  - step_BoxCox and step_YeoJohnson had nunique changed to num_unique.
  - bake, juice and other functions had newdata changed to new_data. For this version only, using newdata will only result in a warning.
  - Several steps had na.rm changed to na_rm.
  - prep and a few steps had stringsAsFactors changed to strings_as_factors.
- add_role() can now only add new additional roles. To alter existing roles, use update_role(). This change also allows for the possibility of having multiple roles/types for one variable. #221
- All steps gain an id field that will be used in the future to reference other steps.
- The retain option to prep is now defaulted to TRUE. If verbose = TRUE, the approximate size of the data set is printed. #207
New Operations
- step_integer converts data to ordered integers similar to LabelEncoder (#123 and #185).
- step_geodist can be used to calculate the distance between geocodes and a single reference location.
- step_arrange, step_filter, step_mutate, step_sample, and step_slice implement their dplyr analogs.
- step_nnmf computes the non-negative matrix factorization for data.
Other Changes
- The rsample function prepper was moved to recipes (issue).
- A number of packages were moved from "Imports" to "Suggests" to reduce the install footprint. A function was added to prompt the user to install the needed packages when the relevant steps are invoked.
- step_string2factor will now accept factors and leave them as-is.
- step_knnimpute now excludes missing data in the variable to be imputed from the nearest-neighbor calculation. Previously, this could have resulted in some missing data not being imputed (i.e. returning another missing value).
- step_dummy now produces a warning (instead of failing) when non-factor columns are selected. Only factor columns are used; no conversion is done for character data. issue #186
- dummy_names gained a separator argument. issue #183
- step_downsample and step_upsample now have seed arguments for more control over randomness.
- broom is no longer used to get the tidy generic. These are now contained in the generics package.
- When a recipe is prepared, a running list of all columns is created and the last known use of each column is kept. This is to avoid bugs when a step that is skipped removes columns. issue #239
recipes 0.1.3
New Operations
- check_range breaks bake if the variable range in new data is outside the range that was learned from the train set (contributed by Edwin Thoen).
- step_lag can lag variables in the data set (contributed by Alex Hayes).
- step_naomit removes rows with missing data for specific columns (contributed by Alex Hayes).
- step_rollimpute can be used to impute data in a sequence or series by estimating their values within a moving window.
- step_pls can conduct supervised feature extraction for predictors.
Other Changes
- step_log gained an offset argument.
- step_log gained a signed argument (contributed by Edwin Thoen).
- The internal functions sel2char and printer have been exported to enable other packages to contain steps.
- When training new steps after some steps have been previously trained, the retain = TRUE option should be set on previous invocations of prep.
- For step_dummy:
  - It can now compute the entire set of dummy variables per factor predictor using the one_hot = TRUE option. Thanks to Davis Vaughan.
  - The contrast option was removed. The step uses the global option for contrasts.
  - The step also produces missing indicator variables when the original factor has a missing value.
- step_other will now convert novel levels of the factor to the "other" level.
- step_bin2factor now has an option to choose how the values are translated to the levels (contributed by Michael Levy).
- bake and juice can now export basic data frames.
- The okc data were updated with two additional columns.
Bug Fixes
- Fixed issue 125 that prevented several steps from working with dplyr grouped data frames. (contributed by Jeffrey Arnold)
- Fixed issue 127 where options to step_discretize were not being passed to discretize.
recipes 0.1.2
General Changes
- Edwin Thoen suggested adding validation checks for certain data characteristics. This fed into the existing notion of expanding recipes beyond steps (see the non-step steps project). A new set of operations, called checks, can now be used. These should throw an informative error when the check conditions are not met and return the existing data otherwise.
- Steps now have a skip option that will not apply preprocessing when bake is used. See the article on skipping steps for more information.
New Operations
- check_missing will validate that none of the specified variables contain missing data.
- detect_step can be used to check if a recipe contains a particular preprocessing operation.
- step_num2factor can be used to convert numeric data (especially integers) to factors.
- step_novel adds a new factor level to nominal variables that will be used when new data contain a level that did not exist when the recipe was prepared.
- step_profile can be used to generate design matrix grids for prediction profile plots of additive models where one variable is varied over a grid and all of the others are fixed at a single value.
- step_downsample and step_upsample can be used to change the number of rows in the data based on the frequency distributions of a factor variable in the training set. By default, this operation is only applied to the training set; bake ignores this operation.
- step_naomit drops rows when specified columns contain NA, similar to tidyr::drop_na.
- step_lag allows for the creation of lagged predictor columns.
Other Changes
- step_spatialsign now has the option of removing missing data prior to computing the norm.
recipes 0.1.1
- The default selectors for bake were changed from all_predictors() to everything().
- The verbose option for prep is now defaulted to FALSE.
- A bug in step_dummy was fixed that makes sure that the correct binary variables are generated despite the levels or values of the incoming factor. Also, step_dummy now requires factor inputs.
- step_dummy also has a new default naming function that works better for factors. However, there is an extra argument (ordinal) now to the functions that can be passed to step_dummy.
- step_interact now allows for selectors (e.g. all_predictors() or starts_with("prefix")) to be used in the interaction formula.
- step_YeoJohnson gained an na.rm option.
- dplyr::one_of was added to the list of selectors.
- step_bs adds B-spline basis functions.
- step_unorder converts ordered factors to unordered factors.
- step_count counts the number of instances that a pattern exists in a string.
- step_string2factor and step_factor2string can be used to move between encodings.
- step_lowerimpute is for numeric data where the values cannot be measured below a specific value. For these cases, random uniform values are used for the truncated values.
- A step to remove simple zero-variance variables was added (step_zv).
- A series of tidy methods were added for recipes and many (but not all) steps.
- In bake.recipe, the argument newdata is now without a default.
- bake and juice can now save the final processed data set in sparse format. Note that, as the steps are processed, a non-sparse data frame is used to store the results.
- A formula method was added for recipes to get a formula with the outcome(s) and predictors based on the trained recipe.
recipes 0.1.0
First CRAN release.
- Changed prepare to prep per issue #59.
recipes 0.0.1.9003
- Two of the main functions changed names. learn has become prepare and process has become bake.
recipes 0.0.1.9002
New steps
- step_lincomb removes variables involved in linear combinations to resolve them.
- A step for converting binary variables to factors (step_bin2factor).
- step_regex applies a regular expression to a character or factor vector to create dummy variables.
Other changes
- step_dummy and step_interact do a better job of respecting missing values in the data set.
recipes 0.0.1.9001
- The class system for recipe objects was changed so that pipes can be used to create the recipe with a formula.
- process.recipe lost the role argument in favor of a general set of selectors. If no selector is used, all the predictors are returned.
- Two steps for simple imputation using the mean or mode were added.