2 changes: 2 additions & 0 deletions .Rbuildignore
@@ -25,3 +25,5 @@ tests/testthat/NMF*
 ^vignettes/articles$
 ^[\.]?air\.toml$
 ^[.]?air[.]toml$
+^vignettes/\.quarto$
+^vignettes/*_files$
1 change: 1 addition & 0 deletions .gitignore
@@ -12,3 +12,4 @@ revdep/library.noindex
 revdep/data.sqlite
 .httr-oauth
 revdep/cloud.noindex/*
+**/.quarto/
5 changes: 2 additions & 3 deletions DESCRIPTION
@@ -51,10 +51,10 @@ Suggests:
 ggplot2,
 igraph,
 kernlab,
-knitr,
 methods,
 modeldata (>= 0.1.1),
 parsnip (>= 1.2.0),
+quarto,
 RANN,
 RcppRoll,
 rmarkdown,
@@ -65,8 +65,7 @@ Suggests:
 testthat (>= 3.0.0),
 workflows,
 xml2
-VignetteBuilder:
-knitr
+VignetteBuilder: quarto
 RdMacros:
 lifecycle
 Config/Needs/website: tidyverse/tidytemplate, rmarkdown
2 changes: 1 addition & 1 deletion man/step_kpca_poly.Rd

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion man/step_kpca_rbf.Rd

Some generated files are not rendered by default.

3 changes: 3 additions & 0 deletions vignettes/.gitignore
@@ -0,0 +1,3 @@
+*.html
+*.R
+*_files
57 changes: 32 additions & 25 deletions vignettes/Dummies.Rmd → vignettes/Dummies.qmd
@@ -5,9 +5,14 @@ description: |
 This vignette describes different methods for encoding categorical
 predictors, with special attention to interaction terms and contrasts.
 vignette: >
-%\VignetteEngine{knitr::rmarkdown}
 %\VignetteIndexEntry{Handling categorical predictors}
 %\VignetteEncoding{UTF-8}
+%\VignetteEngine{quarto::html}
+%\VignetteEncoding{UTF-8}
+knitr:
+opts_chunk:
+collapse: true
+comment: '#>'
 ---

 ```{r}
@@ -18,7 +23,7 @@ knitr::opts_chunk$set(
 digits = 3,
 collapse = TRUE,
 comment = "#>"
-)
+)
 options(digits = 3)
 library(recipes)
 ```
@@ -36,7 +41,7 @@ library(recipes)
 # make a copy for use below
 iris <- iris |> mutate(original = Species)

-iris_rec <- recipe( ~ ., data = iris)
+iris_rec <- recipe(~., data = iris)
 summary(iris_rec)
 ```

@@ -46,8 +51,8 @@ The default approach is to create dummy variables using the "reference cell" par

 ```{r}
 #| label: iris-ref-cell
-ref_cell <-
-iris_rec |>
+ref_cell <-
+iris_rec |>
 step_dummy(Species) |>
 prep(training = iris)
 summary(ref_cell)
 ```
@@ -69,8 +74,8 @@ To get this encoding you can use the `contrasts` argument like so:, the global o
 ```{r}
 #| label: iris-helmert
 # now make dummy variables with new parameterization
-helmert <-
-iris_rec |>
+helmert <-
+iris_rec |>
 step_dummy(Species, contrasts = "contr.helmert") |>
 prep(training = iris)
 summary(helmert)
@@ -90,9 +95,9 @@ Creating interactions with recipes requires the use of a model formula, such as

 ```{r}
 #| label: iris-2int
-iris_int <-
+iris_int <-
 iris_rec |>
-step_interact( ~ Sepal.Width:Sepal.Length) |>
+step_interact(~ Sepal.Width:Sepal.Length) |>
 prep(training = iris)
 summary(iris_int)
 ```
@@ -105,10 +110,10 @@ For example, if you were to use the standard formula interface, the creation of

 ```{r}
 #| label: mm-int
-model.matrix(~ Species*Sepal.Length, data = iris) |>
-as.data.frame() |>
+model.matrix(~ Species * Sepal.Length, data = iris) |>
+as.data.frame() |>
 # show a few specific rows
-slice(c(1, 51, 101)) |>
+slice(c(1, 51, 101)) |>
 as.data.frame()
 ```

@@ -119,8 +124,10 @@ With recipes, you create them sequentially. This raises an issue: do I have to t
 #| eval: false
 # Must I do this?
 iris_rec |>
-step_interact( ~ Species_versicolor:Sepal.Length +
-Species_virginica:Sepal.Length)
+step_interact(
+~ Species_versicolor:Sepal.Length +
+Species_virginica:Sepal.Length
+)
 ```

 Not only is this a pain, but it may not be obvious what dummy variables are available (especially when [`step_other`](https://recipes.tidymodels.org/reference/step_other.html) is used).
@@ -129,10 +136,10 @@ The solution is to use a selector:

 ```{r}
 #| label: iris-sel
-iris_int <-
-iris_rec |>
+iris_int <-
+iris_rec |>
 step_dummy(Species) |>
-step_interact( ~ starts_with("Species"):Sepal.Length) |>
+step_interact(~ starts_with("Species"):Sepal.Length) |>
 prep(training = iris)
 summary(iris_int)
 ```
@@ -168,9 +175,9 @@ Would it work if I didn't convert species to a factor and used the interactions

 ```{r}
 #| label: iris-dont
-iris_int <-
-iris_rec |>
-step_interact( ~ Species:Sepal.Length) |>
+iris_int <-
+iris_rec |>
+step_interact(~ Species:Sepal.Length) |>
 prep(training = iris)
 summary(iris_int)
 ```
@@ -188,7 +195,7 @@ There are models (e.g. `glmnet` and others) that can avoid this issue so you mig

 ```{r}
 #| label: one-hot
-iris_rec |>
+iris_rec |>
 step_dummy(Species, one_hot = TRUE) |>
 prep(training = iris) |>
 bake(original, new_data = NULL, starts_with("Species")) |>
@@ -203,17 +210,17 @@ This will give you the full set of indicators and, when you use the typical cont

 ```{r}
 #| label: one-hot-two
-hot_reference <-
-iris_rec |>
+hot_reference <-
+iris_rec |>
 step_dummy(Species, one_hot = TRUE) |>
 prep(training = iris) |>
 bake(original, new_data = NULL, starts_with("Species")) |>
 distinct()

 hot_reference

-hot_helmert <-
-iris_rec |>
+hot_helmert <-
+iris_rec |>
 step_dummy(Species, one_hot = TRUE, contrasts = "contr.helmert") |>
 prep(training = iris) |>
 bake(original, new_data = NULL, starts_with("Species")) |>
7 changes: 6 additions & 1 deletion vignettes/Ordering.Rmd → vignettes/Ordering.qmd
@@ -5,9 +5,14 @@ description: |
 The order in which recipe steps are specified matters, and this vignette gives
 some general suggestions that you should consider.
 vignette: >
-%\VignetteEngine{knitr::rmarkdown}
 %\VignetteIndexEntry{Ordering of steps}
 %\VignetteEncoding{UTF-8}
+%\VignetteEngine{quarto::html}
+%\VignetteEncoding{UTF-8}
+knitr:
+opts_chunk:
+collapse: true
+comment: '#>'
 ---

 In the recipes package, there are no constraints on the order in which steps are added to the recipe; you as a user are free to apply steps in the order appropriate to your data preprocessing needs. However, the **order of steps matters** and there are some general suggestions that you should consider.
7 changes: 6 additions & 1 deletion vignettes/Roles.Rmd → vignettes/Roles.qmd
@@ -4,9 +4,14 @@ output: rmarkdown::html_vignette
 description: |
 In recipes, roles provide a way to select variables for different steps.
 vignette: >
-%\VignetteEngine{knitr::rmarkdown}
 %\VignetteIndexEntry{Roles in recipes}
 %\VignetteEncoding{UTF-8}
+%\VignetteEngine{quarto::html}
+%\VignetteEncoding{UTF-8}
+knitr:
+opts_chunk:
+collapse: true
+comment: '#>'
 ---

 ```{r}
@@ -5,9 +5,14 @@ description: |
 You can select which variables or features should be used in recipes. This
 vignette goes over the basics of using selection functions.
 vignette: >
-%\VignetteEngine{knitr::rmarkdown}
 %\VignetteIndexEntry{Selecting variables}
 %\VignetteEncoding{UTF-8}
+%\VignetteEngine{quarto::html}
+%\VignetteEncoding{UTF-8}
+knitr:
+opts_chunk:
+collapse: true
+comment: '#>'
 ---

 ```{r}
11 changes: 8 additions & 3 deletions vignettes/Skipping.Rmd → vignettes/Skipping.qmd
@@ -6,9 +6,14 @@ description: |
 However, in some situations we only want to only apply a step to the training
 data and we want to skip that step on testing data.
 vignette: >
-%\VignetteEngine{knitr::rmarkdown}
 %\VignetteIndexEntry{On skipping steps}
 %\VignetteEncoding{UTF-8}
+%\VignetteEngine{quarto::html}
+%\VignetteEncoding{UTF-8}
+knitr:
+opts_chunk:
+collapse: true
+comment: '#>'
 ---

 ```{r}
@@ -19,7 +24,7 @@ knitr::opts_chunk$set(
 digits = 3,
 collapse = TRUE,
 comment = "#>"
-)
+)
 options(digits = 3)
 library(recipes)
 ```
@@ -81,7 +86,7 @@ car_recipe <- recipe(mpg ~ ., data = mtcars) |>
 prep(training = mtcars)

 # These *should* produce the same results (as they do for `hp`)
-bake(car_recipe, new_data = NULL) |> head() |> select(disp, hp)
+bake(car_recipe, new_data = NULL) |> head() |> select(disp, hp)
 bake(car_recipe, new_data = mtcars) |> head() |> select(disp, hp)
 ```
7 changes: 6 additions & 1 deletion vignettes/recipes.Rmd → vignettes/recipes.qmd
@@ -5,9 +5,14 @@ description: |
 Start here if this is your first time using recipes! You will learn about
 basic usage, steps, selectors, and checks.
 vignette: >
-%\VignetteEngine{knitr::rmarkdown}
 %\VignetteIndexEntry{Introduction to recipes}
 %\VignetteEncoding{UTF-8}
+%\VignetteEngine{quarto::html}
+%\VignetteEncoding{UTF-8}
+knitr:
+opts_chunk:
+collapse: true
+comment: '#>'
 ---

 ```{r}