step_holiday fails with NA values #743

Closed
MrFlick opened this issue Jul 5, 2021 · 1 comment · Fixed by #954
Labels
bug an unexpected problem or unintended behavior

Comments


MrFlick commented Jul 5, 2021

The problem

The step_holiday function throws an error during baking/juicing when the date column contains NA values. For example:

Reproducible example

library(recipes)

dd <- data.frame(
  date = as.Date(c("2021-12-04", "2021-12-25", NA, "2021-12-27"))
)

recipe(~date, data=dd) %>% 
  step_holiday(date) %>% 
  prep() %>% 
  juice()

# Error in if (rng.nch[1] != rng.nch[2]) stop("'charvec' has non-NA entries of different number of characters") : 
#   missing value where TRUE/FALSE needed

Tested with recipes_0.1.16

Possible solution

The problem seems to be in the is_holiday function. It passes a unique vector of years to timeDate::holiday(), but that vector can include NA values, which trigger the error inside the timeDate::holiday() code. It would be better to omit NA years (possibly checking that at least one non-NA year remains). It probably also makes sense to propagate the NA values into the returned indicator columns. One possible solution is:

is_holiday <- function(hol, dt) {
  # ~~ add na.omit() here to drop NA values ~~
  hdate <- holiday(year = unique(year(na.omit(dt))), Holiday = hol)
  hdate <- as.Date(hdate)
  out <- rep(0, length(dt))
  out[dt %in% hdate] <- 1
  # ~~ carry forward missing values ~~
  out[is.na(dt)] <- NA
  out
}

With this updated version of the function, the reproducible example above would return:

# A tibble: 4 x 4
  date       date_LaborDay date_NewYearsDay date_ChristmasDay
  <date>             <dbl>            <dbl>             <dbl>
1 2021-12-04             0                0                 0
2 2021-12-25             0                0                 1
3 NA                    NA               NA                NA
4 2021-12-27             0                0                 0

If that seems like a reasonable solution, I could prepare a pull request.
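
For anyone who wants to try the idea outside of recipes, here is a minimal standalone sketch of the same approach. It is not the recipes implementation: the name is_holiday_na_safe is hypothetical, the year is extracted with base R's format() instead of year(), and the early return for the all-NA case is an assumption about how the "at least one non-NA year" check could be handled.

```r
library(timeDate)

# Hypothetical standalone variant of the proposed is_holiday();
# not the function that ships with recipes.
is_holiday_na_safe <- function(hol, dt) {
  # NA for missing dates, 0 for everything else.
  out <- ifelse(is.na(dt), NA_real_, 0)

  # Years present among the non-missing dates only.
  years <- unique(as.integer(format(dt[!is.na(dt)], "%Y")))

  # Assumption: with no non-missing dates there is nothing to look up,
  # so skip the call to timeDate::holiday() entirely.
  if (length(years) == 0) {
    return(out)
  }

  hdate <- as.Date(timeDate::holiday(year = years, Holiday = hol))
  out[dt %in% hdate] <- 1
  out
}

dd <- as.Date(c("2021-12-04", "2021-12-25", NA, "2021-12-27"))
is_holiday_na_safe("ChristmasDay", dd)
```

The last call should line up with the date_ChristmasDay column in the table above, with the missing date carried through as NA rather than causing an error.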

@EmilHvitfeldt EmilHvitfeldt added the bug an unexpected problem or unintended behavior label Apr 14, 2022
@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators May 19, 2022