Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extract() does not honor regex options #693

Closed
mixolydianpink opened this issue Jul 26, 2019 · 4 comments · Fixed by #1283
Closed

extract() does not honor regex options #693

mixolydianpink opened this issue Jul 26, 2019 · 4 comments · Fixed by #1283

Comments

@mixolydianpink
Copy link

extract() does not honor regex options. Here's a reprex:

# Example data.
df <- data.frame(x="A")
# Extract the first case-insensitive (English) letter.
out <- tidyr::extract(df,
  x,
  c("A"),
  stringr::regex("([a-z])", ignore_case = TRUE)
)

# Expect this to be "A"
out$A
#> [1] NA

I saw #223, but I think this is a different issue.

I'm really not familiar with R as a language or the tidyverse as a system so I'm not sure if this is intentional behavior, but I wasn't able to find documentation that extract() treats the regex as a simple character vector. The documentation for extract() refers to it only as "a regular expression."

The apparent issue is this line in extract.R:

  matches <- stringi::stri_match_first_regex(x, regex)[, -1, drop = FALSE]

I would expect it to pass the options of the regex to stri_match_first_regex using its opts_regex parameter.

As far as I can tell, tidyr doesn't import stringr at all. So extract()s lack of awareness of stringr's regex types could be an intentional design choice of the tidyverse. But that should still be better documented.

I attempted to write a fix with the assumption that use of stringr was allowed, but ran into the problem that stringr's method of getting options from a regex seems to be opts(), which is not exported. In any case, my changes were

# extract.R:57
# Doesn't work. `stringr::opts()` not available.
  matches <- stringi::stri_match_first_regex(x,
    regex,
    opts_regex = stringr::opts(regex)
  )[, -1, drop = FALSE]
# test-extract.R:48
test_that("honors regex options", {
  df <- data.frame(x="A")
  out <- df %>% extract(x, c("A"), stringr::regex("([a-z])", ignore_case = TRUE))
  expect_equal(out$A, "A")
})
@hadley hadley added documentation help wanted ❤️ we'd love your help! labels Sep 7, 2019
@hadley
Copy link
Member

hadley commented Sep 7, 2019

I think this is just a documentation issue; due to stringr's use of stringi (which has heavy compilation requirements) we unfortunately can't use stringr everywhere within the tidyverse.

namelessjon added a commit to namelessjon/tidyr that referenced this issue Oct 19, 2019
Fixes tidyverse#693

Note that we're using a stringi regular expression.  Also note it has to be a string, not something like a stringr::regex
@hadley
Copy link
Member

hadley commented Apr 22, 2020

My explanation was a bit incomplete since tidyr actually does use stringi — I don't think we want to commit to any specific regular expression engine in the docs, since we might want to change them out in the future.

@hadley

This comment has been minimized.

@hadley
Copy link
Member

hadley commented Dec 17, 2021

@DavisVaughan I think we can now commit to separate() continuing to use base regexps, so this probably just needs a small documentation update. If you wanted to be extra, in separate() and extract() we could also check that the pattern doesn't have class pattern (for now) or stringr_pattern (for after tidyverse/stringr#384)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants