Extract and separate are too slow #72
Comments
Yeah, same for I personally like the way stringr handles that specification with the |
@aaronwolen the dev version of stringr uses stringi, so once that rolls out we'll get stringi performance for free. That might be enough for this issue |
A bit of benchmarking suggests that
library(tidyr)
library(stringi)
options(digits = 3)
x <- replicate(1e5, paste(sample(letters, 3), collapse = "-"))
df <- data_frame(x)
microbenchmark::microbenchmark(
separate = separate(df, x, c("x", "y", "z"), "-"),
regex = stri_split_regex(x, "-"),
regex_n = stri_split_regex(x, "-", nmax = 3),
fixed = stri_split_fixed(x, "-"),
times = 10
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval cld
#> separate 106.6 109.6 122.6 112.5 120.0 208.5 10 c
#> regex 64.9 66.0 68.1 67.6 70.4 72.4 10 b
#> regex_n 65.1 65.3 67.6 66.7 68.9 73.3 10 b
#> fixed 37.5 37.8 39.8 38.1 38.6 53.9 10 a
microbenchmark::microbenchmark(
extract = extract(df, x, c("x", "y", "z"), "(.)-(.)-(.)"),
regex = stri_match_first_regex(x, "(.)-(.)-(.)"),
times = 10
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval cld
#> extract 1005.0 1073.7 1104 1092 1119 1209 10 b
#> regex 82.1 89.5 115 91 115 206 10 a |
Benchmark is now:
#> Unit: milliseconds
#> expr min lq mean median uq max neval cld
#> extract 86.7 90.6 93.5 95.1 96.2 98.3 10 b
#> regex 79.1 80.3 82.0 80.8 82.1 88.1 10 a |
And post be2eb95:
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> separate 97.7 100.7 111.6 101.1 111.8 173.9 10
#> regex 68.5 69.7 72.4 71.4 74.1 80.1 10
#> regex_n 65.9 66.8 75.8 68.3 70.8 143.5 10
#> fixed 33.2 34.1 37.7 35.5 40.3 51.5 10 |
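For anyone comparing the timings above, here is a minimal sketch of how the raw stringi calls relate to the columns that `separate()` and `extract()` produce. It is not tidyr's actual implementation; the three-element input and the `pieces`, `groups`, and `out` names are made up for illustration, and it assumes a stringi version where `stri_split_fixed()` accepts `simplify = TRUE`.

```r
library(stringi)

# Small deterministic input in the same "a-b-c" shape as the benchmark data
# (hypothetical values, not the random strings used in the timings).
x <- c("a-b-c", "d-e-f", "g-h-i")

# Fixed-pattern split: with simplify = TRUE the result is a character
# matrix with one row per input string and one column per piece.
pieces <- stri_split_fixed(x, "-", simplify = TRUE)

# Regex capture: column 1 holds the full match, the remaining columns
# hold the capture groups, so drop the first column.
groups <- stri_match_first_regex(x, "(.)-(.)-(.)")[, -1, drop = FALSE]

# Assemble the pieces into named columns, roughly the shape that
# separate(df, x, c("x", "y", "z"), "-") returns.
out <- data.frame(
  x = pieces[, 1],
  y = pieces[, 2],
  z = pieces[, 3],
  stringsAsFactors = FALSE
)
```

On input of this shape both calls recover the same three pieces, so the gap between `fixed` and `regex` in the tables reflects the matching engine rather than any difference in the result.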