performance issue of inds_combine with many columns. Could be improve using std::string #56

ginolhac · 2018-03-21T15:44:07Z

Hello,

We noticed that dplyr::select() is slow when many columns are supplied. After investigations, @koncina and me, we found that this line Rcpp line might be the one to change:
https://github.com/tidyverse/tidyselect/blob/master/src/combine_variables.cpp#L28

The declaration of out_name as <String> and not std::string is slowing down the process.

Historically, it comes from a fix of an encoding issue by @romainfrancois here. Later on, @romainfrancois did a second fix but String remains. I must stress out that I know nothing about encoding, so keeping the String type could be mandatory.

After replacing L28 by std::vector<std::string> out_name; creating a function inds_combine_std, we obtained:

fake_columns <- paste0("col_", 1:50000)
system.time(v1 <- tidyselect:::inds_combine(fake_columns, list(-1)))
system.time(v2 <- inds_combine_std(fake_columns, list(-1)))
identical(v1, v2)

  user   system elapsed
11.548    0.003  11.556
  user   system elapsed
 0.007    0.000   0.007
 [1] TRUE

Of note, this line is also present in the current dplyr code:
https://github.com/tidyverse/dplyr/blob/master/src/combine_variables.cpp#L29

We replaced in dplyr v0.7.4 L29 with std::string and did some benchmark, see how it also improves:

Is there any (encoding) reason to keep the String type in both tidyselect and dplyr?

The text was updated successfully, but these errors were encountered:

lionel- · 2018-03-21T15:47:42Z

Interesting, thanks! cc @krlmlr

romainfrancois · 2018-03-21T15:56:56Z

🏊 back, did you check that your fix handles this issue tidyverse/dplyr#1023

Weird that std::string would beat Rcpp::String as String is designed to avoid making copies, etc ... or maybe it's a case when it's better to make a copy. I used to live under the assumptions that avoiding copies is best, but then perhaps it's a cache thing, and when you make a std::vector<std::string> the strings are close to each other. I don't know.

Thanks for reporting this.

lionel- · 2018-03-21T15:58:53Z

It may be caused by SEXP preservation, you have to allocate one SEXP node each time because the precious list is a linked list.

lionel- · 2018-03-21T15:59:32Z

well strings are pooled and implicitly protected so maybe not (haven't checked the Rcpp code)

Edit: Also std::string() allocates as well and even makes copies as Romain mentioned.

ginolhac · 2018-03-21T16:00:50Z

I have a mac and @koncina ubuntu, so we thought about testing it but we don't have the machine for testing

krlmlr · 2018-03-21T17:41:07Z

Thanks. On Ubuntu, we could try profiling the code with gprofiler and see where the time is burnt.

ginolhac · 2018-03-22T15:02:44Z

a small update, while @koncina is trying to use gprofiler. It turns out that the compiler is acting, but this is largely beyond my skills, so here are the facts.

combine_vars

using this rcpp example of the combine_vars() function using either std::string or String:

on ubuntu or windows (WM), the std::string is performing better
on mac (or VM with bsd): no gain!!

> system.time(a <- inds_combine(paste0("col_", 1:50000), list(-1)))
   user  system elapsed 
  0.061   0.003   0.065 

> system.time(b <- inds_combine_fix(paste0("col_", 1:50000), list(-1)))
   user  system elapsed 
  0.071   0.003   0.077 

> system.time(c <- inds_combine(paste0("col_", 1:50000), list(1:50000, -1)))
   user  system elapsed 
 11.306   0.124  11.822 

> system.time(d <- inds_combine_fix(paste0("col_", 1:50000), list(1:50000, -1)))
   user  system elapsed 
 11.184   0.099  11.470 

> identical(a, b, c, d)
[1] TRUE

simpler example

using this simpler cpp example

the 3 OS give similar results, std::string is faster

> system.time(a <- with_string(paste0("col_", 1:50000)))
   user  system elapsed 
 17.087   0.401  21.884 

> system.time(b <- with_stdstr(paste0("col_", 1:50000)))
   user  system elapsed 
  0.038   0.001   0.040 

> identical(a, b)
[1] TRUE

sorry to not be more helpful, but we cannot figure out the inconsistency of the OS used

romainfrancois · 2018-03-22T17:31:44Z

This is very helpful, especially with the simpler example outside of Rcpp. might be worth saving the results of 1:50000 outside of the system.time

I believe in both cases out_vec would be empty since when it is created tmp is empty.

CharacterVector out_vec(tmp.begin(), tmp.end());

Probably an Rcpp thing. Perhaps as I guessed the other day about cache locality. I don't know enough about the allocators used in std::vector<std::string> but if they do store their data contiguously, maybe that's the reason.

krlmlr · 2018-03-23T09:21:52Z

A run time difference of that magnitude suggests that some heavy copying happens, or nested loops.

romainfrancois · 2018-03-23T09:38:55Z

Probably in this constructor.
https://github.com/RcppCore/Rcpp/blob/43c24bc7dd7c771195b830e54528a3236ac7dccf/inst/include/Rcpp/String.h#L89

       /** from string proxy */
        String(const StringProxy& proxy): data(proxy.get()), valid(true), buffer_ready(false), enc(Rf_getCharCE(proxy.get())) {
            Rcpp_PreserveObject(data);
            RCPP_STRING_DEBUG("String(const StringProxy&)");
        }

with the global string cache, do individual strings CHARSXP have to be protected ? the preserve/release is not free.
is Rf_getCharCE costly ?

lionel- · 2018-03-23T09:58:07Z

with the global string cache, do individual strings CHARSXP have to be protected ? the preserve/release is not free.

I don't think so, once it's in the string pool it's there forever. But as long as it's not guaranteed by the R language "spec" we probably can't rely on it though.

Is the CHARSXP released right after being assigned to the vector or only at the end of the vector filling? I think the former should be more efficient.

romainfrancois · 2018-03-23T10:10:00Z

potentially to go into vector::push_back( const String& ) the String( StringProxy ) ctor is used and then the String copy constructor is used right after it, there's no move operations in Rcpp as far as I can see.

The copy might be elided with RVO by the compiler, but I'm not sure. If not then that means another preserver and release.

then all the Strings are held by the std::vector<String> and its destructor's job is to destruct all the strings, I'm not sure about the order of destruction though, but I guess we're looking at a loop calling R_ReleaseObject. That also means more opportunities for the gc to want to play.

lionel- · 2018-03-23T10:11:51Z

I'm not sure about the order of destruction though

Oh for R_ReleaseObject that's extremely important because there's a linear search through the precious linked list if the object you are releasing is not on top of the stack.

romainfrancois · 2018-03-23T10:16:47Z

in the meantime, we can either:

use std::string as suggested
develop some lighter String that does not do preserve/release. I don't entirely trust the string pool though.

but yeah paying for preserve/release on individual strings feels wrong

lionel- · 2018-03-23T10:17:32Z

Rewrite tidyselect in C ;)

koncina · 2018-03-23T10:20:13Z

To complete the simpler example posted earlier I created a function using two subfunctions to use either Rcpp::string or std::string.
The functions measures the time to perform the for loop at two different steps: just before the end of the subfunction and after returning from the subfunction (and you can allocate or not memory for the vector).

Here is a figure showing that the time loss might indeed be linked to a destructor (time is lost after exiting the function). Don't know if this observation is helping you.

lionel- · 2018-03-23T10:33:32Z

Thanks that's very helpful! That does suggest it's about the string preservation/release.

koncina · 2018-03-23T13:03:38Z

A small update to illustrate a previous remark that FreeBSD (or Mac OS) behaves differently. The following plot can be created using a FreeBSD (11.1) VM (and I presume on Mac OS):

I added the code to generate the plot in the gist.

Remark: The FreeBSD VM is running R 3.4.3 with Rcpp 0.12.16 while my Ubuntu setup is running R 3.4.4 with Rcpp 0.12.16. But using rocker to test R 3.4.3 on ubuntu does not change the observation.

ginolhac · 2018-03-23T13:37:05Z

thanks @koncina, I can confirm that on native macOS it behaves the same way

R version 3.4.4, system x86_64, darwin17.3.0, Rcpp 0.12.16

krlmlr · 2018-03-25T22:36:17Z

@koncina: Were you able to use gprofiler?

Destruction order of containers seems to be unspecified (https://stackoverflow.com/a/2083629/946850), this might explain different observations. We could add our own destructor to VarList to release in reverse order, how does that compare with using std::vector<String>?

koncina · 2018-03-29T13:56:50Z

@krlmlr: gprofiler identifies Rf_pmatch() as the limiting step (testing dplyr's select() with a large tibble or dplyr:::combine_vars() / tidyselect:::inds_combine() with a large character vector).

library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
## ✔ tibble  1.4.2     ✔ dplyr   0.7.4
## ✔ tidyr   0.8.0     ✔ stringr 1.3.0
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(gprofiler)

# Create a large tibble to test select:
large_tibble <- matrix(runif(5e6), nrow = 100) %>%
  as_tibble()

# Use gprofiler
out_file <- tempfile("gprofiler", fileext = ".out")
start_profiler(out_file)
## Temporary files: /tmp/RtmprlMAfr/gprofiler47921b65c136.prof, /tmp/RtmprlMAfr/gprofiler47924d1a0bde.out/gprofiler4ca329b84657.out
x <- select(large_tibble, -1)
stop_profiler()

summaryRprof(out_file)
## $by.self
##                                                                                          self.time self.pct total.time total.pct
## "Rf_pmatch"                                                                                  12.24    95.77      12.24     95.77
## "_Z12combine_varsN4Rcpp6VectorILi16ENS_15PreserveStorageEEENS_6ListOfINS0_ILi13ES1_EEEE"      0.46     3.60       0.48      3.76
## "FUN"                                                                                         0.06     0.47       0.08      0.63
## "<Anonymous>"                                                                                 0.02     0.16       0.02      0.16
## 
## $by.total
##                                                                                          total.time total.pct self.time self.pct
## "select_vars"                                                                                 12.78    100.00      0.00     0.00
## "select.data.frame"                                                                           12.78    100.00      0.00     0.00
## "select"                                                                                      12.78    100.00      0.00     0.00
## "combine_vars"                                                                                12.70     99.37      0.00     0.00
## "Rf_pmatch"                                                                                   12.24     95.77     12.24    95.77
## "_Z12combine_varsN4Rcpp6VectorILi16ENS_15PreserveStorageEEENS_6ListOfINS0_ILi13ES1_EEEE"       0.48      3.76      0.46     3.60
## "_dplyr_combine_vars"                                                                          0.48      3.76      0.00     0.00
## "Rf_NewFrameConfirm"                                                                           0.48      3.76      0.00     0.00
## "FUN"                                                                                          0.08      0.63      0.06     0.47
## "lapply"                                                                                       0.06      0.47      0.00     0.00
## "map_if"                                                                                       0.06      0.47      0.00     0.00
## "map"                                                                                          0.06      0.47      0.00     0.00
## "<Anonymous>"                                                                                  0.02      0.16      0.02     0.16
## "map_lgl"                                                                                      0.02      0.16      0.00     0.00
## "map_mold"                                                                                     0.02      0.16      0.00     0.00
## "R_ReleaseObject"                                                                              0.02      0.16      0.00     0.00
## "vapply"                                                                                       0.02      0.16      0.00     0.00
## 
## $sample.interval
## [1] 0.02
## 
## $sampling.time
## [1] 12.78

## R version 3.4.4 (2018-03-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 17.10
## 
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C               LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8     LC_MONETARY=fr_FR.UTF-8   
##  [6] LC_MESSAGES=fr_FR.UTF-8    LC_PAPER=fr_FR.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] gprofiler_0.0-1 forcats_0.3.0   stringr_1.3.0   dplyr_0.7.4     purrr_0.2.4     readr_1.1.1     tidyr_0.8.0     tibble_1.4.2    ggplot2_2.2.1  
## [10] tidyverse_1.2.1
## 
## loaded via a namespace (and not attached):
##  [1] reshape2_1.4.3   haven_1.1.1      lattice_0.20-35  colorspace_1.3-2 htmltools_0.3.6  yaml_2.1.18      rlang_0.2.0      pillar_1.2.1    
##  [9] withr_2.1.2      foreign_0.8-69   glue_1.2.0       modelr_0.1.1     readxl_1.0.0     bindrcpp_0.2.2   bindr_0.1.1      plyr_1.8.4      
## [17] RProtoBuf_0.4.11 munsell_0.4.3    gtable_0.2.0     cellranger_1.1.0 rvest_0.3.2      psych_1.7.8      evaluate_0.10.1  knitr_1.20      
## [25] parallel_3.4.4   broom_0.4.3      Rcpp_0.12.16     scales_0.5.0     backports_1.1.2  jsonlite_1.5     mnormt_1.5-5     hms_0.4.2       
## [33] digest_0.6.15    pprof_1.0        stringi_1.1.7    grid_3.4.4       rprojroot_1.3-2  cli_1.0.0        tools_3.4.4      bitops_1.0-6    
## [41] magrittr_1.5     lazyeval_0.2.1   RCurl_1.95-4.10  crayon_1.3.4     pkgconfig_2.0.1  xml2_1.2.0       profile_1.0      lubridate_1.7.3 
## [49] assertthat_0.2.0 rmarkdown_1.9    httr_1.3.1       rstudioapi_0.7   R6_2.2.2         nlme_3.1-131.1   compiler_3.4.4

I had some issues when I started to use gprofiler as stop_profiler() randomly failed to combine the profiles ("sum(ds_pprof$samples$value) == sum(ds_rprof$samples$value) is not TRUE" error). Making a long story short: I ended up in testing gprofiler with a self-compiled R 3.4.4 (same sessionInfo and capabilities) but gprofiler shows up the RecursiveRelease() function instead of Rf_pmatch() (no Rf_ function is visible)...

@romainfrancois: I checked the code in the https://github.com/tidyverse/dplyr/issues/1023 issue on a windows 10 installation. After replacing String with std::string on line 29 in dplyr's combine_variables.cpp, the performances are improved and I'm not reproducing the encoding issue.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)

set.seed(123)
n <- 1e5
dat <- data_frame(
    Group = sample(c("m", "f"), size = n, replace=TRUE),
    Process = sample(c("å", "ä", "ö", "Å", "Ä", "Ö"), size = n, replace=TRUE)
)

dat %>% 
    group_by(Process) %>% 
    summarise(Number = n()) %>%
    spread(Process, Number, fill=0) %>% 
summarise_all(funs(sum))
## # A tibble: 1 x 6
##        ä      Ä      å      Å      ö      Ö
##    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 16685. 16713. 16692. 16586. 16621. 16703.

sessionInfo()
## R version 3.4.4 (2018-03-15)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 16299)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252   
## [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
## [5] LC_TIME=French_France.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] tidyr_0.8.0 dplyr_0.7.4
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.16     knitr_1.20       bindr_0.1.1      magrittr_1.5    
##  [5] tidyselect_0.2.4 R6_2.2.2         rlang_0.2.0      stringr_1.3.0   
##  [9] tools_3.4.4      utf8_1.1.3       cli_1.0.0        htmltools_0.3.6 
## [13] yaml_2.1.18      assertthat_0.2.0 rprojroot_1.3-2  digest_0.6.15   
## [17] tibble_1.4.2     crayon_1.3.4     bindrcpp_0.2     purrr_0.2.4     
## [21] glue_1.2.0       evaluate_0.10.1  rmarkdown_1.9    stringi_1.1.7   
## [25] compiler_3.4.4   pillar_1.2.1     backports_1.1.2  pkgconfig_2.0.1

krlmlr · 2018-03-29T22:46:36Z

Thanks for the heads up, it would be interesting to see which test cases caused these random failures to combine profiles. Would you mind filing an issue at the gprofiler repo?

Maybe the home-compiled R version is confusing gprofiler. Again, that would be a problem we'd need to look at eventually.

In the session info I see that dplyr 0.7.4 is being measured. But tidyselect is used only in the development version of dplyr, not yet in 0.7.4. Maybe it's useful to look for weak performance using tidyselect functions only?

ginolhac · 2018-03-30T07:11:14Z

@krlmlr what @koncina changed in dplyr 0.7.4 is this line https://github.com/tidyverse/dplyr/blob/master/src/combine_variables.cpp#L29. Most of the tidyselect code is duplicated in the current version of dplyr.

krlmlr · 2018-03-30T20:08:43Z

This is good to know, but will this also lead to better performance when applied to tidyselect in the upcoming dplyr 0.7.5?

When using vars_select(), it seems that the size of vars is more crucial than the number of unquote-spliced (bang-bang-banged) arguments:

library(tidyselect)

test_vars <- function(m, n) {
  vars <- paste0("v", seq_len(n))
  sub_vars <- rlang::syms(sample(vars, m))
  system.time(vars_select(vars, !!!sub_vars))
}

test_vars(800, 16000)
#>    user  system elapsed 
#>   1.212   0.010   1.223
test_vars(1600, 16000)
#>    user  system elapsed 
#>   1.142   0.000   1.142
test_vars(3200, 16000)
#>    user  system elapsed 
#>    1.32    0.00    1.32
test_vars(6400, 16000)
#>    user  system elapsed 
#>   1.491   0.000   1.491
test_vars(1600, 32000)
#>    user  system elapsed 
#>   4.213   0.000   4.213
test_vars(3200, 32000)
#>    user  system elapsed 
#>   4.487   0.007   4.494
test_vars(6400, 32000)
#>    user  system elapsed 
#>   4.868   0.000   4.870
test_vars(12800, 32000)
#>    user  system elapsed 
#>   6.253   0.012   6.266

Created on 2018-03-30 by the reprex package (v0.2.0).

@koncina: Would you mind to gprofile a variant of this code?

koncina · 2018-03-31T13:49:20Z

I agree that we should switch to the development version and focus on tidyselect. The only drawback for now is that gprofiler fails to reveal the calls below the inds_combine() function. I was able to fix this for testing purposes by disabling the ByteCompile option in the DESCRIPTION file.

Another interesting observation is that performances in the development version are worse than in 0.7.4: In addition to the identification of inds_combine() / Rf_pmatch() as the limiting step, base::assign() is also showing up (using dplyr 0.7.4.9003 commit eafffad and tidyselect 0.2.4.9000 commit 72bedbb).

library(tidyverse)
library(gprofiler)
large_tibble <- matrix(runif(5e6), nrow = 100) %>%
  as_tibble()

out_file <- tempfile("gprofiler", fileext = ".out")
start_profiler(out_file)
x <- select(large_tibble, -1)
stop_profiler()

summaryRprof(out_file)[c("by.self", "sampling.time")]
## $by.self
##                                                                                          self.time self.pct total.time total.pct
## "Rf_pmatch"                                                                                  24.18    71.62      24.18     71.62
## "base::assign"                                                                                8.74    25.89       8.74     25.89
## "_Z12inds_combineN4Rcpp6VectorILi16ENS_15PreserveStorageEEENS_6ListOfINS0_ILi13ES1_EEEE"      0.24     0.71       0.52      1.54
## "_ZN7VarList6updateEiN4Rcpp6StringE"                                                          0.24     0.71       0.24      0.71
## "env_bind_impl"                                                                               0.08     0.24       9.08     26.90
## "get"                                                                                         0.08     0.24       0.10      0.30
## "<GC>"                                                                                        0.08     0.24       0.08      0.24
## "getExportedValue"                                                                            0.04     0.12       0.22      0.65
## "asNamespace"                                                                                 0.04     0.12       0.04      0.12
## "::"                                                                                          0.02     0.06       0.24      0.71
## "setdiff2"                                                                                    0.02     0.06       0.02      0.06
## 
## $sampling.time
## [1] 33.76

Fixing the code in inds_combine() (replacing String by std::string on line 28 in combine_variables.cpp) improves also the performances in the development version but base::assign() remains a limiting step as shown in this plot (as I disabled ByteCompile in tidyselect I also added a condition with ByteCompile enabled to show that it doesn't explain the drop in performance):

@krlmlr: The random gprofiler error I mentioned also occurs with the code used in my posts and using the CRAN R binary... I'm still trying to find more informations before opening an issue: sometimes the error occurs more frequently suggesting that something is interfering...
I will look into the code of your last post by next week.

krlmlr · 2018-05-26T20:52:17Z

I'm working on jointprof again and will take a look myself too.

lionel- · 2019-09-09T11:35:06Z

It looks like the old R implementation is faster than the C++ one.

Small variable set:

cn <- setNames(nm = c("x", "y", "z"))
bench::mark(tidysel = vars_select(cn, contains("x"), contains("y")))[1:8]


# R

#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#> 1 tidysel      1.34ms   1.43ms      679.    1.37KB     6.42   317     3


# C++ (std::string)

#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#> 1 tidysel      1.33ms   1.42ms      683.    3.86KB     6.44   318     3


# C++

#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#> 1 tidysel      1.33ms   1.42ms      683.    3.86KB     6.44   318     3

Large variable set:

fake_columns <- paste0("col_", 1:50000)

bench::mark(
  tidysel = vars_select(fake_columns, -1),
  base = fake_columns[-1],
  check = FALSE
)[1:8]


# R

#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#> 1 tidysel       1.04s    1.04s     0.958    20.7MB     8.62     1     9
#> 2 base       931.82µs   1.31ms   597.      976.8KB    13.7    304     7


# C++

#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#> 1 tidysel       1.74s    1.74s     0.574    12.8MB     5.16     1     9
#> 2 base       944.32µs   1.37ms   496.      976.8KB    12.0    248     6

lionel- · 2019-09-09T11:35:39Z

(This is with r-lib/rlang#828)

Closes r-lib#56

Closes r-lib#56 Closes r-lib#113

romainfrancois mentioned this issue Mar 27, 2018

n_distinct way slower than length(unique) tidyverse/dplyr#977

Closed

krlmlr mentioned this issue May 26, 2018

Use case: tidyselect performance problems r-prof/jointprof#23

Closed

lionel- added the bug an unexpected problem or unintended behavior label Sep 9, 2019

lionel- added a commit to lionel-/tidyselect that referenced this issue Sep 9, 2019

Rewrite inds_combine() in R

dd05ed2

Closes r-lib#56

lionel- added a commit to lionel-/tidyselect that referenced this issue Sep 9, 2019

Rewrite inds_combine() in R

c17e579

Closes r-lib#56 Closes r-lib#113

lionel- mentioned this issue Sep 9, 2019

Rewrite inds_combine() in R #114

Merged

lionel- added a commit to lionel-/tidyselect that referenced this issue Sep 9, 2019

Rewrite inds_combine() in R

3e9d021

Closes r-lib#56 Closes r-lib#113

lionel- closed this as completed in #114 Sep 9, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

performance issue of inds_combine with many columns. Could be improve using std::string #56

performance issue of inds_combine with many columns. Could be improve using std::string #56

ginolhac commented Mar 21, 2018 •

edited

Loading

lionel- commented Mar 21, 2018

romainfrancois commented Mar 21, 2018

lionel- commented Mar 21, 2018

lionel- commented Mar 21, 2018 •

edited

Loading

ginolhac commented Mar 21, 2018

krlmlr commented Mar 21, 2018

ginolhac commented Mar 22, 2018

romainfrancois commented Mar 22, 2018

krlmlr commented Mar 23, 2018

romainfrancois commented Mar 23, 2018

lionel- commented Mar 23, 2018

romainfrancois commented Mar 23, 2018

lionel- commented Mar 23, 2018

romainfrancois commented Mar 23, 2018

lionel- commented Mar 23, 2018

koncina commented Mar 23, 2018

lionel- commented Mar 23, 2018

koncina commented Mar 23, 2018

ginolhac commented Mar 23, 2018

krlmlr commented Mar 25, 2018

koncina commented Mar 29, 2018 •

edited

Loading

krlmlr commented Mar 29, 2018

ginolhac commented Mar 30, 2018

krlmlr commented Mar 30, 2018

koncina commented Mar 31, 2018

krlmlr commented May 26, 2018

lionel- commented Sep 9, 2019

lionel- commented Sep 9, 2019

performance issue of inds_combine with many columns. Could be improve using std::string #56

performance issue of inds_combine with many columns. Could be improve using std::string #56

Comments

ginolhac commented Mar 21, 2018 • edited Loading

lionel- commented Mar 21, 2018

romainfrancois commented Mar 21, 2018

lionel- commented Mar 21, 2018

lionel- commented Mar 21, 2018 • edited Loading

ginolhac commented Mar 21, 2018

krlmlr commented Mar 21, 2018

ginolhac commented Mar 22, 2018

combine_vars

simpler example

romainfrancois commented Mar 22, 2018

krlmlr commented Mar 23, 2018

romainfrancois commented Mar 23, 2018

lionel- commented Mar 23, 2018

romainfrancois commented Mar 23, 2018

lionel- commented Mar 23, 2018

romainfrancois commented Mar 23, 2018

lionel- commented Mar 23, 2018

koncina commented Mar 23, 2018

lionel- commented Mar 23, 2018

koncina commented Mar 23, 2018

ginolhac commented Mar 23, 2018

krlmlr commented Mar 25, 2018

koncina commented Mar 29, 2018 • edited Loading

krlmlr commented Mar 29, 2018

ginolhac commented Mar 30, 2018

krlmlr commented Mar 30, 2018

koncina commented Mar 31, 2018

krlmlr commented May 26, 2018

lionel- commented Sep 9, 2019

lionel- commented Sep 9, 2019

ginolhac commented Mar 21, 2018 •

edited

Loading

lionel- commented Mar 21, 2018 •

edited

Loading

koncina commented Mar 29, 2018 •

edited

Loading