This is not a bug, but we noticed a significant decrease in performance with readr between versions 0.1.0 and 0.2.2 when reading in a large set of smaller files. It turns out that default_locale() is quite slow compared to the other fast readr functions, and it is evaluated on every read when no locale is supplied.
Perhaps the below example can be captured somewhere in the documentation.
x <- paste0(paste0(1:1000,',',rep(letters,length=1000)),collapse = '\n')
# Version 0.1.0
t1 <- system.time(l <- lapply(rep(x,1000),FUN = read_lines))
#user system elapsed
#0.19 0.03 0.22
That is fast! Now the same with readr 0.2.2:
# Version 0.2.2
t2 <- system.time(l <- lapply(rep(x,1000),FUN = read_lines))
#user system elapsed
#8.67 19.06 27.86
That is over 100 times slower (27.86 s versus 0.22 s elapsed). The way to fix this is to call default_locale() once and pass the result to every read_lines() call:
t3 <- system.time({locale=default_locale();l <- lapply(rep(x,1000),FUN = read_lines,locale=locale)})
#user system elapsed
#0.17 0.01 0.19
Back to the old readr 0.1.0 performance (perhaps even a hair faster). Nice!
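For anyone who wants to reuse this pattern, the workaround could be wrapped in a small helper. This is only a sketch: `read_lines_many` is a hypothetical name, not part of readr, and it assumes a readr version where `read_lines()` accepts a `locale` argument (0.2.0 or later).

```r
library(readr)

# Hypothetical helper (not part of readr): build the locale once and
# reuse it for every read_lines() call instead of letting each call
# evaluate default_locale() on its own.
read_lines_many <- function(inputs, locale = default_locale()) {
  force(locale)  # evaluate the default exactly once, up front
  lapply(inputs, read_lines, locale = locale)
}

# Same test data as above: 1000 lines of "<number>,<letter>"
x <- paste0(paste0(1:1000, ",", rep(letters, length = 1000)), collapse = "\n")
l <- read_lines_many(rep(x, 1000))
```

The same idea should carry over to the other readr readers that take a `locale` argument, such as `read_csv()`.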