Performance when reading a large number of smaller files #416

Closed
@ghaarsma

Description

This is not a bug, but we noticed a significant decrease in performance with readr between versions 0.1.0 and 0.2.2 when reading in a large set of smaller files. It turns out default_locale() is quite slow compared to the other fast readr functions.

Perhaps the example below could be captured somewhere in the documentation.

# Build one in-memory "file": 1,000 lines of the form "<number>,<letter>"
x <- paste0(paste0(1:1000, ",", rep(letters, length.out = 1000)), collapse = "\n")
# Version 0.1.0
t1 <- system.time(l <- lapply(rep(x, 1000), FUN = read_lines))
# user  system elapsed
# 0.19    0.03    0.22

That is fast! Now the same with readr 0.2.2:

# Version 0.2.2
t2 <- system.time(l <- lapply(rep(x, 1000), FUN = read_lines))
# user  system elapsed
# 8.67   19.06   27.86

That is over 100 times slower. The way to fix this is by making a single call to default_locale() and reusing the resulting locale object:

t3 <- system.time({
  locale <- default_locale()
  l <- lapply(rep(x, 1000), FUN = read_lines, locale = locale)
})
# user  system elapsed
# 0.17    0.01    0.19

That is back to the old readr 0.1.0 performance (perhaps even a hair faster). Nice!
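
The same workaround should apply to the other readr readers, since they all accept a locale argument. A sketch under that assumption, reusing the literal input x from above with read_csv (the column names "n" and "letter" are made up for illustration; in this readr version a string containing a newline is read as literal data):

library(readr)

loc <- default_locale()  # construct the locale once, outside the loop

# Parse each in-memory "file" as two columns: an integer and a character.
res <- lapply(rep(x, 1000), read_csv,
              col_names = c("n", "letter"),
              col_types = "ic",
              locale = loc)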

Labels

feature: a feature request or enhancement
