Performance when reading a large number of smaller files #416

Closed
ghaarsma opened this Issue Jun 7, 2016 · 1 comment

@ghaarsma
Contributor

ghaarsma commented Jun 7, 2016

This is not a bug, but we noticed a significant drop in readr performance between versions 0.1.0 and 0.2.2 when reading a large set of small files. It turns out default_locale() is quite slow compared to the other, fast readr functions.

Perhaps the below example can be captured somewhere in the documentation.

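library(readr)

# One string of 1000 lines ("1,a", "2,b", ...); read_lines() treats it as literal data
x <- paste0(paste0(1:1000, ',', rep(letters, length.out = 1000)), collapse = '\n')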
# Version 0.1.0
t1 <- system.time(l <- lapply(rep(x,1000),FUN = read_lines))
#user  system elapsed
#0.19    0.03    0.22

That is fast! Now the same with readr 0.2.2:

# Version 0.2.2
t2 <- system.time(l <- lapply(rep(x,1000),FUN = read_lines))
#user  system elapsed 
#8.67   19.06   27.86 

That is over 100 times slower (27.86 s vs. 0.22 s elapsed). The way to fix this is to call default_locale() once and pass the result to every read_lines() call:

t3 <- system.time({
  locale <- default_locale()  # build the locale once, outside the loop
  l <- lapply(rep(x, 1000), FUN = read_lines, locale = locale)
})
#user  system elapsed 
#0.17    0.01    0.19 

That is back to the old readr 0.1.0 performance (perhaps even a hair faster). Nice!

@hadley

Member

hadley commented Jun 9, 2016

@jimhester this should just be a matter of memoising default_locale()
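
For readers unfamiliar with the term, memoising a zero-argument function means computing its result once and returning the cached value on every later call. The following is a minimal base-R sketch of that idea, not the change that was actually committed to readr: `slow_default()` is a hypothetical stand-in for an expensive constructor like default_locale(), and `memoise_once()` is a helper name made up for this illustration.

```r
calls <- 0  # counts how often the expensive body really runs

# Hypothetical stand-in for an expensive zero-argument constructor
# (an assumption for illustration, not readr code).
slow_default <- function() {
  calls <<- calls + 1
  Sys.sleep(0.01)  # simulate the per-call cost
  list(decimal_mark = ".", encoding = "UTF-8")
}

# Hypothetical helper: wrap a zero-argument function so that it is
# evaluated on the first call only and served from a cache afterwards.
memoise_once <- function(f) {
  cached <- FALSE
  value <- NULL
  function() {
    if (!cached) {
      value <<- f()
      cached <<- TRUE
    }
    value
  }
}

fast_default <- memoise_once(slow_default)

# 100 lookups, but the expensive body is only entered on the first one.
results <- lapply(1:100, function(i) fast_default())
print(calls)
```

The trade-off is that a cached value goes stale if whatever it depends on changes during the session, so a real implementation may need a way to reset the cache. The memoise package on CRAN provides a more general `memoise()` wrapper for functions that take arguments.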

jimhester added a commit to jimhester/readr that referenced this issue Jun 15, 2016

@jimhester jimhester self-assigned this Jun 15, 2016

@jimhester jimhester added in progress and removed ready labels Jun 15, 2016

@jimhester jimhester removed the in progress label Jun 15, 2016

@lock lock bot locked and limited conversation to collaborators Sep 25, 2018
