Performance of vroom_format #377

Closed
matmu opened this issue Oct 4, 2021 · 6 comments

@matmu

matmu commented Oct 4, 2021

I want to convert a data frame to a delimited string with vroom::vroom_format (vroom_1.5.5). In the example below, vroom_format takes around 4 seconds to convert a data.frame with 35k rows and 400 columns of double values (~240MB) into a string. Is this the run time one would expect? Given that the throughput of vroom is 1.23 GB/sec, it feels quite slow.

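library(vroom)

# 35,000 rows x 400 columns of random doubles in [0, 100]
df <- data.frame(replicate(400, runif(35000, min = 0, max = 100)))

# Time the conversion of the whole data frame into a single delimited string
system.time({
  res <- vroom_format(df)
})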
@jimhester
Collaborator

vroom_format() is primarily intended for testing and debugging; the implementation uses only a single thread.

@matmu
Author

matmu commented Oct 4, 2021

readr_2.0.2 uses vroom_format() in format_delim().

@jimhester
Collaborator

readr::format_delim() is likewise primarily used for testing and debugging.

@matmu
Author

matmu commented Oct 4, 2021

@jimhester thanks for your feedback. I am confused now. I am implementing a REST API with plumber, and the function plumber::serializer_tsv wraps readr::format_tsv as its serialization function. I didn't expect all of this to be based on a function meant for testing and debugging. Do you know of a performant function for this purpose that is not meant only for testing and debugging?
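
To make the use case concrete, here is a minimal sketch of the serialization step described above: a data frame is turned into one tab-separated string that would become the response body. `to_tsv_body()` is a hypothetical helper introduced only for illustration; it is not part of plumber or readr, and the small data frame is made up.

```r
library(readr)

# Hypothetical helper standing in for the serialization step of an API
# handler: it turns a data frame into one tab-separated string.
to_tsv_body <- function(x) {
  format_tsv(x)
}

# Made-up example data
df <- data.frame(id = 1:3, score = c(0.5, 1.25, 2))
body <- to_tsv_body(df)

# The result is a single string: a header line followed by one line per row
cat(body)
```

Because the whole response is built as one in-memory string, the speed of the underlying formatting function directly affects request latency for large data frames.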

@jimhester
Collaborator

Thank you for the usage example. I refactored the vroom_format() code to use most of the same machinery as vroom_write(), so it is now multi-threaded.

As a result, on my machine your example now takes a little less than a second to run, where before it took a little more than 3 seconds.
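
For anyone who wants to check the effect on their own machine, the following is a rough sketch rather than a proper benchmark. It times vroom_format() on a smaller data frame than the one in the report (an assumption made so that it runs quickly) and, for comparison, times vroom_write() to a temporary file with one thread and with the default thread count. Timings depend on the machine and on the installed vroom version.

```r
library(vroom)

# Smaller than the 35,000 x 400 data frame in the report, so this runs quickly
set.seed(1)
df <- data.frame(replicate(50, runif(5000, min = 0, max = 100)))

# In-memory formatting, as in the original example
t_format <- system.time(res <- vroom_format(df))

# The same data written to a temporary file, once restricted to a single
# thread and once with the default number of threads
tmp <- tempfile(fileext = ".tsv")
t_write_1 <- system.time(vroom_write(df, tmp, num_threads = 1))
t_write_n <- system.time(vroom_write(df, tmp))

# Elapsed seconds for each variant
print(c(
  format        = t_format[["elapsed"]],
  write_1thread = t_write_1[["elapsed"]],
  write_default = t_write_n[["elapsed"]]
))
```

With a data frame this small the differences may be within noise; scaling the row and column counts back up to the size in the report gives a more meaningful comparison.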

@matmu
Author

matmu commented Oct 5, 2021

Cool, that's awesome! Thanks a lot for your prompt help. Will check it out later today.
