Performance issue with group_by, factor and first under 0.8.0.1 #4295
Comments
I am also seeing a deterioration in performance with grouped data while using |
In addition to the change in speed, there is a change in RAM usage as well. For large datasets, a group_by and summarize call that used to run smoothly now maxes out RAM and refuses to run (this is an 8 GB dataset on a server with 128 GB of RAM). Rolling back to 0.7.8 makes things work smoothly again. Not sure if knowing there's a change in RAM usage will help in figuring out what's happening, but thought I should at least note it here in case it does. Thanks!
What happens here is that the hybrid version of |
I have encountered a performance issue with dplyr 0.8.0.1 when applying first() to a factor variable in a group_by() setting with a high-dimensional identifier. The operation is significantly faster when converting the factor variable to a character (see reprex output). In contrast, the execution times of both approaches used to be quite similar under dplyr 0.7.x.

Created on 2019-03-19 by the reprex package (v0.2.1)
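
The reprex output itself is not reproduced here. As a rough sketch of the comparison described above, something along the following lines could be used. The data frame, column names (id, f), and group count are hypothetical and chosen only for illustration. The timings depend on the installed dplyr version and the machine, so none are shown.

```r
library(dplyr)

set.seed(1)
n_groups <- 1e4  # hypothetical number of distinct identifiers; raise to stress the grouping

# Hypothetical data: many small groups, with one factor column
df <- data.frame(
  id = rep(seq_len(n_groups), each = 5),
  f  = factor(sample(letters, n_groups * 5, replace = TRUE))
)

# Approach 1: first() applied directly to the factor column
t_factor <- system.time(
  res_factor <- df %>%
    group_by(id) %>%
    summarise(f = first(f))
)

# Approach 2: convert the factor to character before grouping
t_char <- system.time(
  res_char <- df %>%
    mutate(f = as.character(f)) %>%
    group_by(id) %>%
    summarise(f = first(f))
)

# Elapsed seconds for each approach
print(c(factor = t_factor[["elapsed"]], character = t_char[["elapsed"]]))
```

Both approaches should return the same first value per group, so any difference in elapsed time comes from how the factor column is handled during the grouped summary.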
Session info