-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
group_by .drop=TRUE on arrow table is leading to an error #5763
Comments
This feels like an cc @nealrichardson. |
nealrichardson
added a commit
to apache/arrow
that referenced
this issue
Feb 17, 2021
tidyverse/dplyr#5763 concisely exposes multiple issues: * We don't expect the .drop argument to group_by (fixed here) * You can apparently provide expressions to `...` in group_by, which effectively do `mutate()` to add the columns and then group by them (detected here with a useful error; implementation of the feature deferred to ARROW-11658) * Our input validation in `[.ArrowTabular` and the Table/RecordBatch `SelectColumns` methods was incomplete, and where it was present was not very helpful (fixed here) Other issues observed here and deferred: * Table has a proper SelectColumns method in the C++ library but the RecordBatch one is in the R library and should be pushed down to C++ (ARROW-11660) * The .drop argument is not actually implemented here, it is only caught if specified, and if the value given is other than the default, it errors. We should keep it around like we do the group_vars themselves (ARROW-11658), and we'll need to implement it too in the C++ query engine eventually. Closes #9510 from nealrichardson/group-by-drop Authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
GeorgeAp
pushed a commit
to sirensolutions/arrow
that referenced
this issue
Jun 7, 2021
tidyverse/dplyr#5763 concisely exposes multiple issues: * We don't expect the .drop argument to group_by (fixed here) * You can apparently provide expressions to `...` in group_by, which effectively do `mutate()` to add the columns and then group by them (detected here with a useful error; implementation of the feature deferred to ARROW-11658) * Our input validation in `[.ArrowTabular` and the Table/RecordBatch `SelectColumns` methods was incomplete, and where it was present was not very helpful (fixed here) Other issues observed here and deferred: * Table has a proper SelectColumns method in the C++ library but the RecordBatch one is in the R library and should be pushed down to C++ (ARROW-11660) * The .drop argument is not actually implemented here, it is only caught if specified, and if the value given is other than the default, it errors. We should keep it around like we do the group_vars themselves (ARROW-11658), and we'll need to implement it too in the C++ query engine eventually. Closes apache#9510 from nealrichardson/group-by-drop Authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
michalursa
pushed a commit
to michalursa/arrow
that referenced
this issue
Jun 13, 2021
tidyverse/dplyr#5763 concisely exposes multiple issues: * We don't expect the .drop argument to group_by (fixed here) * You can apparently provide expressions to `...` in group_by, which effectively do `mutate()` to add the columns and then group by them (detected here with a useful error; implementation of the feature deferred to ARROW-11658) * Our input validation in `[.ArrowTabular` and the Table/RecordBatch `SelectColumns` methods was incomplete, and where it was present was not very helpful (fixed here) Other issues observed here and deferred: * Table has a proper SelectColumns method in the C++ library but the RecordBatch one is in the R library and should be pushed down to C++ (ARROW-11660) * The .drop argument is not actually implemented here, it is only caught if specified, and if the value given is other than the default, it errors. We should keep it around like we do the group_vars themselves (ARROW-11658), and we'll need to implement it too in the C++ query engine eventually. Closes apache#9510 from nealrichardson/group-by-drop Authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
group_by(..., .drop=TRUE)
on arrow table is leading to an errorThe text was updated successfully, but these errors were encountered: