Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify group context detection #6552

Merged
merged 9 commits into from Nov 21, 2022

Conversation

DavisVaughan
Copy link
Member

Closes #6534

We now detect whether or not we need to add group context by checking if the mask$get_current_group() value is 0 or not. If it is 0, we are either before or after the actual computation of the group results. Otherwise we must be inside some kind of group computation, or we have hardcoded the group using mask$set_current_group(), which we do in a few places.

The most important change was probably resetting the group index to 0 in DPLYR_MASK_FINALISE() so that it is 0 after the group computations too.

This allowed me to remove the special error wrapping in pick() and in the across() deprecation.

R/across.R Show resolved Hide resolved
Comment on lines -65 to -68
"dplyr:::error_incompatible_combine",
"dplyr:::mutate_mixed_null",
"dplyr:::mutate_constant_recycle_error",
"dplyr:::summarise_mixed_null",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 4 error classes are still needed to dispatch off for things like error bullet generation, but we no longer need to special case them here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool

R/conditions.R Show resolved Hide resolved
R/pick.R Show resolved Hide resolved
Comment on lines 53 to 62
dplyr::stop_summarise_mixed_null();
const SEXP* p_chunks = VECTOR_PTR_RO(chunks);

for (R_xlen_t i = 0; i < ngroups; i++) {
if (p_chunks[i] == R_NilValue) {
// Find out the first time the group was `NULL`
// so that the error will be associated with this group
DPLYR_MASK_SET_GROUP(i);
dplyr::stop_summarise_mixed_null();
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the error path that occurs when you happen to return a NULL from one group and a non-NULL value in another group

In mutate() we already do something similar (report the group with the first problematic NULL), so this is just us being consistent with that now

Comment on lines 370 to +405
i In argument `..1 = (if_any())`.
i In group 1: `g = 1`.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor improvement

We used to always skip adding the group context if we saw a warning_across_missing_cols_deprecated warning class. But during the evaluation path (i.e. not expansion) we actually do have the group context, so we can report it

@@ -164,6 +164,18 @@
<error/dplyr:::mutate_error>
Error in `mutate()`:
i In argument: `..1 = if (a == 1) NULL else "foo"`.
i In group 1: `a = 1`.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another small improvement

Again, we used to always skip adding the group context if mutate_mixed_null was the error class, but we actually do set the group value explicitly in the C++ code for this case, and we can now let that bubble up

Comment on lines +998 to +1034
times_two <- function(x) x * 2

# Expansion path
expect_snapshot(out <- mutate(df, z = across()))
expect_identical(out$z, df)
expect_snapshot(out <- mutate(gdf, z = across()))
expect_identical(out$z, df[c("x", "y")])
expect_snapshot(out <- mutate(df, across(.fns = times_two)))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized this wasn't actually doing the "expansion path" of across() because it was a named call to across(), i.e. the z = , so I fixed that here. It ends up tweaking the warning in the snapshot test a little

Copy link
Member

@lionel- lionel- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement!

R/across.R Show resolved Hide resolved
Comment on lines -65 to -68
"dplyr:::error_incompatible_combine",
"dplyr:::mutate_mixed_null",
"dplyr:::mutate_constant_recycle_error",
"dplyr:::summarise_mixed_null",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool

src/summarise.cpp Outdated Show resolved Hide resolved
tests/testthat/_snaps/summarise.md Show resolved Hide resolved
@DavisVaughan DavisVaughan force-pushed the fix/group-context-errors branch 2 times, most recently from cf27e06 to ac57d3d Compare November 21, 2022 13:59
Previously we never actually used the expansion path because named across calls went through the evaluation path
It is `0L` before any group evaluation, and we now ensure that it is `0L` on the way out too, through `DPLYR_MASK_FINALISE()`

This ensures that we can continue to use `$set_current_group()` on the fly right before an `abort()` call to force a group location
We already do this for `mutate()`, so this is just us being consistent
@DavisVaughan DavisVaughan merged commit 370fdf0 into tidyverse:main Nov 21, 2022
@DavisVaughan DavisVaughan deleted the fix/group-context-errors branch November 21, 2022 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Grouped across() doesn't give good error with bad input
2 participants