Integer not automatically coerced to double #1892
Comments
Thanks, confirmed. Works for me if I replace
@krlmlr The test would be added to
Yes, if a grouped mutate uses different return types it's safer to throw an error; if this is really the desired behavior, the user can coerce. The only exceptions we'd like to allow are integer vs. double and logical NAs; see also r-lib/vctrs#7 and dplyr::if_else(). Test location looks good to me.
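The policy described in this comment can be sketched in base R. `common_type()` is a hypothetical helper written for illustration; it is not part of dplyr or vctrs and only mirrors the stated rule: identical types pass, integer mixed with double becomes double, an all-NA logical result adopts the other type, and anything else is an error.

```r
# Hypothetical helper: decide the output type for a list of per-group results.
# Not dplyr's implementation; a sketch of the stated policy only.
# Note: typeof() ignores classes such as factor, which a real
# implementation would also have to compare.
common_type <- function(chunks) {
  # A logical chunk that is all NA carries no type information.
  is_na_lgl <- vapply(
    chunks,
    function(x) is.logical(x) && all(is.na(x)),
    logical(1)
  )
  types <- unique(vapply(chunks[!is_na_lgl], typeof, character(1)))

  if (length(types) == 0) return("logical")  # every chunk was a logical NA
  if (length(types) == 1) return(types)      # all groups agree
  if (setequal(types, c("integer", "double"))) return("double")

  stop("incompatible types: ", paste(types, collapse = ", "))
}

common_type(list(1L, 2.5))       # integer + double
common_type(list(NA, 3L))        # logical NA adopts the other type
try(common_type(list(1L, "a")))  # any other mix is an error
```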
Also note that grouped mutate does not currently coerce factor + character -> character (WARN) as expected. See PR #2249 for a demonstrative test.
* Add failing test for #1892
* Correct Description of Test
* Correct Spelling
* Add Skip for Currently Failing Test
* Correct Formatting
* Correct Formatting and Clarify Data Type
* Increased Clarity of Test Expectation
* Add Failing Test
* Prefer typeof() in place of class()
* Single Instance per Group; Clarified Description; expect_type() rather than expect_equal()
* Code Area Covered by expect_warning() Reduced
* Explicitly Expect Warning to be Issued by Mutate
* Add Reference to Issue #1892
* Test are in Mutate Rather than group_by with Other Grouped Mutate Tests
* Add Expectations Regarding the Exact Output
* Drop explicit (and unnecessary) casting
Here's a simpler reprex; it also suggests that the two calls below go through different code paths:
library(dplyr, warn.conflicts = FALSE)
df <- tibble(
  value = c(1L, NA),
  group = c("A", "B")
)
df %>%
  group_by(group) %>%
  mutate(value = if (is.na(value)) 0 else 1L)
#> Error in mutate_impl(.data, dots): incompatible types, expecting a integer vector
df %>%
  group_by(group) %>%
  summarise(value = if (is.na(value)) 0 else 1L)
#> # A tibble: 2 × 2
#>   group value
#>   <chr> <dbl>
#> 1     A     1
#> 2     B     0
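One way to see why the grouped `mutate()` fails is to evaluate the same expression once per group in base R and inspect the result types. The snippet below is only a sketch that mimics per-group evaluation with `split()` and `lapply()`; it does not call dplyr.

```r
value <- c(1L, NA)
group <- c("A", "B")

# Evaluate the reprex expression separately for each group.
per_group <- lapply(split(value, group), function(v) if (is.na(v)) 0 else 1L)

# Group A yields 1L (integer); group B yields 0 (double).
vapply(per_group, typeof, character(1))

# Using literals of the same type in both branches avoids the mismatch.
fixed <- lapply(split(value, group), function(v) if (is.na(v)) 0L else 1L)
vapply(fixed, typeof, character(1))
```

In the dplyr call itself, writing `0L` instead of `0` (or `1` instead of `1L`) should sidestep the error for the same reason, since every group then returns the same type.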
Another reprex:
df <- tibble(
  g = c(1, 2, 2),
  x = c(1L, 2L, 3L)
)
df %>%
  group_by(g) %>%
  mutate(median(x))
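The likely reason this reprex hits the same error is that base R's `median()` returns different types depending on group size. For an integer vector of odd length it returns the middle element unchanged, while for even length it averages the two middle elements and returns a double. A quick check using only base R:

```r
# Group g == 1 has one row; group g == 2 has two rows.
typeof(median(1L))         # odd length: the middle integer is returned as-is
typeof(median(c(2L, 3L)))  # even length: mean of the two middle values

# Wrapping the call in as.numeric() gives every group the same type.
typeof(as.numeric(median(1L)))
```

Under that reading, `mutate(as.numeric(median(x)))` would be a user-side workaround until the coercion is handled inside dplyr.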
@zeehio any ideas what's going on here? Despite my previous assertion, I'd really like to fix this bug for the next release.
This issue goes through a different code path than the The error is raised here https://github.com/hadley/dplyr/blob/61f77bc1d2dbf496cf5b72a750a24f134d506a45/inst/include/dplyr/Gatherer.h#L81 The wrong underlying assumption in I think the best solution would be to use the
Yes, that would be fantastic! It would fix a whole class of bugs.
Current solution (dplyr-0.5.0 and master):
Implementation
Assumptions of the current algorithm:
General case assumptions:
Possible general case implementation
Issue with this general case
Tradeoff solution proposal
Assumptions
Advantage with respect to the general case:
Advantage with respect to the current implementation:
@hadley if the tradeoff solution is good for you I will go with it.
@zeehio in the tradeoff solution, could you spell out in a bit more detail what happens if the types are different in different groups?
Yes, we will use the same rules as in
Ok, I don't quite see how this will work under the hood, but it sounds reasonable, and I'd love to review a PR 😁
`mutate(col2 = fun(col1))` on a grouped data frame calls `fun` once per group. It used to require that `fun` return exactly the same type for every group, which is undesirable for functions that may return different but compatible types, such as integer and double. This PR changes that behaviour: the vectors returned by the `fun` calls are now combined using the same coercion rules as `combine` and `bind_rows`, defined in `Collecter.h`.
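The behaviour described in this PR summary can be illustrated with a small base R sketch. `grouped_apply()` is a hypothetical helper, not the C++ code in `Collecter.h`: it calls `fun` once per group, writes each result back to that group's rows, and lets an integer result vector widen to double when a later group returns a double. Handling of logical NAs and of classed vectors is left out for brevity.

```r
# Hypothetical helper, for illustration only: call `fun` once per group and
# collect the results into one vector, widening integer to double on demand.
grouped_apply <- function(x, g, fun) {
  out <- NULL
  for (idx in split(seq_along(x), g)) {
    res <- fun(x[idx])
    if (is.null(out)) {
      # The first group fixes the initial storage type (filled with NA).
      out <- rep(res[NA_integer_], length(x))
    } else if (!(is.numeric(out) && is.numeric(res)) &&
               typeof(out) != typeof(res)) {
      # Anything other than an integer/double mix is rejected.
      stop("incompatible types: ", typeof(out), " and ", typeof(res))
    }
    # Assigning a double into an integer vector promotes the whole vector.
    out[idx] <- res
  }
  out
}

# The median() reprex from this thread: group 1 returns an integer,
# group 2 returns a double, and the combined column is double.
grouped_apply(c(1L, 2L, 3L), c(1, 2, 2), median)
```

The point of the design is that the output type is decided while the groups are collected, rather than being fixed by whatever the first group happens to return.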
@hadley I'm giving you two PRs instead of one. The second one includes the commits of the first because it depends on it, but once the first PR is merged the second will consist of a single commit. It was easier than expected to make it work, although my C++ and Rcpp still need some practice. If you find things done in a weird way, assume my Rcpp skills are not good and ask.
* Add difftime support to Collecter.h
* Fix oldrel test
* Make mutate use collecter.h. Closes #1892 `mutate(col2 = fun(col1))` on a grouped data frame calls `fun` once per group. It used to require that `fun` returns the exact same type and that was not desirable in functions that may return different (but compatible) types, such as integer and numeric. This PR changes that behaviour, so the returned vectors from each of the `fun` calls are combined using the same coercion rules as `combine` and `bind_rows`, defined in `Collecter.h`.
* Add hms support and other @krlmlr feedback
* Further suggestions to PR #2487 by @krlmlr
* Difftime collecter corrections suggested by @hadley
This might be a won't fix, but I thought I'd float the balloon anyway.
Consider:
Which fails.
Versus:
Which does not fail.
As a matter of linguistic parsing one might not expect different outcomes based on ordering. Is there a reason dplyr can't coerce the returns from the group_by operation to share in the most general common data type (as done elsewhere in R) so that both functional orders yield successful results?