Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

complete and expand with length zero values #331

Closed
mgirlich opened this issue Jul 24, 2017 · 9 comments
Closed

complete and expand with length zero values #331

mgirlich opened this issue Jul 24, 2017 · 9 comments
Labels
bug an unexpected problem or unintended behavior pivoting ♻️ pivot rectangular data to different "shapes"

Comments

@mgirlich
Copy link
Contributor

complete() raises an error when a value for a column to expand is of length zero. The reason for this is that expand() returns NULL resp. a 0 x 0 tibble for a length zero value.

In my opinion it would be better for expand() to return a dataframe/tibble with the same column specification as the original one (or raise a more informative error). This would also fix the problem with complete().

tidyr::complete(data.frame(a = character()), a = NULL)
#> Error in UseMethod("left_join"): no applicable method for 'left_join' applied to an object of class "NULL"
tidyr::expand(data.frame(a = character()), a = NULL)
#> NULL
tidyr::expand(tibble::tibble(a = character()), a = NULL)
#> # A tibble: 0 x 0
@colearendt
Copy link
Contributor

Related to #270 and the proposed fix #285. Not sure if that fix address this or not

@rtaph
Copy link

rtaph commented Aug 21, 2017

@colearendt I am not an admin on this repo but I tried your PR and it solves the problems I have with empty data frames. There are a few merge conflicts though and I think it also needs a grouped_df method.

@colearendt
Copy link
Contributor

Thanks for the feedback! I'll take a look when I have a chance. The merge conflicts are new, so thanks for the heads up!

@colearendt
Copy link
Contributor

@rtaph I have updated my PR to resolve the merge conflicts.

Were the problems you encountering related to factors? The examples you gave at the start of this issue have the same output in my PR, so it would look to me like your concerns were not addressed. The other thing - expand already has a respects groups test and it passes in my PR. Do you mind giving an example of what grouped_df output you expect?

@rtaph
Copy link

rtaph commented Aug 27, 2017

Hi @colearendt,

I think the PR is great. Yes, my issues were with complete()'s treatment of factors in empty data frames and the proposed change provides a solution.

I don't think that a grouped factor-complete on an empty data frame make sense as a desirable operation (after all, isn't that somewhat the role of nesting()?). I came across this case by accident when I used it on a grouped_df, and happily ungrouped it to get the output I needed. That said, the example below shows complete() return an empty data frame on agrouped_df. I think this is less expected than either an error or the output from Case 1.

# devtools::install_github('colearendt/tidyr@7c83699')
library(tidyr)
library(dplyr, warn.conflicts = FALSE)

d1 <- ToothGrowth[0, ]
d2 <- group_by(d1, dose)

# Case 1: expected output, introduced via PR
complete(d1, supp)
#> # A tibble: 2 x 3
#>     supp   len  dose
#>   <fctr> <dbl> <dbl>
#> 1     OJ    NA    NA
#> 2     VC    NA    NA

# Case 2: unexpected output
complete(d2, supp)
#> # A tibble: 0 x 3
#> # Groups:   dose [0]
#> # ... with 3 variables: dose <dbl>, supp <fctr>, len <dbl>

I imagine others could run into this as well, as it is easy to mistakenly pass in a grouped_df in an analytical chain.

@colearendt
Copy link
Contributor

Ahh yes, that's a great example. I will take a look - it is a bit strange to me that this does not work the way that would be expected - need to look a bit more into how the grouped_df is dispatched.

@hadley hadley added bug an unexpected problem or unintended behavior wip work in progress labels Nov 15, 2017
@colearendt
Copy link
Contributor

@rtaph It looks like the "complete" of the factors is happening appropriately inside the function (for each group). The problem is that there is no group actually present here, so do will not inject an NA for a group that does not exist. What we are running up against here is how do thinks about groups. In the docs it says:

For an empty data frame, the expressions will be evaluated once, even in the presence of a grouping. This makes sure that the format of the resulting data frame is the same for both empty and non-empty input.

However, even if this single execution (which I have verified is taking place as expected) returns a nonempty tbl_df (which it does), do does not recognize these records, presumably because no data was coming in. I do not understand the semantics of do enough to go further than that.

Perhaps @hadley can note whether do should ever return a non-empty data.frame when the input is a grouped_df with no data. If so, then this specific case can be discussed further. It would require somehow working around the implementation of do, which is probably not desirable for consistency's sake.

@rtaph
Copy link

rtaph commented Nov 16, 2017

Thanks for looking into this, @colearendt.

It would require somehow working around the implementation of do, which is probably not desirable for consistency's sake.

Let's see what @hadley's thoughts are, because as you say this has more to do with the philosophy of do in the presence of empty data. If anything a warning message for this corner case may be sufficient, as the problem is resolved if the user ungroups the input.

@hadley
Copy link
Member

hadley commented Nov 16, 2017

do should not be privileged. If anything, it's less likely to be correct.

@hadley hadley added pivoting ♻️ pivot rectangular data to different "shapes" and removed wip work in progress labels Nov 20, 2017
@hadley hadley closed this as completed in b26f91e Nov 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior pivoting ♻️ pivot rectangular data to different "shapes"
Projects
None yet
Development

No branches or pull requests

4 participants