New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

complete and expand with length zero values #331

Closed
mgirlich opened this Issue Jul 24, 2017 · 9 comments

Comments

Projects
None yet
4 participants
@mgirlich

mgirlich commented Jul 24, 2017

complete() raises an error when a value for a column to expand is of length zero. The reason for this is that expand() returns NULL resp. a 0 x 0 tibble for a length zero value.

In my opinion it would be better for expand() to return a dataframe/tibble with the same column specification as the original one (or raise a more informative error). This would also fix the problem with complete().

tidyr::complete(data.frame(a = character()), a = NULL)
#> Error in UseMethod("left_join"): no applicable method for 'left_join' applied to an object of class "NULL"
tidyr::expand(data.frame(a = character()), a = NULL)
#> NULL
tidyr::expand(tibble::tibble(a = character()), a = NULL)
#> # A tibble: 0 x 0
@colearendt

This comment has been minimized.

Contributor

colearendt commented Aug 19, 2017

Related to #270 and the proposed fix #285. Not sure if that fix address this or not

@rtaph

This comment has been minimized.

rtaph commented Aug 21, 2017

@colearendt I am not an admin on this repo but I tried your PR and it solves the problems I have with empty data frames. There are a few merge conflicts though and I think it also needs a grouped_df method.

@colearendt

This comment has been minimized.

Contributor

colearendt commented Aug 21, 2017

Thanks for the feedback! I'll take a look when I have a chance. The merge conflicts are new, so thanks for the heads up!

@colearendt

This comment has been minimized.

Contributor

colearendt commented Aug 27, 2017

@rtaph I have updated my PR to resolve the merge conflicts.

Were the problems you encountering related to factors? The examples you gave at the start of this issue have the same output in my PR, so it would look to me like your concerns were not addressed. The other thing - expand already has a respects groups test and it passes in my PR. Do you mind giving an example of what grouped_df output you expect?

@rtaph

This comment has been minimized.

rtaph commented Aug 27, 2017

Hi @colearendt,

I think the PR is great. Yes, my issues were with complete()'s treatment of factors in empty data frames and the proposed change provides a solution.

I don't think that a grouped factor-complete on an empty data frame make sense as a desirable operation (after all, isn't that somewhat the role of nesting()?). I came across this case by accident when I used it on a grouped_df, and happily ungrouped it to get the output I needed. That said, the example below shows complete() return an empty data frame on agrouped_df. I think this is less expected than either an error or the output from Case 1.

# devtools::install_github('colearendt/tidyr@7c83699')
library(tidyr)
library(dplyr, warn.conflicts = FALSE)

d1 <- ToothGrowth[0, ]
d2 <- group_by(d1, dose)

# Case 1: expected output, introduced via PR
complete(d1, supp)
#> # A tibble: 2 x 3
#>     supp   len  dose
#>   <fctr> <dbl> <dbl>
#> 1     OJ    NA    NA
#> 2     VC    NA    NA

# Case 2: unexpected output
complete(d2, supp)
#> # A tibble: 0 x 3
#> # Groups:   dose [0]
#> # ... with 3 variables: dose <dbl>, supp <fctr>, len <dbl>

I imagine others could run into this as well, as it is easy to mistakenly pass in a grouped_df in an analytical chain.

@colearendt

This comment has been minimized.

Contributor

colearendt commented Aug 27, 2017

Ahh yes, that's a great example. I will take a look - it is a bit strange to me that this does not work the way that would be expected - need to look a bit more into how the grouped_df is dispatched.

@hadley hadley added bug wip labels Nov 15, 2017

@colearendt

This comment has been minimized.

Contributor

colearendt commented Nov 16, 2017

@rtaph It looks like the "complete" of the factors is happening appropriately inside the function (for each group). The problem is that there is no group actually present here, so do will not inject an NA for a group that does not exist. What we are running up against here is how do thinks about groups. In the docs it says:

For an empty data frame, the expressions will be evaluated once, even in the presence of a grouping. This makes sure that the format of the resulting data frame is the same for both empty and non-empty input.

However, even if this single execution (which I have verified is taking place as expected) returns a nonempty tbl_df (which it does), do does not recognize these records, presumably because no data was coming in. I do not understand the semantics of do enough to go further than that.

Perhaps @hadley can note whether do should ever return a non-empty data.frame when the input is a grouped_df with no data. If so, then this specific case can be discussed further. It would require somehow working around the implementation of do, which is probably not desirable for consistency's sake.

@rtaph

This comment has been minimized.

rtaph commented Nov 16, 2017

Thanks for looking into this, @colearendt.

It would require somehow working around the implementation of do, which is probably not desirable for consistency's sake.

Let's see what @hadley's thoughts are, because as you say this has more to do with the philosophy of do in the presence of empty data. If anything a warning message for this corner case may be sufficient, as the problem is resolved if the user ungroups the input.

@hadley

This comment has been minimized.

Member

hadley commented Nov 16, 2017

do should not be privileged. If anything, it's less likely to be correct.

@hadley hadley added rectangles 🗄 and removed wip labels Nov 20, 2017

@hadley hadley closed this in b26f91e Nov 20, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment