do() drops columns when function returns 0-row data frame #597

Closed
wch opened this issue Sep 12, 2014 · 9 comments

@wch (Member) commented Sep 12, 2014

For example, the result should have a column named blank:

library(dplyr)

# Always returns a zero-row data frame with a single numeric column
blankdf <- function(x) data.frame(blank = numeric(0))

dat <- data.frame(a = 1:2, b = factor(1:2))
dat %>% group_by(b) %>% do(blankdf(.)) %>% str()
# Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':  0 obs. of  1 variable:
#  $ b: Factor w/ 2 levels "1","2": 
#  - attr(*, "vars")=List of 1
#   ..$ : symbol b
#  - attr(*, "drop")= logi TRUE
#  - attr(*, "indices")= list()
#  - attr(*, "group_sizes")= int 
#  - attr(*, "biggest_group_size")= int 0
#  - attr(*, "labels")='data.frame':    0 obs. of  1 variable:
#   ..$ b: Factor w/ 2 levels "1","2": 
#   ..- attr(*, "vars")=List of 1
#   .. ..$ : symbol b

Also, if the input has zero rows but is grouped, there is an error:

data.frame(a = numeric(0), b = factor()) %>% group_by(b) %>% do(blankdf(.))
# Error in rep(1:nrow(labels), rows) : invalid 'times' argument

data.frame(a = numeric(0), b = character()) %>% group_by(b) %>% do(blankdf(.))
# Error in rep(1:nrow(labels), rows) : invalid 'times' argument
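
The call shown in the error message, rep(1:nrow(labels), rows), hints at a common R pitfall: when nrow(labels) is 0, 1:nrow(labels) counts down and yields c(1, 0) rather than an empty vector. The base-R sketch below illustrates that pitfall only. The zero-row `labels` frame and the empty `times` vector are assumptions chosen for illustration, not values taken from dplyr's internals.

```r
# Hypothetical stand-in for the group labels of a zero-row grouped input
labels <- data.frame(b = factor())

# 1:n counts down when n is 0, so this has length 2, not 0
downward <- 1:nrow(labels)

# seq_len() is the safe idiom: it returns an empty integer vector
safe <- seq_len(nrow(labels))

# Assumption: `rows` is empty for zero-row input. The error message
# (if the call fails) is captured instead of stopping the script.
msg <- tryCatch(
  {
    rep(1:nrow(labels), integer(0))
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

print(downward)
print(length(safe))
print(msg)
```

If this reading is right, replacing 1:nrow(labels) with seq_len(nrow(labels)) would be one way to avoid the mismatch, though the actual fix in dplyr may differ.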

However, if the input originally had more rows but now has zero, it doesn't error (but it still drops the blank column):

dat <- data.frame(a = 1:2, b = factor(1:2))
dat2 <- dat %>% group_by(b)
dat2 <- dat2[0,]
dat2 %>% do(blankdf(.)) %>% str()
# Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':  0 obs. of  1 variable:
#  $ b: Factor w/ 2 levels "1","2": 
#  - attr(*, "vars")=List of 1
#   ..$ : symbol b
#  - attr(*, "drop")= logi TRUE
#  - attr(*, "indices")= list()
#  - attr(*, "group_sizes")= int 
#  - attr(*, "biggest_group_size")= int 0
#  - attr(*, "labels")='data.frame':    0 obs. of  1 variable:
#   ..$ b: Factor w/ 2 levels "1","2": 
#   ..- attr(*, "vars")=List of 1
#   .. ..$ : symbol b

@eibanez (Contributor) commented Sep 18, 2014

I was going to file another issue, but I think my problem is related to this one.

When a do() operation is applied to a grouped tbl_df and the results are all empty data.frames, dplyr returns an error (I'm assuming that this creates an empty data.frame that is supposed to be grouped). I would expect the result of the following operations to be a data.frame with zero rows.

data.frame(a = 1:3) %>% group_by(a) %>% do(data.frame())
# Error: upper value must be greater than lower value

The following example with rowwise seems to behave appropriately:

data.frame(a = 1:3) %>% rowwise %>% do(data.frame())
# Source: local data frame [0 x 0]
# Groups: <by row>

@hadley added the bug label Sep 22, 2014
@hadley added this to the 0.3 milestone Sep 22, 2014
@hadley self-assigned this Sep 22, 2014
@hadley assigned romainfrancois and unassigned hadley Sep 22, 2014

@hadley (Member) commented Sep 22, 2014

@romainfrancois I think this is now an rbind_all() bug:

empty <- data.frame(result = numeric())
expect_equal(rbind_list(empty), empty)
expect_equal(rbind_list(empty, empty), empty)
expect_equal(rbind_list(empty, empty, empty), empty)
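
For comparison, the same expectation can be checked against base R's rbind(), which keeps the column of zero-row data frames. This is only a base-R analogue of the property the tests above ask for; it does not call rbind_list() or rbind_all() and says nothing about how dplyr implements them.

```r
# Zero-row data frame with one numeric column, as in the tests above
empty <- data.frame(result = numeric())

# Bind one, two and three copies with base R
bound1 <- do.call(rbind, list(empty))
bound2 <- do.call(rbind, list(empty, empty))
bound3 <- do.call(rbind, list(empty, empty, empty))

# The column name and type survive, and there are still zero rows
print(names(bound3))
print(nrow(bound3))
print(class(bound3$result))
```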

@romainfrancois (Member) commented Sep 23, 2014

Now getting:

> dat %>% group_by(b) %>% do(blankdf(.)) -> res
> str(res)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 0 obs. of  2 variables:
 $ b    : Factor w/ 2 levels "1","2":
 $ blank: num
 - attr(*, "vars")=List of 1
  ..$ : symbol b
 - attr(*, "drop")= logi TRUE
 - attr(*, "indices")= list()
 - attr(*, "group_sizes")= int
 - attr(*, "biggest_group_size")= int 0
 - attr(*, "labels")='data.frame':  0 obs. of  1 variable:
  ..$ b: Factor w/ 2 levels "1","2":
  ..- attr(*, "vars")=List of 1
  .. ..$ : symbol b

@wch is this what you expect? I'm not that familiar with do(). Is it normal that blank is a numeric vector, or should it be something like a list of data frames?

@wch (Member, Author) commented Sep 23, 2014

That seems right. In this case blank is just a numeric vector. I'm not even sure what should happen if it's a list of data frames. :)

@romainfrancois (Member) commented Sep 23, 2014

Cool. I'm looking at the other examples now.

@romainfrancois (Member) commented Sep 23, 2014

@hadley I think I fixed rbind_list now, but I'm still getting:

> dat <- data.frame(a = 1:2, b = factor(1:2))
> dat2 <- dat %>% group_by(b)
> dat2 <- dat2[0,]
> dat2 %>% do(blankdf(.)) %>% str()
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 0 obs. of  1 variable:
 $ b: Factor w/ 2 levels "1","2":
 - attr(*, "vars")=List of 1
  ..$ : symbol b
 - attr(*, "drop")= logi TRUE
 - attr(*, "indices")= list()
 - attr(*, "group_sizes")= int
 - attr(*, "biggest_group_size")= int 0
 - attr(*, "labels")='data.frame':  0 obs. of  1 variable:
  ..$ b: Factor w/ 2 levels "1","2":
  ..- attr(*, "vars")=List of 1
  .. ..$ : symbol b

i.e. no blank column in the output. I think it is because of this loop in do.grouped_df:

  for (`_i` in seq_len(n)) {
    for (j in seq_len(m)) {
      out[[j]][`_i`] <- list(eval(args[[j]], envir = env))
      p$tick()$print()
    }
  }

Because n is 0, the loop body never runs and out[[j]] never gets assigned anything, so label_output_dataframe has nothing to work with. Or perhaps I don't get it.
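
The effect of a zero-iteration loop can be reproduced in isolation. The sketch below is a simplified stand-in for the loop quoted above, not the dplyr source: `n`, `m`, `out` and the `evaluated` counter are hypothetical names set up for illustration, and the expression is hard-coded instead of being evaluated from `args`.

```r
n <- 0L   # number of groups (zero for a zero-row grouped input)
m <- 1L   # number of expressions passed to do()

# One empty result list per expression
out <- replicate(m, vector("list", n), simplify = FALSE)

evaluated <- 0L
for (i in seq_len(n)) {
  for (j in seq_len(m)) {
    # Stand-in for evaluating the user's expression on group i
    out[[j]][i] <- list(data.frame(blank = numeric(0)))
    evaluated <- evaluated + 1L
  }
}

# The expression was never evaluated, so no result exists
# from which the `blank` column could be discovered
print(evaluated)
print(length(out[[1]]))
```

With no groups, nothing downstream can learn the output columns unless the expression is evaluated at least once on an empty group or the columns are supplied some other way.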

@romainfrancois (Member) commented Sep 23, 2014

@eibanez I'm now getting:

> data.frame(a = 1:3) %>% group_by(a) %>% do(data.frame())
Source: local data frame [0 x 1]
Groups: a

> data.frame(a = 1:3) %>% group_by(a) %>% do(data.frame()) %>% str
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 0 obs. of  1 variable:
 $ a: int
 - attr(*, "vars")=List of 1
  ..$ : symbol a
 - attr(*, "drop")= logi TRUE
 - attr(*, "indices")= list()
 - attr(*, "group_sizes")= int
 - attr(*, "biggest_group_size")= int 0
 - attr(*, "labels")='data.frame':  0 obs. of  1 variable:
  ..$ a: int
  ..- attr(*, "vars")=List of 1
  .. ..$ : symbol a

Is this what you expect?

@wch (Member, Author) commented Sep 23, 2014

Can you bump the dplyr version once this is fixed? Thanks!

romainfrancois added a commit that referenced this issue Sep 23, 2014

@eibanez (Contributor) commented Sep 23, 2014

@romainfrancois Looks good to me. Thanks, Romain!

@hadley assigned hadley and unassigned romainfrancois Sep 24, 2014
@hadley closed this Sep 30, 2014
@lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018