do() drops columns when input is a 0-row grouped_df #625

Closed
wch opened this issue Sep 24, 2014 · 10 comments

@wch
Member

wch commented Sep 24, 2014

For example, the input here has two columns (and zero rows), but the output has only one column:

dat <- data.frame(x = numeric(0), g = character(0))
dat %>% group_by(g) %>% do(identity(.)) %>% str()
# Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':  0 obs. of  1 variable:
#  $ g: Factor w/ 0 levels: 
#  - attr(*, "vars")=List of 1
#   ..$ : symbol g
#  - attr(*, "drop")= logi TRUE
#  - attr(*, "indices")= list()
#  - attr(*, "group_sizes")= int 
#  - attr(*, "biggest_group_size")= int 0
#  - attr(*, "labels")='data.frame':    0 obs. of  1 variable:
#   ..$ g: Factor w/ 0 levels: 
#   ..- attr(*, "vars")=List of 1
#   .. ..$ : symbol g

In my particular case, I would like it to simply return the input unchanged, but I'm not sure what the correct output should be, in general.

Since the function isn't actually called on zero-row data, there's no way of knowing what columns it would return. Maybe it should be called once with the zero-row data frame? In this case, that would be equivalent to doing dat %>% group_by(g) %>% identity().
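To make the "call it once with the zero-row data frame" idea concrete, here is a minimal base-R sketch. It is not dplyr's implementation: `do_by_group()` is a hypothetical helper invented for illustration, and it assumes the function being applied accepts a zero-row data frame.

```r
# Hypothetical helper (not part of dplyr): apply `f` to each group of `df`.
# When there are no groups, call `f` once on the zero-row input so the
# output columns can still be discovered.
do_by_group <- function(df, group_col, f) {
  groups <- unique(df[[group_col]])
  if (length(groups) == 0) {
    # No groups: fall back to a single call on the empty input.
    return(f(df[0, , drop = FALSE]))
  }
  pieces <- lapply(groups, function(v) {
    f(df[df[[group_col]] == v, , drop = FALSE])
  })
  do.call(rbind, pieces)
}

dat <- data.frame(x = numeric(0), g = character(0))

# identity() keeps both input columns even though there are zero rows.
res <- do_by_group(dat, "g", identity)
names(res)

# A function that adds a column reports that column too.
res2 <- do_by_group(dat, "g", function(d) data.frame(d, y = integer(nrow(d))))
names(res2)
```

The trade-off is that the function now runs once even when there is no data, so any side effects it has (printing, for example) would still happen.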

@hadley hadley added the bug an unexpected problem or unintended behavior label Sep 25, 2014
@hadley hadley added this to the 0.3.1 milestone Sep 25, 2014
@hadley hadley self-assigned this Sep 25, 2014
@wch
Member Author

wch commented Sep 26, 2014

This is cropping up in many places in ggvis, for example rstudio/ggvis#281. Perhaps there should at least be an option for do() to call the function with a zero-row input?

@hadley
Member

hadley commented Sep 26, 2014

Some more test cases:

df_str <- function(x) str(as.data.frame(x))
dat <- data.frame(x = numeric(0), g = character(0))
grp <- dat %>% group_by(g)

grp %>% do(.) %>% df_str()
grp %>% do(data.frame(y = integer(0))) %>% df_str()
grp %>% do(cbind(., y = integer(0))) %>% df_str()


f <- function() {
  cat("Hi!\n")
  data.frame()
}
dat %>% filter(FALSE) %>% do(f()) %>% df_str()
grp %>% filter(FALSE) %>% do(f()) %>% df_str()

@wch
Member Author

wch commented Sep 30, 2014

A couple more test cases:

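# Start with zero rows (and therefore no groups), then do()
# Note: blankdf() is not defined in this thread; the expected names below
# imply it returns the columns "blank" and "blank2".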
dat <- data.frame(x = numeric(0), g = character(0)) %>% group_by(g)

res <- dat %>% do(identity(.))
expect_true(setequal(names(res), names(dat)))

res <- dat %>% do(blankdf(.))
expect_true(setequal(names(res), c("g", "blank", "blank2")))


# Start with some groups, drop all rows, then do()
# The resulting tbl_df after the filter() has a slightly different structure
# from the one that started with zero rows.
dat <- data.frame(x = 1:4, g = c("a", "c", "a", "b")) %>% group_by(g) %>%
  filter(FALSE)

res <- dat %>% do(identity(.))
expect_true(setequal(names(res), names(dat)))

res <- dat %>% do(blankdf(.))
expect_true(setequal(names(res), c("g", "blank", "blank2")))

@wch
Member Author

wch commented Oct 2, 2014

dplyr still drops columns when do() calls a function that returns a vector (instead of a data frame).

f <- function(x) 1:4
dat <- mtcars %>% group_by(fcyl = factor(cyl))

# OK
dat %>% do(f = f(.)) %>% names()
# [1] "fcyl" "f"   

# Drops cols
dat %>% filter(FALSE) %>% do(f = f(.)) %>% names()
# [1] "fcyl"

@hadley
Member

hadley commented Nov 20, 2014

Do you think these tests cover all the possibilities?

df_str <- function(x) str(as.data.frame(x))
dat <- data_frame(x = numeric(0), g = character(0))
grp <- dat %>% group_by(g)
emt <- grp %>% filter(FALSE)

dat %>% do(data.frame()) %>% type_sum()
dat %>% do(data.frame(y = integer(0))) %>% type_sum()
dat %>% do(data.frame(.)) %>% type_sum()
dat %>% do(data.frame(., y = integer(0))) %>% type_sum()
dat %>% do(y = ncol(.)) %>% type_sum()

grp %>% do(data.frame()) %>% type_sum()
grp %>% do(data.frame(y = integer(0))) %>% type_sum()
grp %>% do(data.frame(.)) %>% type_sum()
grp %>% do(data.frame(., y = integer(0))) %>% type_sum()
grp %>% do(y = ncol(.)) %>% type_sum()

emt %>% do(data.frame()) %>% type_sum()
emt %>% do(data.frame(y = integer(0))) %>% type_sum()
emt %>% do(data.frame(.)) %>% type_sum()
emt %>% do(data.frame(., y = integer(0))) %>% type_sum()
emt %>% do(y = ncol(.)) %>% type_sum()

@hadley
Member

hadley commented Nov 26, 2014

@wch ping

@wch
Member Author

wch commented Nov 26, 2014

Looks good to me.

@hadley
Member

hadley commented Dec 2, 2014

dat <- data_frame(x = numeric(0), g = character(0))
grp <- dat %>% group_by(g)
emt <- grp %>% filter(FALSE)

dat %>% do(data.frame()) %>% type_sum() %>% 
  expect_equal(character())
dat %>% do(data.frame(y = integer(0))) %>% type_sum() %>% 
  expect_equal(c(y = "int"))
dat %>% do(data.frame(.)) %>% type_sum() %>% 
  expect_equal(c(x = "dbl", g = "chr"))
dat %>% do(data.frame(., y = integer(0))) %>% type_sum() %>% 
  expect_equal(c(x = "dbl", g = "chr", y = "int"))
dat %>% do(y = ncol(.)) %>% type_sum() %>% 
  expect_equal(c(y = "list"))

# Grouped data frame should have same col types as ungrouped, with addition
# of grouping variable
grp %>% do(data.frame()) %>% type_sum() %>%
  expect_equal(c(g = "chr"))
grp %>% do(data.frame(y = integer(0))) %>% type_sum() %>%
  expect_equal(c(g = "chr", y = "int"))
grp %>% do(data.frame(.)) %>% type_sum() %>%
  expect_equal(c(x = "dbl", g = "chr"))
grp %>% do(data.frame(., y = integer(0))) %>% type_sum() %>%
  expect_equal(c(x = "dbl", g = "chr", y = "int"))
grp %>% do(y = ncol(.)) %>% type_sum() %>%
  expect_equal(c(g = "chr", y = "list"))

# An empty grouped dataset should have the same types as grp
emt %>% do(data.frame()) %>% type_sum() %>%
  expect_equal(c(g = "chr"))
emt %>% do(data.frame(y = integer(0))) %>% type_sum() %>%
  expect_equal(c(g = "chr", y = "int"))
emt %>% do(data.frame(.)) %>% type_sum() %>%
  expect_equal(c(x = "dbl", g = "chr"))
emt %>% do(data.frame(., y = integer(0))) %>% type_sum() %>%
  expect_equal(c(x = "dbl", g = "chr", y = "int"))
emt %>% do(y = ncol(.)) %>% type_sum() %>%
  expect_equal(c(g = "chr", y = "list"))

Currently the only failures are with named inputs.

@wch
Member Author

wch commented Dec 2, 2014

For the first test, perhaps you could use this so that it gets a named char vector:

dat %>% do(data.frame()) %>% type_sum() %>% 
  expect_equal(c(x="chr")[0])
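For anyone wondering why `c(x="chr")[0]` differs from `character()`: both have length zero, but subsetting a named vector keeps an (empty) names attribute. A quick base-R check, with no dplyr or testthat involved; the variable names `plain` and `named` are just for illustration:

```r
plain <- character()        # zero-length character vector, no names attribute
named <- c(x = "chr")[0]    # zero-length, but still carries a names attribute

length(plain)               # 0
length(named)               # 0
is.null(names(plain))       # TRUE
is.null(names(named))       # FALSE: names(named) is character(0)
identical(plain, named)     # FALSE, because the attributes differ
```

So a comparison that checks attributes can fail on `character()` when the value under test is a zero-length named vector.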

@hadley hadley closed this as completed in ae6e962 Dec 2, 2014
@wch
Member Author

wch commented Dec 2, 2014

Looks good to me!

krlmlr pushed a commit to krlmlr/dplyr that referenced this issue Mar 2, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018