spread() fails with empty data frame #269

Closed
hadley opened this Issue Jan 10, 2017 · 6 comments

hadley (Member) commented Jan 10, 2017

library(tidyverse)

df <- tibble(x = character(), y = numeric(), z = character())
spread(df, x, y)
#> Error in enc2utf8(col_names(col_labels, sep = sep)): argumemt is not a character vector

df2 <- df[c("x", "y")]
spread(df2, x, y)
#> Error: Duplicate identifiers for rows (1, 2)
hadley (Member) commented Jan 10, 2017

(Also note the spelling mistake in the error message, "argumemt", so there are possibly three bugs here.)

cb4ds commented Feb 2, 2017

I'm picking this one up. I'd like to clarify the expected outcomes for the examples above:

If a user calls spread() on an empty data frame, the key and value columns are (by definition) empty, so there are no key-value pairs to match. As I see it, those columns would simply be dropped, because there are no keys to turn into new columns.

Following on that:

  • spread(df, x, y) would return an empty data frame without the key/value columns (in this case containing just z). Should there be a message or warning when columns are dropped this way?

  • spread(df2, x, y) leaves no remaining columns under the same logic, so a 0x0 data frame would be returned. Is this desired, or would returning something such as NULL be more useful?
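The proposed empty-input behaviour can be illustrated with a minimal base-R sketch. The helper `spread_empty_sketch()` is hypothetical: it is not part of tidyr, it is not the fix that closed this issue, it takes column names as strings (unlike spread()'s unquoted arguments), and it deliberately covers only the zero-row case.

```r
# Hypothetical helper, NOT part of tidyr: sketches the behaviour proposed
# above for spreading a zero-row data frame. Column names are passed as strings.
spread_empty_sketch <- function(data, key, value) {
  # Every column other than the key/value pair acts as an identifier
  keep <- setdiff(names(data), c(key, value))
  if (nrow(data) == 0) {
    # No key-value pairs exist, so no new columns can be created:
    # drop the key and value columns and return the remaining ones.
    return(data[keep])
  }
  # Non-empty input is outside the scope of this sketch
  stop("spread_empty_sketch() only handles zero-row input")
}

df <- data.frame(x = character(), y = numeric(), z = character())
spread_empty_sketch(df, "x", "y")   # 0-row data frame with only column z

df2 <- df[c("x", "y")]
spread_empty_sketch(df2, "x", "y")  # 0 x 0 data frame
```

Returning a 0x0 data frame rather than NULL keeps the result type stable, so downstream code that expects a data frame keeps working.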

markhwhiteii commented Nov 8, 2017

I have run into this issue a number of times. Would it be possible to throw an error when nrow(df) is zero, with a meaningful message saying that spread could not be done because no data were present?

Perhaps I'm being too egocentric, but I can't think of a time when I would want to spread a data frame and not be notified that it was empty.

hadley (Member) commented Nov 16, 2017

I'm pretty sure @cb4ds's suggestion is the right approach. @markhwhiteii, I agree that the result is probably not useful, but in my experience it works out easier in the long run if operations on empty data frames return values rather than throw errors.

@cb4ds are you still interested in working on this? I'll be working on tidyr for the next few weeks and I'd be very happy to review a PR. (If I don't hear from you within a couple of days, I'll assume you're not interested and do it myself.)

cb4ds commented Nov 17, 2017

@hadley Thanks for checking with me. I'm too swamped at this point in the year to tackle it, so you are welcome to have at it!

hadley (Member) commented Nov 17, 2017

@cb4ds sorry for taking so long to get back to you - we are thinking about and working on ways to make it easier for others to contribute. Unfortunately my workflow isn't the most conducive to outside contributions at the moment.

hadley closed this in 9d83369 on Nov 17, 2017
