Spread should work with drop=FALSE and fill=NA and create columns (not rows) from empty factor levels #56
In the following non-empty example:
We get a data-frame-like object with columns "a" and "b".
Now assume I know it has "a" and "b", and in later steps I consume them in, e.g., mutate() or ggvis.
What if there is no data?
This produces the following:
This then goes horribly wrong if I try to consume "a":
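The fallback described next can be reproduced in base R without tidyr. This sketch assumes `with()` is a fair stand-in for the data masking in `mutate()` (both look a name up in the data first, then in the enclosing environment); the variable names are hypothetical.

```r
# Hypothetical global that happens to share the expected column's name
a <- "oops, I am a global"

# A zero-row data frame that lacks the expected columns "a" and "b"
empty <- data.frame(x = character())

# with() evaluates in the data first, then falls back to the calling
# environment, so the missing column silently resolves to the global
result <- with(empty, a)
print(result)
```

No error is raised: the missing column is masked by whatever "a" happens to exist in the environment.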
What happened here? It pulled 'a' from the environment; in fact, it's using
But right now that doesn't work: drop=FALSE does something different; it fills in missing data...
I couldn't get this to work well enough to produce an example, but never mind.
Interestingly, if ONE level is missing but the others are present, it works as expected:
But it fails if no data is present:
This is the provided example:
colnames fails because it expects as many columns as there are col_labels, but the data actually has 0 columns.
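A minimal base R reproduction of that failure mode, assuming the reshaped data has ended up with zero columns. `reshaped` and `col_labels` are hypothetical stand-ins here, not tidyr internals.

```r
# Stand-in for the reshaped result in the empty case: no columns at all
reshaped <- data.frame()

# Two labels, as if taken from the factor levels "a" and "b"
col_labels <- c("a", "b")

# Assigning more names than there are columns is an error in base R;
# capture the message instead of stopping
err <- tryCatch({
  colnames(reshaped) <- col_labels
  NULL
}, error = function(e) conditionMessage(e))
print(err)
```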
This failure to behave consistently in empty cases (a recurring problem in R code) starts here:
The id function, which nominally handles a data frame (where the column lengths would all be the same number), is actually capable of handling a list (where they can differ). It's hard for me to see how that code, which filters out empty columns, would make sense for a data frame: there, either every column is empty or none is.
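To see why this matters for data frames specifically, here is a base R sketch of a length-based "drop empty columns" filter applied to a zero-row data frame. The filter is modeled on the description above, not copied from id()'s source.

```r
# A zero-row data frame whose factor column still carries two levels
df <- data.frame(y = factor(character(), levels = c("y", "z")))

# In a data frame every column has the same length (here 0) ...
lens <- vapply(df, length, integer(1))

# ... so filtering out empty columns keeps all of them or none of them
kept <- df[lens != 0]

print(ncol(kept))    # no columns survive the filter
print(levels(df$y))  # even though the levels were available beforehand
```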
It continues in tidyr:::id_var with
which precedes the factor handling code:
I suggest reordering these clauses in id_var so that the factor check comes first.
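A sketch of the suggested reordering, written as a hypothetical `id_var_sketch()` that returns integer ids plus an `n` attribute counting the possible values. It is modeled loosely on the behavior described above and is not tidyr's actual code.

```r
# Hypothetical sketch, not tidyr's source: integer ids with attribute "n"
# giving the number of possible values
id_var_sketch <- function(x, drop = FALSE) {
  if (is.factor(x) && !drop) {
    # Factor check first: the levels are known even when x has length 0
    x <- addNA(x, ifany = TRUE)
    id <- as.integer(x)
    n <- length(levels(x))
  } else if (length(x) == 0) {
    # Only empty non-factor input (or drop = TRUE) yields n = 0
    id <- integer()
    n <- 0L
  } else {
    levels <- sort(unique(x), na.last = TRUE)
    id <- match(x, levels)
    n <- length(levels)
  }
  structure(id, n = n)
}

empty_key <- factor(character(), levels = c("y", "z"))
attr(id_var_sketch(empty_key), "n")    # 2: one per level
attr(id_var_sketch(character()), "n")  # 0
```

With the length check first, the empty factor would report n = 0 and its levels would never be consulted.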
This was referenced on Jan 19, 2015.
Here's my understanding of the issue, in code:
```r
library(dplyr)
library(tidyr)

df_c <- data_frame(
  x = c("a", "a", "b", "b"),
  y = c("y", "z", "y", "z"),
  z = 1:4
)
df_f <- df_c %>%
  mutate(x = factor(x, levels = c("b", "a")), y = factor(y))

# Correct: only differ in order
df_c %>% spread(y, z) %>% str()
df_f %>% spread(y, z) %>% str()

# Correct: only see y
df_c[1, ] %>% spread(y, z) %>% str()
df_f[1, ] %>% spread(y, z) %>% str()

# Correct: expands out both y and z
df_f[1, ] %>% spread(y, z, drop = FALSE) %>% str()

# Correct: don't see any values
df_c[0, ] %>% spread(y, z) %>% str()

# Incorrect: from the levels of the factor y, should have columns y and z
df_f[0, ] %>% spread(y, z, drop = FALSE) %>% str()
```
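For reference, the column layout expected in the last case can be built by hand in base R. `spread_empty_sketch()` is a hypothetical helper, not part of tidyr, and it only covers the zero-row case with a factor key: it keeps the remaining identifying columns and adds one empty column per key level, typed like `fill`.

```r
# Hypothetical helper, not tidyr: zero-row "spread" driven by factor levels
spread_empty_sketch <- function(data, key, value, fill = NA) {
  stopifnot(nrow(data) == 0, is.factor(data[[key]]))
  # Keep the identifying columns (everything except key and value)
  ids <- data[setdiff(names(data), c(key, value))]
  # One zero-length column per level of the key, with the type of fill
  new_cols <- rep(list(fill[0]), nlevels(data[[key]]))
  names(new_cols) <- levels(data[[key]])
  cbind(ids, as.data.frame(new_cols))
}

# Same shape as df_f[0, ] above, built without dplyr
df_f0 <- data.frame(
  x = factor(character(), levels = c("b", "a")),
  y = factor(character(), levels = c("y", "z")),
  z = integer()
)
str(spread_empty_sketch(df_f0, key = "y", value = "z"))
```

This creates columns (not rows) from the empty factor levels, which is the behavior the issue title asks for.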