Spread should work with drop=FALSE and fill=NA and create columns (not rows) from empty factor levels #56
Interestingly, if ONE level is missing but the others are present, it works as expected:
But fails if no data is present:
This is the provided example:
in
but
Hence in
colnames fails because it expects the number of columns given by col_labels, but the result in fact has 0 columns. This failure to behave consistently in empty cases (a recurring problem in R code) starts in tidyr::id:
The id function, which nominally handles a data frame (where every column has the same length), can actually handle an arbitrary list (where lengths may differ). It's hard for me to see how that code, which filters out empty columns, would make sense for a data frame. The problem continues in tidyr::id_var with
which precedes the factor handling code:
I suggest changing the order of these clauses in id_var so that it checks for factors first.
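The following minimal base-R sketch makes the suggested reordering concrete. id_var_sketch is hypothetical. Its drop argument and the "n" attribute, which holds the number of distinct values, are assumptions modelled on plyr-style id helpers. It is not a copy of tidyr's id_var. Because the factor branch is tested before the empty-input branch, an empty factor still reports all of its levels.

```r
# Hypothetical sketch of id_var with the factor check moved first.
# Not tidyr's source: the name, the `drop` argument and the "n"
# attribute are assumptions modelled on plyr-style id helpers.
id_var_sketch <- function(x, drop = FALSE) {
  if (is.factor(x) && !drop) {
    # Factors first: an empty factor still contributes all its levels
    x <- addNA(x, ifany = TRUE)
    id <- as.integer(x)
    n <- length(levels(x))
  } else if (length(x) == 0) {
    # Empty input that is not a kept factor: nothing to number
    id <- integer()
    n <- 0L
  } else {
    # Other vectors: number the sorted unique values, NA last
    vals <- sort(unique(x), na.last = TRUE)
    id <- match(x, vals)
    n <- max(id)
  }
  structure(id, n = n)
}

empty_f <- factor(character(), levels = c("y", "z"))
attr(id_var_sketch(empty_f), "n")               # 2
attr(id_var_sketch(empty_f, drop = TRUE), "n")  # 0
```

With the empty-length check placed first, as described above, the empty factor would report n = 0. Its levels would then be lost before spread ever sees them.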
Gah, wrong issue.
Here's my understanding of the issue, in code:
library(dplyr)
library(tidyr)
df_c <- tibble(
x = c("a", "a", "b", "b"),
y = c("y", "z", "y", "z"),
z = 1:4
)
df_f <- df_c %>% mutate(x = factor(x, levels = c("b", "a")), y = factor(y))
# Correct: only differ in order
df_c %>% spread(y, z) %>% str()
df_f %>% spread(y, z) %>% str()
# Correct: only see y
df_c[1,] %>% spread(y, z) %>% str()
df_f[1,] %>% spread(y, z) %>% str()
# Correct: expands out both y and z
df_f[1,] %>% spread(y, z, drop = FALSE) %>% str()
# Correct: don't see any values
df_c[0,] %>% spread(y, z) %>% str()
# Incorrect: from the levels of the factor, should have columns a and b
df_f[0,] %>% spread(y, z, drop = FALSE) %>% str()
I'm pretty sure I correctly identified the underlying problem. Please let me know if I missed anything.
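For reference, here is a base-R sketch of the result the last spread() call is expected to produce. It assumes, as the issue title asks, that each level of the key factor becomes a column even when there are no rows. empty_spread_sketch is a hypothetical helper, not part of tidyr. df_f is rebuilt with data.frame() so that the block runs without dplyr.

```r
# Hypothetical helper (not part of tidyr): builds the zero-row wide frame
# that spread(data, key, value, drop = FALSE) is argued to return when
# `data` has no rows but the key column is a factor.
empty_spread_sketch <- function(data, key, value) {
  key_col <- data[[key]]
  stopifnot(is.factor(key_col))
  # Columns that identify rows (everything except key and value), zero rows
  id_cols <- as.list(data[0, setdiff(names(data), c(key, value)), drop = FALSE])
  # One empty column per key level, with the same type as the value column
  level_cols <- rep(list(data[[value]][0]), nlevels(key_col))
  names(level_cols) <- levels(key_col)
  structure(c(id_cols, level_cols), class = "data.frame", row.names = integer())
}

df_f <- data.frame(
  x = factor(c("a", "a", "b", "b"), levels = c("b", "a")),
  y = factor(c("y", "z", "y", "z")),
  z = 1:4
)
str(empty_spread_sketch(df_f[0, ], key = "y", value = "z"))
```

The new columns take the value column's type (integer here) rather than logical. Downstream code then sees the same column types whether or not any rows were present.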
Short description
The function id fails to consider the possibility that its input is a data frame containing (or consisting only of) factors with zero-length members. The same is true of id_var, which is called by id. This causes problems for the spread function, which should be able to handle these cases and generate named empty columns. It further causes problems for dplyr: column names that should have been generated from the factor levels are missing in later stages of the pipe. The name is instead looked up in the surrounding (e.g. global) environment, where it resolves to some unrelated object, and the script (or Shiny app) errors or otherwise fails.
Long description
In the following non-empty example
we get a data frame (or data-frame-like object) with columns "a" and "b".
Now assume I know it has "a" and "b", and in later steps I consume "a" and "b" in, e.g., mutate or ggvis.
What if there is no data?
This produces the following:
This then goes horribly wrong if I try to consume "a":
What happened here? It pulled 'a' from the environment; in fact it's using shiny::a, a function that produces HTML.
Ideally, drop=FALSE should address this by creating columns for factor levels that don't appear in the data. But right now that doesn't work: drop=FALSE does something different, namely filling in missing data...
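The same fallback can be reproduced in base R. In this sketch, with() stands in for dplyr's data masking; it is an analogy, not dplyr's actual implementation. A locally defined a() stands in for shiny::a. The zero-row data has no column named a, so the lookup falls through to the enclosing environment and finds the function.

```r
# Stand-in for shiny::a (assumption): any function named `a` in scope
a <- function(...) "<a>...</a>"

# Zero-row result where the expected column `a` was never created
wide_empty <- data.frame(x = factor(character(), levels = c("b", "a")))
found <- with(wide_empty, a)
is.function(found)     # TRUE: the outer function was picked up

# When the column does exist, it shadows the outer `a` as intended
wide_ok <- data.frame(x = factor(character()), a = integer())
with(wide_ok, a)       # integer(0)
```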
I can't even get this to work well enough to produce an example, but never mind.