NAs should be excluded from the level of discrete X-axis by default #1584
Comments
They absolutely should be included by default. The bug is that they aren't in the character case, and that |
Here's a minimal reprex of the 2x2 combinations:
library(ggplot2)
df <- data.frame(
  x = c("a", "b", NA),
  y = 1:3,
  stringsAsFactors = FALSE
)
# OK
ggplot(df, aes(x, y)) + geom_point(na.rm = TRUE)
ggplot(df, aes(factor(x), y)) + geom_point()
# Incorrect: NA could be plotted, but is dropped
ggplot(df, aes(x, y)) + geom_point()
# Incorrect: NA should be removed, but is still shown
ggplot(df, aes(factor(x), y)) + geom_point(na.rm = TRUE) |
As per the docs The reason for the discrepancy is that NA is just another level in a factor: by the time the data arrives at geom_point it is already encoded with the integer mapping, whereas characters are still characters, so the NA persists. @hadley Can you comment on the true purpose of |
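To make the encoding point above concrete, here is a small base-R sketch. It assumes the NA has been made an explicit level (via addNA()), and it only illustrates how the two types store the missing value; it is not a reproduction of what ggplot2 or scales do internally.

```r
x <- c("a", "b", NA)

# Factor with NA as an explicit level: the missing value gets an
# ordinary integer code, so nothing downstream sees an NA.
f <- addNA(factor(x))
f_levels <- levels(f)      # "a" "b" NA
f_codes  <- as.integer(f)  # 1 2 3
f_has_na <- anyNA(f_codes) # FALSE

# Character vector: there is no level mapping, so the NA is still
# an NA when the values are inspected.
chr_is_na <- is.na(x)      # FALSE FALSE TRUE
```

With the integer codes, a check such as anyNA() has nothing to find, which is consistent with na.rm = TRUE having no effect in the factor case of the reprex.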
The goal should always be to display missing values on the plot, where possible. That's not possible for continuous scales, so the best we can do is display a warning. It is possible for categorical scales, so |
In that case we need to change the doc wording a bit...
|
Something like: If |
Is it clear what's needed? |
Yeah - I just need to find the best way to fix this |
Hmm, this problem really sits at the intersection of ranges, scales, and geoms in terms of where the responsibility should be placed. Let's talk about it in person to find out how it can best be solved... |
I think we should fix this for character vectors by fixing We then need to make sure there's some way to actually drop these NA values, because the |
I'll handle this one — I think it also needs some minor changes to non-positional discrete scales for consistency. |
@yutannihilation I know I didn't implement what you requested, but hopefully now the principles are obvious, and they're consistently implemented for the three types of discrete missing value and both types of scale. |
@hadley Thanks for your great fix! (and sorry for being quiet as I couldn't follow this complicated discussion...) I love this consistency |
When the variable for x is character, NAs are not treated as a break of X axis. But when the variable is factor, NAs are included.
I feel this behaviour is inconsistent. NAs should be included only when NA is intentionally included in the level of the factor variable.
I guess this is simply because scales:::clevels() treats them differently. But, I'm wondering if this is by design... Why do we have to force NAs included, while we can include NA in the factor level by ourselves?
Is it possible to eliminate NA from factor X-axis by default? (But I'm afraid this suggestion is too late and many people may already rely on this behaviour)
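For reference, a short base-R sketch of what "including NA in the factor level by ourselves" looks like. It only uses factor() and sort()/unique(), and does not claim to mirror the internals of scales:::clevels(); the variable names are made up for illustration.

```r
x <- c("a", "b", NA)

# Default: factor() excludes NA from the levels.
f_default <- factor(x)
default_levels <- levels(f_default)      # "a" "b"
default_codes  <- as.integer(f_default)  # 1 2 NA

# Opt in: exclude = NULL keeps NA as a level of its own.
f_explicit <- factor(x, exclude = NULL)
explicit_levels <- levels(f_explicit)    # "a" "b" NA

# For a character vector, sorting the unique values drops NA,
# because sort() removes missing values by default.
chr_values <- sort(unique(x))            # "a" "b"
```

So in base R the NA level is opt-in for factors, which is the behaviour the question above asks the discrete X axis to follow.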