Order of colors in case variable is 'character' #2429

Closed
manuelreif opened this issue Jan 30, 2018 · 11 comments · Fixed by #3579
Labels
bug an unexpected problem or unintended behavior good first issue ❤️ good issue for first-time contributors scales 🐍

Comments

@manuelreif

manuelreif commented Jan 30, 2018

Hi!

I use ggplot2 quite often, and I ran into a problem with the ordering of colors. I set the breaks and the color values in scale_fill_manual(), and I would expect the first color, "black", to be assigned to the first break ('small'), and so on. Instead, the colors are assigned in a different order. This seems to happen when the grouping variable is of type character. I suppose the variable is converted into a factor internally, or something like that.
Is this the intended behaviour? For me it is really confusing.

library(ggplot2)

# 'a' and 'b' are deliberately character columns, not factors
test <- data.frame(a = as.character(gl(n = 4, length = 8, k = 2, labels = letters[1:4])),
                   b = as.character(gl(n = 2, length = 8, k = 1, labels = c("small", "big"))),
                   values = sample(10:300, 8))

pl1 <- ggplot(test, aes(x = a, y = values, fill = b))
pl1 + geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual("test", breaks = c("small", "big"), values = c("black", "tomato2"))

Thank you!!
Manuel

@MajoroMask

ggplot2 uses factors to solve this. Try test$b <- factor(test$b, levels = unique(test$b)) before plotting. ggplot2 actually did this for you with the default levels argument, which sorts the atomic vector alphabetically.
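
The ordering difference described here can be seen in a minimal base-R sketch. It does not call ggplot2 and only assumes, as stated above, that a character variable ends up in alphabetical order; the vector b mirrors the b column from the original example.

```r
# Values of the fill variable in the order they appear in the data.
b <- rep(c("small", "big"), times = 4)

# Default level order for a character vector: alphabetical.
alphabetical <- levels(factor(b))

# Explicit levels keep the order of first appearance instead.
by_appearance <- levels(factor(b, levels = unique(b)))

print(alphabetical)   # "big" "small"
print(by_appearance)  # "small" "big"
```

With alphabetical order, an unnamed values vector such as c("black", "tomato2") would pair "black" with "big", which is the surprise reported in the issue.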

@manuelreif
Author

I have been using ggplot2 for years now, and I never had this problem, apparently because the variables have always been factors. So, to be honest, I am not very happy with this behaviour, because ggplot is doing something for me that I am not aware of. I think a warning would have saved me a lot of time.

Wouldn't it be more sensible to set the levels to the breaks I passed to scale_fill_manual()? I don't know whether this is possible or not.

Thanks.

@hadley
Member

hadley commented Apr 27, 2018

Minimal reprex:

library(ggplot2)

df <- data.frame(x = c("red", "black"))

ggplot(df, aes(x, 1, fill = x)) + 
  geom_col() +
  scale_fill_manual(breaks = c("red", "black"), values = c("red", "black"))

@hadley hadley added bug an unexpected problem or unintended behavior scales 🐍 labels Apr 27, 2018
@clauswilke
Member

Not sure this is a bug, since it behaves exactly as documented:

values—a set of aesthetic values to map data values to. If this is a named vector, then the values will be matched based on the names. If unnamed, values will be matched in order (usually alphabetical) with the limits of the scale. Any data values that don't match will be given na.value.

Importantly, the documentation also explains how to make this work, by using a named vector:

library(ggplot2)
test <- data.frame(a=as.character(gl(n = 4, length = 8, k = 2, labels = letters[1:4])), 
                   b=as.character(gl(n = 2, length = 8, k = 1, labels = c("small",  "big"))),
                   values = sample(10:300, 8))


ggplot(test, aes(x = a, y = values, fill = b)) +
  geom_bar(stat = "identity", position = "dodge") + 
  scale_fill_manual(
    "test",
    breaks = c("small", "big"),
    values = c("small" = "black", "big" = "tomato2")
  )
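
The matching rule quoted from the documentation can be illustrated with a small base-R sketch. The helper match_manual_values() is hypothetical and is not ggplot2's implementation; it only assumes that the limits of a character variable are its sorted unique values.

```r
# Hypothetical helper, not part of ggplot2: look up a manual value for each
# data value, following the rule quoted from the documentation.
match_manual_values <- function(x, values,
                                limits = sort(unique(x)),
                                na_value = NA_character_) {
  if (!is.null(names(values))) {
    # Named vector: match data values against the names.
    out <- values[match(x, names(values))]
  } else {
    # Unnamed vector: match by position against the limits.
    out <- values[match(x, limits)]
  }
  out <- unname(out)
  out[is.na(out)] <- na_value  # unmatched data values get na_value
  out
}

x <- c("small", "big", "small")

unnamed <- match_manual_values(x, c("black", "tomato2"))
named   <- match_manual_values(x, c(small = "black", big = "tomato2"))

print(unnamed)  # "tomato2" "black" "tomato2"
print(named)    # "black" "tomato2" "black"
```

In the unnamed case "black" goes to "big" because "big" sorts first; the named vector avoids depending on that order.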

However, there is a small modification one could make to make the scale behavior more intuitive: Check if values is a named vector, and if not and breaks are provided, use the breaks as names for values, in order.
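
A rough base-R sketch of that modification follows. The function name_values_by_breaks() is hypothetical, and the requirement that breaks and values have the same length is an assumption made here for simplicity, not something settled in this thread.

```r
# Hypothetical helper illustrating the proposal; not ggplot2 code.
name_values_by_breaks <- function(values, breaks = NULL) {
  already_named <- !is.null(names(values))
  # Assumption: only name the values when breaks are given and lengths match.
  if (already_named || is.null(breaks) || length(breaks) != length(values)) {
    return(values)
  }
  stats::setNames(values, breaks)
}

auto_named <- name_values_by_breaks(c("black", "tomato2"),
                                    breaks = c("small", "big"))
print(auto_named)
#     small       big
#   "black" "tomato2"

# Values that already carry names are left untouched.
kept <- name_values_by_breaks(c(big = "tomato2", small = "black"),
                              breaks = c("small", "big"))
```

Under this sketch, breaks = c("small", "big") with values = c("black", "tomato2") behaves like the named vector c("small" = "black", "big" = "tomato2").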

@clauswilke clauswilke added the good first issue ❤️ good issue for first-time contributors label May 12, 2018
@mikmart
Contributor

mikmart commented May 27, 2018

Following @clauswilke's suggestion, I think this could be implemented by modifying DiscreteScale$map() to match on union(breaks, limits) (rather than just limits) when values is not named; i.e. match first on any specified breaks and then the rest on any remaining limits. (As breaks doesn't necessarily span the domain of the scale, it can't be used alone for the mapping.) This would still result in matching only on limits when breaks is not specified, which is current behaviour.

However, with this change breaks could no longer be used to specify the order in the legend without also modifying the mapping to the aesthetic range. Take for example:

library(ggplot2)

(p <- ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
    geom_point(aes(color = Species)))

Currently you can use breaks to modify the displayed legend without altering the colour mapping:

p + scale_colour_hue(breaks = c("virginica", "versicolor"))

After this change, the mapping would be altered, too (plot below with a branch with the change):

p + scale_colour_hue(breaks = c("virginica", "versicolor"))

For manual scales, this could still be overcome by specifying named values, but there would be no workaround for non-manual scales. As a result, it seems to me that implementing this change would end up taking away an existing feature.
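
The effect of matching on union(breaks, limits) rather than on limits alone can be sketched in base R. The colour names below are placeholders, and the snippet only mimics the pairing of values with scale entries; it is not the code from the branch mentioned above.

```r
# Limits in the factor level order of iris$Species.
limits <- c("setosa", "versicolor", "virginica")
# Breaks as used to reorder or subset the legend.
breaks <- c("virginica", "versicolor")
# Placeholder aesthetic values, matched by position.
values <- c("red", "green", "blue")

# Current behaviour: position-wise match against the limits only.
current <- stats::setNames(values, limits)

# Proposed behaviour: breaks first, then any remaining limits.
proposed <- stats::setNames(values, union(breaks, limits))

print(current)   # setosa = "red",    versicolor = "green", virginica = "blue"
print(proposed)  # virginica = "red", versicolor = "green", setosa = "blue"
```

The same breaks argument that today only changes the legend would, under the proposal, also swap the colours of setosa and virginica.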

PS. I think a good analogue would be that using breaks for the aesthetic mapping in non-positional discrete scales would be akin to using breaks to determine the order of the values on a discrete position scale.

@clauswilke
Member

I was proposing to implement this only for manual scales.

@mikmart
Contributor

mikmart commented May 27, 2018

Ah, right --- that makes sense!

I hadn't realized that scales stayed aware of being manual; but looking it over again, I see they could be picked out with self$scale_name == "manual" and then handled differently in DiscreteScale$map(). This way breaks could be used to specify only a subset of the mapping, and the rest could be handled by limits.

Alternatively (and probably more cleanly) if the manual_scale constructor was aware of the breaks, it could assign them as names before creating the palette function, such that breaks = c("small", "big"), values = c("black", "tomato2") would be equivalent to values = c("small" = "black", "big" = "tomato2"). But should manual scales then require that lengths of breaks and values match, like with labels? If implemented with #1497 (I have a draft of this), maybe not: the breaks could specify only the first names for values.

Still, wouldn't it be a bit confusing if these calls gave different results?

species_order <- c("virginica", "versicolor", "setosa")

p + scale_colour_manual(
  breaks = species_order,
  values = scales::hue_pal()(3)
)

p + scale_colour_hue(breaks = species_order)

@clauswilke
Member

This is object-oriented programming, so what you'd want to do is make a ggproto object that derives from ScaleDiscrete() and then reimplements map(), just like ScaleDiscretePosition() is implemented:

ScaleDiscretePosition <- ggproto("ScaleDiscretePosition", ScaleDiscrete,

I'll let @hadley chime in on whether the feature would be welcome or not. In my own use of scale_manual(), I have certainly struggled with getting the values assigned to the correct breaks and getting the breaks into the right order in the legend. The named-vector approach would have helped, but I didn't know it existed, and I don't think it is widely known even though it's in the docs.

@mikmart
Contributor

mikmart commented May 28, 2018

Okay, I see! A new class for manual scales would also mean that the map() method in ScaleDiscrete could be cleaned up a bit, as it would no longer have to deal with named palettes, right?

Regarding the feature itself, I was just recently reading the ggplot2 book, and this quote from the scales chapter came to mind:

To distinguish breaks from limits, remember that breaks affect what appears
on the axes and legends, while limits affect what appears on the plot.

Using breaks for the aesthetic mapping for manual scales would be an exception to this logic.

Perhaps this would be better addressed with changes in the documentation? The help text for values is already quite long, but perhaps it would help to add emphasis by mentioning sooner that it is the limits that will be used for creating the mapping, if names are not specified?

Something along the lines of:

values—a set of aesthetic values to map data values to. The values will be matched in order (usually alphabetical) with the limits of the scale. If this is a named vector, then the values will be matched based on the names instead. Data values that don't match will be given na.value.

@lock

lock bot commented Jun 24, 2020

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jun 24, 2020