New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New dodging algorithm for box plots #2196
Conversation
Can you include an example with continuous x? I think that's what I was thinking of for the for loop. |
It also seems like we need a little padding between the individual elements. Could we make that a parameter? |
Ohhh continuous x. Yeah you're right, this doesn't work well for that. library("ggplot2")
## Without varwidth = TRUE it's ok aside from the padding issue
ggplot(diamonds, aes(y = price)) +
geom_boxplot(aes(x = cut_width(carat, 0.5), fill = depth < 65)) ## With the changes from this PR and varwidth = TRUE
ggplot(diamonds, aes(y = price)) +
geom_boxplot(aes(x = cut_width(carat, 0.5), fill = depth < 65), varwidth = TRUE) Here's how the second plot looks with the current dev version of ggplot. ggplot(diamonds, aes(y = price)) +
geom_boxplot(aes(x = cut_width(carat, 0.5), fill = depth < 65), varwidth = TRUE)
#> Warning: position_dodge requires non-overlapping x intervals for loop it is, then! |
Oh, and there is a larger problem with this: each pair of boxes is getting scaled separately to the group width, which makes the varying widths from |
Note that the order of the bars in the second example of the OP is different from the rest (I've confirmed it with a checkout of your current position-dodge branch). |
Thanks for pointing that out, @mcol. Right now the boxes get placed in the order in which they appear in the data passed to |
Also, my example above doesn't really have a continuous x variable since set.seed(582)
dat <- data.frame(x = sample(4:5, size = 20, replace = TRUE),
y = rnorm(20),
class = sample(c("a", "b"), size = 20, replace = TRUE))
ggplot(dat, aes(x = x, y = y)) +
geom_boxplot(aes(group = interaction(x, class), fill = class), varwidth = TRUE) |
This is still wrong though
…be dodged from one another i.e. if there are 2 boxes per group all widths get divided by 2
Alright, here is the new box plot algorithm. Below are a bunch of examples of how it looks in practice with continuous and discrete x variables and various combinations of library("ggplot2")
## From the original example:
ggplot(data = iris, aes(Species, Sepal.Length)) +
geom_boxplot(aes(colour = Sepal.Width < 3.2)) ggplot(data = iris, aes(Species, Sepal.Length)) +
geom_boxplot(aes(colour = Sepal.Width < 3.2), varwidth = TRUE) ## Diamonds data -- padding between boxes is less noticeable
ggplot(diamonds, aes(y = price)) +
geom_boxplot(aes(x = cut_width(carat, 0.5), fill = depth < 65), varwidth = TRUE)
## A plot with truly continuous x
set.seed(582)
dat <- data.frame(x = sample(4:5, size = 20, replace = TRUE),
y = rnorm(20),
class = sample(c("a", "b"), size = 20, replace = TRUE))
ggplot(dat, aes(x = x, y = y)) +
geom_boxplot(aes(group = interaction(x, class), fill = class)) ## Truly continuous x and varwidth = TRUE
ggplot(dat, aes(x = x, y = y)) +
geom_boxplot(aes(group = interaction(x, class), fill = class), varwidth = TRUE) ## Preserve total width
ggplot(mtcars, aes(factor(cyl), y = mpg, fill = factor(vs))) +
geom_boxplot(position = position_boxdodge(preserve = "total")) ## Preserve individual width with varwidth = TRUE
ggplot(mtcars, aes(factor(cyl), y = mpg, fill = factor(vs))) +
geom_boxplot(position = position_boxdodge(preserve = "single"), varwidth = TRUE) ## preserve = "total" is incompatible with varwidth = TRUE
ggplot(mtcars, aes(factor(cyl), y = mpg, fill = factor(vs))) +
geom_boxplot(position = position_boxdodge(preserve = "total"), varwidth = TRUE)
#> Warning: Can't preserve total widths when varwidth = TRUE. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Results look great!
R/position-boxdodge.r
Outdated
@@ -0,0 +1,116 @@ | |||
#' Position dodge for box plots |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to give this a name that reflects that it works for any geom with variable widths - i.e. it would also work for geom_rect()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
position_vardodge()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe position_flexdodge()
, since it's a more flexible dodge that can handle variable width boxes and arbitrary rectangles?
R/position-boxdodge.r
Outdated
#' @export | ||
#' @examples | ||
#' ggplot(data = iris, aes(Species, Sepal.Length)) + | ||
#' geom_boxplot(aes(colour = Sepal.Width < 3.2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing indent
R/position-boxdodge.r
Outdated
} | ||
|
||
# xid represents groups of boxes that share the same position | ||
df$xid <- match(df$x, sort(unique(df$x))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh this is why it won't work with geom_rect()
. It's only a few lines of code, so I think it's worth using the for-loop that I originally proposed
R/position-boxdodge.r
Outdated
df$xmax <- df$x + (df$new_width / 2) | ||
|
||
# Find the total width of each group of boxes | ||
group_sizes <- plyr::ddply(df, "xid", plyr::summarize, size = sum(new_width)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be replaced by a tapply()
? I'd prefer to not use plyr in new code, since one day I'd like to eliminate the dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah sure. aggregate()
might be even better since it'll return a data frame.
R/position-boxdodge.r
Outdated
} | ||
|
||
# x values get moved to between xmin and xmax | ||
df$x <- rowMeans(df[, c("xmin", "xmax")]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think df$x <- (df$xmin + df$xmax) / 2
would be clearer
tests/testthat/test-geom-boxplot.R
Outdated
overlaps <- vector(length = nrow(d) - 1) | ||
for (i in 2:nrow(d)) { | ||
if (d$xmin[i] < d$xmax[i - 1]) { | ||
print(i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to remove this print()
tests/testthat/test-geom-boxplot.R
Outdated
|
||
overlaps <- vector(length = nrow(d) - 1) | ||
for (i in 2:nrow(d)) { | ||
if (d$xmin[i] < d$xmax[i - 1]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This for loop could become (with a small modification) a separate group_overlapping
function, and you could use above.
The default padding looks slightly too big to my eyes — maybe try 0.05. It would also be handy to show a couple of examples using with other geoms (i.e. |
I did some more research and it'll take a bit more work before |
Ok. Let's make it the default for boxplots and bars and then take a look at the others later. |
I've made some slight changes to hopefully make this as easy to use for both boxes and bars as possible. Something has gone wrong again with the ordering of the boxes/bars compared to the legend, I'll track that down and fix it. ggplot(data = iris, aes(Species, Sepal.Length)) +
geom_boxplot(aes(colour = Sepal.Width < 3.2), varwidth = TRUE) ggplot(mtcars, aes(factor(cyl), fill = factor(vs))) +
geom_bar(position = "dodge2") |
This is looking great! (Although now padding of 0.05 looks too small. Maybe switch back to 0.10?) |
…order to match the legend order
Soooo close. I am rethinking the defaults that I changed above. Right now this adds padding between the boxes because ggplot(mtcars, aes(factor(cyl), y = mpg, fill = factor(vs))) +
geom_boxplot() But if someone wants to change the ggplot(mtcars, aes(factor(cyl), y = mpg, fill = factor(vs))) +
geom_boxplot(position = position_dodge2(preserve = "total")) which results in no padding (they'd need to also add |
Yeah, I think that's a good idea. I rather like the padding between bars too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to merge once you've made the news and doc tweaks.
NEWS.md
Outdated
@@ -1,5 +1,8 @@ | |||
# ggplot2 2.2.1.9000 | |||
|
|||
* Box plot position is now controlled by `position_dodge2()` (@karawoo, | |||
#2143). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And bar
. And you should add a brief descrption.
R/position-dodge2.r
Outdated
@@ -0,0 +1,154 @@ | |||
#' Alternate method for dodging overlapping objects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's worth documenting together with position_dodge()
?
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/ |
Fixes #2143. This creates a new ggproto object for box plot positions,
PositionBoxdodge
, and a functionpos_boxdodge()
that can handle regular and variable-width boxes.collide()
gets split into several functions.collide_setup()
does the initial setup of the data and gets called by bothcollide()
(which behaves the same as it has been) orcollide_box()
.@hadley this is slightly different than the algorithm we talked about because it doesn't use a for loop to find the non-overlapping groups. The
x
value in the data already seems to capture that information, so I think we can avoid the loop, but maybe I'm missing something...With these changes, box plots look like this: