New dodging algorithm for box plots #2196

Merged
merged 47 commits into from Jul 28, 2017
Changes from 13 commits
Commits
512fb33
New position calculations for box plots
karawoo Jul 1, 2017
4be63d9
Update documentation
karawoo Jul 6, 2017
54c1a2a
Add test to ensure that variable width boxes don't overlap
karawoo Jul 6, 2017
1f4a984
Don't warn about overlapping intervals for box plots
karawoo Jul 6, 2017
698ced7
Scale boxes all at once, rather than by group
karawoo Jul 9, 2017
10942d1
Scale boxes across all the data based on the max number that need to …
karawoo Jul 9, 2017
8e058b9
Dodge boxes when there's a truly continuous x
karawoo Jul 12, 2017
6c5dd86
Make sure boxes are ordered consistently
karawoo Jul 12, 2017
133ed71
Don't overwrite the n argument passed to pos_boxdodge
karawoo Jul 12, 2017
1e08220
Ensure proper behavior when `preserve = "total"`.
karawoo Jul 13, 2017
502ad20
Modify test for overlapping boxes
karawoo Jul 14, 2017
4707fc3
Add padding between boxes that occupy the same position
karawoo Jul 14, 2017
e6e6f00
Add note to NEWS.md
karawoo Jul 14, 2017
e2b1ded
Merge branch 'master' into position-dodge
karawoo Jul 14, 2017
59c014e
Remove print statement in test :flushed:
karawoo Jul 14, 2017
327597f
Indent code in examples
karawoo Jul 14, 2017
d169dc9
Replace rowMeans with (df$xmin + df$xmax) / 2
karawoo Jul 14, 2017
64c3688
Replace plyr code
karawoo Jul 14, 2017
f916986
Find overlapping groups with a for loop
karawoo Jul 14, 2017
8213782
Change padding to 0.05
karawoo Jul 14, 2017
0efaed7
Modifications that make pos_boxdodge work with geom_rect
karawoo Jul 14, 2017
3da2f49
Make sure elements are placed at the correct x location
karawoo Jul 17, 2017
b689a4c
"boxes" -> "elements" since this is no longer only used for boxes
karawoo Jul 17, 2017
03e3d46
Add bar examples to position_boxdodge documentation
karawoo Jul 17, 2017
029a5d0
Ordering in collide_box needs to be the reverse of what it was to mat…
karawoo Jul 17, 2017
dd78a80
PositionBoxdodge should use find_x_overlaps to find n when x is missing
karawoo Jul 17, 2017
ae594df
Drop extra computed columns when they're no longer needed
karawoo Jul 17, 2017
99fe422
Fix bug that was subtly flipping boxes horizontally
karawoo Jul 17, 2017
30b189a
Add tests for position_boxdodge
karawoo Jul 17, 2017
9f7dcbc
Rename position_boxdodge to position_dodge2
karawoo Jul 20, 2017
d05a7df
Merge branch 'master' into position-dodge
karawoo Jul 20, 2017
dbdc9a9
Update geom-bar documentation to mention position_dodge2()
karawoo Jul 20, 2017
79aedfb
Don't dodge if current xmin is *equal* to previous xmax
karawoo Jul 25, 2017
7c9aec8
Set default padding to 0 for position_dodge2, but override for boxes
karawoo Jul 25, 2017
3706b49
Change default box plot padding back to 0.1
karawoo Jul 25, 2017
10e8616
Merge branch 'master' into position-dodge
karawoo Jul 25, 2017
b059e39
Update position_dodge2 documentation
karawoo Jul 25, 2017
4e2052f
collide_box() does need to reorder the differently than collide() in …
karawoo Jul 25, 2017
09ee427
Rename collide_box() to collide2() to match position_dodge2()
karawoo Jul 25, 2017
04666e7
Return to default padding of 0.1 for position_dodge2()
karawoo Jul 26, 2017
8398e7e
Document position_dodge2 together with position_dodge
karawoo Jul 28, 2017
b7553ac
Merge branch 'master' into position-dodge
karawoo Jul 28, 2017
34002fa
Add description of position_dodge2 to NEWS.md
karawoo Jul 28, 2017
b437848
Revert the order of `preserve` arguments for dodge2
karawoo Jul 28, 2017
a05c3e0
Update dodge examples
karawoo Jul 28, 2017
1cdcc09
Add stats:: before aggregate()
karawoo Jul 28, 2017
9cea3ef
Merge branch 'master' into position-dodge
karawoo Jul 28, 2017
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -157,8 +157,9 @@ Collate:
'plot-last.r'
'plot.r'
'position-.r'
'position-collide.r'
'position-dodge.r'
'position-boxdodge.r'
'position-collide.r'
'position-identity.r'
'position-jitter.r'
'position-jitterdodge.R'
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -159,6 +159,7 @@ export(GeomTile)
export(GeomViolin)
export(GeomVline)
export(Position)
export(PositionBoxdodge)
export(PositionDodge)
export(PositionFill)
export(PositionIdentity)
@@ -353,6 +354,7 @@ export(median_hilow)
export(merge_element)
export(panel_cols)
export(panel_rows)
export(position_boxdodge)
export(position_dodge)
export(position_fill)
export(position_identity)
3 changes: 3 additions & 0 deletions NEWS.md
@@ -1,5 +1,8 @@
# ggplot2 2.2.1.9000

* Box plot position is now controlled by `position_boxdodge()` (@karawoo,
#2143).
Member

And bar. And you should add a brief description.


* Default colour maps for continuous data are controlled by global options
`ggplot2.continuous.colour` and `ggplot2.continuous.fill`, which can be set to
either `"gradient"` or `"viridis"` (@karawoo).
11 changes: 10 additions & 1 deletion R/geom-boxplot.r
@@ -95,7 +95,7 @@
#' )
#' }
geom_boxplot <- function(mapping = NULL, data = NULL,
stat = "boxplot", position = "dodge",
stat = "boxplot", position = "boxdodge",
...,
outlier.colour = NULL,
outlier.color = NULL,
@@ -110,6 +110,15 @@ geom_boxplot <- function(mapping = NULL, data = NULL,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {

# varwidth = TRUE is not compatible with preserve = "total"
if (!is.character(position)) {
if (identical(position$preserve, "total") && isTRUE(varwidth)) {
warning("Can't preserve total widths when varwidth = TRUE.", call. = FALSE)
position$preserve <- "single"
}
}

layer(
data = data,
mapping = mapping,
116 changes: 116 additions & 0 deletions R/position-boxdodge.r
@@ -0,0 +1,116 @@
#' Position dodge for box plots
Member

I'd like to give this a name that reflects that it works for any geom with variable widths - i.e. it would also work for geom_rect().

Member Author

position_vardodge()?

Member Author

Or maybe position_flexdodge(), since it's a more flexible dodge that can handle variable width boxes and arbitrary rectangles?

#'
#' Dodging preserves the vertical position of a geom while adjusting the
#' horizontal position. `position_boxdodge` is a special case of
#' `position_dodge` for arranging box plots, which can have variable widths.
#'
#' @include position-dodge.r
#' @inheritParams position_dodge
#' @param padding Padding between boxes at the same position. Boxes are shrunk
#' by this proportion to make room for space between them.
#' @family position adjustments
#' @export
#' @examples
#' ggplot(data = iris, aes(Species, Sepal.Length)) +
#' geom_boxplot(aes(colour = Sepal.Width < 3.2))
Member

Missing indent

#'
#' ggplot(data = iris, aes(Species, Sepal.Length)) +
#' geom_boxplot(aes(colour = Sepal.Width < 3.2), varwidth = TRUE)
position_boxdodge <- function(width = NULL, preserve = c("single", "total"),
padding = 0.1) {
ggproto(NULL, PositionBoxdodge,
width = width,
preserve = match.arg(preserve),
padding = padding
)
}

#' @rdname ggplot2-ggproto
#' @format NULL
#' @usage NULL
#' @export
PositionBoxdodge <- ggproto("PositionBoxdodge", PositionDodge,
preserve = "single",
padding = 0.1,
setup_params = function(self, data) {
if (is.null(data$xmin) && is.null(data$xmax) && is.null(self$width)) {
warning("Width not defined. Set with `position_boxdodge(width = ?)`",
call. = FALSE)
}

if (identical(self$preserve, "total")) {
n <- NULL
} else {
n <- max(table(data$x))
}

list(
width = self$width,
n = n,
padding = self$padding
)
},

compute_panel = function(data, params, scales) {
collide_box(
data,
params$width,
name = "position_boxdodge",
strategy = pos_boxdodge,
n = params$n,
padding = params$padding,
check.width = FALSE
)
}
)
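
For readers following `setup_params`, here is a rough sketch of what the two `preserve` modes mean for widths. It is a standalone illustration on toy data, not code from this PR: the vector `x` and the single shared `width` are assumptions. With `"single"`, every element is divided by the largest number of elements at any one position; with `"total"`, each element is divided by the count at its own position.

```r
# Toy positions: two elements at x = 1, one at x = 2, three at x = 3
x <- c(1, 1, 2, 3, 3, 3)
width <- 0.9

# preserve = "single": one divisor for the whole panel
n_single <- max(table(x))
width_single <- rep(width / n_single, length(x))

# preserve = "total": divisor is the count at each element's own position
xid <- match(x, sort(unique(x)))
n_total <- table(xid)
width_total <- width / as.numeric(n_total[xid])
```

Under `"single"` all six elements end up 0.3 wide, while under `"total"` the widths at each position always sum to 0.9.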

pos_boxdodge <- function(df, width, n = NULL, padding = 0.1) {

if (!all(c("xmin", "xmax") %in% names(df))) {
df$xmin <- df$x
df$xmax <- df$x
}

# xid represents groups of boxes that share the same position
df$xid <- match(df$x, sort(unique(df$x)))
Member

Oh this is why it won't work with geom_rect(). It's only a few lines of code, so I think it's worth using the for-loop that I originally proposed
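
A for-loop of the kind mentioned here might look roughly like the sketch below. This is only an illustration, not the code that landed in the PR: `find_overlap_ids()` is a hypothetical name, the rows are assumed to be sorted by `xmin`, and an interval whose `xmin` merely equals the running `xmax` is treated as starting a new group.

```r
# Hypothetical helper (not from the PR): give each run of overlapping
# [xmin, xmax] intervals a shared id. Assumes df is sorted by xmin.
find_overlap_ids <- function(df) {
  ids <- integer(nrow(df))
  if (nrow(df) == 0) return(ids)
  ids[1] <- 1L
  current_max <- df$xmax[1]
  for (i in seq_len(nrow(df))[-1]) {
    if (df$xmin[i] >= current_max) {
      # Starts at or after the end of the current group: new group
      ids[i] <- ids[i - 1] + 1L
    } else {
      # Overlaps the current group: same id
      ids[i] <- ids[i - 1]
    }
    current_max <- max(current_max, df$xmax[i])
  }
  ids
}

# Arbitrary rectangles with no shared x value
rects <- data.frame(
  xmin = c(0, 0.5, 2, 3, 3),
  xmax = c(1, 1.5, 3, 4, 4)
)
overlap_ids <- find_overlap_ids(rects)
```

Unlike `match(df$x, sort(unique(df$x)))`, this groups on interval overlap rather than on identical `x` values, which is what arbitrary rectangles would need.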


if (is.null(n)) {
# If n is null, preserve total widths of boxes at each position by dividing
# widths by the number of elements at that position
n <- table(df$xid)
df$new_width <- (df$xmax - df$xmin) / n[df$xid]
} else {
df$new_width <- (df$xmax - df$xmin) / n
}

df$xmin <- df$x - (df$new_width / 2)
df$xmax <- df$x + (df$new_width / 2)

# Find the total width of each group of boxes
group_sizes <- plyr::ddply(df, "xid", plyr::summarize, size = sum(new_width))
Member

Could this be replaced by a tapply()? I'd prefer to not use plyr in new code, since one day I'd like to eliminate the dependency.

Member Author

Oh yeah sure. aggregate() might be even better since it'll return a data frame.
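
To make the two base R options concrete, here is a small sketch on toy data. The data frame below is an assumption standing in for `df` inside `pos_boxdodge`; only the `xid` and `new_width` column names come from the diff.

```r
# Toy stand-in for the df used in pos_boxdodge
df <- data.frame(
  xid = c(1, 1, 2, 3, 3, 3),
  new_width = c(0.45, 0.45, 0.9, 0.3, 0.3, 0.3)
)

# tapply() returns a vector of per-group sums, named by xid
sizes_vec <- tapply(df$new_width, df$xid, sum)

# aggregate() returns a data frame with one row per xid,
# which matches the shape plyr::ddply() produced
group_sizes <- stats::aggregate(
  list(size = df$new_width),
  by = list(xid = df$xid),
  FUN = sum
)
```

Because `group_sizes` keeps the `xid` and `size` columns, the later `group_sizes$xid` and `group_sizes$size` lookups would not need to change.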


# Starting xmin for each group of boxes
starts <- group_sizes$xid - (group_sizes$size / 2)

# Set the boxes in place
for (i in seq_along(starts)) {
divisions <- cumsum(c(starts[i], df[df$xid == i, "new_width"]))
df[df$xid == i, "xmin"] <- divisions[-length(divisions)]
df[df$xid == i, "xmax"] <- divisions[-1]
}

# x values get moved to between xmin and xmax
df$x <- rowMeans(df[, c("xmin", "xmax")])
Member

I think df$x <- (df$xmin + df$xmax) / 2 would be clearer
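
As a quick check that the two spellings agree, here is a sketch on made-up `xmin`/`xmax` values (the numbers are assumptions for illustration only):

```r
df <- data.frame(xmin = c(0.55, 1.0, 1.8), xmax = c(1.0, 1.45, 2.2))

# rowMeans() works on the two-column subset and returns a named vector
via_rowmeans <- rowMeans(df[, c("xmin", "xmax")])

# Plain arithmetic gives the same midpoints without names
via_arithmetic <- (df$xmin + df$xmax) / 2
```

The values match; the arithmetic form just states the midpoint directly and avoids the row names that `rowMeans()` attaches.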


# If no boxes occupy the same position, there is no need to add padding
if (!any(duplicated(df$xid))) {
return(df)
}

# Shrink boxes to add space between them
df$pad_width <- df$new_width * (1 - padding)
df$xmin <- df$x - (df$pad_width / 2)
df$xmax <- df$x + (df$pad_width / 2)

df
}
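
The padding step at the end of `pos_boxdodge` can be sketched in isolation as follows. `shrink_around_centre()` is a hypothetical helper written for this illustration, not a ggplot2 function; it shrinks each element by the `padding` proportion around its own midpoint and keeps `xmin` below `xmax`.

```r
# Hypothetical helper illustrating the shrink step; not part of ggplot2
shrink_around_centre <- function(xmin, xmax, padding = 0.1) {
  centre <- (xmin + xmax) / 2
  padded_width <- (xmax - xmin) * (1 - padding)
  data.frame(
    xmin = centre - padded_width / 2,
    xmax = centre + padded_width / 2
  )
}

# Two adjacent boxes sharing position x = 1, each 0.45 wide
shrunk <- shrink_around_centre(
  xmin = c(0.55, 1.0),
  xmax = c(1.0, 1.45),
  padding = 0.1
)
```

Each box keeps its centre, so the gap that opens between neighbours is `padding` times the box width.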
33 changes: 30 additions & 3 deletions R/position-collide.r
@@ -1,6 +1,7 @@
# Detect and prevent collisions.
# Powers dodging, stacking and filling.
collide <- function(data, width = NULL, name, strategy, ..., check.width = TRUE, reverse = FALSE) {
collide_setup <- function(data, width = NULL, name, strategy,
check.width = TRUE, reverse = FALSE) {
# Determine width
if (!is.null(width)) {
# Width set manually
@@ -26,6 +27,15 @@ collide <- function(data, width = NULL, name, strategy, ..., check.width = TRUE,
width <- widths[1]
}

list(data = data, width = width)
}

collide <- function(data, width = NULL, name, strategy,
..., check.width = TRUE, reverse = FALSE) {
dlist <- collide_setup(data, width, name, strategy, check.width, reverse)
data <- dlist$data
width <- dlist$width

# Reorder by x position, then on group. The default stacking order reverses
# the group in order to match the legend order.
if (reverse) {
@@ -34,7 +44,6 @@ collide <- function(data, width = NULL, name, strategy, ..., check.width = TRUE,
data <- data[order(data$xmin, -data$group), ]
}


# Check for overlap
intervals <- as.numeric(t(unique(data[c("xmin", "xmax")])))
intervals <- intervals[!is.na(intervals)]
@@ -44,7 +53,7 @@ collide <- function(data, width = NULL, name, strategy, ..., check.width = TRUE,
# This is where the algorithm from [L. Wilkinson. Dot plots.
# The American Statistician, 1999.] should be used
}

if (!is.null(data$ymax)) {
plyr::ddply(data, "xmin", strategy, ..., width = width)
} else if (!is.null(data$y)) {
@@ -56,3 +65,21 @@ collide <- function(data, width = NULL, name, strategy, ..., check.width = TRUE,
stop("Neither y nor ymax defined")
}
}

collide_box <- function(data, width = NULL, name, strategy,
..., check.width = TRUE, reverse = FALSE) {
dlist <- collide_setup(data, width, name, strategy, check.width, reverse)
data <- dlist$data
width <- dlist$width

# Reorder by x position, then on group. The default stacking order reverses
# the group in order to match the legend order.
if (reverse) {
data <- data[order(data$x, data$group), ]
} else {
data <- data[order(data$x, -data$group), ]
}

pos <- match.fun(strategy)
pos(data, width, ...)
}
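
The reordering in `collide_box` is easy to misread, so here is a small sketch of what the two `order()` calls do. The toy `data` frame is an assumption for illustration; only the `x` and `group` column names come from the diff.

```r
# Toy data: two groups at each of two x positions, in shuffled order
data <- data.frame(x = c(2, 1, 1, 2), group = c(1, 1, 2, 2))

# Default (reverse = FALSE): sort by x, then by descending group
default_order <- data[order(data$x, -data$group), ]

# reverse = TRUE: sort by x, then by ascending group
reversed_order <- data[order(data$x, data$group), ]
```

Since `pos_boxdodge` lays elements out left to right in row order, this sort decides which group ends up on the left within each position.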
2 changes: 1 addition & 1 deletion R/position-dodge.r
@@ -23,7 +23,7 @@
#' \donttest{
#' ggplot(diamonds, aes(price, fill = cut)) +
#' geom_histogram(position="dodge")
#' # see ?geom_boxplot and ?geom_bar for more examples
#' # see ?geom_bar for more examples
#'
#' # In this case a frequency polygon is probably a better choice
#' ggplot(diamonds, aes(price, colour = cut)) +
2 changes: 1 addition & 1 deletion R/stat-boxplot.r
@@ -14,7 +14,7 @@
#' }
#' @export
stat_boxplot <- function(mapping = NULL, data = NULL,
geom = "boxplot", position = "dodge",
geom = "boxplot", position = "boxdodge",
...,
coef = 1.5,
na.rm = FALSE,
6 changes: 3 additions & 3 deletions man/geom_boxplot.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

21 changes: 11 additions & 10 deletions man/ggplot2-ggproto.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

40 changes: 40 additions & 0 deletions man/position_boxdodge.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 3 additions & 2 deletions man/position_dodge.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion man/position_identity.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion man/position_jitter.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.