Winston Chang edited this page Jun 12, 2012 · 13 revisions

I've added an implementation of Wilkinson-style dot plots with geom_dotplot.

The parameters are:

  • binwidth: width of the bins.
  • method: dotdensity (default), which uses the dot-density algorithm from Wilkinson (1999), or histodot, which uses fixed-width bins, just like a regular histogram.
  • stackratio: vertical spacing between dots, relative to the dot diameter (default 1).
  • dotsize: diameter of each dot, relative to the maximum bin width (for dotdensity) or the bin width (for histodot).
  • binaxis: which axis to bin along, "x" (default) or "y".
  • stackdir: The way to stack the dots. "up", "down", "center", or "centerwhole". See below for examples.
  • It also supports alpha, colour, and fill.
  • binpositions: used for dot-density binning. bygroup (default) tells it to find bin positions for each group separately; all tells it to find bin positions across all groups, which aligns dot stacks across groups.
  • stackgroups: should dots be stacked across groups? This has the effect that ‘position = "stack"’ should have, but can't (because this geom has some odd properties).
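
To make the dotdensity method a little more concrete, here is a minimal base-R sketch of the idea: walk through the sorted values, start a new bin at the first value the current bin does not cover, and center each stack over the observations it holds. The helper name dotdensity_bins is hypothetical, and this is an illustration under those assumptions rather than the code used inside geom_dotplot.

```r
# Hypothetical helper, for illustration only (not part of ggplot2).
# Returns one row per dot stack: where it is centered and how many dots it has.
dotdensity_bins <- function(x, binwidth) {
  x <- sort(x)
  bin <- integer(length(x))
  current <- 0L
  bin_end <- -Inf
  for (i in seq_along(x)) {
    # Start a new bin at the first value that falls outside the current one
    if (x[i] >= bin_end) {
      current <- current + 1L
      bin_end <- x[i] + binwidth
    }
    bin[i] <- current
  }
  data.frame(
    # Assumption: each stack sits at the midpoint of its observations
    center = as.numeric(tapply(x, bin, function(v) (min(v) + max(v)) / 2)),
    count  = as.integer(tapply(x, bin, length))
  )
}

bins <- dotdensity_bins(c(0, 1, 3, 5, 8, 15), binwidth = 4)
print(bins)
```

With histodot, by contrast, the bin edges are fixed in advance, so stacks sit at bin centers regardless of where the observations fall within each bin.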

There are some weird things about it at this point:

  • If stacking along the y axis (binning along x), the y axis label is "count" and the y axis has a total range of 1, but these are meaningless. You can hide them with scale_y_continuous(name = "", breaks = NULL).

This happens because the dots are stacked visually, which may or may not align with the y tick marks. Unfortunately, it's not possible with ggplot2 to align the circles to a y scale. (To see how this works, try resizing the window vertically: the dots stay visually stacked but the y scaling changes.)
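
As a sketch of hiding the meaningless y axis, the snippet below builds a small dot plot and removes the y title, ticks, and labels. It assumes a current ggplot2, where breaks = NULL is the way to suppress breaks; the data frame is made up for the example.

```r
library(ggplot2)

# Made-up data, only for this example
set.seed(1)
dat <- data.frame(x = rnorm(20))

# The dots are stacked visually, so the y title, ticks, and labels carry no
# information and can be dropped
p <- ggplot(dat, aes(x)) +
  geom_dotplot(binwidth = .4) +
  scale_y_continuous(name = NULL, breaks = NULL)
p
```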

Other notes:

  • Coord transforms (other than coord_flip) don't work. At this point I don't know if they even make sense conceptually for these objects.
  • With dot-density binning, it is possible for dot stacks to overlap with each other, by up to 50% of the dot width. This is a consequence of the binning algorithm. Wilkinson also mentions a smoothing algorithm, but I haven't implemented this yet.
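
The 50% figure can be checked with a little arithmetic. A near-worst case arises when one bin holds observations spanning almost the full bin width and the next bin holds a single observation just past the end of the first. The sketch below assumes that each stack is centered on the midpoint of its observations and that the dot diameter equals binwidth (dotsize = 1); the numbers are made up.

```r
binwidth <- 4   # also the dot diameter when dotsize = 1

# The first bin starts at 0 and covers values below 0 + binwidth
bin1 <- c(0, 3.5)
# The next observation falls outside the first bin, so it starts a second bin
bin2 <- c(4)

# Assumption: each stack is centered on the midpoint of its observations
center1 <- (min(bin1) + max(bin1)) / 2
center2 <- (min(bin2) + max(bin2)) / 2
gap     <- center2 - center1

# Fraction of a dot diameter by which the neighbouring stacks overlap
overlap <- (binwidth - gap) / binwidth
overlap
```

As the largest value in the first bin approaches binwidth, the gap between centers approaches binwidth / 2 and the overlap approaches, but never reaches, 50%.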


# Load ggplot2 and generate data
library(ggplot2)
dat <- data.frame(x=rnorm(20), y=rnorm(20))

Bin along x axis

# Stack vertically
dp1 <- ggplot(dat, aes(x)) + geom_rug() + scale_x_continuous(breaks=seq(-4,4,.4))
dp1 + geom_dotplot(binwidth=.4)
# Notice each dot stack is centered over a set of observations. The binning is done with
# Wilkinson's (1999) dot density algorithm. 'binwidth' sets the maximum bin width.
# The y range is set from 0 to 1, but the y axis scale actually has nothing
# to do with the y positioning of the dots. The dot diameter is the same as the maximum
# bin width and they're stacked visually; if you resize the window to make it taller
# or shorter, they stay visually stacked. You could resize the window so that the dots
# align with the tick marks.

# Use histodot binning
# This uses the same fixed-width-interval algorithm as stat_bin. However, I
# couldn't directly use stat_bin because I needed to generalize the binning to work
# along x and y.
dp1 + geom_dotplot(binwidth=.4, method="histodot")

# Squish together vertically with smaller stackratio
dp1 + geom_dotplot(binwidth=.4, stackratio=.8)

# Dot diameter expanded to 1.4 * max binwidth. Stacking stays so that they're just touching
dp1 + geom_dotplot(binwidth=.4, dotsize=1.4)

Stacking methods

# stack up (default)
dp1 + geom_dotplot(binwidth=.4, stackdir="up")

# stack down
dp1 + geom_dotplot(binwidth=.4, stackdir="down")

# stack center
dp1 + geom_dotplot(binwidth=.4, stackdir="center")

# stack centerwhole - add one dot up, then one down, then one up, etc.
dp1 + geom_dotplot(binwidth=.4, stackdir="centerwhole")

Bin along y axis

To bin along the y axis, you need to set binaxis="y".

# Y direction
dp1y <- ggplot(dat, aes(x=0, y=y)) + geom_rug() + scale_y_continuous(breaks=seq(-4,4,.4))

dp1y +  geom_dotplot(binwidth=.4, binaxis="y", stackdir="center")

# Y direction, stack centerwhole
dp1y +  geom_dotplot(binwidth=.4, binaxis="y", stackdir="centerwhole")

Grouped data

# New data with x and g as factors (stringsAsFactors=TRUE is needed in R >= 4.0)
dat2 <- data.frame(x=LETTERS[1:3], y=rnorm(90), g=LETTERS[1:2], stringsAsFactors=TRUE)
# Plot with groups on x axis
dp2 <- ggplot(dat2, aes(x=x, y=y)) + scale_y_continuous(breaks=seq(-4,4,.4))

dp2 + geom_dotplot(binwidth=.25, binaxis="y", stackdir="centerwhole")

# Groups on x axis with violins (also smaller bin size)
dp2 + geom_violin() + 
  geom_dotplot(binwidth=.15, binaxis="y", stackdir="center")

# With boxplots and violins
dp2 + geom_violin() + 
  geom_boxplot(width=.2, outlier.colour=NA) +
  geom_dotplot(alpha=.3, binwidth=.15, binaxis="y", stackdir="center")

# Above corresponding box plots
# This uses a little hack to move the dots above the boxplots
dp2 + geom_boxplot(width=.4) +
  geom_dotplot(aes(x=as.numeric(x)+.2, group=x),
               binwidth=0.15, binaxis="y", stackdir="up")

# Beside corresponding box plots
# This uses a hack to move the dot clusters and boxplots: convert their x-values
# to continuous, then make the continuous axis look like it is discrete
dp2 +
  geom_boxplot(aes(x=as.numeric(x) - 0.2, group=x), width=0.4) +
  geom_dotplot(aes(x=as.numeric(x) + 0.2, group=x),
               binwidth=0.15, binaxis="y", stackdir="center") +
  scale_x_continuous(breaks=1:nlevels(dat2$x), labels=levels(dat2$x))

# Dodging, mapping "x" to fill instead of x
ggplot(dat2, aes(x="foo", y=y, fill=x)) + scale_y_continuous(breaks=seq(-4,4,.4)) +
  geom_dotplot(binwidth=.25, alpha=.4, position="dodge", binaxis="y", stackdir="center")

# grouping on x and g, dodging
ggplot(dat2, aes(x=x, y=y, fill=g)) + scale_y_continuous(breaks=seq(-4,4,.4)) +
  geom_dotplot(binwidth=.2, alpha=.2, position="dodge", binaxis="y", stackdir="center")
# These clusters don't have a "real" x width, so dodging is a bit weird. In this case
# the clusters are too close together, but if you just make the window wider, the clusters
# will move apart (within each cluster the dots will stay together).

# Stacking groups, using dotdensity
ggplot(dat2, aes(x=y, fill=x)) +
  geom_dotplot(binwidth=.25, stackgroups=TRUE, binpositions="all")

# Stacking groups, using histodot
ggplot(dat2, aes(x=y, fill=x)) +
  geom_dotplot(binwidth=.25, stackgroups=TRUE, method="histodot")

# Stacking groups, using histodot, along y axis
ggplot(dat2, aes(x=1, y=y, fill=x)) +
  geom_dotplot(binaxis="y", binwidth=.25, stackgroups=TRUE, method="histodot")