position_nudge does not work with geom_boxplot #2733

Closed
jpasquier opened this Issue Jul 6, 2018 · 14 comments

jpasquier commented Jul 6, 2018

position_nudge does not seem to work with geom_boxplot:

library(ggplot2)
df <- data.frame(x = factor("x"), y = 1:10)
ggplot(df, aes(x = x, y = y)) +
  geom_boxplot(position = position_nudge(x = -0.2, y = 0))
#> Error: position_nudge requires the following missing aesthetics: y

What I eventually want to do is place a dotplot next to a boxplot, as in this image:
[image: tmp]
(I managed to produce this image with ggplot2. Oddly, this issue does not occur with some data. That is why I am completely lost...)

I use R 3.5.1 with ggplot2 3.0.0 on Debian stretch. The reproducible example was made with reprex.

ptoche commented Jul 6, 2018

Do you have a reprex of an example that works? Your example has just one discrete value; does it work with two? Reading the text below made me think that, with a single discrete value, there might not be enough information to position the nudge (one value does not define a unit, though I have no idea whether that matters to the code). From reading position-nudge.R, it looks like it just takes a (small) numerical value for x and a value for y. I haven't looked beyond this, so I don't have much useful to add.

Quoting from https://github.com/tidyverse/ggplot2/blob/master/R/position-nudge.R:

#' position_nudge is generally useful for adjusting the position of
#' items on discrete scales by a small amount. Nudging is built in to
#' [geom_text()] because it's so useful for moving labels a small
#' distance from what they're labelling.

jpasquier commented Jul 6, 2018

Thank you for your response.

Here is an example where it works:

library(ggplot2)
df <- data.frame(
  x = factor(c("3m", "1d", "6w", "24m", "preop", "12m", "6m",
               "1d", "3m", "3m", "1d", "3m", "1d", "1d", "3w",
               "3w", "12m", "preop", "3m", "preop"),
             levels = c("preop", "1d", "3w", "6w", "3m", "6m",
                        "12m", "24m")),
  y = c(10, 13, 12, 11, 21, 16, 12, 4, 12, 13, 15, 7, 12,
        15, 10, 16, 9, 18, 14, 30)
)
ggplot(df, aes(x = x, y = y)) +
  geom_boxplot(width = .3, position = position_nudge(x = -0.2, y = 0),
               outlier.shape=NA) +
  geom_dotplot(binaxis = "y", binwidth = 0.5, dotsize = 0.7)

[image: tmp2]

And here is a similar example where it does not work:

library(ggplot2)
df <- data.frame(
  x = factor(c("preop", "3m", "1w", "1d", "1w", "3w", "1d",
               "1d", "1w", "3m", "1d", "1w", "6m", "preop",
               "6m", "6w", "6w", "preop", "3w", "6m"),
               levels = c("preop", "1d", "1w", "3w", "6w",
                          "3m", "6m")),
  y = c(14, 10, 13, 12, 10, 11, 7, 14, 15, 18, 15, 9, 17,
        26, 8, 8, 14, 23, 21, 13)
)
ggplot(df, aes(x = x, y = y)) +
  geom_boxplot(width = .3, position = position_nudge(x = -0.2, y = 0),
               outlier.shape=NA) +
  geom_dotplot(binaxis = "y", binwidth = 0.5, dotsize = 0.7)
#> Error: position_nudge requires the following missing aesthetics: y

It is odd, isn't it?

(I used a sample of my data)

Member

clauswilke commented Jul 6, 2018

I can confirm this. Simplified reprex below. For one data set the y aesthetic makes it all the way into the final data frame and for the other it doesn't. I can't see yet what triggers this.

library(ggplot2)
df1 <- data.frame(
  x = factor(c("3m", "1d", "6w", "24m", "preop", "12m", "6m",
               "1d", "3m", "3m", "1d", "3m", "1d", "1d", "3w",
               "3w", "12m", "preop", "3m", "preop"),
             levels = c("preop", "1d", "3w", "6w", "3m", "6m",
                        "12m", "24m")),
  y = c(10, 13, 12, 11, 21, 16, 12, 4, 12, 13, 15, 7, 12,
        15, 10, 16, 9, 18, 14, 30)
)

df2 <- data.frame(
  x = factor(c("preop", "3m", "1w", "1d", "1w", "3w", "1d",
               "1d", "1w", "3m", "1d", "1w", "6m", "preop",
               "6m", "6w", "6w", "preop", "3w", "6m"),
             levels = c("preop", "1d", "1w", "3w", "6w",
                        "3m", "6m")),
  y = c(14, 10, 13, 12, 10, 11, 7, 14, 15, 18, 15, 9, 17,
        26, 8, 8, 14, 23, 21, 13)
)

layer_data(ggplot(df1, aes(x = x, y = y)) + geom_boxplot())
#>   ymin lower middle upper ymax outliers notchupper notchlower x PANEL
#> 1   18 19.50   21.0 25.50   30            26.47328  15.526719 1     1
#> 2   12 12.00   13.0 15.00   15        4   15.11979  10.880208 2     1
#> 3   10 11.50   13.0 14.50   16            16.35169   9.648314 3     1
#> 4   12 12.00   12.0 12.00   12            12.00000  12.000000 4     1
#> 5    7 10.00   12.0 13.00   14            14.11979   9.880208 5     1
#> 6   12 12.00   12.0 12.00   12            12.00000  12.000000 6     1
#> 7    9 10.75   12.5 14.25   16            16.41030   8.589700 7     1
#> 8   11 11.00   11.0 11.00   11            11.00000  11.000000 8     1
#>   group  y ymin_final ymax_final  xmin  xmax xid newx new_width weight
#> 1     1 NA         18         30 0.625 1.375   1    1      0.75      1
#> 2     2 NA          4         15 1.625 2.375   2    2      0.75      1
#> 3     3 NA         10         16 2.625 3.375   3    3      0.75      1
#> 4     4 12         12         12 3.625 4.375   4    4      0.75      1
#> 5     5 NA          7         14 4.625 5.375   5    5      0.75      1
#> 6     6 12         12         12 5.625 6.375   6    6      0.75      1
#> 7     7 NA          9         16 6.625 7.375   7    7      0.75      1
#> 8     8 11         11         11 7.625 8.375   8    8      0.75      1
#>   colour  fill size alpha shape linetype
#> 1 grey20 white  0.5    NA    19    solid
#> 2 grey20 white  0.5    NA    19    solid
#> 3 grey20 white  0.5    NA    19    solid
#> 4 grey20 white  0.5    NA    19    solid
#> 5 grey20 white  0.5    NA    19    solid
#> 6 grey20 white  0.5    NA    19    solid
#> 7 grey20 white  0.5    NA    19    solid
#> 8 grey20 white  0.5    NA    19    solid
layer_data(ggplot(df2, aes(x = x, y = y)) + geom_boxplot())
#>   ymin lower middle upper ymax outliers notchupper notchlower x PANEL
#> 1   14 18.50   23.0 24.50   26            28.47328  17.526719 1     1
#> 2    7 10.75   13.0 14.25   15            15.76500  10.235000 2     1
#> 3    9  9.75   11.5 13.50   15            14.46250   8.537500 3     1
#> 4   11 13.50   16.0 18.50   21            21.58614  10.413856 4     1
#> 5    8  9.50   11.0 12.50   14            14.35169   7.648314 5     1
#> 6   10 12.00   14.0 16.00   18            18.46891   9.531085 6     1
#> 7    8 10.50   13.0 15.00   17            17.10496   8.895040 7     1
#>   group ymin_final ymax_final  xmin  xmax xid newx new_width weight colour
#> 1     1         14         26 0.625 1.375   1    1      0.75      1 grey20
#> 2     2          7         15 1.625 2.375   2    2      0.75      1 grey20
#> 3     3          9         15 2.625 3.375   3    3      0.75      1 grey20
#> 4     4         11         21 3.625 4.375   4    4      0.75      1 grey20
#> 5     5          8         14 4.625 5.375   5    5      0.75      1 grey20
#> 6     6         10         18 5.625 6.375   6    6      0.75      1 grey20
#> 7     7          8         17 6.625 7.375   7    7      0.75      1 grey20
#>    fill size alpha shape linetype
#> 1 white  0.5    NA    19    solid
#> 2 white  0.5    NA    19    solid
#> 3 white  0.5    NA    19    solid
#> 4 white  0.5    NA    19    solid
#> 5 white  0.5    NA    19    solid
#> 6 white  0.5    NA    19    solid
#> 7 white  0.5    NA    19    solid

ggplot(df1, aes(x = x, y = y)) +
  geom_boxplot(position = position_nudge(x = -0.2))

ggplot(df2, aes(x = x, y = y)) +
  geom_boxplot(position = position_nudge(x = -0.2))
#> Error: position_nudge requires the following missing aesthetics: y

Created on 2018-07-06 by the reprex package (v0.2.0).

ptoche commented Jul 6, 2018

Is this related to some na.rm = TRUE that drops y? I notice that even in df1, there are many NAs.

There is a na.rm = TRUE on line 84 of https://github.com/tidyverse/ggplot2/blob/master/R/stat-boxplot.r
but I couldn't say if it's got anything to do with this...

Member

hadley commented Jul 6, 2018

@ptoche you can link to a specific line, like so: https://github.com/tidyverse/ggplot2/blob/master/R/stat-boxplot.r#L84 (just click on the line number to get the link in the address bar)

Member

clauswilke commented Jul 6, 2018

It seems the difference is whether there is any group with only a single data point. If there is at least one, the final data set has a y column. That column has NAs everywhere but in the rows corresponding to groups with just a single data point. If there is no such group, the final data set does not have a y column.

The bigger question though is whether required_aes = c("x", "y") is appropriate in position_nudge():

required_aes = c("x", "y"),

What is required is some y aesthetic, but it could be ymin, ymax, etc.

Member

hadley commented Jul 6, 2018

I only imagined position_nudge() working with 0d geoms (like point and text). But it seems reasonable to extend it.

Member

clauswilke commented Jul 6, 2018

I think the correct way to fix this is to come up with some way to check for classes of aesthetics in Position$setup_data(), i.e., do we have any of c("x", "xmin", "xmax") rather than exactly "x".

ggplot2/R/position-.r

Lines 51 to 54 in 8922e24

  setup_data = function(self, data, params) {
    check_required_aesthetics(self$required_aes, names(data), snake_class(self))
    data
  },

I note though that position adjustment in the y direction would likely fail because stat_boxplot() creates aesthetics called lower, middle, upper.
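
The idea of checking for classes of aesthetics can be sketched in base R. This is only an illustration of the suggestion above: `has_aes_classes()` and `aes_classes` are hypothetical names and are not part of ggplot2, and the column names are a subset of those in the boxplot layer data shown earlier in this thread.

```r
# Hypothetical helper (not a ggplot2 function): for each class of
# aesthetics, report whether at least one member is among the columns.
has_aes_classes <- function(available, classes) {
  vapply(classes, function(members) any(members %in% available), logical(1))
}

# Assumed grouping of related aesthetics, following the comment above.
aes_classes <- list(
  x = c("x", "xmin", "xmax"),
  y = c("y", "ymin", "ymax")
)

# A subset of the boxplot layer columns when no plain "y" column is present.
boxplot_cols <- c("ymin", "lower", "middle", "upper", "ymax",
                  "x", "xmin", "xmax")

# An exact-name check would fail here because "y" is absent ...
"y" %in% boxplot_cols

# ... while the class-based check accepts ymin/ymax as a y-type aesthetic.
has_aes_classes(boxplot_cols, aes_classes)
```

Note that such a check would only decide whether the position adjustment may run; as pointed out above, columns such as lower, middle, and upper would still need separate handling for any nudge in the y direction.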

Member

clauswilke commented Jul 6, 2018

@jpasquier As a workaround for your problem, you can just define a new position adjustment that only goes horizontally.

library(ggplot2)

# horizontal nudge position adjustment
position_hnudge <- function(x = 0) {
  ggproto(NULL, PositionHNudge, x = x)
}

PositionHNudge <- ggproto("PositionHNudge", Position,
  x = 0,
  required_aes = "x",
  setup_params = function(self, data) {
    list(x = self$x)
  },
  compute_layer = function(data, params, panel) {
    transform_position(data, function(x) x + params$x)
  }
)

df <- data.frame(
  x = factor(c("preop", "3m", "1w", "1d", "1w", "3w", "1d",
               "1d", "1w", "3m", "1d", "1w", "6m", "preop",
               "6m", "6w", "6w", "preop", "3w", "6m"),
             levels = c("preop", "1d", "1w", "3w", "6w",
                        "3m", "6m")),
  y = c(14, 10, 13, 12, 10, 11, 7, 14, 15, 18, 15, 9, 17,
        26, 8, 8, 14, 23, 21, 13)
)
ggplot(df, aes(x = x, y = y)) +
  geom_boxplot(width = .3, position = position_hnudge(x = -0.2),
               outlier.shape=NA) +
  geom_dotplot(binaxis = "y", binwidth = 0.5, dotsize = 0.7)

Created on 2018-07-06 by the reprex package (v0.2.0).

Member

hadley commented Jul 6, 2018

(@clauswilke this is one place where I think a data frame column might make sense - y could be a data frame with columns min, max, mid etc. I'm not proposing we do this, as it would be a huge change to the internals, but I think it's an interesting thought experiment)

jpasquier commented Jul 6, 2018

@clauswilke Thank you very much for the workaround. It solves my problem and allows me to produce the figures I need.

Member

clauswilke commented Jul 7, 2018

It bugged me that I didn't understand why the two data frames give different results, so I hunted it down. The relevant code is here:

ggplot2/R/stat-.r

Lines 107 to 117 in 3d022ed

stats <- mapply(function(new, old) {
  if (empty(new)) return(data.frame())
  unique <- uniquecols(old)
  missing <- !(names(unique) %in% names(new))
  cbind(
    new,
    unique[rep(1, nrow(new)), missing,drop = FALSE]
  )
}, stats, groups, SIMPLIFY = FALSE)
do.call(plyr::rbind.fill, stats)

Line 109 finds any data columns that are constant within a group, and the rest of the code then copies those data columns from the data frame before the stat transformation to the data frame after the stat transformation. If a group consists of only 1 value, all data columns meet that condition and are copied over. Then, in line 117, if some of those columns are not constant for other groups, they are filled with NAs for those groups. This explains why y is copied over only when some groups have only one value, and why when that happens all groups with more than one y value have NA in the final y column.
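
The mechanism described above can be reproduced in base R without ggplot2. The sketch below is an illustration under assumptions: `constant_cols()`, `toy_stat()`, `rbind_fill()`, and `summarise_groups()` are hypothetical stand-ins (for `uniquecols()`, a boxplot-like stat, `plyr::rbind.fill()`, and the quoted `mapply()` code respectively), not the actual ggplot2 internals.

```r
# Stand-in for uniquecols(): keep columns that are constant within a group.
constant_cols <- function(df) {
  keep <- vapply(df, function(col) length(unique(col)) == 1, logical(1))
  df[1, keep, drop = FALSE]
}

# Toy stat: summarise a group to its min and max, dropping y itself.
toy_stat <- function(df) {
  data.frame(ymin = min(df$y), ymax = max(df$y))
}

# Stand-in for plyr::rbind.fill(): bind rows, filling absent columns with NA.
rbind_fill <- function(pieces) {
  all_cols <- unique(unlist(lapply(pieces, names)))
  filled <- lapply(pieces, function(p) {
    for (col in setdiff(all_cols, names(p))) p[[col]] <- NA
    p[all_cols]
  })
  do.call(rbind, filled)
}

# Apply the stat per group, then copy back the constant columns it dropped.
summarise_groups <- function(data) {
  groups <- split(data, data$group)
  pieces <- lapply(groups, function(old) {
    new <- toy_stat(old)
    const <- constant_cols(old)
    missing <- !(names(const) %in% names(new))
    cbind(new, const[rep(1, nrow(new)), missing, drop = FALSE])
  })
  rbind_fill(pieces)
}

# Every group has several distinct y values: no y column survives.
d_multi <- data.frame(group = c(1, 1, 2, 2), y = c(1, 2, 3, 5))
names(summarise_groups(d_multi))

# Group 2 has a single observation: y is copied for it and is NA elsewhere.
d_single <- data.frame(group = c(1, 1, 2), y = c(1, 2, 3))
summarise_groups(d_single)
```

In the second call the result has a y column that is NA for group 1 and 3 for group 2, which mirrors the pattern in the df1 layer data earlier in the thread.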

I think this code could use some commenting.

Member

clauswilke commented Aug 29, 2018

@hadley How do you feel about addressing this issue by introducing horizontal and vertical nudge position adjustments, just like I did in my workaround? And if we do, what should they be called? hnudge, nudge_h, xnudge, nudge_x?

Member

clauswilke commented Aug 29, 2018

Actually, right after I wrote this I realized it should be possible to just write different ggproto objects and pick the appropriate one in the position_nudge() constructor based on whether any of x or y are zero. Let me try that.
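
A rough sketch of that idea, modeled on the workaround posted earlier in this thread, might look as follows. The names `PositionNudgeX`, `PositionNudgeY`, and `position_nudge2()` are hypothetical and chosen only for illustration; this is not necessarily how the change was eventually implemented in ggplot2.

```r
library(ggplot2)

# Hypothetical variant that only shifts x-type aesthetics.
PositionNudgeX <- ggproto("PositionNudgeX", Position,
  x = 0,
  required_aes = "x",
  setup_params = function(self, data) {
    list(x = self$x)
  },
  compute_layer = function(data, params, panel) {
    transform_position(data, function(x) x + params$x, NULL)
  }
)

# Hypothetical variant that only shifts y-type aesthetics.
PositionNudgeY <- ggproto("PositionNudgeY", Position,
  y = 0,
  required_aes = "y",
  setup_params = function(self, data) {
    list(y = self$y)
  },
  compute_layer = function(data, params, panel) {
    transform_position(data, NULL, function(y) y + params$y)
  }
)

# Hypothetical constructor: pick the variant based on which offset is zero,
# and fall back to the regular position_nudge() when both are non-zero.
position_nudge2 <- function(x = 0, y = 0) {
  if (y == 0) {
    ggproto(NULL, PositionNudgeX, x = x)
  } else if (x == 0) {
    ggproto(NULL, PositionNudgeY, y = y)
  } else {
    position_nudge(x = x, y = y)
  }
}
```

With this sketch, `geom_boxplot(position = position_nudge2(x = -0.2))` would only require an x aesthetic, so the missing y column would no longer be checked.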

clauswilke added a commit to clauswilke/ggplot2 that referenced this issue Aug 29, 2018

clauswilke added a commit that referenced this issue Sep 1, 2018

Make nudging more robust (#2874)
* make nudging more robust. closes #2733.

* add regression tests for position_nudge()

* simplify position_nudge, remove required aesthetics