Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

geom_boxplot() with data rows >1 and stat = "identity" fails #3316

Closed
paleolimbot opened this issue May 8, 2019 · 5 comments
Closed

geom_boxplot() with data rows >1 and stat = "identity" fails #3316

paleolimbot opened this issue May 8, 2019 · 5 comments
Milestone

Comments

@paleolimbot
Copy link
Member

@paleolimbot paleolimbot commented May 8, 2019

In the revdep checks (#3303) at least one plot in an example failed to build that used geom_boxplot(stat = "identity") with more than one data row.

library(ggplot2)
ggplot(
  data.frame(x = "one value", y = 3, value = 4:6),
  aes(x, ymin = 0, lower = 1, middle = y, upper = value, ymax = 10)
) + 
  geom_boxplot(stat = "identity", position = "identity")
#> Error: Elements must equal the number of rows or 1

In 3.1.1, this works as expected (position dodge would make more sense here, but I wanted to make sure this wasn't the issue).

library(ggplot2)
ggplot(
  data.frame(x = "one value", y = 3, value = 4:6),
  aes(x, ymin = 0, lower = 1, middle = y, upper = value, ymax = 10)
) + 
  geom_boxplot(stat = "identity", position = "identity")

@paleolimbot paleolimbot added this to the ggplot2 3.2.0 milestone May 8, 2019
@paleolimbot paleolimbot changed the title geom_boxplot() with data rows >1 fails and stat = "identity" geom_boxplot() with data rows >1 and stat = "identity" fails May 8, 2019
@clauswilke
Copy link
Member

@clauswilke clauswilke commented May 9, 2019

This is a problem with the group aesthetic. We infer grouping based on the categorical variables, and x has only one value, so all data gets thrown into one group. However, we need three groups here for proper drawing. Not sure why this worked in the past.

library(ggplot2)

# without group aesthetic
ggplot(
  data.frame(x = "one value", y = 3, value = 4:6),
  aes(x, ymin = 0, lower = 1, middle = y, upper = value, ymax = 10)
) + 
  geom_boxplot(stat = "identity")
#> Error: Elements must equal the number of rows or 1

# with group aesthetic
ggplot(
  data.frame(x = "one value", y = 3, value = 4:6),
  aes(x, group = 1:3, ymin = 0, lower = 1, middle = y, upper = value, ymax = 10)
) + 
  geom_boxplot(stat = "identity")

Created on 2019-05-08 by the reprex package (v0.2.1)

The solution is that the package authors need to supply a group aesthetic. I don't think there's anything to be done on our side.

@paleolimbot
Copy link
Member Author

@paleolimbot paleolimbot commented May 9, 2019

I will contact the developer...there are some other things about that particular plot that could be improved.

On our end, GeomBoxplot$draw_group() implicitly assumes that nrow(data) == 1 without checking it.

ggplot2/R/geom-boxplot.r

Lines 211 to 220 in 92d2777

whiskers <- new_data_frame(c(
list(
x = c(data$x, data$x),
xend = c(data$x, data$x),
y = c(data$upper, data$lower),
yend = c(data$ymax, data$ymin),
alpha = c(NA_real_, NA_real_)
),
common
), n = 2)

If the previous version of new_data_frame() supported recycling, this probably would have worked with any length data. We could

  • Do nothing and hope nobody ever does this again
  • Support multiple rows
  • Keep only the first row and drop the rest with a warning about setting a group aesthetic
  • Error and tell the user to set a group aesthetic

I'd be happy to PR any of those if it's worth doing.

@clauswilke
Copy link
Member

@clauswilke clauswilke commented May 9, 2019

A couple of comments, in no particular order:

  • This problem will only occur when people don't use geom_boxplot() with its own stat, so it's likely rather rare.
  • The correct way to support multiple rows is via groups.
  • If you add a test, it needs to have very low overhead, because this test will be run for every single boxplot that is drawn.
  • I don't see a good case for taking just the first row and moving on. geom_boxplot() is encountering data in a format that it doesn't understand and that doesn't make sense, so it should error out.

paleolimbot added a commit to paleolimbot/ggplot2 that referenced this issue May 9, 2019
paleolimbot added a commit to paleolimbot/ggplot2 that referenced this issue May 9, 2019
clauswilke added a commit that referenced this issue May 9, 2019
…an 1 row (#3321)

* fix typo in geom_boxplot comment

* better error message when geom_boxplot() gets data with more than 1 row (Fixes #3316)

* Fix typo, Fixes #3316
@paleolimbot
Copy link
Member Author

@paleolimbot paleolimbot commented Jun 3, 2019

I've contacted the developer of HistDAWass with a suggestion of to fix this problem in their package code.

@lock
Copy link

@lock lock bot commented Nov 30, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Nov 30, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

2 participants