scale_x_binned() doesn't work with geom_tile() #5294

Open
hughjonesd opened this issue May 2, 2023 · 7 comments

@hughjonesd

hughjonesd commented May 2, 2023

The following code works, producing output with xmin and xmax aligned to bins:

library(ggplot2)

ggplot(mtcars, aes(mpg, hp)) +
  geom_rect(aes(xmin = mpg - 5, xmax = mpg + 5, ymin = hp - 5, ymax = hp + 5)) +
  scale_x_binned(breaks = seq(0, 60, 10))

But this code, which ought to do the same thing, produces an empty plot:

ggplot(mtcars, aes(mpg, hp)) + 
  geom_tile(aes(x = mpg, y = hp, height = 10, width = 10)) + 
  scale_x_binned(breaks = seq(0,60, 10))

The underlying reason is in the second call to layout$map_position() in ggplot_build().

  1. There, the binned scale tries to remap x variables back from a factor to (the binned version of) their original values. For GeomRect which has xmax and xmin from the start, this works.
  2. But GeomTile calculates xmin and xmax from x and width. By the time it gets to layer$compute_geom_1, x has been transformed to a "factor"-style numeric of bins. The geom doesn't realise this and happily adds the original width to the bin.
  3. Then the second call to layout$map_position() takes this wonky data and turns it back, typically to NA.
  4. Finally when the geom displays, the NA values for xmin and xmax are removed.

In other words, GeomTile$setup_data() is being called after the first map_position(), but in this case at least, it needs to be called before it.

This bug exists in ggplot2 3.4.2, and also on github main as of today.

@teunbrand
Collaborator

Thanks for the report. geom_tile() and geom_rect() indeed aren't equivalent under scale transformations. The binned scale is equivalent to a scale transformation. I agree that the example is undesirable, and we've recently added this bit to the documentation to make the difference more clear:

ggplot2/R/geom-tile.R

Lines 16 to 21 in f7246d4

#' @details
#' `geom_rect()` and `geom_tile()`'s respond differently to scale
#' transformations due to their parametrisation. In `geom_rect()`, the scale
#' transformation is applied to the corners of the rectangles. In `geom_tile()`,
#' the transformation is applied only to the centres and its size is determined
#' after transformation.

@hughjonesd
Author

Sure, but that doesn't quite cover it. The size of the tiles isn't being determined after transformation... it's being determined wrongly, and then the tiles aren't being displayed.

I think this is a real bug.
Here's an example where the tiles are actually displayed in the wrong place:

ggp <- ggplot(data.frame(x = 2:4 + 0.5, y = 2:4), aes(x, y)) + geom_tile(width = .8, height = .25)

ggp # These should bin to 2, 3 and 4...

# but in fact...
ggp + scale_x_binned(breaks = 2:4)

@teunbrand
Collaborator

teunbrand commented May 4, 2023

I'm sorry, I don't quite understand. How are they displayed wrongly? I've rendered an example below.

library(ggplot2)

tiled <- ggplot(data.frame(x = 2:4 + 0.5, y = 2:4), aes(x, y)) + 
  geom_tile(width = .8, height = .25)

tiled

tiled + scale_x_binned(breaks = 2:4)

To me, it seems that geom_rect() is doing the wrong thing with equivalent parametrisation:

rects <- ggplot(data.frame(xmin = 2:4 + 0.1, xmax = 2:4 + 0.9,
                           ymin = 2:4 - 0.125, ymax = 2:4 + 0.125)) +
  geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax))

rects

rects + scale_x_binned(breaks = 2:4)

Created on 2023-05-04 with reprex v2.0.2

@hughjonesd
Author

So my thought was: "2.5 - 0.4 = 2.1, should bin to 2; 2.5 + 0.4 = 2.9, should bin to 3". Actually I think the bins ought to be 2.5, 3.5 etc. i.e. midpoints of the breaks. But neither of those things are happening. Indeed, x and y are scaled and then width is not.

Here's a more extreme example:

ggp <- ggplot(data = NULL, aes(x, y)) + 
  geom_tile(data = data.frame(x = c(0, 5, 10), y = 1:3), width = 2, height = .25) + 
  geom_point(data = data.frame(x = c(-1, 1, 4, 6, 9, 11), y = rep(1:3, each = 2)), color = "red")
ggp 

image

ggp + scale_x_binned(breaks = c(0, 5, 10))

image

My expectation would be that the rectangle limits would be (-1, +1); (4,6) and (9, 11). The first and last ones have edges which are out of the limits of the binned scale, so maybe they are dropped, or maybe like the points they are just left alone. The second one would bin to (2.5, 7.5).

In fact: the first rectangle disappears. The second one goes to (-0.5, 7.5). The third one goes to (2.5, 10.5).

I don't think anyone would expect that - why would a rectangle (4,6) be mapped to (-0.5, 7.5) by binning to two bins from 0 to 5 and 5 to 10?

The real reason is that the first call to map_position has mapped x to c(1,2,3), representing the levels. Then the width gets calculated from this, creating xmin of c(0,1,2) and xmax of c(2,3,4). The second call to map_position then translates these back to their corresponding bin centres, creating
xmin = c(NA, -0.5, 2.5) and xmax = c(2.5, 7.5, 10.5).
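
The arithmetic above can be reproduced with a small base R sketch. It only mimics the two mapping passes described here and is not ggplot2's actual code. It assumes the scale limits are c(-1, 11) (the range of the red points), that bins are right-closed, and that the second pass looks up bin centres by index; `edges`, `centres` and `to_centre()` are names made up for the illustration.

```r
# Assumed setup: breaks from the example, limits from the range of the points
breaks <- c(0, 5, 10)
limits <- c(-1, 11)
edges <- c(limits[1], breaks, limits[2])               # -1, 0, 5, 10, 11
centres <- (head(edges, -1) + tail(edges, -1)) / 2     # centre of each bin

x <- c(0, 5, 10)
width <- 2

# First pass: data value -> bin index (right-closed bins assumed)
bin <- as.integer(cut(x, edges, include.lowest = TRUE, right = TRUE))

# The tile corners are then computed on the bin index, not on the data value
xmin_idx <- bin - width / 2
xmax_idx <- bin + width / 2

# Second pass: index -> bin centre; an index with no matching bin becomes NA
to_centre <- function(i) centres[match(i, seq_along(centres))]
xmin <- to_centre(xmin_idx)
xmax <- to_centre(xmax_idx)

print(rbind(xmin = xmin, xmax = xmax))
```

Under these assumptions the sketch gives the same xmin and xmax values as listed above, including the NA that makes the first rectangle disappear.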

I don't think anyone who hasn't read the source code will understand this, or be able to use it for any practical purpose.

So yeah, the disclaimer in the documentation is better than nothing, but I think it would be simpler to just put "geom_tile doesn't work with binned scales".

Similar concerns apply with a logged scale:

ggp <- ggplot(data = NULL, aes(x, y)) + ylim(0, 2) +
  geom_tile(data = data.frame(x = 10, y = 1), width = 2, height = .25) +
  geom_point(data = data.frame(x = c(9, 11), y = 1), color = "green")
ggp

image

ggp + scale_x_log10()

image

This makes it look as if 9 is 1 and 11 is 100. Again, you can say that it is working according to the documentation, but the point is, how is it meant to represent data?

My expectation as a user would be that I can use geom_tile to represent some data. Then if I choose to put that data on a log scale, or bin it or whatever, geom_tile keeps displaying the same answers using the new scale.

@jfmusso

jfmusso commented May 8, 2023

Does this also mean that geom_tile does not work with discrete scales (scale_x_discrete)? I'm struggling to get my plotted data into the correct categories on the X axis.

@teunbrand
Collaborator

> So my thought was: "2.5 - 0.4 = 2.1, should bin to 2; 2.5 + 0.4 = 2.9, should bin to 3". Actually I think the bins ought to be 2.5, 3.5 etc. i.e. midpoints of the breaks.

I think that binning works slightly differently from what you're expecting here. It is more of a findInterval() situation than 'snap to nearest break'.
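
To illustrate the distinction, here is a base R sketch comparing the two readings for the tile edges 2.1 and 2.9 with breaks 2:4. It only contrasts findInterval() with a hand-written nearest-break rule and is not a claim about how the binned scale is implemented; `edges_x` and `nearest` are illustrative names.

```r
breaks <- 2:4
edges_x <- c(2.1, 2.9)   # tile edges from the example: 2.5 - 0.4 and 2.5 + 0.4

# findInterval(): which interval between consecutive breaks holds each value?
interval <- findInterval(edges_x, breaks)

# "Snap to nearest break": which break is closest to each value?
nearest <- sapply(edges_x, function(v) breaks[which.min(abs(v - breaks))])

print(interval)  # both edges fall in the same interval, [2, 3)
print(nearest)   # the edges snap to different breaks
```

Under the interval reading both edges land in the same bin, whereas under the nearest-break reading they would end up at 2 and 3.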

The underlying reason that geom_tile() doesn't behave like geom_rect(), is that the width and height are not position aesthetics, and thus aren't transformed by scales. So a width = 2 on a log10 scale spans 2 orders of magnitude. While admittedly not great for scale transforms, this parametrisation does allow it to work with many stats seamlessly.
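
As a worked example of that difference, the base R sketch below follows the two parametrisations through a log10 transformation by hand, using the tile at x = 10 with width = 2 from earlier in the thread. It does plain arithmetic only and does not call ggplot2; the variable names are illustrative.

```r
x <- 10
width <- 2

# Tile parametrisation: transform the centre, then add the untransformed width
centre_t <- log10(x)                                    # 1
tile_t <- c(centre_t - width / 2, centre_t + width / 2) # 0 and 2 on the log scale
tile_data <- 10^tile_t                                  # back in data units

# Rect parametrisation: compute the corners in data units, then transform them
rect_t <- log10(c(x - width / 2, x + width / 2))
rect_data <- 10^rect_t                                  # back in data units

print(tile_data)  # spans two orders of magnitude
print(rect_data)  # stays at the original corners
```

The tile ends up covering 1 to 100, while the rect corners stay at 9 and 11.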

@jfmusso It works for discrete scales because you can combine continuous values on a discrete scale (but not the other way around). Discrete position scales are essentially seq_along(limits), so there is 1 axis unit between each level and a width = 2 spans 2 levels' worth of axis.
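
A minimal base R sketch of that idea, assuming three made-up levels; it mirrors the seq_along(limits) description rather than calling ggplot2:

```r
# Hypothetical discrete limits; each level sits at an integer position
limits <- c("low", "mid", "high")
pos <- setNames(seq_along(limits), limits)   # low = 1, mid = 2, high = 3

# A tile centred on "mid" with width = 2
x <- pos[["mid"]]
width <- 2
tile <- c(xmin = x - width / 2, xmax = x + width / 2)

print(tile)  # reaches from the position of "low" to the position of "high"
```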

@hughjonesd
Author

Perhaps one issue is that there are different potential users for GeomTile? I get that it might be useful for developers who want to e.g. place something at x, y with a "real" onscreen width. But this makes it hard to understand for end users, who have to think in terms of two different sets of coordinates.

Perhaps it might be helpful to separate the two functionalities, and provide a public-facing version of geom_tile that indeed works in data coordinates.
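
In the meantime, one possible user-side workaround is to compute the corners in data coordinates and hand them to geom_rect(), so that the scale sees xmin and xmax directly. The sketch below assumes a helper called `tile_to_rect()`, which is a made-up name and not part of ggplot2; the plotting step is guarded so that the helper can be tried without ggplot2 installed.

```r
# Hypothetical helper: turn centre/width/height into corner columns
tile_to_rect <- function(df, width = 1, height = 1) {
  df$xmin <- df$x - width / 2
  df$xmax <- df$x + width / 2
  df$ymin <- df$y - height / 2
  df$ymax <- df$y + height / 2
  df
}

tiles <- tile_to_rect(data.frame(x = c(0, 5, 10), y = 1:3),
                      width = 2, height = 0.25)
print(tiles[, c("xmin", "xmax")])

# Optional: draw the corners with geom_rect() on a binned scale
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(tiles) +
    geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax)) +
    scale_x_binned(breaks = c(0, 5, 10))
}
```

With this approach the corners are binned like any other position value, so the drawn rectangles follow geom_rect()'s behaviour under the scale rather than geom_tile()'s.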
