Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should limits be enforced on the calculated geoms? #2887

Closed
yutannihilation opened this issue Sep 10, 2018 · 18 comments · Fixed by #3421
Closed

Should limits be enforced on the calculated geoms? #2887

yutannihilation opened this issue Sep 10, 2018 · 18 comments · Fixed by #3421
Labels
documentation scales 🐍 tidy-dev-day 🤓

Comments

@yutannihilation
Copy link
Member

@yutannihilation yutannihilation commented Sep 10, 2018

(This is originally reported on #2879, and I asked on RStudio Community. But, I couldn't find any reasonable answer so far...)

When the x axis is continuous and has limits, geom_bar() and geom_col() won't automatically consider the width of bars, which means the bars at both ends disappear even if the values are within the range of the limits. I feel this behaviour is counter-intuitive, whereas this won't happen with discrete scales.

library(ggplot2)

d <- data.frame(x = 1:3)

# bars at both ends disappear
ggplot(d) +
  geom_bar(aes(x)) +
  scale_x_continuous(limit = c(1, 3))
#> Warning: Removed 2 rows containing missing values (geom_bar).

# no bars disappear
ggplot(d) +
  geom_bar(aes(factor(x))) +
  scale_x_discrete(limit = c("1", "2", "3"))

Created on 2018-09-10 by the reprex package (v0.2.0).

Though I don't come up with how to control whether or not to ignore limits, IIUC about the semantics, limits should not be enforced on the computed geoms, but on the values before calculation.

After some investigation, I'm wondering if it's possible to reset limits as long as range of the scale after computing the widths of the geoms (compute_geom_1()) and computing positions (compute_position()). More preciesely, reset_scales() here:

ggplot2/R/plot-build.r

Lines 68 to 78 in f636f34

# Reparameterise geoms from (e.g.) y and width to ymin and ymax
data <- by_layer(function(l, d) l$compute_geom_1(d))
# Apply position adjustments
data <- by_layer(function(l, d) l$compute_position(d, layout))
# Reset position scales, then re-train and map. This ensures that facets
# have control over the range of a plot: is it generated from what is
# displayed, or does it include the range of underlying data
layout$reset_scales()
layout$train_position(data, scale_x(), scale_y())

Though I don't come up with how to implement this yet, does my explanation make sense?

@hadley
Copy link
Member

@hadley hadley commented Sep 10, 2018

I'm pretty sure it worked that way in the past, and it was a deliberate choice to change it. Regardless, I don't think it's possible to make this sort of very large change to how ggplot2 computation works with out much thought and study, and I doubt the payoff is worth it for this one case.

@yutannihilation
Copy link
Member Author

@yutannihilation yutannihilation commented Sep 10, 2018

Thanks, I didn't know the history. I see. I don't want to brave a danger to change the current behaviour, but just want to see if it's possible to add some optional processing (though it seems very difficult). Sorry for filing this issue before enough study...

But, one thing, I moderately disagree that this is only for this one case. While I agree that this is not so common issue, I suspect it might be not too rare when a geom or position calculation goes beyond the limits.

I'm going to close this soon, but let me keep this open for a while until I find the history...

@batpigandme
Copy link
Member

@batpigandme batpigandme commented Sep 10, 2018

I suspect it might be not too rare when a geom or position calculation goes beyond the limits.

I think that's the reason for the warning.

@yutannihilation
Copy link
Member Author

@yutannihilation yutannihilation commented Sep 10, 2018

Hmm, but, I don't think it's enough because the user don't know how not to let the bars get removed. I don't complain about that there's no hint in the current warning message, but about that there's no general way to know priorly how much margins to add to ensure the geoms are included...

@batpigandme
Copy link
Member

@batpigandme batpigandme commented Sep 10, 2018

there's no general way to know priorly how much margins to add to ensure the geoms are included...

Other than not specifying the limits manually, you mean?

library(ggplot2)

d <- data.frame(x = 1:3)

ggplot(d) +
  geom_bar(aes(x)) +
  scale_x_continuous()

Created on 2018-09-10 by the reprex package (v0.2.0.9000).

@yutannihilation
Copy link
Member Author

@yutannihilation yutannihilation commented Sep 10, 2018

Thanks, in this case, I can just remove limits argument since it actually doesn't make sense to set the limit c(1, 3) whereas the data's range is c(1, 3). But, it's not possible when limits actually filters the data. I have to dplyr::filter() beforehand to limit the data. (this is what I got suggested in RStudio Community)

@clauswilke
Copy link
Member

@clauswilke clauswilke commented Sep 10, 2018

If we set aside for a moment whether it is feasible or advisable to fix this, I think we should agree that there is an issue. The behavior is rather counter-intuitive. See examples below.

library(ggplot2)

d <- data.frame(x = 1:3)

# expanding scale doesn't help
ggplot(d) +
  geom_bar(aes(x)) +
  scale_x_continuous(limits = c(1, 3), expand = c(0, 1))
#> Warning: Removed 2 rows containing missing values (geom_bar).

# making the limits somewhat bigger doesn't help
ggplot(d) +
  geom_bar(aes(x)) +
  scale_x_continuous(limits = c(0.6, 3.4))
#> Warning: Removed 2 rows containing missing values (geom_bar).

# it only works if I make the limits wide enough so the
# unexpanded limits contain the entire bar widths
ggplot(d) +
  geom_bar(aes(x)) +
  scale_x_continuous(limits = c(0.5, 3.5))

Created on 2018-09-10 by the reprex package (v0.2.0).

@batpigandme
Copy link
Member

@batpigandme batpigandme commented Sep 10, 2018

The behavior is rather counter-intuitive.

For sure! I've been following the community thread and the previous issue. The units aren't absolute, though, correct? I ask because, in lieu of actually fixing it (which doesn't seem near-term), we could try to better document the behaviour and/or give a more helpful error message.

@hadley
Copy link
Member

@hadley hadley commented Sep 10, 2018

My gut feeling is that it is counter-intutive for this example, but if you change the behaviour it will just be counter-intuitve for a different case, so the net change would not be uniformly positive. My very vague recollection is this is something to do with ylim() correctly truncating the standard error ribbon of geom_smooth().

I think Chesteron's fence applies here; until we know why the code works this way, we shouldn't consider changing it.

@clauswilke
Copy link
Member

@clauswilke clauswilke commented Sep 10, 2018

@hadley Sure. Let's just document behavior for now. Another example follows, using geom_smooth().

In terms of potential fixes, one possibility could be via opt-in: Geoms could be given the option to declare aesthetics for which scale limits should not be enforced, so that for those specific aesthetics scale limits would behave similarly to coord limits. Then, at least the interaction of calculated aesthetics with scale expansion would appear to be more predictable.

library(ggplot2)

# complete confidence band is shown
ggplot(mtcars, aes(hp, mpg)) + 
  geom_smooth(level = 0.99)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

# part of the confidence band is missing, even though
# there is sufficient space to show it
ggplot(mtcars, aes(hp, mpg)) + 
  geom_smooth(level = 0.99) +
  scale_y_continuous(limits = c(10.4, 33.9), expand = c(0, 10))
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Created on 2018-09-10 by the reprex package (v0.2.0).

@clauswilke
Copy link
Member

@clauswilke clauswilke commented Sep 10, 2018

Here is a reason for why the code may currently work the way it does: Internally modifying the limits that have been set explicitly by the user would be a bad design choice. It would inevitably lead to unpredictable behavior where it's entirely unclear how the scale range was obtained. If I provide explicit limits (and explicit expansion values), I should be able to rely on getting a scale with a defined range.

So any fix has to be about not removing computed aesthetics that still would fit into the final (expanded) scale range, not about changing the internal limits.

@yutannihilation
Copy link
Member Author

@yutannihilation yutannihilation commented Sep 10, 2018

So any fix has to be about not removing computed aesthetics that still would fit into the final (expanded) scale range, not about changing the internal limits.

Thanks, agreed. Apparently, my proposed fix was too naive and thoughtless, sorry... 🙄

@thomasp85
Copy link
Member

@thomasp85 thomasp85 commented Apr 23, 2019

Can we agree that a fix for this will be a better documentation of the limits argument with a clear statement that setting limits on positional scales can remove data, sometimes unpredictably, and if the purpose is to zoom, one should use the limit argument in the coordinate system?

@hadley
Copy link
Member

@hadley hadley commented Apr 23, 2019

Yes, agreed. This would be a good TDD activity.

@thomasp85
Copy link
Member

@thomasp85 thomasp85 commented Apr 23, 2019

Great - I’ll remove it from the milestone then

@thomasp85 thomasp85 removed this from the ggplot2 3.2.0 milestone Apr 23, 2019
@thomasp85 thomasp85 added tidy-dev-day 🤓 documentation labels Apr 23, 2019
@yutannihilation
Copy link
Member Author

@yutannihilation yutannihilation commented Apr 23, 2019

I don't agree it's the "fix" (at least before knowing the technical details why this is wontfix), but I do agree we can close this issue by making documentation better because I feel I can never come up with the real fix to this problem... Thanks for the suggestion!

@paleolimbot
Copy link
Member

@paleolimbot paleolimbot commented May 27, 2019

I've solved this problem from the user end a few times by setting the oob on the scale:

library(ggplot2)

d <- data.frame(x = 1:3)
ggplot(d) +
  geom_bar(aes(x)) +
  scale_x_continuous(limit = c(1, 3), oob = function(x, limits) x)

Created on 2019-05-27 by the reprex package (v0.2.1)

@lock
Copy link

@lock lock bot commented Jan 4, 2020

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jan 4, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
documentation scales 🐍 tidy-dev-day 🤓
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants