Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible bug in stage()/after_stat() with scale transformations. #4155

Closed
teunbrand opened this issue Jul 31, 2020 · 8 comments · Fixed by #4194
Closed

Possible bug in stage()/after_stat() with scale transformations. #4155

teunbrand opened this issue Jul 31, 2020 · 8 comments · Fixed by #4194
Labels
bug an unexpected problem or unintended behavior scales 🐍

Comments

@teunbrand
Copy link
Collaborator

I think I might have found a bug with either the scales or the stage()/after_stat() functions in combination with dates.
I came across this when making a custom stat where I wanted to put default_aes = aes(xmin = after_stat(start), xmax = after_stat(end)) in the ggproto methods, where start and end are computed variables.

What happens is that when I try to apply a stage() or after_stat() to a date variable, it gives me an error that the date transformation is invalid. I don't think it is a problem with stats, as it also shows up with stat_identity().
The attempted plot below is not very interesting, but it is the most minimal example I could come up with.

library(ggplot2)

ggplot(economics) +
  geom_point(aes(x = stage(date, after_stat = x), 
                 y = unemploy))
#> Error: Invalid input: date_trans works with objects of class Date only

The error above doesn't happen for integer or numeric data.

ggplot(economics) +
  geom_point(aes(x = stage(as.integer(date), after_stat = x), 
                 y = unemploy))

And also the error can be rescued by formatting the output as a date again, but this seems overly verbose and seems difficult to anticipate in a custom stat.

ggplot(economics) +
  geom_point(aes(x = stage(date, after_stat = structure(x, class = "Date")), 
                 y = unemploy))

The same is true for other geoms and aesthetics as well. Here are similar examples with after_stat() instead of stage().

ggplot(economics) +
  geom_segment(aes(x = date, xend = after_stat(x + 30),
                   y = unemploy, yend = unemploy))
#> Error: Invalid input: date_trans works with objects of class Date only
ggplot(economics) +
  geom_segment(aes(x = as.integer(date), xend = after_stat(x + 30),
                   y = unemploy, yend = unemploy))

ggplot(economics) +
  geom_segment(aes(x = date, xend = after_stat(structure(x + 30, class = "Date")),
                   y = unemploy, yend = unemploy))

Created on 2020-07-31 by the reprex package (v0.3.0)

@yutannihilation
Copy link
Member

I'm yet to figure out what's happening, but probably we need to somehow bypass this transformation in the end of map_statistic() because the non-stat value should be already transformed.

ggplot2/R/layer.r

Lines 326 to 330 in 3be0acc

# Transform the values, if the scale say it's ok
# (see stat_spoke for one exception)
if (self$stat$retransform) {
stat_data <- scales_transform_df(plot$scales, stat_data)
}

Actually, abusing retransform param makes this work (this parameter seems meaningless as it is always TRUE in the current version, btw). I guess you are facing more complex case than this example. Could you try retransform <- FALSE works in your case?

library(ggplot2)

l <- geom_point(aes(x = stage(date, after_stat = x), 
                    y = unemploy))
l$stat$retransform <- FALSE

ggplot(economics) + l

Created on 2020-08-14 by the reprex package (v0.3.0)

@teunbrand
Copy link
Collaborator Author

Yes, the retransform trick totally works in my case, see example below. Note that the after_stat() occurs in the default_aes ggproto member and not in the layer mapping argument.

I never noticed the retransform setting, but it seems useful to my problem. Bit selfish/off-topic question maybe, but would there be any downsides to setting retransform = FALSE at the ggproto level?

I still have the gut feeling that the after_stat() + date class should work by default though, but I guess it is not a common use case. Another idea might also be to adjust the scales::date_trans()$transform() method to accept any numeric input, but that might cause more problems than it fixes.

library(ggh4x) # https://github.com/teunbrand/ggh4x
#> Loading required package: ggplot2

p <- ggplot(economics, aes(date)) +
  stat_rle(
    aes(label = date > as.Date("1980-01-01") & date < as.Date("1990-01-01")),
    align = "center"
  ) +
  geom_line(aes(y = unemploy))

# Gives error
print(p)
#> Error: Invalid input: date_trans works with objects of class Date only

p$layers[[1]]$stat$retransform <- FALSE

# Output is fine
print(p)

Created on 2020-08-14 by the reprex package (v0.3.0)

@yutannihilation
Copy link
Member

I still have the gut feeling that the after_stat() + date class should work by default though

Yes, I totally agree with you! Sorry probably my comment was confusing. I meant

  1. The staged values should be bypass the transformation as probably it is already transformed (I'm not 100% sure about this yet, though...).
  2. Though we need to introduce a proper fix here, retransform, which seems an unused parameter, is a quick and dirty way to test if bypassing works. I didn't mean this is a workaround, but just wanted to see there's more thing to consider in more complex cases.

would there be any downsides to setting retransform = FALSE at the ggproto level?

I bet there's no downsides. It appears only the following two places:

ggplot2/R/stat-.r

Lines 52 to 56 in 112f960

Stat <- ggproto("Stat",
# Should the values produced by the statistic also be transformed
# in the second pass when recently added statistics are trained to
# the scales
retransform = TRUE,

ggplot2/R/layer.r

Lines 328 to 330 in 3aa2937

if (self$stat$retransform) {
stat_data <- scales_transform_df(plot$scales, stat_data)
}

@yutannihilation
Copy link
Member

Another example why I think this should be fixed. The value becomes 4 because it gets sqrt-ed twice.

library(ggplot2)

d <- data.frame(value = 16)

ggplot(d) +
  geom_point(aes(value, 0), colour = "red", size = 10) +
  geom_point(aes(stage(value, after_stat = x), 0), colour = "black", size = 10) +
  scale_x_sqrt(limits = c(0, 16), breaks = c(0, 4, 16))

@teunbrand
Copy link
Collaborator Author

Ah yes that result indeed doesn't make sense, so the bug applies to non-date transformations too. I should probably generalise the title of the issue for any transformation. As an aside, shouldn't the 0-break/label have been rendered too?

@teunbrand teunbrand changed the title Possible bug in stage()/after_stat() with dates. Possible bug in stage()/after_stat() with scale transformations. Aug 15, 2020
@thomasp85 thomasp85 added bug an unexpected problem or unintended behavior scales 🐍 labels Aug 31, 2020
@yutannihilation
Copy link
Member

As an aside, shouldn't the 0-break/label have been rendered too?

Good catch, thanks! I've filed another issue here: #4193

@teunbrand
Copy link
Collaborator Author

Great stuff @yutannihilation, thanks for solving the issue!

@yutannihilation
Copy link
Member

Thank you, too! Your exploration really helped. I'm glad that we solved this at last!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior scales 🐍
Projects
None yet
3 participants