New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Descent heights still not right #2687

Closed
clauswilke opened this Issue Jun 7, 2018 · 22 comments

Comments

Projects
None yet
5 participants
@clauswilke
Member

clauswilke commented Jun 7, 2018

An issue to be considered after 2.3.0: I have noticed that descent heights are still not correct. The fault lies in this line:

descent <- descentDetails(temp)

It looks like a reasonable line, but it turns out that the function descentDetails() ignores the font settings of the grob it is given:

library(grid)
descentDetails(textGrob("pqgj", gp = gpar(fontsize = 5)))
#> [1] 0.0350748697916667inches
descentDetails(textGrob("pqgj", gp = gpar(fontsize = 10)))
#> [1] 0.0350748697916667inches
descentDetails(textGrob("pqgj", gp = gpar(fontsize = 20)))
#> [1] 0.0350748697916667inches

Instead it takes the font settings from the currently active viewport. The following works correctly:

pushViewport(viewport(gp = gpar(fontsize = 5)))
descentDetails(textGrob("pqgj"))
#> [1] 0.0146145290798611inches
popViewport()

pushViewport(viewport(gp = gpar(fontsize = 10)))
descentDetails(textGrob("pqgj"))
#> [1] 0.0292290581597222inches
popViewport()

pushViewport(viewport(gp = gpar(fontsize = 20)))
descentDetails(textGrob("pqgj"))
#> [1] 0.0584581163194444inches
popViewport()

The result in ggplot is that we're getting the exact same descent height every time, regardless of theme settings:

library(ggplot2)
df <- data.frame(x = 1)
ggplot(df, aes(x, x)) + geom_point() +
  ggtitle("gjpjQ") +
  theme(plot.title = element_text(size = 10))

ggplot(df, aes(x, x)) + geom_point() +
  ggtitle("gjpjQ") +
  theme(plot.title = element_text(size = 20))

ggplot(df, aes(x, x)) + geom_point() +
  ggtitle("gjpjQ") +
  theme(plot.title = element_text(size = 100))

The reason why things seem to work correctly is because the default fontsize is 12, so we're getting a descent height that is approximately right for most typical font sizes.

I have started collecting some grid code for text handling here, in the hopes that some of the code will be useful for future ggplot2 development. Among the code, there is a set of functions to correctly calculate descent heights: https://github.com/clauswilke/gridtext/blob/master/R/grob-descent.R

The approach I'm using works correctly for arbitrary fonts, fontsizes, angles, and vjust / hjust values.

library(grid)
library(gridtext)
library(tibble)

label_data <- tibble(
  label = c("Descenders: pgqjy"),
  x = unit(c(.2, .5, .8), "npc"),
  y = unit(c(.8, .5, .1), "npc"),
  box_hjust = c(0, 0.5, 1),
  box_vjust = c(0, 0.5, 1),
  fontsize = c(20, 17, 14),
  fontfamily = c("Comic Sans MS", "Helvetica", "Times New Roman"),
  angle = c(0, 35, -35)
)

grid.newpage()
g <- labels_grob(label_data, debug = TRUE)
grid.draw(g)

@hadley

This comment has been minimized.

Member

hadley commented Jun 7, 2018

We should also engage with @pmur002 and see if we can make this better in grid itself.

@clauswilke

This comment has been minimized.

Member

clauswilke commented Jun 7, 2018

Indeed, @pmur002, regardless of whether descentDetails() should or should not be modified, it would be tremendously helpful if a function could be added to grid that simply returns relevant font metrics for a given font. Something like this:
https://github.com/clauswilke/gridtext/blob/ac404acd3a3a28f1b78fddf288478b82a0e3423c/R/grob-descent.R#L17-L21

We would like the same descent height for any string rendered with a specified font family, size, and face, and getting that by making a text grob, then replacing the label, and then calculating properties thereof seems backwards. In fact, in my code I added a simple lookup table that caches descent values for different font parameters, and this is about 100 times faster than going the viewport/grob route every time:
https://github.com/clauswilke/gridtext/blob/ac404acd3a3a28f1b78fddf288478b82a0e3423c/R/grob-descent.R#L29-L55

@baptiste

This comment has been minimized.

Contributor

baptiste commented Jun 7, 2018

cf also #1882
(this part of the issue got overlooked)

@clauswilke

This comment has been minimized.

Member

clauswilke commented Jun 7, 2018

@baptiste Thanks for the link. The gap between legend title and legend should be fine now either way, because I added code to calculate it based on font size:

vgap <- height_cm(theme$legend.spacing.y %||% (0.5 * unit(title_fontsize, "pt")))

library(ggplot2)
set.seed(123)
dummy <- data.frame(x=runif(10), y=runif(10), f = gl(2,5))
ggplot(dummy, aes(x,y, shape=f)) + 
  geom_point() +
  labs(shape="guide") +
  theme(legend.title = element_text(size = 28))

(The code based on grob height did cause weird alignment issues with multiple legends for me.)

I'm still not entirely happy with the spacing choices for guides, but they are at least acceptable in most cases.

@baptiste

This comment has been minimized.

Contributor

baptiste commented Jun 7, 2018

@clauswilke indeed, this particular issue got fixed; I just wanted to point out that the descenders issue has been recurring (and closed issues generally disappear from sight).

How does "grid2" sound? IMO it'd be nice to sit down and think of a long-term rewrite of grid, learning from current feature requests, evolving needs and technologies (modern devices, for instance, notably bypassing grDevices), and also reimplementing some bottlenecks, harmonising/merging with gtable, etc.

@pmur002

This comment has been minimized.

Contributor

pmur002 commented Jun 8, 2018

@clauswilke, an R-level function that returns metric info sounds reasonable, but my first thought is that this should be something in 'grDevices' rather than 'grid'. I will take a look.

@clauswilke

This comment has been minimized.

Member

clauswilke commented Jun 8, 2018

@pmur002 Sure, grDevices instead of grid would be fine.

While we're at it, I have noticed that the value returned by grobAscent() can exceed the value returned by grobHeight(). Is this correct? I thought the grob height was the maximum ascent at the given font settings.

library(grid)
pushViewport(viewport(gp = gpar(fontsize = 20)))
convertHeight(grobHeight(textGrob("abc")), "pt")
#> [1] 14.3700927734375pt
convertHeight(grobAscent(textGrob("abc")), "pt")
#> [1] 14.3700927734375pt
convertHeight(grobHeight(textGrob("Q")), "pt")
#> [1] 14.3700927734375pt
convertHeight(grobAscent(textGrob("Q")), "pt")
#> [1] 14.624951171875pt
popViewport()

Created on 2018-06-07 by the reprex package (v0.2.0).

@thomasp85

This comment has been minimized.

Member

thomasp85 commented Jun 14, 2018

Maybe only tangentially related, but something to keep in mind. Experimentation with profiling gganimate code have shown that not rendering labels more than doubles the rendering speed of ggplot objects and the lion share of that falls into the call to descentDetails in titleGrob. If it is not possible to improve calculation of font metrics dramatically we should consider some sort of memoisation...

@clauswilke

This comment has been minimized.

Member

clauswilke commented Jun 14, 2018

@thomasp85 I was hinting at this issue in my comment above about the lookup table that caches descent values. It gives a 100x speed increase on average, though you wouldn't see the whole effect in ggplot2 because my function also pushes and pops viewports to get the descent details right and ggplot2 currently doesn't do that.

library(gridtext)
library(microbenchmark)
gp <- grid::gpar(fontfamily = "Helvetica", fontsize = 12, fontface = "plain")
microbenchmark(
  gridtext:::lookup_font_details(gp),
  gridtext:::calc_font_details(gp)
)
#> Unit: microseconds
#>                                expr      min       lq       mean   median
#>  gridtext:::lookup_font_details(gp)   11.805   19.605   61.93275   27.079
#>    gridtext:::calc_font_details(gp) 2072.938 2428.107 2869.23680 2612.757
#>        uq       max neval
#>    37.491  3165.092   100
#>  3012.104 14423.639   100

Created on 2018-06-14 by the reprex package (v0.2.0).

I've been wondering more broadly whether reimplementing the whole label rendering system will make things faster or slower on the whole. The current code is overly complex, I think, but it also has issues, and fixing the issues might on the whole slow things down rather than speed them up.

@clauswilke

This comment has been minimized.

Member

clauswilke commented Jun 14, 2018

To elaborate on the code complexity issue: To get descent heights right, we need to place each individual piece of text into a separate grob and create a viewport that includes a gap for the descent height. For rotated text, we then rotate the entire viewport. This works beautifully, as shown above, and removes the need for the awkward trigonometry in the current code, which isn't correct for all vjust settings and which is also not correct for multi-label grobs:

ggplot2/R/margins.R

Lines 69 to 73 in 4db5122

# Use trigonometry to calculate grobheight and width for rotated grobs. This is only
# exactly correct when vjust = 1. We need to take the absolute value so we don't make
# the grob smaller when it's flipped over.
text_height <- unit(1, "grobheight", text_grob) + abs(cos(angle / 180 * pi)) * descent
text_width <- unit(1, "grobwidth", text_grob) + abs(sin(angle / 180 * pi)) * descent

Now, it would be easy to just add margins to the text label at the same time, into the same viewport, as seen here:

library(grid)
library(gridtext)
library(tibble)

label_data <- tibble(
  label = c("Descenders: pgqjy"),
  x = unit(c(.2, .5, .8), "npc"),
  y = unit(c(.8, .5, .1), "npc"),
  box_hjust = c(0, 0.5, 1),
  box_vjust = c(0, 0.5, 1),
  padding = list(ggplot2::margin(5, 10, 5, 10)),
  fontsize = c(20, 17, 14),
  fontfamily = c("Comic Sans MS", "Helvetica", "Times New Roman"),
  angle = c(0, 35, -35)
)

grid.newpage()
g <- labels_grob(label_data, debug = TRUE)
grid.draw(g)

However, this means that the margins are rotated in the frame of the label, which is not the current ggplot2 convention. To retain the current ggplot2 convention, we need to wrap the labels into another grob with its own viewport.

@clauswilke

This comment has been minimized.

Member

clauswilke commented Jun 14, 2018

One more example, how things like x and y axes could be generated. All of this works in principle. The challenge will be to integrate into ggplot2 without breaking currently existing themes.

library(grid)
library(gridtext)
library(tibble)

label_data <- tibble(
  label = c("This", "is", "a", "test", "x-axis"),
  x = unit(c(.1, .3, .5, .7, .9), "npc"),
  y = unit(0.8, "npc"),
  box_hjust = 0.5,
  box_vjust = 1,
  hjust = 0.5,
  vjust = 1,
  angle = 0,
  fontsize = 10, fontfamily = "Comic Sans MS",
  padding = list(mar(5, 5, 3, 5)),
  margin = list(mar(5, 5, 5, 5))
)
grid.newpage()
grid.draw(labels_grob(label_data, align_frames = TRUE, debug = TRUE))

label_data$angle <- 45
label_data$hjust <- 1
label_data$box_hjust <- 1
label_data$box_vjust <- 0.5
label_data$y <- grid::unit(0.55, "npc")
grid.draw(labels_grob(label_data, align_frames = TRUE, debug = TRUE))

label_data$angle <- 90
label_data$hjust <- 0
label_data$y <- unit(0.3, "npc")
grid.draw(labels_grob(label_data, align_frames = TRUE, debug = TRUE))

Created on 2018-06-14 by the reprex package (v0.2.0).

@clauswilke

This comment has been minimized.

Member

clauswilke commented Jun 14, 2018

One more comment about speed. For any somewhat complex rendering task, I think it makes sense to do all internal calculation in a fixed unit and convert to grid units only once, rather than doing calculations in grid units. The grid units are elegant but not that efficient, due to all the overhead they carry.

A very simple example follows, and things will only get worse the more unit calculations and conversions need to be done.

library(grid)
microbenchmark::microbenchmark(
  5 + 7,
  unit(5 + 7, "pt"),
  unit(5, "pt") + unit(7, "pt")
)
#> Unit: nanoseconds
#>                           expr   min      lq     mean  median      uq
#>                          5 + 7    83   124.5   184.49   177.5   221.5
#>              unit(5 + 7, "pt") 25132 26028.5 34192.16 26675.5 32975.5
#>  unit(5, "pt") + unit(7, "pt") 55266 56836.0 64465.85 58249.0 60979.0
#>     max neval
#>     687   100
#>  577149   100
#>  256298   100

Created on 2018-06-14 by the reprex package (v0.2.0).

@thomasp85

This comment has been minimized.

Member

thomasp85 commented Jun 14, 2018

I think with a proper caching mechanism it doesn’t matter much if your implementation is a bit slower as it will only impact the first plot (most font settings are reused across plots and can thus be fetched from the cache)

@thomasp85

This comment has been minimized.

Member

thomasp85 commented Jun 14, 2018

Agree on the units - I’ve long wanted something more performant. There might be instances where it is not possible to get the dimension before render time, but it should be possible to abstract that away and do as much work unit-less

@clauswilke

This comment has been minimized.

Member

clauswilke commented Jun 14, 2018

@thomasp85 Yes, it caches across plots. What I was trying to say was that the realized speed gain in ggplot2 will be less than the benchmark indicates, because the current, buggy code in ggplot2 is faster than the correct but uncached code would be.

@thomasp85

This comment has been minimized.

Member

thomasp85 commented Jun 14, 2018

Sure, but only for the first plot (unless you change theme)

@pmur002

This comment has been minimized.

Contributor

pmur002 commented Jun 15, 2018

@clauswilke Regarding the difference between grobHeight() and grobAscent(), they can differ for rotated text because grobHeight() is based on the bounding box whereas grobAscent() is just the metric info (though both are ultimately based on the same GEMetricInfo() function). Your examples all produce the same result for me - can you provide more system context (or, better, an example that fails on a Linux system so that I can easily replicate it) ?

@pmur002

This comment has been minimized.

Contributor

pmur002 commented Jun 15, 2018

In defense of the units concept (there is no defence for their performance), staying in units (and viewports) can make it easier to post-modify 'ggplot2' output (modify things and query where things are and add new things).

@pmur002

This comment has been minimized.

Contributor

pmur002 commented Jun 15, 2018

@clauswilke Not sure if this is still an issue, but with regard to the descentDetails() "problem", I think the issue is that you are calling descentDetails() directly. You should not do that. If you use "grobDescent" units (the grobDescent() function) things work better ...

> convertHeight(grobDescent(textGrob("pqgj", gp = gpar(fontsize = 5))), "pt")
[1] 1.50799606299213pt
> convertHeight(grobDescent(textGrob("pqgj", gp = gpar(fontsize = 10))), "pt")
[1] 2.26199409448819pt
> convertHeight(grobDescent(textGrob("pqgj", gp = gpar(fontsize = 20))), "pt")
[1] 4.52398818897638pt
@clauswilke

This comment has been minimized.

Member

clauswilke commented Jun 15, 2018

@pmur002 I like the unit concept and I don't advocate for getting rid of it throughout ggplot2. However, for certain grobs we write it may make sense to not use units internally. This is particularly the case for grobs that render text, put margins around it, etc. For example, for my crazy idea of writing an html renderer in grid, doing all the coordinate calculations internally in units is not necessary and would make things way too slow.

In any case, would it be possible to rewrite much of the unit code in C to speed it up?

Thanks for your pointer to grobDescent(). I wasn't aware of this function.

For the difference between grobHeight() and grobAscent(), I'm using OS X and I don't have easy access to a Linux system right now. It's only a minor difference, though, so not something I'm particularly worried about.

@thomasp85

This comment has been minimized.

Member

thomasp85 commented Jun 15, 2018

@pmur002 regarding grid/unit performance I think there has been less push to improve speed as it has been fast enough for generating static plots. I would be very interested in helping out with a general grid performance improvement exercise and see how far we can push it, as I’m hitting a wall when trying to create animation in real time.

I’m also looking into developing more performant graphic devices, but I think the whole stack (ggplot2, grid, device) needs performance improvements in order for high fps animation rendering

@pmur002

This comment has been minimized.

Contributor

pmur002 commented Jun 17, 2018

Yep, units are not performance oriented :) But neither is much of that stack. It was very much written for static output. However, happy to hear about areas where performance could be gained, without loss of features or unacceptable increase in complexity. I know that some of the 'grid' stuff that I wrote almost 20 years ago is embarrassingly awful, so there may be some stuff that is just "dumb slow". Agree that the whole stack would need work - even if you got a really fast device going, the rest of it would kill you.

thomasp85 added a commit to thomasp85/ggplot2 that referenced this issue Nov 13, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment