Make `ScaleContinuous$map()` faster #5513

teunbrand · 2023-11-08T12:52:34Z

These are 2 small changes:

- Calling palette(x) directly is faster than running palette() only on the unique values of x and then matching every value back to those.
- Using vec_assign() instead of ifelse() to fill in na.value is faster, and is the main speed boost.
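To illustrate the second change in isolation, here is a minimal sketch of the two NA-replacement idioms side by side. The colour strings and the `na.value` are made-up inputs for illustration, not values taken from ggplot2, and the sketch only checks that the results agree; it says nothing about speed.

```r
library(vctrs)

# Hypothetical palette output with two missing colours
scaled   <- c("#440154", NA, "#FDE725", NA)
na.value <- "grey50"

# Current idiom: ifelse() evaluates the test and picks from both branches
via_ifelse <- ifelse(!is.na(scaled), scaled, na.value)

# Proposed idiom: assign na.value at the positions where scaled is NA
via_assign <- vec_assign(scaled, is.na(scaled), na.value)

# Both idioms should give the same character vector
stopifnot(identical(via_ifelse, via_assign))
```

One difference worth keeping in mind: `vec_assign()` casts `value` to the type of `x`, so a mismatched `na.value` would raise an error rather than being silently coerced.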

In benchmarks below, 'classic' is the current algorithm, 'direct' is replacing the matching with palette(x) and 'assigned' is the method proposed in this PR.

To throw a single number at it, the ScaleContinuous$map() method in this PR is ~3.7x faster than the current one when mapping 100,000 values. Memory allocation is also 20-25% lower.

If we have mostly unique values, palette(x) has an advantage over the current algorithm.

devtools::load_all("~/packages/ggplot2")
#> ℹ Loading ggplot2

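# Current algorithm: run the palette on unique values, then match back
classic <- function(x, limits, palette, na.value = "grey50") {
  # `from = limits` rescales the limits onto [0, 1]; passed positionally,
  # `limits` would be taken as scales::rescale()'s `to` argument instead
  x <- rescale(oob_censor(x, limits), from = limits)
  uniq <- unique0(x)
  pal <- palette(uniq)
  scaled <- pal[match(x, uniq)]
  ifelse(!is.na(scaled), scaled, na.value)
}

# Run the palette on every value, keep ifelse() for NA replacement
direct <- function(x, limits, palette, na.value = "grey50") {
  x <- rescale(oob_censor(x, limits), from = limits)
  scaled <- palette(x)
  ifelse(!is.na(scaled), scaled, na.value)
}

# Method proposed in this PR: palette(x) plus vec_assign()
assigned <- function(x, limits, palette, na.value = "grey50") {
  x <- rescale(oob_censor(x, limits), from = limits)
  scaled <- palette(x)
  vec_assign(scaled, is.na(scaled), na.value)
}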

pal <- gradient_n_pal(viridis_pal()(7))
lim <- c(10, 100)

nvalues <- round(10^seq(0, 7, length.out = 20))

bm <- bench::press(
  nvalues = nvalues,
  rep = 1:2,
  {
    values <- runif(nvalues, 10, 100)
    bench::mark(
      classic  = classic( values, lim, pal),
      direct   = direct(  values, lim, pal),
      assigned = assigned(values, lim, pal),
      min_iterations = 10
    )
  }
)

bm$expression <- as.character(bm$expression)

ggplot(bm) +
  aes(
    x = nvalues, y = as.numeric(median), 
    colour =  expression, 
    group = interaction(rep, expression)
  ) +
  geom_line() +
  scale_y_log10(name = "Seconds") +
  scale_x_log10(name = "Number of values")

If there are only 10 possible unique values, the palette(x) method is no longer faster, but it is on par with the current method. So replacing the matching strategy seems to be either beneficial or neutral, regardless of how unique the data are.
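As a sanity check on the two strategies compared above, the following sketch confirms that matching against unique values and calling the palette directly return the same colours on data with few unique values. It assumes only the scales package and uses base `unique()` in place of ggplot2's internal `unique0()`; the seed and sample size are arbitrary.

```r
library(scales)

pal    <- gradient_n_pal(viridis_pal()(7))
limits <- c(10, 100)

# 1000 values drawn from only 10 distinct numbers
set.seed(1)
values <- sample(runif(10, 10, 100), 1000, replace = TRUE)
x <- rescale(oob_censor(values, limits), from = limits)

# Matching strategy: the palette sees each distinct value once
uniq    <- unique(x)
matched <- pal(uniq)[match(x, uniq)]

# Direct strategy: the palette sees every value
direct <- pal(x)

stopifnot(identical(matched, direct))
```

The strategies differ only in how many values reach the palette function (`length(uniq)` versus `length(x)`), which is why their relative cost depends on how repetitive the data are.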

bm <- bench::press(
  nvalues = nvalues,
  rep = 1:2,
  {
    values <- sample(runif(10, 10, 100), nvalues, replace = TRUE)
    bench::mark(
      classic  = classic( values, lim, pal),
      direct   = direct(  values, lim, pal),
      assigned = assigned(values, lim, pal),
      min_iterations = 10
    )
  }
)

bm$expression <- as.character(bm$expression)

ggplot(bm) +
  aes(
    x = nvalues, y = as.numeric(median), 
    colour =  expression, 
    group = interaction(rep, expression)
  ) +
  geom_line() +
  scale_y_log10(name = "Seconds") +
  scale_x_log10(name = "Number of values")

Created on 2023-11-08 with reprex v2.0.2

A caveat to these benchmarks is that the inputs contain no NAs to replace, but I've checked that the vec_assign() advantage persists when there are actual NA values.
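For reference, a small sketch of how NA values would arise in this mapping and get replaced: out-of-limits inputs are censored to NA by `oob_censor()`, the palette passes NA through, and `vec_assign()` fills in `na.value`. The input vector is hypothetical and the sketch assumes the scales and vctrs packages rather than ggplot2 internals.

```r
library(scales)
library(vctrs)

pal    <- gradient_n_pal(viridis_pal()(7))
limits <- c(10, 100)

# Hypothetical inputs: two out-of-limits values and one missing value
values <- c(5, 10, 55, 100, 120, NA)

x      <- rescale(oob_censor(values, limits), from = limits)
scaled <- pal(x)  # NA wherever x is NA

mapped <- vec_assign(scaled, is.na(scaled), "grey50")
mapped
```

A benchmark input built this way (for example by widening the range passed to `runif()` beyond the limits) would exercise the replacement branch as well.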

teunbrand · 2023-11-08T13:38:58Z

I benchmarked incorrectly; never mind this, sorry for the noise.

make ScaleContinuous$map() faster

a1523bd

teunbrand added the performance label Nov 8, 2023

teunbrand closed this Nov 8, 2023
