@teunbrand
Collaborator

@teunbrand teunbrand commented Nov 8, 2023

This PR makes two small changes:

  • Calling palette(x) directly on all values is faster than running palette() only on the unique values of x and then matching every value back to those.
  • Using vec_assign() instead of ifelse() to fill in na.value is faster, and is the main source of the speed-up.
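As a rough sanity check that neither change alters the mapped colours, the sketch below runs both variants of each step on a tiny made-up input. The two gradient colours, the input values and `na.value` are arbitrary illustration values, and base `unique()` stands in for ggplot2's internal `unique0()`; it assumes scales and vctrs are installed.

```r
library(scales)
library(vctrs)

# Arbitrary two-colour gradient palette, for illustration only.
pal <- gradient_n_pal(c("#132B43", "#56B1F7"))
# Already-rescaled values with duplicates and one missing value.
x <- c(0.1, 0.5, 0.1, NA, 0.9, 0.5)
na.value <- "grey50"

# Change 1: map the unique values and match back, versus mapping x directly.
uniq <- unique(x)
via_match  <- pal(uniq)[match(x, uniq)]
via_direct <- pal(x)

# Change 2: fill missing colours with ifelse(), versus vec_assign(),
# which overwrites only the NA positions and recycles na.value to fit.
via_ifelse <- ifelse(!is.na(via_direct), via_direct, na.value)
via_assign <- vec_assign(via_direct, is.na(via_direct), na.value)

identical(via_match, via_direct)
identical(via_ifelse, via_assign)
```

Both comparisons should be `TRUE`. One plausible reason for the ifelse() cost is that base ifelse() recycles `yes` and `no` to the full length of the test before subsetting, while vec_assign() only writes to the selected positions.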

In the benchmarks below, 'classic' is the current algorithm, 'direct' replaces the matching with palette(x), and 'assigned' is the method proposed in this PR.

To put a single number on it: at 100,000 values to map, the ScaleContinuous$map() method in this PR is ~3.7x faster than the current one and allocates 20-25% less memory.

If the values are mostly unique, palette(x) has an advantage over the current algorithm.

devtools::load_all("~/packages/ggplot2")
#> ℹ Loading ggplot2

classic <- function(x, limits, palette, na.value = 'grey50') {
  x <- rescale(oob_censor(x, limits), limits)
  uniq <- unique0(x)
  pal <- palette(uniq)
  scaled <- pal[match(x, uniq)]
  ifelse(!is.na(scaled), scaled, na.value)
}

direct <- function(x, limits, palette, na.value = "grey50") {
  x <- rescale(oob_censor(x, limits), limits)
  scaled <- palette(x)
  ifelse(!is.na(scaled), scaled, na.value)
}

assigned <- function(x, limits, palette, na.value = "grey50") {
  x <- rescale(oob_censor(x, limits), limits)
  scaled <- palette(x)
  vec_assign(scaled, is.na(scaled), na.value)
}

pal <- gradient_n_pal(viridis_pal()(7))
lim <- c(10, 100)

nvalues <- round(10^seq(0, 7, length.out = 20))

bm <- bench::press(
  nvalues = nvalues,
  rep = 1:2,
  {
    values <- runif(nvalues, 10, 100)
    bench::mark(
      classic  = classic( values, lim, pal),
      direct   = direct(  values, lim, pal),
      assigned = assigned(values, lim, pal),
      min_iterations = 10
    )
  }
)

bm$expression <- as.character(bm$expression)

ggplot(bm) +
  aes(
    x = nvalues, y = as.numeric(median),
    colour = expression,
    group = interaction(rep, expression)
  ) +
  geom_line() +
  scale_y_log10(name = "Seconds") +
  scale_x_log10(name = "Number of values")

If there are only 10 possible unique values, the palette(x) method is no longer faster, but on par with the current method. Replacing the matching strategy therefore seems beneficial, or at worst neutral, regardless of how unique the data are.
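The two benchmark setups differ mainly in how many distinct values the classic path hands to palette(). The sketch below reuses the two data-generating expressions from the benchmarks to count them; the seed and `n = 1e5` are arbitrary choices for illustration.

```r
set.seed(1)
n <- 1e5

# First benchmark: continuous draws, so nearly every value is distinct
# (a handful of ties are possible given runif()'s finite resolution).
mostly_unique <- runif(n, 10, 100)
# Second benchmark: n draws from a pool of only 10 values.
few_unique <- sample(runif(10, 10, 100), n, replace = TRUE)

# How many values palette() would see after the classic unique() step.
n_unique_mostly <- length(unique(mostly_unique))
n_unique_few    <- length(unique(few_unique))

c(mostly_unique = n_unique_mostly, few_unique = n_unique_few)
```

One plausible reading of the results: in the first case the unique-then-match step saves almost no palette work while still paying for hashing and matching, whereas in the second case palette() runs on at most 10 values, which offsets that extra bookkeeping.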

bm <- bench::press(
  nvalues = nvalues,
  rep = 1:2,
  {
    values <- sample(runif(10, 10, 100), nvalues, replace = TRUE)
    bench::mark(
      classic  = classic( values, lim, pal),
      direct   = direct(  values, lim, pal),
      assigned = assigned(values, lim, pal),
      min_iterations = 10
    )
  }
)

bm$expression <- as.character(bm$expression)

ggplot(bm) +
  aes(
    x = nvalues, y = as.numeric(median),
    colour = expression,
    group = interaction(rep, expression)
  ) +
  geom_line() +
  scale_y_log10(name = "Seconds") +
  scale_x_log10(name = "Number of values")

Created on 2023-11-08 with reprex v2.0.2

A caveat to these benchmarks is that there are no NAs to replace, but I've checked that the vec_assign() advantage persists when there are actual NA values.
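The NA-bearing benchmark itself is not shown in this thread, so the following is only a sketch of how such a check could be set up, not the one that was actually run. The colour pool, the 10% missing rate and the seed are made-up assumptions; it requires vctrs and bench.

```r
library(vctrs)
library(bench)

set.seed(1)
n <- 1e5
# Made-up palette output with roughly 10% missing colours to fill.
scaled <- sample(c("#440154", "#21908C", "#FDE725"), n, replace = TRUE)
scaled[sample(n, n / 10)] <- NA
na.value <- "grey50"

# bench::mark() checks by default that both expressions return equal
# results, so this also acts as a correctness check.
res <- bench::mark(
  ifelse     = ifelse(!is.na(scaled), scaled, na.value),
  vec_assign = vec_assign(scaled, is.na(scaled), na.value),
  min_iterations = 10
)
res[, c("expression", "median", "mem_alloc")]
```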

@teunbrand
Collaborator Author

I benchmarked incorrectly; never mind this, and sorry for the noise.
