
Optimize Histogram.Boundaries.exponential #7966

Merged: 1 commit into zio:series/2.x on Mar 30, 2023

Conversation

@myazinn (Contributor) commented Mar 30, 2023

Depending on the number of required buckets, creating Boundaries can be 2–3 times faster. Metric.timer will benefit a lot from it.
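For context, here is a rough before/after sketch of what the change amounts to (approximate and illustrative only, not the actual diff; it assumes Boundaries.fromChunk and uses made-up method names so the two versions can sit side by side):

import zio.Chunk
import zio.metrics.MetricKeyType.Histogram.Boundaries

object ExponentialSketch {
  // Before (approximate): every boundary is recomputed from scratch via Math.pow.
  def exponentialBefore(start: Double, factor: Double, count: Int): Boundaries =
    Boundaries.fromChunk(Chunk.fromArray(0.until(count).map(i => start * Math.pow(factor, i.toDouble)).toArray))

  // After (approximate): boundaries are built incrementally, multiplying the previous value by the factor.
  def exponentialAfter(start: Double, factor: Double, count: Int): Boundaries =
    Boundaries.fromChunk(Chunk.iterate(start, count)(_ * factor))
}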

Here's the benchmark I used (I doubt it's reasonable to use 1k buckets; it's included just for completeness).

import org.openjdk.jmh.annotations._
import zio.Chunk

import java.util.concurrent.TimeUnit

@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(1)
@Measurement(iterations = 10, timeUnit = TimeUnit.SECONDS, time = 1)
@Warmup(iterations = 10, timeUnit = TimeUnit.SECONDS, time = 1)
@Threads(1)
class BoundariesBench {

  val start = 1.0

  val factor = 2.0

  @Param(Array("1", "10", "100", "1000"))
  var count: Int = _

  // Current implementation: recompute each boundary from scratch with Math.pow.
  // Returning the chunk (rather than Unit) keeps JMH from dead-code-eliminating the work.
  @Benchmark
  def Current: Chunk[Double] = Chunk.fromArray(0.until(count).map(i => start * Math.pow(factor, i.toDouble)).toArray)

  // Proposed implementation: build the boundaries incrementally with Chunk.iterate.
  @Benchmark
  def Iterate: Chunk[Double] = Chunk.iterate(start, count)(_ * factor)

}
[info] Benchmark                (count)  Mode  Cnt      Score     Error  Units
[info] BoundariesBench.Current        1  avgt   10     53.327 ±   3.942  ns/op
[info] BoundariesBench.Current       10  avgt   10    291.636 ±   3.451  ns/op
[info] BoundariesBench.Current      100  avgt   10   3042.538 ±  76.611  ns/op
[info] BoundariesBench.Current     1000  avgt   10  35358.496 ± 510.720  ns/op
[info] BoundariesBench.Iterate        1  avgt   10     44.207 ±   0.927  ns/op
[info] BoundariesBench.Iterate       10  avgt   10    146.497 ±   8.731  ns/op
[info] BoundariesBench.Iterate      100  avgt   10    955.949 ±  20.623  ns/op
[info] BoundariesBench.Iterate     1000  avgt   10   7497.474 ± 143.413  ns/op

@@ -223,6 +223,16 @@ object MetricSpec extends ZIOBaseSpec {
        r1 <- base.tagged(MetricLabel("dyn", "x")).value
        r2 <- base.tagged(MetricLabel("dyn", "xyz")).value
      } yield assertTrue(r0.count == 0L, r1.count == 1L, r2.count == 1L)
    },
    test("linear boundaries") {
@myazinn (Contributor, Author) commented on the diff:
Turns out linear boundaries are fine as is, but I had already written a test and decided not to remove it.

adamgfraser previously approved these changes Mar 30, 2023

@adamgfraser (Contributor) left a comment:
Thank you! 🙏

@jdegoes (Member) commented Mar 30, 2023

@myazinn Can you add this benchmark to your pull request?

@adamgfraser (Contributor) commented:

This only occurs when a metric is created, so it shouldn't be material to most applications, though it's fine to optimize it.

@adamgfraser (Contributor) left a comment:

I think if we are going to add a benchmark, it should actually be a benchmark of using a histogram, rather than just two arbitrary operations on chunks.

@adamgfraser (Contributor) commented:

Also agree with you that 1,000 buckets with an exponential distribution does not seem realistic. Even with 64 buckets and a factor of two, the largest bucket would be larger than Long.MaxValue.
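(As a quick aside, a check of that claim: with start = 1 and factor = 2, the 64th and last boundary is 2^63, one more than Long.MaxValue.)

// Aside: verifying the claim above. With start = 1 and factor = 2,
// the 64th (last) boundary is 1 * 2^63, which is Long.MaxValue + 1.
val largestBoundary = BigInt(2).pow(63)               // 9223372036854775808
val exceedsLong     = largestBoundary > Long.MaxValue // true, since Long.MaxValue = 2^63 - 1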

@myazinn (Contributor, Author) commented Mar 30, 2023

> This only occurs when a metric is created so it shouldn't be material to most applications though fine to optimize it.

@adamgfraser yes, but if not treated carefully, metrics might be created quite often. E.g.

def updateLag(topic: String, partition: Int): Histogram[Long] =
  Metric
    .histogram("topic_lag", Boundaries.exponential(1, 2, 20))
    .contramap[Long](_.toDouble)
    .tagged(MetricLabel("topic", topic), MetricLabel("partition", partition.toString))

val consumer: Consumer = ??? // from zio-kafka

def process(a: Any): UIO[Unit] = ZIO.debug(a)

def consume(r: ConsumerRecord[_, _]): UIO[Unit] =
  for {
    now <- Clock.currentTime(ChronoUnit.MILLIS)
    _ <- updateLag(r.topic, r.partition).update(now - r.timestamp)
    _ <- process(r.value)
  } yield ()

consumer.consumeWith(topics("topic1", "topic2"), Serde.byteArray, Serde.byteArray)(consume(_))

There are ways to mitigate it (e.g. "cache" the metric by putting the untagged metric in a val and then just adding the tags in a def), but it's still quite a common issue.
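To make that mitigation concrete, a minimal sketch (illustrative only; the object and value names are made up, this is not code from the PR): the untagged histogram lives in a val, so its boundaries are computed once, and only the cheap tagging happens per call.

import zio.metrics._

object TopicLagMetrics {
  // Built once: Boundaries.exponential runs a single time for the lifetime of the app.
  private val topicLagBase: Metric.Histogram[Long] =
    Metric
      .histogram("topic_lag", MetricKeyType.Histogram.Boundaries.exponential(1, 2, 20))
      .contramap[Long](_.toDouble)

  // Per call we only attach tags, which is cheap compared to rebuilding the boundaries.
  def updateLag(topic: String, partition: Int): Metric.Histogram[Long] =
    topicLagBase.tagged(MetricLabel("topic", topic), MetricLabel("partition", partition.toString))
}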

> It should actually be a benchmark of using a histogram versus just two arbitrary operations on chunks.

Could you please give an example? I'm not sure I understand what to do.

> Even with 64 buckets and a factor of two the largest bucket would be larger than Long.MaxValue.

Agreed, but at the moment Metric.timer uses 100 buckets by default 🤔 Not 1k, but still a lot.

@myazinn force-pushed the optimize_Boundaries.exponential branch from 53865fc to ca36395 on March 30, 2023 at 19:53
@adamgfraser (Contributor) commented:

@myazinn Metrics are already automatically cached, so in the example above the metric would only be constructed once for every topic and partition, and hopefully we are processing a lot more than one message for each topic and partition, so again I think it would not be material.

An example would be creating a histogram and updating it with a bunch of values.

Will take a look at Metric.timer.

@myazinn (Contributor, Author) commented Mar 30, 2023

@adamgfraser I could be wrong, but it doesn't seem to be cached, at least not in the way I'm talking about.
We create a new metric object each time we call updateLag in my example; the same is true for timers. Here's what I'm talking about:
https://scastie.scala-lang.org/ZMBWbjCXS26wlIsINYVcsQ
A new metric object means new boundaries each time, which is expensive right now.
Same here:
https://scastie.scala-lang.org/OURb5kfoRUidqh9Vs9uqUg
I'm assuming you are talking about calling .update(???) multiple times on an already created object. In that case the object is constructed only once. But it's too easy to accidentally construct a new metric object on each call if you make it a def for any reason.

@adamgfraser (Contributor) commented:

No, the metric is only created a single time, but the metric key may have to be created multiple times.

@myazinn (Contributor, Author) commented Mar 30, 2023

@adamgfraser oh, yeah, you are right, it's all about the metric key. Sorry.
But anyway, we need boundaries to call zio.metrics.MetricKey#histogram, and while it is possible to store the metric key in a val and use it to create a metric with different tags (though there's no such method for timer right now), it's a bit awkward and I imagine a lot of people wouldn't do that.
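For readers who haven't seen that pattern, a sketch of what caching the key itself could look like (again illustrative and not from the PR; it assumes MetricKey.histogram, MetricKey#tagged, and Metric.fromMetricKey as they exist in ZIO 2.x, and the names are made up): the key, and with it the boundaries chunk, is built once, and only the tags vary per call.

import zio.metrics._

object TopicLagKey {
  // The metric key (and its boundaries chunk) is constructed exactly once.
  private val baseKey =
    MetricKey.histogram("topic_lag", MetricKeyType.Histogram.Boundaries.exponential(1, 2, 20))

  // Per call: tag the existing key and wrap it in a Metric; the boundaries are reused by reference.
  def topicLag(topic: String, partition: Int) = {
    val taggedKey = baseKey.tagged(MetricLabel("topic", topic), MetricLabel("partition", partition.toString))
    Metric.fromMetricKey(taggedKey).contramap[Long](_.toDouble)
  }
}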

@adamgfraser (Contributor) commented:

Yes, the real problem is that we are not being lazy enough in the construction of the chunk in the metric key. Will take a look at that, but we may be limited in our ability to fix it due to binary compatibility.

Anyway, here is a benchmark that we can use:

package zio

import org.openjdk.jmh.annotations.{Scope => JScope, _}
import zio.metrics._

import java.util.concurrent.TimeUnit

@State(JScope.Thread)
@BenchmarkMode(Array(Mode.Throughput))
@OutputTimeUnit(TimeUnit.SECONDS)
@Measurement(iterations = 5, timeUnit = TimeUnit.SECONDS, time = 3)
@Warmup(iterations = 5, timeUnit = TimeUnit.SECONDS, time = 3)
@Fork(value = 3)
class MetricBenchmarks {

  @Benchmark
  def exponentialStatic(): Unit = {
    val metric = Metric.histogram("exponential", MetricKeyType.Histogram.Boundaries.exponential(1.0, 2.0, 64))
    var i = 0
    while (i <= 100000) {
      metric.update(i.toDouble)
      i += 1
    }
  }

  @Benchmark
  def exponentialDynamic(): Unit = {
    var i = 0
    while (i <= 100000) {
      Metric.histogram("exponential", MetricKeyType.Histogram.Boundaries.exponential(1.0, 2.0, 64)).update(i.toDouble)
      i += 1
    }
  }
}

Constructing the array more efficiently is definitely faster, but creating the metric once is dramatically faster.

@myazinn (Contributor, Author) commented Mar 30, 2023

Thank you for the help with the benchmark :) Will update the MR with it soon.

Yeah, I agree that it's faster. I was about to send my updated example of how it can be fixed:
https://scastie.scala-lang.org/t5RgCI4wTdqSDLLwOXB7Ag
But again, it's easy to forget about it.

@adamgfraser (Contributor) commented:

Totally! Creating metrics statically is always going to be faster, but we definitely want creating metrics dynamically to be as fast as possible.

@myazinn force-pushed the optimize_Boundaries.exponential branch from ca36395 to b4fd319 on March 30, 2023 at 20:55
@myazinn (Contributor, Author) commented Mar 30, 2023

I've updated the MR, thanks again for helping with the benchmark.
BTW, since .update is an effect, its result is unused and it doesn't actually update the metric. I think it doesn't really matter for this benchmark anyway, but highlighting it just in case.

adamgfraser previously approved these changes Mar 30, 2023

@adamgfraser (Contributor) commented:

That's a great point. I was thinking of the unsafe variant. Yes, we should compose the workflows and run them.

@myazinn (Contributor, Author) commented Mar 30, 2023

IntelliJ has an inspection for it now 😄 Alright, let me update the MR.
[screenshot of the IntelliJ inspection]

@adamgfraser (Contributor) commented:

I think something like this:

package zio

import org.openjdk.jmh.annotations.{Scope => JScope, _}
import zio.BenchmarkUtil._
import zio.metrics._

import java.util.concurrent.TimeUnit

@State(JScope.Thread)
@BenchmarkMode(Array(Mode.Throughput))
@OutputTimeUnit(TimeUnit.SECONDS)
@Measurement(iterations = 5, timeUnit = TimeUnit.SECONDS, time = 3)
@Warmup(iterations = 5, timeUnit = TimeUnit.SECONDS, time = 3)
@Fork(value = 3)
class MetricBenchmarks {

  @Benchmark
  def exponentialStatic(): Unit = {
    val metric = Metric.histogram("exponential", MetricKeyType.Histogram.Boundaries.exponential(1.0, 2.0, 64))
    unsafeRun(ZIO.foreachDiscard(1 to 100000)(i => metric.update(i.toDouble)))
  }

  @Benchmark
  def exponentialDynamic(): Unit =
    unsafeRun(
      ZIO.foreachDiscard(1 to 100000) { i =>
        Metric
          .histogram("exponential", MetricKeyType.Histogram.Boundaries.exponential(1.0, 2.0, 64))
          .update(i.toDouble)
      }
    )
}

@myazinn (Contributor, Author) commented Mar 30, 2023

Done.
And I've also moved it to zio.metrics instead of just zio, hope you don't mind :)

@adamgfraser (Contributor) commented:

👍

@adamgfraser merged commit 171b825 into zio:series/2.x on Mar 30, 2023