Optimize Histogram.Boundaries.exponential #7966
Conversation
```scala
@@ -223,6 +223,16 @@ object MetricSpec extends ZIOBaseSpec {
        r1 <- base.tagged(MetricLabel("dyn", "x")).value
        r2 <- base.tagged(MetricLabel("dyn", "xyz")).value
      } yield assertTrue(r0.count == 0L, r1.count == 1L, r2.count == 1L)
    },
    test("linear boundaries") {
```
Turns out linear boundaries are good as is, but I've already written a test and decided not to remove it.
Thank you! 🙏
@myazinn Can you add this benchmark in your pull request?
This only occurs when a metric is created, so it shouldn't be material to most applications, though it's fine to optimize it.
Force-pushed from 2ca10d9 to 53865fc.
I think if we are going to add a benchmark, it should actually be a benchmark of using a histogram versus just two arbitrary operations on chunks.
Also agree with you that 1,000 buckets with an exponential distribution does not seem realistic. Even with 64 buckets and a factor of two, the largest bucket would be larger than
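A quick check of the arithmetic in the comment above (a hypothetical Python sketch, not code from this PR): with a start of 1 and a factor of 2, the largest of 64 exponential boundaries is 2^63, which already exceeds `Long.MaxValue` on the JVM.

```python
# Sanity check for the bucket-size claim above: 64 exponential buckets,
# start = 1, factor = 2, so boundaries are 2^0, 2^1, ..., 2^63.
LONG_MAX = 2 ** 63 - 1          # JVM Long.MaxValue
largest_boundary = 1 * 2 ** 63  # the 64th boundary

print(largest_boundary > LONG_MAX)  # → True
```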
@adamgfraser yes, but if not treated carefully, metrics might be created quite often. E.g.

```scala
def updateLag(topic: String, partition: Int): Histogram[Long] =
  Metric
    .histogram("topic_lag", Boundaries.exponential(1, 2, 20))
    .contramap[Long](_.toDouble)
    .tagged(MetricLabel("topic", topic), MetricLabel("partition", partition.toString))

val consumer: Consumer = ??? // from zio-kafka

def process(a: Any): UIO[Unit] = ZIO.debug(a)

def consume(r: ConsumerRecord[_, _]): UIO[Unit] =
  for {
    now <- Clock.currentTime(ChronoUnit.MILLIS)
    _   <- updateLag(r.topic, r.partition).update(now - r.timestamp)
    _   <- process(r.value)
  } yield ()

consumer.consumeWith(topics("topic1", "topic2"), Serde.byteArray, Serde.byteArray)(consume(_))
```

There are ways it can be mitigated (e.g. "cache" the metric by making a
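The caching mitigation hinted at above can be sketched generically. This is an illustrative Python sketch, not the ZIO API; `make_histogram` and `update_lag_metric` are hypothetical names standing in for the Scala code:

```python
from functools import lru_cache

def make_histogram(topic: str, partition: int) -> dict:
    # Stand-in for an expensive metric construction: materializing the
    # exponential boundary array is the costly part being discussed.
    boundaries = [2.0 ** i for i in range(20)]  # like Boundaries.exponential(1, 2, 20)
    return {"name": "topic_lag", "labels": (topic, partition), "boundaries": boundaries}

@lru_cache(maxsize=None)
def update_lag_metric(topic: str, partition: int) -> dict:
    # Construction runs once per (topic, partition); later calls are cache hits.
    return make_histogram(topic, partition)

assert update_lag_metric("topic1", 0) is update_lag_metric("topic1", 0)
```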
Could you please give an example? Not sure I understand what to do.
Agree, but at the moment
53865fc
to
ca36395
Compare
@myazinn Metrics are already automatically cached, so in the example above the metric would only be constructed once for every topic and partition, and hopefully we are processing a lot more than one message per topic and partition, so again I think it would not be material. An example would be: create a histogram and update it with a bunch of values. Will take a look at
@adamgfraser I could be wrong, but it doesn't seem like it's cached, at least not in the way that I'm talking about.
No, the metric is only created a single time, but the metric key may have to be created multiple times.
@adamgfraser oh, yeah, you are right, it's all about the metric key. Sorry.
Yes, the real problem is that we are not being lazy enough in the construction of the chunk in the metric key. Will take a look at that, but we may be limited in our ability to fix it due to binary compatibility. Anyway, here is a benchmark that we can use:

```scala
package zio

import org.openjdk.jmh.annotations.{Scope => JScope, _}
import zio.metrics._

import java.util.concurrent.TimeUnit

@State(JScope.Thread)
@BenchmarkMode(Array(Mode.Throughput))
@OutputTimeUnit(TimeUnit.SECONDS)
@Measurement(iterations = 5, timeUnit = TimeUnit.SECONDS, time = 3)
@Warmup(iterations = 5, timeUnit = TimeUnit.SECONDS, time = 3)
@Fork(value = 3)
class MetricBenchmarks {

  @Benchmark
  def exponentialStatic(): Unit = {
    val metric = Metric.histogram("exponential", MetricKeyType.Histogram.Boundaries.exponential(1.0, 2.0, 64))
    var i      = 0
    while (i <= 100000) {
      metric.update(i.toDouble)
      i += 1
    }
  }

  @Benchmark
  def exponentialDynamic(): Unit = {
    var i = 0
    while (i <= 100000) {
      Metric.histogram("exponential", MetricKeyType.Histogram.Boundaries.exponential(1.0, 2.0, 64)).update(i.toDouble)
      i += 1
    }
  }
}
```

Constructing the array more efficiently is definitely faster, but creating the metric once is dramatically faster.
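The "not being lazy enough" point can be sketched outside of Scala too. This Python sketch (hypothetical, not the ZIO implementation) defers materializing the boundary array until it is first read, so constructing many metric keys stays cheap:

```python
class LazyBoundaries:
    """Boundary list whose values are computed on first access only."""

    def __init__(self, start: float, factor: float, count: int):
        self._args = (start, factor, count)
        self._values = None  # not materialized yet

    @property
    def values(self) -> list:
        if self._values is None:
            start, factor, count = self._args
            out, v = [], start
            for _ in range(count):
                out.append(v)
                v *= factor
            self._values = out
        return self._values

b = LazyBoundaries(1.0, 2.0, 4)
print(b._values)  # → None (nothing computed at construction time)
print(b.values)   # → [1.0, 2.0, 4.0, 8.0]
```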
Thank you for the help with the benchmark :) Will update the MR with it soon. Yeah, I agree that it's faster; I was about to send my updated example on how it can be fixed.
Totally! Creating metrics statically is always going to be faster, but we definitely want creating metrics dynamically to be as fast as possible.
Force-pushed from ca36395 to b4fd319.
I've updated the MR, thanks again for helping with the benchmark.
That's a great point. I was thinking of the
I think something like this:

```scala
package zio

import org.openjdk.jmh.annotations.{Scope => JScope, _}
import zio.BenchmarkUtil._
import zio.metrics._

import java.util.concurrent.TimeUnit

@State(JScope.Thread)
@BenchmarkMode(Array(Mode.Throughput))
@OutputTimeUnit(TimeUnit.SECONDS)
@Measurement(iterations = 5, timeUnit = TimeUnit.SECONDS, time = 3)
@Warmup(iterations = 5, timeUnit = TimeUnit.SECONDS, time = 3)
@Fork(value = 3)
class MetricBenchmarks {

  @Benchmark
  def exponentialStatic(): Unit = {
    val metric = Metric.histogram("exponential", MetricKeyType.Histogram.Boundaries.exponential(1.0, 2.0, 64))
    unsafeRun(ZIO.foreachDiscard(1 to 100000)(i => metric.update(i.toDouble)))
  }

  @Benchmark
  def exponentialDynamic(): Unit =
    unsafeRun(
      ZIO.foreachDiscard(1 to 100000)(i =>
        Metric.histogram("exponential", MetricKeyType.Histogram.Boundaries.exponential(1.0, 2.0, 64)).update(i.toDouble)
      )
    )
}
```
Force-pushed from b4fd319 to 540c9bf.
done
👍
Depending on the number of required buckets, creation of `Boundaries` might be up to 2-3 times faster. `Metric.timer` will benefit a lot from it.
Here's a benchmark that I used (I doubt it's reasonable to use 1k buckets; it's here just for completeness).
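The kind of speedup described above can be illustrated with a small sketch (Python, hypothetical; the actual change is in ZIO's Scala code and may differ). Computing each boundary with a fresh exponentiation does redundant work, while a running product needs one multiplication per bucket; both produce the same boundaries:

```python
def exponential_pow(start: float, factor: float, count: int) -> list:
    # Naive shape: one floating-point exponentiation per boundary.
    return [start * factor ** i for i in range(count)]

def exponential_running(start: float, factor: float, count: int) -> list:
    # Optimized shape: a running product, one multiply per boundary.
    out, v = [], start
    for _ in range(count):
        out.append(v)
        v *= factor
    return out

# Powers of two are exact in floating point, so the results match exactly.
assert exponential_pow(1.0, 2.0, 20) == exponential_running(1.0, 2.0, 20)
```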