Add more metrics for processing latency #5974
Conversation
Writing has a bunch of outliers that are much slower than I expected 😅
🎉
I had some questions, one's a comment, and the other is: is there a reason we can't filter by pod on the new metrics in Grafana?
But no blockers 🙂
metrics.processingLatency(
    metadata.getRecordType(), event.getTimestamp(), ActorClock.currentTimeMillis());
final long processingStartTime = ActorClock.currentTimeMillis();
metrics.processingLatency(metadata.getRecordType(), event.getTimestamp(), processingStartTime);
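For context, here is a minimal sketch of what such a processingLatency helper could look like with the Prometheus Java client. The class shape, metric name, buckets, and import path are assumptions for illustration, not the actual Zeebe implementation; only RecordType, the partition label, and the millisecond timestamps come from the diff above.

import io.prometheus.client.Histogram;
import io.zeebe.protocol.record.RecordType; // package path may differ per Zeebe version

final class StreamProcessorMetricsSketch {

  // Histogram of the time between a record being written and its processing starting,
  // labeled by record type and partition. Name and help text are illustrative only.
  private static final Histogram PROCESSING_LATENCY =
      Histogram.build()
          .namespace("zeebe")
          .name("stream_processor_latency")
          .help("Time between a record being written and processing starting, in seconds")
          .labelNames("recordType", "partition")
          .register();

  private final String partitionIdLabel;

  StreamProcessorMetricsSketch(final int partitionId) {
    this.partitionIdLabel = String.valueOf(partitionId);
  }

  void processingLatency(
      final RecordType recordType, final long writtenMs, final long processingStartedMs) {
    // both timestamps are epoch milliseconds; Prometheus histograms conventionally observe seconds
    PROCESSING_LATENCY
        .labels(recordType.name(), partitionIdLabel)
        .observe((processingStartedMs - writtenMs) / 1000f);
  }
}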
Can you explain the choice to include deserialization as part of the processing time?
As a first step, I did not want to narrow the scope of the metric. If the processing duration is higher than what we expect, then we can narrow the metric down to smaller blocks. Would you like to take deserialization out of the processing time?
I mostly wanted to know if it was a conscious choice, and if so, why. Do you think it might be confusing/unexpected that deserializing is part of the processing time? I think it might be a little unexpected, but once you know, it doesn't sound out of place imho; you can make a case that it is part of the processing.
Actually, I wanted to include the complete processing, from when this event is ready to process until the next event is ready. This would give us an idea of how much time we spend in the StreamProcessor. So that should include the steps updateState and writeEvent, which are not currently included in the processing time. Wdyt? Shall I update it? Then it wouldn't be weird to have deserializing also be part of the processing time.
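Roughly, such a wider measurement could look like the fragment below, where the timer spans everything from the record being ready until the follow-up work is done. Method and type names like deserializeMetadata, processRecord, updateState, and writeFollowUpEvents are hypothetical placeholders for the actual steps, not the real StreamProcessor code; only ActorClock and metrics.processingDuration come from this PR.

// Hypothetical shape of the processing loop: the measured span starts before
// deserialization and ends only after the state update and follow-up writes, so the
// whole time spent per record in the stream processor is captured.
private void processEvent(final LoggedEvent event) {
  final long processingStartTime = ActorClock.currentTimeMillis();

  final RecordMetadata metadata = deserializeMetadata(event); // deserialization included
  final ProcessingResult result = processRecord(event, metadata);
  updateState(result);          // included in the measured span
  writeFollowUpEvents(result);  // included in the measured span

  metrics.processingDuration(
      metadata.getRecordType(), processingStartTime, ActorClock.currentTimeMillis());
}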
That makes more sense - we can make it more granular by adding a metric per step later on (i.e. updateState, writeFollowUpEvents, etc.). If we add one here, can we also add one in reprocessing? Both will be very useful when refactoring how we do stream processing next quarter 👍
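One possible shape for that later, more granular iteration (purely a sketch, not something in this PR) is a single histogram with a step label instead of a separate metric per step; all names below are assumptions.

import io.prometheus.client.Histogram;

final class StepDurationSketch {

  // One duration histogram labeled by processing step, so dashboards can break the
  // total processing time down into e.g. "updateState" and "writeFollowUpEvents".
  private static final Histogram STEP_DURATION =
      Histogram.build()
          .namespace("zeebe")
          .name("stream_processor_step_duration")
          .help("Duration of a single stream processing step, in seconds")
          .labelNames("step", "partition")
          .register();

  void observeStep(
      final String step, final String partitionIdLabel, final long startMs, final long endMs) {
    STEP_DURATION.labels(step, partitionIdLabel).observe((endMs - startMs) / 1000f);
  }
}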
I just copied the existing panels for processing latency and adjusted the descriptions. I guess they were not filtering by pod. But I can add it on both the old and new metrics.
@npepinpe I have updated the processing duration calculation.
🎉
public void processingDuration(
    final RecordType recordType, final long started, final long processed) {
  PROCESSING_DURATION
      .labels(recordType.name(), partitionIdLabel)
      // the diff excerpt ends here; presumably the histogram observes the elapsed time in seconds
      .observe((processed - started) / 1000f);
}
I think it would be interesting to find hotspots by adding value type/intent as labels; however, I'm not sure if that creates too many dimensions / a data explosion. We can definitely do that as a second iteration though.
Yes. We can improve it when we need it.
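For reference, adding value type and intent would only be a small change to the label set; the sketch below is hypothetical (names are not from this PR) and mainly shows why cardinality grows: the series count becomes recordType × valueType × intent × partition.

import io.prometheus.client.Histogram;

final class LabelledDurationSketch {

  // Hypothetical second iteration: extra valueType/intent labels make hotspots visible,
  // but every additional label multiplies the number of time series Prometheus has to store.
  private static final Histogram PROCESSING_DURATION =
      Histogram.build()
          .namespace("zeebe")
          .name("stream_processor_processing_duration")
          .help("Processing duration per record, in seconds")
          .labelNames("recordType", "valueType", "intent", "partition")
          .register();

  void processingDuration(
      final String recordType,
      final String valueType,
      final String intent,
      final String partition,
      final long startedMs,
      final long processedMs) {
    PROCESSING_DURATION
        .labels(recordType, valueType, intent, partition)
        .observe((processedMs - startedMs) / 1000f);
  }
}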
Force-pushed from 81740e0 to 05dc428.
bors r+
Build succeeded:
Description
Since we have been speculating about several root causes/solutions for performance bottlenecks, I thought it would be good to add the following metrics:
Related issues
Definition of Done
Not all items need to be done, depending on the issue and the pull request.
Code changes:
The change is backported by adding the label backport stable/0.25 to the PR; in case that fails, you need to create backports manually.
Testing:
Documentation: