-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Add more metric reporting to MSQ #18121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/WorkerImpl.java
Fixed
Show fixed
Hide fixed
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java
Outdated
Show resolved
Hide resolved
...ti-stage-query/src/main/java/org/apache/druid/msq/dart/controller/DartControllerContext.java
Outdated
Show resolved
Hide resolved
...core/multi-stage-query/src/main/java/org/apache/druid/msq/dart/worker/DartWorkerContext.java
Outdated
Show resolved
Hide resolved
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java
Outdated
Show resolved
Hide resolved
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/WorkerImpl.java
Outdated
Show resolved
Hide resolved
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/WorkerImpl.java
Outdated
Show resolved
Hide resolved
...ti-stage-query/src/main/java/org/apache/druid/msq/dart/controller/DartControllerContext.java
Outdated
Show resolved
Hide resolved
...ti-stage-query/src/main/java/org/apache/druid/msq/dart/controller/DartControllerContext.java
Outdated
Show resolved
Hide resolved
...sions-core/multi-stage-query/src/main/java/org/apache/druid/msq/counters/CounterTracker.java
Outdated
Show resolved
Hide resolved
...sions-core/multi-stage-query/src/main/java/org/apache/druid/msq/counters/CounterTracker.java
Outdated
Show resolved
Hide resolved
...ti-stage-query/src/main/java/org/apache/druid/msq/dart/controller/DartControllerContext.java
Outdated
Show resolved
Hide resolved
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/MSQMetricUtils.java
Outdated
Show resolved
Hide resolved
...ti-stage-query/src/main/java/org/apache/druid/msq/dart/controller/DartControllerContext.java
Outdated
Show resolved
Hide resolved
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/WorkerImpl.java
Outdated
Show resolved
Hide resolved
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/MSQMetricUtils.java
Outdated
Show resolved
Hide resolved
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/MSQMetricUtils.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few line comments. Please also include tests that verify the controller and workers are emitting the correct metrics with the correct dimensions. One way you can do that is:
- update
MSQTestControllerContext
andMSQTestWorkerContext
to haveemitMetric
write emitted metrics into some list - provide a mechanism for test cases to get the emitted metrics
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java
Outdated
Show resolved
Hide resolved
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/MSQMetricUtils.java
Outdated
Show resolved
Hide resolved
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/MSQMetricUtils.java
Outdated
Show resolved
Hide resolved
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/WorkerContext.java
Outdated
Show resolved
Hide resolved
...nsions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerContext.java
Show resolved
Hide resolved
context.emitMetric( | ||
"query/time", | ||
MSQMetricUtils.createQueryMetricDimensions(datasources, intervals, success), | ||
startTime - System.currentTimeMillis() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still feel that the context#emitMetric
method is odd; it could be something like:
MsqServiceMetricEventBuilder metricBuilder = new MsqServiceMetricEventBuilder(querySpec);
metricBuilder.setDimension(DruidMetrics.DATASOURCE, DefaultQueryMetrics.getTableNamesAsString(datasources));
metricBuilder.setDimension(DruidMetrics.INTERVAL, DefaultQueryMetrics.getIntervalsAsStringArray(intervals));
metricBuilder.setDimension(DruidMetrics.DURATION, BaseQuery.calculateDuration(intervals));
metricBuilder.setDimension(DruidMetrics.SUCCESS, success);
metricBuilder.setMetric("query/time", startTime - System.currentTimeMillis());
context.getEmitter().emit(metricBuilder);
probably MsqServiceMetricEventBuilder
could also provide a more natural home for those static methods which could become instance methods
..or alter the signature of context#emitrMetric
to context#emit(metricBuilder)
? so that the context could still
add stuff if it wants - but the caller has the control.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry about the above author :D it was me...I was just checking some github features ; which involved switching between 2 users ....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, that would work too. It's kind of equivalent (the metricBuilder
has a metric name, value, and dimensions in it) but bundled together more nicely. @adarshsanjeev - could you try this approach? It looks like it could be cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kgyrtkirk Made these changes. The function now takes a service emitter.
There's an awkward bit in TaskControllerContext
, where we have to check if the dataSource has already been provided in the metricBuilder before setting the default value, for backward compatibility. I wonder if it is worth adding yet another function of getDefaultMetricBuilder()
from the context instead, so that the caller can decide what all to override instead of this weird code.
Please let me know your thoughts.
extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQReplaceTest.java
Dismissed
Show dismissed
Hide dismissed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for the updates! looks good
left a few notes here and there - the IT failures are unrelated
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/MSQMetricUtils.java
Outdated
Show resolved
Hide resolved
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/MSQMetricUtils.java
Outdated
Show resolved
Hide resolved
Set<String> datasourceDim = (Set<String>) queryMetricDimensions.computeIfAbsent(DruidMetrics.DATASOURCE, s -> new HashSet<>()); | ||
Set<Interval> intervalDim = (Set<Interval>) queryMetricDimensions.computeIfAbsent(DruidMetrics.INTERVAL, s -> new HashSet<>()); | ||
for (StageDefinition stageDef : queryDef.getStageDefinitions()) { | ||
datasourceDim.addAll(MSQMetricUtils.getDatasources(stageDef)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is this a one time need - or maybe there should be queryDef.getDatasources
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, I believe this is a one time need. Now that I have changed getDatasources to a member of stageDefinition, maybe this isn't required.
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/WorkerImpl.java
Show resolved
Hide resolved
...stage-query/src/test/java/org/apache/druid/msq/dart/controller/http/DartSqlResourceTest.java
Show resolved
Hide resolved
.../multi-stage-query/src/test/java/org/apache/druid/msq/test/MSQTestOverlordServiceClient.java
Outdated
Show resolved
Hide resolved
* concurrency issue with StubServiceEmitter - #18121 have added a few new metrics which have increased the load on it and have caused random appearances of an issue arising from the fact it used an `ArrayList` under the hood. * added some catches to shut down queries properly in case some unexpected exceptions occur - this could give better exceptions and reduce time to fix in the future this should reduce the probability that `QTest` splits remain hanging
* concurrency issue with StubServiceEmitter - apache#18121 have added a few new metrics which have increased the load on it and have caused random appearances of an issue arising from the fact it used an `ArrayList` under the hood. * added some catches to shut down queries properly in case some unexpected exceptions occur - this could give better exceptions and reduce time to fix in the future this should reduce the probability that `QTest` splits remain hanging
* concurrency issue with StubServiceEmitter - #18121 have added a few new metrics which have increased the load on it and have caused random appearances of an issue arising from the fact it used an `ArrayList` under the hood. * added some catches to shut down queries properly in case some unexpected exceptions occur - this could give better exceptions and reduce time to fix in the future this should reduce the probability that `QTest` splits remain hanging Co-authored-by: Zoltan Haindrich <kirk@rxd.hu>
Adds reporting of additional metrics and dimensions to MSQ task and Dart.
Summary
Currently, MSQ reports the a few metrics. (
ingest/tombstones/count
,ingest/segments/count
andingest/input/bytes
).This PR adds a few new metrics:
query/time
: Reported by controller and worker at the end of the query.query/cpu/time
: Reported by each worker at the end of the query.Additionally, adds some more dimensions to the metrics emitted. Dimensions added to metrics:
queryId
sqlQueryId
engine
: Denotes the engine used for the query,msq-dart
ormsq-task
.dartQueryId
: (Dart only)type
: Alwaysmsq
dataSource
interval
duration
success
This PR has: