-
Notifications
You must be signed in to change notification settings - Fork 900
Implement span started and live health metrics #7430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
exporters/common/src/main/java/io/opentelemetry/exporter/internal/grpc/GrpcExporterBuilder.java
Show resolved
Hide resolved
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7430 +/- ##
============================================
+ Coverage 89.82% 89.84% +0.02%
- Complexity 6998 7026 +28
============================================
Files 798 801 +3
Lines 21199 21297 +98
Branches 2055 2062 +7
============================================
+ Hits 19041 19135 +94
- Misses 1496 1498 +2
- Partials 662 664 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
||
@Test | ||
void doNotCrash() { | ||
Span span = new NonRecordingSpan(DUMMY_CONTEXT, mockRecording); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is pretty much a copy of PropagatedSpanTest.doNotCrash. I don't think it's very useful, but it keeps the CodeCov bot happy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid stuff. One recommendation about renaming to *Instrumentation - since there's going to be more and more internal telemetry, we should try to standardize naming / patterns where we can.
Thanks!
* Sets the {@link MeterProvider} supplier used to collect self-monitoring metrics. If not set, | ||
* uses {@link GlobalOpenTelemetry#getMeterProvider()}. | ||
*/ | ||
public SdkTracerProviderBuilder setMeterProvider(Supplier<MeterProvider> meterProviderSupplier) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is the contract of other components, but I wonder if we may want to think bigger, and start accepting Supplier<OpenTelemetry>
for internal monitoring. That would position us well if there are semantic conventions that involve emitting traces or logs or events for internal monitoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 74fdbae. I changed it just at the API-level, the implementation still directly uses Supplier<MeterProvider>
for now, because this is the only part used currently..
I went for setInternalTelemetryOpenTelemetry
as name, because I feel setOpenTelemetry
would be confusing: It could make it look like the counterpart to SdkOpenTelemetryBuilder.setTracerProvider
.
sdk/trace/src/main/java/io/opentelemetry/sdk/trace/internal/metrics/SpanMetrics.java
Outdated
Show resolved
Hide resolved
@@ -74,6 +81,19 @@ public static SdkTracerProviderBuilder builder() { | |||
this.tracerConfigurator = tracerConfigurator; | |||
} | |||
|
|||
private static SpanMetrics createSpanMetrics( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's follow the pattern of ExporterInstrumentation
:
- Let's have a class called SpanInstrumentation (or maybe TracerProviderInstrumentation)
- Class should have a constructor which accepts
InternalTelemetryVersion
andSupplier<MeterProvider>
and initializes accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed in 6f4b0af.
However, I kept SpanInstrumentation
as an interface instead of class with a static create
method for now: It doesn't contain any logic except for create
, which is why I think making it a class and adding delegation like ExporterInstrumentation
would be unnecessary boilerplate. It however does allow us to change to this model later if we need it without touching the consumers of SpanInstrumentation
.
sdk/trace/src/main/java/io/opentelemetry/sdk/trace/NonRecordingSpan.java
Outdated
Show resolved
Hide resolved
sdk/trace/src/main/java/io/opentelemetry/sdk/trace/internal/metrics/SemConvSpanMetrics.java
Outdated
Show resolved
Hide resolved
|
||
@Override | ||
public synchronized void recordSpanEnd() { | ||
if (endAlreadyReported) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we mark endAlreadyReported
concurrent safe? If a span is ended multiple times on different threads, then the state of these metrics could become corrupted. On the other hand, SdkSpan#endInternal
is already protected by a lock, so the only thing that could cause is is concurrent calls to end a span + NonRecordSpan, which isn't protected by a lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a @GuardedBy("this")
, is this what you meant? It was already concurrent safe because recordSpanEnd()
is synchronized
.
Also this is in fact necessary, because for NonRecordingSpan
end()
isn't synchronized.
sdk/trace/src/test/java/io/opentelemetry/sdk/trace/SdkSpanBuilderTest.java
Outdated
Show resolved
Hide resolved
I'm moving this back to draft, because this implementation actally violates the spec. See this comment. We'll fix the definition of the metrics first on that Sem-Conv PR, and then update this one after reaching consensus there. |
Ready for review again, as the spec changes from this merged semconv PR have now been applied:
|
Implements the semantic conventions SDK health metrics
otel.sdk.span.live
andotel.sdk.span.ended
otel.sdk.span.started
metrics.Follows the approach of #7265 that these metrics are currently disabled by default, but can be enabled via the
SdkTracerProviderBuilder
.Some examples of insights gained from these metrics:
end()
calls (via a steadily increasingotel.sdk.span.live
count)