Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not shade opentelemetry-extension-incubator #20074

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

gaeljw
Copy link

@gaeljw gaeljw commented Dec 10, 2023

Description

Solves #19958

Otherwise, OTEL's HttpMetricsAdvice class checks don't work, resulting in many harmful labels being exposed in OkHttp metrics like the http_response_content_length and http_url.

I've tested the change against the reproduction repository at https://github.com/gaeljw/otel9972 and it does fix the issue.

Behavior before:

  • more than 1000 different metrics rows exported
  • example of such rows (Prometheus format): http_client_duration_milliseconds_bucket{otel_scope_name="io.opentelemetry.okhttp-3.0",http_method="GET",http_response_content_length="313",http_status_code="200",http_url="http://localhost:32769/v1/statement/queued/20231210_161430_00046_fhxz7/y1e8d019656c9d40982bc8bc8acd01931a65d7162/1",net_peer_name="localhost",net_peer_port="32769",net_protocol_name="http",net_protocol_version="1.1",user_agent_original="Trino JDBC Driver/434",le="0.0"} 0.0 1702224879638

Behavior with this change:

  • 80 different metrics rows exported
  • example of such rows: http_client_duration_milliseconds_bucket{otel_scope_name="io.opentelemetry.okhttp-3.0",http_method="GET",http_status_code="200",net_peer_name="localhost",net_peer_port="32771",net_protocol_name="http",net_protocol_version="1.1",le="0.0"} 0.0 1702225905514
  • there's no more the http_response_content_length, http_url and user_agent_original labels

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Section
* Fix OpenTelemetry metrics containing harmful labels (`http_response_content_length`, `http_url` and `user_agent_original`). ({issue}`19958`)

…rinodb#19958)

Otherwise, OTEL's HttpMetricsAdvice class checks doesn't work and this result in many harmful labels
to be exposed in OkHttp metrics like the http_response_content_length and http_url.

Fixes: trinodb#19958
Copy link

cla-bot bot commented Dec 10, 2023

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@github-actions github-actions bot added the jdbc Relates to Trino JDBC driver label Dec 10, 2023
Copy link
Member

@nineinchnick nineinchnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We include OTEL in JDBC to support propagating traces to the server. Registering client metrics is a side effect. I wonder if we should disable that.

I found this repo: https://github.com/open-telemetry/opentelemetry-java/tree/main/extensions/incubator and there are examples that this can be used with tracing too, so I think this is the right change.

@gaeljw
Copy link
Author

gaeljw commented Dec 12, 2023

If there's a way to keep traces and disable metrics, this could be interesting in the context of Trino but I'm not sure it's possible and also in a auto-instrumentation context I guess people would expect to have the OkHttp metrics.

FYI I've sent the files for the CLA, not sure how long it will take to get approved. I believe this PR still makes sense.

@electrum
Copy link
Member

I'm not sure this change is correct. We have a compile time dependency on opentelemetry-okhttp-3.0 which has a compile time dependency on opentelemetry-instrumentation-api which in turn has a runtime dependency on opentelemetry-extension-incubator. I assume that the instrumentation code will fail at runtime if that dependency is not present.

In NonRegisteringTrinoDriver, we try to instantiate OkHttpTelemetry while ignoring NoClassDefFoundError. Is that code guaranteed to throw NoClassDefFoundError if opentelemetry-extension-incubator is not present on the classpath? And can the shaded instrumentation code work correctly with a non-shaded version of opentelemetry-extension-incubator?

At a minimum, we would need integration tests to verify that the JDBC driver still works correctly when the OpenTelemetry API is on the classpath but the extension JAR is not.

@electrum
Copy link
Member

I also don't understand the actual problem here. We're shading both io.opentelemetry.extension and io.opentelemetry.instrumentation, so the instanceof checks in HttpMetricsAdvice should work. This could be verified in a debugger.

Ideally, we'd like to have both trace propagation and HTTP client traces when the OpenTelemetry API is on the classpath.

We've never considered OpenTelemetry metrics as we don't use these in Trino, so the fact that these are collected is accidental. Someone such as yourself who is more familiar with how metrics should work will need to help to make sure they are working appropriately. For now, it's probably best to disable them, since it sounds like they are more harmful than helpful.

The nature of JDBC drivers introduces constraints that necessitates the shading and complex dependency handling:

  • Must work standalone, without OpenTelemetry or any other dependency on the classpath. Applications will simply add the JAR to their classpath and expect it to work. There is no facility for requiring external dependencies.
  • Must not conflict with an application's own dependencies. This is why OkHttp, etc., are shaded. Otherwise, for example, the driver would conflict with an application that used its own version of OkHttp.
  • Must not depend on a specific version of OpenTelemetry. We expect/hope that the core OpenTelemetry API is stable and won't conflict with whatever version an application happens to be using. This isn't true for instrumentation, extensions, etc., which are rapidly changing, so we shade those.

If we can't easily disable metrics collection, we might be forced to remove the OkHttp instrumentation and instead do manual trace propagation (and HTTP client trace collection if that is useful).

@gaeljw
Copy link
Author

gaeljw commented Dec 13, 2023

Thanks for the feedbacks @electrum

Is that code guaranteed to throw NoClassDefFoundError if opentelemetry-extension-incubator is not present on the classpath?
(...)
At a minimum, we would need integration tests to verify that the JDBC driver still works correctly when the OpenTelemetry API is on the classpath but the extension JAR is not.

I'll check that.

And can the shaded instrumentation code work correctly with a non-shaded version of opentelemetry-extension-incubator?

I believe the manual integration tests I did are proving this works.

I also don't understand the actual problem here. We're shading both io.opentelemetry.extension and io.opentelemetry.instrumentation, so the instanceof checks in HttpMetricsAdvice should work. This could be verified in a debugger.

I can run the reproduction case and provide debugging details for this.

We've never considered OpenTelemetry metrics as we don't use these in Trino, so the fact that these are collected is accidental. Someone such as yourself who is more familiar with how metrics should work will need to help to make sure they are working appropriately. For now, it's probably best to disable them, since it sounds like they are more harmful than helpful.

I can give it more time to see if we could entirely disable metrics for the OkHttp calls made by Trino.
However I'm definitely not that familiar with this 🫤

@gaeljw
Copy link
Author

gaeljw commented Jan 1, 2024

I also don't understand the actual problem here. We're shading both io.opentelemetry.extension and io.opentelemetry.instrumentation, so the instanceof checks in HttpMetricsAdvice should work. This could be verified in a debugger.

FYI, I had a bit of time to debug the instanceof checks. I confirm it doesn't work because of shading.

The call stack is as follows:

applyOldClientDurationAdvice:40, HttpMetricsAdvice (io.trino.jdbc.$internal.opentelemetry.instrumentation.api.instrumenter.http)
<init>:69, HttpClientMetrics (io.trino.jdbc.$internal.opentelemetry.instrumentation.api.instrumenter.http)
create:-1, HttpClientMetrics$$Lambda/0x00007fa400285630 (io.trino.jdbc.$internal.opentelemetry.instrumentation.api.instrumenter.http)
buildOperationListeners:320, InstrumenterBuilder (io.trino.jdbc.$internal.opentelemetry.instrumentation.api.instrumenter)
<init>:92, Instrumenter (io.trino.jdbc.$internal.opentelemetry.instrumentation.api.instrumenter)
create:-1, InstrumenterBuilder$InstrumenterConstructor$$Lambda/0x00007fa400286690 (io.trino.jdbc.$internal.opentelemetry.instrumentation.api.instrumenter)
buildInstrumenter:284, InstrumenterBuilder (io.trino.jdbc.$internal.opentelemetry.instrumentation.api.instrumenter)
buildInstrumenter:276, InstrumenterBuilder (io.trino.jdbc.$internal.opentelemetry.instrumentation.api.instrumenter)
create:62, OkHttpInstrumenterFactory (io.trino.jdbc.$internal.opentelemetry.instrumentation.okhttp.v3_0.internal)
build:112, OkHttpTelemetryBuilder (io.trino.jdbc.$internal.opentelemetry.instrumentation.okhttp.v3_0)
instrumentClient:69, NonRegisteringTrinoDriver (io.trino.jdbc)
connect:62, NonRegisteringTrinoDriver (io.trino.jdbc)
...

Here's a screenshot of the behaviour in IntelliJ debugger when being at the HttpMetricsAdvice#applyOldClientDurationAdvice method which is the one supposed to add the advices filtering the noisy metrics:
2024-01-01_15-42

You can see in the bottom section that:

  • the builder parameter is not shaded, it's a io.opentelemetry.sdk.metrics.SdkDoubleHistogramBuilder extends io.opentelemetry.extension.incubator.metrics.ExtendedDoubleHistogramBuilder, belongs to opentelemetry-sdk-metrics
  • but the instanceof is checking against the shaded io.trino.jdbc.$internal.opentelemetry.extension.incubator.metrics.ExtendedDoubleHistogramBuilder
  • the same instanceof check against the non shaded ExtendedDoubleHistogramBuilder would work as intended

@gaeljw
Copy link
Author

gaeljw commented Jan 1, 2024

If you're not willing to accept this current PR (I don't have enough knowledge or expertise on the requirements for a JDBC driver, I'd need help from someone else to check further if the change doesn't bring unexpected issues), I don't see any easy way out of the situation excepted to remove OpenTelemetry instrumentation in favor of a "home made" manual trace propagation if that's the initial reason for the integration of OpenTelemetry.

Copy link

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Jan 22, 2024
@laurit
Copy link

laurit commented Jan 29, 2024

In NonRegisteringTrinoDriver, we try to instantiate OkHttpTelemetry while ignoring NoClassDefFoundError. Is that code guaranteed to throw NoClassDefFoundError if opentelemetry-extension-incubator is not present on the classpath?

yes, it is

And can the shaded instrumentation code work correctly with a non-shaded version of opentelemetry-extension-incubator?

yes, it can

I also don't understand the actual problem here. We're shading both io.opentelemetry.extension and io.opentelemetry.instrumentation, so the instanceof checks in HttpMetricsAdvice should work.

You can view the classes in https://github.com/open-telemetry/opentelemetry-java/tree/main/extensions/incubator/src/main/java/io/opentelemetry/extension/incubator/metrics as incubating api. Eventually this functionality will be moved to opentelemetry-api and the dependency on opentelemetry-extension-incubator will become unnecessary. The intended usage is that user will cast DoubleHistogramBuilder to ExtendedDoubleHistogramBuilder to use the incubating api. If you shade ExtendedDoubleHistogramBuilder such cast will be impossible because you only shade the instrumentation code, the sdk that is provided by the user will still use an unshaded version of ExtendedDoubleHistogramBuilder. This is the actual object that is being cast https://github.com/open-telemetry/opentelemetry-java/blob/f123d789097d714ed94f76d98e87985067a7b301/sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/SdkDoubleHistogram.java#L60. It is part of the sdk that you don't shade. Because of that you should not shade these classes similarly like you don't shade the opentelemetry-api.

If we can't easily disable metrics collection, we might be forced to remove the OkHttp instrumentation and instead do manual trace propagation (and HTTP client trace collection if that is useful).

Currently metrics collection can not be disabled for the okhttp client instrumentation library. Http client metrics are part of the opentelemetry specification https://opentelemetry.io/docs/specs/semconv/http/http-metrics/#http-client.

@gaeljw
Copy link
Author

gaeljw commented Jan 29, 2024

@electrum I'd really like this PR (or at least a fix to the initial problem) be solved. What more would be missing to move forward?

Should I provide an integration test that can be automated? If so, are there already some in the project?

@github-actions github-actions bot removed the stale label Jan 29, 2024
@mosabua
Copy link
Member

mosabua commented Jan 29, 2024

cc @mattstep and @electrum .. please chime in here.

Copy link

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

Copy link

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

Copy link

github-actions bot commented Apr 9, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Apr 9, 2024
Copy link

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Apr 30, 2024
@gaeljw gaeljw reopened this Apr 30, 2024
@mosabua mosabua added stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. and removed stale labels Apr 30, 2024
@mosabua
Copy link
Member

mosabua commented Apr 30, 2024

Sorry for the close again - I added the stale-ignore label now. Can @mattstep and @electrum please look at this PR

@mosabua mosabua mentioned this pull request Jun 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jdbc Relates to Trino JDBC driver stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed.
Development

Successfully merging this pull request may close these issues.

None yet

5 participants