[Java] [OpenTelemetry] Trino generates a huge cardinality of metrics http_client_duration #19958

Closed
gaeljw opened this issue Nov 29, 2023 · 7 comments

Comments

@gaeljw

gaeljw commented Nov 29, 2023

I just noticed that using the Trino JDBC driver together with OpenTelemetry metrics results in very high cardinality for the http_client_duration metrics generated by the Trino client.

Examples of such metrics (Prometheus format):

http_client_duration_milliseconds_count{
  otel_scope_name="io.opentelemetry.okhttp-3.0",
  http_method="GET",
  http_response_content_length="1081",
  http_status_code="200",
  http_url="http://trino.mycompany.net:8080/v1/statement/executing/20231128_072415_20801_d2gny/y412c0ae3189948116bfc9ee9c3498b380a28be93/9",
  net_peer_name="trino.mycompany.net",
  net_peer_port="8080",
  net_protocol_name="http",
  net_protocol_version="1.1",
  user_agent_original="Trino JDBC Driver/433"} 1.0 1701260869528

http_client_duration_milliseconds_count{
  otel_scope_name="io.opentelemetry.okhttp-3.0",
  http_method="GET",
  http_response_content_length="923",
  http_status_code="200",
  http_url="http://trino.mycompany.net:8080/v1/statement/executing/20231128_072917_21203_d2gny/y7080dd845e468e8853803535c1eed6efd08625cc/2",
  net_peer_name="trino.mycompany.net",
  net_peer_port="8080",
  net_protocol_name="http",
  net_protocol_version="1.1",
  user_agent_original="Trino JDBC Driver/433"} 1.0 1701260869528

http_client_duration_milliseconds_count{
  otel_scope_name="io.opentelemetry.okhttp-3.0",
  http_method="GET",
  http_response_content_length="909",
  http_status_code="200",
  http_url="http://trino.mycompany.net:8080/v1/statement/executing/20231128_072624_20987_d2gny/y4d8797c5b88e82d46e0382f3986817f1af5e8009/124",
  net_peer_name="trino.mycompany.net",
  net_peer_port="8080",
  net_protocol_name="http",
  net_protocol_version="1.1",
  user_agent_original="Trino JDBC Driver/433"} 1.0 1701260869528

As these are histograms, each of these label sets also produces a number of http_client_duration_milliseconds_bucket series.

The high cardinality comes from the http_url label, which is different for each request sent to Trino. The same goes for http_response_content_length.
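To make the effect concrete, here is a minimal sketch (not part of any OpenTelemetry or Trino API) that counts how many distinct series the three samples above produce, with and without the two offending labels. The class and method names (CardinalityDemo, distinctSeries, samplesFromIssue) are hypothetical; the label values are copied from the samples, and only the labels that matter for the count are included.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CardinalityDemo {
    // Every distinct combination of label values is a separate time series.
    static int distinctSeries(List<Map<String, String>> samples, Set<String> droppedLabels) {
        Set<Map<String, String>> series = new HashSet<>();
        for (Map<String, String> labels : samples) {
            Map<String, String> kept = new LinkedHashMap<>(labels);
            kept.keySet().removeAll(droppedLabels);
            series.add(kept);
        }
        return series.size();
    }

    // Builds a reduced label set; method and status code are constant in the samples.
    static Map<String, String> labels(String contentLength, String url) {
        return Map.of(
                "http_method", "GET",
                "http_status_code", "200",
                "http_response_content_length", contentLength,
                "http_url", url);
    }

    // The three samples quoted in this issue.
    static List<Map<String, String>> samplesFromIssue() {
        String base = "http://trino.mycompany.net:8080/v1/statement/executing/";
        return List.of(
                labels("1081", base + "20231128_072415_20801_d2gny/y412c0ae3189948116bfc9ee9c3498b380a28be93/9"),
                labels("923", base + "20231128_072917_21203_d2gny/y7080dd845e468e8853803535c1eed6efd08625cc/2"),
                labels("909", base + "20231128_072624_20987_d2gny/y4d8797c5b88e82d46e0382f3986817f1af5e8009/124"));
    }

    public static void main(String[] args) {
        List<Map<String, String>> samples = samplesFromIssue();
        // With all labels kept, each request is its own series.
        System.out.println(distinctSeries(samples, Set.of())); // prints 3
        // Without the per-request labels, the samples collapse into one series.
        System.out.println(distinctSeries(samples,
                Set.of("http_url", "http_response_content_length"))); // prints 1
    }
}
```

With every request producing a new series, the series count grows with the number of requests rather than staying bounded by the number of endpoints.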

IMHO this shouldn't be enabled by default (moreover, I'm not using auto-instrumentation, only manual instrumentation). Even when enabled deliberately, these metrics aren't useful in Trino's context, because the URL changes for each query.

I'm using Trino JDBC 433 and OpenTelemetry 1.31.0, but the issue was already present with OpenTelemetry 1.29.0 and Trino JDBC 427.

@gaeljw
Author

gaeljw commented Nov 29, 2023

I don't see an easy workaround for this. I tried setting OTEL_INSTRUMENTATION_OKHTTP_ENABLED=false, but since the instrumentation is enabled "manually" rather than via the OTEL Java agent, this has no effect.

I think this feature should really be made opt-in rather than the default.

@gaeljw
Author

gaeljw commented Dec 5, 2023

Here's a minimal reproduction case: https://github.com/gaeljw/otel9972

I hope it will help.

To reproduce:

  • clone the repo
  • run the org.example.Main class; it will start a Trino container and send some trivial requests to it every second for 30 seconds
  • while it runs, open http://localhost:19000/

You'll see a bunch of metrics having the issue.

For the record, here are the top lines I get in my case:

# TYPE target info
# HELP target Target metadata
target_info{service_name="my-app",telemetry_sdk_language="java",telemetry_sdk_name="opentelemetry",telemetry_sdk_version="1.31.0"} 1
# TYPE otel_scope_info info
# HELP otel_scope_info Scope metadata
otel_scope_info{otel_scope_name="io.opentelemetry.okhttp-3.0"} 1
# TYPE http_client_duration_milliseconds histogram
# HELP http_client_duration_milliseconds The duration of the outbound HTTP request
http_client_duration_milliseconds_count{otel_scope_name="io.opentelemetry.okhttp-3.0",http_method="GET",http_response_content_length="455",http_status_code="200",http_url="http://localhost:32785/v1/statement/executing/20231205_194230_00023_dtni2/yafaffd20c244c96661c43b612e3d59f054e2474e/1",net_peer_name="localhost",net_peer_port="32785",net_protocol_name="http",net_protocol_version="1.1",user_agent_original="Trino JDBC Driver/434"} 1.0 1701805352774
http_client_duration_milliseconds_sum{otel_scope_name="io.opentelemetry.okhttp-3.0",http_method="GET",http_response_content_length="455",http_status_code="200",http_url="http://localhost:32785/v1/statement/executing/20231205_194230_00023_dtni2/yafaffd20c244c96661c43b612e3d59f054e2474e/1",net_peer_name="localhost",net_peer_port="32785",net_protocol_name="http",net_protocol_version="1.1",user_agent_original="Trino JDBC Driver/434"} 5.401179 1701805352774
http_client_duration_milliseconds_bucket{otel_scope_name="io.opentelemetry.okhttp-3.0",http_method="GET",http_response_content_length="455",http_status_code="200",http_url="http://localhost:32785/v1/statement/executing/20231205_194230_00023_dtni2/yafaffd20c244c96661c43b612e3d59f054e2474e/1",net_peer_name="localhost",net_peer_port="32785",net_protocol_name="http",net_protocol_version="1.1",user_agent_original="Trino JDBC Driver/434",le="0.0"} 0.0 1701805352774

You can see the http_response_content_length and http_url labels on each series.

@gaeljw
Author

gaeljw commented Dec 6, 2023

From open-telemetry/opentelemetry-java-instrumentation#9972 (comment)

This is a Trino issue. Trino does not shade opentelemetry-api classes like io.opentelemetry.api.metrics.DoubleHistogramBuilder but it shades the opentelemetry-extension-incubator classes like io.opentelemetry.extension.incubator.metrics.ExtendedDoubleHistogramBuilder to io.trino.jdbc.$internal.opentelemetry.extension.incubator.metrics.ExtendedDoubleHistogramBuilder. Due to this the test in

https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/afe10b551bf048652d5f607838f8b93d32dd5087/instrumentation-api/src/main/java/io/opentelemetry/instrumentation/api/semconv/http/HttpMetricsAdvice.java#L24
fails because it uses the shaded name but the actual object implements
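
For readers unfamiliar with shading, the following is a rough, self-contained sketch of why a type check of the kind described in the quoted comment can fail. It does not reproduce the actual HttpMetricsAdvice code. All types here are hypothetical stand-ins: one interface plays the role of the original incubator type, the other plays the role of its relocated copy under io.trino.jdbc.$internal, and the builder class plays the role of an object created by the application's SDK.

```java
class ShadingDemo {
    // Stand-in for the original (unshaded) incubator interface.
    interface ExtendedDoubleHistogramBuilder {}

    // Stand-in for the relocated copy of that interface inside the driver JAR.
    // To the JVM, this is an unrelated type even though its source is identical.
    interface ShadedExtendedDoubleHistogramBuilder {}

    // Stand-in for the builder object supplied by the application's SDK,
    // which only knows about the original interface.
    static final class SdkHistogramBuilder implements ExtendedDoubleHistogramBuilder {}

    // Check as compiled inside the shaded driver: it refers to the relocated name.
    static boolean checkFromShadedCode(Object builder) {
        return builder instanceof ShadedExtendedDoubleHistogramBuilder;
    }

    // The same check when nothing is relocated.
    static boolean checkFromUnshadedCode(Object builder) {
        return builder instanceof ExtendedDoubleHistogramBuilder;
    }

    public static void main(String[] args) {
        Object builder = new SdkHistogramBuilder();
        System.out.println(checkFromShadedCode(builder));   // prints false
        System.out.println(checkFromUnshadedCode(builder)); // prints true
    }
}
```

If the check fails in this way, any attribute filtering guarded by it would be skipped, which is consistent with the unfiltered http_url and http_response_content_length labels reported above.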

@gaeljw
Author

gaeljw commented Dec 8, 2023

As per the last comment at open-telemetry/opentelemetry-java-instrumentation#9972 (comment), would anyone object if I provide a PR to stop shading the OTEL extension dependency?

@nineinchnick I believe you worked on the OTEL stuff in Trino

gaeljw added a commit to gaeljw/trino that referenced this issue Dec 10, 2023
…trinodb#19958)

Otherwise, OTEL's HttpMetricsAdvice class check doesn't work, which results in many harmful labels
being exposed in OkHttp metrics, such as http_response_content_length and http_url.

trinodb#19958
gaeljw added a commit to gaeljw/trino that referenced this issue Dec 10, 2023
…rinodb#19958)

Otherwise, OTEL's HttpMetricsAdvice class check doesn't work, which results in many harmful labels
being exposed in OkHttp metrics, such as http_response_content_length and http_url.

Fixes: trinodb#19958
@nineinchnick
Member

OTEL was added to the JDBC driver to support traces, specifically to propagate them to the server. It wasn't supposed to register any metrics, so I think we should try to turn that off.

@gaeljw
Author

gaeljw commented Mar 13, 2024

Note that with the work done in open-telemetry/opentelemetry-java-instrumentation#10671, the metrics won't be exposed anymore. This doesn't fix the underlying root cause as #20074 aims to, but it solves the consequences.

@gaeljw
Author

gaeljw commented Apr 25, 2024

Note that with the work done in open-telemetry/opentelemetry-java-instrumentation#10671, the metrics won't be exposed anymore.

I confirm that with a recent trino-jdbc version (445), the metrics are no longer exposed.

@findepi findepi closed this as completed Apr 25, 2024