#[tracing::instrument(err)] doesn't work with opentelemetry / datadog #2648

Open · scottlamb opened this issue Jul 13, 2023 · 2 comments

scottlamb commented Jul 13, 2023

Bug Report

Version

Applies to the latest released versions as of this writing:

  • tracing = "0.1.37"
  • tracing-attributes = "0.1.26"
  • tracing-opentelemetry = "0.19.0"
  • opentelemetry = "0.19.0"
  • opentelemetry-datadog = "0.7.0"
  • opentelemetry-otlp = "0.12.0"
  • ...

Platform

any

Crates

Interaction between tracing-attributes, tracing-opentelemetry, opentelemetry, and opentelemetry-datadog.

Description

I'd like the full integration from #[tracing::instrument(err)] to just work, following the expected protocols (sketched in code after this list):

  • OpenTelemetry says that spans should report an error by having an event named exception with the attributes exception.escaped (which should always be true in the case of #[tracing::instrument(err)]), exception.message, exception.type, and exception.stacktrace.
  • Datadog says that there should be attributes on the span: error.msg, error.type, error.stack.
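
Concretely, something like the following is what I'd expect the OpenTelemetry convention to require if written by hand against the opentelemetry 0.19 API, bypassing tracing entirely (untested sketch; the span name and the exception.type value are placeholders):

```rust
use opentelemetry::{
    trace::{Span, Status, Tracer},
    KeyValue,
};

// Untested sketch: end a span with the error shape the OpenTelemetry semantic
// conventions describe. The span name and exception.type value are placeholders.
fn end_span_with_error<E: std::error::Error>(tracer: &impl Tracer, err: &E) {
    let mut span = tracer.start("my_operation");
    span.add_event(
        "exception",
        vec![
            KeyValue::new("exception.message", err.to_string()),
            KeyValue::new("exception.type", std::any::type_name::<E>()),
            KeyValue::new("exception.escaped", true),
        ],
    );
    span.set_status(Status::error(err.to_string()));
    span.end();
}
```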

From observation: it doesn't work with opentelemetry_datadog. E.g., if I use essentially tracing_subscriber::registry().with(tracing_opentelemetry::layer().with_tracer(opentelemetry_datadog::new_pipeline().install_batch(...))), Datadog will correctly identify that the span has an error, but I can't see any details anywhere. I just see "Missing error message and stack trace" in the Datadog UI.
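
For reference, the full setup is essentially the following (untested sketch assuming the versions listed above, a Tokio runtime, and the opentelemetry crate's rt-tokio feature; the service name is a placeholder):

```rust
use tracing_subscriber::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build the Datadog exporter pipeline and get back an SDK tracer.
    let tracer = opentelemetry_datadog::new_pipeline()
        .with_service_name("my-service")
        .install_batch(opentelemetry::runtime::Tokio)?;

    // Attach the tracing-opentelemetry layer to the registry.
    tracing_subscriber::registry()
        .with(tracing_opentelemetry::layer().with_tracer(tracer))
        .init();

    // ... run the application, with functions annotated #[tracing::instrument(err)] ...

    // Flush any remaining spans before exit.
    opentelemetry::global::shutdown_tracer_provider();
    Ok(())
}
```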

From code inspection: I don't think it'd work well with opentelemetry_otlp either. In fact, I think tracing-attributes and tracing-opentelemetry aren't following the same protocol:

  • The #[tracing::instrument(err)] code here will create an event (with no message field, I think?) whose error attribute holds the Display or Debug output of the error.
  • tracing_opentelemetry has this promising-looking logic that looks to be based on the OpenTelemetry semantic conventions. It will fill in exception.message and exception.stacktrace...but only on record_error, so the conversion to Display or Debug (error = %e or error = ?e) defeats that by calling record_debug instead in both cases (DisplayValue and DebugValue). Also, I think the event name won't get set to exception as it's supposed to; no tracing message field => no OpenTelemetry event name, right? There's also logic to copy the fields onto the span's attributes (OpenTelemetryLayer::with_exception_field_propagation / ExceptionFieldConfig::propagate), but that's not the right thing to do by my reading of the semantic conventions above...

...and then I'm not sure opentelemetry crates would do the right thing from there anyway. opentelemetry_datadog at least doesn't seem to have a conversion from the OpenTelemetry protocol to the Datadog protocol.
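
To make the mismatch concrete, my understanding is that the difference is roughly the following (untested sketch; whether the second form would actually come out the other end as a well-formed exception event is exactly what I'm unsure of):

```rust
fn on_error(e: std::io::Error) {
    // Roughly what #[tracing::instrument(err)] emits today (modulo Display vs.
    // Debug): the error is stringified up front, so the layer only ever sees
    // record_debug.
    tracing::error!(error = %e);

    // An event that hands the subscriber a &dyn Error instead, which is what
    // would route through Visit::record_error and the exception.* logic
    // linked above.
    tracing::error!(error = &e as &(dyn std::error::Error + 'static), "exception");
}
```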

I don't know how to accomplish this, but I wish some group of folks could get together and make this integration work end to end. I can file a separate issue in the opentelemetry repo about the paragraph above, but I feel like there still may be a gap between the two that I don't know how to fill.

My first impression is that going through these two conversions just seems really janky. It looks like the opentelemetry crate has this Span{,Ref}::record_error method that could do the right thing for the given backend (otlp, datadog, etc.). But tracing_opentelemetry isn't using it (instead trying to directly follow the OpenTelemetry semantic conventions), and I think record_error doesn't actually use backend-specific logic either, so I don't see any reason to believe it'd work correctly with opentelemetry_datadog anyway.

(Bit of a tangent, but the tracing -> opentelemetry -> datadog conversion seems janky in other ways too. E.g. tracing has static/low-cardinality span names. tracing_opentelemetry copies those to the OpenTelemetry span names (unless overridden via otel.name attribute). But then opentelemetry_datadog says in the crate-level doc comment that because of cardinality concerns "This exporter therefore takes a different approach of naming the span with the name of the tracing provider, and using the span name to set the resource_name". So even though my code specifies low-cardinality span names, by default they all get replaced with the useless opentelemetry-datadog. It's possible to override this, but it's just a stupid papercut from going through two conversions.)

scottlamb changed the title from "#![tracing::instrument(err)] doesn't work with opentelemetry / datadog" to "#[tracing::instrument(err)] doesn't work with opentelemetry / datadog" on Jul 13, 2023
scottlamb (Author) commented:

> From observation: it doesn't work with opentelemetry_datadog. E.g., if I use essentially tracing_subscriber::registry().with(tracing_opentelemetry::layer().with_tracer(opentelemetry_datadog::new_pipeline().install_batch(...))), Datadog will correctly identify that the span has an error, but I can't see any details anywhere. I just see "Missing error message and stack trace" in the Datadog UI.

It occurs to me I forgot to mention some context. Long story short: currently in this application, tracing logs don't go to Datadog, just trace/span data. After seeing the semi-related #2522 just now, it occurs to me that #[tracing::instrument(err)] is emitting log messages, which is presumably useful to some folks (and, from the existence of that issue, undesirable for others). But it's not making the span data follow the conventions.

scottlamb added a commit to sportsball-ai/keyvaluestore-rs that referenced this issue Jul 13, 2023
* improve rusoto `exec_atomic_write` errors

* manually attach an `error.message` to subspans (on per-op errors) and
  to the main span. Unfortunately, `#[tracing(err)]` -> Datadog span
  export loses the error message (see
  <tokio-rs/tracing#2648>), so the manual
  approach is better for now. (Would be nice to have a proc macro that
  produces the right format but doesn't seem worth it just for
  keyvaluestore-rs's stuff.)

  (It'd be worthwhile to do the same on other operations, but small
  steps...)

* also fill in `otel.status_code` and `otel.kind`, which
  `tracing-opentelemetry` translates to the OpenTelemetry magic,
  and opentelemetry-datadog passes along.
  https://github.com/tokio-rs/tracing-opentelemetry/blob/f5f898b47e086659a7a78b495ef77661781895cd/src/layer.rs#L330
  https://github.com/open-telemetry/opentelemetry-rust/blob/1f1a4fe0540b5df92b7cf5b289d6e89c2e1c02a8/opentelemetry-datadog/src/exporter/model/v05.rs#L183

* on `AtomicWriteConflict`, mention the offending key.

* if there is an error that isn't a condition failure or write conflict,
  pass it back to the caller rather than erroneously claiming write
  conflict.

We could further modify the interface to pass any errors back per-op
rather than just condition failure, but I went with the more modest
change for now.

* `error.message` -> `error.msg`

to match https://docs.datadoghq.com/tracing/trace_explorer/trace_view/?tab=spantags#more-information

* remove stray `println!`

* remove unnecessary conversion
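
For anyone landing here later, the workaround in that commit boils down to something like this (untested sketch; the function, key, and error types are placeholders; error.msg rides along as an ordinary span attribute, while otel.status_code and otel.kind are the fields tracing-opentelemetry translates as described above):

```rust
use tracing::{field, instrument, Span};

// Declare the fields as Empty up front so they can be recorded later; dotted
// names like error.msg and otel.status_code are plain tracing field names.
#[instrument(
    skip_all,
    fields(error.msg = field::Empty, otel.status_code = field::Empty, otel.kind = "client")
)]
async fn atomic_write(key: &str) -> Result<(), std::io::Error> {
    match do_write(key).await {
        Ok(()) => Ok(()),
        Err(e) => {
            // Manually attach the Datadog-style error message and mark the
            // span as errored, rather than relying on #[instrument(err)].
            let span = Span::current();
            span.record("error.msg", &field::display(&e));
            span.record("otel.status_code", "ERROR");
            Err(e)
        }
    }
}

// Placeholder for the real per-op work.
async fn do_write(_key: &str) -> Result<(), std::io::Error> {
    Ok(())
}
```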

mdtusz commented Aug 2, 2023

Using Honeycomb here with a similar issue: errors just show up as span events with error = true, but no additional details of the actual error are included in the event. This was working previously when using a different exporter (subscriber? the nomenclature confuses me) to send the traces to Honeycomb, though, so I believe the issue may be with the tracing-opentelemetry or opentelemetry-otlp crates.
