#[tracing::instrument(err)] doesn't work with opentelemetry / datadog #2648

Open · scottlamb opened this issue Jul 13, 2023 · 2 comments

scottlamb commented Jul 13, 2023

Bug Report

Version

Applies to the latest released versions as of this writing:

  • tracing = "0.1.37"
  • tracing-attributes = "0.1.26"
  • tracing-opentelemetry = "0.19.0"
  • opentelemetry = "0.19.0"
  • opentelemetry-datadog = "0.7.0"
  • opentelemetry-otlp = "0.12.0"
  • ...

Platform

any

Crates

Interaction between tracing-attributes, tracing-opentelemetry, opentelemetry, and opentelemetry-datadog.

Description

I'd like the full integration from #[tracing::instrument(err)] to just work, following the expected protocols (sketched in code after this list):

  • OpenTelemetry says that spans should report an error by having an event named exception with the attributes exception.escaped (which should always be true in the case of #[tracing::instrument(err)]), exception.message, exception.type, and exception.stacktrace.
  • Datadog says that there should be attributes on the span: error.msg, error.type, error.stack.
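
Concretely, something like the following is what I'd expect the OpenTelemetry convention to require if written by hand against the opentelemetry 0.19 API, bypassing tracing entirely (untested sketch; the span name and the exception.type value are placeholders):

```rust
use opentelemetry::{
    trace::{Span, Status, Tracer},
    KeyValue,
};

// Untested sketch: end a span with the error shape the OpenTelemetry semantic
// conventions describe. The span name and exception.type value are placeholders.
fn end_span_with_error<E: std::error::Error>(tracer: &impl Tracer, err: &E) {
    let mut span = tracer.start("my_operation");
    span.add_event(
        "exception",
        vec![
            KeyValue::new("exception.message", err.to_string()),
            KeyValue::new("exception.type", std::any::type_name::<E>()),
            KeyValue::new("exception.escaped", true),
        ],
    );
    span.set_status(Status::error(err.to_string()));
    span.end();
}
```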

From observation: it doesn't work with opentelemetry_datadog. E.g., if I use essentially tracing_subscriber::registry().with(tracing_opentelemetry::layer().with_tracer(opentelemetry_datadog::new_pipeline().install_batch(...))), Datadog will correctly identify that the span has an error, but I can't see any details anywhere. I just see "Missing error message and stack trace" in the Datadog UI.
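
For reference, the full setup is essentially the following (untested sketch assuming the versions listed above, a Tokio runtime, and the opentelemetry crate's rt-tokio feature; the service name is a placeholder):

```rust
use tracing_subscriber::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build the Datadog exporter pipeline and get back an SDK tracer.
    let tracer = opentelemetry_datadog::new_pipeline()
        .with_service_name("my-service")
        .install_batch(opentelemetry::runtime::Tokio)?;

    // Attach the tracing-opentelemetry layer to the registry.
    tracing_subscriber::registry()
        .with(tracing_opentelemetry::layer().with_tracer(tracer))
        .init();

    // ... run the application, with functions annotated #[tracing::instrument(err)] ...

    // Flush any remaining spans before exit.
    opentelemetry::global::shutdown_tracer_provider();
    Ok(())
}
```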

From code inspection: I don't think it'd work well with opentelemetry_otlp either. In fact, I think tracing-attributes and tracing-opentelemetry aren't following the same protocol:

  • The #[tracing::instrument(err)] code here will create an event (with no message field, I think?) whose error attribute holds the Display or Debug output of the error.
  • tracing_opentelemetry has this promising-looking logic that looks to be based on the OpenTelemetry semantic conventions. It will fill in exception.message and exception.stacktrace...but only on record_error, so the conversion to Display or Debug (error = %e or error = ?e) defeats that by calling record_debug instead in both cases (DisplayValue and DebugValue). Also, I think the event name won't get set to exception as it's supposed to; no tracing message field => no OpenTelemetry event name, right? There's also logic to copy the fields onto the span's attributes (OpenTelemetryLayer::with_exception_field_propagation / ExceptionFieldConfig::propagate), but that's not the right thing to do by my reading of the semantic conventions above...

...and then I'm not sure opentelemetry crates would do the right thing from there anyway. opentelemetry_datadog at least doesn't seem to have a conversion from the OpenTelemetry protocol to the Datadog protocol.
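
To make the mismatch concrete, my understanding is that the difference is roughly the following (untested sketch; whether the second form would actually come out the other end as a well-formed exception event is exactly what I'm unsure of):

```rust
fn on_error(e: std::io::Error) {
    // Roughly what #[tracing::instrument(err)] emits today (modulo Display vs.
    // Debug): the error is stringified up front, so the layer only ever sees
    // record_debug.
    tracing::error!(error = %e);

    // An event that hands the subscriber a &dyn Error instead, which is what
    // would route through Visit::record_error and the exception.* logic
    // linked above.
    tracing::error!(error = &e as &(dyn std::error::Error + 'static), "exception");
}
```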

I don't know how to accomplish this, but I wish some group of folks could get together and make this integration work end to end. I can file a separate issue in the opentelemetry repo about the paragraph above, but I feel like there still may be a gap between the two that I don't know how to fill.

My first impression is that going through these two conversions just seems really janky. It looks like the opentelemetry crate has this Span{,Ref}::record_error method that could do the right thing for the given backend (otlp, datadog, etc.). But tracing_opentelemetry isn't using it (instead trying to directly follow the OpenTelemetry semantic conventions), and I think record_error doesn't actually use backend-specific logic either, so I don't see any reason to believe it'd work correctly with opentelemetry_datadog anyway.

(Bit of a tangent, but the tracing -> opentelemetry -> datadog conversion seems janky in other ways too. E.g. tracing has static/low-cardinality span names. tracing_opentelemetry copies those to the OpenTelemetry span names (unless overridden via otel.name attribute). But then opentelemetry_datadog says in the crate-level doc comment that because of cardinality concerns "This exporter therefore takes a different approach of naming the span with the name of the tracing provider, and using the span name to set the resource_name". So even though my code specifies low-cardinality span names, by default they all get replaced with the useless opentelemetry-datadog. It's possible to override this, but it's just a stupid papercut from going through two conversions.)

scottlamb changed the title from "#![tracing::instrument(err)] doesn't work with opentelemetry / datadog" to "#[tracing::instrument(err)] doesn't work with opentelemetry / datadog" on Jul 13, 2023
scottlamb (Author) commented:

> From observation: it doesn't work with opentelemetry_datadog. E.g., if I use essentially tracing_subscriber::registry().with(tracing_opentelemetry::layer().with_tracer(opentelemetry_datadog::new_pipeline().install_batch(...))), Datadog will correctly identify that the span has an error, but I can't see any details anywhere. I just see "Missing error message and stack trace" in the Datadog UI.

It occurs to me I forgot to mention some context. Long story short: currently in this application, tracing logs don't go to Datadog, just trace/span data. After seeing the semi-related #2522 just now, it occurs to me that #[tracing::instrument(err)] is emitting log messages, which is presumably useful to some folks (and, from the existence of that issue, undesirable for others). But it's not making the span data follow the conventions.

scottlamb added a commit to sportsball-ai/keyvaluestore-rs that referenced this issue Jul 13, 2023
* improve rusoto `exec_atomic_write` errors

* manually attach an `error.message` to subspans (on per-op errors) and
  to the main span. Unfortunately, `#[tracing(err)]` -> Datadog span
  export loses the error message (see
  <tokio-rs/tracing#2648>), so the manual
  approach is better for now. (Would be nice to have a proc macro that
  produces the right format but doesn't seem worth it just for
  keyvaluestore-rs's stuff.)

  (It'd be worthwhile to do the same on other operations, but small
  steps...)

* also fill in `otel.status_code` and `otel.kind`, which
  `tracing-opentelemetry` translates to the OpenTelemetry magic,
  and opentelemetry-datadog passes along.
  https://github.com/tokio-rs/tracing-opentelemetry/blob/f5f898b47e086659a7a78b495ef77661781895cd/src/layer.rs#L330
  https://github.com/open-telemetry/opentelemetry-rust/blob/1f1a4fe0540b5df92b7cf5b289d6e89c2e1c02a8/opentelemetry-datadog/src/exporter/model/v05.rs#L183

* on `AtomicWriteConflict`, mention the offending key.

* if there is an error that isn't a condition failure or write conflict,
  pass it back to the caller rather than erroneously claiming write
  conflict.

We could further modify the interface to pass any errors back per-op
rather than just condition failure, but I went with the more modest
change for now.

* `error.message` -> `error.msg`

to match https://docs.datadoghq.com/tracing/trace_explorer/trace_view/?tab=spantags#more-information

* remove stray `println!`

* remove unnecessary conversion
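
For anyone landing here later, the workaround in that commit boils down to something like this (untested sketch; the function, key, and error types are placeholders; error.msg rides along as an ordinary span attribute, while otel.status_code and otel.kind are the fields tracing-opentelemetry translates as described above):

```rust
use tracing::{field, instrument, Span};

// Declare the fields as Empty up front so they can be recorded later; dotted
// names like error.msg and otel.status_code are plain tracing field names.
#[instrument(
    skip_all,
    fields(error.msg = field::Empty, otel.status_code = field::Empty, otel.kind = "client")
)]
async fn atomic_write(key: &str) -> Result<(), std::io::Error> {
    match do_write(key).await {
        Ok(()) => Ok(()),
        Err(e) => {
            // Manually attach the Datadog-style error message and mark the
            // span as errored, rather than relying on #[instrument(err)].
            let span = Span::current();
            span.record("error.msg", &field::display(&e));
            span.record("otel.status_code", "ERROR");
            Err(e)
        }
    }
}

// Placeholder for the real per-op work.
async fn do_write(_key: &str) -> Result<(), std::io::Error> {
    Ok(())
}
```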

mdtusz commented Aug 2, 2023

Using Honeycomb here with a similar issue: errors just show up as span events with error = true, but no additional details of the actual error are included in the event. This was working previously when using a different exporter (subscriber? the nomenclature confuses me) to send the traces to Honeycomb, though, so I believe the issue may be with the tracing-opentelemetry or opentelemetry-otlp crates.
