Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support OpenTracing span context #931

Merged
merged 6 commits into from Sep 24, 2018
Merged

Conversation

rcillo
Copy link
Contributor

@rcillo rcillo commented Aug 16, 2018

https://techjira.zalando.net/browse/ARUHA-1883

In order to trace distributed systems, open tracing span context can be
transported in the metadata of events. This way, publishers and consumer
can share a span even when asynchronously communicating through Nakadi.

In order to trace distributed systems, open tracing span context can be
transported in the metadata of events. This way, publishers and consumer
can share a span even when asynchronously communicating through Nakadi.
@codecov-io
Copy link

codecov-io commented Aug 16, 2018

Codecov Report

Merging #931 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #931      +/-   ##
============================================
+ Coverage     54.15%   54.16%   +<.01%     
  Complexity     1774     1774              
============================================
  Files           321      321              
  Lines          9687     9689       +2     
  Branches        885      885              
============================================
+ Hits           5246     5248       +2     
  Misses         4123     4123              
  Partials        318      318
Impacted Files Coverage Δ Complexity Δ
...alando/nakadi/validation/JsonSchemaEnrichment.java 94.16% <100%> (+0.09%) 42 <0> (ø) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b6f40df...a742f62. Read the comment docs.

@adyach
Copy link
Member

adyach commented Aug 16, 2018

👍

uses OpenTracing for tracing transcations accross distributed services, this field is be populated with an
object that allows consumers to resume the context of a span. The details of the payload are vendor
specific and consumers are expected to contact the producer in order to obtain further details.
type: object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't it have any defined structure?
(I ask this because we always criticize our users for not defining the schema properly :) )

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not possible or desired to define a schema as that would enforce vendor lock-in.

Unfortunately, OpenTracing deliberately leaves the span context wire format undefined. This allows each vendor to serialize/deserialize as they see fit. We want to keep the ability to switch vendors and should not depend on any particular format.
Additionally, part of the span context are the baggage items, which are arbitrary key values.
I remember this discussion from the initial narrative and a good idea could be to apply some sanity limits (payload size) for this particular element.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the reasons why it can't be defined.
I'm just afraid that it goes against what we fight for in Nakadi: all data to be defined. With this field it will never be possible to completely define the structure of the event.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@v-stepanov I agree with you. Another point is that this will eventually find its way to the data lake, where incompatible changes in types might lead to breaking their processes. Maybe we should involve them here, so that they know about this new piece of information.

I would say that having arbitrary maps<string, string> is better than having arbitrary objects.

@lmineiro do you think it's possible to assume that the extensible parts fit in a map<string, string> or should we keep it as an opaque object, that might have arbitrary nested structures?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We deliberately picked the TextMap format to, at least, provide that guarantee.

Copy link
Member

@ePaul ePaul Sep 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is a Map<String, String>, it can be defined as such in JSON schema/OpenAPI schema:

        type: object
        additionalProperties:
          type: string

span_ctx:
description: |
Object containing an OpenTracing http://opentracing.io span context. In case the producer of this event type
uses OpenTracing for tracing transcations accross distributed services, this field is be populated with an

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: transcations
"this field is be populated" -> "this field is populated"

Object containing an OpenTracing http://opentracing.io span context. In case the producer of this event type
uses OpenTracing for tracing transcations accross distributed services, this field is be populated with an
object that allows consumers to resume the context of a span. The details of the payload are vendor
specific and consumers are expected to contact the producer in order to obtain further details.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually add some short mention that any attempt at inspecting or effectively using the payload, apart from the Tracer implementation can lead to vendor lock-in which we want to avoid.

@lmontrieux
Copy link
Contributor

👍

Copy link
Contributor

@lmontrieux lmontrieux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice way for Nakadi users to use OpenTracing.

@v-stepanov
Copy link
Contributor

I think we still need to discuss the fact that this new field is just an object which is somehow ruins all our efforts of having nakadi events fully defined structure. I will veto for now.

@v-stepanov
Copy link
Contributor

👎

@lmineiro
Copy link

lmineiro commented Sep 3, 2018

@v-stepanov I completely understand your concerns. I personally share them.

I'm afraid there's absolutely no other way to support context propagation with Nakadi. Even if we ignored the Baggage-Items (arbitrary key-value pairs that will be part of this object), any attempt at defining a schema for it would no longer be vendor-neutral. This seems to be a much higher price to pay.

Without this approach, Nakadi won't support context propagation through events.

@jmhofer
Copy link

jmhofer commented Sep 3, 2018

What's the problem with defining a more generic format? - It's a map from string to string, so you could go for an array of objects with the fields "key" and "value", or something similar, and still have it strictly defined?

@lmineiro
Copy link

lmineiro commented Sep 3, 2018

I'm sure there can be many creative ways to serialize the span-context. There's a high risk that it won't be supported out of the box by any Tracer implementation or the existing Opentracing libraries. The standard that was proposed (Text Map) would be easily serialized and deserialized by any Tracer as the Text Map is part of the Opentracing specification (it's just a glorified Map anyway).

The advantages of working out of the box, for me personally, outweigh any creative efforts of having a "strongly-typed" span context attribute in the event metadata. Every consumer and programming language would have to implement the deserialization.

If we don't care about the readability of the span-context by humans, I'd then propose a base64 encoded version of the Binary carrier format for this. This would allow Nakadi to have a single string attribute, at the extra expense of the consumers that would have to decode the base64 blob. What do you think @rcillo, @lmontrieux, @v-stepanov, @jmhofer?

@jmhofer
Copy link

jmhofer commented Sep 3, 2018

Personally, I do care about readability.

As the specification ensures that we have a representation of the span context as a map from string to string, it should be easy anyway to serialize/deserialize this map to whatever format you like.

I don't mind much in this case whether it's a (unfortunately schema-less) json object, some json map encoding or even something like application/x-www-form-urlencoded.

What's important in my humble opinion is that de-/serialization must be sufficiently simple for every Nakadi user, as everyone will have to write their own version of it.

specific and consumers are expected to contact the producer in order to obtain further details. Any attempt
to inspect or effectively use the payload, apart from the tracer implementation, can lead to vendor lock-in
which should be avoided.
type: object
Copy link
Member

@ePaul ePaul Sep 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the previous discussion somehow already is noted as "outdated" by Github, here my two cents to this:

If we suppose it is a TextMap, i.e. a map of strings to strings (contrary to an arbitrary JSON object with nested stuff), we can just declare it that way with this syntax:

      span_ctx:
        description: |
          Object containing ...
        type: object
        additionalProperties:
          type: string

This should take care of the concerns of @v-stepanov (not allowing arbitrary structures in here), while at the same time still allows some vendor-independence by not defining any keys or the meaning of their values.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear. What @ePaul just described is the actual proposal. If not defining the key names from said Map is not acceptable, only then I'd consider some other encoding.

Serializing/Deserializing from/to Text Map is supported out of the box on any implementation, once again, the reason why it was proposed. No extra work is needed from any consumer, apart from the typical tracer.Extract() call, using the Text Map carrier.

@v-stepanov, from your perspective, is what @ePaul described a reasonable/good compromise for Nakadi?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmineiro We scheduled internal team meeting for tomorrow morning to discuss the metadata changes in Nakadi for Open Tracing. So we will come back to you after that. But we also will need to get Data Lake team opinion on that.
As I understand current options are:

  1. just object with free structure (the initial proposal)
  2. single string with encoded span context
  3. flat map of key-value pairs (additionalProperties: type: string)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think another option (mentioned by @jmhofer) to encode a flat map (not using an object) would be something like this (as a list of objects).

    span_ctx:
      description: ...
      type: array
      items:
        type: object
        properties:
          key:
            type: string
          value:
            type: string

I would prefer the additionalProperties version, though – it has less overhead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this all with the team. I would like to describe you what we came to. Basically we examined 3 options for this:

1) Free form object

"span_ctx" : {
  "key1": {
    "key1_1": "abc"
  }
  "key2": 43232,
  ...
}

Our concerns here is that we will break the guarantees that for "compatible" schema mode in Nakadi all fields are defined. This will create a backdoor to put any undefined data to this field.

2) Just a single string containing encoded data

"span_ctx": "WFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb2"

With this approach the problem is that it has very poor readability, so it will not be possible to get use of this data before decoding. So, for example, in Data Lake it will be very difficult to use this data.

3) Flat string->string map
a)

"span_ctx" : {
  "key1": "value1",
  "key2": "value2",
  ...
}

b)

"span_ctx" : [
  {
    "key": "key1",
    "value": "value1"
  },
  {
    "key": "key2",
    "value": "value2"
  },
  ...
]

In general approaches 3a and 3b are similar as they represent flat string->string map.
(However approach 3a has problems similar to case (1) as it will not allow to define all fields of Nakadi event in compatible mode.)
Our concern regarding flat string->string map is that the open-tracing data is actually more complex and has more hierarchical structure, so in a case of map some data still will have to be compacted or transformed to string - which lowers the data quality as the actual value is different from its representation in that map.


Our another concern for any of these approaches is that none of them define the actual data that will be there and it is not clear how our users can make use of it. You say that there is standardized way to serialize/deserialize open-tracing data to/from string->string map, then I suppose there should be some standardized list of required keys it should contain, in other case it just can not work.

Situation with current suggestions looks quite strange for us. Nakadi defines the following: there is a span_ctx field in metadata, I don't know what is inside and I don't provide any guarantees but probably open-tracing implementations should be able to parse it if the publisher has put correct data there.

From our point of view, if we add something to Nakadi metadata we should be able to explain our users how to adopt it and how to benefit from it. So that the user should be able to know what to do with span_ctx if it is present in metadata. For us the most preferable solution will still be to define at least the basic schema for this field (maybe with loosing of some flexibility).

@lmineiro can you please elaborate on that to clarify those points? Maybe providing some examples of what will be the actual content there? And with some use-cases of apps that will write/read data to/from that field without prior agreements between them. Thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the discussion with datalake team and the fact that 3b provides a more stable schema inference could be pointed out as a clear advantage. But I personally expected the span_ctx to be defined by additionalProperties: string. I just didn't proposed it to be like that since the beginning because it was not clear for me how the "baggage items" would look like. I think it's important to have the confirmation that span_ctx = additionalProperties: true suites all use cases.

Even though 3a might lead to ever growing schema inference on the datalake side, I would seriously consider it, since it's always better to have more informative schemas, even for tracing data. It's more "searchable". Having "array<string, string>" is not very good for queries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it's important to understand if is it "ever growing" or not. I assume it might be quite stable set of properties. I think it would be really useful to have some real examples pasted here with and without baggage items. Then we discuss the real thing and not something hypothetical. @lmineiro do you think it possible to provide samples with the current implementation?

Copy link

@lmineiro lmineiro Sep 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example from the narrative is what we're expected to have with the current vendor:

Example without baggage - stable

ot-tracer-spanid: b268f901d5f2b865
ot-tracer-traceid: e9435c17dabe8238
ot-tracer-sampled: true

Example with baggage - arbitrary

ot-tracer-spanid: b268f901d5f2b865
ot-tracer-traceid: e9435c17dabe8238
ot-tracer-sampled: true
ot-baggage-key1: value1
ot-baggage-key2: value2

We currently don't have any valid use cases for baggage. We also advise our users to use it with care, in respect to the specification note: "Use this feature thoughtfully and with care. Every key and value is copied into every local and remote child of the associated Span, and that can add up to a lot of network and cpu overhead."

I understand the preference for 3b, it fulfills both Nakadi's goal of strictly defined schemas and it seems to be better for the searching the Data Lake (this one a Zalando internal).

I still would like to point out that every Nakadi producer and consumer out there, internal or not, will have to write their own serializer an deserializer as the OpenTracing libraries will not be able to do it out of the box - this is, in fact, the biggest advantage of 3a.

The Data Lake is, as far as I know, proprietary. I don't expect any of the external users using Nakadi to benefit from that.

Only the maintainers can make the decision which way to go forward. My choice, as Nakadi user, even if I do have access to our Data Lake, would be to have the greatest out of the box experience if and when I decide to use OpenTracing. Option 3b adds additional cost and effort to that decision while 3a allows any user to reuse existing code and infrastructure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I think we came to a good compromise here. I fixed the type to be additionalProperties: type: string which supports 3a.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sounds good!

@jmhofer
Copy link

jmhofer commented Sep 6, 2018

Our concern regarding flat string->string map is that the open-tracing data is actually more complex and has more hierarchical structure

@v-stepanov just to clarify: this should not be a concern, as the OpenTracing specification guarantees that the span context can always be encoded as and decoded from a text map (map from string to string), no matter how the internal vendor-specific data structure looks.

For example, in Java, using the OpenTracing API, encoding looks like this:

Map<String, String> contextMap = new HashMap<>();
tracer.inject(tracer.activeSpan().context(), TEXT_MAP, new TextMapInjectAdapter(contextMap));
// contextMap now contains the (vendor-specific) serialized span context

And decoding (from a given map) looks like this:

SpanContext spanContext = tracer.extract(TEXT_MAP, new TextMapExtractAdapter(contextMap));

Using additionalProperties string makes sure that only flat maps of
strings are accepted in the span_ctx. This is considered a decent level
of guarantees in terms of schema, keeping it flexible but also
relatively well behaved.
@rcillo rcillo requested a review from a team as a code owner September 6, 2018 14:30
"\"span_ctx\": {" +
" \"ot-tracer-spanid\": \"b268f901d5f2b865\"," +
" \"ot-tracer-traceid\": \"e9435c17dabe8238\"," +
" \"ot-baggage-foo\": \"bar\"" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also have a test that validates the case when we have a none-string field in span_ctx? (Validation should fail in this case)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rcillo
Copy link
Contributor Author

rcillo commented Sep 6, 2018

👍

1 similar comment
@v-stepanov
Copy link
Contributor

👍

@rcillo
Copy link
Contributor Author

rcillo commented Sep 6, 2018

deploy validation please

@jmhofer
Copy link

jmhofer commented Sep 13, 2018

😭

@rcillo
Copy link
Contributor Author

rcillo commented Sep 24, 2018

👍

1 similar comment
@lmontrieux
Copy link
Contributor

👍

@rcillo rcillo merged commit effb2ed into master Sep 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants