Support OpenTracing span context #931
Conversation
In order to trace distributed systems, the OpenTracing span context can be transported in the metadata of events. This way, publishers and consumers can share a span even when communicating asynchronously through Nakadi.
Codecov Report
```
@@            Coverage Diff             @@
##           master     #931     +/-   ##
============================================
+ Coverage     54.15%   54.16%   +<.01%
  Complexity     1774     1774
============================================
  Files           321      321
  Lines          9687     9689       +2
  Branches        885      885
============================================
+ Hits           5246     5248       +2
  Misses         4123     4123
  Partials        318      318
============================================
```
Continue to review full report at Codecov.
docs/_data/nakadi-event-bus-api.yaml
Outdated
```yaml
    uses OpenTracing for tracing transcations accross distributed services, this field is be populated with an
    object that allows consumers to resume the context of a span. The details of the payload are vendor
    specific and consumers are expected to contact the producer in order to obtain further details.
  type: object
```
doesn't it have any defined structure?
(I ask this because we always criticize our users for not defining the schema properly :) )
It is not possible or desired to define a schema as that would enforce vendor lock-in.
Unfortunately, OpenTracing deliberately leaves the span context wire format undefined. This allows each vendor to serialize/deserialize as they see fit. We want to keep the ability to switch vendors and should not depend on any particular format.
Additionally, part of the span context are the baggage items, which are arbitrary key values.
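To make that concrete, here is a minimal JDK-only sketch of what such a carrier map might hold. The `SpanContextCarrier` class is made up for illustration; the "ot-" key names match the examples given later in this thread but are vendor-specific, and the baggage entry shows the arbitrary key/value nature described above:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SpanContextCarrier {
    // Build the flat carrier map a tracer might produce on inject();
    // the "ot-" key names are vendor-specific and purely illustrative.
    static Map<String, String> exampleContext() {
        Map<String, String> carrier = new LinkedHashMap<>();
        carrier.put("ot-tracer-spanid", "b268f901d5f2b865");
        carrier.put("ot-tracer-traceid", "e9435c17dabe8238");
        carrier.put("ot-tracer-sampled", "true");
        // Baggage items: arbitrary key/value pairs chosen by the producer.
        carrier.put("ot-baggage-key1", "value1");
        return carrier;
    }

    public static void main(String[] args) {
        System.out.println(exampleContext());
    }
}
```

Since both the tracer keys and the baggage keys are outside Nakadi's control, no fixed set of property names can be promised in a schema.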
I remember this discussion from the initial narrative and a good idea could be to apply some sanity limits (payload size) for this particular element.
I understand the reasons why it can't be defined.
I'm just afraid that it goes against what we fight for in Nakadi: all data to be defined. With this field it will never be possible to completely define the structure of the event.
@v-stepanov I agree with you. Another point is that this will eventually find its way to the data lake, where incompatible changes in types might lead to breaking their processes. Maybe we should involve them here, so that they know about this new piece of information.
I would say that having arbitrary maps<string, string> is better than having arbitrary objects.
@lmineiro do you think it's possible to assume that the extensible parts fit in a map<string, string> or should we keep it as an opaque object, that might have arbitrary nested structures?
Yes. We deliberately picked the TextMap format to, at least, provide that guarantee.
If it is a Map<String, String>, it can be defined as such in JSON Schema / OpenAPI schema:

```yaml
type: object
additionalProperties:
  type: string
```
docs/_data/nakadi-event-bus-api.yaml
Outdated
```yaml
span_ctx:
  description: |
    Object containing an OpenTracing http://opentracing.io span context. In case the producer of this event type
    uses OpenTracing for tracing transcations accross distributed services, this field is be populated with an
```
typo: transcations
"this field is be populated" -> "this field is populated"
docs/_data/nakadi-event-bus-api.yaml
Outdated
```yaml
    Object containing an OpenTracing http://opentracing.io span context. In case the producer of this event type
    uses OpenTracing for tracing transcations accross distributed services, this field is be populated with an
    object that allows consumers to resume the context of a span. The details of the payload are vendor
    specific and consumers are expected to contact the producer in order to obtain further details.
```
I'd actually add a short mention that any attempt at inspecting or effectively using the payload, other than through the Tracer implementation, can lead to vendor lock-in, which we want to avoid.
Nice way for Nakadi users to use OpenTracing.
I think we still need to discuss the fact that this new field is just an object, which somehow ruins all our efforts of giving Nakadi events a fully defined structure. I will veto for now.
@v-stepanov I completely understand your concerns. I personally share them. I'm afraid there's absolutely no other way to support context propagation with Nakadi. Even if we ignored the Baggage Items (arbitrary key-value pairs that will be part of this object), any attempt at defining a schema for it would no longer be vendor-neutral. That seems a much higher price to pay. Without this approach, Nakadi won't support context propagation through events.
What's the problem with defining a more generic format? It's a map from string to string, so you could go for an array of objects with the fields "key" and "value", or something similar, and still have it strictly defined.
I'm sure there can be many creative ways to serialize the span context. There's a high risk that none of them would be supported out of the box by any Tracer implementation or the existing OpenTracing libraries. The proposed standard (Text Map) can be easily serialized and deserialized by any Tracer, as the Text Map is part of the OpenTracing specification (it's just a glorified Map anyway). For me personally, the advantage of working out of the box outweighs any creative effort towards a "strongly-typed" span context attribute in the event metadata; every consumer, in every programming language, would have to implement the deserialization.

If we don't care about human readability of the span context, I'd then propose a base64-encoded version of the Binary carrier format. This would give Nakadi a single string attribute, at the extra expense of consumers having to decode the base64 blob. What do you think @rcillo, @lmontrieux, @v-stepanov, @jmhofer?
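As a rough JDK-only sketch of the base64 option: the `SpanCtxBase64` class and the sample byte payload below are made up; a real payload would come from a tracer's Binary carrier, which this sketch does not use.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SpanCtxBase64 {
    // Wrap an opaque, vendor-serialized span context into a single string,
    // as the single-string-attribute option would require.
    static String encode(byte[] serializedContext) {
        return Base64.getEncoder().encodeToString(serializedContext);
    }

    // Consumers pay this extra decode step before handing bytes to the tracer.
    static byte[] decode(String spanCtxField) {
        return Base64.getDecoder().decode(spanCtxField);
    }

    public static void main(String[] args) {
        // Stand-in bytes; a real payload would come from the tracer's binary carrier.
        byte[] raw = "vendor-specific-bytes".getBytes(StandardCharsets.UTF_8);
        String spanCtx = encode(raw);
        System.out.println(new String(decode(spanCtx), StandardCharsets.UTF_8));
    }
}
```

The trade-off is exactly as described above: one well-behaved string attribute, but zero readability for humans or downstream systems.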
Personally, I do care about readability. As the specification ensures that we have a representation of the span context as a map from string to string, it should be easy anyway to serialize/deserialize this map to whatever format you like. I don't mind much in this case whether it's a (unfortunately schema-less) JSON object, some JSON map encoding, or something like that.

What's important, in my humble opinion, is that de-/serialization must be sufficiently simple for every Nakadi user, as everyone will have to write their own version of it.
```yaml
    specific and consumers are expected to contact the producer in order to obtain further details. Any attempt
    to inspect or effectively use the payload, apart from the tracer implementation, can lead to vendor lock-in
    which should be avoided.
  type: object
```
As the previous discussion has already been marked as "outdated" by GitHub, here are my two cents on this:

If we suppose it is a TextMap, i.e. a map of strings to strings (as opposed to an arbitrary JSON object with nested stuff), we can just declare it that way with this syntax:

```yaml
span_ctx:
  description: |
    Object containing ...
  type: object
  additionalProperties:
    type: string
```

This should take care of @v-stepanov's concerns (not allowing arbitrary structures in here), while still allowing some vendor independence by not defining any keys or the meaning of their values.
Just to be clear: what @ePaul just described is the actual proposal. Only if leaving the key names of said Map undefined is not acceptable would I consider some other encoding.

Serializing/deserializing from/to Text Map is supported out of the box by any implementation, which is, once again, the reason it was proposed. No extra work is needed from any consumer, apart from the typical tracer.Extract() call using the Text Map carrier.
@v-stepanov, from your perspective, is what @ePaul described a reasonable/good compromise for Nakadi?
@lmineiro We scheduled an internal team meeting for tomorrow morning to discuss the metadata changes in Nakadi for OpenTracing, so we will come back to you after that. But we will also need to get the Data Lake team's opinion on it.

As I understand it, the current options are:

- just an object with free structure (the initial proposal)
- a single string with the encoded span context
- a flat map of key-value pairs (`additionalProperties: type: string`)
I think another option (mentioned by @jmhofer) to encode a flat map (not using an object) would be something like this (as a list of objects).
```yaml
span_ctx:
  description: ...
  type: array
  items:
    type: object
    properties:
      key:
        type: string
      value:
        type: string
```
I would prefer the additionalProperties version, though – it has less overhead.
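To compare the two encodings, here is a JDK-only sketch of re-encoding a flat map (the `additionalProperties` form) into the list-of-{key,value} form. The `SpanCtxKeyValueList` class name is made up for illustration; the point is that every consumer would need the inverse of this step before handing the map to a tracer:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SpanCtxKeyValueList {
    // Re-encode a flat string->string map into a list of {key, value}
    // objects (the array-of-objects encoding discussed above).
    static List<Map<String, String>> toKeyValueList(Map<String, String> flat) {
        List<Map<String, String>> pairs = new ArrayList<>();
        for (Map.Entry<String, String> e : flat.entrySet()) {
            Map<String, String> pair = new LinkedHashMap<>();
            pair.put("key", e.getKey());
            pair.put("value", e.getValue());
            pairs.add(pair);
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, String> flat = new LinkedHashMap<>();
        flat.put("ot-tracer-spanid", "b268f901d5f2b865");
        flat.put("ot-tracer-traceid", "e9435c17dabe8238");
        System.out.println(toKeyValueList(flat));
    }
}
```

The flat-map form needs no such conversion, which is the "less overhead" point: it can be passed to a Text Map carrier as-is.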
We discussed all this with the team. I would like to describe what we came up with. Basically we examined 3 options:

1) Free-form object

```
"span_ctx": {
  "key1": {
    "key1_1": "abc"
  },
  "key2": 43232,
  ...
}
```
Our concern here is that we would break the guarantee that in "compatible" schema mode in Nakadi all fields are defined. This would create a backdoor for putting arbitrary undefined data into this field.
2) Just a single string containing encoded data

```
"span_ctx": "WFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb2"
```
With this approach the problem is very poor readability: the data cannot be used at all before decoding. In the Data Lake, for example, it would be very difficult to work with.
3) Flat string->string map

a)

```
"span_ctx": {
  "key1": "value1",
  "key2": "value2",
  ...
}
```

b)

```
"span_ctx": [
  {
    "key": "key1",
    "value": "value1"
  },
  {
    "key": "key2",
    "value": "value2"
  },
  ...
]
```
In general, approaches 3a and 3b are similar, as both represent a flat string->string map. (However, approach 3a has problems similar to case (1), as it does not allow defining all fields of a Nakadi event in compatible mode.)

Our concern regarding the flat string->string map is that the OpenTracing data is actually more complex and hierarchical, so with a map some data would still have to be compacted or transformed into a string. This lowers data quality, because the actual value differs from its representation in the map.

Another concern, for any of these approaches, is that none of them defines the actual data that will be there, and it is not clear how our users can make use of it. You say there is a standardized way to serialize/deserialize OpenTracing data to/from a string->string map; then I suppose there should be a standardized list of required keys it must contain, otherwise it simply cannot work.

The current suggestions look quite strange to us. Nakadi would define the following: there is a span_ctx field in metadata, we don't know what is inside and we don't provide any guarantees, but OpenTracing implementations should probably be able to parse it if the publisher has put correct data there.

From our point of view, if we add something to Nakadi metadata we should be able to explain to our users how to adopt it and how to benefit from it, so that a user knows what to do with span_ctx when it is present in metadata. For us the most preferable solution would still be to define at least a basic schema for this field (maybe at the cost of some flexibility).

@lmineiro can you please elaborate to clarify those points? Maybe provide some examples of the actual content, and some use cases of apps that would write/read data to/from this field without prior agreement between them. Thank you.
I read the discussion with the Data Lake team, and the fact that 3b provides more stable schema inference could be pointed out as a clear advantage. But I personally expected span_ctx to be defined by `additionalProperties: string`. I just didn't propose it like that from the beginning because it was not clear to me what the "baggage items" would look like. I think it's important to have confirmation that span_ctx = `additionalProperties: true` suits all use cases.

Even though 3a might lead to ever-growing schema inference on the Data Lake side, I would seriously consider it, since it's always better to have more informative schemas, even for tracing data. It's more "searchable"; having array<string, string> is not very good for queries.
Also, it's important to understand whether it is "ever growing" or not. I assume it might be a quite stable set of properties. I think it would be really useful to have some real examples pasted here, with and without baggage items; then we discuss the real thing and not something hypothetical. @lmineiro do you think it's possible to provide samples with the current implementation?
The example from the narrative is what we're expected to have with the current vendor:
Example without baggage (stable):

```
ot-tracer-spanid: b268f901d5f2b865
ot-tracer-traceid: e9435c17dabe8238
ot-tracer-sampled: true
```

Example with baggage (arbitrary):

```
ot-tracer-spanid: b268f901d5f2b865
ot-tracer-traceid: e9435c17dabe8238
ot-tracer-sampled: true
ot-baggage-key1: value1
ot-baggage-key2: value2
```
We currently don't have any valid use cases for baggage. We also advise our users to use it with care, in respect to the specification note: "Use this feature thoughtfully and with care. Every key and value is copied into every local and remote child of the associated Span, and that can add up to a lot of network and cpu overhead."
I understand the preference for 3b: it fulfills Nakadi's goal of strictly defined schemas, and it seems to be better for searching the Data Lake (the latter being Zalando-internal).

I would still like to point out that every Nakadi producer and consumer out there, internal or not, would have to write their own serializer and deserializer, as the OpenTracing libraries will not be able to handle it out of the box; this is, in fact, the biggest advantage of 3a.

The Data Lake is, as far as I know, proprietary. I don't expect any external Nakadi users to benefit from it.

Only the maintainers can make the decision which way to go forward. My choice, as a Nakadi user, even though I do have access to our Data Lake, would be to have the best out-of-the-box experience if and when I decide to use OpenTracing. Option 3b adds additional cost and effort to that decision, while 3a allows any user to reuse existing code and infrastructure.
Nice, I think we came to a good compromise here. I fixed the type to be `additionalProperties: type: string`, which supports 3a.
Ok, sounds good!
@v-stepanov just to clarify: this should not be a concern, as the OpenTracing specification guarantees that the span context can always be encoded as and decoded from a text map (a map from string to string), no matter what the internal vendor-specific data structure looks like. For example, in Java, using the OpenTracing API, encoding looks like this:

```java
Map<String, String> contextMap = new HashMap<>();
tracer.inject(tracer.activeSpan().context(), TEXT_MAP, new TextMapInjectAdapter(contextMap));
// contextMap now contains the (vendor-specific) serialized span context
```

And decoding (from a given map) looks like this:

```java
SpanContext spanContext = tracer.extract(TEXT_MAP, new TextMapExtractAdapter(contextMap));
```
Using `additionalProperties: type: string` makes sure that only flat maps of strings are accepted in span_ctx. This is considered a decent level of schema guarantee, keeping the field flexible but relatively well behaved.
```java
"\"span_ctx\": {" +
"  \"ot-tracer-spanid\": \"b268f901d5f2b865\"," +
"  \"ot-tracer-traceid\": \"e9435c17dabe8238\"," +
"  \"ot-baggage-foo\": \"bar\"" +
```
Can we also have a test that validates the case where we have a non-string field in span_ctx? (Validation should fail in this case.)
@v-stepanov done 8488b55
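For illustration, here is a simplified JDK-only sketch of the constraint such a test exercises; it mimics what `additionalProperties: {type: string}` enforces, and is not Nakadi's actual JSON Schema validator (class and method names are made up):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SpanCtxValidation {
    // Every value of span_ctx must be a plain string; nested objects,
    // numbers, and booleans are rejected, as the schema requires.
    static boolean isValidSpanCtx(Map<String, Object> spanCtx) {
        for (Object value : spanCtx.values()) {
            if (!(value instanceof String)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> ok = new LinkedHashMap<>();
        ok.put("ot-tracer-spanid", "b268f901d5f2b865");
        Map<String, Object> bad = new LinkedHashMap<>();
        bad.put("ot-tracer-sampled", Boolean.TRUE); // non-string value
        System.out.println(isValidSpanCtx(ok) + " " + isValidSpanCtx(bad));
    }
}
```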
deploy validation please
https://techjira.zalando.net/browse/ARUHA-1883