New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support OpenTracing span context #931
Changes from 1 commit
322e837
51753d3
561d64f
a68032f
8488b55
a742f62
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2053,6 +2053,13 @@ definitions: | |
same partition and therefore compacted in a proper way (as compaction is performed per partition). | ||
type: string | ||
example: '329ed3d2-8366-11e8-adc0-fa7ae01bbebc' | ||
span_ctx: | ||
description: | | ||
Object containing an OpenTracing http://opentracing.io span context. In case the producer of this event type | ||
uses OpenTracing for tracing transcations accross distributed services, this field is be populated with an | ||
object that allows consumers to resume the context of a span. The details of the payload are vendor | ||
specific and consumers are expected to contact the producer in order to obtain further details. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd actually add some short mention that any attempt at inspecting or effectively using the payload, apart from the Tracer implementation can lead to vendor lock-in which we want to avoid. |
||
type: object | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. doesn't it have any defined structure? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is not possible or desired to define a schema as that would enforce vendor lock-in. Unfortunately, OpenTracing deliberately leaves the span context wire format undefined. This allows each vendor to serialize/deserialize as they see fit. We want to keep the ability to switch vendors and should not depend on any particular format. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I understand the reasons why it can't be defined. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @v-stepanov I agree with you. Another point is that this will eventually find its way to the data lake, where incompatible changes in types might lead to breaking their processes. Maybe we should involve them here, so that they know about this new piece of information. I would say that having arbitrary maps<string, string> is better than having arbitrary objects. @lmineiro do you think it's possible to assume that the extensible parts fit in a map<string, string> or should we keep it as an opaque object, that might have arbitrary nested structures? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes. We deliberately picked the TextMap format to, at least, provide that guarantee. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If it is a type: object
additionalProperties:
type: string There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As the previous discussion somehow already is noted as "outdated" by Github, here my two cents to this: If we suppose it is a TextMap, i.e. a map of strings to strings (contrary to an arbitrary JSON object with nested stuff), we can just declare it that way with this syntax: span_ctx:
description: |
Object containing ...
type: object
additionalProperties:
type: string This should take care of the concerns of @v-stepanov (not allowing arbitrary structures in here), while at the same time still allows some vendor-independence by not defining any keys or the meaning of their values. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just to be clear. What @ePaul just described is the actual proposal. If not defining the key names from said Map is not acceptable, only then I'd consider some other encoding. Serializing/Deserializing from/to Text Map is supported out of the box on any implementation, once again, the reason why it was proposed. No extra work is needed from any consumer, apart from the typical tracer.Extract() call, using the Text Map carrier. @v-stepanov, from your perspective, is what @ePaul described a reasonable/good compromise for Nakadi? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @lmineiro We scheduled internal team meeting for tomorrow morning to discuss the metadata changes in Nakadi for Open Tracing. So we will come back to you after that. But we also will need to get Data Lake team opinion on that.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think another option (mentioned by @jmhofer) to encode a flat map (not using an object) would be something like this (as a list of objects). span_ctx:
description: ...
type: array
items:
type: object
properties:
key:
type: string
value:
type: string I would prefer the additionalProperties version, though – it has less overhead. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We discussed this all with the team. I would like to describe you what we came to. Basically we examined 3 options for this: 1) Free form object
Our concerns here is that we will break the guarantees that for "compatible" schema mode in Nakadi all fields are defined. This will create a backdoor to put any undefined data to this field. 2) Just a single string containing encoded data
With this approach the problem is that it has very poor readability, so it will not be possible to get use of this data before decoding. So, for example, in Data Lake it will be very difficult to use this data. 3) Flat string->string map
b)
In general approaches 3a and 3b are similar as they represent flat string->string map. Our another concern for any of these approaches is that none of them define the actual data that will be there and it is not clear how our users can make use of it. You say that there is standardized way to serialize/deserialize open-tracing data to/from string->string map, then I suppose there should be some standardized list of required keys it should contain, in other case it just can not work. Situation with current suggestions looks quite strange for us. Nakadi defines the following: there is a span_ctx field in metadata, I don't know what is inside and I don't provide any guarantees but probably open-tracing implementations should be able to parse it if the publisher has put correct data there. From our point of view, if we add something to Nakadi metadata we should be able to explain our users how to adopt it and how to benefit from it. So that the user should be able to know what to do with span_ctx if it is present in metadata. For us the most preferable solution will still be to define at least the basic schema for this field (maybe with loosing of some flexibility). @lmineiro can you please elaborate on that to clarify those points? Maybe providing some examples of what will be the actual content there? And with some use-cases of apps that will write/read data to/from that field without prior agreements between them. Thank you. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I read the discussion with datalake team and the fact that 3b provides a more stable schema inference could be pointed out as a clear advantage. But I personally expected the Even though 3a might lead to ever growing schema inference on the datalake side, I would seriously consider it, since it's always better to have more informative schemas, even for tracing data. It's more "searchable". Having "array<string, string>" is not very good for queries. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, it's important to understand if is it "ever growing" or not. I assume it might be quite stable set of properties. I think it would be really useful to have some real examples pasted here with and without baggage items. Then we discuss the real thing and not something hypothetical. @lmineiro do you think it possible to provide samples with the current implementation? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The example from the narrative is what we're expected to have with the current vendor: Example without baggage - stable
Example with baggage - arbitrary
We currently don't have any valid use cases for baggage. We also advise our users to use it with care, in respect to the specification note: "Use this feature thoughtfully and with care. Every key and value is copied into every local and remote child of the associated Span, and that can add up to a lot of network and cpu overhead." I understand the preference for 3b, it fulfills both Nakadi's goal of strictly defined schemas and it seems to be better for the searching the Data Lake (this one a Zalando internal). I still would like to point out that every Nakadi producer and consumer out there, internal or not, will have to write their own serializer an deserializer as the OpenTracing libraries will not be able to do it out of the box - this is, in fact, the biggest advantage of 3a. The Data Lake is, as far as I know, proprietary. I don't expect any of the external users using Nakadi to benefit from that. Only the maintainers can make the decision which way to go forward. My choice, as Nakadi user, even if I do have access to our Data Lake, would be to have the greatest out of the box experience if and when I decide to use OpenTracing. Option 3b adds additional cost and effort to that decision while 3a allows any user to reuse existing code and infrastructure. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice, I think we came to a good compromise here. I fixed the type to be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, sounds good! |
||
required: | ||
- eid | ||
- occurred_at | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -37,6 +37,27 @@ public void validationOfBusinessEventShouldRequiredMetadata() { | |
assertThat(error.get().getMessage(), equalTo("#: required key [metadata] not found")); | ||
} | ||
|
||
@Test | ||
public void validationOfBusinessEventShouldAllowSpanCtxtInMetadata() { | ||
final EventType et = EventTypeTestBuilder.builder().name("some-event-type") | ||
.schema(basicSchema()).build(); | ||
et.setCategory(EventCategory.BUSINESS); | ||
|
||
final JSONObject event = new JSONObject("{\"metadata\":{" + | ||
"\"occurred_at\":\"1992-08-03T10:00:00Z\"," + | ||
"\"eid\":\"329ed3d2-8366-11e8-adc0-fa7ae01bbebc\"," + | ||
"\"span_ctx\": {" + | ||
" \"ot-tracer-spanid\": \"b268f901d5f2b865\"," + | ||
" \"ot-tracer-traceid\": \"e9435c17dabe8238\"," + | ||
" \"ot-baggage-foo\": \"bar\"" + | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we also have a test that validates the case when we have a none-string field in span_ctx? (Validation should fail in this case) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @v-stepanov done 8488b55 |
||
"}}," + | ||
"\"foo\": \"bar\"}"); | ||
|
||
final Optional<ValidationError> error = EventValidation.forType(et).validate(event); | ||
|
||
assertThat(error, isAbsent()); | ||
} | ||
|
||
@Test | ||
public void validationOfDataChangeEventRequiresExtraFields() { | ||
final EventType et = EventTypeTestBuilder.builder().name("some-event-type") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: transcations
"this field is be populated" -> "this field is populated"