Support OpenTracing span context #931

Merged (6 commits) on Sep 24, 2018

Changes from 1 commit
7 changes: 7 additions & 0 deletions docs/_data/nakadi-event-bus-api.yaml
@@ -2053,6 +2053,13 @@ definitions:
same partition and therefore compacted in a proper way (as compaction is performed per partition).
type: string
example: '329ed3d2-8366-11e8-adc0-fa7ae01bbebc'
span_ctx:
description: |
Object containing an OpenTracing http://opentracing.io span context. In case the producer of this event type
uses OpenTracing for tracing transcations accross distributed services, this field is be populated with an

typo: transcations
"this field is be populated" -> "this field is populated"

object that allows consumers to resume the context of a span. The details of the payload are vendor
specific and consumers are expected to contact the producer in order to obtain further details.

I'd actually add a short mention that any attempt at inspecting or effectively using the payload, apart from via the Tracer implementation, can lead to the vendor lock-in which we want to avoid.

type: object
Contributor

Doesn't it have any defined structure?
(I ask because we always criticize our users for not defining their schemas properly :) )

It is neither possible nor desirable to define a schema, as that would enforce vendor lock-in.

Unfortunately, OpenTracing deliberately leaves the span context wire format undefined. This allows each vendor to serialize/deserialize as they see fit. We want to keep the ability to switch vendors and should not depend on any particular format.
Additionally, part of the span context are the baggage items, which are arbitrary key values.
I remember this discussion from the initial narrative; a good idea could be to apply some sanity limits (payload size) to this particular element.

Contributor

I understand the reasons why it can't be defined.
I'm just afraid that it goes against what we fight for in Nakadi: all data to be defined. With this field it will never be possible to completely define the structure of the event.

Contributor Author

@v-stepanov I agree with you. Another point is that this will eventually find its way to the data lake, where incompatible changes in types might lead to breaking their processes. Maybe we should involve them here, so that they know about this new piece of information.

I would say that having arbitrary maps<string, string> is better than having arbitrary objects.

@lmineiro do you think it's possible to assume that the extensible parts fit in a map<string, string> or should we keep it as an opaque object, that might have arbitrary nested structures?

Yes. We deliberately picked the TextMap format to, at least, provide that guarantee.

Member

@ePaul ePaul Sep 3, 2018

If it is a Map<String, String>, it can be defined as such in JSON schema/OpenAPI schema:

        type: object
        additionalProperties:
          type: string

Member

@ePaul ePaul Sep 3, 2018

As the previous discussion is already marked as "outdated" by GitHub, here are my two cents on this:

If we suppose it is a TextMap, i.e. a map of strings to strings (contrary to an arbitrary JSON object with nested stuff), we can just declare it that way with this syntax:

      span_ctx:
        description: |
          Object containing ...
        type: object
        additionalProperties:
          type: string

This should take care of @v-stepanov's concerns (not allowing arbitrary structures in here), while at the same time still allowing some vendor independence by not defining any keys or the meaning of their values.


Just to be clear: what @ePaul just described is the actual proposal. Only if leaving the key names of said Map undefined is not acceptable would I consider some other encoding.

Serializing/deserializing from/to a Text Map is supported out of the box by any implementation, which is, once again, the reason it was proposed. No extra work is needed from any consumer apart from the typical tracer.Extract() call, using the Text Map carrier.

@v-stepanov, from your perspective, is what @ePaul described a reasonable/good compromise for Nakadi?
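To make the consumer side concrete, here is a minimal, stdlib-only Java sketch of what extraction from such a flat map amounts to. This is an illustration only: a real consumer would hand the span_ctx map to tracer.extract() with the Text Map carrier rather than reading keys directly, and the ot-tracer-* / ot-baggage-* key names are taken from the vendor example later in this thread, not from any contract.

```java
import java.util.HashMap;
import java.util.Map;

public class SpanCtxSketch {

    // Pull the tracer identifiers out of a flat string->string span_ctx map.
    // Hypothetical helper for illustration; a real consumer delegates this
    // entirely to the vendor's Tracer via tracer.extract(TEXT_MAP, carrier).
    public static Map<String, String> parse(Map<String, String> spanCtx) {
        Map<String, String> ids = new HashMap<>();
        for (Map.Entry<String, String> e : spanCtx.entrySet()) {
            if (e.getKey().startsWith("ot-tracer-")) {
                ids.put(e.getKey().substring("ot-tracer-".length()), e.getValue());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, String> spanCtx = new HashMap<>();
        spanCtx.put("ot-tracer-spanid", "b268f901d5f2b865");
        spanCtx.put("ot-tracer-traceid", "e9435c17dabe8238");
        spanCtx.put("ot-baggage-key1", "value1"); // baggage is carried but not an identifier
        System.out.println(parse(spanCtx));
    }
}
```

The point of the proposal is exactly that consumers never need code like `parse()` above; the Text Map carrier makes the tracer library do this for them.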

Contributor

@lmineiro We have scheduled an internal team meeting for tomorrow morning to discuss the metadata changes in Nakadi for OpenTracing, so we will come back to you after that. We will also need to get the Data Lake team's opinion on it.
As I understand it, the current options are:

  1. just object with free structure (the initial proposal)
  2. single string with encoded span context
  3. flat map of key-value pairs (additionalProperties: type: string)

Member

I think another option (mentioned by @jmhofer) to encode a flat map without using an object would be something like this (a list of objects):

    span_ctx:
      description: ...
      type: array
      items:
        type: object
        properties:
          key:
            type: string
          value:
            type: string

I would prefer the additionalProperties version, though – it has less overhead.

Contributor

We discussed all of this with the team; I would like to describe what we came to. Basically, we examined 3 options:

1) Free form object

"span_ctx" : {
  "key1": {
    "key1_1": "abc"
  }
  "key2": 43232,
  ...
}

Our concern here is that we would break the guarantee that, in Nakadi's "compatible" schema mode, all fields are defined. This would create a backdoor for putting arbitrary undefined data into this field.

2) Just a single string containing encoded data

"span_ctx": "WFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb2"

The problem with this approach is very poor readability: it is impossible to make use of the data before decoding it, so, for example, it would be very difficult to use in the Data Lake.

3) Flat string->string map
a)

"span_ctx" : {
  "key1": "value1",
  "key2": "value2",
  ...
}

b)

"span_ctx" : [
  {
    "key": "key1",
    "value": "value1"
  },
  {
    "key": "key2",
    "value": "value2"
  },
  ...
]

In general, approaches 3a and 3b are similar, as both represent a flat string->string map.
(However, approach 3a has problems similar to case (1), as it does not allow defining all fields of a Nakadi event in compatible mode.)
Our concern regarding a flat string->string map is that the OpenTracing data is actually more complex and more hierarchical, so in the case of a map some data will still have to be compacted or transformed to a string, which lowers the data quality because the actual value differs from its representation in the map.


Another concern, for any of these approaches, is that none of them defines the actual data that will be there, and it is not clear how our users can make use of it. You say there is a standardized way to serialize/deserialize OpenTracing data to/from a string->string map; then I suppose there should be some standardized list of required keys it should contain, since otherwise it simply cannot work.

The situation with the current suggestions looks quite strange to us. Nakadi would effectively be saying: there is a span_ctx field in metadata; we don't know what is inside and we provide no guarantees, but OpenTracing implementations should probably be able to parse it if the publisher put correct data there.

From our point of view, if we add something to Nakadi metadata, we should be able to explain to our users how to adopt it and how to benefit from it, so that a user knows what to do with span_ctx when it is present in metadata. For us the most preferable solution would still be to define at least a basic schema for this field (maybe at the cost of some flexibility).

@lmineiro can you please elaborate to clarify those points? Maybe provide some examples of what the actual content will be, along with some use cases of apps that write/read data to/from that field without prior agreements between them. Thank you.

Contributor Author

I read the discussion with the Data Lake team, and the fact that 3b provides more stable schema inference could be pointed out as a clear advantage. But I personally expected span_ctx to be defined by additionalProperties: string. I just didn't propose it that way from the beginning because it was not clear to me how the "baggage items" would look. I think it's important to have confirmation that span_ctx = additionalProperties: true suits all use cases.

Even though 3a might lead to ever-growing schema inference on the Data Lake side, I would seriously consider it, since it's always better to have more informative schemas, even for tracing data. It's more "searchable". Having "array<string, string>" is not very good for queries.

Contributor Author

Also, it's important to understand whether it is "ever growing" or not. I assume it might be a quite stable set of properties. I think it would be really useful to have some real examples pasted here, with and without baggage items; then we'd be discussing the real thing and not something hypothetical. @lmineiro do you think it's possible to provide samples with the current implementation?

@lmineiro lmineiro Sep 6, 2018

The example from the narrative is what we're expected to have with the current vendor:

Example without baggage - stable

ot-tracer-spanid: b268f901d5f2b865
ot-tracer-traceid: e9435c17dabe8238
ot-tracer-sampled: true

Example with baggage - arbitrary

ot-tracer-spanid: b268f901d5f2b865
ot-tracer-traceid: e9435c17dabe8238
ot-tracer-sampled: true
ot-baggage-key1: value1
ot-baggage-key2: value2

We currently don't have any valid use cases for baggage. We also advise our users to use it with care, in respect to the specification note: "Use this feature thoughtfully and with care. Every key and value is copied into every local and remote child of the associated Span, and that can add up to a lot of network and cpu overhead."

I understand the preference for 3b: it fulfills Nakadi's goal of strictly defined schemas, and it seems to be better for searching the Data Lake (the latter being Zalando-internal).

I would still like to point out that with 3b every Nakadi producer and consumer out there, internal or not, will have to write their own serializer and deserializer, as the OpenTracing libraries will not be able to do it out of the box; that out-of-the-box support is, in fact, the biggest advantage of 3a.

The Data Lake is, as far as I know, proprietary. I don't expect any of the external users using Nakadi to benefit from that.

Only the maintainers can make the decision on which way to go forward. My choice, as a Nakadi user, even though I do have access to our Data Lake, would be to have the greatest out-of-the-box experience if and when I decide to use OpenTracing. Option 3b adds additional cost and effort to that decision, while 3a allows any user to reuse existing code and infrastructure.

Contributor Author

Nice, I think we came to a good compromise here. I fixed the type to be additionalProperties: type: string which supports 3a.

Contributor

Ok, sounds good!

required:
- eid
- occurred_at
@@ -162,6 +162,7 @@ private static JSONObject addMetadata(final JSONObject schema, final EventType e
.put("type", "string")
.put("enum", Arrays.asList(new String[]{eventType.getName()}));
final JSONObject string = new JSONObject().put("type", "string");
final JSONObject object = new JSONObject().put("type", "object");
final JSONObject dateTime = new JSONObject()
.put("type", "string");

@@ -171,6 +172,7 @@ private static JSONObject addMetadata(final JSONObject schema, final EventType e
metadataProperties.put("parent_eids", arrayOfUUIDs);
metadataProperties.put("flow_id", string);
metadataProperties.put("partition", string);
metadataProperties.put("span_ctx", object);

final ArrayList<String> requiredFields = newArrayList("eid", "occurred_at");
if (eventType.getCleanupPolicy() == CleanupPolicy.COMPACT) {
@@ -37,6 +37,27 @@ public void validationOfBusinessEventShouldRequiredMetadata() {
assertThat(error.get().getMessage(), equalTo("#: required key [metadata] not found"));
}

@Test
public void validationOfBusinessEventShouldAllowSpanCtxtInMetadata() {
final EventType et = EventTypeTestBuilder.builder().name("some-event-type")
.schema(basicSchema()).build();
et.setCategory(EventCategory.BUSINESS);

final JSONObject event = new JSONObject("{\"metadata\":{" +
"\"occurred_at\":\"1992-08-03T10:00:00Z\"," +
"\"eid\":\"329ed3d2-8366-11e8-adc0-fa7ae01bbebc\"," +
"\"span_ctx\": {" +
" \"ot-tracer-spanid\": \"b268f901d5f2b865\"," +
" \"ot-tracer-traceid\": \"e9435c17dabe8238\"," +
" \"ot-baggage-foo\": \"bar\"" +
Contributor

Can we also have a test that validates the case where we have a non-string field in span_ctx? (Validation should fail in this case.)


"}}," +
"\"foo\": \"bar\"}");

final Optional<ValidationError> error = EventValidation.forType(et).validate(event);

assertThat(error, isAbsent());
}
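The negative check requested above, that a non-string value in span_ctx must fail validation, boils down to an all-values-are-strings invariant, which is what additionalProperties: type: string enforces. A minimal sketch with plain JDK types follows; the class and method names are illustrative and not from the Nakadi codebase, where the actual check is done by the JSON schema validator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SpanCtxValueCheck {

    // The invariant behind "additionalProperties: type: string":
    // every value in the span_ctx map must be a string.
    public static boolean allValuesAreStrings(Map<String, Object> spanCtx) {
        for (Object value : spanCtx.values()) {
            if (!(value instanceof String)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> good = new LinkedHashMap<>();
        good.put("ot-tracer-sampled", "true");   // string "true": valid

        Map<String, Object> bad = new LinkedHashMap<>();
        bad.put("ot-tracer-sampled", true);      // boolean, not string: should be rejected

        System.out.println(allValuesAreStrings(good) + " " + allValuesAreStrings(bad));
    }
}
```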

@Test
public void validationOfDataChangeEventRequiresExtraFields() {
final EventType et = EventTypeTestBuilder.builder().name("some-event-type")