Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify the HTTP header name choice for trace #119

Closed
serialseb opened this issue Apr 28, 2018 · 24 comments
Closed

Clarify the HTTP header name choice for trace #119

serialseb opened this issue Apr 28, 2018 · 24 comments

Comments

@serialseb
Copy link

serialseb commented Apr 28, 2018

Hi,

The following text made me rise an eyebrow, so I thought I'd give some feedback, maybe the text could clarify those points for future readers.

While HTTP headers are conventionally delimited by hyphens, the trace context header names are not. Rather, they are lowercase concatenated "traceparent" and "tracestate" respectively. The departure from convention is due to practical concerns of propagation. Trace context is unlike typical http headers, which are point-to-point and do not propagate through other systems like messaging. Different systems have different constraints. For example, some cannot read case insensitively, and others forbid the hyphen character. Even if we could suggest not using the same format for such systems, we know many systems transparently copy http headers into fields. This class of concerns only exist when we choose to support mixed case with hyphens. By choosing not to, we open trace context integration beyond http at the cost of a conventional distraction.

  1. Http headers are case insensitive so I'm unsure why the lowercase mention exists. A specification cannot redefine HTTP, and nothing can or would prevent proxies and intermediaries from re-casing as they see fit.
  2. Most http headers are not point-to-point, there is a limited list alongside the ones in the Connection response header.
  3. The reference to "fields" is confusing here, I'm not sure what it refers to.
  4. If many systems transparently copy http headers into fields, then it'll end up copying a vast majority of headers using the convention, so I'm unsure what scenario we cater for here, especially as most pre-existing and deployed tracing headers use the dash, without triggering interoperability problems.

Many thanks.

@codefromthecrypt
Copy link

codefromthecrypt commented Apr 29, 2018 via email

@yurishkuro
Copy link
Member

Personally, the no-hyphen rule bothers me too. The current draft is specifically talking about HTTP headers, where hyphens are perfectly fine. Another protocol may have a different specification. The theoretic case of some system that receives HTTP requests and translates them, say, into RabbitMQ messages by blindly copying all HTTP headers to message headers is not going to work anyway, since most common HTTP headers have hyphens, as well as headers used by existing tracing systems.

Even in HTTP world it's not uncommon to have infra that doesn't allow dashes (e.g. CGI, WSGI - they will also uppercase, @adriancole), but people often assume that dash and underscore are interchangable.

@codefromthecrypt
Copy link

codefromthecrypt commented Apr 29, 2018 via email

@serialseb
Copy link
Author

serialseb commented Apr 29, 2018

@adriancole thank you for taking the time to respond.

My interest in this is one of implementation, as I work on distributed systems and am a web framework author; and as a member of the wider standards community.

The recasing to uppercase is indeed an odd idea, although there is prior art in cgi gates of old times. There is however a whole set of http frameworks that would parse http headers, turn them into an object model, and re-serialize them on the other end. It is also common for operators configuring reverse-proxy to mistype the casing.

From a standards perspective, defining a header as case sensitive is not possible without redefining the semantics of HTTP itself. I woudl imagine that it would prevent a registration of the header as per BCP90

From an implementation perspective, the header itself may be comma-separated, so it could be folded into one or split in multiple headers, as per the specification. As with most things in HTTP, if it can be done it will be done, in intermediaries that the operator of a tracing system, or the author of the headers, will have no control over.

RFC7231 Section 8.3.1 says on the subject

Whether the field is a single value or whether it can be a list
(delimited by commas; see Section 3.2 of [RFC7230]).

If it does not use the list syntax, document how to treat messages
where the field occurs multiple times (a sensible default would be
to ignore the field, but this might not always be the right
choice).

Note that intermediaries and software libraries might combine
multiple header field instances into a single one, despite the
field's definition not allowing the list syntax. A robust format
enables recipients to discover these situations (good example:
"Content-Type", as the comma can only appear inside quoted
strings; bad example: "Location", as a comma can occur inside a
URI).

Within those constraints, I find it difficult to imagine using the http headers as one serialisation stable enough to transfer with no modification across protocols without some level of parsing, and that will mean some level of processing for things that you cannot disallow http to do, by design, like case sensitivity in header field names and the presence of multiple headers for lists.

At this point, I'm not sure that the lack of hyphen looks as attractive anymore, but it's entirely likely that i do not understand the mappings between tracing systems that you refer to, and I expect other readers and implementers of specifications may stumble upon the same difficulty, hence my suggestion to extend the description of the incompatibilities between toolsets leading to the choice of the hyphen.

Either way, if the goal is to have one simple mapping from a single header to whatever other system of choice exists, then lists with commas seem to prevent this from happening, as for example most json-based messaging systems will not allow the presence of duplicate keys in objects. It would probably be wiser to have a space-separated list instead, and a processing rule (as per above) when multiple are present.

If the goal, on the other hand, is to define semantics that integrate well with HTTP, it may be necessary to provide the steps to generate an idiomatic single key/value pair representation, separate from the definition of the tracing elements themselves, along the lines of what was used rather successfully in RFC8288.

On a side-note, my original research lead me from the pull request to the minutes that alluded to a decision made at a workshop, for which I couldn't find any minutes. This would explain why, as part of the wider spec community, the facts leading to that decision are unknown enough that I opened this issue.

@codefromthecrypt
Copy link

codefromthecrypt commented Apr 29, 2018 via email

@asbjornu
Copy link

asbjornu commented Apr 29, 2018

  1. you want us to intentionally add hyphens even though folks with practice know this to cause issues in distributed tracing which by definition is more than http

While I understand why you would want some definition power of the header's value, I don't understand the need to control its name. As @serialseb writes, proxies, caches, middleware and whatnot is going to rewrite these names regardless of what you write in this specification. Most proxy providers probably won't even know about its existence for many years to come. You can't redefine HTTP; it is well defined in RFC 7230.

Also, just the idea of taking one protocol's header name and plunging it into another without any modification sounds very weird to me. The value, I can understand, but not the name. The name will have to obey the syntax of the protocol it is being used in. In binary protocol such as MQTT, the header won't even have a name – it will just be represented by a number.

  1. you want us to intentionally not suggest lowercase for header name encoding

If you want to make suggestions, they should at least conform with the protocol which they apply to. If you want to ensure compatibility across protocols, provide guidance for each protocol you envision that this standard is going to be used in. For most open and standard protocols, I assume you're going to have to formally register the header, in which case you will need to adhere to the syntax of each individual protocol anyway.

  1. you want us to accept concatenated headers on things that shouldn't be. ex right now trace headers in the wild would fail of people concatenated them.

This is how HTTP, SMTP, POP3, IMAP and all other protocols using MIME headers work. Headers can be split, concatenated and juggled around for every hop a message takes from the sender to the receiver. The standard allows for a great deal of flexibility and you can't break this 45 years of legacy with your specification.

@serialseb
Copy link
Author

@adriancole I don't want you to do anything. You're writing an open specification that, as currently written, would suffer from significant compatibility issues with HTTP. I suggested changes to either confirm better to HTTP, or alternatively, to continue in the current direction but without the compatibility problems.

Either way, as is this specification wouldn't be usable on HTTP as deployed on the web, hence my feedback. If you believe it to be unhelpful, or incorrect, I'd suggest outreaching to some of the w3c resources, as they're much more authoritative than me on the question.

If you wish me not to provide feedback at this stage in the standardisation, please feel free to say so, I'm in no way trying to impose.

@codefromthecrypt
Copy link

codefromthecrypt commented Apr 30, 2018 via email

@codefromthecrypt
Copy link

codefromthecrypt commented Apr 30, 2018 via email

@dret
Copy link
Member

dret commented Apr 30, 2018

just chiming in here: @serialseb and @asbjornu to my understanding have a very valid point here: you may be defining something that attempts to be cross-protocol, and that's fine. but it is hard to imagine that you can do so without carefully examining and respecting the rules of each individual protocol you want to cover. i haven't look at any of your other "bindings", but for HTTP, you definitely have a bit of adjustment to do. and as @asbjornu said, it is hard to imagine that it will scale to rely on blindly copying "headers" and "values" across protocols, as all of them will have different definitions of how to handle both (he alluded to MQTT, and there certainly are others).

@codefromthecrypt
Copy link

codefromthecrypt commented Apr 30, 2018 via email

@codefromthecrypt
Copy link

codefromthecrypt commented Apr 30, 2018 via email

@AloisReitbauer
Copy link
Contributor

AloisReitbauer commented Apr 30, 2018

Summarising this issue, we have two items to discuss:

  • Adding Hyphens for header names. This would improve readability, but I don't see any technical implications.
  • Enforce casing of header name. Ideally we make the spec work ignoring casing. This will make the spec as well as implementations more robust as well. We can make the header case-insensitive for HTTP but enforce casing for other protocol binds.

@dret
Copy link
Member

dret commented Apr 30, 2018 via email

@serialseb
Copy link
Author

Due to the tone of this conversation, I’m withdrawing from making further comments.

@SergeyKanzhelev
Copy link
Member

SergeyKanzhelev commented Apr 30, 2018

@serialseb I'm sorry this thread came into this. I appreciate your feedback and opinion.

We want to make this specification work for real life scenarios. In order to achieve interoperability it should be as strict as possible while accounting for the history and prior art. @adriancole and others working on it has a great experience building distributed tracing systems. And saw a lot of misunderstanding, lax and corner-cutting implementations of standards. Which made code a big mess of if statements accounting for such implementations. You might have seen it firsthand as well. That's said -- the spec should be clear on not dismissing prior art, rather suggest trace systems how to operate in "Http world".

One additional feedback to what @AloisReitbauer listed I'm taking is that rationale and headers overview section should be re-worked to not make reasoning of chosen non-traditional names clearer.

@dret
Copy link
Member

dret commented Apr 30, 2018 via email

@SergeyKanzhelev
Copy link
Member

@dret sure! The rationale docs and introductions is a way to be transparent of the thinking process AND more importantly -- the way to highlight non-traditional for http choice. This text makes it visible and explicit for implementors.

@nicmunroe
Copy link

You can't redefine HTTP; it is well defined in RFC 7230.


If you want to make suggestions, they should at least conform with the protocol which they apply to.


You're writing an open specification that, as currently written, would suffer from significant compatibility issues with HTTP.


Either way, as is this specification wouldn't be usable on HTTP as deployed on the web, hence my feedback.


you can't break this 45 years of legacy with your specification.


We should probably re-word some stuff to make clear what is a suggestion (i.e. lowercasing - which we can't enforce since HTTP treats header names as case-insensitive, but we can encourage) vs. required (i.e. no hyphens in the name - which we are certainly allowed to do). These are oversights or artifacts of early drafts - not an intentional attempt to subvert HTTP. C'mon - nobody is attempting to redefine or break HTTP with this spec...

With that out of the way: it seems to me that since the main purpose of this tracing spec is to maximize interoperability, we should focus on making decisions that promote success by default, even in the face of middleware that does strange things or tooling that is implemented without carefully read the spec. That's what the no-hyphen-header-names and lowercase-suggestion stuff is about - successful integration by default, in the wild, based on hard-earned prior experience:

  • The only drawback I see to removing the hyphens is a slight ding on readability. On the plus side it fixes real concrete interoperability issues that have been experienced in the past. If the main point of this spec is interoperability (✅ ), and header names without hyphens is perfectly valid in HTTP (✅ ), then this seems like a clear win to me (✅ ?).
  • On the casing issue, while we can't use MUST language to force lowercase since the HTTP spec treats header names as case insensitive, there is absolutely nothing stopping us from recommending it via SHOULD language. Essentially boils down to a "be conservative in what you send, be liberal in what you accept" situation. We can say that receivers MUST accept traceparent and tracestate headers regardless of casing, but that they SHOULD lowercase the headers when doing further HTTP propagation, with a quick explanation for why and a link to the rationale doc for full details.

However much we feel that middleware shouldn't do certain things we have empirical proof that it does happen, and will continue to happen if we make the same mistakes from the past. We might gain some moral high ground by saying "sorry your stuff is broken but we decided to go with hyphens and not encourage lowercase header names because it was more HTTP conventional - you need to fix your libraries". And that middleware (e.g. apache camel) would likely respond "cool story, but we're established and popular and not going to break our existing customers for your tracing spec". The end result is that actual users have broken tracing and nobody is happy. Hence the no-hyphen names and lowercase suggestions: success by default.

Headers can be split, concatenated and juggled around for every hop a message takes from the sender to the receiver. The standard allows for a great deal of flexibility and you can't break this 45 years of legacy with your specification.

As @serialseb mentioned, in RFC 7231 Section 8.3.1, we are certainly allowed to define Whether the field is a single value or whether it can be a list (delimited by commas; see Section 3.2 of [RFC7230]). Again, nobody here is trying to break HTTP. As @adriancole said, traceparent is clearly a single value header. tracestate may or may not be - that's being actively discussed (see #107, #81, and probably others). In any case we can and should define what good citizen implementations should do when they receive a request that contains multiple traceparent and/or tracestate headers, as I agree it is true that we can't control what intermediaries do and there's a good chance broken impls will accidentally do a headers.add("traceparent", traceParentValue) rather than headers.set(...) someday, and it would be good to give receivers of that broken tracing header guidance on how to handle it gracefully.

@SergeyKanzhelev
Copy link
Member

Added some clarification for the name choice in spec. Closing issue

@alexweej
Copy link

I know I'm very, very late to this party but I just... wat.

@ioquatix
Copy link

ioquatix commented Sep 2, 2021

I've implemented HTTP/1 & HTTP/2 clients and servers and spent a lot of time reading through the relevant RFCs/specs. Since I wrote these clients and servers, I wanted to add tracing, so I was really excited to see this spec. Well done on standardising it.

But then I saw the header names and the related comments regarding hyphens and wanted to understand the rational behind such a decision.

I'm curious what actually would have broken using more standard names like "Trace-Parent" and "Trace-State" which just semantically seem much more consistent with the entire HTTP model. When I read the spec and noticed the comment regarding hyphens, I wanted to know what was the justification.

Reading this thread, I just don't understand why anyone could possibly map between HTTP headers and some non-HTTP key-value metadata without some kind of transformation. So the whole idea of "interoperability" just feels like a stretch without understanding the exact "hard won experience" that would lead to such a trade-off.

I guess I understand the logic, but I'm also a bit disappointed as the headers feel out of place and a bit ugly/inconsistent with the rest of HTTP (which it seems this spec is mostly in relation to). I know backwards compatibility is important but I believe specs need to be forward looking. Maybe I'm too idealistic but I always believe there will be a lot more code written in the future than has been written in the past, so I personally care more about the future and less about the past.

Anyway, I'm guessing that ship has sailed?

@dyladan
Copy link
Member

dyladan commented Sep 14, 2021

I've implemented HTTP/1 & HTTP/2 clients and servers and spent a lot of time reading through the relevant RFCs/specs. Since I wrote these clients and servers, I wanted to add tracing, so I was really excited to see this spec. Well done on standardising it.

But then I saw the header names and the related comments regarding hyphens and wanted to understand the rational behind such a decision.

I'm curious what actually would have broken using more standard names like "Trace-Parent" and "Trace-State" which just semantically seem much more consistent with the entire HTTP model. When I read the spec and noticed the comment regarding hyphens, I wanted to know what was the justification.

Reading this thread, I just don't understand why anyone could possibly map between HTTP headers and some non-HTTP key-value metadata without some kind of transformation. So the whole idea of "interoperability" just feels like a stretch without understanding the exact "hard won experience" that would lead to such a trade-off.

I guess I understand the logic, but I'm also a bit disappointed as the headers feel out of place and a bit ugly/inconsistent with the rest of HTTP (which it seems this spec is mostly in relation to). I know backwards compatibility is important but I believe specs need to be forward looking. Maybe I'm too idealistic but I always believe there will be a lot more code written in the future than has been written in the past, so I personally care more about the future and less about the past.

Anyway, I'm guessing that ship has sailed?

Most of these decisions were made before my time on the working group, but I suspect the ship has indeed sailed. At least for traceparent and tracestate, they are official w3c recommendations and I see no workable path forward to change them. For headers which are not yet released, it would be possible to change still at this point if there is a concrete benefit. We discussed using more typical header casing for baggage (options were baggage, Baggage, correlationcontext, and Correlation-Context). In the end, we settled on the all lowercase baggage mostly because it was consistent with headers already recommended by the working group.

@ioquatix
Copy link

I would have gone for:

Trace-Parent
Trace-State
Trace-Context

Case is irrelevant (even more so with HTTP/2). Trace- should have been treated like a namespace.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests