Why should keys be restricted to lowercase ASCII? #508

aphillips · 2022-10-23T17:28:58Z

3.3.2.2.1 Key
https://www.w3.org/TR/trace-context-2/#key

A key MUST begin with a lowercase letter or a digit and contain up to 256 characters including lowercase letters (a-z), digits (0-9), underscores (_), dashes (-), asterisks (*), forward slashes (/), and at signs (@).

The definitions here should be more rigorously laid out. The key is part of the tracestate header's value and is described as "...an identifier that describes the vendor". The identifier is restricted to lowercase ASCII letters, digits, and the punctuation shown above in the quote.

Allowing keys/values that are non-ASCII is one possibility here, since vendors presumably can be named anything. Restricting the value to an ASCII token is understandable because of the need for encoding efficiency, but the text should be clearer about the reasoning behind the ASCII restriction for the key.

Note that key names like this are sometimes machine generated and a restriction to ASCII can be irksome when non-ASCII Unicode characters might be present in the data. The pre-computed portions probably should be well-known (ASCII !) tokens, but it's unclear why the names are otherwise restricted.

The text was updated successfully, but these errors were encountered:

kalyanaj · 2022-11-22T20:40:16Z

Thanks for the review and for filing this issue. We discussed this in the Distributed Tracing working group meeting today and below captures the summary from that discussion.

Since this is Level 2 of the spec, the primary reason is back compatibility with existing Level 1 standard.

Some of the original reasons (from Level 1) for restricting the keys and values:

Processing efficiency: For example, an implementation could split on commas to get the list of entries.
If we extended the allowed character list, we would have had to have specific escape characters or disallowed characters (to meet the processing efficiency goal) which would have added more complexity to the specification.

We will update the specification to include the above rationale.

At this point, the main reason to not revisit this decision is backwards compatibility since Level 1 is already a standard.

aphillips · 2022-11-23T18:42:53Z

Some of the original reasons (from Level 1) for restricting the keys and values:

Processing efficiency: For example, an implementation could split on commas to get the list of entries.

If we extended the allowed character list, we would have had to have specific escape characters or disallowed characters (to meet the processing efficiency goal) which would have added more complexity to the specification.

I think these reasons are questionable.

The ASCII comma is still a comma. Disallowing a small list of characters (comma, space, etc.) because they are syntax-meaningful would allow implementations to split on commas. There is no need to introduce escapes unless you want to allow restricted characters (comma, space) into the namespace.

At this point, the main reason to not revisit this decision is backwards compatibility since Level 1 is already a standard.

I can't find ANY record of I18N having reviewed trace-context 1. In fact, I see your transreq (w3c/transitions#133) does not mention us--only security and privacy. As you might imagine, I am unhappy to see you relying on back compat if we were not afforded a look at your first edition.

@plehegar was there a process failure here or did I overlook a review somehow? @r12a for visibility.

basti1302 · 2022-12-01T09:23:46Z

Restricting the value to an ASCII token is understandable because of the need for encoding efficiency, but the text should be clearer about the reasoning behind the ASCII restriction for the key.

#515 is an attempt to improve that partial aspect by providing the reason to restrict the allowed characters.

You have a point that the original design choice for this restriction could have been different. I also understand your unhappiness of level 1 not having gone through an i18n review (if that is what happened). I cannot speak on @plehegar behalf so I will let him comment on that.

But even if there was a process failure for level 1, the fact remains that there are implementations out there which we would break if we changed this now. We are adamant that we cannot break existing implementations by allowing more characters in tracestate in level 2.

basti1302 · 2022-12-01T09:27:09Z

I wonder if w3c/i18n-request#108 was meant to be the i18n review request for trace context? It fails to mention which spec it is for. But I'm guessing wildly here, nevermind.

aphillips · 2022-12-02T15:53:16Z

@basti1302 I don't know how the process failed us here. I think I see a gap in our tooling and I'll take it up with @plehegar and company to help prevent repeats.

Regarding this spec/issue, I understand your commitment to backwards compatibility: it's important.

Regarding #515, that's the right intention, but the logic of that statement is flawed. It says:

This restriction ensures that the tracestate header can be parsed efficiently into individual list members by splitting the header value at each comma, and splitting the list members an implementation is interested in at the equals sign.

A string consisting of Unicode code points or a bytestring consisting of UTF-8 can still be parsed on commas and equal signs, since those are still characters--with the same byte values even (in the case of UTF-8). Consider the string: мзкез=日本語,文字化け,éèçéèç,etc, whose UTF-8 byte values are:

D0 BC D0 B7 D0 BA D0 B5 D0 B7 3D E6 97 A5 E6 9C AC E8 AA 9E 2C E6 96 87 E5 AD 97 E5 8C 96 E3 81 91 2C C3 A9 C3 A8 C3 A7 C3 A9 C3 A8 C3 A7 2C 65 74 63

Notice that 0x3D (=) and 0x2C (,) are still present...

I would tend to say instead:

This restriction ensures that the tracestate header is encoded in a compact and efficient manner and that values can be processed by even very old systems that do not support non-ASCII characters.

( I added a review just now to #515 that has a different wording suggestion. Use that one)

I suppose we could introduce a way to smuggle non-ASCII characters into the values (Punycode [RFC3492] looks like about the only commonly available option compatible with the character set you have), but given the nature and probable uses of tracestate, I think you might as well skip it for now.

For future reference: restricted-to-ASCII mechanisms have a long history of coming back to bite standardizers/implementers. The existence of Punycode to support IDNs is just one example of inventing convoluted schemes for getting Unicode code points into protocols or formats after the fact. But v1's day is done. If you fix your note above I think you can close this issue.

basti1302 · 2022-12-06T20:18:39Z

Closing this as satisfied as discussed in #515 (comment)

aphillips · 2022-12-06T22:29:04Z

See discussion in PR. I'm satisfied by this outcome.

aphillips added the i18n-needs-resolution Issue the Internationalization Group has raised and looks for a response on. label Oct 23, 2022

aphillips mentioned this issue Oct 23, 2022

Why should keys be restricted to lowercase ASCII? w3c/i18n-activity#1600

Closed

kalyanaj assigned basti1302 Nov 22, 2022

basti1302 mentioned this issue Dec 1, 2022

add reasoning for the restriction of tracestate keys #515

Closed

basti1302 closed this as completed Dec 6, 2022

kalyanaj mentioned this issue Feb 23, 2023

Trace Context 2022-09-22 > 2022-11-03 w3c/i18n-request#193

Closed

kalyanaj mentioned this issue Mar 28, 2023

CR Request for W3C TraceContext Level 2 Specification w3c/transitions#495

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Why should keys be restricted to lowercase ASCII? #508

Why should keys be restricted to lowercase ASCII? #508

aphillips commented Oct 23, 2022

kalyanaj commented Nov 22, 2022

aphillips commented Nov 23, 2022

basti1302 commented Dec 1, 2022

basti1302 commented Dec 1, 2022

aphillips commented Dec 2, 2022 •

edited

Loading

basti1302 commented Dec 6, 2022

aphillips commented Dec 6, 2022

Why should keys be restricted to lowercase ASCII? #508

Why should keys be restricted to lowercase ASCII? #508

Comments

aphillips commented Oct 23, 2022

kalyanaj commented Nov 22, 2022

aphillips commented Nov 23, 2022

basti1302 commented Dec 1, 2022

basti1302 commented Dec 1, 2022

aphillips commented Dec 2, 2022 • edited Loading

basti1302 commented Dec 6, 2022

aphillips commented Dec 6, 2022

aphillips commented Dec 2, 2022 •

edited

Loading