Clarify scope of document in preface #10

nigelmegitt · 2015-10-20T10:21:56Z

The Preface claims:

"… this specification … defines … the utf-8 encoding."

Isn't that formalised in ISO/IEC 10646:2014 and Unicode? I'm not suggesting that the contents of this document aren't useful, just that they don't define utf-8.

I suggest resolving this issue by:

Removing "(and defines)" from the preface.
Adding a reference to ISO/IEC 10646 to the References section and referring to it in the definition of utf-8.

annevk · 2015-10-20T10:30:48Z

It does actually define utf-8 in section 9 though.

nigelmegitt · 2015-10-20T10:36:31Z

True! If that is an exact equivalent of 10646 then why duplicate it?

annevk · 2015-10-20T10:40:57Z

10646 allows variation in how errors are handled, as indicated in a note. The utf-8 encoder is duplicated for completeness' sake.
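To illustrate the kind of variation in error handling at issue here, the following is a minimal sketch and not part of the original discussion. It assumes Python's built-in `utf-8` codec as a stand-in for "a UTF-8 decoder"; the helper name `decode_with_policy` and the sample bytes are hypothetical. It shows that the same malformed input can legitimately yield different results depending on the error policy, while encoding well-formed text leaves no such choice.

```python
# Illustrative only: Python's built-in codec stands in for "a UTF-8 decoder".
# The helper name and sample input are assumptions made for this sketch.

# 0xFF can never appear in well-formed UTF-8, so this input is malformed.
malformed = b"a\xffb"


def decode_with_policy(data: bytes, policy: str) -> str:
    """Decode bytes as UTF-8 using one of Python's error policies."""
    return data.decode("utf-8", errors=policy)


# Policy 1: substitute U+FFFD REPLACEMENT CHARACTER for the bad byte.
replaced = decode_with_policy(malformed, "replace")

# Policy 2: silently drop the bad byte.
ignored = decode_with_policy(malformed, "ignore")

# Policy 3: refuse to decode at all.
try:
    decode_with_policy(malformed, "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Encoding a well-formed string has no comparable ambiguity.
encoded = "\u00e9".encode("utf-8")

print(repr(replaced), repr(ignored), strict_failed, encoded)
```

Three decoders that each apply a different one of these policies would disagree on the output for the same bytes, which is the sort of divergence a single fixed error-handling rule avoids.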

nigelmegitt · 2015-10-20T10:48:58Z

If it's only for completeness and isn't intended to replace the definition then it should be included non-normatively and the normative defining reference should be clearly included. Clarifications or modifications made only in this spec should (obviously) be normative.

If it is intended to replace or modify the definition normatively then that rings big alarm bells. Specifying error handling more precisely does make sense but doesn't change the basic definition.

annevk · 2015-10-20T11:18:11Z

The utf-8 decoder is intended to replace the 10646 definition since we don't want to have the variation in error handling. Given that we already do that, I don't see much point in not also defining the encoder.

nigelmegitt · 2015-10-20T11:32:17Z

The general problem here is that forking and changing specs creates confusion and incompatibility.

If you're not changing the encoder then it's really helpful to make that clear by including it only informatively (where "helpful" is not a strong enough word). Then the world can know that existing encoders built on 10646 don't need to be rebuilt and are intended to remain compatible.

If I've understood correctly you're not really changing the decoder model either, just restricting and clarifying the error model and including a reference algorithm. In that case the delta relative to 10646 needs to be included too.

annevk · 2015-10-20T11:52:25Z

I would be happy to add more notes or clarify something, PRs welcome if you're in a hurry, but this document has had review by some of the authors of 10646 and they did not consider anything problematic.

I don't think I would want to mark any of the existing algorithms non-normative since having everything related to encodings be self-contained seems extremely useful.

nigelmegitt · 2015-10-20T12:44:25Z

I'm not in a huge hurry, just like to make things neat and tidy.

Marking the encoding algorithm non-normative would be helpful - it doesn't stop it from being self contained for reference within the document. I suppose it would be okay to keep it normative but add a statement along the lines of "This is [identical to|based on] what is specified in 10646".

Ms2ger · 2015-10-20T13:49:58Z

For the intended audience of this document, it actually is the canonical definition of utf-8. Adding notes that it matches other specifications doesn't seem like it would hurt, but note that utf-8 is not the only encoding that is also defined elsewhere.

nigelmegitt · 2015-10-20T14:55:55Z

@Ms2ger good point - everywhere there's a 'copy and include' pattern the original (and presumably definitive + already implemented) should be referenced.

If there's a reverse engineering scenario to deal with a closed or non-existent "standard" then it makes sense to define something here.

It looks like the intended audience is anyone creating a new UA, maintaining an existing UA, or defining new protocols and formats. The obvious danger is that existing protocols, formats and content will break - this spec is clearly taking pains to avoid that situation, so it would be worth making the derivations clear and obvious, as well as any deltas.

annevk · 2015-10-20T15:00:11Z

Most existing specifications for the encodings in this document are a mess. That is why this document was created. This document mostly resulted from reverse engineering implementations, coupled with improvements around possible XSS attacks, and end-of-file handling.

I'm happy to add a note for the utf-8 encoder, but I've no intent of putting effort into the other encodings as that mostly seems like busywork. I might accept PRs, though it depends on the specifics.

nigelmegitt · 2015-10-20T15:54:05Z

Okay, that sounds like a reasonable way forward. Thanks!

annevk · 2015-10-20T16:09:22Z

How do you want to appear in the acknowledgments? As nigelmegitt?

nigelmegitt · 2015-10-20T16:22:59Z

[blushes] "Nigel Megitt", please, if it's warranted at all.

annevk closed this as completed in adb5f84 Nov 4, 2015
