Commit

Mention utf-8 in Preface and clarify things a bit
annevk committed Sep 1, 2014
1 parent 31b5075 commit d6eaebe
Showing 2 changed files with 40 additions and 26 deletions.
33 changes: 20 additions & 13 deletions Overview.html
@@ -141,20 +141,27 @@ <h2 class="no-num no-toc" id="table-of-contents">Table of Contents</h2>

<h2 id="preface"><span class="secno">1 </span>Preface</h2>

-<p>While encodings have been defined to some extent, implementations have
-not always implemented them in the same way, have not always used the same
-labels, and often differ in dealing with undefined and former proprietary
-areas of encodings. This specification attempts to fill those gaps so that
-new implementations do not have to reverse engineer encoding implementations
-of the market leaders and existing implementations can converge.
-
-<p>In particular, this specification defines the encodings, their algorithms to go from
-bytes to code points and back, and their canonical names and identifying labels. This
-specification also defines an API to expose part of the encoding algorithms to JavaScript.
-
-<p>Historically encodings and their specifications (if any) were kept track of by the
+<p>Unicode is the universal alphabet and utf-8 is its encoding. This specification turns
+that into a requirement for new protocols and formats, as well as existing formats
+deployed in new contexts.
+
+<p>There are other (legacy) encodings and while they have been defined to some extent,
+implementations have not always implemented them in the same way, have not always used the
+same labels, and often differ in dealing with undefined and former proprietary areas of
+encodings. This specification attempts to fill those gaps so that new implementations do
+not have to reverse engineer encoding implementations of the market leaders and existing
+implementations can converge.
+
+<p>In particular, this specification defines all the encodings, their algorithms to go
+from bytes to scalar values and back, and their canonical names and identifying labels.
+This specification also defines an API to expose part of the encoding algorithms to
+JavaScript.
+
+<p>Implementations have significantly deviated from the labels listed in the
 <a href="http://www.iana.org/assignments/character-sets/character-sets.xhtml">IANA Character Sets registry</a>.
-This specification renders that registry obsolete.
+Combined with the desire to stop legacy encodings from spreading further, this
+specification is exhaustive about the aforementioned details and thereby renders the
+registry irrelevant.



33 changes: 20 additions & 13 deletions Overview.src.html
@@ -57,20 +57,27 @@ <h2 class="no-num no-toc">Table of Contents</h2>

<h2>Preface</h2>

-<p>While encodings have been defined to some extent, implementations have
-not always implemented them in the same way, have not always used the same
-labels, and often differ in dealing with undefined and former proprietary
-areas of encodings. This specification attempts to fill those gaps so that
-new implementations do not have to reverse engineer encoding implementations
-of the market leaders and existing implementations can converge.
-
-<p>In particular, this specification defines the encodings, their algorithms to go from
-bytes to code points and back, and their canonical names and identifying labels. This
-specification also defines an API to expose part of the encoding algorithms to JavaScript.
-
-<p>Historically encodings and their specifications (if any) were kept track of by the
+<p>Unicode is the universal alphabet and utf-8 is its encoding. This specification turns
+that into a requirement for new protocols and formats, as well as existing formats
+deployed in new contexts.
+
+<p>There are other (legacy) encodings and while they have been defined to some extent,
+implementations have not always implemented them in the same way, have not always used the
+same labels, and often differ in dealing with undefined and former proprietary areas of
+encodings. This specification attempts to fill those gaps so that new implementations do
+not have to reverse engineer encoding implementations of the market leaders and existing
+implementations can converge.
+
+<p>In particular, this specification defines all the encodings, their algorithms to go
+from bytes to scalar values and back, and their canonical names and identifying labels.
+This specification also defines an API to expose part of the encoding algorithms to
+JavaScript.
+
+<p>Implementations have significantly deviated from the labels listed in the
 <a href="http://www.iana.org/assignments/character-sets/character-sets.xhtml">IANA Character Sets registry</a>.
-This specification renders that registry obsolete.
+Combined with the desire to stop legacy encodings from spreading further, this
+specification is exhaustive about the aforementioned details and thereby renders the
+registry irrelevant.
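
Illustration only, not part of the committed files: the Preface text says the specification defines an API exposing part of the encoding algorithms to JavaScript, and that the algorithms convert between bytes and scalar values. The following is a minimal TypeScript sketch of what that can look like, assuming the `TextEncoder` and `TextDecoder` globals as shipped in current browsers and Node.js. The variable names and sample strings are hypothetical and chosen only for this example.

```typescript
// Sketch only: assumes the TextEncoder/TextDecoder globals available in
// current browsers and Node.js. Names and sample strings are illustrative.

// Encoding always produces utf-8 bytes.
const encoder = new TextEncoder();
const bytes: Uint8Array = encoder.encode("é€");

// "é" is U+00E9 (two utf-8 bytes), "€" is U+20AC (three utf-8 bytes).
const hex: string = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");

// Decoding goes from bytes back to a string.
// "utf8" is a label; the decoder reports the encoding's name instead.
const decoder = new TextDecoder("utf8");
const roundTripped: string = decoder.decode(bytes);
const reportedName: string = decoder.encoding;

// A lone surrogate is a code point but not a scalar value,
// so the encoder substitutes U+FFFD before producing bytes.
const loneSurrogate: Uint8Array = encoder.encode("\uD800");

console.log(hex, roundTripped, reportedName, Array.from(loneSurrogate));
```

The last step hints at why the wording change from "code points" to "scalar values" matters: surrogates are code points that cannot be encoded as utf-8, so they never appear on the byte side of the conversion.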


