Commit

Update index.html
Co-authored-by: Fuqiao Xue <xfq@w3.org>
aphillips and xfq committed Oct 6, 2022
1 parent 280a0e2 commit 0fd5185
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion index.html
@@ -2754,7 +2754,7 @@ <h3>Basics</h3>
</ul>
</aside>

-<p>The situation with regards to specifying support of non-ASCII characters in <a>resource identifiers</a> is complicated because there are at least three specifications ( URI [[RFC3986]], IRI [[RFC3987]], and [[URL]]) that define resource identifiers and their serialization. The WhatWG [[URL]] specification is an attempt to address this complexity by documenting the actual practice of browsers and other user-agents. The stated goal of the URL specification is to obsolete both RFCs.</p>
+<p>The situation with regards to specifying support of non-ASCII characters in <a>resource identifiers</a> is complicated because there are at least three specifications (URI [[RFC3986]], IRI [[RFC3987]], and [[URL]]) that define resource identifiers and their serialization. The WhatWG [[URL]] specification is an attempt to address this complexity by documenting the actual practice of browsers and other user agents. The stated goal of the URL specification is to obsolete both RFCs.</p>

<p>In general, document formats on the Web use resource identifiers that encode non-ASCII characters as plain text, that is, as "IRIs". Protocols such as&mdash;but not limited to&mdash;HTTP [[RFC9110]]) use resource identifiers that encode non-ASCII characters as a sequence of bytes using <a>percent encoding</a>, that is, as "URIs". Because [[RFC3986]] does not specify any particular <a>character encoding</a> for encoding characters to bytes, the <a>percent encoding</a> escapes are prone to misinterpretation. To help combat this, many modern protocols and specifications expect resource identifiers to use the UTF-8 character encoding, exactly as specified by IRI, when encoding characters into the subset of ASCII supported in wire formats and protocols.</p>
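
Note (not part of the commit): the ambiguity described in the paragraph above can be sketched in a few lines of Python. This is only an illustration using the standard library's `urllib.parse`; the helper name `percent_encode` and the sample character are assumptions chosen for the example, not anything defined by the specification text.

```python
from urllib.parse import quote, unquote


def percent_encode(text: str, encoding: str) -> str:
    """Percent-encode every non-unreserved character of `text` (hypothetical helper).

    The characters are first turned into bytes with `encoding`; each byte
    is then written as a %XX escape.
    """
    return quote(text, safe="", encoding=encoding)


# The same character produces different escapes under different encodings.
utf8_form = percent_encode("é", "utf-8")      # two bytes: C3 A9
latin1_form = percent_encode("é", "latin-1")  # one byte: E9

# A receiver that guesses the wrong encoding recovers the wrong text.
misread = unquote(utf8_form, encoding="latin-1")
roundtrip = unquote(utf8_form, encoding="utf-8")

print(utf8_form, latin1_form, misread, roundtrip)
```

Because the escapes carry only bytes, nothing in `%E9` or `%C3%A9` says which encoding produced them; agreeing on UTF-8 in advance is what removes the guesswork.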
