qa-html-css-normalization.en: Formatting source
r12a committed Feb 16, 2023
1 parent d8f88d1 commit eacec1b
Showing 1 changed file with 13 additions and 0 deletions.
13 changes: 13 additions & 0 deletions questions/qa-html-css-normalization.en.html
@@ -79,13 +79,16 @@ <h2 class="notoc">Quick check</h2>


<p>Unicode normalization is something you need to be aware of if you are authoring HTML pages with CSS style sheets in UTF-8 (or any other Unicode encoding), particularly if you are dealing with text in a script that uses accents or other diacritics.</p>

<p>This page addresses the question:<br>
<span class="question">What is Unicode Normalization, and why do I need to know about it when creating HTML and CSS content?</span></p>







<section id="n11nwhat">
<h2>What is Unicode normalization?</h2>

@@ -105,9 +108,13 @@ <h2>What is Unicode normalization?</h2>





<section id="n11nhow">
<h2>What do I need to know about normalization?</h2>



<section id="choosing">
<h3>Choosing a normalization form</h3>

@@ -144,7 +151,9 @@ <h3>Choosing a normalization form</h3>
<h3>Converting the normalization form of a page</h3>

<p style="">You should also try to avoid automatically converting content from one normalization form to another, as it may obliterate some important code point distinctions, such as in the carefully crafted examples of <span lang="hu" class="qterm">világ</span> above, or in filenames or URLs, or text included in the page from elsewhere, etc.</p>

<p style="">It may also introduce a security risk, especially in code syntax. For example, the following code point sequences are canonically equivalent: <span class="codepoint" translate="no"><bdi lang="en">&#x003E;&#x0338;</bdi> [<span class="uname">U+003E GREATER-THAN SIGN</span> + <span class="uname">U+0338 COMBINING LONG SOLIDUS OVERLAY</span>]</span> and <span class="codepoint" translate="no"><bdi lang="en">&#x226F;</bdi> [<span class="uname">U+226F NOT GREATER-THAN</span>]</span>. Therefore source code in XML such as <code>&lt;character&gt;&#x0338;&lt;/character&gt;</code> can be corrupted by normalising to NFC: the <code>&gt;</code> that closes the start tag combines with the following U+0338 to become U+226F, leaving the tag unterminated.</p>
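The corruption described above can be reproduced in a few lines. This is a minimal sketch in Python, using only the standard `unicodedata` module; the variable names are illustrative and are not part of the article.

```python
import unicodedata

# XML fragment whose element content is a lone U+0338 COMBINING LONG SOLIDUS OVERLAY.
source = "<character>\u0338</character>"

# Blindly normalising the whole source text to NFC.
normalized = unicodedata.normalize("NFC", source)

# The '>' that closed the start tag has combined with U+0338 into
# U+226F NOT GREATER-THAN, so the start tag is no longer terminated.
print(normalized == source)      # False
print("\u226f" in normalized)    # True
print(normalized.count(">"))     # 1 (the source had 2)
```

The same effect applies to any syntax character that has a canonical composition with a following combining mark, which is why normalising raw source code needs care.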

<p style="">Sometimes people choose to use compatibility characters in their content, most likely without realising what they are. Examples might include <span class="codepoint" translate="no"><bdi lang="en">&#x00BC;</bdi> [<span class="uname">U+00BC VULGAR FRACTION ONE QUARTER</span>]</span>, <span class="codepoint" translate="no"><bdi lang="en">&#x00B2;</bdi> [<span class="uname">U+00B2 SUPERSCRIPT TWO</span>]</span> (e.g. for m²), and <span class="codepoint" translate="no"><bdi lang="en">&#x2116;</bdi> [<span class="uname">U+2116 NUMERO SIGN</span>]</span>. Blind normalization of that content to a compatibility form (NFKC or NFKD) would change those characters to the sequences 1⁄4, 2, and No, respectively. In some cases this may affect the look of the text; in others it may affect the readability.</p>
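To see which form causes this change, the three characters can be run through both NFC and NFKC. The following is a small sketch using Python's standard `unicodedata` module; the helper name `code_points` is hypothetical and exists only for this illustration.

```python
import unicodedata


def code_points(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


# The three compatibility characters mentioned above.
samples = ["\u00bc", "\u00b2", "\u2116"]

for ch in samples:
    nfc = unicodedata.normalize("NFC", ch)    # canonical form: character is kept
    nfkc = unicodedata.normalize("NFKC", ch)  # compatibility form: character is replaced
    print(unicodedata.name(ch), code_points(nfc), code_points(nfkc))
```

In this sketch NFC leaves all three characters untouched, while NFKC replaces them. Note that the slash produced for the fraction is U+2044 FRACTION SLASH rather than the ASCII solidus.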
</section>

@@ -173,6 +182,10 @@ <h3>How can I check pages for problems?</h3>

<p style="">You can find out whether an HTML page contains class names and id values that are not normalized according to NFC by using the <a class="print" href="http://validator.w3.org/i18n-checker/">W3C Internationalization Checker</a>. (Look for the row <samp>Markup / Non-NFC class or id names</samp>.)</p>
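For a quick local check before running the W3C tool, something like the following may help. It is a rough sketch only, written with Python's standard `html.parser` and `unicodedata` modules. It is not how the Internationalization Checker is implemented, and the names `NonNFCFinder` and `find_non_nfc` are hypothetical.

```python
import unicodedata
from html.parser import HTMLParser


class NonNFCFinder(HTMLParser):
    """Collects class names and id values that are not normalized to NFC."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in ("class", "id") or value is None:
                continue
            # class holds a space-separated list of names; id is a single value.
            tokens = value.split() if name == "class" else [value]
            for token in tokens:
                if unicodedata.normalize("NFC", token) != token:
                    self.problems.append((tag, name, token))


def find_non_nfc(html_text):
    """Return (tag, attribute, value) tuples for non-NFC class names and ids."""
    finder = NonNFCFinder()
    finder.feed(html_text)
    finder.close()
    return finder.problems


# One class name uses a + U+0301 (decomposed); the id uses precomposed U+00E1.
sample = '<p class="vila\u0301g ok" id="vil\u00e1g">text</p>'
print(find_non_nfc(sample))
```

The sketch only inspects markup. Selectors in an associated style sheet would need a separate check, since a mismatch between the normalization form of the selector and that of the class name is what prevents the style from being applied.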
</section>
</section>





