qa-html-css-normalization.en: Formatting source

w3c · Feb 16, 2023 · eacec1b
1 parent d8f88d1
commit eacec1b
Showing 1 changed file with 13 additions and 0 deletions.
diff --git a/questions/qa-html-css-normalization.en.html b/questions/qa-html-css-normalization.en.html
@@ -79,13 +79,16 @@ <h2 class="notoc">Quick check</h2>
 
 
 <p>Unicode normalization is something you need to be aware of if you are authoring HTML pages with CSS style sheets in UTF-8 (or any other Unicode encoding), particularly if you are dealing with text in a  script that uses accents or other diacritics.</p>
+
 <p>This page addresses the question:<br>
 <span class="question">What is Unicode Normalization, and why do I need to know about it when creating HTML and CSS content?</span></p>
 
 
 
 
 
+
+
 <section id="n11nwhat">
 <h2>What is Unicode normalization?</h2>
 
@@ -105,9 +108,13 @@ <h2>What is Unicode normalization?</h2>
 
 
 
+
+
 <section id="n11nhow">
 <h2>What do I need to know about normalization?</h2>
 
+
+
 <section id="choosing">
 <h3>Choosing a normalization form</h3>
 
@@ -144,7 +151,9 @@ <h3>Choosing a normalization form</h3>
 <h3>Converting the normalization form of a page</h3>
 
 <p style="">You should also try to avoid automatically converting content from one normalization form to another, as it may obliterate some important code point distinctions, such as in the carefully crafted examples of <span lang="hu" class="qterm">világ</span> above, or in filenames or URLs, or text included in the page from elsewhere, etc.</p>
+
 <p style="">It may also introduce a security risk, especially in code syntax. For example, the following sequences are canonically equivalent: <span class="codepoint" translate="no"><bdi lang="en">&#x003E;&#x0338;</bdi> [<span class="uname">U+003E GREATER-THAN SIGN</span> + <span class="uname">U+0338 COMBINING LONG SOLIDUS OVERLAY</span>]</span> and <span class="codepoint" translate="no"><bdi lang="en">&#x226F;</bdi> [<span class="uname">U+226F NOT GREATER-THAN</span>]</span>. Therefore source code in XML such as <code>&lt;character&gt;&#x0338;&lt;/character&gt;</code> can be corrupted by normalising to NFC: the <code>&gt;</code> that closes the start tag combines with the following U+0338 to form U+226F, so the tag is no longer terminated.</p>
+
 <p style="">Sometimes people choose to use compatibility characters in their content, most likely without realising what they are. Examples might include <span class="codepoint" translate="no"><bdi lang="en">&#x00BC;</bdi> [<span class="uname">U+00BC VULGAR FRACTION ONE QUARTER</span>]</span>, <span class="codepoint" translate="no"><bdi lang="en">&#x00B2;</bdi> [<span class="uname">U+00B2 SUPERSCRIPT TWO</span>]</span> (e.g. for m²), and <span class="codepoint" translate="no"><bdi lang="en">&#x2116;</bdi> [<span class="uname">U+2116 NUMERO SIGN</span>]</span>. Blind normalization of that content to a compatibility form (NFKC or NFKD) would change those characters to the sequences 1⁄4 (where the slash is U+2044 FRACTION SLASH, not the ASCII solidus), 2, and No, respectively. In some cases this may affect the look of the text; in others it may affect the readability.</p>
 </section>
 
@@ -173,6 +182,10 @@ <h3>How can I check pages for problems?</h3>
 
 <p style="">You can find out whether an HTML page contains  class names and id values that are not normalized according to NFC by using the <a class="print" href="http://validator.w3.org/i18n-checker/">W3C Internationalization Checker</a>. (Look for the row <samp>Markup / Non-NFC class or id names</samp>.)</p>
 </section>
+</section>
+
+
+
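The two normalization hazards described in the "Converting the normalization form of a page" context above can be reproduced with a short script. This is a minimal sketch that assumes Python's standard `unicodedata` module as the normalizer; the variable names are illustrative and are not part of the commit or the article.

```python
import unicodedata

# XML fragment from the article: the start tag's closing ">" is
# immediately followed by U+0338 COMBINING LONG SOLIDUS OVERLAY.
xml_source = "<character>\u0338</character>"

# NFC composes U+003E + U+0338 into U+226F NOT GREATER-THAN,
# so the ">" that terminated the start tag disappears.
nfc_xml = unicodedata.normalize("NFC", xml_source)
tag_still_closed = nfc_xml.startswith("<character>")

# Compatibility characters named in the article.
compat_chars = ["\u00bc", "\u00b2", "\u2116"]

# NFKC replaces each one with its compatibility decomposition.
nfkc_results = {c: unicodedata.normalize("NFKC", c) for c in compat_chars}

for original, converted in nfkc_results.items():
    codepoints = " ".join(f"U+{ord(ch):04X}" for ch in converted)
    print(f"U+{ord(original):04X} -> {codepoints}")
```

Here `tag_still_closed` ends up `False`, which is the corruption the article warns about. The printed code points also show that the fraction is rewritten with U+2044 FRACTION SLASH rather than the ASCII solidus.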