Merge pull request #61 from aphillips/gh-pages
Adding best practices for defining identifiers [I18N-ACTION-1128]
aphillips committed Mar 31, 2022
2 parents 903f6c7 + 56896dc commit 7566751
Showing 1 changed file with 72 additions and 4 deletions.
76 changes: 72 additions & 4 deletions index.html
@@ -2194,15 +2194,83 @@ <h3>Defining elements and attributes</h3>
<section id="markup_identifiers" class="subtopic">
<h3>Defining identifiers</h3>

<p>A common feature of document formats is the definition of various identifiers. These include reserved keywords as well as user-defined values. To foster interoperability, implementations need to be able to match identifier values reliably and consistently. For a detailed look at this problem, see <cite>Character Model: String Matching</cite> [[CHARMOD-NORM]].</p>

<div class="req" id="identifier_content_internal_id">
<p class="advisement">Specifications that define <a>application internal identifiers</a> (which are never shown to users and are always used for matching or processing within an application or protocol) should limit the content to a printable subset of ASCII. ASCII case-insensitive matching is recommended.</p>
<details class="links"><summary>explanations &amp; examples</summary>
<p><a href="https://www.w3.org/TR/charmod-norm/#specifying-content-restrictions">Specifying Content Restrictions</a> in [[CHARMOD-NORM]]</p>
</details>
</div>
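
<p>As an illustration of the recommendation above, the following is a minimal Python sketch of ASCII case-insensitive matching. The function names and the choice of <code>U+0021</code> to <code>U+007E</code> as the "printable subset" are assumptions made for this example, not definitions taken from [[CHARMOD-NORM]].</p>

```python
def ascii_lower(s: str) -> str:
    """Lowercase only A-Z; every other code point is left unchanged."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def ascii_case_insensitive_equal(a: str, b: str) -> bool:
    """Compare two identifiers under ASCII case-insensitive matching."""
    return ascii_lower(a) == ascii_lower(b)


def is_printable_ascii(s: str) -> bool:
    """True if every character is in U+0021..U+007E (assumed subset)."""
    return all(0x21 <= ord(c) <= 0x7E for c in s)


print(ascii_case_insensitive_equal("Content-Type", "content-type"))  # True
# U+212A KELVIN SIGN lowercases to "k" under full Unicode case mapping,
# but ASCII case-insensitive matching deliberately leaves it alone.
print("\u212A".lower() == "k")                       # True
print(ascii_case_insensitive_equal("\u212A", "k"))   # False
```

<p>Folding only <code>A</code> to <code>Z</code> keeps the comparison independent of locale and of the Unicode version in use, which is one reason it suits identifiers that users never see.</p>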

<p>Sometimes specifications need to define a set of identifiers that content authors interact with or which are meaningful to various types of end-users. Restricting the set of allowable characters to ASCII impedes usability, particularly for speakers of languages that do not use the Latin script or that use characters outside of the ASCII range.</p>

<div class="req" id="identifier_content_visible">
<p class="advisement">When identifiers are visible or potentially visible to users, specifications should allow the use of non-ASCII Unicode characters, in order to ensure that users in all languages can use the resulting document format or protocol with equal access. Case sensitivity (i.e. no case folding) is recommended.</p>
<details class="links"><summary>explanations &amp; examples</summary>
<p><a href="https://www.w3.org/TR/charmod-norm/#specifying-content-restrictions">Specifying Content Restrictions</a> in [[CHARMOD-NORM]]</p>
</details>
</div>

<div class="req" id="identifier_non_ascii_namespace">
<p class="advisement">If <a>application internal identifiers</a> are not restricted to ASCII, specifications should define the characters that are allowed to start and be part of a valid identifier.</p>
<details class="links"><summary>explanations &amp; examples</summary>
<p><a href="https://unicode.org/reports/tr31/">Unicode Identifier and Pattern Syntax</a> [[UAX31]]</p>
<p><a href="http://es5.github.io/x7.html#x7.6">Example</a>: ECMAScript 5, section 7.6 <em>Identifier Names and Identifiers</em></p>
</details>
</div>

<p>One key issue when defining an identifier namespace or set of identifiers in a new specification is the handling of combining marks and certain other characters (such as joiners or bidi controls) when parsing the document format: particular attention needs to be paid to how the identifier can be "tokenized" (separated from the surrounding text). One means of doing this is to restrict the range of characters allowed to <em>start</em> an identifier to ensure that normal text processing doesn't interfere with matching the identifier later.</p>
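
<p>The start-character restriction can be sketched in Python as follows. The helper <code>starts_with_combining_mark</code> is a hypothetical name introduced for this example. Python's built-in <code>str.isidentifier()</code> applies Python's own identifier rules, which are derived from [[UAX31]], so it is shown here only as one existing implementation of a start/continue distinction.</p>

```python
import unicodedata


def starts_with_combining_mark(identifier: str) -> bool:
    """True if the first code point has general category Mn, Mc, or Me."""
    return bool(identifier) and unicodedata.category(identifier[0]).startswith("M")


# U+0301 COMBINING ACUTE ACCENT at the start would attach to whatever
# character precedes the identifier in running text.
print(starts_with_combining_mark("\u0301abc"))  # True
print(starts_with_combining_mark("a\u0301bc"))  # False

# Python allows U+0301 to continue an identifier but not to start one.
print("\u0301abc".isidentifier())  # False
print("a\u0301bc".isidentifier())  # True
```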

<p><a href="https://unicode.org/reports/tr31/"><cite>Unicode Identifier and Pattern Syntax</cite></a> [[UAX31]] provides one model, used notably in programming languages such as Java or <a href="http://es5.github.io/x7.html#x7.6">JavaScript</a>. HTML and CSS also provide <a href="https://html.spec.whatwg.org/multipage/custom-elements.html#valid-custom-element-name">character range definitions</a> for custom identifiers, such as this <a href="https://www.w3.org/TR/xml/#sec-notation">EBNF</a> [[XML]] production:</p>

<pre>
PCENChar ::=
"-" | "." | [0-9] | "_" | [a-zA-Z] | #xB7 | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x37D] |
[#x37F-#x1FFF] | [#x200C-#x200D] | [#x203F-#x2040] | [#x2070-#x218F] | [#x2C00-#x2FEF] |
[#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
</pre>
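
<p>A range-based production like this one can be checked with a simple table lookup. The Python sketch below transcribes the ranges exactly as they appear in the production above. The names <code>PCEN_RANGES</code>, <code>is_pcen_char</code>, and <code>all_pcen_chars</code> are made up for this example, and the code checks only individual characters, not any additional rules a specification may place on a complete name.</p>

```python
# Inclusive (low, high) code point ranges, transcribed from the production above.
PCEN_RANGES = [
    (0x2D, 0x2D), (0x2E, 0x2E), (0x30, 0x39), (0x5F, 0x5F),   # "-" "." 0-9 "_"
    (0x61, 0x7A), (0x41, 0x5A),                               # a-z A-Z
    (0xB7, 0xB7), (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x203F, 0x2040), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]


def is_pcen_char(ch: str) -> bool:
    """True if the single character ch falls in one of the listed ranges."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in PCEN_RANGES)


def all_pcen_chars(s: str) -> bool:
    """True if every character of s is matched by the production."""
    return all(is_pcen_char(c) for c in s)


print(all_pcen_chars("my-\u00e9l\u00e9ment"))  # True: U+00E9 is in [#xD8-#xF6]
print(is_pcen_char("\u0301"))                  # True: a combining mark is in range
print(is_pcen_char("\u00d7"))                  # False: U+00D7 is excluded
print(is_pcen_char("\ud800"))                  # False: surrogates are excluded
```

<p>Note that the ranges are defined purely by code point, so a combining mark such as <code>U+0301</code> is accepted like any other character in its range.</p>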

<div class="note">
<p>HTML and CSS processing is defined such that Unicode character properties (such as whether a given character is a combining mark) are not considered when parsing identifiers and tokens. This allows identifiers to start with a combining character and still be processed reliably, but a plain text editor might not handle the value identically.</p>
</div>

<p>Specifications should exercise care when defining identifiers with regard to the handling of whitespace. Note that there are Unicode horizontal whitespace characters other than the ASCII characters <code>U+0020 SPACE</code> and <code>U+0009 TAB</code>.</p>
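
<p>To make the whitespace point concrete, the following Python sketch lists space characters other than <code>U+0020</code> found in a string. It relies on the Unicode general category <code>Zs</code> (space separator) as an approximation of "horizontal whitespace"; that choice, and the function name <code>non_ascii_spaces</code>, are assumptions of this example.</p>

```python
import unicodedata


def non_ascii_spaces(text: str) -> list:
    """List the Zs (space separator) characters in text other than U+0020."""
    return [
        f"U+{ord(c):04X} {unicodedata.name(c)}"
        for c in text
        if unicodedata.category(c) == "Zs" and c != "\u0020"
    ]


# A value that looks like it contains ordinary spaces but does not.
print(non_ascii_spaces("my\u00A0id \u3000x\u2003y"))
```

<p>A tokenizer that splits only on <code>U+0020</code> and <code>U+0009</code> would treat each of the characters reported here as part of the identifier.</p>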

<div class="req" id="identifier_content_surrogates">
<p class="advisement">Specifications should not allow surrogate <a>code points</a> (<code>U+D800</code> to <code>U+DFFF</code>) or non-character code points in identifiers.</p>
<details class="links"><summary>explanations &amp; examples</summary>
<p><a href="https://www.w3.org/TR/charmod-norm/#specifying-content-restrictions">Specifying Content Restrictions</a> in [[CHARMOD-NORM]]</p>
</details>
</div>

<div class="req" id="identifier_content_controls">
<p class="advisement">Specifications should not allow the <kbd>C0</kbd> (<code>U+0000</code> to <code>U+001F</code>) and <kbd>C1</kbd> (<code>U+0080</code> to <code>U+009F</code>) control characters in identifiers.</p>
<details class="links"><summary>explanations &amp; examples</summary>
<p><a href="https://www.w3.org/TR/charmod-norm/#specifying-content-restrictions">Specifying Content Restrictions</a> in [[CHARMOD-NORM]]</p>
</details>
</div>
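
<p>The two restrictions above (surrogates and noncharacters, then C0 and C1 controls) can be expressed as code point range checks. The Python sketch below uses hypothetical helper names. The control ranges are those given in the text, and the noncharacter test covers <code>U+FDD0</code> to <code>U+FDEF</code> plus the last two code points of each plane.</p>

```python
def is_surrogate(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_c0_or_c1_control(cp: int) -> bool:
    return cp <= 0x1F or 0x80 <= cp <= 0x9F


def forbidden_code_points(identifier: str) -> list:
    """Return U+XXXX labels for code points an identifier should not contain."""
    return [
        f"U+{ord(c):04X}"
        for c in identifier
        if is_surrogate(ord(c)) or is_noncharacter(ord(c)) or is_c0_or_c1_control(ord(c))
    ]


print(forbidden_code_points("ok_id"))           # []
print(forbidden_code_points("a\u0000b\u0085"))  # ['U+0000', 'U+0085']
print(forbidden_code_points("x\ufdd0\ud800"))   # ['U+FDD0', 'U+D800']
```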

<div class="req" id="identifier_case">
<p class="advisement">Identifiers should be case-sensitive when non-ASCII characters are allowed and case-<strong>insensitive</strong> when only ASCII characters are allowed.</p>
<details class="links"><summary>explanations &amp; examples</summary>
<p><a href="https://www.w3.org/TR/charmod-norm/#specifying-content-restrictions">Specifying Content Restrictions</a> in [[CHARMOD-NORM]]</p>
</details>
</div>


<div class="req" id="identifier_content_display">
<p class="advisement"><a>Application internal identifier</a> fields or values must be wrapped with a localizable display value when displayed to end-users.</p>
<details class="links"><summary>explanations &amp; examples</summary>
<p><a href="https://www.w3.org/TR/charmod-norm/#specifying-content-restrictions">Specifying Content Restrictions</a> in [[CHARMOD-NORM]]</p>
</details>
</div>
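
<p>One way to read the requirement above is that the identifier is used only as a lookup key, and the text shown to the user comes from a separate, localizable table. The Python sketch below illustrates that reading. The catalog contents, the identifier <code>status-pending</code>, and the function <code>display_value</code> are all hypothetical.</p>

```python
# Hypothetical catalog keyed by internal identifier, then by language tag.
DISPLAY_NAMES = {
    "status-pending": {"en": "Pending", "fr": "En attente", "ja": "保留中"},
}


def display_value(identifier: str, lang: str, default_lang: str = "en") -> str:
    """Return a localized label; fall back to the default language, then the id."""
    labels = DISPLAY_NAMES.get(identifier, {})
    return labels.get(lang) or labels.get(default_lang) or identifier


print(display_value("status-pending", "fr"))  # En attente
print(display_value("status-pending", "de"))  # Pending (default language)
```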



</section>





<section id="markup_plaintext" class="subtopic">
<h3>Working with plain text</h3>
