Commit

[selectors-4] Make empty language strings match untagged elements. #6915
fantasai committed Nov 7, 2022
1 parent d42a578 commit 9b51686
Showing 1 changed file with 7 additions and 0 deletions.
7 changes: 7 additions & 0 deletions selectors-4/Overview.bs
@@ -1915,10 +1915,17 @@ The Language Pseudo-class: '':lang()''</h3>
when represented in BCP 47 syntax [[BCP47]],
it matches that <a>language range</a> in an <var>extended filtering</var>
operation per [[RFC4647]] <cite>Matching of Language Tags</cite> (section 3.3.2).
For this purpose, a wildcard [=language range=] (<code>"*"</code>) does not match
elements whose language is not tagged (e.g. <code>lang=""</code>),
but does match elements whose language is tagged as undetermined (<code>lang=und</code>).
The matching is performed [=ASCII case-insensitively=].
The <a>language range</a> does not need to be a valid language code to

@aphillips (Contributor), Nov 18, 2022

Nit: BCP47 tends to prefer the term tag to the term code.

Note that "valid" has special meaning in BCP47 and there is also another type of validation called "well-formed". In this case, I think you are trying to say that the language range does not need to be valid or well-formed (i.e. it can be a garbage string and it won't match anything).

Best practices for this are described in our specdev document.

perform this comparison.

A [=language range=] consisting of an empty string
('':lang("")'')
matches (only) elements whose language is not tagged.

Note: It is recommended that documents and protocols
indicate language using codes from [[BCP47]] or its successor,

@aphillips (Contributor), Nov 18, 2022

or its successor

Note that this is redundant. "BCP47" is preferred as a reference to RFC5646 + RFC4647 because it is evergreen: if a successor is ever published, it is automatically slotted in as "BCP47".

The note overall is kind of messy. Note that xml:lang uses BCP47 too. I'd propose:

Note: It is recommended that documents and protocols indicate language using valid BCP47 language tags. In the case of XML-based formats, these tags can employ the attribute xml:lang.

(Note that XML 1.0 5e explicitly defines xml:lang to use BCP47; see section 2.12)

and in the case of XML-based formats, by means of <code>xml:lang</code> attributes. [[XML10]]
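The matching rules in this hunk can be illustrated with a minimal, non-normative Python sketch. It assumes the element's language has already been determined and is available as a string, with `""` standing for an element whose language is not tagged (e.g. `lang=""`). `extended_filter` follows the steps of RFC 4647 section 3.3.2, and `lang_matches` layers this commit's two rules on top: `*` does not match untagged elements, and the empty range matches only them. Both function names are hypothetical and do not correspond to any browser or specification API.

```python
# Map only ASCII A-Z to a-z, since the spec requires ASCII case-insensitive matching.
_ASCII_LOWER = str.maketrans("ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz")


def ascii_lower(s: str) -> str:
    return s.translate(_ASCII_LOWER)


def extended_filter(language_range: str, language_tag: str) -> bool:
    """Extended filtering per RFC 4647 section 3.3.2 (hypothetical helper)."""
    # Step 1: split both strings into subtags on "-".
    r = ascii_lower(language_range).split("-")
    t = ascii_lower(language_tag).split("-")
    # Step 2: the first subtags must match (or the range starts with "*").
    if r[0] != "*" and r[0] != t[0]:
        return False
    i, j = 1, 1
    # Step 3: walk the remaining subtags of the range.
    while i < len(r):
        if r[i] == "*":        # 3.A: a wildcard in the range is skipped
            i += 1
        elif j >= len(t):      # 3.B: the tag ran out of subtags
            return False
        elif r[i] == t[j]:     # 3.C: subtags match, advance in both lists
            i += 1
            j += 1
        elif len(t[j]) == 1:   # 3.D: a singleton in the tag ends the match
            return False
        else:                  # 3.E: skip this subtag of the tag
            j += 1
    return True                # Step 4: the range is exhausted, so it matches


def lang_matches(language_range: str, element_lang: str) -> bool:
    """Sketch of :lang() matching; element_lang == "" means untagged."""
    if language_range == "":
        # :lang("") matches (only) elements whose language is not tagged.
        return element_lang == ""
    if element_lang == "":
        # No non-empty range, including "*", matches an untagged element.
        return False
    return extended_filter(language_range, element_lang)


print(lang_matches("*", "und"), lang_matches("*", ""), lang_matches("", ""))
```

Because step 3.E skips non-singleton tag subtags that do not match, a range such as `de-DE` also matches `de-Latn-DE`. Plain extended filtering would let `*` match an empty tag through step 2, so the explicit guard for untagged elements is what implements the wildcard exception described in the hunk. A garbage range simply fails to match ordinary tags, consistent with the range not needing to be valid or well-formed.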

1 comment on commit 9b51686

@aphillips (Contributor)


Sorry for an ex post facto set of comments: I didn't see this until I checked up on @frivoal's action item for I18N...