Compare language tags after normalizing to lower case. #55
Reference to RDF Semantics: the issue with c45d947 is that it makes lower-case comparison a required part of term equality, whereas earlier the text said lower-casing "MAY" be done, followed by "The value space of language tags is always in lower case." I think this is the only case where two things would be RDF term-equal without being character-by-character equal (after escape processing).
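To make the distinction concrete, here is a minimal Python sketch of comparing two language-tagged literals either character by character (as RDF 1.1 term equality does) or with the language tag compared case-insensitively (the behavior the c45d947 wording would require). The `LangLiteral` class and `term_equal` function are hypothetical and not taken from any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LangLiteral:
    """Hypothetical minimal model of a language-tagged string."""
    lexical: str
    lang: str


def term_equal(a: LangLiteral, b: LangLiteral, fold_lang_case: bool = False) -> bool:
    """Compare two language-tagged literals.

    fold_lang_case=False: character-by-character comparison (RDF 1.1 style).
    fold_lang_case=True: language tags compared case-insensitively.
    Lexical forms are always compared exactly in both modes.
    """
    if a.lexical != b.lexical:
        return False
    if fold_lang_case:
        # BCP47 tags contain only ASCII letters, digits and hyphens,
        # so str.lower() acts as plain ASCII lower-casing here.
        return a.lang.lower() == b.lang.lower()
    return a.lang == b.lang


en = LangLiteral("foo", "en")
EN = LangLiteral("foo", "EN")
print(term_equal(en, EN))                       # False
print(term_equal(en, EN, fold_lang_case=True))  # True
```

Note that only the tag is folded: `"foo"@en` and `"Foo"@en` remain different terms in both modes.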
I think the specification is quite clear (quote from RDF 1.1 Concepts):
The tags

Normalising language tags has the same effect as normalising lexical forms: it changes the graph. The fact that the normalised graph means the same does not imply that the graphs are the same.
This was discussed at TPAC:

gkellogg: last issue is about BCP47 case issue; do we want to take this after the break?
addison: this one seems easy
gkellogg: the problem is: are two triples differing only by the language tag case two separate triples or a single triple?
<AndyS> pfps - Issue: w3c/rdf-concepts#9 // PR w3c/rdf-concepts#48
gkellogg: no PR on this, only an issue.
addison: BCP47 is clearly made to be case insensitive
gkellogg: currently, literal term equality is case sensitive
AndyS: what is the approach in XML?
addison: from what you described earlier, this is probably one triple
AndyS: then we need to decide which normalization to use
<AZ> The fact that the lower case and upper case mean the same does not imply that they are the same tag in the syntax
<ora> Thanks Addison!
ktl: I think we have what we need from i18n, thank you very much.
addison: I will share some reference material

My takeaway is that RDF was wrong to interpret
@gkellogg OK, but if this change is made, it would be a backward-incompatible change. If a SPARQL query counts the number of literals in the data, then in SPARQL 1.1, with
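The compatibility concern can be illustrated without SPARQL: the number of distinct literals in a dataset depends on whether language-tag case is folded. This is a hypothetical Python sketch that models literals as (lexical form, language tag) tuples; it is not how any particular SPARQL engine implements `COUNT(DISTINCT ...)`.

```python
def count_distinct_literals(literals, fold_lang_case=False):
    """Count distinct (lexical form, language tag) pairs.

    With fold_lang_case=True, tags that differ only in case are
    treated as the same tag before counting.
    """
    if fold_lang_case:
        keys = {(lex, lang.lower()) for lex, lang in literals}
    else:
        keys = set(literals)
    return len(keys)


data = [("chat", "en"), ("chat", "EN"), ("chat", "en-GB"), ("chat", "en-gb")]
print(count_distinct_literals(data))                       # 4
print(count_distinct_literals(data, fold_lang_case=True))  # 2
```

Under this model, a count that is 4 with case-sensitive term equality becomes 2 once tag case is folded, which is the kind of changed query result being raised here.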
I agree that we need to consider this seriously. But, the tacit advice in RDF concepts that implementations may normalize to lower case gives us cover. AFAIK, many implementations follow this option (my own does). Needs more discussion.
@gkellogg -- In your #55 (comment), I think you should wrap the
After the discussion on this week's call, I believe we agreed to separate this into two issues:
Proposed changes
I'll have some text tweaks... but these proposed changes look like the right direction.
I believe we agreed that parsing two literals that differ only in the case of their language tags would result in just a single literal. Consistent formatting is one way of doing this; there are other ways (e.g. dictionaries). We have the opportunity to get away from RDF preferring "lower-case" when BCP47 says something different.
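One reading of "other ways (e.g. dictionaries)" is a store that keeps the first spelling it sees for each case-folded tag and reuses it for later occurrences, so literals are deduplicated without rewriting every tag to lower case. The `TagDictionary` class below is a hypothetical sketch of that idea, assuming this interpretation; it does not describe any existing system.

```python
class TagDictionary:
    """Intern language tags case-insensitively, keeping the first spelling seen."""

    def __init__(self):
        # Maps the lower-cased tag to the first spelling encountered.
        self._by_folded = {}

    def intern(self, tag: str) -> str:
        # setdefault stores `tag` only if no spelling with the same
        # lower-cased form has been seen yet, then returns the stored one.
        return self._by_folded.setdefault(tag.lower(), tag)


tags = TagDictionary()
triples = set()
for lex, lang in [("colour", "en-GB"), ("colour", "EN-gb"), ("color", "en-US")]:
    triples.add(("ex:s", "ex:p", (lex, tags.intern(lang))))

print(len(triples))  # 2
```

Here `"colour"@EN-gb` is stored as `"colour"@en-GB`, so the two input literals yield a single triple, while the spelling the data first used is preserved.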
BTW, the BCP47 terminology is "format" (although in one place later on, about extensions, it slips in "normalize").
As for Dataset canonicalization, it only has to add that language tags are lower-cased during canonicalization. Systems exist today that do not lower-case ("EN-gb" becomes "en-GB") and still have unique language tags; they are not wrong.
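For comparison with lower-casing, the case conventions recommended by BCP47 (RFC 5646, section 2.1.1) can be sketched as follows: the first subtag, any singleton, and everything after a singleton are lower case; other two-letter subtags (regions) are upper case; other four-letter subtags (scripts) are title case. The function name `bcp47_format` is hypothetical, and the sketch only adjusts case: it does not validate the tag against the IANA registry.

```python
def bcp47_format(tag: str) -> str:
    """Apply the RFC 5646 section 2.1.1 case conventions to a well-formed tag.

    Only letter case is changed; the tag is not validated.
    """
    out = []
    after_singleton = False
    for i, subtag in enumerate(tag.split("-")):
        if len(subtag) == 1:
            # A singleton (e.g. "x" or "u") starts an extension or
            # private-use sequence; it and everything after it are lower case.
            after_singleton = True
        if i == 0 or after_singleton:
            out.append(subtag.lower())
        elif len(subtag) == 2 and subtag.isalpha():
            out.append(subtag.upper())       # region, e.g. "GB"
        elif len(subtag) == 4 and subtag.isalpha():
            out.append(subtag.capitalize())  # script, e.g. "Latn"
        else:
            out.append(subtag.lower())       # e.g. numeric region "419"
    return "-".join(out)


print(bcp47_format("EN-gb"))           # en-GB
print(bcp47_format("az-latn-x-LATN"))  # az-Latn-x-latn
print("EN-gb".lower())                 # en-gb
```

Under this sketch, `bcp47_format("EN-gb")` yields the "en-GB" spelling mentioned above, while lower-casing yields "en-gb". Both give every case variant of a tag a single spelling; they differ only in which spelling is chosen.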
* Case normalization of language tags. Fixes #55.

---------

Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Andy Seaborne <andy@apache.org>
Text to compare language tags after normalizing to lower case was mixed in with #48; it has since been removed from that PR. This is consistent with the suggestion that language tags can be converted to lower case when language-tagged strings are introduced, but it was never part of RDF 1.0 or RDF 1.1. It arguably intrudes on D-entailment, where `"foo"@en` and `"foo"@EN` could be considered to have the same value but still be separate terms. The key commit which reverted the wording is c45d947.