Standardizing on XSD 1.1 datatypes #93
I work with BC dates in much of our material, and there is a significant disconnect between the ISO standards for date encoding and XSD 1.0, specifically with respect to individual software implementations of SPARQL.
ISO 8601 dictates that AD 1 is "0001", 1 BC is "0000", and 2 BC is "-0001". The gYear datatype in XSD 1.0, however, does not allow "0000"; instead, 1 BC is "-0001". I have been using Apache Fuseki for the last 6-7 years, and it has remained compliant with XSD 1.0: if you attempt to post RDF with the xsd:gYear datatype and a value of "0000", an error results. I do not know what other SPARQL endpoints do.
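To make the two numbering schemes concrete, here is a minimal Python sketch that formats a historical year as a gYear lexical value under either convention. The function name `gyear_lexical` and the convention labels `"iso8601"` and `"xsd10"` are hypothetical, chosen only for illustration; time zones and years beyond four digits are ignored.

```python
def gyear_lexical(year: int, era: str, convention: str) -> str:
    """Format a historical year (1-based, era 'AD' or 'BC') as a gYear string.

    convention 'iso8601': astronomical numbering, 1 BC is "0000".
    convention 'xsd10':   no year zero, 1 BC is "-0001".
    Both labels are illustrative assumptions, not standard identifiers.
    """
    if year < 1:
        raise ValueError("historical years start at 1")
    if convention not in ("iso8601", "xsd10"):
        raise ValueError(f"unknown convention: {convention}")
    if era == "AD":
        number = year
    elif era == "BC":
        # ISO 8601 shifts BC years by one to make room for year 0000.
        number = -(year - 1) if convention == "iso8601" else -year
    else:
        raise ValueError(f"unknown era: {era}")
    sign = "-" if number < 0 else ""
    return f"{sign}{abs(number):04d}"


if __name__ == "__main__":
    for y, e in [(1, "AD"), (1, "BC"), (2, "BC")]:
        print(y, e, gyear_lexical(y, e, "iso8601"), gyear_lexical(y, e, "xsd10"))
```

The only difference between the two conventions is the one-year shift applied to BC years; AD years are written identically in both.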
Adding to the confusion, XSD 1.1 adheres more closely to ISO 8601, and therefore
I am not sure to what extent XSD 1.1 compliance has been discussed within the SPARQL community. This issue primarily affects BC dates, and so it applies to only a small segment of our community.
Making matters more complicated is the fact that the W3C XSD group elected to use the same namespace for XSD 1.1 (https://www.w3.org/TR/xmlschema11-1/#xsd-nss).
However, they did declare a full list of URIs for various schema versions (https://www.w3.org/TR/xmlschema11-1/#nonnormative-language-ids). In this list is a URI for XML Schema Definition Language 1.1: http://www.w3.org/XML/XMLSchema/v1.1
Perhaps the solution is to assume the default namespace of http://www.w3.org/XML/XMLSchema is 1.0 and 1.1 should be explicitly declared with the above URI. If the versions of XSD dates can be differentiated by namespace, it would be possible to perform some basic math upon ingestion so that all dates are 1.1 compliant when conducting SPARQL queries.
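The "basic math upon ingestion" could look roughly like the following Python sketch, which rewrites an XSD 1.0 gYear lexical value into its XSD 1.1 equivalent. It assumes the ingester already knows the value was written under XSD 1.0 rules (for example via the namespace distinction proposed above); the function name `gyear_10_to_11` is hypothetical, and only gYear is handled, not the other date and time types.

```python
import re

# Optional sign, at least four year digits, optional time zone.
_GYEAR = re.compile(r"(-?)(\d{4,})(Z|[+-]\d{2}:\d{2})?")


def gyear_10_to_11(lexical: str) -> str:
    """Rewrite an XSD 1.0 gYear lexical form using XSD 1.1 year numbering."""
    match = _GYEAR.fullmatch(lexical)
    if match is None:
        raise ValueError(f"not a gYear lexical form: {lexical!r}")
    sign, digits, tz = match.group(1), match.group(2), match.group(3) or ""
    year = int(digits)
    if year == 0:
        # XSD 1.0 has no year zero, so this input cannot be 1.0 data.
        raise ValueError("0000 is not a valid gYear in XSD 1.0")
    if sign != "-":
        return lexical  # AD years are written the same way in both versions
    shifted = year - 1  # 1 BC becomes 0000, 2 BC becomes -0001, and so on
    new_sign = "-" if shifted > 0 else ""
    return f"{new_sign}{shifted:04d}{tz}"


if __name__ == "__main__":
    for value in ["-0001", "-0002", "0044", "-0044Z"]:
        print(value, "->", gyear_10_to_11(value))
```

The conversion is lossless in this direction, but it is only safe if each value is converted exactly once, which is why knowing the source version per literal matters.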
But the SPARQL 1.2 community does need to make a decision on fully supporting XSD 1.1 datatypes or not so that software implementations can apply these recommendations consistently across all platforms.
Considerations for backward compatibility
Without a separate namespace to differentiate between datatype versions, per-project/endpoint documentation seems like the only way to ensure backward compatibility.
This is a very good question, and I think it was answered in part in the datatypes section of RDF 1.1, which says RDF 1.1 matches XSD 1.1.
That could mean we just need to tell everyone implementing RDF 1.1 that they need to update their functions to match. @afs can you remember if this was discussed at the time in either WG?
Live another day, learn another lesson (and they are mostly painful ;-)
It'd be easy for a SPARQL endpoint to report this in SPARQL Service Description (SD); we just need to agree on a Feature URL for it. But the implementation level of SD is not so high, e.g. GraphDB doesn't support it.
I don't think a feature flag is sufficient, mostly because from RDF 1.0 to 1.1 the meaning of the literal "0000"^^xsd:gYear changed. That means this community dealing with historical data has one more problem :( i.e. "-2" in XSD 1.0 == "-1" in XSD 1.1. This is something wider than the SPARQL endpoints. And I would rather have us all upgrade to the XSD 1.1 semantics, including in our functions, than have this weird half/half situation.
That retroactively changes the meaning of "0000"^^xsd:gYear. I think it's worth a little outreach on firstname.lastname@example.org to see who's going to get (more) screwed by this, but I basically support this. This decision was made unconsciously (as best I recall) by the SPARQL 1.1 WG ~7 years ago and now we just have to find the best way to clean up the unintended consequences.
I agree with @ericprud that some outreach would be a good step, though we should first collect all changes for "dependency upgrades" into one place on the wiki and review them together.
Would someone like to offer to make that wiki page happen?
It's not an optional flag issue - it's about the data, not the engine. Feed the same data into two engines and get different results.
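A small Python sketch of why the same data yields different results: one lexical value is read under each version's year numbering. The function name `historical_year` and the version labels are hypothetical, and the input is assumed to be a plain gYear without a time zone.

```python
def historical_year(lexical: str, xsd_version: str) -> str:
    """Interpret a gYear lexical value (no time zone) as a BC/AD year."""
    number = int(lexical)  # "-0001" parses to -1, "0000" to 0
    if xsd_version not in ("1.0", "1.1"):
        raise ValueError(f"unknown XSD version: {xsd_version}")
    if number > 0:
        return f"{number} AD"
    if xsd_version == "1.0":
        if number == 0:
            raise ValueError("0000 is not a valid gYear in XSD 1.0")
        return f"{-number} BC"  # no year zero: -0001 is 1 BC
    return f"{1 - number} BC"  # year zero exists: 0000 is 1 BC, -0001 is 2 BC


if __name__ == "__main__":
    literal = "-0001"
    print(literal, "under XSD 1.0:", historical_year(literal, "1.0"))
    print(literal, "under XSD 1.1:", historical_year(literal, "1.1"))
```

An engine following XSD 1.0 and one following XSD 1.1 would therefore disagree by one year on every BC literal in the same dataset.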
Maybe other language ecosystems are more up-to-date.
Before defining a whole new set of datatypes and associated functions, we need to think through the implementation impact on engines of all engineering resource levels. If that means writing the arithmetic for much of F&O then that's not a small thing.
(Jena has all of XSD and F&O that apply. This issue would be a point fix ... @ewg118 - send in a patch!)
Is this the set of datatypes we are discussing:
and the derived types of decimal, duration:
I'm unclear about xs:normalizedString - it seems to be a base for XML-related derived types. Is it useful otherwise?
Maybe mention https://www.w3.org/TR/xsd-precisionDecimal/