How to format decimals? #58

Open
wschella opened this issue May 16, 2019 · 7 comments
@wschella

wschella commented May 16, 2019

Regarding the formatting of decimals, how should spec-compliant engines handle trailing zeros in decimals? I find that the expected results in this test suite are inconsistent: sometimes they expect a trailing zero, sometimes they do not.

Some examples that have no trailing zero:

Some examples that do have a trailing zero:

Should this test suite be consistent in its formatting? Either consistently have a trailing zero (in conformance with the canonical representation), or consistently remove it.
Or should we not compare the outputs by string equality?

cc: @rubensworks

@rubensworks
Member

Following the RDF spec, literals should definitely be strictly equal:

> Literal term equality: Two literals are term-equal (the same RDF literal) if and only if the two lexical forms, the two datatype IRIs, and the two language tags (if any) compare equal, character by character. Thus, two literals can have the same value without being the same RDF term.

I think we could go two ways:

  1. It should be defined somewhere (probably SPARQL 1.2) that functions should produce literals in canonical form.
  2. Mention in the test suite that literals in results should be compared by first converting to their canonical form, and then comparing character by character.

Option 1 is IMO the cleanest solution, but it requires a spec update (unless I'm missing something). Option 2 is probably the most practical one.
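
To make the distinction in the quoted definition concrete, here is a minimal Python sketch of term equality versus value equality for `xsd:decimal` literals. The `Literal` type and both helper functions are hypothetical names invented for this illustration, not part of any RDF library, and the sketch assumes its inputs are valid `xsd:decimal` lexical forms.

```python
from decimal import Decimal
from typing import NamedTuple, Optional

XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


class Literal(NamedTuple):
    """Hypothetical minimal RDF literal: lexical form, datatype IRI, language tag."""
    lexical: str
    datatype: str
    language: Optional[str] = None


def term_equal(a: Literal, b: Literal) -> bool:
    # Term equality: every component must match character by character.
    return a == b


def value_equal(a: Literal, b: Literal) -> bool:
    # Value equality for xsd:decimal: compare the numbers the lexical forms denote.
    # Assumption: both lexical forms are valid xsd:decimal (Decimal() also accepts
    # exponents and NaN, which xsd:decimal does not allow).
    if a.datatype == XSD_DECIMAL and b.datatype == XSD_DECIMAL:
        return Decimal(a.lexical) == Decimal(b.lexical)
    return term_equal(a, b)


one_a = Literal("1.0", XSD_DECIMAL)
one_b = Literal("1.00", XSD_DECIMAL)
print(term_equal(one_a, one_b))   # False: the lexical forms differ
print(value_equal(one_a, one_b))  # True: both denote the number one
```

A test runner that uses string equality behaves like `term_equal` here, which is why `"1.0"` and `"1.00"` would be reported as a mismatch.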

@gkellogg
Member

According to the SPARQL 1.1 Grammar trailing digits in number representations are preserved, and so 1.0 and 1.00 would be considered different terms. IIRC, if the tests represent literals differently, it is intentional.

@rubensworks
Member

> According to the SPARQL 1.1 Grammar trailing digits in number representations are preserved, and so 1.0 and 1.00 would be considered different terms.

Indeed, if terms originate from the underlying data source, then the terms must be returned as-is. However, this issue is about the response of functions, for which there seems to be no guideline on how decimals should be formatted.

> IIRC, if the tests represent literals differently, it is intentional.

It is indeed possible that this was intentional, but it is unclear to me what the reason for this was.

For example, in the spec tests, SECONDS("2010-06-21T11:28:01Z"^^xsd:dateTime) produces "1"^^http://www.w3.org/2001/XMLSchema#decimal, while 0/2 produces "0.0"^^http://www.w3.org/2001/XMLSchema#decimal.

However, the SPARQL spec does not seem to give any indication of the required format of these decimals, so I would expect the spec tests to be at least consistent in this regard. Perhaps there is some other reason for this inconsistency?

@rubensworks
Member

Pinging @cygri to confirm that this is indeed something underspecified in SPARQL 1.1, and whether or not this should be added to the errata.

@afs
Contributor

afs commented May 17, 2019

Functions return values so the way to compare is by value, not by term. (2 is one way to do that but the principle to state is that it is value-equality).

XSD changed the canonical form of decimals between 1.0 and 1.1 to require the decimal point. It was "1"^^xsd:decimal, and became "1.0"^^xsd:decimal.

The SECONDS case is different because of the context of use. There is a reasonable expectation that the term format is like xsd:dateTime - fixed two digits - so that (concat (str (hours ?dt)) ":" (str (minutes ?dt)) ":" (str (seconds ?dt))) generates a legal lexical form.

Or what about xsd:decimal("0") or STRDT("0", xsd:decimal)? Canonical, or did the application writer write it like that because they wanted exactly that lexical form? Both are reasonable.

xsd:precisionDecimal preserves trailing zeros in the canonical form.

I don't think being totally prescriptive about term representation is a good idea. It is the value that matters.

@cygri

cygri commented May 18, 2019

If the SPARQL spec doesn’t say that a particular lexical form is generated, then any lexical form that has the correct value should be considered correct.

So, as Andy said, the test suite documentation should state that literals are compared by value. Canonicalising actual and expected value before comparing them in the test runner is one way to achieve comparison by value.

(For strDT, the language of the spec demands that a literal with exactly the given lexical form is generated, so arguably substituting a different form of the same value would be incorrect there. For the cast/constructor functions, the spec makes no such demand.)
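
The normalise-then-compare approach mentioned above could be sketched in a test runner roughly as follows. This is only an illustration under stated assumptions: `normal_form`, `normalise`, and `results_match` are hypothetical helpers, terms are modelled as plain `(lexical, datatype)` tuples, and the normal form chosen here (no sign for zero, no leading zeros, no trailing zeros, no bare decimal point) is simply one convenient representative per value, not a claim about which form any XSD version calls canonical. It also does not special-case functions such as STRDT, where an exact lexical form is expected.

```python
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def normal_form(lexical: str) -> str:
    """Map every valid xsd:decimal lexical form of a value to one representative."""
    sign = "-" if lexical.startswith("-") else ""
    digits = lexical.lstrip("+-")
    whole, _, frac = digits.partition(".")
    whole = whole.lstrip("0") or "0"   # drop leading zeros, keep at least one digit
    frac = frac.rstrip("0")            # drop trailing zeros
    if whole == "0" and not frac:
        sign = ""                      # treat -0 and +0 as plain 0
    return sign + whole + ("." + frac if frac else "")


def normalise(term):
    """Normalise a (lexical, datatype) pair; only xsd:decimal is touched."""
    lexical, datatype = term
    if datatype == XSD_DECIMAL:
        return (normal_form(lexical), datatype)
    return term


def results_match(expected, actual) -> bool:
    """Compare two lists of terms after normalising decimals on both sides."""
    return [normalise(t) for t in expected] == [normalise(t) for t in actual]


print(results_match([("0.0", XSD_DECIMAL)], [("0", XSD_DECIMAL)]))  # True
print(results_match([("1", XSD_DECIMAL)], [("1.5", XSD_DECIMAL)]))  # False
```

Working on the string rather than converting to a float avoids any rounding of long decimals; both sides are normalised so the outcome does not depend on which form the expected results happen to use.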

@ericprud
Member

> If the SPARQL spec doesn’t say that a particular lexical form is generated, then any lexical form that has the correct value should be considered correct.

But that leaves STR() of that value (needlessly?) underspecified. Also, those STRs might be used to construct important things like identifiers.

I think we'd like predictable behavior for at least integers. I.e. we'd like IRI(CONCAT("http://a.example/id=", STR(1123-1000))) to return <http://a.example/id=123> and not <http://a.example/id=0123> or <http://a.example/id=+123>.

Decimals are used less frequently to construct terms but if we say that integer constructors MUST return canonical forms, we may as well do the same for all supported XSD datatypes with a canonical form: e.g. IRI(CONCAT("http://a.example/id=", STR(1123.0-1000))) produces exactly <http://a.example/id=123.0> and not something with arbitrary leading and trailing zeros.

There may be some cases where it's hard to tell if something is being constructed vs. a lexical form is being cast, but hopefully they'll surface in this issue.

iirc, the SPARQL 1.0 tests assumed canonical forms, which means any vestiges of them should be updated to reflect the mildly spec-breaking 1 -> 1.0 change that @afs mentioned above.
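
To illustrate the identifier concern raised above, the following Python sketch splices three lexical forms of the same integer value into an IRI, the way IRI(CONCAT(..., STR(...))) would. The `build_iri` helper is a hypothetical stand-in written for this illustration; it is not how any particular engine implements these functions.

```python
BASE = "http://a.example/id="


def build_iri(lexical: str) -> str:
    # Stand-in for IRI(CONCAT(base, STR(value))): the lexical form is spliced in verbatim.
    return "<" + BASE + lexical + ">"


# Three lexical forms that all denote the integer 123.
lexical_forms = ["123", "0123", "+123"]

values = {int(lex) for lex in lexical_forms}
iris = {build_iri(lex) for lex in lexical_forms}

print(len(values))  # 1: a single value
print(len(iris))    # 3: three distinct IRIs
```

Value-based comparison would accept all three results of the arithmetic, yet the constructed identifiers differ, which is the predictability problem described in the comment.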

gkellogg added the SPARQL label Nov 1, 2023
6 participants