ambiguity about canonical N-Triples / N-Quads #66

Closed
pchampin opened this issue Jan 10, 2023 · 11 comments

Comments

@pchampin
Contributor

The specification of canonical N-Triples is silent about the datatype of xsd:string literals. More specifically:

    "hello world"

and

    "hello world"^^<http://www.w3.org/2001/XMLSchema#string>

are equivalent terms in N-Triples and N-Quads, and the spec does not say which one should be used as the canonical form.

Given that this is lacking from the N-Triples spec, the rdf-canon spec should choose one and be explicit about it.
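
To make the equivalence concrete, here is a minimal sketch (not taken from any spec or library) that parses a literal token into its abstract-syntax parts. The `parse_literal` helper, the `Literal` tuple, and the regular expression are illustrative assumptions; the lexical form is kept as written, without decoding escape sequences.

```python
import re
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


class Literal(NamedTuple):
    """Abstract-syntax view of a literal (hypothetical helper type)."""
    lexical: str
    datatype: str
    language: Optional[str]


# Simplified pattern: a quoted string, optionally followed by either
# ^^<datatype IRI> or @language-tag. Escapes are matched but not decoded.
_LITERAL_RE = re.compile(
    r'^"(?P<lex>(?:[^"\\]|\\.)*)"'
    r'(?:\^\^<(?P<dt>[^>]+)>|@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*))?$'
)


def parse_literal(token: str) -> Literal:
    match = _LITERAL_RE.match(token.strip())
    if match is None:
        raise ValueError(f"not a literal: {token!r}")
    lang = match.group("lang")
    if lang is not None:
        # Language-tagged strings always have the datatype rdf:langString.
        return Literal(match.group("lex"), RDF_LANGSTRING, lang)
    # A literal without a datatype is sugar for an xsd:string literal.
    return Literal(match.group("lex"), match.group("dt") or XSD_STRING, None)


plain = parse_literal('"hello world"')
typed = parse_literal(
    '"hello world"^^<http://www.w3.org/2001/XMLSchema#string>'
)
print(plain == typed)  # both tokens denote the same abstract literal
```

Both tokens map to the same `Literal`, which is why a canonical form has to pick one spelling.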

@pchampin pchampin changed the title ambiguity about canonical N-Triples / N-Quads ambiguities about canonical N-Triples / N-Quads Jan 10, 2023
@pchampin pchampin changed the title ambiguities about canonical N-Triples / N-Quads ambiguity about canonical N-Triples / N-Quads Jan 10, 2023
@TallTed
Member

TallTed commented Jan 10, 2023

This should also be fed to the rdf-star WG, who can also update the N-Triples and N-Quads specs accordingly.

@gkellogg
Member

Other than for Canonicalization, RDF serialization formats are typically restricted to parsing, not serializing; JSON-LD being the main exception.

RDF Concepts discusses this with MAY language:

Please note that concrete syntaxes may support simple literals consisting of only a lexical form without any datatype IRI or language tag. Simple literals are syntactic sugar for abstract syntax literals with the datatype IRI http://www.w3.org/2001/XMLSchema#string. Similarly, most concrete syntaxes represent language-tagged strings without the datatype IRI because it always equals http://www.w3.org/1999/02/22-rdf-syntax-ns#langString.

Making this a MUST for canonical forms is indeed something that needs to go into the updated N-Triples and N-Quads specs in their canonicalization sections. Similarly, rdf:langString MUST NOT be used for language-tagged literals, although the grammar doesn't support this in any case.
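
As a rough illustration of the rule proposed here, the sketch below serializes a literal with the datatype omitted when it is xsd:string, and never writes rdf:langString for a language-tagged string. The `serialize_literal` function is a hypothetical helper, not part of any spec or library, and the escaping table reflects my reading of the RDF 1.1 N-Triples canonical form (only `"`, `\`, LF and CR are escaped), which should be checked against the spec.

```python
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Assumption: only these four characters are escaped in canonical output.
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r"}


def serialize_literal(
    lexical: str,
    datatype: str = XSD_STRING,
    language: Optional[str] = None,
) -> str:
    """Write a literal in the short form discussed in this thread."""
    escaped = "".join(_ESCAPES.get(ch, ch) for ch in lexical)
    if language is not None:
        # Language-tagged string: the tag alone, no rdf:langString datatype.
        return f'"{escaped}"@{language}'
    if datatype == XSD_STRING:
        # xsd:string: use the simple-literal form, without ^^<...>.
        return f'"{escaped}"'
    return f'"{escaped}"^^<{datatype}>'


print(serialize_literal("hello world"))
print(serialize_literal("1", "http://www.w3.org/2001/XMLSchema#integer"))
print(serialize_literal("bonjour", language="fr"))
```

With this choice, an xsd:string literal has exactly one output spelling, whether or not the caller passes the datatype explicitly.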

This, and the previous note on the need for Canonicalization in N-Triples, should be in cross-referenced issues for those specs, but it is best to wait until their repositories have been set up, which should happen before too long.

@TallTed
Member

TallTed commented Jan 19, 2023

[@gkellogg] RDF serialization formats are typically restricted to parsing, not serializing

I'm not at all sure what you mean by that... "serialization formats" are not for "serializing"?

@gkellogg
Member

[@gkellogg] RDF serialization formats are typically restricted to parsing, not serializing

I'm not at all sure what you mean by that... "serialization formats" are not for "serializing"?

Does sound like an oxymoron :) But there are typically no normative statements on how to serialize RDF graphs or datasets, other than for N-Triples canonical form, which has its own problems and restricts itself to serializing a single triple, not a graph. The specs describe the syntax and how to parse it, but not how to serialize it. Another exception is JSON-LD, which _does_ describe how to serialize datasets to JSON-LD.

@gkellogg
Member

See w3c/rdf-n-triples#2 and w3c/rdf-n-quads#2.

@TallTed
Member

TallTed commented Jan 25, 2023

there are typically no normative statements on how to serialize RDF graphs or datasets

Well, that seems like a horrendous oversight and, dare I say, a bug in each document with such lack. It's no wonder there are nonstop issues with interop and uptake, slowly growing interest in RDF/LD notwithstanding!

@pchampin
Contributor Author

Well, that seems like a horrendous oversight

Well, the implicit contract of any serializer is to serialize your data to whatever parses back to the same data.

But granted, this could be made explicit, probably with a more specific definition of what we consider to be the "same" data (in RDF, this means "isomorphism", because blank nodes... well, you know!).

@gkellogg
Member

Well, that seems like a horrendous oversight

I don't think RDF's uptake problems can be blamed on the lack of specs that define explicitly how to serialize an RDF Graph/Dataset, nor should they be, IMHO. At most, there might be a statement that serialized graph/dataset representations MUST be a valid representation of the associated grammar rules. If you think in terms of computer languages, the abstract RDF syntax is closer to a machine language, with N-Triples and N-Quads like assembly languages, and Turtle/TriG/RDFa/JSON-LD like high-level languages targeting that machine language. An argument can be made that there is a normative way to represent the abstract syntax in N-Triples and N-Quads (notwithstanding Blank Node identifiers), but not for the others. JSON-LD provides _a_ way to transform a dataset into JSON-LD, but not _the_ way to do so.

Looking elsewhere, SPARQL describes an algebra that is targeted by the syntax. There are systems that will re-serialize the algebra into the SPARQL Grammar, but no normative statements about doing so.

We provide a number of examples for representing data in the various concrete syntaxes, and define how to parse those representations to transform them into the underlying representation. Trying to codify how to re-create that serialization from the underlying representation is certainly outside our charter, and not something we should get into in any case, IMHO.

But granted, this could be made explicit, probably with a more specific definition of what we consider to be the "same" data (in RDF, this means "isomorphism", because blank nodes... well, you know!).

We do define graph/dataset isomorphism; conceivably, a statement could be made that a serialization of a graph or dataset, when re-parsed, MUST be isomorphic to that graph or dataset.
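
For illustration only, such a round-trip requirement could be checked along the following lines. This is a brute-force sketch that assumes triples are plain tuples of N-Triples term strings and that blank nodes are the terms starting with `_:`; the `isomorphic` and `bnodes` helpers are hypothetical, and the search is factorial in the number of blank nodes, so it only suits tiny graphs.

```python
from itertools import permutations


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def bnodes(graph):
    """Sorted list of the blank node labels used in a graph."""
    return sorted({t for triple in graph for t in triple if is_bnode(t)})


def isomorphic(g1, g2) -> bool:
    """True if some bijection between blank nodes maps g1 onto g2."""
    g1, g2 = set(g1), set(g2)
    b1, b2 = bnodes(g1), bnodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    # Try every one-to-one relabelling of g1's blank nodes.
    for perm in permutations(b2):
        mapping = dict(zip(b1, perm))
        relabelled = {
            tuple(mapping.get(t, t) for t in triple) for triple in g1
        }
        if relabelled == g2:
            return True
    return False


P = "<http://example.org/p>"
original = {("_:a", P, "_:b"), ("_:b", P, '"x"')}
reparsed = {("_:g0", P, "_:g1"), ("_:g1", P, '"x"')}  # labels changed only
different = {("_:g0", P, "_:g1"), ("_:g0", P, '"x"')}  # structure changed

print(isomorphic(original, reparsed))   # True
print(isomorphic(original, different))  # False
```

A serializer would satisfy the statement above if `isomorphic(graph, parse(serialize(graph)))` held for every graph, where `parse` and `serialize` stand for whatever implementation is under test.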

@peacekeeper
Contributor

Has this been solved by merging #96?

@gkellogg
Member

gkellogg commented May 3, 2023

Yes, I believe it has.

@peacekeeper
Contributor

On the 10 May 2023 call, the WG decided to close this issue.
