Include Annotation syntax in Turtle* and SPARQL* #9

Closed
pchampin opened this issue Oct 13, 2020 · 30 comments
Labels
concrete-syntax About Turtle-star and other concrete syntaxes sparql-star About SPARQL-star

Comments

@pchampin
Collaborator

pchampin commented Oct 13, 2020

This has already been discussed on the mailing list.

The idea would be to have a notation like

:bob :age 42 {| :source <http://example.org/~bob/> |}.

as shortcut for

:bob :age 42.
<< :bob :age 42 >> :source <http://example.org/~bob/>.
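
A minimal sketch of this shortcut in Python, assuming terms are plain strings and a triple is a 3-tuple (the helper name `expand_annotation` is hypothetical, not part of any proposal):

```python
# Hypothetical helper: terms are plain strings, a triple is a 3-tuple,
# and an embedded triple is a 3-tuple used in subject position.
def expand_annotation(s, p, o, annotations):
    """Return the triples that `s p o {| ... |}` is a shortcut for.

    `annotations` is a list of (predicate, object) pairs taken from
    inside the {| ... |} block.
    """
    base = (s, p, o)
    triples = [base]  # the annotated triple itself is asserted
    for ann_p, ann_o in annotations:
        # each annotation becomes a triple about the embedded triple
        triples.append((base, ann_p, ann_o))
    return triples


result = expand_annotation(
    ":bob", ":age", "42",
    [(":source", "<http://example.org/~bob/>")],
)
```

Here `result` holds the asserted triple followed by one triple whose subject is the embedded triple, mirroring the two-line expansion above.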
@hartig
Collaborator

hartig commented Oct 13, 2020

Just for the record at this point: Related to the extension of the Turtle* format as discussed in this issue, the RDF* data model itself (i.e., the "abstract syntax") may also be extended along the same lines. In an email on the mailing list I have outlined how such an extension can be defined.

@pchampin
Collaborator Author

@hartig I don't see any need to extend the current abstract syntax. As expressed by the examples above, my understanding of the Annotation syntax is that it is purely syntactic sugar, to avoid repeating the annotated triples.

Or do you consider that the first snippet (using {| ... |}) is saying something different or something more than the second snippet (using << ... >>)?

@hartig
Collaborator

hartig commented Oct 14, 2020

Or do you consider that the first snippet (using {| ... |}) is saying something different or something more than the second snippet (using << ... >>)?

No, absolutely not.

Here is the reason for me considering a related extension of the RDF* data model: Once you have added such an annotation syntax to Turtle*, the natural next step is to extend the (user-facing) syntax of SPARQL* in the same way. That is, to allow for queries to look as follows:

SELECT * WHERE {
   :bob :age ?age {| :source ?src |}.
}

Then, of course, such expressions may also be considered purely as syntactic sugar for the following:

SELECT * WHERE {
   :bob :age ?age .
   << :bob :age ?age >> :source ?src .
}

However, I am thinking now that it might be useful to define the query semantics for the first query pattern directly rather than having an implicit definition that relies on some syntactic rewriting. I mean, assume there are systems that employ physical data structures to directly support the first query pattern; such systems are not going to rewrite the pattern into the (equivalent) second version, and they are not going to rewrite the Turtle* annotation syntax (using {| ... |}) into Turtle* that uses only << ... >>. So, based on this thinking, a necessary first step for defining the query semantics directly for the first pattern is to extend the RDF* data model.

@pchampin
Collaborator Author

Ok, thanks for the explanation, I see more clearly your motivation now.
But I'm still not convinced...

As we agree that both pieces of data are essentially equivalent, then both queries above should give the same result, regardless of the concrete syntax used to feed the database. With your proposal, it seems to me that we would have to describe how to handle 4 different cases: {data-annotation, data-raw}×{query-annotation, query-raw}.
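
To illustrate the four combinations, here is a rough Python sketch, assuming both surface forms are desugared to the same set of triples before evaluation. The tuple encoding, the `?`-prefixed variables, and the helpers `desugar`, `match` and `evaluate` are all hypothetical and only stand in for a real SPARQL* engine:

```python
from itertools import product

SRC = "<http://example.org/~bob/>"
BASE = (":bob", ":age", "42")


def desugar(s, p, o, annotations):
    """Expand `s p o {| ... |}` into the base triple plus annotation triples."""
    base = (s, p, o)
    return [base] + [(base, ap, ao) for ap, ao in annotations]


# Two surface forms of the same data.
data = {
    "data-annotation": set(desugar(":bob", ":age", "42", [(":source", SRC)])),
    "data-raw": {BASE, (BASE, ":source", SRC)},
}

# Two surface forms of the same query (strings starting with "?" are variables).
queries = {
    "query-annotation": desugar(":bob", ":age", "?age", [(":source", "?src")]),
    "query-raw": [
        (":bob", ":age", "?age"),
        ((":bob", ":age", "?age"), ":source", "?src"),
    ],
}


def match(pattern, term, binding):
    """Unify a pattern with a term; return the extended binding or None."""
    if isinstance(pattern, tuple):
        if not isinstance(term, tuple):
            return None
        for sub_pattern, sub_term in zip(pattern, term):
            binding = match(sub_pattern, sub_term, binding)
            if binding is None:
                return None
        return binding
    if pattern.startswith("?"):
        if pattern in binding:
            return binding if binding[pattern] == term else None
        return {**binding, pattern: term}
    return binding if pattern == term else None


def evaluate(patterns, triples):
    """Naive nested-loop join of triple patterns over a set of triples."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return solutions


results = {
    (d, q): evaluate(queries[q], data[d]) for d, q in product(data, queries)
}
```

Under that assumption all four entries of `results` contain the same single solution, which is the behaviour argued for above; a direct semantics for the annotation form would have to guarantee the same outcome.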

Some implementers might want to go through that trouble for optimisation purposes, but on the specification side, I really think we should stay minimal and simple.

@pchampin pchampin added sparql-star About SPARQL-star concrete-syntax About Turtle-star and other concrete syntaxes labels Nov 10, 2020
@lmedini

lmedini commented Nov 10, 2020

Just a suggestion:

Wouldn't a syntax like

:bob :age 42 @{ :source <http://example.org/~bob/> }.

look more human-readable, since it echoes the annotation syntax of other languages?

@klinovp

klinovp commented Nov 19, 2020

I am not entirely sure how a system which supports the PG mode only should then load << :bob :age 42 >> :source <http://example.org/~bob/>. The problem is that it doesn't know if :bob :age 42 comes somewhere in the input data stream or not. Should it refuse to parse it and only accept :bob :age 42 {| :source <http://example.org/~bob/> |}? Should it wait and see if there's a matching embedded triple (that won't scale)?

Or is the SA semantics going to be a MUST for all RDF* implementations (that'd moot my question)?

@hartig
Collaborator

hartig commented Nov 19, 2020

I wouldn't want to make SA a must. Instead, exactly as you suggest, I would propose that the parser of a PG-mode-only system rejects expressions of the form

<< :bob :age 42 >> :source <http://example.org/~bob/>

and only accepts expressions of the form

:bob :age 42 {| :source <http://example.org/~bob/> |}

@klinovp

klinovp commented Nov 19, 2020

Right, that would make our life easier (as a PG-only system) but I'm a bit skeptical that an SA system would be happy to export the data using the {|..|} syntax. As an engineer, I'd think if you have a << :bob :age 42 >> :source <http://example.org/~bob/> triple in the data, you dump it to the output and that's it. You don't typically want to check if :bob :age 42 is asserted or not.

Put simply, interoperability between SA and PG systems could be tricky even for datasets on which their behaviour should coincide, i.e. where all embedded triples are asserted.

@hartig
Collaborator

hartig commented Nov 19, 2020

I would say that it depends on the data structures that the system uses internally. For instance, if the triple (:bob, :age, 42) is asserted, then an SA-only system probably has this triple separately in its indexes (in addition to the nested triples that contain that triple as an embedded triple).

@gkellogg
Member

I wouldn't want to make SA a must. Instead, exactly as you suggest, I would propose that the parser of a PG-mode-only system rejects expressions of the form

<< :bob :age 42 >> :source <http://example.org/~bob/>

and only accepts expressions of the form

:bob :age 42 {| :source <http://example.org/~bob/> |}

I’d be wary about describing behavior based on surface syntax, and would rather understand this based on the resulting abstract syntax. What if a client retrieves the data in N-Triples* format? What if the input is streamed? Do you need to have first parsed a base triple before an annotation is valid?

And, having some systems support PG and others SA is a road to incompatibility. I’d say, if a client parses input which has an SA assertion, and there is no matching base triple, it would be a validity constraint violation. Similarly, if the semantics of an annotation implied that a triple does not exist, that would be a validity constraint violation if operating under such a regime.
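
A sketch of what such a check could look like, in Python, assuming the whole graph is available as a list of 3-tuples with embedded triples represented as nested tuples. The function name `unasserted_embedded` is hypothetical, and whether a non-empty result should count as a violation is exactly the open question here:

```python
def unasserted_embedded(triples):
    """Return embedded triples that are used as subject or object of some
    triple but are not themselves asserted in `triples`.

    Only the outermost embedded triple is checked; nested embedded
    triples are not inspected in this sketch.
    """
    asserted = set(triples)
    missing = []
    for s, _p, o in triples:
        for term in (s, o):
            if isinstance(term, tuple) and term not in asserted:
                missing.append(term)
    return missing


base = (":bob", ":age", "42")
annotation = (base, ":source", "<http://example.org/~bob/>")

complete = unasserted_embedded([base, annotation])   # base triple present
dangling = unasserted_embedded([annotation])         # base triple absent
```

Note that the check needs the full set of asserted triples before it can report anything, which is the scaling concern raised earlier for streamed input.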

@pchampin
Collaborator Author

pchampin commented Dec 2, 2020

@lmedini

Just a suggestion:

Wouldn't a syntax like

:bob :age 42 @{ :source <http://example.org/~bob/> }.

look more human-readable, since it echoes the annotation syntax of other languages?

I like the idea of reusing @. I would go even further to suggest:

:bob :age 42 @[ :source <http://example.org/~bob/> ].

Curly brackets, in SPARQL or TriG, have a history of containing full triples. Square brackets, on the other hand, contain predicate-object lists.

@afs
Collaborator

afs commented Dec 6, 2020

Noting that the annotation block can be multiline and quite big, having paired delimiters makes the visual pairing easier and parsers can have better error messages.

@afs
Collaborator

afs commented Dec 6, 2020

% is better than @ - they could both be used for "introducing" syntax for a following block of some kind and % isn't as used as @.

[ ] indicate blank nodes. [ ] as bnodes do nest, and there may be blank nodes in the annotation so we have overloaded ].

At least in Turtle, { } aren't used and in TriG they don't nest.

Other character choices like ! ! don't capture the start-finish so well. There is no perfect choice. {| |} was used in the original discussions.

Some other characters are already used in SPARQL and, although their meaning there is positional (property paths), overloading them is visually confusing.

https://github.com/w3c/rdf-star/blob/main/tests/turtle/syntax/turtle-star-annotation-2.ttl

:s :p :o {| :source [ :graph <http://host1/> ;
                      :date "2020-01-20"^^xsd:date
                    ] ;
            :source [ :graph <http://host2/> ;
                      :date "2020-12-31"^^xsd:date
                    ]
          |} .

{| would be a token (AKA terminal) - it can not have whitespace separating the two characters.

The point here is to use the blank nodes to have separate groups relating to one triple, carrying on the "use modelling" style for some use cases and keeping the groups apart.

@gkellogg
Member

gkellogg commented Dec 7, 2020

It may be obvious, but these are the changes I made to the Turtle* EBNF:

[12] object       ::= (iri | BlankNode | collection | blankNodePropertyList | literal | embTriple)
                      annotation?
[30] annotation   ::= '{|' predicateObjectList '|}'

Note that this allows empty annotations, and potentially annotating members of a collection, although I do not support this in my parser. It also doesn't allow annotating an embedded triple (which uses embObject instead of object).

Also, it could potentially allow recursive annotations within an annotation:

:a :b :c {| :d :e {| :f :g |} |} .

Certainly, these considerations are subject to discussion.

Also, IMO, given that {| |} syntax only works on asserted triples, there doesn't seem to be a good motivation to have << >> also assert the embedded triple, so only the Separate Assertions mode of parsing << >> seems necessary in order to support both modes, at least via Turtle*.

@afs
Collaborator

afs commented Dec 8, 2020

predicateObjectList does not allow zero predicate-objects.

[7]  predicateObjectList  ::=  verb objectList (';' (verb objectList)?)*

https://www.w3.org/TR/turtle/#grammar-production-predicateObjectList

@afs
Collaborator

afs commented Dec 8, 2020

My reading (not checked by computer... yet!):

predicateObjectList goes to object so annotation? applies and nested annotations are legal.

But - and this is the only other use of object -

[15]  collection  ::=  '(' object* ')'

To not have annotation syntax there, have objectC (same as current object, with no annotation).

@gkellogg
Member

gkellogg commented Dec 8, 2020

To not have annotation syntax there, have objectC (same as current object, with no annotation).

We could do that, but it's not strictly necessary for the grammar to limit the usage inherently, it could be done with prose. Adding the grammar rule is simple enough, but hell is paved with a series of "simple" changes.

@afs
Collaborator

afs commented Dec 8, 2020

prose is a last resort!

@afs
Collaborator

afs commented Dec 8, 2020

(not checked by computer... yet!):

The computer says "yes" (including no annotation in RDF collections, which is not too disruptive).

I've experimented with the syntax in both a hand-crafted parser (faster) and a JavaCC one that follows the spec text, with <<>> already added. Annotations with {| |} work out as described above.

Turtle* is not a redesign of Turtle so what would be bad is non-local changes: having to rewrite a significant proportion of the turtle just for some RDF* feature or to localize to certain cases only.

So far - that is not looking likely with the current general proposal for annotation.

@pchampin
Collaborator Author

pchampin commented Dec 9, 2020

FWIW I made a new PR #58 with preview enabled -- this makes it easier to discuss on a concrete proposal:

https://pr-preview.s3.amazonaws.com/w3c/rdf-star/pull/58.html

@afs
Collaborator

afs commented Dec 10, 2020

@gkellogg -- I now built a Turtle parser that exactly follows the Turtle and RDF* grammars so as to check details.

A different way to handle "annotation" and have it not appear in a collection is to put it in the "objectList" production rather than "object". Then "collection" is unchanged.

[8]  objectList  ::=  object annotation? (',' object annotation? )*

This contains the RDF* changes a little better than having a new "listElt" (was "objectC") that duplicates the plain Turtle object rule (object without annotation).

Having the annotation like this made it a little easier to generate the triples (it's a streaming parser emitting triples as the parse runs); the object rule returns an RDF Term which is passed, with subject and predicate from the input to objectList, to the annotation production.
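
A toy Python sketch of that arrangement, not the actual parser: it assumes input restricted to prefixed names such as `:s`, ignores collections, blank nodes, literals, IRIs and a trailing `;`, and uses hypothetical names (`TOKEN`, `Parser`) throughout. The point is only that objectList knows the subject and predicate, so it can hand the finished triple to the annotation production:

```python
import re

# Tokens: '{|', '|}', the punctuation ; , . and prefixed names like :s
TOKEN = re.compile(r"\{\||\|\}|[;,.]|:\w+")


class Parser:
    def __init__(self, text):
        self.toks = TOKEN.findall(text)
        self.pos = 0
        self.out = []  # triples, in the order they are emitted

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, tok):
        got = self.take()
        if got != tok:
            raise SyntaxError(f"expected {tok!r}, got {got!r}")

    def statement(self):
        subject = self.take()
        self.predicate_object_list(subject)
        self.expect(".")

    def predicate_object_list(self, subject):
        # verb objectList (';' verb objectList)*   (simplified)
        while True:
            verb = self.take()
            self.object_list(subject, verb)
            if self.peek() != ";":
                return
            self.take()

    def object_list(self, subject, verb):
        # object annotation? (',' object annotation?)*
        while True:
            obj = self.take()
            triple = (subject, verb, obj)
            self.out.append(triple)  # emit as soon as the object is known
            if self.peek() == "{|":
                self.take()
                # the triple just emitted becomes the subject of the annotation
                self.predicate_object_list(triple)
                self.expect("|}")
            if self.peek() != ",":
                return
            self.take()


parser = Parser(":s :p :o {| :a :b {| :a2 :b2 |} |} .")
parser.statement()
nested = parser.out
```

Because the annotation is handled in `object_list`, a collection production written against the plain object rule would never see `{|`, which is the property the grammar change is after.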

@pchampin
Collaborator Author

@afs good idea. Done in c435fa6

@gkellogg
Member

Looks good, Andy; I’ll update my parser as well.

Do we have negative tests for the object annotation in collection case?

@josd
Contributor

josd commented Dec 10, 2020

@gkellogg -- I now built a Turtle parser that exactly follows the Turtle and RDF* grammars so as to check details.

A different way to handle "annotation" and have it not appear in a collection is to put it in the "objectList" production rather than "object". Then "collection" is unchanged.

[8]  objectList  ::=  object annotation? (',' object annotation? )*

This contains the RDF* changes a little better than having a new "listElt" (was "objectC") that duplicates the plain Turtle object rule (object without annotation).

Having the annotation like this made it a little easier to generate the triples (it's a streaming parser emitting triples as the parse runs); the object rule returns an RDF Term which is passed, with subject and predicate from the input to objectList, to the annotation production.

Excellent idea and I updated the EYE parser accordingly and the following

PREFIX : <http://example/>

:s :p :o {| :a :b |};
    :p2 :o2 {| :a2 :b2 |},
        :o3 {| :a3 :b3 |}.

produces

PREFIX : <http://example/>

:s :p :o.
<<:s :p :o>> :a :b.
:s :p2 :o2.
:s :p2 :o3.
<<:s :p2 :o2>> :a2 :b2.
<<:s :p2 :o3>> :a3 :b3.

Also the following

PREFIX : <http://example/>

:s :p :o {| :a :b {| :a2 :b2 |} |}.

produces

PREFIX : <http://example/>

:s :p :o.
<<:s :p :o>> :a :b.
<<<<:s :p :o>> :a :b>> :a2 :b2.

@afs
Collaborator

afs commented Dec 10, 2020

Do we have negative tests for the object annotation in collection case?

Not yet - there are just the two annotation examples. They would be added if this is the direction the WG wishes to go. It would be helpful if the WG could decide that it wanted to explore the "annotation=PG" route and we start on evaluation tests. Given we are all time-poor, I think we have enough to indicate the direction to the point where we can get some confirmation then go back and complete the tests, rather than complete just the syntax and then move to evaluation.

Evaluation tests need a way to define results. If we define NT* with <<>> as implying no inferred triple, we can write different designs down. I don't see how to write evaluation tests within the "rdf-test" framework unless NT* is defined this way. The only alternative seems to be telling designs apart through counting in SPARQL.

@afs
Collaborator

afs commented Dec 10, 2020

:s :p :o {| :a :b |};
    :p2 :o2 {| :a2 :b2 |},
        :o3 {| :a3 :b3 |}.
:s :p :o {| :a :b {| :a2 :b2 |} |}.

Yes - those are consequences.
This is more confusing:

:s :p :o1, :o2 {| :a :b |} .

because the annotation does not apply to :s :p :o1.
However, there comes a point when tinkering to design for every case does more harm than good.

(IMO Object lists are of lesser use in plain Turtle anyway!)

@josd
Contributor

josd commented Dec 10, 2020

Yes indeed,

:s :p :o1, :o2 {| :a :b |} .

annotates :s :p :o2

:s :p :o1.
:s :p :o2.
<<:s :p :o2>> :a :b.

@pchampin
Collaborator Author

This is more confusing:

:s :p :o1, :o2 {| :a :b |} .

I see how visually it can give the wrong impression. However, the rule is IMO simple enough to understand: annotations only apply to the object just before them.
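
That rule can be sketched in a few lines of Python, assuming the object list has already been parsed into (object, annotations) pairs; the helper `expand_object_list` and the tuple encoding are hypothetical:

```python
def expand_object_list(s, p, objects):
    """Expand `s p o1 {|...|}, o2 {|...|}` into triples.

    `objects` is a list of (object, annotations) pairs, where
    `annotations` is the (possibly empty) list of (predicate, object)
    pairs that directly followed that object in the source.
    """
    triples = []
    for o, annotations in objects:
        base = (s, p, o)
        triples.append(base)
        for ann_p, ann_o in annotations:
            # the annotation is attached to this object's triple only
            triples.append((base, ann_p, ann_o))
    return triples


# :s :p :o1, :o2 {| :a :b |} .
triples = expand_object_list(
    ":s", ":p",
    [(":o1", []), (":o2", [(":a", ":b")])],
)
```

In `triples`, only the `:o2` triple appears as the subject of an annotation; `:s :p :o1` is asserted but not annotated.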

@pchampin
Collaborator Author

#58 is now merged.
Should we do the same for SPARQL* now?
I say we should...

hartig added a commit that referenced this issue Dec 11, 2020
…e lines of #58; first step to address the SPARQL* part of #9
@hartig
Collaborator

hartig commented Dec 11, 2020

Should we do the same for SPARQL* now?

I am on it: #65 ;-)

@pchampin pchampin changed the title Include Annotation syntax in Turtle* and SPARQL* Include Annotation syntax in ~~Turtle* and~~ SPARQL* Feb 4, 2021
@pchampin pchampin changed the title Include Annotation syntax in ~~Turtle* and~~ SPARQL* Include Annotation syntax in Turtle* and SPARQL* Feb 4, 2021
@pchampin pchampin closed this as completed Mar 4, 2021
7 participants