RDF-star for labelled property graphs #16
I think this is asking RDF-star to solve an apparent-non-monotonicity problem that property graphs routinely gloss over. Property graph architectures typically manage this through a combination of retracting assertions, putting up with awkward temporal models, and having no negative qualifiers. For example, there's a You could improve/complicate modeling by adding a Alternatively, everyone could maintain their own list of negative annotation predicates, which would be doable for Wikidata-like use cases where the annotation predicates (or at least the negating ones) are centrally controlled. Either of these changes the current semantics of queries, be they SPARQL, DL, or some triplesMatching API (rdf-star-plus-plus). Note that the wontfix solution has a similar impact: it tells users that they have to (remember to) write carefully qualified SPARQL/DL/triplesMatching queries if there might be some annotation which expires or negates the assertion. You could try to deal with |
Oops, I didn't notice this was a UC&R list. Should I move these comments back to w3c/rdf-star#33? |
In my understanding @pchampin didn't imply the problems with monotonicity that you refer to; rather the contrary. But let me phrase it my way: the triple stating that Pavel works at Stardog is atemporal and as such is true if Pavel works or worked at Stardog at any time, now or in the past (or maybe even in the future, if we are very sure about that future). Further detail, added through annotation and retrieved through extra querying/filtering, doesn't change that. In that sense annotating triples is monotonic: it just adds more detail. Adding ever more detail is the most basic thing we do on the semantic web. Every detail added reduces the number of possible worlds that the global RDF graph describes. That is not a non-monotonic activity. What would be non-monotonic is if we outright rejected a statement in the annotation. If that were possible, we would indeed have to check for every result whether its "negated" box is checked. But we are in a not too different situation already with RDF-star: for every annotation we have to check whether the annotated triple is indeed asserted. Granted, normally we work the other way round and check whether a triple we already know about also has some annotations attached, and that's much less troublesome.
I have to read up on that and maybe I don't properly understand what you refer to, but isn't almost anything we can say only truthy? To practically every fact we assert, more detail could be added. Often we can't even be totally sure. All that hasn't brought the semantic web down. EDIT: I read up on truthiness and I guess I do now get your point. There is still the unasserted quoted triple to express statements that we want to document but not assert. I guess that's practically close enough to negated statements in most cases and doesn't jeopardize monotonicity. |
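The monotonicity point above can be illustrated with a toy pattern matcher. This is a minimal Python sketch, not an RDF library: it assumes triples are plain tuples, a quoted triple is a nested tuple in subject position, and variables are strings starting with "?"; the helper name match is hypothetical. Adding an annotation triple only adds data, so every answer to a plain triple pattern survives:

```python
# Toy model (assumption): triples are tuples, a quoted triple is a nested tuple,
# and pattern variables are strings beginning with "?".

def match(pattern, graph):
    """Return the set of variable bindings for a single triple pattern."""
    results = set()
    for triple in graph:
        binding = {}
        ok = True
        for p, t in zip(pattern, triple):
            if isinstance(p, str) and p.startswith("?"):
                if binding.get(p, t) != t:  # same variable must bind consistently
                    ok = False
                    break
                binding[p] = t
            elif p != t:  # constant term must match exactly
                ok = False
                break
        if ok:
            results.add(frozenset(binding.items()))
    return results

base = {(":pavel", ":worksAt", ":Stardog")}
# Annotating only adds a triple whose subject is the quoted triple.
annotated = base | {((":pavel", ":worksAt", ":Stardog"), ":since", "2011")}

query = ("?employee", ":worksAt", ":Stardog")
before = match(query, base)
after = match(query, annotated)

# Plain pattern matching is monotonic: every earlier answer is still there.
print(before <= after)  # True
```

Negation in queries (see MINUS later in this thread) is a different matter: it can make answers disappear when triples are added.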
The use case description says:
and
RDF standard reification gets around this problem by not reifying the statement type but describing an occurrence/instance of that type. There is no direct connection between the two: the subject of the reification quad stands for some speech act, so to say. The RDF-star CG report takes a similar route, but makes it optional to annotate an occurrence/instance or the type itself. The latter, annotating the type, is impossible in RDF standard reification. However, both approaches provide no direct connection between a stated triple occurrence/instance and annotations on it. The annotations are either made on all statements of that type or on some instance of it. In sharp contrast to that approach, singleton properties create a new type of property which is a subproperty of the intended relation, and thereby get around the limitations imposed by the set semantics. So singleton properties provide a hint at how this problem could be solved without requiring a fundamental change from set to bag semantics, right? It is a stretch, but if done right it could work well both on the surface and underground. The basic idea is that each singleton property is just a link between the statement and its annotations (the most important being which property the singleton property is a subproperty of). If we replace that link by some syntactic sugar, nothing of particular value is lost. Indeed, if blank nodes were allowed in predicate position, the whole singleton property approach would be a very natural way to model n-ary relations. Long prolegomenon, I know, but see how through these eyes the following RDF-star shortcut syntax
is just syntactic sugar for
This certainly doesn't break set semantics, and on the surface it captures the intuitions of users. What it doesn't allow is to annotate the type itself, but that purpose can still be served by the unabbreviated syntax. To make this more similar to the unabbreviated syntax, let's replace the singleton property names by occurrences. Remember the CG report example (which for some reason omits to assert
The same in shortcut syntax (but
The same in singleton property-turned-occurrence syntax:
|
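The singleton-property idea discussed above can be sketched in a few lines of Python, following the comment's framing of each singleton property as an rdfs:subPropertyOf the intended relation. Property names like ":worksAt#1" and the sources ":alice"/":bob" are hypothetical. Two occurrences of the same statement keep their own annotations, while RDFS rule rdfs7 recovers the single base triple that set semantics stores once:

```python
# Hypothetical per-occurrence property IRIs; triples are plain tuples.
SUBPROP = "rdfs:subPropertyOf"

graph = {
    (":pavel", ":worksAt#1", ":Stardog"),
    (":worksAt#1", SUBPROP, ":worksAt"),
    (":worksAt#1", ":statedBy", ":alice"),
    (":pavel", ":worksAt#2", ":Stardog"),
    (":worksAt#2", SUBPROP, ":worksAt"),
    (":worksAt#2", ":statedBy", ":bob"),
}

def entail_subproperties(g):
    """One application of RDFS rule rdfs7:
    (x p y) and (p rdfs:subPropertyOf q) entail (x q y).
    A single pass suffices here because there are no subproperty chains."""
    supers = {(s, o) for (s, p, o) in g if p == SUBPROP}
    derived = {(s, q, o) for (s, p, o) in g for (p2, q) in supers if p == p2}
    return g | derived

def annotations_of(prop):
    """Annotations attached to one singleton property, i.e. one occurrence."""
    return {(a, v) for (s, a, v) in graph if s == prop and a != SUBPROP}

entailed = entail_subproperties(graph)
# Both occurrences entail the same base triple, which the set holds only once.
print((":pavel", ":worksAt", ":Stardog") in entailed)  # True

occurrences = sorted(p for (s, p, o) in graph if (s, o) == (":pavel", ":Stardog"))
for prop in occurrences:
    print(prop, annotations_of(prop))
```

The design choice here is that the link between occurrence and statement type is ordinary RDFS vocabulary, so no change from set to bag semantics is needed.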
tl;dr: don't use a temporal assertion in the UC&R; it will mislead readers about good modeling and rdf-star utility in general.
Agreed; I intended to convey that by calling it "apparent-non-monotonicity", in that your average user would not think of your caveat and add: … MINUS { << ?employee :worksAt :Stardog >> :until ?expiry } before sending out Xmas bonus checks. I believe standard reification provides a half-way solution to this human-engineering problem in that the query ?employee :worksAt :Stardog gets no results, forcing users to work harder by structuring it as a reified statement, which may lead them to think "someone had a reason to reify this" and prowl around in the docs or the graph before writing:
SELECT ?recipient {
  ?statement
    rdf:subject ?recipient ;
    rdf:predicate :worksAt ;
    rdf:object :Stardog .
  MINUS { ?statement :until ?expiry }
}
I think that a subproperty of IMO, the
:Employment123
  :employed :Pavel ;
  :starting "2011"^^xsd:gYear ;
  :ending "2019"^^xsd:gYear .
For the UC&R, I'd steer towards invariants like:
<< :wrinklySkin :heritableIn :Pisum_sativum >> :positedBy :Gregor_Mendel .
(Though honestly even that was the result of some cherry-picking and has led centuries of scientists to see Mendelian behavior where the real causality is more nuanced. Still, good enough to give readers the idea.) |
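The effect of the MINUS-qualified query above can be sketched in Python by hand-evaluating it over tuples. The data is hypothetical (":bob" and the blank-node labels are made up). Asserting one more :until triple removes an answer, which is exactly the apparent non-monotonicity under discussion:

```python
# Hand-coded evaluation of:
#   SELECT ?recipient {
#     ?statement rdf:subject ?recipient ; rdf:predicate :worksAt ; rdf:object :Stardog .
#     MINUS { ?statement :until ?expiry }
#   }

def recipients(graph):
    statements = {s for (s, p, o) in graph if p == "rdf:predicate" and o == ":worksAt"}
    statements = {st for st in statements if (st, "rdf:object", ":Stardog") in graph}
    expired = {s for (s, p, o) in graph if p == ":until"}  # the MINUS part
    return {o for (s, p, o) in graph
            if p == "rdf:subject" and s in statements - expired}

reified = {
    ("_:st1", "rdf:subject", ":pavel"),
    ("_:st1", "rdf:predicate", ":worksAt"),
    ("_:st1", "rdf:object", ":Stardog"),
    ("_:st2", "rdf:subject", ":bob"),
    ("_:st2", "rdf:predicate", ":worksAt"),
    ("_:st2", "rdf:object", ":Stardog"),
}

before = recipients(reified)
after = recipients(reified | {("_:st1", ":until", "2019")})  # only adds a triple
print(before - after)  # {':pavel'}: an answer disappeared
```

Plain triple-pattern matching never loses answers when triples are added; MINUS, like any negation-as-failure construct, can.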
No, this is the place for discussion of the issue. There is a separate place (currently the Wiki page) for a clean description of the issue. |
@rat10 The shorthand syntax doesn't do this occurrence stand-off. Maybe it should. Edit: On re-reading your message I see that you aren't claiming that the shorthand syntax does a standoff, but that it could. What the shorthand syntax
is shorthand for is
|
This issue, as stated at the beginning, is taken from an issue raised in the RDF-star community group. As such, the issue keeps the discussion from there. If there is a place to modify the examples, it is the Wiki page. But the question then is what to use instead of temporal or certainty factors. I don't think that changing to provenance is a good idea, precisely because provenance is so far from temporal or certainty factors. I reached out to the original submitter but didn't hear back, so I have started the process of expanding the use case using only the existing discussion. In any case, the use case is to capture LPGs, and my understanding of the use of LPGs is that this kind of annotation is quite common. |
I agree that it's (more) common (than it should be). I propose that it introduces unexpected usage constraints that nullify the value proposition of keeping the surface graph simple. If you must always remember to look for an If you like the |
@ericprud The ways that statement annotation can be used are countless, and so would be the number of predicates your proposal would require, like The bottom line is: LPG-style modelling (with or without RDF-star) is a way to distinguish a primary topic from secondary attributes. For every n-ary relation it largely depends on the application which aspect is considered primary and which is secondary. Sure, there are the usual aspects like temporality (valid now or only some time?), but by and large we can't know in advance which aspect will be considered dominant in some application. We can put up a sign saying "This is RDF-star and you will use it wrongly!" or we can adjust. |
Indeed (although I don't really get what you mean by "standoff"). I could go even further and say: let's go back to the 2017 version of RDF-star (then RDF*), in which embedded triples were asserted. That would make the shortcut syntax unnecessary. Let "unasserted assertions" be handled either by RDF standard reification or by a graph literal datatype. Let all the use cases that require syntactic fidelity, like versioning, explainable AI, etc., also be handled by graph literals. Let the intricacies of syntactic blank nodes be dealt with by Concise Bounded Descriptions in graph literals. Let the semantics of embedded triples be defined as outlined above, analogous to singleton properties. Even Souri's RDFn proposal can be represented that way. |
I agree that if you have a database with a ubiquitous meta-model where all assertions have starts, ends, certainties and provenance, you don't have to roll them into your predicate name. The way the original use case was presented illustrated a common modeling pathology in PGs: someone introduces a (non-monotonic) annotation, changing the interpretation of the graph, without calling up every other user and saying "you now always have to look for X". Since most PGs are either private or narrowly scoped, readers of, e.g., https://neo4j.com/developer/graph-database/ can probably delay confronting these issues without too much cost. However, when setting readers' expectations for what RDF-star can do for them, I believe the UC&R should pick one or more of:
Tx for examples used in LPG literature; helps scope and ground the conversations! I tried to characterize each usage into
The AnzoGraphDB tickit:* example avoids the present tense and implies a meta-model with e.g. "endDate" annotations. |
Thanks for the pointer. But it is still hard to get a good example. What is needed is something that can be correctly done in Labelled Property Graphs (so no using strings for things), that is atemporal (no dates or, maybe, only dates that don't give rise to the appearance of non-monotonicity), binary, and isn't about provenance. Perhaps the air route example would do. |
I have argued above why I think the monotonicity argument is misguided. How many facts are eternally and absolutely true? If a statement says that Alice buys a car and a second statement adds that the car is red, does that make the first statement false? How many properties in established ontologies are specific about their temporal aspect? It is the application that controls whether the user gets only currently valid data or data that has been true at some point in time. It is the user's responsibility to check whether the application works as expected or if more querying is needed. |
The solution then is to submit a use case that is explicitly about temporal or other non-monotonic information. |
As outlined above, IMO temporal annotations are no more non-monotonic than any other annotations. I suggest you give a principled account of what you think are non-monotonic annotations. Going this route would also require an idea of how to educate users of RDF-star, in a succinct and unambiguous way, about which annotation domains are to be avoided, because that would be very important information to give. I suspect it would be met with scepticism. [EDIT] You can formalize what so far was rather a "prudent approach" by the CG report: RDF-star can only be used for administrative housekeeping and strictly close-to-the-metal, out-of-band, application-specific tasks. That will, however, lead to three questions:
IMO it would be much more sensible to adjust in the way I outlined above: don't force users to manoeuvre around STOP-signs, but teach them how to adjust their expectations. |
I wanted to react to this particular part of the wiki page of this UC
I can see two straightforward ways to work around this problem
If that's deemed too verbose, we could even imagine more syntactic sugar, e.g.
(I'm not advocating for it, just pointing out that it's possible)
I believe this satisfies the constraints of the original UC:
|
I agree that changing the abstract syntax is scary and for sure can't be done by a WG without extensive prior work and without being tasked to do just that. However, I have a different opinion on what bullets we have to bite. In
and if I'm not mistaken, that is what one would have to query for to not miss annotations on occurrences. IMO that is prohibitively complicated, especially as one can't be sure in advance which modelling decisions have been made by the creator of this data. E.g. as long as a multi-part annotation on a statement is the only one of that type, an author would be excused for not using the more involved This proposal is just syntactic sugar for what the CG report already proposes. One could easily replace these blank nodes with proper IRIs. So the proposal uses statement identifiers, just hidden in syntax. Semantically this then maps very closely to RDF standard reification, which talks about some occurrence/instance/token of an abstract statement. However, there is no direct link between a statement of the same form in some snippet of RDF and that reification (no matter whether the syntax is RDF standard reification, your example above, or, say, the RDF/XML id attribute), as there can't be. W.r.t. re-modelling, this gives an edge to RDF standard reification, which simply doesn't allow one to annotate a statement without first defining an identifier. Of course standard reification can't annotate types, but then again: I have yet to see a sound and concise description of which annotations are allowed in RDF-star, yet many things seem to be considered not kosher, even temporal annotations! I think the WG should look at Singleton Properties again. They provide a semantically sound way to annotate statement instances. They lack a bit w.r.t. support on the surface, as the proposal wasn't bold enough to suggest an extension to the syntax, but RDF-star provides that extension... The underlying conflict is that RDF compacts all statements of some type into one type, and in the process necessarily loses detail. This seems to be a conscious decision in the design of RDF, favoring integration over differentiation, but many users/applications do indeed find those details important. 
Understanding each statement instance as a subtype and supporting that in implementations, but out of view of users, would keep those details. It wouldn't break the abstract model, but it would stop breaking applications. In other words: it might be useful to bite the bullet and drop the early optimization on statement types. Instead keep them separate in the application and only merge them in query results, not too different from the way blank nodes are treated in practice: counting semantics in SPARQL, leaning and existential semantics for reasoning. [EDIT] In RDF anything can be modeled as an n-ary relation. Imagine the above example as one:
Now define that this is what
maps to. Add support in SPARQL to omit querying for the type. |
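The idea of keeping statement instances apart internally and merging them only in query results might look roughly like the following. It is a hypothetical Python sketch, not a description of any existing store: each instance is an Occurrence with an internal id (the oid values, ":hr", ":alice" and ":Acme" are made up), the set view is what the abstract RDF graph would see, and the bag view counts instances, loosely analogous to SPARQL keeping duplicate solutions unless DISTINCT is used:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Occurrence:
    """One statement instance. `oid` is a hypothetical internal identifier
    that the store keeps but users never have to write."""
    oid: str
    triple: tuple
    annotations: frozenset = frozenset()

store = [
    Occurrence("occ1", (":pavel", ":worksAt", ":Stardog"),
               frozenset({(":since", "2011")})),
    Occurrence("occ2", (":pavel", ":worksAt", ":Stardog"),
               frozenset({(":statedBy", ":hr")})),
    Occurrence("occ3", (":alice", ":worksAt", ":Acme")),
]

def merged_triples(occs):
    """Set view: what the abstract RDF graph sees; instances collapse."""
    return {o.triple for o in occs}

def triple_counts(occs):
    """Bag view: every instance is counted separately."""
    return Counter(o.triple for o in occs)

def annotations_of(triple, occs):
    """Per-instance annotations for one triple, keyed by occurrence id."""
    return {o.oid: set(o.annotations) for o in occs if o.triple == triple}

print(len(merged_triples(store)))  # 2
print(triple_counts(store)[(":pavel", ":worksAt", ":Stardog")])  # 2
```

The abstract model stays a set of triples; only the implementation keeps the per-instance detail around.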
My minor comment concerns the name, specifically "labeled". I think it's better to just use "property graph". There are several reasons:
To sum up, I suggest "labelled property graphs" -> "property graphs". |
The rationale for "labelled" is that nodes and edges can have labels, not that they all have to have labels. It may be that the non-labelled version is more common. |
See https://github.com/w3c/rdf-ucr/wiki/RDF-star-for-labelled-property-graphs for the current status of this use case.
Taken from w3c/rdf-star#33
** Brief Description of your use case:
As a KG vendor, we want Stardog customers to have easy to use means to attach properties to edges in their RDF graph or load property graph data with edge properties. Here "easy" specifically means that neither the customer nor the database should have to wreck the data model (and queries) to use any of the workarounds available in plain RDF for that purpose (like the RDF reification).
*** What you want to be able to do:
We want to be able to easily assert properties on edges and query them using SPARQL.
Also, we want to enable customers to store that annotated statement in any named graph they want, so we don't want to use named graphs to represent statement-level metadata.
*** What is the role of RDF-star quoted triples in your use case:
RDF-star quoted triples will be the subjects of edge properties.
*** Why it is hard or impossible to do what you want to do without quoted triples:
Using RDF reification or other approaches requires changes to the data model and, particularly, complex SPARQL queries to retrieve the data.
Regarding named graphs, there's a very simple argument why we want to keep both annotated triples and named graphs. We regularly see people wondering whether they should manage different parts of their data in i) separate datasets (i.e. separate physical databases inside a server instance) or ii) separate named graphs inside one dataset. There are pros and cons to both. Sometimes the choice isn't clear.
So far they've been able to just take data stored in the default graph of database X and move it into a named graph inside Y. Importantly, they don't need to change queries (or apps); they only need a connection string to a different database and a different query dataset (i.e. FROM in SPARQL). The latter can be specified outside of queries, as defined in the SPARQL Protocol. Now, we don't want RDF-star to limit that flexibility: if you want to take a bunch of triples with annotations and move them into a named graph, that should be similarly easy.
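Specifying the query dataset outside the query text refers to the SPARQL 1.1 Protocol's default-graph-uri and named-graph-uri request parameters. A small Python sketch, with hypothetical endpoint URLs, graph IRI and a placeholder ":" prefix, shows the same query sent first against database X and then against a named graph in database Y (it only builds the URLs; nothing is sent):

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# One unchanged query; the ":" prefix IRI is a placeholder.
QUERY = "PREFIX : <http://example.org/> SELECT ?s WHERE { ?s :worksAt :Stardog }"

def query_url(endpoint, query, default_graphs=(), named_graphs=()):
    """Build a SPARQL 1.1 Protocol query-via-GET URL.

    The RDF dataset is chosen with the protocol's default-graph-uri and
    named-graph-uri parameters, so the query text itself does not change.
    """
    params = [("query", query)]
    params += [("default-graph-uri", g) for g in default_graphs]
    params += [("named-graph-uri", g) for g in named_graphs]
    return endpoint + "?" + urlencode(params)

# Hypothetical endpoints: data in X's default graph vs. moved into a named graph in Y.
url_x = query_url("https://db.example.org/X/sparql", QUERY)
url_y = query_url("https://db.example.org/Y/sparql", QUERY,
                  default_graphs=["http://example.org/graphs/moved"])

for url in (url_x, url_y):
    print(parse_qs(urlsplit(url).query))
```

Only the endpoint and the dataset parameters differ. What the use case asks of RDF-star is that annotation triples travel with their base triples when such a move happens.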
*** How you want quoted triples to behave in your use case:
*** An example RDF graph that shows part of your use case:
For example, if the customer has
:pavel :worksAt :Stardog
edge in the data and wants to add
:since 2011
to it, neither they nor the database should have to transform it into a bunch of different triples like
[] rdf:subject :pavel ; rdf:predicate :worksAt ...
(and then also rewrite queries so that
?s :worksAt :Stardog
still returns :pavel).
As a further example, we want to be able to have data like
and queries like
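The difference between annotating the edge directly and transforming it into reified triples can be made concrete with a toy Python model (tuples for triples, a nested tuple for a quoted triple; BASE, the helper names and the "_:r1" label are hypothetical). Following the scenario described above, the reified version replaces the edge rather than keeping it alongside, so the original ?s :worksAt :Stardog query no longer matches and must be rewritten:

```python
BASE = (":pavel", ":worksAt", ":Stardog")

def annotate_rdf_star(graph, triple, prop, value):
    """Keep the edge as-is; the annotation's subject is the quoted triple."""
    return graph | {triple, (triple, prop, value)}

def annotate_reified(graph, triple, prop, value, node="_:r1"):
    """Replace the edge with a standard-reification node (the transformation
    the use case wants to avoid). `node` is a hypothetical blank-node label."""
    s, p, o = triple
    return (graph - {triple}) | {
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
        (node, prop, value),
    }

def subjects(graph, p, o):
    """The original query shape: ?s p o."""
    return {s for (s, pp, oo) in graph if pp == p and oo == o}

def subjects_reified(graph, p, o):
    """The rewritten query that reified data forces on users."""
    nodes = {s for (s, pp, oo) in graph if pp == "rdf:predicate" and oo == p}
    nodes &= {s for (s, pp, oo) in graph if pp == "rdf:object" and oo == o}
    return {oo for (s, pp, oo) in graph if pp == "rdf:subject" and s in nodes}

star = annotate_rdf_star({BASE}, BASE, ":since", "2011")
reified = annotate_reified({BASE}, BASE, ":since", "2011")

print(subjects(star, ":worksAt", ":Stardog"))             # {':pavel'}
print(subjects(reified, ":worksAt", ":Stardog"))          # set()
print(subjects_reified(reified, ":worksAt", ":Stardog"))  # {':pavel'}
```

Two triples versus five, and an unchanged query versus a rewritten one, is the cost the use case describes.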