Cataloguing Use Cases From The National Library Of Sweden #23
Comments
Thanks for the three use cases. I'm doing some analysis and extracting semantic implications from them. I'll add sections to the Wiki pages.
The provenance examples all appear to suffer from using two separate links to create a single relationship - created by X at time Y. Do you want to leave them as is or have them fixed up?
For the provenance use case, does it matter what the form of a literal is as long as the value is the same? That is, would you ascribe the same provenance information to :x :y 1 . and :x :y 01 . ? Similarly, suppose that two IRIs denote the same thing in the universe (perhaps using owl:sameAs). Does it matter which one is in the asserted (or quoted) triple?
Similar comments and similar questions apply to the change-log use case.
@pfps Thank you for the analysis of all scenarios; very helpful comments and questions! I'll reply to each in a separate comment.
Yes, I see that at least the examples in "Manage Classification Metadata" conflate two properties of one occurrence by putting them on the triple itself. Would you agree that this can be solved by stating one relation, as in the following?

<introduction-to-physics> a :Text ;
    bf:classification <literature-education-physics> {|
        dc:source [ bf:assigner <annif> ;
                    dc:date "2023-05-20T08:44:06Z" ]
    |} .

To what extent do the rest of the examples suffer from this (e.g. the Wikidata examples)?

I would like to highlight my poor modelling here (in this issue tracker), so that we can ensure that the RDF-star syntax, behaviour and documentation all come together to avoid this becoming common in the wild. I do think triples need to be types, but how this is effectively used is most important. We don't want to "hit people over the head" with semantics if the syntax leads people too easily astray. Note that with only one relation (

So again, yes, we must fix these examples (I'll gladly edit the wiki once we're in agreement); and also capture and learn from the errors so we can prevent them.

I tried to make Annotations as Miscellaneous Marginalia more about this tension between "types vs. tokens/occurrences", which seems to be related to the problem of "triples vs. statements vs. events". But these cases overlap so much it was hard to disentangle them (the erroneous ones here are "illustrative" examples there).

RDF-star annotations, which are the most useful form of RDF-star for our use cases, are "dangerous" in that the affordance of the syntax perhaps makes it hard to "see" that you're annotating the statement itself, and not something like the observation that led to the assertion, i.e. the occurrence, or event (act or effect). In other words, I mean "useful" as "concise", in that the annotation syntax pairs assertions with annotations (and multiple pairs of those under one subject, thanks to the already concise Turtle syntax). It is a very powerful form of expression, so we should check its effects thoroughly before "releasing it into the wild".
It does not matter. Yes, we would ascribe the same provenance to those two lexical forms, since they represent the same value. Same goes for two IRIs denoting the same thing. We only use We want full transparency here, as far as I can see, even for this kind of detailed provenance. See the issue description section "How we want quoted triples to behave in our use cases" above for more details on this thinking.
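A minimal sketch, in plain Python, of what "same value, same provenance" could look like in practice. It is not tied to any RDF library: the helper names literal_value and triple_key and the "source-a" label are hypothetical, literals are modelled as (lexical form, datatype IRI) pairs, and only xsd:integer and xsd:decimal are mapped to values.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"


def literal_value(lexical, datatype):
    """Map a (lexical form, datatype IRI) pair to a comparable value."""
    if datatype == XSD + "integer":
        return int(lexical)
    if datatype == XSD + "decimal":
        return Decimal(lexical)
    # Assumption: any other datatype is compared by its lexical form.
    return lexical


def triple_key(s, p, o):
    """Build a lookup key that compares literal objects by value."""
    lexical, datatype = o
    return (s, p, datatype, literal_value(lexical, datatype))


# Provenance recorded for ":x :y 1 ." (hypothetical source label).
provenance = {}
provenance[triple_key(":x", ":y", ("1", XSD + "integer"))] = "source-a"

# ":x :y 01 ." has a different lexical form but the same value,
# so it resolves to the same provenance entry.
same_entry = triple_key(":x", ":y", ("01", XSD + "integer")) in provenance
```

Under these assumptions the lexical form is treated as an encoding detail: a lookup for either form finds the same provenance record.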
The same position (full transparency) goes for the union of changes use case (perhaps surprisingly). I believe this holds since the "environment" where this (the algorithm combining the graphs) operates, is "outside of the (temporally constrained) worlds" of the historical assertions (the graphs in the old RDF document versions, no longer in the union of asserted graphs of our published dataset). The algorithm works, in a "closed world", over a dataset of neutral graphs, along with a minimal vocabulary ( It is an open question whether or not the result is intended to be used in a union, default graph of our current assertions ("current beliefs"), which we share with the wider world (as published RDF documents). Rather than opting for quoted triple opacity to make these "blame graphs" publishable as is, I'd rather publish them as explicitly named graphs, and claim that they are neutral (likewise for the old versions themselves). But there is no formal way in RDF to do that. JSON-LD comes closest, I think, by stating:
(Aside: questions about related behaviour when publishing/consuming named graphs through other syntaxes were recently raised for RDFLib and Jena, respectively.) While this may be more about named graphs, it does relate to the question of opacity, or rather contrasts it with named graphs as "enough isolation" (and as I say above in the description, even that doesn't appear to require opacity). And the resolution to this might yield the answer to w3c/rdf-concepts#46. This could clarify that RDF triples, including quoted triples, are only about relating claims in graphs, and that RDF datasets, possibly more formalized in the future, continue to be about operational management of what is actually used/trusted/believed; and what it formally means, depending on chosen entailment regime (and, with neutral graphs, allowing for untrusted data to be part of those operations, if so desired). Practices may relate (such as quoted triples being linked to graph names as sources), but would be explicitly orthogonal. The act of believing, while possible to describe, cannot happen within a graph.
See https://github.com/w3c/rdf-ucr/wiki/RDF%E2%80%90star-for-Annotations-as-Miscellaneous-Marginalia
https://github.com/w3c/rdf-ucr/wiki/RDF%E2%80%90star-for-Detailed-Provenance-in-Cooperative-Union-Cataloguing
https://github.com/w3c/rdf-ucr/wiki/Describing-a-Union-of-Changes-to-a-Named-Graph for clean versions of scenarios for this collection of use cases.
Contact information
Brief Description of our Use Cases
The National Library of Sweden serves the Swedish cooperative union catalog (Libris), which has different audiences both nationally and internationally. To overcome the silo effect of old technology, and to interoperate with different metadata standards, we have developed a cataloging system based on RDF, using linked vocabularies and datasets.
We have encountered a set of overlapping use cases in this catalog, based on needs for descriptive metadata and, by extension, on the projects and data pipelines that depend on it. We believe that RDF-star may provide an effective means for dealing with these cases.
What we want to be able to do
Why it is hard or impossible to do what we want to do without quoted triples
RDF Statement reification could be used, but is unwieldy, especially when trying to keep annotations coordinated with assertions. There is no syntax support for it apart from rdf:ID on predicate elements in RDF/XML.
We use named graphs for effectively working with "record"-sized sets of facts, in a single source (our system), commonly about one main entity. But for detailed provenance they are too coarse-grained. Multiple such "records" about the same thing are hard to succinctly display and edit as a combination of description sources. Also, since named graphs have no defined formal semantics (neither what the name denotes, nor what is considered in the union graph of a published dataset), formal interoperability isn't possible today.
Thus, RDF-star annotations appear to fill the gap here, but their semantics remain to be tested in practice.
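To make the "unwieldy" point about reification concrete, here is a minimal Python sketch that expands one annotated assertion into the triples that standard RDF reification needs. The function name reify and the blank node labels _:stmt1 and _:occurrence are hypothetical, and terms are plain strings rather than objects from an RDF library.

```python
def reify(triple, annotations, node):
    """Expand one annotated triple into standard RDF reification triples.

    `node` names the rdf:Statement resource; `annotations` is a list of
    (predicate, object) pairs that describe that statement.
    """
    s, p, o = triple
    out = [
        triple,  # the assertion itself still has to be stated separately
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    ]
    # Each annotation hangs off the statement node, not off the assertion.
    out.extend((node, ap, ao) for ap, ao in annotations)
    return out


triples = reify(
    ("<introduction-to-physics>", "bf:classification", "<literature-education-physics>"),
    [("dc:source", "_:occurrence")],
    "_:stmt1",
)
for t in triples:
    print(*t, ".")
```

One assertion with a single annotation becomes six triples. Nothing links the asserted triple to its rdf:Statement node except the repeated subject, predicate and object, which is why the two are easy to get out of sync when either is edited.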
Various patterns for qualification conflate metadata sources (triples as "occurrences of facts"), logical facts (the statement which has a truth value) and the events or entities that these facts conceptually describe. This is the kind of "creative modelling" that tends to lead to divergent practices and weak interoperability across applications.
If RDF-star semantics can work to clarify and unify design patterns here this would be a major argument in its favor.
What is the role of RDF-star quoted triples in our use cases
How we want quoted triples to behave in our use cases
As far as I can see, referentially transparent (at least for annotations).
We don't need referential opacity for quoted triples since we treat owl:sameAs (and owl:differentFrom if ever used) to be about the reference, not the sense (as in Sense and reference). We are very careful about ingesting data using owl:sameAs because of that, due to the obvious risk of conflation of identity it entails.
In the same way, no opacity is needed to prevent datatype entailment on quoted triples. Any encoded, lexical representation difference is an implementation detail, and not a semantically relevant difference. ("Provenance" here is about "who said what, where", not "how (was it encoded)". The moment a quote occurs in a graph, it is expressed within that context.)
We mainly need "opacity", as in "separate worlds", between graphs, until we deem them truthful and put them in the union graph. Quotation of suggested assertions is enough; we only "let in" owl:sameAs assertions that we are certain are aliases of the exact same identity. (I'm not sure even these have to be referentially opaque in the linguistic sense; which seems to be supported by Carroll, Bizer, Hayes, Stickler - Named Graphs (2005), notably p. 6 and 7, along with this email.)
Of course, this differs from the view in the CG report, and we need to work out if our use cases would work the same in either interpretation.
Example RDF graphs that show parts of our use cases
I have added draft scenarios with example data to the wiki:
[EDIT: fixed links broken when pages were moved]