how to implement SHACL Rules (forward chaining) efficiently #350
-
@steveray What is the expansion ratio (total/explicit triples) you obtain?
-
@VladimirAlexiev, first, apologies that the open223 sample model files haven't been updated lately. I will also need to double-check why the entire QUDT vocabulary somehow made its way into the compiled version of the first sample model! Running the inferences on that model myself yields an expansion ratio of 1.77, mostly due to the "connection" inferencing mentioned in #343. As an aside, I view the inferencing rules used in the ASHRAE standard as similar to the way calculations are applied in an Excel spreadsheet: a unidirectional flow rather than loops or recursion.
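For anyone who wants to reproduce this kind of measurement, here is a minimal sketch using rdflib and pySHACL. The file names are placeholders, and `iterate_rules` may not be available in older pySHACL versions:

```python
# Sketch: measure the expansion ratio (total / explicit triples)
# produced by SHACL-AF rules. File names are placeholders.
from rdflib import Graph
from pyshacl import validate

data = Graph().parse("model.ttl")
shapes = Graph().parse("shapes.ttl")

explicit = len(data)

# advanced=True enables SHACL-AF rules; inplace=True writes the
# inferred triples back into `data`; iterate_rules=True (if your
# pySHACL version supports it) re-runs rules until a fixed point.
validate(data, shacl_graph=shapes, advanced=True, inplace=True,
         iterate_rules=True)

print(f"expansion ratio: {len(data) / explicit:.2f}")  # e.g. 1.77
```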
-
This is due to a bug in the script that generates all of the model pages. I'll work on fixing that today; hopefully it's as simple as regenerating that model. I will probably edit the script so it generates three links for each model: the base Turtle, Turtle + inferred triples, and Turtle + inferred triples + all dependent ontologies (easiest for demos -- just load a single graph!).
I've always thought of them as a fixed-point computation, though that view is rooted in my experience building an OWL 2 RL inference engine on top of datalog. An efficient forward-chaining implementation of SHACL rules would be very welcome (at least to me!).
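As a concrete reading of "fixed-point computation", here is a minimal sketch that treats each rule as a SPARQL CONSTRUCT query over an rdflib graph; the rule strings are assumed to be supplied by the caller:

```python
# Naive fixed-point evaluation: re-run every rule until no rule
# produces a triple the graph does not already contain.
from rdflib import Graph

def fixpoint(g: Graph, rules: list[str]) -> Graph:
    """`rules` are SPARQL CONSTRUCT query strings."""
    while True:
        before = len(g)
        for rule in rules:
            # a CONSTRUCT result iterates as triples
            for triple in g.query(rule):
                g.add(triple)
        if len(g) == before:  # nothing new: fixed point reached
            return g
```

The obvious inefficiency is that every rule is re-evaluated against the whole graph on every pass; the datalog literature on evaluation strategies addresses exactly this.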
-
Why forward chaining?
Maybe it is, maybe it isn't. It depends on the rules and the data.
That page says what is used: RETE for forward chaining, SLD resolution for backward chaining. A hybrid is forward rules that generate backward rules. There are many algorithms for datalog; see the semi-naive sketch at the end of this comment.
Yes.
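To make "many algorithms for datalog" concrete, here is a sketch of semi-naive evaluation, the standard improvement over naive re-evaluation, for a single transitive-closure rule (plain tuples, no RDF machinery):

```python
# Semi-naive evaluation sketch for one datalog rule:
#   path(x, z) :- path(x, y), edge(y, z)
# Only facts derived in the previous round (the "delta") are joined,
# so already-processed facts are not re-examined every iteration.
def transitive_closure(edges: set) -> set:
    path = set(edges)
    delta = set(edges)                 # new facts from the last round
    while delta:
        new = {(x, z)
               for (x, y) in delta     # join only the delta...
               for (y2, z) in edges    # ...against the base relation
               if y == y2}
        delta = new - path             # keep only unseen facts
        path |= delta
    return path

print(transitive_closure({(1, 2), (2, 3), (3, 4)}))
# {(1, 2), (2, 3), (3, 4), (1, 3), (2, 4), (1, 4)}
```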
-
The above puzzles me. I'm not aware of having censored anything. "Recipes for disaster" have certainly existed in various standardization efforts. Where databases of any model are concerned, queries that would deliver Cartesian products as results are often considered such recipes, especially as the database in question grows.

You were commenting on scalability. Scalability is often a question of specific query execution and/or construction tactics. Deployment resources (RAM, persistent storage, temporary storage, physical processor cores, logical processor threads, etc.) also play a significant part.

Iteration of inference rules may or may not be scalable. It depends on how complex the rules are, how much data is being reasoned over, whether all that data is local, and various other considerations. Given that such iteration is in regular use by Holger and/or users of TopBraid, it certainly seems sufficiently scalable for their current purposes. Whether that will continue to be the case is beyond my assessment.
-
I don't see how iterating over the inferences can improve the scalability situation. Is it because we can define the number of iterations to limit it? Generally, I think even with simple rules you can easily run into scalability/recursion issues; my understanding is that recursion was left out of the current/previous SHACL version for exactly this reason. I think using Datalog would be a good way to implement rules. @afs, how would you address the fact that rules would then no longer be implementable in SPARQL (which I understand was a design principle of SHACL)?
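One common way to keep recursion under control is to stratify the rules: build a dependency graph between rules and only iterate the groups that are genuinely recursive. A minimal sketch, with made-up rule names and predicate sets:

```python
# Sketch: if the rule-dependency graph is acyclic, every rule can run
# exactly once in topological order; a cycle means genuine recursion,
# and only the rules on that cycle need fixed-point iteration.
# The rules and their produces/consumes predicate sets are illustrative.
from graphlib import TopologicalSorter, CycleError

rules = {
    "connections": {"produces": {"ex:hasConnection"},
                    "consumes": {"ex:connectedTo"}},
    "reachable":   {"produces": {"ex:reachable"},
                    "consumes": {"ex:hasConnection", "ex:reachable"}},
}

# rule A depends on rule B if B produces a predicate that A consumes
deps = {a: {b for b in rules
            if rules[b]["produces"] & rules[a]["consumes"]}
        for a in rules}

try:
    print("run once, in order:",
          list(TopologicalSorter(deps).static_order()))
except CycleError as err:
    print("recursive rule group, iterate to fixed point:", err.args[1])
```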
-
Absolutely. SHACL is already highly parallelizable, and there is existing work on parallel datalog. If you limit the expressivity for "efficiency reasons", you are removing or restricting features; at that point, I want to see an explanation of how the user is going to achieve their task.
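To illustrate the parallelism point: within a single pass, every rule only reads the current state of the graph, so rule bodies can be evaluated concurrently and their outputs merged afterwards. A sketch, assuming the underlying store tolerates concurrent reads (with CPython threads the gain is limited unless the store releases the GIL):

```python
# Sketch: evaluate all rule bodies against the same snapshot of the
# graph in parallel, then merge their outputs sequentially (graph
# writes are not assumed to be thread-safe).
from concurrent.futures import ThreadPoolExecutor
from rdflib import Graph

def parallel_pass(g: Graph, rules: list) -> int:
    """Run one forward-chaining pass; return the number of new triples."""
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda rule: list(g.query(rule)), rules))
    before = len(g)
    for batch in batches:
        for triple in batch:
            g.add(triple)
    return len(g) - before
```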
-
What is happening in one rule depends on the output of another rule (which may even depend on the first rule). It's about completeness of deductions rather than scalability. There are techniques to refine the iteration (e.g. grouping rules by dependencies); that requires being able to look into the rule, which is part of why the SHACL-AF…
The goal is that each rule can be translated to SPARQL; a rule may sometimes need to be run more than once. I'm happy to do an AMA session at a WG meeting, or outside one, if it helps.
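To make the "each rule can be translated to SPARQL" point concrete, here is an illustrative sh:TripleRule and the CONSTRUCT query it corresponds to, written as string constants (the ex: vocabulary is made up for the example, and prefix declarations are omitted):

```python
# An illustrative SHACL-AF triple rule: for every ex:Connection,
# assert a direct ex:connectedTo link between its two endpoints.
TRIPLE_RULE = """
ex:ConnectionShape a sh:NodeShape ;
    sh:targetClass ex:Connection ;
    sh:rule [
        a sh:TripleRule ;
        sh:subject   [ sh:path ex:from ] ;
        sh:predicate ex:connectedTo ;
        sh:object    [ sh:path ex:to ] ;
    ] .
"""

# The same rule as the SPARQL CONSTRUCT it translates to:
CONSTRUCT_RULE = """
CONSTRUCT { ?from ex:connectedTo ?to }
WHERE {
    ?c a ex:Connection ;
       ex:from ?from ;
       ex:to   ?to .
}
"""
```

If ex:Connection instances (or their endpoints) can themselves be inferred by other rules, this query may need to run more than once before its output stabilizes, which is the "run more than once" caveat above.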
-
@afs, I was a little surprised to see this. I have been trying to use native SHACL rules rather than SPARQL whenever it looks straightforward, because I assumed they would be more efficient. But I'm much more comfortable writing SPARQL. Should I not bother writing rules in native SHACL?
-
The following "provoked" the posting of this issue:
It's not just that an a-priori-unknown number of iterations is needed.
Another problem is that the "productivity" of these iterations decreases (a diminishing-returns problem): each iteration re-runs all the rules, so it performs the same number of checks (or even slightly more, because of the newly inferred data) while yielding fewer new triples. The instrumentation sketch below makes this visible.
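The effect is easy to observe by instrumenting the naive loop sketched earlier in the thread (again treating rules as SPARQL CONSTRUCT strings):

```python
# Sketch: log the "productivity" of each pass. The per-pass cost stays
# roughly constant (every rule runs every time, over a growing graph)
# while the yield of new triples typically falls off quickly.
from rdflib import Graph

def fixpoint_with_stats(g: Graph, rules: list) -> None:
    iteration = 0
    while True:
        iteration += 1
        before = len(g)
        for rule in rules:            # all rules, every pass
            for triple in g.query(rule):
                g.add(triple)
        gained = len(g) - before
        print(f"pass {iteration}: +{gained} triples")
        if gained == 0:
            break
```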
Let's gather some info on how repositories optimize forward chaining:
Which SHACL rule features are easier to implement?
Some excuses on my part: