Tuple of n-items #98
How is this backward-compatible, or even RDF-compatible?
The database must be set up with fixed
Which particular aspect of RDF does the proposal conflict with, apart from serialization formats like Turtle, which could also be adapted to work with tuples of n items?
Like, the whole RDF stack, which has always been based on triples?
This is a conservative stance, without much argument to uphold it. Moreover, it is plain wrong: named graphs are 4-tuple stores.
RDF statements consist of three elements, which is why they are called triples. Quads are an extension of the model, as it says below.
While formally quite different in consequences from RDF*/SPARQL*, it does seem to point to the same need: making it easier to query for the provenance of certain triples. The new syntax does not seem to add any new capabilities to the platforms (other than a default graph name/keyword; see also #43). I.e. I can't imagine a query written in this syntax that could not be written in the existing GRAPH syntax.
But @JervenBolleman I think the provided query example does not expose the complexity of what is being proposed. I can imagine a query like this, e.g. for a 6-store:
select * {graph <?g1,?g2,?g3> {?person a foaf:Person}}
@amirouche I have some questions:
Thanks for responding to my proposal!
Sorry, I don't understand what this query does.
Performant in terms of speed, yes. On-disk space is another story: I have optimized that, and I think there is no way to reduce "the indexation factor". In the case of 6-tuples the indexation factor is 20, which is the central binomial coefficient. Without compression, storing the data on disk requires 20 times its original size.
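The arithmetic behind that factor can be checked quickly. The sketch below is only an illustration: the function name `index_factor` is hypothetical, and it assumes, as stated above, that the number of indexes for tuples of n items is the central binomial coefficient C(n, floor(n/2)) and that each index holds one full, uncompressed copy of the data.

```python
from math import comb  # available since Python 3.8


def index_factor(n: int) -> int:
    """Number of full copies of the data assumed for tuples of n items.

    Assumption: one index per required permutation, each index storing
    every tuple once, with no compression.
    """
    return comb(n, n // 2)


for n in (3, 4, 6):
    print(n, index_factor(n))
# 3 3
# 4 6
# 6 20
```

Under these assumptions a triple store needs 3 copies, a quad store 6, and a 6-tuple store 20.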
Not yet. But that is more a question for the underlying storage engine: WiredTiger, FoundationDB, RocksDB. All of them are used in production for big workloads.
Yes: hoply and srfi-168 provide two implementations, for Python and Scheme respectively.
That is exactly the gist of this "invention". The minimal number of indexes required is the central binomial coefficient. It is explained in https://math.stackexchange.com/q/3146568/23663 and the related https://stackoverflow.com/q/55143485/140837
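The claim can be checked by brute force for small n. The sketch below is not taken from hoply or srfi-168; the helper names `covers` and `min_index_count` are hypothetical. It assumes that an ordered key-value store answers a query pattern by scanning a key prefix, so a set of index orderings is sufficient when every subset of bound positions appears, as a set, as the prefix of at least one ordering. Since each ordering has exactly one prefix of a given length k, at least C(n, k) orderings are needed, and k = floor(n/2) gives the largest such bound.

```python
from itertools import combinations, permutations


def covers(perms, n):
    """True if every subset of the n positions is a prefix (as a set) of some ordering."""
    needed = {frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)}
    available = {frozenset(p[:k]) for p in perms for k in range(n + 1)}
    return needed <= available


def min_index_count(n):
    """Smallest number of orderings that covers every pattern (exhaustive search).

    Only practical for small n: the search space grows very quickly.
    """
    all_perms = list(permutations(range(n)))
    for size in range(1, len(all_perms) + 1):
        for candidate in combinations(all_perms, size):
            if covers(candidate, n):
                return size


print(min_index_count(3))  # 3
```

For triples, the three rotations (0, 1, 2), (1, 2, 0) and (2, 0, 1) are one such minimal set.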
I think that the tradeoff made by n-tuples vs. "n-tuples in 3-tuples" is trading disk space for lower CPU usage. That said, reification of every 3-tuple will also require more space. It is an open question whether the same data stored as 3-tuples vs. n-tuples has different disk usage. "n-tuples in 3-tuples" will always be slower. Does it answer the question?
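To make the comparison concrete, here is a rough sketch of the same statement written once as a single 5-tuple and once as "n-tuples in 3-tuples" using the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). The `reify` helper, the `ex:` terms and the choice of provenance and licence as metadata are assumptions made for illustration. The sketch only counts raw items before any indexing, so it does not settle the open question about on-disk size.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(statement_id, subject, predicate, obj, metadata):
    """Expand one statement plus its metadata into plain 3-tuples."""
    triples = [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]
    # One extra triple per metadata item attached to the statement node.
    triples += [(statement_id, key, value) for key, value in metadata.items()]
    return triples


# Hypothetical example data: subject, predicate, object, provenance, licence.
n_tuple = ("ex:alice", "foaf:knows", "ex:bob", "ex:source1", "ex:cc-by")
triples = reify(
    "_:s1", "ex:alice", "foaf:knows", "ex:bob",
    {"ex:provenance": "ex:source1", "ex:licence": "ex:cc-by"},
)

print(len(n_tuple), "items in one 5-tuple")
print(len(triples) * 3, "items across", len(triples), "triples")
```

Reading the statement back from the reified form also takes several lookups joined on the statement node, which is the extra CPU cost mentioned above.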
I already use It seems odd that the "provenance" of a given tuple is deferred to another set of tuples. I read on the mailing list that n-tuples are called "chunks" and "graphs of chunks" in cognitive science.
Adding more indexes slows down data loading and updates: this should be taken into account in a realistic benchmark. If you use C as a singleton, i.e. as a tuple id, then you can attach any extra fields to that id.
This feature is out of scope for SPARQL 1.2.
@VladimirAlexiev Thanks a lot for the feedback! By the way, I found a way to reduce the indexation factor to something like 1 times the original size of the data, plus something that depends on the total number of n-tuples. That is much less than 20 times the size of the original data. I will close the issue since it is out of scope for the current iteration. Thanks all!
Why?
As explained in Frey:2017, named graphs (ngraph) make it possible to represent metadata in a more performant way. I propose to generalize that scheme to support any number of metadata items that add information to every tuple, e.g. provenance, licence or historical significance.
Previous work
(graph subject predicate object alive change-id uid).
Proposed solution
In the case of a quad store, instead of
The proposition is to also support the following more general syntax:
Considerations for backward compatibility
Backward compatible.