VAET index #351

darkleaf · 2020-06-02T09:15:58Z

Why wasn't the VAET index implemented?
Would it be useful to implement it?

tonsky · 2020-06-02T12:21:07Z

For all use cases, AVET covers what VAET does. I’m not sure what its purpose in Datomic is. If you have a use case, I might consider adding it. Otherwise, it’ll just be a performance penalty.

refset · 2020-06-28T16:21:37Z

VAET is only necessary when you don't know which reverse reference-type attributes might apply to a given entity. For a small DataScript-sized system this is probably never an issue. At a larger scale, with potentially millions of attributes, you need VAET to navigate the graph exploratively and efficiently (when you can't be sure beforehand which reverse attributes might be relevant). Without VAET you have to scan through every possible attribute/value combination in AVET.
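A toy model may make the cost difference concrete. This is a sketch under assumptions, not DataScript's or Datomic's implementation: each index is a plain Clojure `sorted-set` of datoms reordered into the index's sort order, and the schema is a hypothetical one with 1,000 reference attributes (`:ref-0` through `:ref-999`) of which only two are actually used. Finding everything that points at an entity takes one AVET seek per reference attribute, but a single contiguous range scan in VAET.

```clojure
;; Toy model (an assumption, not DataScript's code): each index is a sorted
;; set of [e a v] datoms reordered into that index's sort order.
;; nil sorts before every other value under clojure.core/compare, so a tuple
;; padded with nils is a valid lower bound for a range scan.
(def ref-attrs
  ;; hypothetical schema: 1000 reference attributes :ref-0 .. :ref-999
  (set (map #(keyword (str "ref-" %)) (range 1000))))

(def datoms
  ;; [e a v]: entities 1 and 2 point at entity 42, entity 3 points at 5
  [[1 :ref-7 42] [2 :ref-500 42] [3 :ref-1 5]])

(def avet (into (sorted-set) (map (fn [[e a v]] [a v e])) datoms))
(def vaet (into (sorted-set) (map (fn [[e a v]] [v a e])) datoms))

(def probes
  ;; counts how many separate index seeks a lookup performs
  (atom 0))

(defn prefix-scan
  "Seeks to the first tuple >= start, then walks forward while pred holds."
  [idx start pred]
  (swap! probes inc)
  (take-while pred (subseq idx >= start)))

(defn referrers-via-avet
  "Without VAET: one AVET seek per reference attribute in the schema."
  [target]
  (vec (mapcat (fn [a]
                 (let [hits (prefix-scan avet [a target nil]
                                         (fn [[a' v _]] (and (= a a') (= v target))))]
                   (map (fn [[_ _ e]] [e a target]) hits)))
               ref-attrs)))

(defn referrers-via-vaet
  "With VAET: one seek to [target nil nil], then a contiguous scan."
  [target]
  (vec (for [[_ a e] (prefix-scan vaet [target nil nil] #(= target (first %)))]
         [e a target])))

(defn with-probe-count
  "Runs f and returns [result number-of-seeks]."
  [f]
  (reset! probes 0)
  (let [result (f)]
    [result @probes]))
```

Both `(with-probe-count #(referrers-via-avet 42))` and `(with-probe-count #(referrers-via-vaet 42))` find the same two datoms, but the AVET version needs 1,000 seeks while the VAET version needs one. The AVET cost grows with the size of the schema, not with the number of actual references.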

tonsky · 2020-06-28T20:09:57Z

This only arises when deleting entities, right?

refset · 2020-06-28T23:14:30Z

I'm saying that if you have many different :db.type/ref attributes, then you can't query [?e _ e123] without naively scanning through all combinations in AVET.

I don't know exactly how VAET would be wired up internally, but I would expect to see something more advanced than the EAVT filter here:

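datascript/src/datascript/db.cljc, line 504 at commit 6e80073 (https://github.com/tonsky/datascript/blob/6e80073355ef35bf0b0d94afd3dd5d0f97ded2c1/src/datascript/db.cljc#L504):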

(filter (fn [^Datom d] (and (= v (.-v d))

Again, I doubt most users would be storing enough attributes or overall data in DataScript to warrant changing anything. Most of the time people know exactly which attributes they need in their queries.

For the curious, this is where nearly the same kind of scanning occurs when deleting entities:

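datascript/src/datascript/db.cljc, line 1278 at commit 6e80073 (https://github.com/tonsky/datascript/blob/6e80073355ef35bf0b0d94afd3dd5d0f97ded2c1/src/datascript/db.cljc#L1278):

v-datoms (vec (mapcat (fn [a] (-search db [nil a e])) (-attrs-by db :db.type/ref)))]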
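The shape of the deletion lookup on line 1278 can be sketched outside DataScript. Below, a hypothetical `datoms-to-retract` collects what retracting entity `e` would remove: the entity's own datoms come from one EAVT range scan, and the datoms that reference it come either from one AVET probe per reference attribute (the same shape as the `mapcat` above) or, assuming a Datomic-style VAET that holds only reference datoms, from one VAET range scan. The indexes are toy sorted sets of reordered tuples, not DataScript's B-trees, and the schema and data are made up.

```clojure
;; Toy model (assumption): indexes are sorted sets of reordered datoms.
(def ref-attrs #{:follows :likes})   ; hypothetical reference attributes

(def datoms
  [[1 :name "Alice"] [2 :name "Bob"] [3 :name "Carol"]
   [1 :follows 3] [2 :likes 3] [3 :follows 2]])

(def eavt (into (sorted-set) datoms))
(def avet (into (sorted-set) (map (fn [[e a v]] [a v e])) datoms))
;; Datomic-style VAET: only datoms whose attribute is a reference
(def vaet (into (sorted-set)
                (comp (filter (fn [[_ a _]] (contains? ref-attrs a)))
                      (map (fn [[e a v]] [v a e])))
                datoms))

(defn scan
  "All tuples in idx from start onward that satisfy pred (nil sorts first)."
  [idx start pred]
  (take-while pred (subseq idx >= start)))

(defn datoms-to-retract
  "Datoms that retracting entity e would remove; via is :avet or :vaet."
  [e via]
  (let [e-datoms (scan eavt [e nil nil] #(= e (first %)))
        v-datoms (case via
                   ;; one probe per reference attribute, like the mapcat above
                   :avet (mapcat (fn [a]
                                   (map (fn [[_ _ src]] [src a e])
                                        (scan avet [a e nil]
                                              (fn [[a' v _]] (and (= a a') (= v e))))))
                                 ref-attrs)
                   ;; one contiguous range scan over everything pointing at e
                   :vaet (map (fn [[_ a src]] [src a e])
                              (scan vaet [e nil nil] #(= e (first %)))))]
    (set (concat e-datoms v-datoms))))
```

`(datoms-to-retract 3 :avet)` and `(datoms-to-retract 3 :vaet)` return the same four datoms; the difference is only in how many index probes it takes to find the two incoming references.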

wbrown · 2020-06-29T00:45:30Z

In the Datalog-derived system I built, VAET was used to determine which records had a particular attribute-value pair, where the cardinality of the attribute was low; I found it essential for deletion. But this was for a multi-terabyte database with billions of tuples.


tonsky · 2020-06-29T13:56:05Z

> [?e _ e123]

That would require VAET, of course, but I can’t imagine a use case where anybody would need that type of query. The only use case I can come up with is deletion, and it’s a tradeoff: faster deletions or faster insertions (one less index to fill).
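The insertion side of that tradeoff can be counted in the same toy terms. This is a sketch under assumptions, not DataScript's transactor: a hypothetical `add-datom` writes every datom to EAVT and, in this simplified model, to AVET, and only when a VAET is enabled does it also write datoms of reference attributes to VAET (Datomic's VAET likewise holds only reference datoms). Each reference datom then costs exactly one extra index write.

```clojure
;; Toy write path (an assumption, not DataScript's code): counts index writes.
(def ref-attrs #{:follows :likes})   ; hypothetical reference attributes

(defn empty-db [vaet?]
  {:vaet? vaet? :eavt (sorted-set) :avet (sorted-set) :vaet (sorted-set) :writes 0})

(defn add-datom [db [e a v :as datom]]
  (let [db (-> db
               (update :eavt conj datom)     ; every datom goes to EAVT
               (update :avet conj [a v e])   ; simplification: every datom to AVET
               (update :writes + 2))]
    ;; the extra cost of a reverse index: one more write per reference datom
    (if (and (:vaet? db) (contains? ref-attrs a))
      (-> db (update :vaet conj [v a e]) (update :writes inc))
      db)))

(def tx [[1 :name "Alice"] [2 :name "Bob"] [1 :follows 2] [2 :likes 1]])

(defn total-writes [vaet?]
  (:writes (reduce add-datom (empty-db vaet?) tx)))
```

For these four datoms, two of which are references, `(total-writes false)` counts 8 index writes and `(total-writes true)` counts 10. The extra writes are paid up front on every insert, whereas the savings only show up when deleting or doing wildcard reverse lookups.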

refset · 2020-06-29T17:03:47Z

Yeah, it's certainly a tradeoff. I suspect that without VAET the deletion time cost might be too unpredictable at a large-enough scale to maintain consistent throughput, whereas the space (+ time) costs of inserting into VAET are amortised, which makes it a much safer default for a highly-distributed production system like Datomic.

I think the non-deletion use-cases can be described as "exploratory" queries, where your data set is large (many attributes) and not well understood (can't predict which attributes might be relevant). I imagine Roam's knowledge graph "backlinks" feature might be a good use-case for benefiting from a native VAET (backlink!) index, if such a graph ever grew large enough (i.e. multi-user).

Incidentally, I just came across these discussions that mention Datomic's ability to do wildcard reverse lookups with the Entity API via (keys (.cache (.touch ent))):

https://stackoverflow.com/questions/15629085/in-datomic-how-is-it-possible-to-find-out-what-keys-are-available-for-reverse-l
https://stackoverflow.com/questions/14189647/get-all-fields-from-a-datomic-entity
(admittedly I can't see this behaviour officially documented anywhere, and I've not confirmed that it works with my own eyes)

Providing a similar index to VAET is something that the Crux team has been contemplating recently, motivated by our work on a generic entity navigation UI. Crux avoids the deletion issue by choosing not to enforce referential integrity, i.e. there are no explicit reference-type attributes, only values and IDs that might happen to correspond.

I'll leave it at that for my contributions to this thread but hopefully it's useful context for some. I remember thinking about this very question several years ago, so it's good to have finally written it down :)

bluesun · 2020-07-07T09:53:26Z

Here is a use case that produces potentially large numbers of attributes and in which a query of the form [?es _ e123] is relevant:

One way to import large vectors (for example, an ordered sequence of events) is to use “numbered” attributes. For example, [event-source-id :345 event-id], where event-source-id is the entity representing the source of the ordered sequence of events and event-id is the event at index 345 in the vector.

In this case, [?event-source _ specific-event-id] is a legitimate query clause to find the source metadata relating to a specific event.

This use case potentially involves a large number of attributes, and this data design is one of the few with performance characteristics similar to a vector in Clojure or an array in other languages.

This use case was important enough to be integrated into the RDF Schema standard:
https://www.w3.org/TR/rdf-schema/#ch_containermembershipproperty
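A toy sketch of the numbered-attribute layout described above, under assumptions: plain Clojure sorted sets stand in for the indexes, the source and event ids are made-up longs, and positions are encoded as keywords such as `:345` (the RDF counterpart uses `rdf:_1`, `rdf:_2` and so on). With a VAET-ordered set, `[?event-source _ specific-event-id]` becomes a single range scan; recovering the position requires parsing the keyword's name back into a number (`Long/parseLong`, so JVM Clojure is assumed).

```clojure
;; Toy model (assumption): a vector of event ids stored as numbered attributes
;; [source-id :0 event-id], [source-id :1 event-id], and so on.
(defn vector->datoms [source-id event-ids]
  (map-indexed (fn [i event-id] [source-id (keyword (str i)) event-id]) event-ids))

(def datoms
  (concat (vector->datoms 100 [9000 9001 9002])   ; hypothetical source 100
          (vector->datoms 200 [9100 9002])))      ; hypothetical source 200

;; VAET-ordered tuples [v a e]; every value here is an event id (a long)
(def vaet (into (sorted-set) (map (fn [[e a v]] [v a e])) datoms))

(defn event-sources
  "Answers [?source _ event-id]: which sources hold event-id, and at what index."
  [event-id]
  (for [[_ a source] (take-while #(= event-id (first %))
                                 (subseq vaet >= [event-id nil nil]))]
    ;; the position lives in the attribute name, so it has to be parsed back
    {:source source :index (Long/parseLong (name a))}))
```

Calling `(event-sources 9002)` finds the event at index 2 of source 100 and index 1 of source 200 without knowing which numbered attribute to look under.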

bluesun · 2020-08-11T12:54:46Z

Hi @tonsky, I am curious about your feedback on this point. I can easily expand on the argument if you are interested.

tonsky · 2020-08-11T15:40:56Z

Thank you @bluesun, your argument is clear. I do not like the approach: it feels like a misuse of the attribute field, it produces an unbounded number of keywords, and it requires string concatenation/parsing if you want two ordered attributes. And it works poorly with the existing index structure :)

refset mentioned this issue on Aug 16, 2022 in xtdb/xtdb#923, "VA index for wildcard reverse reference lookups?" (Open)
