Why are Datascript queries so slow? #130

jbsar · 2015-11-24T18:26:02Z

The benchmark query q4 takes 40 milliseconds to execute on a 20000 entities database on the JVM (even longer in CLJS).

By contrast, a lookup in a Clojure map of 20000 maps with the same data takes 300 nanoseconds on the JVM. Why does it take 100000 times more in Datascript?

I understand that multiple lookups in the index trees are necessary in order to filter by gender and gather the 3 datoms results. But I would imagine that would take a few microseconds to complete.

I am currently designing a front-end app that will contain a lot of entities (in the 10000s). From what I understand, it would be unusable with Datascript because reaction times would be in the 100s of milliseconds.

Am I missing something?

tonsky · 2015-11-25T13:08:08Z

This is q4:

["q4" '[:find ?e ?l ?a :where [?e :name "Ivan"]
                              [?e :last-name ?l]
                              [?e :age ?a]
                              [?e :sex :male]]]

What this query does is quite different from map lookup:

For 20000-person database, there’s not a one, but ~2500 Ivans. Half of them have :sex :male, so the size of the result set is ~1250 tuples. Allocating result set, filling it with tuples and creating tuples themselves is all part of the query, and it’s not free. DataScript also does deduplication of result (you wont’s see same tuple twice in the results), which is not free as well.

Also note that this query has 3 joins (2500 × 20000, 2500 × 20000, 2500 × 10000) which are also not free. Because of the way data is stored (flat sorted sets, not hierarchical nested maps), you can’t “just lookup” rest of the structure given its id.

Bottom line is, yes, if you can do with hierarchical nested maps, and don’t need to query them other than lookup-by-id, you better off with them.

DataScript is in different category, so expect different tradeoffs: query speed depends on the size of result set, you need to sort clasuses to have smaller joins, accessing entity properties is not free given its id, etc. As a benefit, you gain ability to query dataset for different projections, forward and reverse reference lookups, joins between different sets, etc. And direct index lookup (datascript.core/datoms) is still fast and comparable to lookup in a map (at least comparable, think binary lookup vs hashtable lookup, logarithm vs constant). Queries do much more than that.

jbsar · 2015-11-25T18:42:43Z

Thanks for your eye-opening answer!

I guess I will have to find another solution to hold the state of my application. Do you think there's a chance of drastic performance improvements happening (especially in CLJS)?

tonsky · 2015-11-26T09:50:14Z

I’m optimizing certain sorts of queries in https://github.com/tonsky/datascript/blob/master/src/datascript/query_v3.cljc. But it won’t be orders of magnitude probably. I don’t think it’s possible to have complex queries and high performance at the same time. When you need speed, use direct index lookup.

vibl · 2015-11-26T11:54:01Z

I presented an idea that might speed up queries: #132

It'd be great if you could give some feedback on it. It looks too easy to be right...

Vianney

mhuebert · 2016-01-07T01:48:16Z

I've been gradually replacing my queries with index lookups and seeing orders-of-magnitude speedups.

Eg. this recursive pull query was taking ~150 milliseconds with my dataset:

(d/q '[:find (pull ?e [:uuid :node/label :node/ui-selected :node/order :node/cell :ui-collapsed {:node/_up ...}]) .
                       :in $ ?uuid :where [?e :uuid ?uuid]] @db uuid)

I replaced it with this function and the time dropped to ~0.1 milliseconds:

(defn get-node-tree [eid]
  (-> (select-keys (d/entity @db eid) [:uuid :node/label :node/ui-selected :node/order :node/cell :ui-collapsed])
      (assoc :node/_up (map (comp get-node-tree first) (d/datoms @db :avet :node/up eid)))))

vibl · 2016-01-07T10:58:51Z

All this performance data begs a question: could there be better performing algorithms for Datascript? Some that perform in O(log n) rather than O(n) or worse?

EugenDueck · 2016-12-21T00:48:14Z

@mhuebert That is a huge performance gain! It seems this would be one of the probably many cases where a query optimizer (if something like that already exists in datascript - I'm just starting to look into the code) would shine. Once optimized, repeated invocations of the same query would exhibit hand-optimized performance.

jbsar changed the title ~~Benchmark result seem~~ Benchmark result seem unlikely Nov 24, 2015

jbsar changed the title ~~Benchmark result seem unlikely~~ Benchmark results seem unlikely Nov 24, 2015

jbsar changed the title ~~Benchmark results seem unlikely~~ Why are Datascript queries so slow? Nov 24, 2015

vibl mentioned this issue Nov 25, 2015

Idea for a reactive and faster alternative to Datascript #132

Open

tommikaikkonen mentioned this issue Apr 17, 2016

Similarities with redux-schema? redux-orm/redux-orm#13

Closed

tiensonqin mentioned this issue May 14, 2021

Refactor queries && fix issues related to references logseq/logseq#1872

Merged

5 tasks

logseq-cldwalker mentioned this issue Mar 7, 2022

Optimizing nested query rules for graph traversal logseq/logseq#4424

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Why are Datascript queries so slow? #130

Why are Datascript queries so slow? #130

jbsar commented Nov 24, 2015

tonsky commented Nov 25, 2015

jbsar commented Nov 25, 2015

tonsky commented Nov 26, 2015

vibl commented Nov 26, 2015

mhuebert commented Jan 7, 2016

vibl commented Jan 7, 2016

EugenDueck commented Dec 21, 2016 •

edited

Loading

Why are Datascript queries so slow? #130

Why are Datascript queries so slow? #130

Comments

jbsar commented Nov 24, 2015

tonsky commented Nov 25, 2015

jbsar commented Nov 25, 2015

tonsky commented Nov 26, 2015

vibl commented Nov 26, 2015

mhuebert commented Jan 7, 2016

vibl commented Jan 7, 2016

EugenDueck commented Dec 21, 2016 • edited Loading

EugenDueck commented Dec 21, 2016 •

edited

Loading