Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why are Datascript queries so slow? #130

Open
jbsar opened this issue Nov 24, 2015 · 7 comments
Open

Why are Datascript queries so slow? #130

jbsar opened this issue Nov 24, 2015 · 7 comments

Comments

@jbsar
Copy link

jbsar commented Nov 24, 2015

The benchmark query q4 takes 40 milliseconds to execute on a 20000 entities database on the JVM (even longer in CLJS).

By contrast, a lookup in a Clojure map of 20000 maps with the same data takes 300 nanoseconds on the JVM. Why does it take 100000 times more in Datascript?

I understand that multiple lookups in the index trees are necessary in order to filter by gender and gather the 3 datoms results. But I would imagine that would take a few microseconds to complete.

I am currently designing a front-end app that will contain a lot of entities (in the 10000s). From what I understand, it would be unusable with Datascript because reaction times would be in the 100s of milliseconds.

Am I missing something?

@jbsar jbsar changed the title Benchmark result seem Benchmark result seem unlikely Nov 24, 2015
@jbsar jbsar changed the title Benchmark result seem unlikely Benchmark results seem unlikely Nov 24, 2015
@jbsar jbsar changed the title Benchmark results seem unlikely Why are Datascript queries so slow? Nov 24, 2015
@tonsky
Copy link
Owner

tonsky commented Nov 25, 2015

This is q4:

["q4" '[:find ?e ?l ?a :where [?e :name "Ivan"]
                              [?e :last-name ?l]
                              [?e :age ?a]
                              [?e :sex :male]]]

What this query does is quite different from map lookup:

For 20000-person database, there’s not a one, but ~2500 Ivans. Half of them have :sex :male, so the size of the result set is ~1250 tuples. Allocating result set, filling it with tuples and creating tuples themselves is all part of the query, and it’s not free. DataScript also does deduplication of result (you wont’s see same tuple twice in the results), which is not free as well.

Also note that this query has 3 joins (2500 × 20000, 2500 × 20000, 2500 × 10000) which are also not free. Because of the way data is stored (flat sorted sets, not hierarchical nested maps), you can’t “just lookup” rest of the structure given its id.

Bottom line is, yes, if you can do with hierarchical nested maps, and don’t need to query them other than lookup-by-id, you better off with them.

DataScript is in different category, so expect different tradeoffs: query speed depends on the size of result set, you need to sort clasuses to have smaller joins, accessing entity properties is not free given its id, etc. As a benefit, you gain ability to query dataset for different projections, forward and reverse reference lookups, joins between different sets, etc. And direct index lookup (datascript.core/datoms) is still fast and comparable to lookup in a map (at least comparable, think binary lookup vs hashtable lookup, logarithm vs constant). Queries do much more than that.

@jbsar
Copy link
Author

jbsar commented Nov 25, 2015

Thanks for your eye-opening answer!

I guess I will have to find another solution to hold the state of my application. Do you think there's a chance of drastic performance improvements happening (especially in CLJS)?

@tonsky
Copy link
Owner

tonsky commented Nov 26, 2015

I’m optimizing certain sorts of queries in https://github.com/tonsky/datascript/blob/master/src/datascript/query_v3.cljc. But it won’t be orders of magnitude probably. I don’t think it’s possible to have complex queries and high performance at the same time. When you need speed, use direct index lookup.

@vibl
Copy link

vibl commented Nov 26, 2015

I presented an idea that might speed up queries: #132

It'd be great if you could give some feedback on it. It looks too easy to be right...

Vianney

@mhuebert
Copy link

mhuebert commented Jan 7, 2016

I've been gradually replacing my queries with index lookups and seeing orders-of-magnitude speedups.

Eg. this recursive pull query was taking ~150 milliseconds with my dataset:

(d/q '[:find (pull ?e [:uuid :node/label :node/ui-selected :node/order :node/cell :ui-collapsed {:node/_up ...}]) .
                       :in $ ?uuid :where [?e :uuid ?uuid]] @db uuid)

I replaced it with this function and the time dropped to ~0.1 milliseconds:

(defn get-node-tree [eid]
  (-> (select-keys (d/entity @db eid) [:uuid :node/label :node/ui-selected :node/order :node/cell :ui-collapsed])
      (assoc :node/_up (map (comp get-node-tree first) (d/datoms @db :avet :node/up eid)))))

@vibl
Copy link

vibl commented Jan 7, 2016

All this performance data begs a question: could there be better performing algorithms for Datascript? Some that perform in O(log n) rather than O(n) or worse?

@EugenDueck
Copy link

EugenDueck commented Dec 21, 2016

@mhuebert That is a huge performance gain! It seems this would be one of the probably many cases where a query optimizer (if something like that already exists in datascript - I'm just starting to look into the code) would shine. Once optimized, repeated invocations of the same query would exhibit hand-optimized performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants