Index usage doesn't match datomic behavior and appears to be much less efficient #462

Closed
latacora-paul opened this issue Apr 16, 2024 · 1 comment

latacora-paul commented Apr 16, 2024

Hi!

I've started using db filters to measure how many datoms a query "scans", and was surprised to see far more datoms considered than I expected for pretty simple queries. I then checked the same queries against Datomic and saw that the number of datoms considered there exactly matched my mental model of how queries translate into index traversals.

This suggests there are sizable performance gains available for certain queries if we can work out where the extra scans come from. This is especially relevant to me because I'm trying to use the new storage protocols against a remote store (DynamoDB) and need to minimize the number of load calls.

Below are two tests that demonstrate the behavior on each platform. DataScript considers a total of 50 datoms to answer the query (even though there are only 25 in the db!), whereas Datomic looks at only 2 (as expected).

This was tested on version 1.6.3 of DataScript and version 1.0.6733 of the Datomic peer library.
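
For reference, the "2 datoms" expectation can be spelled out as two direct index lookups. This is only a sketch of the mental model, not a description of how either query engine is implemented. It assumes the first clause is answered by an EAVT seek on the bound entity and the second by an AVET seek on each resulting attribute/value pair. The function name `expected-scan-counts` and the explicit entity ids 1..25 (used instead of string tempids so that entity 8 predictably holds `:h`) are assumptions made for illustration.

```clojure
(require '[datascript.core :as d])

(defn expected-scan-counts
  "Returns [datoms-touched-by-clause-1 datoms-touched-by-clause-2]
   under the assumed two-seek plan. Hypothetical helper, illustration only."
  []
  (let [letters (map (comp keyword str char) (range 97 123))
        ;; every attribute is indexed, so AVET lookups are allowed
        schema  (into {} (map (fn [letter] [letter {:db/index true}])) letters)
        ;; explicit entity ids 1..25 (assumption: replaces the string tempids)
        tx-data (map-indexed (fn [i [pre post]] [:db/add (inc i) pre post])
                             (partition 2 1 letters))
        db      (d/db-with (d/empty-db schema) tx-data)
        ;; clause [?entity ?a ?v] with ?entity bound: seek EAVT by entity
        step-1  (d/datoms db :eavt 8)
        ;; clause [?e ?a ?v] with ?a and ?v now bound: seek AVET per pair
        step-2  (mapcat (fn [datom] (d/datoms db :avet (:a datom) (:v datom)))
                        step-1)]
    [(count step-1) (count step-2)]))
```

Under these assumptions each step touches a single datom (entity 8 carries only `:h :i`), which is where the expected total of 2 comes from.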

DataScript:

(deftest datascript-example
  (let [count-datoms-scanned
        (fn [db query & args]
          (let [inspected-datoms (atom [])
                filtered-db      (datascript.core/filter db (fn [db datom] (swap! inspected-datoms conj datom)))
                answer           (apply datascript.core/q query filtered-db args)]
            [(count (deref inspected-datoms)) answer]))
        letters
        (map (comp keyword str char) (range 97 123))
        schema
        (reduce (fn [schema letter] (assoc schema letter {:db/index true})) {} letters)
        tx-data
        (for [[pre post] (partition 2 1 letters)]
          [:db/add (str (random-uuid)) pre post])
        db
        (-> schema
            (datascript.core/empty-db)
            (datascript.core/db-with tx-data))
        magic-entity-id 8
        magic-attribute-id :h]
    ; THIS FAILS - DATASCRIPT IS ACTUALLY LOOKING AT 50 DATOMS INSTEAD OF EXPECTED 2
    (is (= [2 #{[magic-entity-id magic-attribute-id :i]}]
           (count-datoms-scanned
             db
             '[:find ?e ?a ?v
               :in $ ?entity
               :where
               [?entity ?a ?v]
               [?e ?a ?v]]
             magic-entity-id))
        "I can jump straight to the datoms that matter because of indices")))

Datomic:

; PS the "magic-entity-id" was constant in my trials but I'm not sure how well it translates to other machines/versions

(deftest datomic-example
  (let [count-datoms-scanned
        (fn [db query & args]
          (let [inspected-datoms (atom [])
                filtered-db      (datomic.api/filter db (fn [db datom] (swap! inspected-datoms conj datom)))
                answer           (apply datomic.api/q query filtered-db args)]
            [(count (deref inspected-datoms)) answer]))
        letters
        (map (comp keyword str char) (range 97 123))
        schema
        (for [letter letters]
          {:db/ident       letter
           :db/valueType   :db.type/keyword
           :db/cardinality :db.cardinality/one
           :db/index       true})
        tx-data
        (for [[pre post] (partition 2 1 letters)]
          [:db/add (str (random-uuid)) pre post])
        db
        (-> (datomic.api/connect
              (doto (str "datomic:mem://" (str (random-uuid)))
                (datomic.api/create-database)))
            (datomic.api/db)
            (datomic.api/with schema)
            :db-after
            (datomic.api/with tx-data)
            :db-after)
        magic-entity-id    17592186045425
        magic-attribute-id 79]
    (is (= [2 #{[magic-entity-id magic-attribute-id :i]}]
           (count-datoms-scanned
             db
             '[:find ?e ?a ?v
               :in $ ?entity
               :where
               [?entity ?a ?v]
               [?e ?a ?v]]
             magic-entity-id))
        "I can jump straight to the datoms that matter because of indices")))

tonsky closed this as completed in 86d8204 on Apr 23, 2024

tonsky (Owner) commented Apr 23, 2024

If you can, try master; it should be better now for this specific case. If you find more such cases, send them my way and I’ll see if I can fix them too.
