Skip to content

6. Querying

Paula Gearon edited this page Jun 30, 2021 · 1 revision

Contents

Query Structure

Asami queries can be provided as a:

  • Sequence
  • String
  • Map

Internally, all queries will be converted to the map form, but the other formats may be more useful in different contexts. For consistency, the sequence form will typically be shown.

Query Form

Queries have the form:

find-clause [ in-clause] [ with-clause ] where-clause

In general, Asami queries are similar to Datomic queries, but with reduced functionality.

Find Clause

This clause starts with the keyword :find followed by the elements to be returned from a query. If using the map form, then use the key :find with the elements in a sequence.

find-clause        := ':find' (find-element+ | find-sequence | find-tuple | find-scalar)
find-sequence      := '[' variable '...' ']'
find-tuple         := '[' variable+ ']'
find-scalar        := variable '.'
find-element       := variable | aggregate
variable           := '?'[A-Za-z0-9-_*]+  (a symbol starting with ?)
aggregate          := aggregate-operator '(' variable ')'
aggregate-operator := 'sum' | 'count' | 'avg' | 'max' | 'min' | 'first' | 'last'

Note: find-sequence, find-tuple, and find-scalar should all wrap find-element rather than variable. This will be addressed in Issue #51

Find Clause Description

The elements in a :find clause describe the data to be returned from a query. These elements select the data that is identified in :where clause. It is an error for a variable to appear in the :find clause if it does not appear in the :where clause.

The results from a query will have different types, depending on the form of the :find clause:

Form Example Return Type
find-element+ ?a ?b ?c Sequence of vector
find-sequence [?a ...] Sequence of scalars
find-tuple [?a ?b] Single vector of scalars
find-scalar ?a . Single scalar value
  • Elements: This is the basic query form, and simply projects the selected items, with one vector containing these values for every result row. While rarely used, results have metadata with includes the variable name for each column.
  • Sequence: Similar to a standard selection with a single variable. However, every row returns just a single value instead of a vector containing a value.
  • Tuple: This packages a single result row as a vector, without the sequence that would usually wrap the rows. Any result rows after the first will be ignored.
  • Scalar: Returns a single value from the first result row. Any result rows after the first will be ignored.

Aggregate Expressions

The find clause may also include aggregate expressions. These return the aggregate value like any other variable.

Example: Consider the following set of results from the find clause: :find ?student ?subject

?student ?subject
"Alice" "PH101"
"Alice" "PH102"
"Bob" "ME100"
"Bob" "ME101"
"Bob" "E3101"

If this clause is changed to: :find ?student (count ?subject) Then the results will change to:

?student ?count-subject
"Alice" 2
"Bob" 3

Note that the column name (found in the metadata of the results) is automatically generated from the operator and the variable.

Available aggregate operators are:

Operator Description
count The number of matching results.
sum The sum of all values. Numerical data only.
avg The average of all values. Numerical data only.
max The maximum value. Data must be comparable.
min The minimum value. Data must be comparable.
first The first value in the results.
last The last value in the results.

In Clause

This clause starts with the keyword :in followed by the elements for describing data sources. If using the map form, then use the key :in with the elements in a sequence.

in-clause          := ':in' in-element+
find-element       := '$' | variable | in-tuple | in-sequence
in-sequence        := '[' variable '...' ']'
in-tuple           := '[' variable+ ']'
variable           := '?'[A-Za-z0-9-_*]+  (a symbol starting with ?)

In Clause Description

This describes the sources for executing a query. These sources are provided as arguments to the query, and will be bound to the variables in the :in clause. The variables will then have these values associated with them throughout the processing of the query.

Each element of an :in clause will be aligned with the trailing arguments to the asami.core/q query function, to provide the data sources for the query. If no :in clause is provided, then the query will use presume that a single argument will be provided, which will be bound to the default database for the query. This is an equivalent clause to: :in $

The default database is the source that tuple-patterns in the :where clause use to resolve their data. Most queries should have this available.

The elements to be bound may be:

Element Example Description
$ $ A special variable indicating the default database is in this position
variable ?a A single scalar value
sequence [?a ...] The bound value is a sequence of values to be bound to the variable
tuple [?a ?b] A bound value is a tuple of values that will be assigned to each of the vars

Example: The clause: :in $ ?a [?b ?c] [?d ...]

Can have parameters passed: db "Fred" ["car" 5] [1 2 3 4 5]

This will define:

  • The default database is db
  • The variable ?a is set to the string "Fred"
  • The variable ?b is set to the string "car", and ?c is set to 5
  • The variable ?d is bound to the series [1 2 3 4 5]. Anything connecting to ?d will be considered against each of these values.

With Clause

This clause starts with the keyword :with followed by the variables to be considered during aggregation. If using the map form, then use the key :with with the elements in a sequence.

with-clause        := ':with' variable+
variable           := '?'[A-Za-z0-9-_*]+  (a symbol starting with ?)

With Clause Description

When aggregate queries are executed on a variable, as much of the query as possible is executed without that variable leading to partial results. The aggregate is then executed for each one of these partial results. If any variables should not appear in the initial step, and should instead be considered in each execution of the aggregate, then they will be listed in the :with clause.

Example: Given the following data:

{:db/ident "f" :label "female"}
{:db/ident "m" :label "male"}
{:db/ident "alice", :name "Alice", :gender {:db/ident "f"}}
{:db/ident "barbara", :name "Barbara", :gender {:db/ident "f"}}
{:db/ident "cary", :name "Cary", :gender {:db/ident "m"}}
{:db/ident "antoine", :name "Antoine", :age 16, :gender {:db/ident "m"}}
{:db/ident "betty", :name "Betty", :age 14, :gender {:db/ident "f"}}
{:db/ident "chuck", :name "Chuck", :age 12, :gender {:db/ident "m"}}
{:db/ident "ann", :name "Ann", :age 13, :gender {:db/ident "f"}}
{:db/ident "bob", :name "Bob", :age 11, :gender {:db/ident "m"}}
[:db/add "alice" :child "antoine"]
[:db/add "alice" :child "betty"]
[:db/add "alice" :child "chuck"]
[:db/add "barbara" :child "ann"]
[:db/add "barbara" :child "bob"]
[:db/add "cary" :child "ann"]
[:db/add "cary" :child "bob"]

This describes:

  • Alice has 3 children: Antione, Betty, and Chuck.
  • Barbara has 2 children: Ann and Bob.
  • Cary has the same 2 children as Barbara: Ann and Bob.

Selecting the number of children with female parents:

:find (count ?child) :where [?parent :gender ?f] [?f :label "female"] [?parent :child ?child]

This results in:

?count-child
5

This can be changed to group the children by parent if the ?parent variable is considered in the aggregation:

:find (count ?child) :with ?parent :where [?parent :gender ?f] [?f :label "female"] [?parent :child ?child]
?count-child
3
2

Where Clause

This clause starts with the keyword :where followed by the constraints of a query. If using the map form, then use the key :where with the elements in a sequence.

with-clause         := ':where' (constraint | binding)+
constraint          := pattern-constraint | or-constraint | and-constraint | not-constraint | optional-constraint | filter-constraint
or-constraint       := '(or' constraint+ ')'
and-constraint      := '(and' constraint+ ')'
not-constraint      := '(not' constraint+ ')'
optional-constraint := '(optional' constraint+ ')'
filter-constraint   := '[(' predicate arg* ')]'
pattern-constraint  := '[' element [attribute-element [element]] ']'
binding             := '[(' arg arg+ ')' variable ']'
attribute-element   := element[ '*' | '+' ]
element             := variable | scalar-value
arg                 := variable | scalar-value | symbol
scalar-value        := [-][1-9][0-9]* | '"' .* '"' | keyword       (i.e. numbers, strings or keywords)
keyword             := ':' [label '/'] label
symbol              := [label '/'] label
label               := [ [a-zA-Z_-.?*!] ] [a-zA-Z0-9_-.?*!]*       (note: not the quote character)
variable            := '?'[A-Za-z0-9-_*]+                          (a symbol starting with ?)

Where Clause Description

Formally speaking, every result of a query is based on setting the variables to those values where the constraints are valid.

In practice, this is done by selecting every set of allowed values, and removing those that are not allowed. Each set of allowed values for the variables is called a binding. The different types of constraints provide different mechanisms for selecting these bindings.

Pattern Constraint

These form a pattern which will be matched against the default graph. The graph is made up of tuples which describe an "edge" as a connection between two nodes: node edge-label node

Pattern constraints have up to 3 elements which are either scalar values or variables. The constraint will match to every tuple in the graph that has the same scalar values in the same positions. The result is the variables of the constraint being bound to the values in the matching positions for each matching tuple.

Example: A graph has the following tuples:

[:pete :type :person]
[:pete :name "Peter"]
[:anne :type :person]
[:anne :name "Anne"]
[:ziggy :type :cat]
[:ziggy :name "Ziggy"]

The pattern:

[?p :type :person]

Will match 2 tuples:

[:pete :type :person]
[:anne :type :person]

The result is that ?p gets 2 bindings of:

[[?p = :pete] [?p = :anne]]

The pattern:

[?p :name ?name]

Matches 3 tuples:

[:pete :name "Peter"]
[:anne :name "Anne"]
[:ziggy :name "Ziggy"]

The result is that ?p and ?name both get bindings:

[[?p = :pete, ?name = "Peter"] [?p = :anne, ?name = "Anne"] [?p = :ziggy, ?name = "Ziggy"]]

Notice how these bindings appear in pairs. When ?p is :anne, then ?name is "Anne". When ?p changes, then so must ?name. In this way, each binding forms a set.

Transitive Attributes

Attributes (also called Properties, Predicates, or Edges) can refer to relationships between entities that are transitive. An example of a transitive relationship is "is smaller than". So if a mouse is smaller than a cat, and a cat is smaller than a goat, then the mouse is smaller than a goat. Similarly, if the goat is smaller than an elephant, then so is the mouse and the cat.

We can see this with some simple data:

(def data
  [{:db/id -1 :name "Washington Monument"}
   {:db/id -2 :name "National Mall"}
   {:db/id -3 :name "Washington, DC"}
   {:db/id -4 :name "USA"}
   {:db/id -5 :name "Earth"}
   {:db/id -6 :name "Solar System"}
   {:db/id -7 :name "Orion-Cygnus Arm"}
   {:db/id -8 :name "Milky Way Galaxy"}
   {:db/id -9 :db/ident 9 :name "Arlington"}
   {:db/id -10 :db/ident 10 :name "Falls Church"}
   [:db/add -1 :is-in -2]
   [:db/add -2 :is-in -3]
   [:db/add -3 :is-in -4]
   [:db/add -4 :is-in -5]
   [:db/add -5 :is-in -6]
   [:db/add -6 :is-in -7]
   [:db/add -7 :is-in -8]
   [:db/add -9 :neighbor -3]
   [:db/add -10 :neighbor -9]])

We can see what "Washington Monument" is in:

=> (d/q '[:find [?name ...] :where [?e :name "Washington Monument"] [?e :is-in ?e2] [?e2 :name ?name]] (d/db conn))
("National Mall")

We can also follow that attribute transitively by appending the :is-in attribute with a + character:

=> (d/q '[:find [?name ...] :where [?e :name "Washington Monument"] [?e :is-in+ ?e2] [?e2 :name ?name]] (d/db conn))
("National Mall" "Washington, DC" "USA" "Earth" "Solar System" "Orion-Cygnus Arm" "Milky Way Galaxy")

To see everything starting at the Washington Monument, we want to follow the :is-in attribute zero steps or more. This is done by appending a * character to the attribute:

=> (d/q '[:find [?name ...] :where [?e :name "Washington Monument"] [?e :is-in* ?e2] [?e2 :name ?name]] (d/db conn))
("Washington Monument" "National Mall" "Washington, DC" "USA" "Earth" "Solar System" "Orion-Cygnus Arm" "Milky Way Galaxy")

We may also want want to follow any kind of attribute. To do this, use a variable attribute:

=> (d/q '[:find ?name :where [?e :name "Falls Church"][?e ?a* ?e2][?e2 :name ?name]] (d/db conn))
(["Falls Church"] ["Arlington"] ["Solar System"] ["Washington, DC"] ["USA"] ["Orion-Cygnus Arm"] ["Earth"] ["Milky Way Galaxy"])

Path Querying

As a special case, if you know 2 specific nodes in the system, you can ask for the path between them.

=> (d/q '[:find ?e . :where [?e :name "Falls Church"]] (d/db conn))
:tg/node-10736
=> (d/q '[:find ?e . :where [?e :name "Milky Way Galaxy"]] (d/db conn))
:tg/node-10734
=> (d/q '[:find ?a . :where [:tg/node-10736 ?a+ :tg/node-10734]] (d/db conn))
[:neighbor :neighbor :is-in :is-in :is-in :is-in :is-in]

Notice that this is a single binding value that contains a vector value!

Alternatively, the entity/value ends of the constraint can be defined earlier in the query, but this is not necessarily guaranteed without specifying the order of operations. Typically, if both ends of the transitive constraint are not bound, then only the first step will be returned:

=> (d/q '[:find ?a ?name :where [?e :name "Falls Church"][?e2 :name ?name][?e ?a* ?e2]] (d/db conn))
([nil "Falls Church"]
 [:neighbor "Arlington"]
 [:neighbor "Solar System"]
 [:neighbor "Washington, DC"]
 [:neighbor "USA"]
 [:neighbor "Orion-Cygnus Arm"]
 [:neighbor "Earth"]
 [:neighbor "Milky Way Galaxy"])

However, if the order is enforced (using the :user query planner) to ensure that the transitive attribute is evaluated last, then full paths are returned:

=> (d/q '[:find ?a ?name :where [?e :name "Falls Church"][?e2 :name ?name][?e ?a* ?e2]] (d/db conn) :planner :user)
([[:neighbor :neighbor :is-in :is-in :is-in :is-in] "Orion-Cygnus Arm"]
 [[:neighbor :neighbor :is-in :is-in :is-in] "Solar System"]
 [[:neighbor :neighbor :is-in :is-in] "Earth"]
 [[:neighbor :neighbor] "Washington, DC"]
 [[:neighbor :neighbor :is-in] "USA"]
 [[:neighbor :neighbor :is-in :is-in :is-in :is-in :is-in] "Milky Way Galaxy"]
 [[:neighbor] "Arlington"])
Unlabeled Variables

The _ symbol can be used in a pattern constraint to indicate a position where the result is "don't care". It acts the same as a variable, but the result is ignored.

Shortened Pattern Constraint

If a pattern constraint has fewer than 3 elements, then this is the equivalent to a full 3-element constraint where the remaining elements are _. The following two constraints are exactly equivalent:

[?e :tg/entity]
[?e :tg/entity _]

This indicates that the first position binds values to the variable ?e, while the second must match :tg/entity. The third position can be anything.

Similarly, the following 2 patterns are identical:

[?e]
[?e _ _]

This will return every value that forms the start of an edge in the graph.

There are also 2 special types of bindings that can be created by a pattern constraint: an empty binding, and a universal binding.

Empty Bindings

The empty bindings occurs when no tuples match. For instance, the binding [?p :type :dog] results in the empty seq: [].

Once an empty bindings appears in a conjunctive join expression, then the total result will be empty.

Universal Bindings

The universal bindings occurs when a pattern matches without any variables being bound. For instance [:ziggy :name "Ziggy"] matches exactly one tuple. This creates a single valid binding, despite no variables appearing: [[]]

This value is a seq that is not empty, despite not containing any data.

A conjunctive join of any expression to a universal binding is itself, meaning that a join to a conjunctive binding is an identity operation.

OR Constraint

The OR Constraint is also called a Disjunction. This type of constraint is made of a list of other constraints. It takes the results of each of the other constraints and forms a union. The variables bound by each of the component constraints must all be the same.

Example: Using the data above, the constraint:

(or [?p :type :cat] [?p :name "Anne"])

Results in bindings of:

[[?p = :ziggy] [?p = :anne]]

AND Constraint

The AND Constraint is also called a Conjunction. This type of constraint is made of a list of other constraints. The result is an inner join of constraints, where matching variables between constraint bindings must match to be considered in the final result. For each match, all the variables from each of the component constraints get bound. If there are no matching variables, then the result is an outer product.

Example: Using the data above, the constraint:

(and [?p :type :cat] [?p :name ?name])

Specifies a conjunction between bindings of:

[[?p = :ziggy]] and

[[?p = :pete, ?name = "Peter"] [?p = :anne, ?name = "Anne"] [?p = :ziggy, ?name = "Ziggy"]]

These constraints share the variable ?p which is only equal for a single value: [[?p = :ziggy, ?name = "Ziggy"]]

Note: The top level :where clause has an implicit AND constraint on it.

NOT Constraint

The NOT constraint is made of a list of other constraints, and should be used in a conjunction against other constraints. It performs a conjunction on its arguments to create a binding, and this is used to remove bindings from the outer conjunction.

Example:

[?p :name ?name] (not [?p :type :cat])

This is a compound constraint, formed by an implicit conjunction between a pattern constraint and a NOT constraint. The pattern constraint matches the three :name tuples, leading to the bindings:

[[?p = :pete, ?name = "Peter"] [?p = :anne, ?name = "Anne"] [?p = :ziggy, ?name = "Ziggy"]]

Meanwhile, the pattern inside the NOT constraint leads to the bindings: [[?p = :ziggy]]. This matches the other bindings with the ?p variable, so every bindings where there is a matching ?p will be removed. The result is:

[[?p = :pete, ?name = "Peter"] [?p = :anne, ?name = "Anne"]]

OPTIONAL Constraint

This is similar to the AND constraint. The OPTIONAL expression contains a series of constraints with create bindings as normal. However, the difference is that bindings are not required for a result to be returned. This is referred to as a Left Outer Join in relational databases.

Example: A graph has the following tuples:

[:pete :type :person]
[:pete :name "Peter"]
[:pete :email "peter@example.com"]
[:anne :type :person]
[:anne :name "Anne"]
[:anne :email "anne@ex.net"]
[:ziggy :type :cat]
[:ziggy :name "Ziggy"]

The pattern sequence:

[?p :name ?name] [?p :email ?email]

Results in the bindings:

[[?p = :pete, ?name = "Peter", ?email = "peter@example.com"]
 [?p = :anne, ?name = "Anne", ?email = "anne@ex.net"]]

This does not include any information about Ziggy because he doesn't have an email address. This can be addressed with the pattern sequence:

[?p :name ?name] (optional [?p :email ?email])

This results in the bindings:

[[?p = :pete, ?name = "Peter", ?email = "peter@example.com"]
 [?p = :anne, ?name = "Anne", ?email = "anne@ex.net"]
 [?p = :ziggy, ?name = "Ziggy", ?email = nil]]

Filter Constraint

This kind of constraint runs code against bound variables. Whenever the code returns a false or nil value, then the associated bindings are removed.

Example:

[?p :name ?name] [(re-find #"e" ?name)]

This is a regular expression search that looks for the letter "e". The ?name variable only contains an "e" when it is set to "Peter" or "Anne". So the result is:

[[?p = :pete, ?name = "Peter"] [?p = :anne, ?name = "Anne"]]

There are a few things to note about the functions:

  • All functions in clojure.core are available.
  • Functions may not be nested.
  • In Clojure, any function in scope can be accessed. There have been inconsistency issues using using eval on ClojureScript, so most functions are not directly accessible.
  • The function can be a variable. This means that a novel function can be passed in via the :in clause.

Binding

A Binding is not actually a constraint. Instead, it binds a new value based on the output of a Clojure/ClojureScript expression. The expressions follow the same rules as for Filter Constraints. Once a new variable has been bound, other constraints and bindings can refer to this variable

Example:

[?p :name ?name] [(count ?name) ?name-length]

After the first pattern has been resolved, the bindings are:

[[?p = :pete, ?name = "Peter"] [?p = :anne, ?name = "Anne"] [?p = :ziggy, ?name = "Ziggy"]]

The binding expression adds a new variable to each binding:

[[?p = :pete, ?name = "Peter", ?name-length = 5] [?p = :anne, ?name = "Anne", ?name-length = 4] [?p = :ziggy, ?name = "Ziggy", ?name-length = 5]]

Examples

Start by loading some example movie data:

(require '[asami.core :as d :refer [q]])
(def db-uri "asami:mem://movie")
(d/create-database db-uri)
(def conn (d/connect db-uri))
(def first-movies [{:movie/title "Explorers"
                    :movie/genre "adventure/comedy/family"
                    :movie/release-year 1985}
                   {:movie/title "Demolition Man"
                    :movie/genre "action/sci-fi/thriller"
                    :movie/release-year 1993}
                   {:movie/title "Johnny Mnemonic"
                    :movie/genre "cyber-punk/action"
                    :movie/release-year 1995}
                   {:movie/title "Toy Story"
                    :movie/genre "animation/adventure/comedy"
                    :movie/release-year 1995
                    :movie/sequel "Toy Story 2"}
                   {:movie/title "Sense and Sensibility"
                    :movie/genre "drama/romance"
                    :movie/release-year 1995}])

@(d/transact conn {:tx-data first-movies})  ;; dereference with @ to ensure the transaction completes
(def db (d/db conn))

Simple pattern constraints. Get the names of movies released in 1995:

=> (q '[:find ?name :where [?m :movie/release-year 1995][?m :movie/title ?name]] db)
(["Johnny Mnemonic"] ["Toy Story"] ["Sense and Sensibility"])

Projecting a single variable as a sequence. Get the names of movies released in 1995:

=> (q '[:find [?name ...] :where [?m :movie/release-year 1995][?m :movie/title ?name]] db)
("Johnny Mnemonic" "Toy Story" "Sense and Sensibility")

Filter constraint. Get the names and years of movies released after 1990:

=> (q '[:find ?name ?year :where [?m :movie/release-year ?year][?m :movie/title ?name][(> ?year 1990)]] db)
(["Demolition Man" 1993] ["Johnny Mnemonic" 1995] ["Toy Story" 1995] ["Sense and Sensibility" 1995])

Filter, using external data. Get the names of movies that include "comedy". This needs a regex, but the parser has trouble with this right now, so just pass it as an extra parameter:

=> (q '[:find [?name ...] :in $ ?re :where [?m :movie/title ?name][?m :movie/genre ?genre][(re-find ?re ?genre)]] db #"comedy")
("Explorers" "Toy Story")

Single return value. What is the one movie released in 1985?

=> (q '[:find ?name . :where [?m :movie/title ?name][?m :movie/release-year 1985]] db)
"Explorers"

Providing bindings for a variable. What are the movies released in the following list of years: 1985, 1990, 1993

=> (q '[:find [?name ...] :in $ [?year ...] :where [?m :movie/title ?name][?m :movie/release-year ?year]] db [1985 1990 1993])
("Explorers" "Demolition Man")

OR constraint. What are the movies released in 1993 or 1995:

=> (q '[:find [?name ...] :where [?m :movie/title ?name] (or [?m :movie/release-year 1993] [?m :movie/release-year 1995])] db)
("Demolition Man" "Johnny Mnemonic" "Toy Story" "Sense and Sensibility")

NOT constraint, Map form. 1985 wasn't a very funny year. What are the comedies that were not in 1985? The query is getting long, so break it up and pass the clauses in a map.

=> (q '{:find [[?name ...]]
        :in [$ ?comedy]
        :where [[?m :movie/title ?name]
                [?m :movie/genre ?genre]
                [(re-find ?comedy ?genre)]
                (not [?m :movie/release-year 1985])]}
      db #"comedy")
("Toy Story")

OPTIONAL constraint. What are the movies released in 1995, and their sequels, if they have one:

=> (q '[:find ?name ?sequel :where [?m :movie/title ?name] [?m :movie/release-year 1995] (optional [?m :movie/sequel ?sequel])] db)
(["Toy Story" "Toy Story 2"] ["Sense and Sensibility" nil] ["Johnny Mnemonic" nil])

Query Planning

By default, Asami will try to determine the best order to execute constraint, in order to minimize process time and memory. This is based on both the structure of the query, and the contents of the storage. However, there are occasions when a query may execute in an unexpected order.

In order to see the execution strategy of a query, use the show-plan function. This accepts identical parameters to the q query function:

=>  (d/show-plan '[:find ?name ?sequel :where [?m :movie/title ?name] [?m :movie/release-year 1995] (optional [?m :movie/sequel ?sequel])] db)
{:plan ([?m :movie/release-year 1995] (optional [?m :movie/sequel ?sequel]) [?m :movie/title ?name])}

The returned value is structured Clojure data.

The plan can be overridden by providing an option to select the :user planner:

=>  (d/show-plan '[:find ?name ?sequel :where [?m :movie/title ?name] [?m :movie/release-year 1995] (optional [?m :movie/sequel ?sequel])] db :planner :user)
{:plan ([?m :movie/title ?name] [?m :movie/release-year 1995] (optional [?m :movie/sequel ?sequel]))}