-
Notifications
You must be signed in to change notification settings - Fork 29
6. Querying
Asami queries can be provided as a:
- Sequence
- String
- Map
Internally, all queries will be converted to the map form, but the other formats may be more useful in different contexts. For consistency, the sequence form will typically be shown.
Queries have the form:
find-clause [ in-clause] [ with-clause ] where-clause
In general, Asami queries are similar to Datomic queries, but with reduced functionality.
This clause starts with the keyword :find
followed by the elements to be returned from a query. If using the map form, then use the key :find
with the elements in a sequence.
find-clause := ':find' (find-element+ | find-sequence | find-tuple | find-scalar)
find-sequence := '[' variable '...' ']'
find-tuple := '[' variable+ ']'
find-scalar := variable '.'
find-element := variable | aggregate
variable := '?'[A-Za-z0-9-_*]+ (a symbol starting with ?)
aggregate := aggregate-operator '(' variable ')'
aggregate-operator := 'sum' | 'count' | 'avg' | 'max' | 'min' | 'first' | 'last'
Note: find-sequence, find-tuple, and find-scalar should all wrap find-element rather than variable. This will be addressed in Issue #51
The elements in a :find
clause describe the data to be returned from a query. These elements select the data that is identified in :where
clause. It is an error for a variable to appear in the :find
clause if it does not appear in the :where
clause.
The results from a query will have different types, depending on the form of the :find
clause:
Form | Example | Return Type |
---|---|---|
find-element+ | ?a ?b ?c |
Sequence of vector |
find-sequence | [?a ...] |
Sequence of scalars |
find-tuple | [?a ?b] |
Single vector of scalars |
find-scalar | ?a . |
Single scalar value |
- Elements: This is the basic query form, and simply projects the selected items, with one vector containing these values for every result row. While rarely used, results have metadata with includes the variable name for each column.
- Sequence: Similar to a standard selection with a single variable. However, every row returns just a single value instead of a vector containing a value.
- Tuple: This packages a single result row as a vector, without the sequence that would usually wrap the rows. Any result rows after the first will be ignored.
- Scalar: Returns a single value from the first result row. Any result rows after the first will be ignored.
The find clause may also include aggregate expressions. These return the aggregate value like any other variable.
Example:
Consider the following set of results from the find clause: :find ?student ?subject
?student | ?subject |
---|---|
"Alice" | "PH101" |
"Alice" | "PH102" |
"Bob" | "ME100" |
"Bob" | "ME101" |
"Bob" | "E3101" |
If this clause is changed to: :find ?student (count ?subject)
Then the results will change to:
?student | ?count-subject |
---|---|
"Alice" | 2 |
"Bob" | 3 |
Note that the column name (found in the metadata of the results) is automatically generated from the operator and the variable.
Available aggregate operators are:
Operator | Description |
---|---|
count |
The number of matching results. |
sum |
The sum of all values. Numerical data only. |
avg |
The average of all values. Numerical data only. |
max |
The maximum value. Data must be comparable. |
min |
The minimum value. Data must be comparable. |
first |
The first value in the results. |
last |
The last value in the results. |
This clause starts with the keyword :in
followed by the elements for describing data sources. If using the map form, then use the key :in
with the elements in a sequence.
in-clause := ':in' in-element+
find-element := '$' | variable | in-tuple | in-sequence
in-sequence := '[' variable '...' ']'
in-tuple := '[' variable+ ']'
variable := '?'[A-Za-z0-9-_*]+ (a symbol starting with ?)
This describes the sources for executing a query. These sources are provided as arguments to the query, and will be bound to the variables in the :in
clause. The variables will then have these values associated with them throughout the processing of the query.
Each element of an :in
clause will be aligned with the trailing arguments to the asami.core/q
query function, to provide the data sources for the query. If no :in
clause is provided, then the query will use presume that a single argument will be provided, which will be bound to the default database for the query. This is an equivalent clause to: :in $
The default database is the source that tuple-patterns in the :where
clause use to resolve their data. Most queries should have this available.
The elements to be bound may be:
Element | Example | Description |
---|---|---|
$ |
$ |
A special variable indicating the default database is in this position |
variable | ?a |
A single scalar value |
sequence | [?a ...] |
The bound value is a sequence of values to be bound to the variable |
tuple | [?a ?b] |
A bound value is a tuple of values that will be assigned to each of the vars |
Example:
The clause:
:in $ ?a [?b ?c] [?d ...]
Can have parameters passed:
db "Fred" ["car" 5] [1 2 3 4 5]
This will define:
- The default database is
db
- The variable
?a
is set to the string "Fred" - The variable
?b
is set to the string "car", and?c
is set to 5 - The variable
?d
is bound to the series[1 2 3 4 5]
. Anything connecting to?d
will be considered against each of these values.
This clause starts with the keyword :with
followed by the variables to be considered during aggregation. If using the map form, then use the key :with
with the elements in a sequence.
with-clause := ':with' variable+
variable := '?'[A-Za-z0-9-_*]+ (a symbol starting with ?)
When aggregate queries are executed on a variable, as much of the query as possible is executed without that variable leading to partial results. The aggregate is then executed for each one of these partial results. If any variables should not appear in the initial step, and should instead be considered in each execution of the aggregate, then they will be listed in the :with
clause.
Example: Given the following data:
{:db/ident "f" :label "female"}
{:db/ident "m" :label "male"}
{:db/ident "alice", :name "Alice", :gender {:db/ident "f"}}
{:db/ident "barbara", :name "Barbara", :gender {:db/ident "f"}}
{:db/ident "cary", :name "Cary", :gender {:db/ident "m"}}
{:db/ident "antoine", :name "Antoine", :age 16, :gender {:db/ident "m"}}
{:db/ident "betty", :name "Betty", :age 14, :gender {:db/ident "f"}}
{:db/ident "chuck", :name "Chuck", :age 12, :gender {:db/ident "m"}}
{:db/ident "ann", :name "Ann", :age 13, :gender {:db/ident "f"}}
{:db/ident "bob", :name "Bob", :age 11, :gender {:db/ident "m"}}
[:db/add "alice" :child "antoine"]
[:db/add "alice" :child "betty"]
[:db/add "alice" :child "chuck"]
[:db/add "barbara" :child "ann"]
[:db/add "barbara" :child "bob"]
[:db/add "cary" :child "ann"]
[:db/add "cary" :child "bob"]
This describes:
- Alice has 3 children: Antione, Betty, and Chuck.
- Barbara has 2 children: Ann and Bob.
- Cary has the same 2 children as Barbara: Ann and Bob.
Selecting the number of children with female parents:
:find (count ?child) :where [?parent :gender ?f] [?f :label "female"] [?parent :child ?child]
This results in:
?count-child |
---|
5 |
This can be changed to group the children by parent if the ?parent
variable is considered in the aggregation:
:find (count ?child) :with ?parent :where [?parent :gender ?f] [?f :label "female"] [?parent :child ?child]
?count-child |
---|
3 |
2 |
This clause starts with the keyword :where
followed by the constraints of a query. If using the map form, then use the key :where
with the elements in a sequence.
with-clause := ':where' (constraint | binding)+
constraint := pattern-constraint | or-constraint | and-constraint | not-constraint | optional-constraint | filter-constraint
or-constraint := '(or' constraint+ ')'
and-constraint := '(and' constraint+ ')'
not-constraint := '(not' constraint+ ')'
optional-constraint := '(optional' constraint+ ')'
filter-constraint := '[(' predicate arg* ')]'
pattern-constraint := '[' element [attribute-element [element]] ']'
binding := '[(' arg arg+ ')' variable ']'
attribute-element := element[ '*' | '+' ]
element := variable | scalar-value
arg := variable | scalar-value | symbol
scalar-value := [-][1-9][0-9]* | '"' .* '"' | keyword (i.e. numbers, strings or keywords)
keyword := ':' [label '/'] label
symbol := [label '/'] label
label := [ [a-zA-Z_-.?*!] ] [a-zA-Z0-9_-.?*!]* (note: not the quote character)
variable := '?'[A-Za-z0-9-_*]+ (a symbol starting with ?)
Formally speaking, every result of a query is based on setting the variables to those values where the constraints are valid.
In practice, this is done by selecting every set of allowed values, and removing those that are not allowed. Each set of allowed values for the variables is called a binding. The different types of constraints provide different mechanisms for selecting these bindings.
These form a pattern which will be matched against the default graph. The graph is made up of tuples which describe an "edge" as a connection between two nodes: node edge-label node
Pattern constraints have up to 3 elements which are either scalar values or variables. The constraint will match to every tuple in the graph that has the same scalar values in the same positions. The result is the variables of the constraint being bound to the values in the matching positions for each matching tuple.
Example: A graph has the following tuples:
[:pete :type :person]
[:pete :name "Peter"]
[:anne :type :person]
[:anne :name "Anne"]
[:ziggy :type :cat]
[:ziggy :name "Ziggy"]
The pattern:
[?p :type :person]
Will match 2 tuples:
[:pete :type :person]
[:anne :type :person]
The result is that ?p
gets 2 bindings of:
[[?p = :pete] [?p = :anne]]
The pattern:
[?p :name ?name]
Matches 3 tuples:
[:pete :name "Peter"]
[:anne :name "Anne"]
[:ziggy :name "Ziggy"]
The result is that ?p
and ?name
both get bindings:
[[?p = :pete, ?name = "Peter"] [?p = :anne, ?name = "Anne"] [?p = :ziggy, ?name = "Ziggy"]]
Notice how these bindings appear in pairs. When ?p
is :anne
, then ?name
is "Anne"
. When ?p
changes, then so must ?name
. In this way, each binding forms a set.
Attributes (also called Properties, Predicates, or Edges) can refer to relationships between entities that are transitive. An example of a transitive relationship is "is smaller than". So if a mouse is smaller than a cat, and a cat is smaller than a goat, then the mouse is smaller than a goat. Similarly, if the goat is smaller than an elephant, then so is the mouse and the cat.
We can see this with some simple data:
(def data
[{:db/id -1 :name "Washington Monument"}
{:db/id -2 :name "National Mall"}
{:db/id -3 :name "Washington, DC"}
{:db/id -4 :name "USA"}
{:db/id -5 :name "Earth"}
{:db/id -6 :name "Solar System"}
{:db/id -7 :name "Orion-Cygnus Arm"}
{:db/id -8 :name "Milky Way Galaxy"}
{:db/id -9 :db/ident 9 :name "Arlington"}
{:db/id -10 :db/ident 10 :name "Falls Church"}
[:db/add -1 :is-in -2]
[:db/add -2 :is-in -3]
[:db/add -3 :is-in -4]
[:db/add -4 :is-in -5]
[:db/add -5 :is-in -6]
[:db/add -6 :is-in -7]
[:db/add -7 :is-in -8]
[:db/add -9 :neighbor -3]
[:db/add -10 :neighbor -9]])
We can see what "Washington Monument" is in:
=> (d/q '[:find [?name ...] :where [?e :name "Washington Monument"] [?e :is-in ?e2] [?e2 :name ?name]] (d/db conn))
("National Mall")
We can also follow that attribute transitively by appending the :is-in
attribute with a +
character:
=> (d/q '[:find [?name ...] :where [?e :name "Washington Monument"] [?e :is-in+ ?e2] [?e2 :name ?name]] (d/db conn))
("National Mall" "Washington, DC" "USA" "Earth" "Solar System" "Orion-Cygnus Arm" "Milky Way Galaxy")
To see everything starting at the Washington Monument, we want to follow the :is-in
attribute zero steps or more. This is done by appending a *
character to the attribute:
=> (d/q '[:find [?name ...] :where [?e :name "Washington Monument"] [?e :is-in* ?e2] [?e2 :name ?name]] (d/db conn))
("Washington Monument" "National Mall" "Washington, DC" "USA" "Earth" "Solar System" "Orion-Cygnus Arm" "Milky Way Galaxy")
We may also want want to follow any kind of attribute. To do this, use a variable attribute:
=> (d/q '[:find ?name :where [?e :name "Falls Church"][?e ?a* ?e2][?e2 :name ?name]] (d/db conn))
(["Falls Church"] ["Arlington"] ["Solar System"] ["Washington, DC"] ["USA"] ["Orion-Cygnus Arm"] ["Earth"] ["Milky Way Galaxy"])
As a special case, if you know 2 specific nodes in the system, you can ask for the path between them.
=> (d/q '[:find ?e . :where [?e :name "Falls Church"]] (d/db conn))
:tg/node-10736
=> (d/q '[:find ?e . :where [?e :name "Milky Way Galaxy"]] (d/db conn))
:tg/node-10734
=> (d/q '[:find ?a . :where [:tg/node-10736 ?a+ :tg/node-10734]] (d/db conn))
[:neighbor :neighbor :is-in :is-in :is-in :is-in :is-in]
Notice that this is a single binding value that contains a vector value!
Alternatively, the entity/value ends of the constraint can be defined earlier in the query, but this is not necessarily guaranteed without specifying the order of operations. Typically, if both ends of the transitive constraint are not bound, then only the first step will be returned:
=> (d/q '[:find ?a ?name :where [?e :name "Falls Church"][?e2 :name ?name][?e ?a* ?e2]] (d/db conn))
([nil "Falls Church"]
[:neighbor "Arlington"]
[:neighbor "Solar System"]
[:neighbor "Washington, DC"]
[:neighbor "USA"]
[:neighbor "Orion-Cygnus Arm"]
[:neighbor "Earth"]
[:neighbor "Milky Way Galaxy"])
However, if the order is enforced (using the :user
query planner) to ensure that the transitive attribute is evaluated last, then full paths are returned:
=> (d/q '[:find ?a ?name :where [?e :name "Falls Church"][?e2 :name ?name][?e ?a* ?e2]] (d/db conn) :planner :user)
([[:neighbor :neighbor :is-in :is-in :is-in :is-in] "Orion-Cygnus Arm"]
[[:neighbor :neighbor :is-in :is-in :is-in] "Solar System"]
[[:neighbor :neighbor :is-in :is-in] "Earth"]
[[:neighbor :neighbor] "Washington, DC"]
[[:neighbor :neighbor :is-in] "USA"]
[[:neighbor :neighbor :is-in :is-in :is-in :is-in :is-in] "Milky Way Galaxy"]
[[:neighbor] "Arlington"])
The _
symbol can be used in a pattern constraint to indicate a position where the result is "don't care". It acts the same as a variable, but the result is ignored.
If a pattern constraint has fewer than 3 elements, then this is the equivalent to a full 3-element constraint where the remaining elements are _
. The following two constraints are exactly equivalent:
[?e :tg/entity]
[?e :tg/entity _]
This indicates that the first position binds values to the variable ?e
, while the second must match :tg/entity
. The third position can be anything.
Similarly, the following 2 patterns are identical:
[?e]
[?e _ _]
This will return every value that forms the start of an edge in the graph.
There are also 2 special types of bindings that can be created by a pattern constraint: an empty binding, and a universal binding.
The empty bindings occurs when no tuples match. For instance, the binding [?p :type :dog]
results in the empty seq: []
.
Once an empty bindings appears in a conjunctive join expression, then the total result will be empty.
The universal bindings occurs when a pattern matches without any variables being bound. For instance [:ziggy :name "Ziggy"]
matches exactly one tuple. This creates a single valid binding, despite no variables appearing: [[]]
This value is a seq that is not empty, despite not containing any data.
A conjunctive join of any expression to a universal binding is itself, meaning that a join to a conjunctive binding is an identity operation.
The OR Constraint is also called a Disjunction. This type of constraint is made of a list of other constraints. It takes the results of each of the other constraints and forms a union. The variables bound by each of the component constraints must all be the same.
Example: Using the data above, the constraint:
(or [?p :type :cat] [?p :name "Anne"])
Results in bindings of:
[[?p = :ziggy] [?p = :anne]]
The AND Constraint is also called a Conjunction. This type of constraint is made of a list of other constraints. The result is an inner join of constraints, where matching variables between constraint bindings must match to be considered in the final result. For each match, all the variables from each of the component constraints get bound. If there are no matching variables, then the result is an outer product.
Example: Using the data above, the constraint:
(and [?p :type :cat] [?p :name ?name])
Specifies a conjunction between bindings of:
[[?p = :ziggy]]
and
[[?p = :pete, ?name = "Peter"] [?p = :anne, ?name = "Anne"] [?p = :ziggy, ?name = "Ziggy"]]
These constraints share the variable ?p
which is only equal for a single value: [[?p = :ziggy, ?name = "Ziggy"]]
Note: The top level :where
clause has an implicit AND constraint on it.
The NOT constraint is made of a list of other constraints, and should be used in a conjunction against other constraints. It performs a conjunction on its arguments to create a binding, and this is used to remove bindings from the outer conjunction.
Example:
[?p :name ?name] (not [?p :type :cat])
This is a compound constraint, formed by an implicit conjunction between a pattern constraint and a NOT constraint. The pattern constraint matches the three :name
tuples, leading to the bindings:
[[?p = :pete, ?name = "Peter"] [?p = :anne, ?name = "Anne"] [?p = :ziggy, ?name = "Ziggy"]]
Meanwhile, the pattern inside the NOT constraint leads to the bindings: [[?p = :ziggy]]
. This matches the other bindings with the ?p
variable, so every bindings where there is a matching ?p
will be removed. The result is:
[[?p = :pete, ?name = "Peter"] [?p = :anne, ?name = "Anne"]]
This is similar to the AND
constraint. The OPTIONAL
expression contains a series of constraints with create bindings as normal. However, the difference is that bindings are not required for a result to be returned. This is referred to as a Left Outer Join in relational databases.
Example: A graph has the following tuples:
[:pete :type :person]
[:pete :name "Peter"]
[:pete :email "peter@example.com"]
[:anne :type :person]
[:anne :name "Anne"]
[:anne :email "anne@ex.net"]
[:ziggy :type :cat]
[:ziggy :name "Ziggy"]
The pattern sequence:
[?p :name ?name] [?p :email ?email]
Results in the bindings:
[[?p = :pete, ?name = "Peter", ?email = "peter@example.com"]
[?p = :anne, ?name = "Anne", ?email = "anne@ex.net"]]
This does not include any information about Ziggy because he doesn't have an email address. This can be addressed with the pattern sequence:
[?p :name ?name] (optional [?p :email ?email])
This results in the bindings:
[[?p = :pete, ?name = "Peter", ?email = "peter@example.com"]
[?p = :anne, ?name = "Anne", ?email = "anne@ex.net"]
[?p = :ziggy, ?name = "Ziggy", ?email = nil]]
This kind of constraint runs code against bound variables. Whenever the code returns a false
or nil
value, then the associated bindings are removed.
Example:
[?p :name ?name] [(re-find #"e" ?name)]
This is a regular expression search that looks for the letter "e". The ?name
variable only contains an "e" when it is set to "Peter" or "Anne". So the result is:
[[?p = :pete, ?name = "Peter"] [?p = :anne, ?name = "Anne"]]
There are a few things to note about the functions:
- All functions in
clojure.core
are available. - Functions may not be nested.
- In Clojure, any function in scope can be accessed. There have been inconsistency issues using using
eval
on ClojureScript, so most functions are not directly accessible. - The function can be a variable. This means that a novel function can be passed in via the
:in
clause.
A Binding is not actually a constraint. Instead, it binds a new value based on the output of a Clojure/ClojureScript expression. The expressions follow the same rules as for Filter Constraints. Once a new variable has been bound, other constraints and bindings can refer to this variable
Example:
[?p :name ?name] [(count ?name) ?name-length]
After the first pattern has been resolved, the bindings are:
[[?p = :pete, ?name = "Peter"] [?p = :anne, ?name = "Anne"] [?p = :ziggy, ?name = "Ziggy"]]
The binding expression adds a new variable to each binding:
[[?p = :pete, ?name = "Peter", ?name-length = 5] [?p = :anne, ?name = "Anne", ?name-length = 4] [?p = :ziggy, ?name = "Ziggy", ?name-length = 5]]
Start by loading some example movie data:
(require '[asami.core :as d :refer [q]])
(def db-uri "asami:mem://movie")
(d/create-database db-uri)
(def conn (d/connect db-uri))
(def first-movies [{:movie/title "Explorers"
:movie/genre "adventure/comedy/family"
:movie/release-year 1985}
{:movie/title "Demolition Man"
:movie/genre "action/sci-fi/thriller"
:movie/release-year 1993}
{:movie/title "Johnny Mnemonic"
:movie/genre "cyber-punk/action"
:movie/release-year 1995}
{:movie/title "Toy Story"
:movie/genre "animation/adventure/comedy"
:movie/release-year 1995
:movie/sequel "Toy Story 2"}
{:movie/title "Sense and Sensibility"
:movie/genre "drama/romance"
:movie/release-year 1995}])
@(d/transact conn {:tx-data first-movies}) ;; dereference with @ to ensure the transaction completes
(def db (d/db conn))
Simple pattern constraints. Get the names of movies released in 1995:
=> (q '[:find ?name :where [?m :movie/release-year 1995][?m :movie/title ?name]] db)
(["Johnny Mnemonic"] ["Toy Story"] ["Sense and Sensibility"])
Projecting a single variable as a sequence. Get the names of movies released in 1995:
=> (q '[:find [?name ...] :where [?m :movie/release-year 1995][?m :movie/title ?name]] db)
("Johnny Mnemonic" "Toy Story" "Sense and Sensibility")
Filter constraint. Get the names and years of movies released after 1990:
=> (q '[:find ?name ?year :where [?m :movie/release-year ?year][?m :movie/title ?name][(> ?year 1990)]] db)
(["Demolition Man" 1993] ["Johnny Mnemonic" 1995] ["Toy Story" 1995] ["Sense and Sensibility" 1995])
Filter, using external data. Get the names of movies that include "comedy". This needs a regex, but the parser has trouble with this right now, so just pass it as an extra parameter:
=> (q '[:find [?name ...] :in $ ?re :where [?m :movie/title ?name][?m :movie/genre ?genre][(re-find ?re ?genre)]] db #"comedy")
("Explorers" "Toy Story")
Single return value. What is the one movie released in 1985?
=> (q '[:find ?name . :where [?m :movie/title ?name][?m :movie/release-year 1985]] db)
"Explorers"
Providing bindings for a variable. What are the movies released in the following list of years: 1985, 1990, 1993
=> (q '[:find [?name ...] :in $ [?year ...] :where [?m :movie/title ?name][?m :movie/release-year ?year]] db [1985 1990 1993])
("Explorers" "Demolition Man")
OR constraint. What are the movies released in 1993 or 1995:
=> (q '[:find [?name ...] :where [?m :movie/title ?name] (or [?m :movie/release-year 1993] [?m :movie/release-year 1995])] db)
("Demolition Man" "Johnny Mnemonic" "Toy Story" "Sense and Sensibility")
NOT constraint, Map form. 1985 wasn't a very funny year. What are the comedies that were not in 1985? The query is getting long, so break it up and pass the clauses in a map.
=> (q '{:find [[?name ...]]
:in [$ ?comedy]
:where [[?m :movie/title ?name]
[?m :movie/genre ?genre]
[(re-find ?comedy ?genre)]
(not [?m :movie/release-year 1985])]}
db #"comedy")
("Toy Story")
OPTIONAL constraint. What are the movies released in 1995, and their sequels, if they have one:
=> (q '[:find ?name ?sequel :where [?m :movie/title ?name] [?m :movie/release-year 1995] (optional [?m :movie/sequel ?sequel])] db)
(["Toy Story" "Toy Story 2"] ["Sense and Sensibility" nil] ["Johnny Mnemonic" nil])
By default, Asami will try to determine the best order to execute constraint, in order to minimize process time and memory. This is based on both the structure of the query, and the contents of the storage. However, there are occasions when a query may execute in an unexpected order.
In order to see the execution strategy of a query, use the show-plan
function. This accepts identical parameters to the q
query function:
=> (d/show-plan '[:find ?name ?sequel :where [?m :movie/title ?name] [?m :movie/release-year 1995] (optional [?m :movie/sequel ?sequel])] db)
{:plan ([?m :movie/release-year 1995] (optional [?m :movie/sequel ?sequel]) [?m :movie/title ?name])}
The returned value is structured Clojure data.
The plan can be overridden by providing an option to select the :user
planner:
=> (d/show-plan '[:find ?name ?sequel :where [?m :movie/title ?name] [?m :movie/release-year 1995] (optional [?m :movie/sequel ?sequel])] db :planner :user)
{:plan ([?m :movie/title ?name] [?m :movie/release-year 1995] (optional [?m :movie/sequel ?sequel]))}