Permalink
Fetching contributors…
Cannot retrieve contributors at this time
754 lines (614 sloc) 25.6 KB

Contents

Namespace: thi.ng.fabric.facts.dsl

Introduction

Whilst the query vertex tree functionality provided by the fabric-facts core module is powerful, it’s somewhat “low-level”, imperative and verbose to use. Therefore, this namespace provides a more userfriendly, succinct “high level” layer to construct query trees in a completely declarative way and exposes further functionality, like an expression language to define result filters, query variable projection, variable injections, aggregations, sorting etc.

The overall approach to define queries is based on the idea of matching patterns in the graph, usually by specifying a mixture of qvars (query variables) and fixed literals and/or other constraint expressions. The query spec DSL is highly inspired by the W3C SPARQL language, albeit expressed in a more Clojuresque way and not limited to RDF semantics.

Query specs are described as standard Clojure maps and therefore offer all the usual benefits including composability & EDN serialization. The latter therefore also allows query specs to be stored in the graphs as part of facts and then be able to define queries returning query specs (e.g. for UI templating/widgets), thus enabling a completely dynamic and datadriven approach to design applications/UIs.

The expression language part of the DSL is responsible to compile variable lookups and result processing expressions into actual Clojure functions. The language is extensible via its multi-method implementation. See section further below for details.

Public API

The only public part of this namespace consists of this single function:

(defn add-query-from-spec!
  [g spec]
  (-> (compile-query g spec)
      (compile-query-result g spec)))

Examples

Dozens of examples can be found in the test namespace for this module. For a general overview of query patterns and supported query types, please also see the Queries section in the thi.ng.fabric.facts.core namespace.

(dsl/add-query-from-spec!
 graph
 '{:q         [{:where [[?res schema ?schema] [?res type ?type]]}
               {:optional [[?res title ?title]]}]
   :filter    (or (= ?res toxi) (= ?type person))
   :group-by  [?type ?res]
   :aggregate {?num (agg-count ?res)}
   :order     ?type})

Via the thi.ng.fabric.facts.queryviz namespace, such query specs can also be visualized (using Graphviz). The following diagram is a visual representation of the query above:

../../assets/qviz-dsl-ex01a.png

Query specification

A query spec map can have the following toplevel keys. Only the =:q= key is required, all others are optional.

KeyPossible valuesDefaultDescription
:qVectornoneVector of sub-queries (see below)
:aggregateaggregation fn / DSL exprnoneInject new qvars of aggregated values
:bind{?qvar fn}noneInject new qvars in result maps
:filterpredicate fn / DSL exprnoneKeep only result maps for which fn returns truthy
:group-byfn, qvar, qvar vec, DSL exprnoneGrouping criteria for results, causes result map
:limitpositive intnoneLimit results to max. N items
:orderfn, qvar, qvar vec, DSL exprnoneSort results by given key fn
:select:* or [?a ?b ..]:*Keep all or only given qvars in result maps
:uniquetrue / falsefalseEnsure all qvars in each result map have unique vals
:values{?qvar #{v1 v2} …}nonePre-bind qvar to set of possible values

Sub-query types

The DSL supports the following sub-query types. Each type (apart from :path) takes a number of query patterns and always computes their joined result set. The sub-query types only differ in how each sub-query’s results are merged with those of others. Each sub-query can also specify a number of options to modify its result set. These options are described in this section.

:where - join queries

Standard join with other sub-query results.

;; match any two people knowing eachother
;; but only if we know both of their nicknames too
{:q [{:where [[?a knows ?b]
              [?a nickname ?anick]
              [?b nickname ?bnick]]}]}

:optional - optional join queries

Optional join with other sub-query results. Since on the RHS is the optional part in this join type, this type should not be used as first sub-query in a main spec.

Examples:

;; match people acquaintances
;; optionally match 1st person's nickname and/or age
{:q [{:where [[?a knows ?b]]}
     {:optional [[?a nickname ?nick]]}
     {:optional [[?a age ?age]]}]}

Important: Note the use of two separate :optional sub-queries. If both the optional ?nick and ?age where part of the same sub-query, then only ?a’s with both ?nick and ?age would be matched. By keeping them separate, this query will also match ?a’s with only ?nick or ?age values.

:union - query unions

A query union simply merges its own result maps with that of other sub-queries. This query type too should not be used as first sub-query in a main spec.

Important: This type is only meant to merge distinct result sets without any shared qvars!

Examples:

;; return any "knowing" and friend relationships
{:q [{:where [[?a knows ?b]]}
     {:union [[?e friend ?f]]}]}

:minus - query negation (pattern based result removal)

;; remove matching triples from existing result set:
;; first match all "knows" relationships, but
;; remove any results where ?b is bill's friend
{:q [{:where [[?a knows ?b]]}
     {:minus [[?b friend bill]]}]}

:path - bounded path queries

A sub-query of this type takes a single path query pattern and options defining the accepted min/max path lengths:

Examples:

;; match grandparent rels
{:q [{:path [?gp [parent-of parent-of] ?gc]}]}

;; match ancestor rels w/ path lens 2-4
{:q [{:path [?a [parent-of] ?b] :min 2 :max 4}]}

;; match friends of children (and optionally their friends)
{:q [{:path [?p [parent-of friend friend] ?f] :min 2 :max 3}]}

Query & sub-query options

Unless noted otherwise, query options have the same semantics regardless if used at the top-level or for individual sub-queries. In the latter case, of course they only apply to a single sub-query.

Use of these modifiers can have a positive impact on down-stream joins and other result processing performance and memory requirements. However, be careful to not overconstrain intermediate results.

Furthermore, sub-queries with options are not re-usable by other queries. So in cases where there’re many similar queries, it is recommended to only apply top-level options, manipulating the final result set of a query.

:aggregate

TODO (see tests for various examples)

:bind

Inject (or overwrite) qvars in result maps (e.g. to format dates, concatenate strings etc.). The value of this option must be a map of {?qvar expression..}

;; find all persons and their first/surnames
;; inject new qvar ?name (string concat) into results
;; only return person & name
{:q [{:where [[?p firstname ?fn]
              [?p surname ?sn]]
      :bind {?name (str ?fn " " ?sn)}
      :select [?p ?name]}]}

:filter

A single DSL expression or Clojure predicate fn applied to each single result (map). The result is only kept if the filter is truthy.

;; global & local filters
;; find procedures created by any of the 3 authors before given date
{:q [{:where [[?p type procedure] [?p date-created ?date]]}
     {:optional [[?p author ?a]] :filter (in-set? ?a alice bob charlie)}]
 :filter (< ?date #inst "2015-09-01")}

:limit

Limit the result set to N items.

;; find first underage person and (optionally his/her name)
{:q [{:where [[?a type person] [?a age ?age]] :filter (< ?age 21) :limit 1}
     {:optional [[?a name ?n]]}]}

:select

Select qvars to be projected into result set. This is also useful for large join queries w/ various intermediate qvars, not required to be retained outside the sub-query.

;; find books in price range w/ their authors and topics (via category)
;; don't include price and category in results
{:q [{:where [[?a author ?b]
              [?b type book]
              [?b price ?p]]
      :filter (and (>= ?p 10) (<= ?p 20))
      :select [?a ?b]}
     {:where [[?b category ?cat]
              [?cat includes-topic ?t]]
      :select [?b ?t]}]}

:unique

The query engine ensures that - within a single query pattern - unique qvars contain unique values (see fact-verifier description in core namespace). However, when using query joins, unions, aggregations, qvars injections etc. this behavior is not guaranteed, unless explicitly enabled by setting this option key to true. If enabled, each result map’s qvars will have unique values (or the result map is removed if not). For most cases, this can be ignored, but this feature is essential for some queries, as shown below.

;; given these facts...

[a knows b]
[b knows a]
[b knows c]
[c knows b]

;; ...this query...
{:q [{:where [[?x knows ?y] [?y knows ?z]]}] :select [?x ?z]}

;; ...produces these results:
#{{?x b, ?z b}
  {?x a, ?z a}
  {?x a, ?z c}  ;; unique
  {?x c, ?z a}  ;; unique
  {?x c, ?z c}}

;; Without :unique flag this query produces results where ?x & ?z are
;; bound to same values, which is not what we want in this case!

;; Same query with :unique...
{:q [{:where [[?x knows ?y] [?y knows ?z]]}] :select [?x ?z] :unique true}

;; ...produces the correct results:
#{{?x a, ?z c} {?x c, ?z a}}

:values

Pre-constrains qvars to a set of possible values. Unlike the :filter option, which applies to results and allows a lot of flexibility through the use of expression syntax, specifying :values acts as pre-filter and is applied to index selection vertices, preceeding the query vertex in terms of data flow. Only literal values are supported for this option. Also, queries specifying this option are not re-usable, but in most circumstances are faster than post-filtered queries, since the initial set of matching facts is potentially much smaller.

;; query based on :filter example above
;; qvar ?a is pre-bound to values in given set
{:q [{:where [[?p type procedure] [?p date-created ?date]]}
     {:optional [[?p author ?a]]}]
 :values {?a #{alice bob charlie}}}

Implementation

(defn compile-query-option
  [expr] (if (fn? expr) expr (compile-expr expr)))

(defn compile-result-order
  [order]
  (if (fn? order)
    order
    (if (sequential? order)
      (fn [r] (reduce #(conj % (get r %2)) [] order))
      (fn [r] (get r order)))))

(defn compile-query-bindings
  [binds]
  (let [binds (reduce-kv
               (fn [acc k v] (assoc acc k (compile-expr v)))
               {} binds)]
    (fn [res]
      (let [res'(reduce-kv
                 (fn [acc k f]
                   (assoc acc k (f acc)))
                 res binds)]
        (info :pre-bind res)
        (info :post-bind res')
        res'))))

(defn compile-result-aggregation
  [agg]
  (let [agg (reduce-kv
             (fn [acc k v] (assoc acc k (compile-expr v)))
             {} agg)]
    (fn [results]
      (reduce-kv
       (fn [acc k afn]
         (assoc acc k (afn results)))
       {} agg))))

(defn compile-result-grouping
  [group]
  (if (fn? group)
    group
    (cond
      (and (sequential? group) (every? ff/qvar? group))
      (fn [r] (reduce #(conj % (get r %2)) [] group))

      (ff/qvar? group)
      (fn [r] (get r group))

      :else
      (compile-expr group))))

(defn sub-query-options
  [opts spec]
  (-> opts
      (select-keys [:filter :limit :select :unique :bind :values])
      (update :filter #(if % (compile-query-option %)))
      (update :bind #(if % (compile-query-bindings %)))
      (update :values #(or % (:values spec)))))

(defmulti compile-sub-query
  (fn [g parent q spec] (some #{:where :optional :union :minus :path} (keys q))))

(defmethod compile-sub-query :where
  [g parent q spec]
  (let [pat  (:where q)
        opts (sub-query-options q spec)
        q    (if (< 1 (count pat))
               (ff/add-query-join! g (:transform spec) pat opts)
               (ff/add-param-query! g (:transform spec) (first pat) opts))]
    (if parent
      (ff/add-join! g parent q {})
      q)))

(defmethod compile-sub-query :optional
  [g parent q spec]
  (let [pat  (:optional q)
        opts (sub-query-options q spec)
        q    (if (< 1 (count pat))
               (ff/add-query-join! g (:transform spec) pat opts)
               (ff/add-param-query! g (:transform spec) (first pat) opts))]
    (if parent
      (ff/add-join! g ff/join-optional parent q {})
      q)))

(defmethod compile-sub-query :union
  [g parent q spec]
  (let [pat  (:union q)
        opts (sub-query-options q spec)
        q    (if (< 1 (count pat))
               (ff/add-query-join! g (:transform spec) pat opts)
               (ff/add-param-query! g (:transform spec) (first pat) opts))]
    (if parent
      (ff/add-query-union! g [parent q] {})
      q)))

(defmethod compile-sub-query :minus
  [g parent q spec]
  (let [pat  (:minus q)
        opts (sub-query-options q spec)
        q    (if (< 1 (count pat))
               (ff/add-query-join! g (:transform spec) pat opts)
               (ff/add-param-query! g (:transform spec) (first pat) opts))]
    (if parent
      (ff/add-query-negation! g parent q {})
      q)))

(defmethod compile-sub-query :path
  [g parent q spec]
  (let [opts (merge (select-keys q [:min :max]) (sub-query-options q spec))
        q    (ff/add-path-query! g (:transform spec) (:path q) opts)]
    (if parent
      (ff/add-join! g parent q {})
      q)))

(defn compile-query
  [g {:keys [q] :as spec}]
  (let [spec (update spec :transform #(or % (ff/fact-transform g)))]
    (reduce
     (fn [acc sq] (compile-sub-query g acc sq spec))
     (compile-sub-query g nil (first q) spec)
     (rest q))))

(defn compile-query-result-spec
  [spec]
  (cond-> spec
    (:bind spec)      (update :bind compile-query-bindings)
    (:filter spec)    (update :filter compile-query-option)
    (:order spec)     (update :order compile-result-order)
    (:group-by spec)  (update :group-by compile-result-grouping)
    (:aggregate spec) (-> (assoc :aggregate* (:aggregate spec))
                          (update :aggregate compile-result-aggregation))))

(defn compile-query-result
  [result g spec]
  (ff/add-query-result! g (compile-query-result-spec spec) result))

Expressions

Expressions in the DSL are defined as standard S-Expressions and in general follow the same rules as in Clojure/Lisp. However, due to the context in which they’re used, there’re slightly different evaluation semantics. Furthermore, the DSL does not differentiate between lists or vectors, so it’s valid to use either (expr (expr)) or [expr [expr]]. Using S-Expressions allows users to serialize query specs as EDN and the multi-method based implementation provides an easy-to-use mechanism to extend the DSL with more functionality.

Overview & examples

A simple query filter might state (< ?age 21) (or [< ?age 21]). As in standard Clojure, this is a predicate producing a boolean result. However, the DSL compiles this expression to this Clojure code:

(fn [res] (< ((fn [r] (r '?age)) res) ((fn [_] 21) res)))

This indirection is needed because the qvar ?age is not bound to a fixed value (remember, the result set of a query is a multi-set) and for a :filter expression this compiled function is applied to each individual result map in the set:

(def results '#{{?age 23 ?name "alice"} {?age 18 ?name "bob"}})

(filter (compile-expr '(< ?age 21)) results)
;; ({?age 18, ?name "bob"})

The compile-expr multi-method is used recursively to analyze and compile a given DSL (sub)expression and (if possible) delegates to matched implementations:

  • ::const - equivalent to identity (default catch-all)
  • ::qvar - query variable lookup function
  • ::unary - special overrides for single arity fns
  • ::binary - special overrides for 2-arity fns
  • ::varargs - fns with variable length arity

Type checking

The last 3 dispatch types above can make use of basic type checking of their arguments and will return nil if any of the args violates the type check. If type checking is disabled (e.g. see str below), any argument type can be used.

Polymorphic

Comparisons have been implemented using c.c/compare to allow not just comparing of numbers, but also strings, dates etc. This is only safe to use though, if it’s known that both arguments are of the same type.

Implementation

(def vararg-ops
  {'+    [number? +]
   '-    [number? -]
   '*    [number? *]
   '/    [number? /]
   '=    [nil =]
   'not= [nil not=]
   'str  [nil str]})

(def unary-ops
  {'not   [nil not]
   'int   [number? int]
   'float [number? double]
   'abs   [number? #(if (neg? %) (- %) %)]
   'sqrt  [number? #(Math/sqrt (double %))]
   'exp   [number? #(Math/exp (double %))]
   'sin   [number? #(Math/sin (double %))]
   'asin  [number? #(Math/asin (double %))]
   'cos   [number? #(Math/cos (double %))]
   'acos  [number? #(Math/acos (double %))]
   'tan   [number? #(Math/tan (double %))]
   'atan  [number? #(Math/atan (double %))]
   'floor [number? #(long (Math/floor (double %)))]
   'ceil  [number? #(long (Math/ceil (double %)))]
   'round [number? #(Math/round (double %))]})

(def binary-ops
  {'<     [nil nil #(neg? (compare % %2))]
   '>     [nil nil #(pos? (compare % %2))]
   '<=    [nil nil #(<= (compare % %2) 0)]
   '>=    [nil nil #(>= (compare % %2) 0)]
   'pow   [number? number? #(Math/pow (double %) (double %2))]
   'atan2 [number? number? #(Math/atan2 (double %) (double %2))]
   'logn  [number? number? #(/ (Math/log (double %)) (Math/log (double %2)))]})

(defmulti compile-expr
  (fn [expr]
    (cond
      (sequential? expr) (let [op (first expr)]
                           (cond
                             (vararg-ops op) ::varargs
                             (unary-ops op)  ::unary
                             (binary-ops op) ::binary
                             :else           op))
      (ff/qvar? expr)    ::qvar
      :else              ::const)))

(defmethod compile-expr ::const
  [const] (fn [_] const))

(defmethod compile-expr ::qvar
  [qvar] #(% qvar))

(defmethod compile-expr ::varargs
  [[op & more]]
  (let [[check op] (vararg-ops op)
        args (mapv compile-expr more)]
    (if check
      (fn [res]
        (let [args' (sequence (comp (map #(% res)) (filter identity)) args)]
          (when (every? check args')
            (apply op args'))))
      (fn [res]
        (apply op (sequence (comp (map #(% res)) (filter identity)) args))))))

(defmethod compile-expr ::unary
  [[op x]]
  (let [[check op] (unary-ops op)
        x (compile-expr x)]
    (if check
      (fn [res] (let [x' (x res)] (when (check x') (op x'))))
      (fn [res] (op (x res))))))

(defmethod compile-expr ::binary
  [[op x y]]
  (let [[checkx checky op] (binary-ops op)
        x (compile-expr x)
        y (compile-expr y)]
    (cond
      (and checkx checky) (fn [res]
                            (let [x' (x res) y' (y res)]
                              (when (and (checkx x') (checky y')) (op x' y'))))
      checkx              (fn [res]
                            (let [x' (x res) y' (y res)]
                              (when (checkx x') (op x' y'))))
      checky              (fn [res]
                            (let [x' (x res) y' (y res)]
                              (when (checky y') (op x' y'))))
      :else               (fn [res] (op (x res) (y res))))))

(defmethod compile-expr 'and
  [[_ & more]]
  (let [args (mapv compile-expr more)]
    (fn [res] (every? #(% res) args))))

(defmethod compile-expr 'or
  [[_ & more]]
  (let [args (mapv compile-expr more)]
    (fn [res] (some #(% res) args))))

(defmethod compile-expr 'match
  [[_ re x]]
  (let [re (if (regexp?* re) re (re-pattern re))
        x  (compile-expr x)]
    (fn [res] (let [x' (x res)] (when (string? x') (re-find re x'))))))

(defmethod compile-expr 'in-set?
  [[_ x & more]]
  (let [x       (compile-expr x)
        choices (mapv compile-expr more)]
    (fn [res] (let [x' (x res)] (some #(= (% res) x') choices)))))

Aggregation expressions

These expressions are used to compute reductions and inject them as new qvars into the result set. Unlike the other expressions above, aggregation fns are provided the full result set (instead of just single results).

Important: For queries making use of :group-by and :aggregate, the aggregation fns are applied to each group separately, thus only receiving sub-sets of the full result set. A decision had to be made which of the two options is processed earlier and :group-by made sense to take precedence in most use cases.

(defn aggregation-with
  [op x]
  (let [x  (compile-expr x)
        tx (comp (map x) (filter identity))]
    (fn [results]
      (when (seq results)
        (transduce tx op results)))))

(def min* (fn ([] nil) ([x] x) ([x y] (if x (min x y) y))))
(def max* (fn ([] nil) ([x] x) ([x y] (if x (max x y) y))))

(defmethod compile-expr 'agg-sum
  [[_ x]]
  (aggregation-with + x))

(defmethod compile-expr 'agg-min
  [[_ x]]
  (aggregation-with min* x))

(defmethod compile-expr 'agg-max
  [[_ x]]
  (aggregation-with max* x))

(defmethod compile-expr 'agg-avg
  [[_ x]]
  (let [x  (compile-expr x)
        tx (comp (map x) (filter identity))]
    (fn [results]
      (let [res (sequence tx results)]
        (when (seq res)
          (double (/ (reduce + res) (count res))))))))

(defmethod compile-expr 'agg-mean
  [[_ x]]
  (let [x  (compile-expr x)
        tx (comp (map x) (filter identity))]
    (fn [results]
      (let [res (sequence tx results)]
        (nth (sort res) (bit-shift-right (count res) 1) nil)))))

(defmethod compile-expr 'agg-collect
  [[_ x]]
  (let [x  (compile-expr x)
        tx (comp (map x) (filter identity))]
    (fn [results] (into #{} tx results))))

(defmethod compile-expr 'agg-count
  [[_ x]]
  (if x
    (let [collect (compile-expr ['agg-collect x])]
      (fn [results] (count (collect results))))
    (fn [results] (count results))))

Grouping expressions

(defmethod compile-expr 'group-bins-of
  [[_ x n]]
  (let [x (compile-expr x)]
    (fn [res] (* (Math/floor (/ (x res) n)) n))))

Helper functions

(defn regexp?*
  [x] #?(:clj (= java.util.regex.Pattern (type x)) :cljs (regexp? x)))

Complete namespace definition

(ns thi.ng.fabric.facts.dsl
  #?@(:clj
      [(:require
        [thi.ng.fabric.core :as f]
        [thi.ng.fabric.facts.core :as ff]
        [clojure.set :as set]
        [clojure.core.async :as a :refer [go go-loop chan close! <! >! alts! timeout]]
        [taoensso.timbre :refer [debug info warn]])]
      :cljs
      [(:require-macros
        [cljs.core.async.macros :refer [go go-loop]]
        [cljs-log.core :refer [debug info warn]])
       (:require
        [thi.ng.fabric.core :as f]
        [thi.ng.fabric.facts.core :as ff]
        [clojure.set :as set]
        [cljs.core.async :refer [chan close! <! >! alts! timeout]])]))

<<helpers>>

<<expr>>

<<dsl>>

<<dsl-public>>