Better support query composition through datatype coercion & pluggable backends #24

Description

@RickMoynihan

I really love what you've done with flint. It's really fantastic, and almost exactly what I've been wanting for years.

However, the main practical and design issue I have with it concerns the syntactic choices for RDF datatypes. I think that instead of leveraging syntax for data literals and URIs, it should expect values which implement one (or maybe more) protocols. The most notable example is the "<http://uri/>" syntax, where strings are parsed by flint to see whether they contain the characters < and >, and if so are treated as URIs. There are others, though, e.g. the map representation of language-tagged literals, {:en "foo"}.

Where is this a problem

Firstly, please forgive the hypothetical/simplified example (which here would likely be better served by a direct join in the query). In general, though, a direct join is not always possible.

For example, if I use the grafter library (a Clojure wrapper over RDF4j), or Jena/RDF4j, to dispatch a flint query and handle content negotiation and the parsing of RDF result types, then I might have some code like this:

(require '[com.yetanalytics.flint :as f]
         '[grafter-2.rdf4j.repository :as repo])

(def repo (repo/sparql-repo "http://localhost:3030/foo/sparql"))

;; Run a flint-generated query and collect the results
(def datasets
  (with-open [conn (repo/->connection repo)]
    (into [] (repo/query conn
                         (f/format-query '{:prefixes {:dc "<http://purl.org/dc/elements/1.1/>"
                                                      :qb "<http://purl.org/linked-data/cube#>"}
                                           :select [?dataset]
                                           :where [[?dataset a :qb/DataSet]]}
                                         :pretty? true)))))

datasets ;; => [#object[java.net.URI 0x26b005d3 "http://statistics.gov.scot/data/employment"],,,]

Now imagine I want to feed that list of datasets into another query, perhaps by binding a values clause. The grafter backend has conveniently preserved the proper types of things (and, where possible, mapped them into Clojure representations). What I want to do is leverage that knowledge and compose/chain the queries together like this:

(with-open [conn (repo/->connection repo)]
    (into [] (repo/query conn
                         (f/format-query '{:prefixes {:dc "<http://purl.org/dc/elements/1.1/>"
                                                      :qb "<http://purl.org/linked-data/cube#>"}
                                           :select   [?compspec]
                                           :where    [[:values {?dataset datasets}]
                                                      [?dataset (cat :qb/structure :qb/component) ?compspec]]}

                                         :pretty? true))))

However, I can't, because flint demands that my dataset URIs are strings of the form "<http://uri>", so I need to engage in the following kind of munging: (map #(str "<" % ">") datasets).

Additionally, there are some correctness issues here which, though rare, become increasingly likely when dealing with billions of triples. For example, a plain string literal that happens to be wrapped in < and > would be read as a URI. In those cases, magic modes of interpretation like this make debugging much harder, because the RDF types do not map 1-1 to the types as they stand in flint.

Suggested solution

Define a protocol, something like ->sparql-string, which backends can extend to arbitrary types. Separate RDF4j, Jena or Grafter backends could then extend their types to yield SPARQL strings for flint.

Flint could then maintain backwards compatibility by providing its own (optional) backend, which would implement these protocols with the same semantics you currently have; e.g. lang-strings as maps could be implemented by doing extend-protocol on java.util.Map.
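To make the idea concrete, here is a minimal sketch of what such a protocol might look like. None of this is flint's actual API: the protocol name SparqlTerm, the helper escape-literal, and the choice to render bare strings as plain literals are all assumptions made for illustration, and the escaping only covers backslashes and double quotes.

```clojure
(require '[clojure.string :as str])

;; Hypothetical protocol; not part of flint as it stands.
(defprotocol SparqlTerm
  (->sparql-string [this] "Render this value as a SPARQL term."))

;; Illustrative helper: escapes only backslashes and double quotes.
(defn- escape-literal [s]
  (-> s
      (str/replace "\\" "\\\\")
      (str/replace "\"" "\\\"")))

(extend-protocol SparqlTerm
  ;; A real URI type becomes an IRI reference; no string sniffing needed.
  java.net.URI
  (->sparql-string [uri] (str "<" uri ">"))

  ;; Assumption: in this sketch a String is always a plain literal.
  String
  (->sparql-string [s] (str "\"" (escape-literal s) "\""))

  ;; Single-entry maps such as {:en "foo"} become language-tagged literals.
  java.util.Map
  (->sparql-string [m]
    (let [[lang text] (first m)]
      (str "\"" (escape-literal text) "\"@" (name lang)))))
```

With something like this in place, the values from a previous query could be rendered with (map ->sparql-string datasets) without any manual wrapping in < and >, and a backend for Grafter, RDF4j or Jena would only need to extend the protocol to its own types.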

It may also be worth considering passing ->sparql-string the :prefixes and other appropriate options, e.g. a :base-uri, to allow generating CURIEs/QNames (e.g. "2022"^^xsd:gYear) and relativizing URIs for prettier queries.
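A rough sketch of the options-passing variant follows. Again, this is hypothetical: the names PrefixAwareTerm, ->sparql-string* and curie are invented here, the :prefixes map is assumed to hold plain namespace IRI strings (not flint's "<...>" form), and the local-name check is deliberately conservative, so anything it does not recognise falls back to a full IRI.

```clojure
(require '[clojure.string :as str])

;; Hypothetical protocol taking an options map; not part of flint.
(defprotocol PrefixAwareTerm
  (->sparql-string* [this opts] "Render a SPARQL term using opts such as :prefixes."))

;; Returns "prefix:local" for the first namespace that matches, else nil.
;; Only simple local names are abbreviated; this is a conservative subset
;; of what SPARQL allows in a prefixed name.
(defn- curie [prefixes iri]
  (some (fn [[prefix ns-iri]]
          (when (str/starts-with? iri ns-iri)
            (let [local (subs iri (count ns-iri))]
              (when (re-matches #"[A-Za-z_][A-Za-z0-9_-]*" local)
                (str (name prefix) ":" local)))))
        prefixes))

(extend-protocol PrefixAwareTerm
  java.net.URI
  (->sparql-string* [uri {:keys [prefixes]}]
    (let [iri (str uri)]
      ;; Prefer the abbreviated form, fall back to a full IRI reference.
      (or (curie prefixes iri)
          (str "<" iri ">")))))
```

Under these assumptions, a URI in the qb namespace would render as qb:DataSet when the :qb prefix is supplied, while a URI with no matching prefix would still render in full. A :base-uri option could be handled in the same place.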
