SPARQL Execution Sequence #21

Open
turnguard opened this issue Apr 3, 2019 · 7 comments
Labels
query Extends the Query spec

Comments

@turnguard

SPARQL execution sequence is mostly decided by the execution engine and by query optimizations that vary from engine to engine.

In general, only a few features allow influencing the execution sequence, such as VALUES, SERVICE, and subqueries.

This request splits into the following:

  1. A clear sequence of what must be executed first: SERVICE, subquery, ...
  2. Execution hints with which the author of the query can override the engine's execution order or optimization.
@jindrichmynarz

I use sub-queries when I require a particular execution order and need to avoid reordering during query optimization. This is verbose, but workable. It might be acceptable, since requiring a specific execution order is not a common case. However, I would be interested in other ideas on how to declare or influence the execution order.

@tfrancart

I want to support scenarios such as fetching the labels, latitude, longitude, or other properties of remote URIs referenced in my local data from a remote database, e.g.:

SELECT ?person ?countryLabel
WHERE {
  ?person ex:livesIn ?countryDBpedia .
  SERVICE <http://dbpedia.org/sparql> {
    ?countryDBpedia rdfs:label ?countryLabel
  }
}

As SPARQL 1.1 is currently defined, the SERVICE clause executes first and returns millions of labels, which are only then joined with the local criteria. This of course makes the query unusable.

@cygri

cygri commented Apr 3, 2019

clear sequence what must be executed first: SERVICE, Subquery

I think this is a bad idea. Most of the time, users don't want to think about query execution plans and want to leave it to the engine to choose a good plan. If the execution plan is fixed by the spec, then the user has to optimise every query manually.

execution hints, with which the author of the query can override the engine's execution order or optimization

Yes! In those cases where getting the plan right really matters, and the engine is not up to the job, the user should have a way to override it. Some engines already provide this, e.g., BlazeGraph query hints.
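
As an illustration of such engine-specific hints, here is a sketch based on Blazegraph's query hint vocabulary, applied to the DBpedia scenario from earlier in this thread. The `ex:` namespace is a placeholder. The hint is not part of standard SPARQL, so other engines would treat it as an ordinary triple pattern. Disabling the optimizer asks Blazegraph to evaluate joins in the order written, but whether the local bindings are actually pushed into the SERVICE call remains up to the engine.

```sparql
PREFIX ex:   <http://example.org/>                      # placeholder namespace (assumption)
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX hint: <http://www.bigdata.com/queryHints#>       # Blazegraph-specific vocabulary

SELECT ?person ?countryLabel
WHERE {
  # Blazegraph-specific: turn off the join-order optimizer for this query,
  # so that the patterns below are evaluated in textual order.
  hint:Query hint:optimizer "None" .

  # Local pattern, written first so it is evaluated first.
  ?person ex:livesIn ?countryDBpedia .

  # Remote pattern, written second.
  SERVICE <http://dbpedia.org/sparql> {
    ?countryDBpedia rdfs:label ?countryLabel
  }
}
```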

@VladimirAlexiev
Contributor

@tfrancart where do you read in the spec that "SERVICE clause executes first"?

https://www.w3.org/TR/sparql11-federated-query/#values gives an example where the outer query provides bindings and then they are passed to the inner federated query as VALUES:

SELECT * {?s foaf:knows ?o } VALUES (?s) { (:a) (:b) }

It almost seems to me that the postfix form of VALUES (i.e. SELECT {...} VALUES) was created to facilitate such an implementation: the bound values are appended to the inner query without having to touch its internal structure.
@afs, @ericprud, @kasei can you confirm or deny?
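
To make the mechanism concrete, this is roughly the query an engine could send to the remote endpoint after evaluating the local pattern first, using the DBpedia scenario from earlier in this thread. The two resource IRIs are stand-ins for whatever bindings the local data actually produced, and nothing in the spec requires engines to work this way.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# The body of the SERVICE clause, left structurally untouched.
SELECT ?countryDBpedia ?countryLabel
WHERE {
  ?countryDBpedia rdfs:label ?countryLabel
}
# Bindings obtained from the local part of the query, appended in postfix form.
# These two IRIs are illustrative stand-ins (assumption).
VALUES (?countryDBpedia) {
  (<http://dbpedia.org/resource/France>)
  (<http://dbpedia.org/resource/Austria>)
}
```

Because the VALUES block sits after the closing brace of the WHERE clause, the engine only has to append text to the inner query rather than rewrite its group graph pattern.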

@jindrichmynarz where do you read in the spec that subqueries must be executed in the textual order given?

@cygri

cygri commented Apr 3, 2019

SPARQL engines are free to do whatever they want in whatever order, as long as the result is correct. @tfrancart is mistaken when he says that the 1.1 spec requires execution of SERVICE before triple patterns. Using any sort of syntax (like subqueries) to “force” a particular execution order relies on the execution strategy of a specific implementation. This is often the best one can do, but is risky because execution strategies sometimes change with a new release of an implementation.

@tfrancart

@VladimirAlexiev @cygri: correct, the SPARQL spec does not say that. Sorry. My own experience with triplestores tells me that their simple implementation is to execute the SERVICE clause first, then join. This makes sense because it is easier to implement than executing the local criteria first and then passing the bindings to the SERVICE clause.

I would be happy to learn the "subquery trick" that would force the execution order in the scenario I described above, to guarantee that the SERVICE clause uses the binding set produced by the local query.

@jindrichmynarz

jindrichmynarz commented Apr 3, 2019

@VladimirAlexiev What I meant by sub-queries is relying on the bottom-up execution order of nested sub-queries, not the textual order of adjacent sub-queries.

SELECT *
WHERE {
  {
    SELECT *
    WHERE {
      # execute first
    }
  }
  # execute second
}
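
Applied to the DBpedia scenario discussed above, the nesting pattern might look like the following sketch. The `ex:` namespace is a placeholder. As noted earlier in the thread, this only nudges engines that happen to evaluate nested sub-queries first; the spec does not guarantee the order, nor that the sub-query's bindings are passed into the SERVICE call.

```sparql
PREFIX ex:   <http://example.org/>                      # placeholder namespace (assumption)
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?person ?countryLabel
WHERE {
  {
    # Intended to execute first: collect the locally referenced countries.
    SELECT DISTINCT ?person ?countryDBpedia
    WHERE {
      ?person ex:livesIn ?countryDBpedia .
    }
  }
  # Intended to execute second: look up labels only for those countries.
  SERVICE <http://dbpedia.org/sparql> {
    ?countryDBpedia rdfs:label ?countryLabel
  }
}
```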
