Datatype triple patterns #182

IS4Code · 2023-04-12T21:18:49Z

Why?

Presently, SPARQL triple syntax does not offer enough granularity when matching literals: the object of a property can be specified in a triple pattern as (among others) a variable or a fixed literal, but it cannot be something in between. If you want to find triples based on the datatype of their object, you have to match all of them and then filter:

?s ?p ?o .
FILTER (DATATYPE(?o) = xsd:integer)

A trivial SPARQL engine implementation would load every triple in the dataset into a set and then filter that set based on the criteria.
It is possible, of course, to optimize this query and retrieve all matching triples in a single step, but not all engines do that, and in any case this is something that could be expressed more concisely.
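
To make the difference concrete, here is a minimal Python sketch (not taken from any real engine) contrasting the scan-then-filter plan with a lookup in an index keyed by datatype. The tuple layout and the names `TRIPLES`, `scan_and_filter` and `build_datatype_index` are assumptions made purely for illustration.

```python
from collections import defaultdict

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical in-memory store: (subject, predicate, (lexical form, datatype IRI)).
TRIPLES = [
    ("ex:a", "ex:age", ("42", XSD + "integer")),
    ("ex:a", "ex:name", ("Alice", XSD + "string")),
    ("ex:b", "ex:age", ("7", XSD + "integer")),
]


def scan_and_filter(triples, datatype):
    """Naive plan: visit every triple, then apply the DATATYPE filter."""
    return [t for t in triples if t[2][1] == datatype]


def build_datatype_index(triples):
    """Group triples by the datatype of their object, once, up front."""
    index = defaultdict(list)
    for triple in triples:
        index[triple[2][1]].append(triple)
    return index


INDEX = build_datatype_index(TRIPLES)

# Both plans return the same triples; the index never touches non-matching ones.
integers_by_scan = scan_and_filter(TRIPLES, XSD + "integer")
integers_by_index = INDEX[XSD + "integer"]
```

Whether a given store keeps such an index is an implementation detail; the point is only that the filter form leaves it to the optimizer to recognise the opportunity.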

Proposed solution

I propose an extension of the node syntax like this:

?s ?p ?o^^xsd:integer .

Such a query would only match triples whose object is a literal with the xsd:integer datatype, and would bind ?o to the lexical form of that literal (as a simple literal/xsd:string).

Of course there are other options for the object:

# any literal with this lexical value, regardless of datatype (?dt is bound to IRI)
?s ?p "10"^^?dt .

# any literal (?o is simple literal, ?dt is IRI)
?s ?p ?o^^?dt .
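
The intended matching rules for the three forms above can be sketched in Python as follows. This is only a model of the proposal, not an implementation of any SPARQL engine: literals are assumed to be plain `(lexical form, datatype IRI)` pairs, comparison is on the lexical form rather than the value, and `Var` and `match_typed` are hypothetical names.

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Var:
    """A query variable such as ?o or ?dt (hypothetical helper)."""
    name: str


def match_typed(literal, lexical, datatype):
    """Match a (lexical form, datatype IRI) pair against `lexical^^datatype`.

    Each side of the pattern is either a Var or a constant string.
    Returns a dict of bindings, or None if the literal does not match.
    """
    bindings = {}
    for pattern, value in zip((lexical, datatype), literal):
        if isinstance(pattern, Var):
            bindings[pattern.name] = value
        elif pattern != value:
            return None
    return bindings


ten = ("10", XSD + "integer")

# ?o^^xsd:integer : fixed datatype, variable lexical form
fixed_datatype = match_typed(ten, Var("o"), XSD + "integer")
# "10"^^?dt : fixed lexical form, variable datatype
fixed_lexical = match_typed(ten, "10", Var("dt"))
# ?o^^?dt : any typed literal
both_variable = match_typed(ten, Var("o"), Var("dt"))
# A literal of another datatype does not match ?o^^xsd:integer
no_match = match_typed(("10", XSD + "string"), Var("o"), XSD + "integer")
```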

I think this would add more expressiveness to the language, and would open up better optimizations for triple stores that can index by datatype and evaluate such queries efficiently.

Language-tagged literals

These are literals as well, but I am unsure whether they should be matched by a pattern like ?o^^?dt. However, if they are allowed, I think the natural solution (if not deemed too convoluted) might be to bind ?dt to a literal with the xsd:language datatype, essentially treating something like "hello world"@en as ""hello world"^^"en"^^xsd:language" (not a proposed syntax), binding ?o to "hello world" and ?dt to "en"^^xsd:language. After all, a language tag is the datatype of a language-tagged literal, at least syntactically.

Other SPARQL functions, such as STRDT, could be modified to allow either IRI or xsd:language as the datatype.
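
A rough Python model of this idea, under the assumption that a language-tagged literal is split into its lexical form plus a `"tag"^^xsd:language` literal, and that a modified STRDT accepts either an IRI or such a literal. `Literal`, `split_literal` and `strdt_extended` are hypothetical names; none of this is existing SPARQL behaviour.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
XSD_LANGUAGE = "http://www.w3.org/2001/XMLSchema#language"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = XSD_STRING
    lang: Optional[str] = None


def split_literal(literal):
    """Model of `?o^^?dt`: return the values bound to (?o, ?dt)."""
    if literal.lang is not None:
        # The tag itself, typed as xsd:language, stands in for the datatype.
        return literal.lexical, Literal(literal.lang, XSD_LANGUAGE)
    return literal.lexical, literal.datatype


def strdt_extended(lexical, datatype):
    """Model of a STRDT that takes an IRI or an xsd:language literal."""
    if isinstance(datatype, Literal):
        if datatype.datatype != XSD_LANGUAGE:
            raise ValueError("only xsd:language literals may replace the IRI")
        return Literal(lexical, RDF_LANGSTRING, datatype.lexical)
    return Literal(lexical, datatype)


hello = Literal("hello world", RDF_LANGSTRING, "en")
ten = Literal("10", XSD_INTEGER)

# Splitting and rebuilding gives back the original literal in both cases.
hello_again = strdt_extended(*split_literal(hello))
ten_again = strdt_extended(*split_literal(ten))
```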

Considerations for backward compatibility

I don't think ^^ in this position could have been valid previously, so all existing valid queries should remain valid and unambiguous.


namedgraph · 2023-04-12T21:23:25Z

Isn't rdf:langString the datatype of language-tagged literals?

IS4Code · 2023-04-12T21:33:11Z

> Isn't rdf:langString the datatype of language-tagged literals?

Semantically yes, but it is not written in the triple. A ?dt binding of rdf:langString should be produced only from something like "hello world"^^rdf:langString, not from an actual language-tagged literal.

Specifically, I was aiming for ?o^^?dt to produce something where STRDT(?o, ?dt) gives back the original object. With rdf:langString, information about the actual language tag is lost.
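
The loss of information can be seen in a small Python sketch. Language-tagged literals are modelled here as `(lexical form, tag)` pairs, and the two function names are hypothetical; they stand for the two candidate ways of binding ?o and ?dt.

```python
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_LANGUAGE = "http://www.w3.org/2001/XMLSchema#language"

# Two different language-tagged literals with the same lexical form.
HELLO_EN = ("hello world", "en")
HELLO_FR = ("hello world", "fr")


def split_with_langstring(literal):
    """Bind ?dt to rdf:langString: the tag does not appear in the bindings."""
    lexical, _tag = literal
    return (lexical, RDF_LANGSTRING)


def split_with_xsd_language(literal):
    """Bind ?dt to "tag"^^xsd:language, modelled as a (tag, IRI) pair."""
    lexical, tag = literal
    return (lexical, (tag, XSD_LANGUAGE))


# With rdf:langString both literals yield identical bindings, so no function
# of (?o, ?dt) can tell them apart afterwards.
collide = split_with_langstring(HELLO_EN) == split_with_langstring(HELLO_FR)
# With xsd:language the bindings stay distinct.
distinct = split_with_xsd_language(HELLO_EN) != split_with_xsd_language(HELLO_FR)
```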

redmer · 2023-04-17T08:30:31Z

This proposal would make working with typed strings better, so I appreciate it. Combined with ?o@?lang to match the language tag, it would cover both ways in which strings (typed and language-tagged) are written in SPARQL and Turtle. You would then use STRDT() to reconstitute a string from ?o^^?dt and STRLANG() to reconstitute one from ?o@?lang. This was proposed in #17.

Having STRDT(?o, "en"^^xsd:language) return "hello world"@en would be a (breaking?) change from SPARQL 1.1.

Keeping datatype(""@en) = ?dt = rdf:langString as it is would mean fewer differences from the Turtle parsing rules and from SPARQL's datatype(). It would indeed entail that ^^?dt "shadows" the language tag information. But with a parallel literal syntax for matching languages also available, this may not be a real problem.

Looking at other proposals, #34 suggests ?o@* to match all language-tagged strings (≟ only those with the rdf:langString datatype) or ?o^^* to match all datatype-tagged ones (≟ all non-xsd:string and/or all non-rdf:langString). #112 looks only tangentially related.

For the following data, I think that would result in the bindings shown in the table below:

:s :p "Hello, world!"@en .

| SPARQL | ?o | ?dt | ?lang |
|---|---|---|---|
| `:s :p ?o` | `"Hello, world!"@en` | | |
| `:s :p ?o^^?dt` | `"Hello, world!"` | `rdf:langString` | |
| `:s :p ?o^^rdf:langString` | `"Hello, world!"` | | |
| `:s :p ?o@?lang` | `"Hello, world!"` | | `"en"` |
| `:s :p ?o@en` | `"Hello, world!"` | | |
| ? `:s :p ?o^^?dt , ?o@?lang` | `"Hello, world!"` | `rdf:langString` | `"en"` |
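
A small Python sketch of how these bindings could be computed, assuming the reading in which ^^?dt reports rdf:langString and the tag is only reachable through @?lang. The literal is modelled as a `(lexical form, datatype IRI, tag)` triple and the function names are hypothetical. The combined pattern follows the tentative last row, joining the two halves on the shared ?o.

```python
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

HELLO = ("Hello, world!", RDF_LANGSTRING, "en")
TEN = ("10", XSD_INTEGER, None)


def bind_datatype(literal):
    """`?o^^?dt`: every literal matches; the language tag is shadowed."""
    lexical, datatype, _lang = literal
    return {"o": lexical, "dt": datatype}


def bind_language(literal):
    """`?o@?lang`: only language-tagged literals match."""
    lexical, _datatype, lang = literal
    if lang is None:
        return None
    return {"o": lexical, "lang": lang}


def bind_both(literal):
    """`?o^^?dt , ?o@?lang`: both halves must match and agree on ?o."""
    left = bind_datatype(literal)
    right = bind_language(literal)
    if right is None or left["o"] != right["o"]:
        return None
    return {**left, **right}


row_datatype = bind_datatype(HELLO)
row_language = bind_language(HELLO)
row_both = bind_both(HELLO)
```

A literal without a language tag, such as `TEN`, matches the first pattern only, so the combined pattern yields no solution for it.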
