Datatype triple patterns #182

Open
IS4Code opened this issue Apr 12, 2023 · 3 comments

@IS4Code

IS4Code commented Apr 12, 2023

Why?

Presently, SPARQL triple syntax does not offer enough granularity when matching literals: the object of a property can be specified in a triple pattern as (among others) a variable or a fixed literal, but it cannot be something in between. If you want to find triples based on the datatype of their objects, you have to match all of them and then filter:

?s ?p ?o .
FILTER (DATATYPE(?o) = xsd:integer)

A trivial SPARQL engine implementation would load all triples in existence into a set and then filter them based on the criteria.
It is possible, of course, to optimize this query and retrieve all matching triples as a single step, but not all engines might do that, and still this is something that could be expressed more concisely.
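
To make the difference between the two evaluation strategies concrete, here is a minimal Python sketch. The in-memory `TRIPLES` list, the `Literal` tuple and the `by_datatype` index are hypothetical illustrations, not the internals of any real SPARQL engine.

```python
from collections import defaultdict
from typing import NamedTuple

XSD = "http://www.w3.org/2001/XMLSchema#"

class Literal(NamedTuple):
    lexical: str   # lexical form, e.g. "10"
    datatype: str  # datatype IRI

# Hypothetical toy dataset of (subject, predicate, object) triples.
TRIPLES = [
    (":a", ":age", Literal("10", XSD + "integer")),
    (":a", ":name", Literal("Alice", XSD + "string")),
    (":b", ":age", Literal("12", XSD + "integer")),
    (":b", ":knows", ":a"),  # IRI object, not a literal
]

def scan_then_filter(triples, datatype):
    """Naive plan: visit every triple, then apply the DATATYPE() filter."""
    return [t for t in triples
            if isinstance(t[2], Literal) and t[2].datatype == datatype]

def build_datatype_index(triples):
    """Assumed index: datatype IRI -> triples whose object has that datatype."""
    index = defaultdict(list)
    for t in triples:
        if isinstance(t[2], Literal):
            index[t[2].datatype].append(t)
    return index

def index_lookup(index, datatype):
    """Indexed plan: fetch only the matching triples in a single step."""
    return index.get(datatype, [])

by_datatype = build_datatype_index(TRIPLES)
```

Both plans return the same two `xsd:integer` triples here; the indexed plan simply never touches the other two.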

Proposed solution

I propose an extension of the node syntax like this:

?s ?p ?o^^xsd:integer .

Such a query would only match triples with the xsd:integer datatype, and bind ?o to the lexical value of the literal (as a simple literal/xsd:string).

Of course there are other options for the object:

# any literal with this lexical value, regardless of datatype (?dt is bound to IRI)
?s ?p "10"^^?dt .

# any literal (?o is simple literal, ?dt is IRI)
?s ?p ?o^^?dt .

I think this would add more expressiveness to the language, and open up better optimizations for triple stores that can index by datatype and evaluate such queries efficiently.
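
The intended matching behaviour of the three pattern shapes above could be modelled roughly as follows. This is only a sketch of one reading of the proposal: `Var`, `Literal` and `match_typed` are hypothetical names, and it assumes the two positions of a pattern use distinct variables.

```python
from typing import NamedTuple, Optional, Union

XSD = "http://www.w3.org/2001/XMLSchema#"

class Literal(NamedTuple):
    lexical: str   # lexical form
    datatype: str  # datatype IRI

class Var(NamedTuple):
    name: str  # variable name without the leading "?"

Term = Union[Var, str]  # a variable, or a fixed lexical form / datatype IRI

def match_typed(lex: Term, dt: Term, obj) -> Optional[dict]:
    """Match an object against a `lex^^dt` pattern.

    Returns the variable bindings, or None when the object does not match.
    """
    if not isinstance(obj, Literal):
        return None  # IRIs and blank nodes never match a literal pattern
    bindings = {}
    for term, value in ((lex, obj.lexical), (dt, obj.datatype)):
        if isinstance(term, Var):
            bindings[term.name] = value  # variable part: bind it
        elif term != value:
            return None  # fixed part of the pattern differs
    return bindings

ten = Literal("10", XSD + "integer")
print(match_typed(Var("o"), XSD + "integer", ten))  # ?o^^xsd:integer
print(match_typed("10", Var("dt"), ten))            # "10"^^?dt
print(match_typed(Var("o"), Var("dt"), ten))        # ?o^^?dt
```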

Language-tagged literals

These are literals as well, but I am unsure whether they should be matched by a pattern like ?o^^?dt. However, if they are allowed, I think the natural solution (if not deemed too convoluted) might be to bind ?dt to a literal with the xsd:language datatype, essentially treating something like "hello world"@en as ""hello world"^^"en"^^xsd:language" (not a proposed syntax), binding ?o to "hello world" and ?dt to "en"^^xsd:language. After all, a language tag is the datatype of a language-tagged literal, at least syntactically.

Other SPARQL functions, such as STRDT, could be modified to allow either IRI or xsd:language as the datatype.
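
As a rough illustration of that idea, the sketch below models a language tag surfacing as an xsd:language literal in the datatype position, and an STRDT variant that accepts it. `split_literal` and `strdt_extended` are hypothetical names for this example; neither exists in SPARQL 1.1, where STRDT only takes an IRI.

```python
from typing import NamedTuple, Optional, Union

XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"

class Literal(NamedTuple):
    lexical: str
    datatype: str
    lang: Optional[str] = None  # set only for language-tagged literals

def split_literal(lit: Literal):
    """Hypothetical ?o^^?dt binding: returns (lexical form, datatype term)."""
    if lit.lang is not None:
        # The language tag takes the place of the datatype: "en"^^xsd:language.
        return lit.lexical, Literal(lit.lang, XSD + "language")
    return lit.lexical, lit.datatype

def strdt_extended(lexical: str, dt: Union[str, Literal]) -> Literal:
    """Hypothetical STRDT variant: dt is an IRI or an xsd:language literal."""
    if isinstance(dt, Literal):
        if dt.datatype != XSD + "language":
            raise ValueError("expected an IRI or an xsd:language literal")
        return Literal(lexical, RDF_LANGSTRING, dt.lexical)
    return Literal(lexical, dt)

hello = Literal("hello world", RDF_LANGSTRING, "en")
o, dt = split_literal(hello)
# Under this model the split is lossless: the tag survives the round trip.
assert strdt_extended(o, dt) == hello
```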

Considerations for backward compatibility

I don't think ^^ in this position could have been valid previously, so all existing valid queries should remain valid and unambiguous.

@namedgraph

Isn't rdf:langString the datatype of language-tagged literals?

@IS4Code

IS4Code commented Apr 12, 2023

> Isn't rdf:langString the datatype of language-tagged literals?

Semantically yes, but it is not written in the triple. Such a thing should be produced only from something like "hello world"^^rdf:langString, not from an actual language-tagged literal.

Specifically, I was aiming for ?o^^?dt to produce something where STRDT(?o, ?dt) gives back the original object. With rdf:langString, information about the actual language tag is lost.

@redmer

redmer commented Apr 17, 2023

This proposal would make working with typed strings better, so I appreciate it. Combined with ?o@?lang to match the language tag, it would cover both ways that strings (typed and language-tagged) are available in SPARQL and Turtle. Then you'd use STRDT() to reconstitute a string from ?o^^?dt and STRLANG() with ?o@?lang. This was proposed in #17.

Having STRDT(?o, "en"^^xsd:language) return "hello world"@en would be a (breaking?) change from SPARQL 1.1.

Keeping datatype(""@en) = ?dt = rdf:langString the same would give fewer differences with the Turtle parsing rules and SPARQL's datatype(). It would indeed entail that ^^?dt "shadows" the language tag information. But with a parallel literal syntax to match languages also available, this may not be a true problem.

Looking at other proposals, #34 suggests ?o@* to match all language-tagged (≟ only rdf:langString datatype) strings or ?o^^* for all datatype-tagged (≟ all non-xsd:string and/or all non-rdf:langString) strings. #112 looks only tangentially related.

For the following data, I think that would result in the bindings shown in the table below:

:s :p "Hello, world!"@en .
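
| SPARQL                         | ?o                 | ?dt            | ?lang |
|--------------------------------|--------------------|----------------|-------|
| :s :p ?o                       | "Hello, world!"@en |                |       |
| :s :p ?o^^?dt                  | "Hello, world!"    | rdf:langString |       |
| :s :p ?o^^rdf:langString       | "Hello, world!"    |                |       |
| :s :p ?o@?lang                 | "Hello, world!"    |                | "en"  |
| :s :p ?o@en                    | "Hello, world!"    |                |       |
| ? :s :p ?o^^?dt , ?o@?lang     | "Hello, world!"    | rdf:langString | "en"  |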
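
The combined last row can be checked with a small sketch of this reading, in which ?dt stays rdf:langString and the tag is only reachable through the parallel @?lang pattern. The helper names `bind_datatype`, `bind_language` and `join` are made up for this illustration.

```python
from typing import NamedTuple, Optional

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"

class Literal(NamedTuple):
    lexical: str
    datatype: str
    lang: Optional[str] = None  # set only for language-tagged literals

def bind_datatype(lit: Literal) -> dict:
    """?o^^?dt: the language tag is 'shadowed' by rdf:langString."""
    return {"o": lit.lexical, "dt": lit.datatype}

def bind_language(lit: Literal) -> Optional[dict]:
    """?o@?lang: matches language-tagged literals only."""
    if lit.lang is None:
        return None
    return {"o": lit.lexical, "lang": lit.lang}

def join(left: Optional[dict], right: Optional[dict]) -> Optional[dict]:
    """Merge two solutions; fail if a shared variable has different values."""
    if left is None or right is None:
        return None
    merged = dict(left)
    for name, value in right.items():
        if merged.setdefault(name, value) != value:
            return None
    return merged

hello = Literal("Hello, world!", RDF_LANGSTRING, "en")
row = join(bind_datatype(hello), bind_language(hello))
print(row)
```

With both patterns joined, ?o and ?lang together carry enough to rebuild the literal via STRLANG(), even though ?dt alone does not.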
