Value space of rdf:JSON datatype #65

gkellogg · 2023-09-20T21:43:27Z

Updates for #62 delved into updating the definition of the value space of the rdf:JSON datatype to use more primitive concepts from INFRA (arrays, maps, strings, booleans, and null) as well as number from ECMAScript.

The existing value space is based on the JCS representation of the JSON literal value. The proposed update could look like the following:

The value space is a single JSON value in the form of an array, map, string, number, boolean, or null.

Array entries may be any of the above JSON values.
Map keys are strings with values, which may be any of the above JSON values.

Two JSON values are considered equal if they are the same string, number, boolean, or null; if they are both arrays with entries which are pairwise equal; or if they are both maps with equal map entries.
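The equality rule above could be sketched roughly as follows. This is only an illustration in Python, not part of the proposal: `json_equal` is a hypothetical helper, parsed values are assumed to be plain Python `dict`/`list`/`str`/`int`/`float`/`bool`/`None`, and number equality is assumed to be Python's numeric `==` (so `1` and `1.0` compare equal), which is exactly the point still under discussion.

```python
import json

def json_equal(a, b):
    """Structural equality over parsed JSON values (hypothetical helper)."""
    # Booleans are tested first: in Python, bool is a subclass of int and True == 1.
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    if a is None or b is None:
        return a is None and b is None
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a == b  # assumption: numeric comparison, so 1 equals 1.0
    if isinstance(a, str) and isinstance(b, str):
        return a == b
    if isinstance(a, list) and isinstance(b, list):
        # Arrays: same length and pairwise-equal entries.
        return len(a) == len(b) and all(json_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, dict) and isinstance(b, dict):
        # Maps: same keys, and equal values for each key; entry order is irrelevant.
        return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
    return False

left = json.loads('{"a": [1, true, null], "b": "x"}')
right = json.loads('{ "b": "x", "a": [1.0, true, null] }')
print(json_equal(left, right))                           # True
print(json_equal(json.loads("true"), json.loads("1")))   # False
```

Note that a naive `left == right` in Python would report `true` and `1` as equal, which is why the boolean case is handled separately.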


afs · 2023-09-21T14:46:01Z

array, map, string, number, boolean, or null.

As a general principle - I'm in favour of linking to original definitions where possible rather than incorporating material or normative referencing derived works which may diverge because they are for a specific or different purpose.

For JSON, RFC 8259 - I think that is the original definitive place (it would mean "map" -> "object"). RFC 8259 is the current STD-90.

(certainly for "string" - because a JSON string is not an RDF string or xsd:string)

pfps · 2023-09-21T15:24:25Z

But just what are the JSON values, particularly number?

afs · 2023-09-21T15:39:03Z

What are the requirements?

https://www.w3.org/TR/json-ld11/#terms-imported-from-other-specifications

where number goes to
https://tc39.es/ecma262/#sec-terms-and-definitions-number-value

but is that what the value space for a JSON fragment is for?

If it is "JSON processors treat them the same" then https://www.rfc-editor.org/rfc/rfc8259.html#section-6
and apply (some of) RFC8785 because the ultimate abstract value is not important.

gkellogg · 2023-09-21T21:57:30Z

array, map, string, number, boolean, or null.

As a general principle - I'm in favour of linking to original definitions where possible rather than incorporating material or normative referencing derived works which may diverge because they are for a specific or different purpose.

For JSON, RFC 8259 - I think that is the original definitive place (it would mean "map" -> "object"). RFC 8259 is the current STD-90.

(certainly for "string" - because a JSON string is not an RDF string or xsd:string)

First, we need to decide if we want to go for this decomposed notion of a JSON value for the value space. I'm fine with sourcing RFC8259, which would get out of the problem of having to go to ECMAScript for numbers.

JSON-LD (and INFRA) tend to use the term "map" rather than "object", as "object" is overly general. We can use the term "map" while still referencing the "Object" section in the RFC8259.

Regarding strings, certainly the strings referenced as JSON values (or within a JSON serialization) reference "strings" from RFC8259, and may include their own escape sequences. While "\uDEAD" may be represented (it technically can be in JSON-LD 1.1), this would be an aspect of the JSON value, rather than the lexical representation which would not allow a surrogate natively. JSON-LD 1.2 would likely be updated to exclude surrogates. It should be clear, and we may need to state it as such, that a JSON string is disjoint from an RDF string.
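To make the surrogate point concrete, here is a small Python sketch (the helper name `is_unicode_scalar_string` is made up for illustration). It assumes Python's standard `json` module, which accepts an escaped lone surrogate and yields a string that cannot be encoded as UTF-8.

```python
import json

# The JSON text "\uDEAD" is accepted by the RFC 8259 grammar.
value = json.loads('"\\uDEAD"')

def is_unicode_scalar_string(s):
    """True if s contains no surrogate code points (U+D800 to U+DFFF)."""
    return not any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)

print(len(value), is_unicode_scalar_string(value))

try:
    value.encode("utf-8")
except UnicodeEncodeError:
    print("the parsed value cannot be written out as UTF-8")
```

The escaped form is fine as a sequence of characters in the lexical representation; it is the parsed value that contains the lone surrogate.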

Of course, the other alternative is to not go with the decomposed notion of a JSON value as the value space, in which case we're dealing exclusively with RDF strings containing a JSON serialization. Note that the existing value space uses JCS/RFC8785 for the canonical form of JSON, which has similar requirements for character representation as our own, and requires implementations to terminate if a "lone surrogate" is found.

afs · 2023-09-22T12:10:39Z

do the two styles agree on what matches for numbers? (I think JCS does because it (in effect) goes through binary)
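One way to probe this question is to compare number lexical forms under both readings. The sketch below is illustrative only: it assumes "goes through binary" means conversion to an IEEE 754 double (Python's `float`) and uses `decimal.Decimal` as a stand-in for an exact decimal reading; both function names are hypothetical.

```python
from decimal import Decimal

def same_as_double(x, y):
    """Compare two JSON number lexical forms after conversion to IEEE 754 doubles."""
    return float(x) == float(y)

def same_as_decimal(x, y):
    """Compare two JSON number lexical forms as exact decimal values."""
    return Decimal(x) == Decimal(y)

pairs = [
    ("1", "1.0"),
    ("100", "1E2"),
    ("9007199254740992", "9007199254740993"),  # 2**53 and 2**53 + 1
]
for x, y in pairs:
    print(x, y, same_as_double(x, y), same_as_decimal(x, y))
```

The first two pairs match under both readings; the last pair matches only as doubles, because 2**53 + 1 is not representable in double precision.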

pfps · 2023-09-22T13:24:16Z

JCS has the decided advantage of only processing a subset of JSON. Unless rdf:JSON is limited to that subset depending on JCS may not be possible.

afs · 2023-09-22T16:29:16Z

I-JSON: RFC 7493

gkellogg · 2023-09-22T17:05:44Z

JCS was used by JSON-LD to create the RDF serialization of a JSON value in the Object to RDF Conversion algorithm, so it never did allow for surrogates, although JCS was not finalized at that time, so the definition of canonical lexical form may not strictly define that restriction. Any strict update to the rdf:JSON definition within JSON-LD would use JCS directly, and further limit code points similar to how we've done in RDF Concepts and disallow surrogates explicitly. Obviously, this is what I-JSON did.

pfps · 2023-09-22T17:47:30Z

I-JSON has a lot more restrictions than just nice strings. Is rdf:JSON supposed to have these other restrictions too? If so, these other restrictions need to be stated explicitly.

The nice strings restriction needs to be either stated or true. I think that it is not true currently.

afs · 2023-09-22T17:54:53Z

"A lot"?

It is those things that make for accurate consistent parsing.

pfps · 2023-09-22T18:03:59Z

Number restrictions to IEEE floating point double.
No duplicate member names.

Ok, so not a lot in absolute terms. But a large part of the JSON syntax is affected.

afs · 2023-09-23T19:17:42Z

These are the areas where there are no common, stable, implemented values.
Unless @pfps has a proposal?

gkellogg · 2023-09-23T19:44:22Z

JSON doesn't allow duplicate keys (member names), either, although it is not typically an error condition; the last key wins. Limitations of I-JSON (and JCS) on string and number representation should not be a problem, as they're effectively already in place in JSON-LD due to the tacit correspondence to JCS.
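For what it's worth, the "last key wins" behaviour can be observed with Python's standard `json` module, shown here purely as one example of a common parser (other implementations may differ). The helper `has_duplicate_names` is hypothetical.

```python
import json

text = '{"a": 1, "a": 2}'

# Default behaviour of Python's json module: the last member with a given name wins.
print(json.loads(text))

# object_pairs_hook receives every member in document order, duplicates included.
pairs = json.loads(text, object_pairs_hook=list)
print(pairs)

def has_duplicate_names(pairs):
    """True if any member name occurs more than once in a list of (name, value) pairs."""
    names = [name for name, _ in pairs]
    return len(names) != len(set(names))

print(has_duplicate_names(pairs))
```

A processor that wants to reject such input, rather than silently keep the last member, has to inspect the pairs before they are collapsed into a map.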

pfps · 2023-09-24T00:24:17Z

For JSON numbers, I suggest xsd:decimal.
For objects, I suggest name-value pairs.

afs · 2023-09-24T08:03:51Z

Why have something that has different interpretations across different JSON implementations?

I-JSON/JCS reflects where JSON is standardised, de-facto and de-jure.

pfps · 2023-09-24T13:09:51Z

The question is whether rdf:JSON is going to be the JSON that "does not attempt to impose ECMAScript’s internal data representations on other programming languages" and thus has objects containing "zero or more name/value pairs", strings as "sequence[s] of zero or more Unicode characters", and numbers as potentially unbounded decimal values, or the JSON that has objects as ECMAScript objects with all "properties of an object [...] uniquely identified using property keys", strings as "ordered sequences of zero or more 16-bit unsigned integer values", and numbers as "double-precision 64-bit format IEEE 754-2019 values".

If rdf:JSON is going to be the former, then all references should be to json.org and RFCs, JSON values should not be tied to ECMAScript, and string ordering should be by Unicode code point; if rdf:JSON is going to be the latter, then all references should be to the ECMAScript 2024 Language Specification or whatever document currently defines ECMAScript, and JSON values and string ordering can be by UTF-16 code unit.
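The two string orderings only disagree when characters outside the Basic Multilingual Plane are involved. A minimal Python illustration, with example strings chosen for this sketch: Python compares `str` values by code point, and comparing the big-endian UTF-16 bytes is used here as a stand-in for comparing 16-bit code units.

```python
names = ["\uff5e", "\U0001f600"]   # U+FF5E and U+1F600

# Python compares str values by Unicode code point.
by_code_point = sorted(names)

# Big-endian UTF-16 bytes sort the same way as sequences of 16-bit code units.
by_utf16_unit = sorted(names, key=lambda s: s.encode("utf-16-be"))

print([f"U+{ord(s):04X}" for s in by_code_point])   # ['U+FF5E', 'U+1F600']
print([f"U+{ord(s):04X}" for s in by_utf16_unit])   # ['U+1F600', 'U+FF5E']
```

U+1F600 is encoded as the surrogate pair D83D DE00, and D83D sorts before FF5E, which is why the order flips.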

domel · 2023-09-24T14:30:41Z

Agreed. But referencing json.org, which can change at any time, is not a good idea.

afs · 2023-09-24T15:14:43Z

json.org has a link at the top to ECMA-404 (the link is broken (!! given the number) but ECMA-404 exists)

The JSON syntax specified by this specification and by RFC 8259 are intended to be identical.

The warning on the ECMA-404 download page is worth noting.

gkellogg · 2023-09-25T22:30:40Z

Suggest a PR that does the following:

The lexical space is the set of RDF strings which conform to the JSON Grammar as described in Section 2 JSON Grammar of [RFC8259] which are also I-JSON messages [RFC7493].

The value space is the set of arrays, objects, strings, numbers, and JSON literals (boolean and null) [RFC8259]. Two values are considered equal if they are the same string, number, or JSON literal; if they are both arrays with elements which are pairwise equal; or if they are both objects with equal members.

The lexical-to-value mapping maps every element of the lexical space to the result of parsing it into a JSON value.

I don't think we need to get into the relationship between JSON strings and RDF strings, or exactly what a JSON number is, other than as defined in RFC8259. Note that the lexical space is an RDF string, as any lexical value must be.
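As a rough sketch of what membership in such a lexical space and the lexical-to-value mapping might look like in practice, the Python below uses the standard `json` module. The function names are hypothetical, and only two constraints beyond the parser's defaults are checked: rejecting `NaN`/`Infinity` (which Python accepts but the RFC 8259 grammar does not) and rejecting duplicate member names (an I-JSON requirement). Other I-JSON constraints, such as the ban on lone surrogates, are not checked here.

```python
import json

def _reject_constant(name):
    # Python's parser accepts NaN, Infinity and -Infinity; the RFC 8259 grammar does not.
    raise ValueError(f"{name} is not a JSON number")

def _unique_members(pairs):
    # I-JSON (RFC 7493) requires member names within an object to be unique.
    obj = dict(pairs)
    if len(obj) != len(pairs):
        raise ValueError("duplicate member name")
    return obj

def lexical_to_value(lexical):
    """Map a lexical form to a parsed JSON value, or raise ValueError."""
    return json.loads(lexical, parse_constant=_reject_constant,
                      object_pairs_hook=_unique_members)

def in_lexical_space(lexical):
    """True if the string parses under the (partial) constraints above."""
    try:
        lexical_to_value(lexical)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False

print(in_lexical_space('{"a": [1, 2.5, "x", true, null]}'))
print(in_lexical_space('{"a": 1, "a": 2}'))
print(in_lexical_space('NaN'))
```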

pfps · 2023-09-27T15:16:56Z

I prefer a value space that is not tied to ECMAscript and a lexical order that is not tied to UTF-16. I suggest the following, which handles all JSON texts:

Value space:

The value space of rdf:JSON is recursively defined as the union of

objects - finite bags of members, which are pairs of strings (names) and rdf:JSON values (values)
arrays - finite sequences of elements, which are rdf:JSON values
numbers - the value space of xsd:decimal
strings - finite sequences of UNICODE code points
false, null, and true - constants different from any other elements of the value space

Ordering:

Objects are less than arrays, which are less than numbers, which are less than strings, which are less than false, which is less than null, which is less than true.
Object members are ordered by lexicographic ordering over their name and value.
Objects are ordered by first sorting their members from lesser to greater and then using lexicographic order over the resulting sequences.
Arrays are ordered by lexicographic ordering over their elements.
Numbers are ordered by the ordering of real numbers.
Strings are ordered by lexicographic ordering over code points.
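The ordering rules above can be expressed as a sort key. The following Python sketch is one possible reading, not a normative definition: `order_key` is a hypothetical name, objects are represented as Python dicts (so duplicate member names, which a bag would allow, are not modelled), and numbers are compared as exact decimals via `decimal.Decimal`.

```python
from decimal import Decimal

def order_key(v):
    """Sort key for the ordering sketched above; objects are Python dicts here."""
    if isinstance(v, dict):
        # Members are (name, value) pairs, sorted from lesser to greater,
        # then compared lexicographically as a sequence.
        members = sorted((name, order_key(val)) for name, val in v.items())
        return (0, members)
    if isinstance(v, list):
        return (1, [order_key(x) for x in v])
    if isinstance(v, bool):   # must precede the number test: bool is an int subclass
        return (6, ()) if v else (4, ())
    if v is None:
        return (5, ())
    if isinstance(v, (int, float, Decimal)):
        return (2, Decimal(str(v)))
    if isinstance(v, str):
        return (3, v)         # Python compares str values by code point
    raise TypeError(f"not a JSON value: {v!r}")

values = [True, None, "a", 2, False, [], {}, 1.5, [1], {"k": 0}]
print(sorted(values, key=order_key))
# [{}, {'k': 0}, [], [1], 1.5, 2, 'a', False, None, True]
```

The leading integer in each key encodes the cross-type order (objects, arrays, numbers, strings, false, null, true); the second component is only ever compared between values of the same kind.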

Canonical form:

The canonical form of an object is { followed by the canonical form of its members in order from lesser to greater separated by , followed by }.
The canonical form of an array is [ followed by the canonical form of its elements in sequence order separated by , followed by ].

The canonical form of a number is its xsd:decimal canonical form.

The canonical form of a string is " followed by the string with " replaced by \", \ replaced by \\,
U+0008 replaced by \b, U+0009 replaced by \t, U+000A replaced by \n, U+000C replaced by \f, U+000D replaced by \r,
and other code points between U+0000 through U+001F, inclusive, replaced by \uhhhh where hhhh is the lower-case four-digit hexadecimal numeral for the code point followed by ".

The canonical form of false is the string false, the canonical form of null is the string null, the canonical form of true is the string true.
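The string rule is the least obvious of these, so here is a small Python sketch of it. It assumes the intended replacements are `\"` for `"` and `\\` for `\`, with the remaining control characters handled as listed; `canonical_string` is a hypothetical name, and everything outside U+0000 to U+001F other than `"` and `\` is emitted unchanged.

```python
import json

# Short escapes named in the rule above.
SHORT_ESCAPES = {
    '"': '\\"', '\\': '\\\\',
    '\b': '\\b', '\t': '\\t', '\n': '\\n', '\f': '\\f', '\r': '\\r',
}

def canonical_string(s):
    """Canonical lexical form of a string value under the rule sketched above."""
    out = ['"']
    for ch in s:
        if ch in SHORT_ESCAPES:
            out.append(SHORT_ESCAPES[ch])
        elif ord(ch) <= 0x1F:
            out.append('\\u%04x' % ord(ch))   # lower-case four-digit hex
        else:
            out.append(ch)
    out.append('"')
    return ''.join(out)

sample = 'tab\there, bell\x07, "quoted", back\\slash'
print(canonical_string(sample))
print(json.loads(canonical_string(sample)) == sample)   # round-trips through a JSON parser
```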

gkellogg · 2023-09-27T21:41:05Z

I prefer a value space that is not tied to ECMAscript and a lexical order that is not tied to UTF-16.

PR #66 does not currently reference either spec directly (indirectly through RFC8259 and JCS).

I suggest the following, which handles all JSON texts:

Value space:

The value space of rdf:JSON is recursively defined as the union of

objects - finite bags of members, which are pairs of strings (names) and rdf:JSON values (values)

arrays - finite sequences of elements, which are rdf:JSON values

numbers - the value space of xsd:decimal

strings - finite sequences of UNICODE code points

false, null, and true - constants different from any other elements of the value space

Note that xsd:decimal is neither adequate to represent all JSON numbers nor consistent with JSON-LD. If defined in terms of XSD types, it should stick with what JSON-LD does and use either xsd:integer or xsd:double depending on the existence of a fractional part.
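As a rough illustration of the integer-versus-double split, the sketch below picks a datatype from a number's lexical form. It is a simplification written for this thread, not the JSON-LD algorithm text: the function name is hypothetical, and the 1e21 magnitude threshold is included as an assumption about how large whole numbers are treated.

```python
def xsd_datatype_for(lexical):
    """Pick an XSD datatype for a JSON number lexical form (simplified rule)."""
    value = float(lexical)
    # Assumption: a zero fractional part and a magnitude below 1e21 select xsd:integer.
    if value.is_integer() and abs(value) < 1e21:
        return "xsd:integer"
    return "xsd:double"

for lexical in ["5", "5.0", "5.5", "1e21"]:
    print(lexical, xsd_datatype_for(lexical))
```

Under this reading `5` and `5.0` land in the same datatype, since only a non-zero fractional part (or a very large magnitude) selects xsd:double.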

Ordering:

Objects are less than arrays, which are less than numbers, which are less than strings, which are less than false, which is less than null, which is less than true. Object members are ordered by lexicographic ordering over their name and value. Objects are ordered by first sorting their members from lesser to greater and then using lexicographic order over the resulting sequences. Arrays are ordered by lexicographic ordering over their elements. Numbers are ordered by the ordering of real numbers. Strings are ordered by lexicographic ordering over code points.

Ordering should be consistent with ordering the JCS representation. This implies that:

strings starting with " (U+0022) would come before
numbers (leading decimal U+0030-U+0039 or hyphen U+002D), which come before
array starting with [ (U+005B), which come before
false (f is U+0066), which come before
null (n is U+006E), which come before
true (t is U+0074), which come before
object ({ is U+007B).
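This cross-type order falls out directly from sorting the serialized forms by code point, as the following Python snippet shows. The sample serializations are made up for illustration and are assumed to already be in canonical form.

```python
serializations = ['{"a":1}', 'true', 'null', 'false', '[1]', '1', '-1', '"a"']

# Python sorts str values by code point, so the first character decides the group.
print(sorted(serializations))
# ['"a"', '-1', '1', '[1]', 'false', 'null', 'true', '{"a":1}']
```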

Canonical form:

The canonical form of an object is { followed by the canonical form of its members in order from lesser to greater separated by , followed by }. The canonical form of an array is [ followed by the canonical form of its elements in sequence order separated by , followed by ].

The canonical form of a number is its xsd:decimal canonical form.

The canonical form of a string is " followed by the string with " replaced by \", \ replaced by \\, U+0008 replaced by \b, U+0009 replaced by \t, U+000A replaced by \n, U+000C replaced by \f, U+000D replaced by \r, and other code points between U+0000 through U+001F, inclusive, replaced by \uhhhh where hhhh is the lower-case four-digit hexadecimal numeral for the code point followed by ".

The canonical form of false is the string false, the canonical form of null is the string null, the canonical form of true is the string true.

This really needs to be JCS due to wide implementation in JSON-LD processors already.

gkellogg added the spec:substantive label (issue or proposed change in the spec that changes its normative content) on Sep 20, 2023

gkellogg mentioned this issue Sep 20, 2023

Moves datatype definitions to an appendix and adds rdf:JSON datatype #62

Merged

gkellogg mentioned this issue Sep 26, 2023

Updates rdf:JSON value space. #66

Merged

gkellogg closed this as completed in #66 Jun 27, 2024
