Support JSON Literals. #72

Merged
merged 8 commits into from Mar 26, 2019

Conversation

gkellogg
Member

@gkellogg gkellogg commented Mar 20, 2019

Uses JCS normatively for RDF output, with atrisk issue.

For w3c/json-ld-syntax#4.


Preview | Diff

@gkellogg
Member Author

JCS has some failure modes for unrepresentable numbers (e.g., NaN, +-Infinity). Not sure if it's worth testing these, or calling it out in the algorithm.
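
A minimal sketch of that failure mode, using Python's standard `json` module rather than any JCS implementation. The helper name `serialize_strict` is hypothetical. By default `json.dumps` emits the non-standard tokens `NaN` and `Infinity`; passing `allow_nan=False` makes it refuse them, which is roughly the behaviour a canonicalizer would need:

```python
import json
import math


def serialize_strict(value):
    """Serialize to JSON text, refusing values JSON cannot represent.

    Hypothetical helper for illustration only; not part of any JSON-LD API.
    """
    # allow_nan=False makes json.dumps raise ValueError on NaN and +/-Infinity
    # instead of emitting the non-standard tokens NaN / Infinity.
    return json.dumps(value, allow_nan=False)


for bad in (math.nan, math.inf, -math.inf):
    try:
        serialize_strict({"value": bad})
    except ValueError as err:
        print(f"rejected {bad!r}: {err}")
```

Whether a processor should raise, skip the value, or do something else is exactly the open question in the comment above.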

Member

@iherman iherman left a comment

I wonder whether we should not define this in the syntax document first, as well as formally define the RDF JSON datatype. The latter also means we will have to decide whether we put that as a separate document or include it in the syntax, what namespace we use (I would argue we would have to try to put this into the RDF namespace), etc. Those issues may influence this document.

Also, I am still not convinced by the role of JSON c14n. I could argue that, e.g., in example 22, the user may expect to see the JSON content verbatim in the generated Turtle, and this canonical version may come as a surprise.

@iherman
Member

iherman commented Mar 21, 2019

Minor thingy: after example 22, "unnecssary"->"unnecessary".

Contributor

@pchampin pchampin left a comment

Apart from a few typos for which I'm about to push a commit, I think this is in good shape.

a <a>scalar</a>,
<span class="changed">or the <a>term definition</a> for <var>active property</var>
has a <a>type mapping</a> of <code>@json</code>,</span>
return that result.</li>
Contributor

"that result" is ambiguous here. It originally meant "the result of the Value Compaction Algorithm", but now the algorithm is not called in all cases.

@pchampin
Contributor

> Also, I am still not convinced by the role of JSON c14n. I could argue that, e.g., in example 22, the user may expect to see the JSON content verbatim in the generated Turtle, and this canonical version may come as a surprise.

Implementations of the algorithm will usually not have access to the verbatim form. In Python for example, the algorithm gets a dict, where order and "unnecssary" spaces have already been tampered with... So even without c14n, the user will usually not get the JSON content verbatim in the generated Turtle.
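
A small illustration of this point with Python's standard `json` module (the input string is made up for the example). Once the text has been parsed, whitespace, the original spelling of numbers, and duplicate keys are no longer available, so re-serializing cannot reproduce the source verbatim:

```python
import json

# Made-up input with irregular whitespace, an exponent-form number,
# and a duplicated key.
source = '{ "b" :   1.0e1,\n  "a" : [ 1,2 ,3 ],  "a" : "last wins" }'

parsed = json.loads(source)
# The parser keeps only the last value for the duplicated key "a".
print(parsed)

round_tripped = json.dumps(parsed)
print(round_tripped)

# The re-serialized text differs from what the author wrote.
print(round_tripped == source)
```

Recent Python versions keep insertion order in `dict`, so key order happens to survive here; that is a property of this particular runtime, not something an algorithm can rely on across implementations.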

@iherman
Member

iherman commented Mar 21, 2019

> Implementations of the algorithm will usually not have access to the verbatim form. In Python for example, the algorithm gets a dict, where order and "unnecssary" spaces have already been tampered with... So even without c14n, the user will usually not get the JSON content verbatim in the generated Turtle.

Yeah, that is probably true. I did not think of that.

@gkellogg
Member Author

> I wonder whether we should not define this in the syntax document first, as well as formally define the RDF JSON datatype. The latter also means we will have to decide whether we put that as a separate document or include it in the syntax, what namespace we use (I would argue we would have to try to put this into the RDF namespace), etc. Those issues may influence this document.

My process is typically to implement the feature, then describe the processing steps and the tests for it. Once we're in agreement with the general direction, then to do the syntax. Certainly, the datatype needs to be normatively described. We can bikeshed on rdf: vs jsonld: namespaces. Keeping it together with the JSON-LD syntax document seems reasonable to me. If RDF Concepts were a living doc, we could put it there.

> Also, I am still not convinced by the role of JSON c14n. I could argue that, e.g., in example 22, the user may expect to see the JSON content verbatim in the generated Turtle, and this canonical version may come as a surprise.

As @pchampin says, we can't rely on the original syntactic representation, as it's completely lost when transforming into the internal representation (unlike XML/HTML when parsed into an infoset, where whitespace remains). Moreover, once numbers are parsed, any difference between, e.g., 0, 0.0, or 0.0e0, 0.0E+0, ... is lost. For this, we count on ES6 to describe serializations, which always uses E+- with no leading zeros, but doesn't use exponential notation if the value is within certain ranges. Whether or not we can rely on JCS, we'll need to spec something equivalent for basic interoperability, not to mention signing and querying.
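
To make the number point concrete, here is a short sketch using Python's `json` module as a stand-in for any JSON parser. The four spellings are the ones listed above; everything else is illustrative:

```python
import json

# The spellings mentioned above; all denote the number zero.
spellings = ["0", "0.0", "0.0e0", "0.0E+0"]

values = [json.loads(s) for s in spellings]

# After parsing, every spelling compares equal to zero ...
print(all(v == 0 for v in values))

# ... and re-serializing yields fewer distinct texts than we started with,
# so the original lexical form cannot be recovered.
reserialized = {json.dumps(v) for v in values}
print(reserialized)
```

Python still distinguishes `int` from `float`, so two texts survive here; in a runtime with a single number type, as in ECMAScript, even that distinction disappears.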

@iherman
Member

iherman commented Mar 22, 2019

I must admit I still fail to see why JSON c14n would be normatively necessary. As long as the API produces valid JSON whose meaning is identical to the input, this should be enough from the JSON-LD point of view. It is of course perfectly fine to refer to the c14n spec in the sense of recommending that implementations use it if it is important to compare and/or sign the resulting graph or any other form of the JSON-LD content, but that is not the same as casting it in concrete for the API.

In any case, unless the JCS spec is as stable as it can be, we would have a problem referring to it normatively. (We should consider the relevant guidelines.) Also, normatively specing a JCS of our own, I believe, should be a no-no. Defining, essentially, a standard that is a generic JSON specification whose role goes way beyond JSON-LD when there is work happening elsewhere would create lots of turmoil. Copying and freezing in our document an existing standard work that is happening elsewhere is also a (big!) no-no.

I believe we will have to find a way to refer to JCS if we want, but not normatively.

@iherman
Member

iherman commented Mar 22, 2019

This issue was discussed in a meeting.

  • RESOLVED: Move forwards with a JSON native data type, with a warning that it cannot be canonicalized
Transcript: JSON datatype
Rob Sanderson: link: w3c/json-ld-syntax#4
Rob Sanderson: PR: #72
Rob Sanderson: we also have discussed the JSON datatype on github
… Gregg, you’ve been the most involved (as always)
… could you summarize?
Gregg Kellogg: the issue comes down to representation
… if you are going to describe both the lexical and value space
… somewhat like HTML
… the lexical space cannot be guaranteed
… the JSON literal quality is lost when it’s turned into a native representation
… you lose the original key ordering, key escaping, and lexical numerical representations
… so it seems we will need to canonicalize
… which has been referenced in the issue
… it’s sadly not as close to done as I’d hoped
… and we can’t count on it being final in time
… so, do we care if two implementations use the same canonicalization
… so we have done some things about do we use Integer or Doubles for numbers
… so when you’d turn the JSON literal into RDF (in the toRDF space), we do need to say something about that at least
… and the elimination of whitespace
… and the ordering of keys
… I think that can be done
… there’s a lot of detail in that, but we should be able to reference ECMAScript for this
… or we could do it ourselves
Rob Sanderson: last time we talked about the canonicalization issue
… we also talked about HTML being not easily canonicalizable
Gregg Kellogg: HTML is a little different
… they will preserve order, and whitespace
… so you do have the opportunity to return to that result
Ivan Herman: well, attribute order and things are not covered
… this would be a problem if you were to attempt to sign an HTML document
Gregg Kellogg: if we weren’t in an era when signatures are as important as they are now, then maybe we wouldn’t need to care about this so much
Rob Sanderson: so, is there a JSON-LD document that could include a JSON “native” data type that also needs to be signed
… so if the only use case is to import GeoJSON
… do we need to worry
Ivan Herman: I have spent time on this issue with others
… aside from the canonicalization problem
… if we do make a native JSON type, we will have to put it into some namespace–rdf: or jsonld:
Rob Sanderson: +1 to RDF namespace
Ivan Herman: if we do that, we’ll have to write the SWIG mailing list, to announce the new datatype, etc.
… we can do this as part of our document
… the other problem is
… I did put a reference in the issue for the rules we have to follow when we point to something normatively
… my first reading is that unfortunately, this JSON canonicalization specification cannot be referred to normatively
… the second problem is bringing our own canonicalization into our document
… if we do that, I can safely say the Director would say no to that
… so, we can’t just take an IETF spec and put it into a W3C spec
… all of these are admin problems
… But I am still not convinced that we need the canonicalization as a normative part of our spec
… we could say that someone else may do this and reference forthcoming work
… but when the issue is that we have a JSON portion we want to store in RDF
… we can state that the only expectation is that [the same processor will produce the same output]
… none of the arguments that I heard is that canonicalization needs to be normative
Pierre-Antoine Champin: http://tinyurl.com/y2gmzxf8
Pierre-Antoine Champin: I was wondering about this example
… there’s an Integer in the non-canonical form
… would that be canonicalized or not?
Gregg Kellogg: yes, that would be canonicalized
… I don’t know any processors that would properly serialize that with a leading zero
… if you’re going to the internal representation
… it is the number 42
… some might do 42.0
… or 42E+0
… that would be fine, but I don’t think most JSON serializers would do that
Pierre-Antoine Champin: for the moment, we know how to sign this thing
Dave Longley: I think this falls into the same category as HTML
… it’s a string in the JSON; it’s not native HTML
… or a native number in the example’s case
… if we’re storing stuff in a string, then store it as a string
… but people want a native JSON object in their JSON
Pierre-Antoine Champin: but if you remove the leading 0 you don’t get the same signature
… so I’m assuming that the signature is dealing with the order or absence of order in the object when signed
… so if the object was a native JSON object, then it would already benefit
… and regardless we already have this problem with other string-expressed literals
Rob Sanderson: if you instead make it value 42.0
… since no one really serializes as 042
… whatever you change here will change the signature
… even though it will canonicalize as something different
Dave Longley: I disagree
Rob Sanderson: what do you disagree with?
Ivan Herman: I think in these examples, the current JSON-LD specification doesn’t say anything about what you put in strings
… we don’t suggest any sort of mini-canonicalization for things like this
… having built-in canonicalization for the native JSON representation
… would be a departure from what we’ve done previously
Dave Longley: my response to all that is that we have very consistent rules about moving non-string data into strings
… so we do have those sorts of specifications
… from a native JSON value into a string
… this same thing would exist for native JSON objects
… for things that come in via a string, those will stay as whatever that string is
… so strings have no issue
… so if you take pchampin’s example, and change it to a real number: 42
Gregg Kellogg: 42, 42.0, 42.0E0, 4.2E+1 are all the same number
Dave Longley: and if you put that in the playground, check the nquads tab, you’ll find the same number
Ivan Herman: yep I acknowledge that
Rob Sanderson: maybe then it’s the playground which is at fault
… I put in several examples, and the signature changes for all of these different 42’s as an integer
Dave Longley: you’re looking at the RSA signature, so you’ll see it change constantly
… because that injects random data
… what you need to look at is the N-Quads or normalized tabs
… the data there stays the same
Gregg Kellogg: this is in the data round tripping section
Gregg Kellogg: so, imo, if we create a datatype for JSON
… before there is a canonicalization for it
… then we’re in danger of doing things too early
… ultimately we need to deal with a canonicalized JSON
Pierre-Antoine Champin: +1
Gregg Kellogg: so the best thing we can do right now is nothing
… and defer this until there is a canonicalized form
… otherwise whitespace, object ordering, etc are all variable
… and the literals really won’t be worth doing if any lexical representation is important
… better not to do anything until a canonicalization spec exists
Ivan Herman: my take would be milder
… the GeoJSON example doesn’t care about canonicalization
Rob Sanderson: +1 to ivan
Ivan Herman: with the canonicalization things deferred
… and state that this feature is not recommended
… so we defer it, and if/when the canonicalization becomes standard or whatever, then we at that point suggest that that spec gets used
Rob Sanderson: it would be better to have a JSON datatype and state that later we’ll do canonicalization
Dave Longley: let’s provide rules for how to produce the JSON string that match the draft – but that you can do something else and be very clear it’s preferred that everyone do the same thing
Rob Sanderson: so we should start with JSON datatypes, and just suggest that you can’t sign these
Jeff Mixter: +1 to ivan and azaroth
Gregg Kellogg: if we don’t do canonicalization now, we don’t seem to be prevented from doing it later
… if we end up as a living spec, then we could do it that way
… and we could also suggest that for testing purposes it is always canonicalized
Rob Sanderson: a warning or a note?
Proposed resolution: Move forwards with a JSON native data type, with a warning that it cannot be canonicalized (Rob Sanderson)
Rob Sanderson: I’d suggest a warning
Gregg Kellogg: +1
Jeff Mixter: +1
Ivan Herman: +1
Rob Sanderson: +1
Simon Steyskal: +1
Pierre-Antoine Champin: +1
Tim Cole: +1
Dave Longley: +0
Benjamin Young: +0 still have concerns about eager misuse
David I. Lehn: +0.5
Jeff Mixter: I echo bigbluehat concerns but I also have very valid reasons to add JSON to RDF data.
Dave Longley: +1 to everything Benjamin is saying … but that we should really also have JSON literals … but they should also all be converted to the same strings in processors :)
David Newbury: +1
Resolution #3: Move forwards with a JSON native data type, with a warning that it cannot be canonicalized
Dave Longley: JSON literals can be an escape hatch but ONLY an escape hatch.

@gkellogg
Member Author

I updated the spec to only mention JCS, but described a canonical lexical form for the purposes of round-tripping. It's close to JCS, but uses our established conventions for serializing numbers, rather than what ES6 specifies.

@iherman suggested not specifying anything, as I understood from the call, but I don't see how we can remain entirely silent on this. Please suggest an edit to soften the language, if advisable.
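
For readers unfamiliar with the idea, the following is a rough sketch of what a canonical lexical form buys you: no insignificant whitespace and a fixed key order. It is not the form defined in this PR and not JCS. The function name `canonical_json_sketch` is hypothetical, keys are ordered by Unicode code point (Python's default), and numbers use Python's formatting rather than ES6's or the spec's:

```python
import json


def canonical_json_sketch(value):
    """Approximate a canonical form: sorted keys, no extra whitespace.

    Illustrative assumption only. Key order is by Unicode code point and
    number formatting is Python's own, so this matches neither JCS nor
    the text in this PR exactly.
    """
    return json.dumps(
        value,
        sort_keys=True,          # fixed key order
        separators=(",", ":"),   # drop whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
        allow_nan=False,         # refuse NaN / Infinity
    )


# Two spellings of the same JSON value.
a = json.loads('{"b": [1, 2],   "a": {"y": true, "x": null}}')
b = json.loads('{"a":{"x":null,"y":true},"b":[1,2]}')

print(canonical_json_sketch(a))
print(canonical_json_sketch(a) == canonical_json_sketch(b))
```

Two processors that agree on such a form produce identical literals for equivalent input, which is what round-tripping tests and signing depend on.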

Member

@iherman iherman left a comment

@gkellogg I am fine with the document as it is now.

@dlongley
Contributor

@davidlehn,

> It's close to JCS, but uses our established conventions for serializing numbers, rather than what ES6 specifies.

Can you look into how this would impact our JavaScript implementation? It sounds problematic. I think it's one thing to serialize numbers on their own when converting to xsd:integer, etc. but I doubt that, when serializing an entire JSON literal (that happens to contain numbers), we should deviate from ES6.
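
To show the kind of difference at stake, here is a sketch that assumes "our established conventions" means the canonical `xsd:double` lexical form (mantissa, `E`, exponent) used when native numbers become typed RDF literals. That reading is an assumption, and the helper `xsd_double_canonical` is hypothetical, not taken from the PR or from any library:

```python
def xsd_double_canonical(value):
    """Format a float in xsd:double canonical style, e.g. 5.3 -> '5.3E0'.

    Hypothetical helper; assumes the convention under discussion is the
    canonical xsd:double lexical form.
    """
    # One digit before the point, 15 after, then an exponent: '5.300000000000000E+00'
    mantissa, exponent = ("%.15E" % value).split("E")
    # Drop trailing zeros but keep at least one fractional digit.
    mantissa = mantissa.rstrip("0")
    if mantissa.endswith("."):
        mantissa += "0"
    # int() removes the sign padding and leading zeros of the exponent.
    return f"{mantissa}E{int(exponent)}"


for number in (5.3, 42.0, 1.0, 0.001):
    print(number, "->", xsd_double_canonical(number))
```

Under ES6 number-to-string rules the value 42.0 serializes as `42`, whereas this convention gives `4.2E1`; the same JSON literal would therefore get two different lexical forms depending on which rule a processor follows.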

@gkellogg
Member Author

You can look at the latest changes to the test files to see how differently numbers are treated. In general, I think the ES6/JCS serialization is more developer friendly, but it’s odd to have a difference.

The other main difference is that keys are sorted as UTF-16, not -8, which is a subtle difference.

I’m happy to go however people like, and if we say serialize numbers as in ES6, that’s fine with me.
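
The key-ordering difference only shows up for characters outside the Basic Multilingual Plane. A sketch, with made-up keys and a hypothetical helper `utf16_sort_key`; it compares ordering by UTF-16 code units with ordering by UTF-8 bytes (which matches code point order):

```python
def utf16_sort_key(key):
    """Sort key that orders strings by UTF-16 code units.

    Big-endian UTF-16 bytes compare in the same order as the 16-bit
    code units they encode. Hypothetical helper for illustration.
    """
    return key.encode("utf-16-be")


# "\uFB33" is a single code unit near the top of the BMP;
# "\U0001F600" needs a surrogate pair starting with 0xD83D.
keys = ["\U0001F600", "\uFB33", "a"]

by_utf8 = sorted(keys, key=lambda k: k.encode("utf-8"))
by_utf16 = sorted(keys, key=utf16_sort_key)

print([ascii(k) for k in by_utf8])
print([ascii(k) for k in by_utf16])
print(by_utf8 == by_utf16)
```

For keys made only of BMP characters below the surrogate range the two orders agree, which is why the difference is easy to miss in tests.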

@dlongley
Contributor

I'd prefer to go with ES6/JCS ... otherwise we're likely to do something different from everyone else.

@gkellogg gkellogg merged commit 3220ddc into master Mar 26, 2019
@gkellogg gkellogg deleted the json-literals branch March 26, 2019 22:46