Streaming Profiles for JSON-LD to/from RDF #4
Comments
This question arises on a regular basis (on user mailing lists) for Apache Jena, and our only current response is to admit that JSON-LD currently isn't a good choice when processing would benefit from or demands streaming. |
from @rubensworks sent today via email:
|
This issue was discussed in a meeting.
5.2. Streaming Profiles for JSON-LD to/from RDF
Rob Sanderson: ref: https://github.com/w3c/json-ld-api/issues/5
ajs6f> gkellogg: there are savings to be realized if one could spec a profile for streaming
Gregg Kellogg: this profile would say, “to be streamed, a JSON-LD serialization would need to have the following characteristics”
Ivan Herman: analysis of the format with this in mind
Ivan Herman: I’d say defer
… this might be interesting enough that someone might publish something before this WG ends
Gregg Kellogg: we could publish something that invites people to work on this
Proposed resolution: Streaming is interesting, but not high priority for work given current participants; highlight in a blog post (Rob Sanderson)
Gregg Kellogg: +1
Adam Soroka: +1
Rob Sanderson: +1
Benjamin Young: +1
Simon Steyskal: +1
Ivan Herman: +1
David I. Lehn: +1
Jeff Mixter: +1
Harold Solbrig: +1
David Newbury: +1
Resolution #6: Streaming is interesting, but not high priority for work given current participants; highlight in a blog post |
Just to jot down some thoughts after discussing with @rubensworks, the main issue is to encourage/require a key order in JSON objects. To properly decode values in an object, When serializing JSON-LD, order keys which are keywords, or aliased to keywords before other keys, ordered lexicographically by the unaliased key followed by all other keys in the object ordered lexicographically. (There is some aesthetic value to ordering |
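The key-ordering rule described in this comment can be sketched in Python roughly as follows. This is an illustration only: `order_keys`, the `aliases` mapping, and the partial `KEYWORDS` set are hypothetical names chosen for this sketch, not part of any JSON-LD library, and nested objects are not reordered.

```python
# Partial list of JSON-LD keywords, for illustration only (assumption).
KEYWORDS = {
    "@context", "@id", "@type", "@value", "@language",
    "@graph", "@list", "@set", "@reverse", "@index",
}

def order_keys(obj, aliases=None):
    """Return a copy of `obj` with keyword keys (or their aliases) first.

    `aliases` maps an aliased term to the keyword it stands for,
    e.g. {"id": "@id"}. Keyword keys are sorted by their unaliased form;
    all remaining keys follow in lexicographic order.
    """
    aliases = aliases or {}

    def unaliased(key):
        return aliases.get(key, key)

    keyword_keys = sorted(
        (k for k in obj if unaliased(k) in KEYWORDS), key=unaliased
    )
    other_keys = sorted(k for k in obj if unaliased(k) not in KEYWORDS)
    # Python dicts keep insertion order, so json.dumps will preserve it.
    return {k: obj[k] for k in keyword_keys + other_keys}

node = {
    "name": "Alice",
    "type": "Person",
    "@context": {"id": "@id", "type": "@type"},
    "id": "http://example.org/alice",
    "knows": "http://example.org/bob",
}
ordered = order_keys(node, aliases={"id": "@id", "type": "@type"})
```

With purely lexicographic ordering of the unaliased keywords, `@context` sorts before `@id`, which sorts before `@type`; whether that exact order is the right requirement is what the rest of this thread discusses.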
Can you guys give some examples? It would help to understand... |
Here's an example on the importance of Assuming a line-by-line parser, triples can be emitted immediately after each line in the following JSON-LD document:
However, if
(Source: https://github.com/rubensworks/jsonld-streaming-parser.js#how-it-works) |
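To make the effect of key order concrete, here is a minimal Python sketch. It is not taken from jsonld-streaming-parser.js; `node_to_triples` is a hypothetical helper that assumes a single flat node object with full IRIs as property keys, scalar values, and no `@context` processing. It also reads the whole text with `json.loads`, so it only illustrates how much a real event-based parser would have to buffer, not actual streaming.

```python
import json

def node_to_triples(json_text):
    """Convert one flat node object to triples, reading keys in document order.

    Returns (triples, buffered), where `buffered` counts the properties that
    had to be held back because "@id" had not been seen yet.
    """
    # object_pairs_hook=list keeps the key/value pairs in document order.
    pairs = json.loads(json_text, object_pairs_hook=list)
    subject = None
    pending = []   # properties seen before the subject was known
    triples = []
    buffered = 0
    for key, value in pairs:
        if key == "@id":
            subject = value
            # Flush everything that was waiting for the subject.
            triples.extend((subject, p, o) for p, o in pending)
            pending.clear()
        elif key.startswith("@"):
            continue  # other keywords are ignored in this sketch
        elif subject is None:
            pending.append((key, value))
            buffered += 1
        else:
            triples.append((subject, key, value))
    if subject is None:
        # No "@id" at all: fall back to a blank node label (assumption).
        subject = "_:b0"
        triples.extend((subject, p, o) for p, o in pending)
    return triples, buffered

early = '{"@id": "http://example.org/a", "http://example.org/p": "1", "http://example.org/q": "2"}'
late = '{"http://example.org/p": "1", "http://example.org/q": "2", "@id": "http://example.org/a"}'
triples_early, buffered_early = node_to_triples(early)
triples_late, buffered_late = node_to_triples(late)
```

Both inputs yield the same triples, but when `@id` comes last every property has to be buffered first, which is the cost a key-order recommendation is meant to avoid.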
Thanks @rubensworks. I have two questions, though:
|
Lexicographical ordering may be a bit too strict. I think the point @gkellogg intended to make is that
I personally don't see any benefits in handling the contents of an |
For keywords, you want to see |
as long as we don't introduce any new keywords that would somehow have an effect on this order, right? (not very likely, but still ;) )
imo @rubensworks suggested restrictions are also more "stable" and less ambiguous than relying on lexicographical ordering only |
For reference, I just finished implementing a streaming JSON-LD serializer. While the implementation of this was significantly easier than the implementation of the streaming parser, Concretely, it has the following restrictions:
Next to that, in order to make the resulting JSON-LD stream as compact as possible, the following guidelines regarding triple/quad order can be followed:
Since these findings about JSON-LD serialization (and the previous ones on parsing) may be beneficial for other people as well, I was wondering if the Best Practices Note may be a good place to summarize these findings. |
I have an implementation of a streaming writer that pretty much does the same thing: https://github.com/ruby-rdf/json-ld/blob/develop/lib/json/ld/streaming_writer.rb. Such list restrictions are a good argument for such structures to be more fundamental to RDF in the future. |
This issue was discussed in a meeting.
Streaming Profiles for JSON-LD to/from RDF
Rob Sanderson: link: https://github.com/w3c/json-ld-api/issues/5
Rob Sanderson: this came from the community group
Ivan Herman: what does a profile mean?
Gregg Kellogg: I reckon in the sense of serializing json-ld in a way that it’s easier for stream processors to deal with it
… or how would you create json-ld from a stream
… the best thing we can do is to provide requirements that should be followed
Ivan Herman: so not like profiles in the http context
Ruben Taelman: basically like gkellogg described
… I’m more than happy to summarize this in the best practice document
Dave Longley: +1 to doing this in best practices
Simon Steyskal: I don’t think it should be normative. You can do what you want. But it’s perfectly fine for a best practices document and should be in there, giving guidelines on this.
Gregg Kellogg: the one thing I’m not sure whether we can move to a bp document is something that allows one to require stream data (?)
Ivan Herman: I would propose to leave it to best practice
Tim Cole: I’m a little concerned that by not following gkellogg’s suggestions people will create json-ld that cannot be used properly by a streaming processor
Adam Soroka: we frequently get questions about streaming json-ld
… I second the concern timCole raised
Benjamin Young: #3
Benjamin Young: a lot of the stuff I’m reading there is about key ordering
… one potential option could be not to require ordering
… but processors outputting a preferred ordering
Ivan Herman: I hear bigbluehat’s argument which is perfectly valid, and maybe a future version of json-ld will have key ordering
Rob Sanderson: +1 - unordered keys are ordered by necessity of a serialization
Gregg Kellogg: serialization vs. data model wrt. ordering
Tim Cole: +1 to ivan since it will provide experience to inform normalization
Ivan Herman: I repeat what I just said, getting into a normative thing in that area is probably premature
… or too much work
Rob Sanderson: what’s the happy middle ground? key ordering for streaming?
Benjamin Young: I would like to get as much as possible into the best practice document
Ruben Taelman: scoped contexts shouldn’t be an issue
… as long as they are the first object
Proposed resolution: Describe preferred key ordering for serialization over the wire to enable streaming parsers as a best practice (Rob Sanderson)
Gregg Kellogg: +1
Adam Soroka: +1
Rob Sanderson: +1
Ivan Herman: +1
Dave Longley: +1
Benjamin Young: +1
Simon Steyskal: +1
David I. Lehn: +1
Ruben Taelman: +1
Gregg Kellogg: Scoped contexts might require that @type come after @id
Pierre-Antoine Champin: +1
Tim Cole: +1 as long as leave defer for future
David Newbury: +1
Resolution #7: Describe preferred key ordering for serialization over the wire to enable streaming parsers as a best practice |
Transferring this issue (formerly issue no. 5 in json-ld-syntax) to the best practice repo
Hm. Just thinking out loud, but I wonder if this would be even better placed as a Note separate from the BP document. Maybe, maybe not. In favor, I think it's a bit (or maybe a lot, depending on your POV) more advanced than other topics we expect to cover in the BP doc. Against, why multiply documents? We have several already… |
@BigBlueHat I just went through this issue and #5, and I can confirm that all information in here is summarized in #5, so we can safely close this one here. |
There have been some discussions on what it would take to be able to do a streaming parse of JSON-LD into Quads, and similarly to generate compliant JSON-LD from a stream of quads. Describing these as some kind of a profile would be useful for implementations that expect to work in a streaming environment, when it's not feasible to work on an entire document basis.
As currently stated, the JSON-LD to RDF algorithm requires expanding the document and creating a node map. A profile of JSON-LD that uses a flattened array of node objects, where each node object can be expanded independently and no flattening is required, could facilitate deserializing an arbitrarily long JSON-LD source to Quads. (Some simplifying restrictions on shared lists may be necessary.) The outer document is an object containing @context and @graph only. This only works for systems that can access key/value pairs in order and that ensure @context comes lexically before @graph in the output; in other words, only implementations that can read and write JSON objects with key ordering intact will be able to take advantage of such streaming capability.
For serializing RDF to JSON-LD, expectations on the grouping of quads with the same graph name and subject are necessary to reduce serialization cost, and marshaling components of RDF Lists is likely not feasible. Even if graph name/subject grouping is not maintained in the input, the resulting output will still represent a valid JSON-LD document, although it may require flattening for further processing. (Many triple stores will, in fact, generate statements/quads properly grouped, so this is likely not an issue in real-world applications.)
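The serialization side could look roughly like the following Python sketch. It is an assumption-laden illustration, not the algorithm of any existing writer: `stream_jsonld` is a hypothetical function, quads are `(graph, subject, predicate, object)` tuples of strings with `None` for the default graph, every object is treated as a plain string literal, no `@context` is emitted (full IRIs are used as keys), and RDF Lists are not handled.

```python
import json
from itertools import groupby

def stream_jsonld(quads):
    """Yield string chunks that together form one JSON-LD document.

    One node object is emitted per run of consecutive quads sharing the
    same graph name and subject, so only one group is held in memory.
    """
    yield '{"@graph": ['
    first = True
    # groupby merges only *consecutive* quads with the same key.
    for (graph, subject), group in groupby(quads, key=lambda q: (q[0], q[1])):
        node = {"@id": subject}
        for _, _, predicate, obj in group:
            node.setdefault(predicate, []).append({"@value": obj})
        if graph is not None:
            # Wrap nodes of a named graph in a graph object.
            node = {"@id": graph, "@graph": [node]}
        yield ("" if first else ", ") + json.dumps(node)
        first = False
    yield "]}"

A, B = "http://example.org/a", "http://example.org/b"
P, Q = "http://example.org/p", "http://example.org/q"
grouped = [(None, A, P, "1"), (None, A, Q, "2"), ("http://example.org/g", B, P, "3")]
ungrouped = [(None, A, P, "1"), (None, B, P, "3"), (None, A, Q, "2")]
doc_grouped = json.loads("".join(stream_jsonld(grouped)))
doc_ungrouped = json.loads("".join(stream_jsonld(ungrouped)))
```

When the input is grouped, each subject appears once. When it is not, the same `@id` simply appears in several node objects; the output is still parseable JSON-LD, but a consumer would need to flatten it to merge them, as noted above.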
Original issue Streaming Profiles for JSON-LD to/from RDF #434.