Add note on streaming #3

Closed
rubensworks opened this issue Apr 4, 2019 · 10 comments
Comments

@rubensworks
Member

rubensworks commented Apr 4, 2019

Following #4, we can add guidelines for achieving streaming JSON-LD, and any restrictions it may bring.
Here is a summary of the issues discussed in #4.

Guidelines:

  1. The structure of a JSON-LD document to enable efficient streaming parsing to RDF.
    1. If there is an @context in a node, it should be the first key.
    2. If there is an @type in a node, and its value indicates a type-scoped context, it should come right after an @context (if there is one), or be the first key.
    3. If there is an @id in a node, it should be the first key if there is no @context or @type, or the second (or third) key if there is an @context or @type.
  2. The order in which RDF triples/quads should appear to enable efficient streaming serialization to JSON-LD.
    1. Quads with equal graphs should be grouped (Achieves grouping of @graph blocks).
    2. Quads with graph corresponding to the subject of triples/quads should be grouped (Achieves grouping of @graph and @id blocks).
    3. Triples with equal subjects should be grouped (Achieves grouping of @id blocks).
    4. Triples with equal predicates should be grouped (Achieves grouping of predicate arrays).

(triple stores may already do these kinds of grouping automatically)
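To show why these groupings matter, here is a minimal sketch of a serializer that holds only one subject group in memory at a time. It assumes quads arrive as plain `(subject, predicate, obj, graph)` string tuples and that every object is an IRI. Literals, blank nodes, `@graph` wrapping and `@list` are left out, and the `Quad`/`stream_nodes` names are hypothetical. `itertools.groupby` turns each consecutive `(graph, subject)` run into one node object and emits it immediately:

```python
from itertools import groupby
from typing import Iterable, Iterator, NamedTuple, Tuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str    # treated as an IRI in this sketch
    graph: str  # "" stands for the default graph


def stream_nodes(quads: Iterable[Quad]) -> Iterator[Tuple[str, dict]]:
    """Yield (graph, node object) pairs, one per consecutive (graph, subject) run.

    Only the current run is held in memory. If the input is not grouped,
    the same subject simply produces several node objects.
    """
    for (graph, subject), run in groupby(quads, key=lambda q: (q.graph, q.subject)):
        node: dict = {"@id": subject}
        for q in run:
            # Objects sharing a predicate are collected into one array.
            node.setdefault(q.predicate, []).append({"@id": q.obj})
        yield graph, node


if __name__ == "__main__":
    quads = [
        Quad("ex:a", "ex:knows", "ex:b", ""),
        Quad("ex:a", "ex:knows", "ex:c", ""),
        Quad("ex:b", "ex:knows", "ex:a", ""),
        Quad("ex:a", "ex:likes", "ex:d", ""),  # ex:a again, after its run ended
    ]
    for graph, node in stream_nodes(quads):
        print(graph or "(default graph)", node)
```

Because the last quad is out of place, `ex:a` is emitted as two separate node objects. That is still valid JSON-LD, just less compact, which is why grouping on the producer side pays off.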

Restrictions:

  1. Parsing
    • The spec allows @context to appear at any position within an object. As such, a strict streaming parser may need to buffer large portions of the stream. However, since most real-world JSON-LD documents place @context as the first key, streaming parsers may reject JSON-LD documents that have out-of-order @context entries.
  2. Serialization
    • RDF lists are not converted to @list arrays, as you can only be certain that @list can be used once all triples have been read, which requires keeping the whole stream in-memory.
    • No deduplication of triples, as this would also require keeping the whole stream in-memory.
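The parsing restriction corresponds to a simple acceptance rule: every object that contains @context must have it as its first key, including embedded (scoped) contexts. The sketch below is not a streaming parser. It loads the whole document with `json.loads` and only checks key order afterwards, relying on the fact that Python dicts keep keys in document order. The error class and function name are hypothetical, and `@json` literal values are not special-cased.

```python
import json
from typing import Any


class OutOfOrderContextError(ValueError):
    """Hypothetical error type for an @context that is not the first key."""


def check_context_first(value: Any, path: str = "$") -> None:
    """Raise OutOfOrderContextError if any object has @context somewhere other
    than its first key.

    json.loads builds dicts in document order, so list(value)[0] is the key a
    streaming parser would have seen first.
    """
    if isinstance(value, dict):
        keys = list(value)
        if "@context" in keys and keys[0] != "@context":
            raise OutOfOrderContextError(f"{path}: @context must come first, got keys {keys}")
        for key, child in value.items():
            if key == "@context":
                continue  # term definitions inside a context are not node objects
            check_context_first(child, f"{path}.{key}")
    elif isinstance(value, list):
        for i, item in enumerate(value):
            check_context_first(item, f"{path}[{i}]")


if __name__ == "__main__":
    ok = '{"@context": {"name": "http://schema.org/name"}, "@id": "ex:a", "name": "A"}'
    late = '{"@id": "ex:a", "name": "A", "@context": {"name": "http://schema.org/name"}}'
    check_context_first(json.loads(ok))  # accepted silently
    try:
        check_context_first(json.loads(late))
    except OutOfOrderContextError as err:
        print("rejected:", err)
```

Skipping the contents of @context matters: a type-scoped context is declared inside a term definition, where @context is often not the first key, and that is not a node object.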
@rubensworks
Member Author

Following a comment from @BigBlueHat, scoped @contexts are also allowed, as long as they appear as the first key inside their object/scope.

@gkellogg
Member

I think it's only @type which would be an issue for ordering and scoped contexts. Property-scoped contexts are applied when that property is processed, at which point all other context information will already have been processed.

So, the suggested ordering within an object (actually, node object or graph object), would be the following:

  1. @context – if any
  2. @id or alias – if any
  3. @type or alias – if any

No other properties (including @graph) need be ordered.
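A producer could apply this ordering just before writing its output. A minimal sketch, assuming the document is already an in-memory Python dict/list structure and that no aliases for @id or @type are in use (aliases are not recognized). The function name is hypothetical. It moves @context, @id and @type to the front of every object and leaves all other keys, including @graph, in their original relative order:

```python
import json
from typing import Any

# Order suggested in the comment above; aliases are not handled in this sketch.
LEADING_KEYS = ("@context", "@id", "@type")


def streaming_order(value: Any) -> Any:
    """Return a copy of a JSON-LD value whose objects list @context, @id and
    @type (when present) before all other keys."""
    if isinstance(value, list):
        return [streaming_order(item) for item in value]
    if not isinstance(value, dict):
        return value
    ordered: dict = {}
    for key in LEADING_KEYS:
        if key in value:
            # Context contents are copied as-is; key order inside them does not matter here.
            ordered[key] = value[key] if key == "@context" else streaming_order(value[key])
    for key, child in value.items():
        if key not in ordered:
            ordered[key] = streaming_order(child)
    return ordered


if __name__ == "__main__":
    doc = {"name": "Alice", "@type": "Person", "@id": "ex:alice",
           "@context": {"@vocab": "http://schema.org/"}}
    print(json.dumps(streaming_order(doc)))
```

Since both orderings proposed in this thread put @context first and @id/@type before ordinary properties, swapping the last two entries of `LEADING_KEYS` is the only change needed to follow the other proposal.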

@rubensworks
Member Author

I don't think @type should be order-dependent. @type essentially expands to rdf:type, which makes it processable like any other property/predicate (as long as @id comes first).

Unless I'm missing something @gkellogg?

@gkellogg
Member

Scoped contexts can be triggered by @type as well as by an individual property. If you don’t see @type early, you may misinterpret the properties.
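A toy illustration of this point, assuming two hypothetical contexts: by default `name` maps to an `example.org` IRI, but the type-scoped context for `Person` remaps it to `http://schema.org/name`. The hypothetical `one_pass_expand` function expands keys in arrival order without buffering, so the scoped context only takes effect once @type has been read. Type values are emitted unexpanded to keep it short.

```python
from typing import Dict, Iterable, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical contexts for illustration only.
DEFAULT_CONTEXT: Dict[str, str] = {"name": "http://example.org/vocab#name"}
TYPE_SCOPED: Dict[str, Dict[str, str]] = {"Person": {"name": "http://schema.org/name"}}


def one_pass_expand(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Expand (key, value) pairs of one node in arrival order, without buffering."""
    active = dict(DEFAULT_CONTEXT)
    out: List[Tuple[str, str]] = []
    for key, value in pairs:
        if key == "@type":
            # From here on, the type-scoped context overrides earlier term definitions.
            active.update(TYPE_SCOPED.get(value, {}))
            out.append((RDF_TYPE, value))
        else:
            out.append((active[key], value))
    return out


if __name__ == "__main__":
    early = one_pass_expand([("@type", "Person"), ("name", "Alice")])
    late = one_pass_expand([("name", "Alice"), ("@type", "Person")])
    print(early)  # name expanded via the Person-scoped context
    print(late)   # name expanded via the default context, before @type was seen
```

In JSON-LD the type-scoped context applies to the node's properties regardless of key order, so `early` is the intended interpretation. A streaming processor that meets `name` first must either buffer it or risk producing `late`.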

@rubensworks
Member Author

Ah I see, I wasn't aware of that, thanks for clearing that up! (Link to example for reference)

@ajs6f
Member

ajs6f commented Feb 7, 2020

Is this issue distinct from #4?

@rubensworks
Member Author

I would say this is the same as #4. (#4 was moved from another repo)

@iherman
Member

iherman commented Feb 7, 2020

This issue was discussed in a meeting.

  • RESOLVED: add json-ld-streaming repo; automate all note repos with echidna to publish on each commit
  • ACTION: set up json-ld-streaming repo (Ivan Herman)
Transcript: Streaming best practice
Benjamin Young: #3
Ruben Taelman: historically issue 4 was earlier, but in a different tracker and then moved to BP repository
… I made #3 as a way to summarize the things that are needed to parse efficiently in a streaming manner
… I think that we can close 4, but I should go through it in more detail to make sure we don’t lose anything
ajs6f> +1 to closing #4
Ruben Taelman: that isn’t in #3
… It would be safe to write some text about those guidelines that I mention in #3, but Gregg appeared to agree with most of it except for the properties
… the special case where contexts are applied to specific properties, and we would need some more work
… that we should still look into
Benjamin Young: Are you up for doing some of the work for the doc?
Ruben Taelman: Can help with that, but next 2 weeks are very busy
… only be able to work on it after that, if that’s okay
Benjamin Young: Some time is better than never! Even just sections and rough pointers for where to flesh out and what you have in mind
… any content is great and it can be polished later
… in #4 Adam suggested that this could be a standalone note, but I don’t think we have time to add a new note
… It would also need promotion
Adam Soroka: The impetus behind it was that it’s not part of the specs directly, but we do want to specify precisely
… in an ideal world it would be nice to have a separate formal note, but not a spec, and the BP doc would just refer to it
… but whatever mention in the BP doc should be informal rather than pseudo-normative
… don’t think we have time for a careful spec either, especially in a BP doc
Benjamin Young: Get what we can into the BP doc
Adam Soroka: Can potentially publish a note later. Can see what use people make of the notes in the BP doc
Benjamin Young: Ruben, you mentioned wanting to close #4
… should we leave that one open?
… You wanted to go through issue 4 to see what was there
Ruben Taelman: Assign it to me and I can see if we can close it
Benjamin Young: Happy to leave opening and closing to you
… can leave #3 open as the primary topic
… has a good looking outline
Ruben Taelman: about the note, I also think it’s valuable to have it at some point. Maybe not in the scope of this WG. Problem I have now is that when I discover JSON-LD docs I want to parse, I have to assume that they are not stream-enabled to be parsed
… so I parse them the normal way, and can’t use the optimizations for streaming parsing
… good to add a specific content type or similar to say that it can be parsed in a streaming way
Benjamin Young: Maybe a profile?
Rob Sanderson: it seems to ajs6f’s point about formality/informality of the BP…
… that if we wanted to have even a profile parameter/IRI, that it would raise it to a Note
Ivan Herman: +1 to Rob
Rob Sanderson: because we’d want to refer to it from the IANA profile
… so this does seem like something to consider
… I agree with rubensworks that if you don’t know you can stream it, you won’t
… so there needs to be something in the header that tells you it’s possible
… and perhaps we need to discuss priority of the notes
Benjamin Young: I think with priority of notes it’s who works on what
… if they show up then they’ve been prioritized :)
… if folks want to work on streaming, that’s great. We’re not stealing time from one to the other
… Have the time to discuss
… Ivan is this something we can do?
Ivan Herman: We can add as many notes as we can write
Ruben Taelman: What are the requirements for such a note?
… should be more extensive than a section in a BP doc
Benjamin Young: I think the biggest part is the profile parameters section, but start with whatever you have
… Is there more plumbing we should set up, Ivan?
Ivan Herman: In a note we can do what we want.
Ruben Taelman: Where should it live? Also in the BP repo?
Benjamin Young: Maybe we should use it as a notes repo
Ivan Herman: what do you guys want?
… don’t expect me to make the decision :) To make a new repo is 10 minutes max. You tell me
Adam Soroka: Was going to ask if we change it to a notes repo, then the BP is a note?
Benjamin Young: It’s a note already
… the only one we have content for
Ivan Herman: the only practical thing, if we want to use things like echidna, it makes it harder if there’s 5 notes in one repo
… but if we publish only once, it’s not a big deal
Rob Sanderson: a thought about publishing frequency
… for a note, having lots and lots of working drafts seems useful to draw attention to it and possibly get input
… so, I’d prefer we set it up with the best tooling to get drafts into the public
Ivan Herman: we can set up echidna that every commit to master is auto published
Benjamin Young: That sounds great
… a repo for each note
Ivan Herman: One for BP, one for CBOR. Now there’s a nice bikeshed question … what name for the repo?
Benjamin Young: json-ld-stream ?
Benjamin Young: our repos https://github.com/w3c?utf8=%E2%9C%93&q=json-ld-&type=&language=
Ivan Herman: +1
Tim Cole: +1
Pierre-Antoine Champin: +1
Proposed resolution: add json-ld-streaming repo; automate all note repos with echidna to publish on each commit (Benjamin Young)
Rob Sanderson: +1
Ivan Herman: +1
Adam Soroka: +1
Ruben Taelman: +1
Benjamin Young: +1
Tim Cole: +1
David I. Lehn: +1
Ivan Herman: first we have to go through the pedestrian way, and then we can publish each time
Benjamin Young: some time in the next couple of weeks we make a drafty working draft
Resolution #2: add json-ld-streaming repo; automate all note repos with echidna to publish on each commit
Ivan Herman: that’s something we have a plan to republish CR, and not via echidna but a new CR
… I have some time constraints about that. I sent a separate note, but I will have a trip to the US at the end of Feb and then a week out 10th of March
… so the timing might have some consequences
… that said, the whole procedure has been moved to github, so I believe that chairs can also initiate a FPWD by making an issue
… and FPWD is more automatic than anything else
… but still a discussion to have with the webmaster
Rob Sanderson: if it’s a matter of a week or so, then no need to rush
Benjamin Young: can always use GH previews for discussion in the meantime
… ivan do you want an action for the repo?
Action #1: set up json-ld-streaming repo (Ivan Herman)
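The profile idea raised in the meeting (a header telling the client that a document can be parsed in a streaming way) might look like this on the consumer side. This is only a sketch: the profile IRI below is an assumption for illustration, not something decided in this thread, and `is_streamable` is a hypothetical helper. Header parameters are parsed with the standard library's `email.message.Message`:

```python
from email.message import Message

# Assumed profile IRI, for illustration only.
STREAMING_PROFILE = "http://www.w3.org/ns/json-ld#streaming"


def is_streamable(content_type: str) -> bool:
    """Return True if a Content-Type header value advertises JSON-LD with the
    (assumed) streaming profile."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "application/ld+json":
        return False
    profile = msg.get_param("profile")  # None when the parameter is absent
    if not isinstance(profile, str):
        return False
    # A profile parameter may carry several space-separated IRIs.
    return STREAMING_PROFILE in profile.split()


if __name__ == "__main__":
    header = 'application/ld+json; profile="http://www.w3.org/ns/json-ld#streaming"'
    print(is_streamable(header))
```

A client that gets `False` would fall back to a regular, buffering JSON-LD parser, which matches Ruben's point that without such a signal a consumer has to assume the document is not stream-enabled.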

@rubensworks
Member Author

Everything mentioned here has now been written up in https://github.com/w3c/json-ld-streaming

So I suggest closing this issue.

@BigBlueHat
Member

Works for me! Thanks, @rubensworks!

6 participants