Consider using "JSON Lines" for large TDs #93

Closed

mmccool opened this issue Nov 9, 2020 · 10 comments

@mmccool (Contributor) commented Nov 9, 2020

JSON Lines is a format for returning JSON line-by-line (or rather chunk-by-chunk), which may be useful for returning large TDs. Suggested by @farshidtz.

See https://jsonlines.org/
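
For reference, a JSON Lines stream is simply one complete JSON value per line. A made-up example with two minimal TDs (identifiers and titles are illustrative only):

{"@context": "http://www.w3.org/ns/td", "id": "urn:example:td-1", "title": "Lamp"}
{"@context": "http://www.w3.org/ns/td", "id": "urn:example:td-2", "title": "Sensor"}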

@danielpeintner (Contributor) commented

I am not sure a line-by-line approach works out for JSON-LD processing (without further requirements).

There are some strong requirements, e.g., the @context is necessary to do further processing:

{
    "id": "urn:dev:ops:32473-WoTLamp-1234",
    "title": "MyLampThing",
    "@type": "saref:LightSwitch", // this (and similar) lines are not interpretable till the context is known

    ... BIG CHUNK OF DATA ....

    "@context": [
        "http://www.w3.org/ns/td",
        { "saref": "https://w3id.org/saref#" }
    ]
}

@relu91 (Member) commented Nov 10, 2020

From the JSON Lines home page, it seems that it is not applicable to our use case (i.e., big TDs). I quote:

  1. Each Line is a Valid JSON Value
    The most common values will be objects or arrays, but any JSON value is permitted.
    See json.org for more information about JSON values.

This means the format is meant to be used with a list of JSON values, such as a list of objects, arrays, or strings. It wouldn't help with one big JSON object.

@egekorkan (Contributor) commented

I also don't see the use case very well. Even outside of the JSON-LD-related features, a TD has interdependencies, like the base value, securityDefinitions, and to some extent the dependency between readOnly, writeOnly, observable, and the forms. Also, any DataSchema term can be influenced by the other DataSchema terms, e.g. if there is "type": "number", a maximum may appear somewhere inside the interaction. It is even worse for objects and arrays.
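
To illustrate the base dependency with a minimal, made-up TD (all names and URIs below are invented for this sketch): a form's relative href cannot be resolved until base has been seen, so fragments of the document cannot be interpreted in isolation.

{
    "@context": "http://www.w3.org/ns/td",
    "id": "urn:example:sensor-1",
    "title": "Sensor",
    "base": "https://device.example.com/api/",
    "properties": {
        "temperature": {
            "type": "number",
            "forms": [
                { "href": "temp" } // resolves to https://device.example.com/api/temp only once "base" is known
            ]
        }
    }
}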

So I would say that processing a JSON Lines document does not make a lot of sense, but transmitting it chunk by chunk before processing does. However, wouldn't it make more sense to rely on the transport mechanism for that?

@relu91 (Member) commented Nov 10, 2020

So I would say that processing a JSON Lines document does not make a lot of sense, but transmitting it chunk by chunk before processing does.

The point is that I think we had a misunderstanding. JSON Lines does not split big JSON objects; it sends each one as a whole. For example:

{
/* super big TD */
} // send the whole object

Whereas here:

{/* super big TD */} // send this first
{/* super big TD */} // then this one

However, wouldn't it make more sense to rely on the transport mechanism for that?

Generally speaking, yes. HTTP can handle big files easily. However, we originally thought that big TDs could occupy TDD resources and cause DoS problems. Moreover, I am not sure that every protocol binding can handle big files. Does CoAP have such a capability?
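
(For instance, HTTP can stream a large TD with chunked transfer encoding. A sketch of what such a response looks like on the wire, with the payload bytes elided; chunk sizes are hexadecimal byte counts:)

HTTP/1.1 200 OK
Content-Type: application/td+json
Transfer-Encoding: chunked

400
{"@context": "http://www.w3.org/ns/td", "id": ... first 0x400 bytes of the TD ...
400
... next 0x400 bytes ...
0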

Finally, I think this might be an optimization, but we could leave it out of the spec. I mean, it does not have the highest priority in my mind.

@egekorkan (Contributor) commented

I see. Just to answer the small question :)

Does CoAP have such a capability?

Yes → RFC 7959 (Block-Wise Transfers in CoAP): https://tools.ietf.org/html/rfc7959 and w3c/wot-binding-templates#49

@farshidtz (Member) commented Nov 10, 2020

I agree, this was suggested in the wrong context. It does not solve the "super big TD" problem.

It can be used to deliver TDs one-by-one, as mentioned by @relu91:

{/* super big TD */} // send this first
{/* super big TD */} // then this one

allowing the clients to consume them one at a time and interrupt at any time, instead of:

[
  {/* super big TD */},
  {/* super big TD */}
]

This is similar to pagination with a page size of one, except that the client doesn't need to make a new request for each consecutive TD.

JSON Lines responses can be requested through content negotiation. Example use cases: querying several TDs and stopping once you receive an expected TD, or stopping before you run out of memory.
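
As a rough sketch of what such a consumer could look like in TypeScript (the endpoint URL, the negotiated media type, and the stop condition are all assumptions for illustration, not part of any spec):

// Stream TDs from a (hypothetical) directory endpoint as JSON Lines
// and stop as soon as the TD we are looking for arrives.
const res = await fetch("https://tdd.example.com/things", {
    headers: { Accept: "application/jsonl" }, // assumed negotiated media type
});
const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";

outer: while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    let newline: number;
    while ((newline = buffer.indexOf("\n")) >= 0) {
        const td = JSON.parse(buffer.slice(0, newline)); // one TD per line
        buffer = buffer.slice(newline + 1);
        if (td.id === "urn:example:expected-td") { // hypothetical stop condition
            await reader.cancel(); // interrupt: remaining TDs are never parsed
            break outer;
        }
    }
}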

@egekorkan (Contributor) commented

Ah I see, makes a lot of sense like this :)

@mmccool (Contributor, Author) commented Nov 30, 2020

Well, if I were designing a system to send TDs incrementally, I would do something like a recursive approach, e.g. send the JSON with elements down to some maximum depth, with detailed sub-elements replaced by references that would then be sent later. The problem with this is that it's still hard to limit the max size of each chunk.
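
A purely hypothetical sketch of that idea (the $ref-style field and the URL are invented here, not proposed syntax): the top level is sent first, and deeper elements are replaced by references the consumer can fetch afterwards.

{
    "@context": "http://www.w3.org/ns/td",
    "id": "urn:example:big-td",
    "title": "BigThing",
    "properties": { "$ref": "https://tdd.example.com/things/big-td/properties" } // fetched in a later request
}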

It might be easier to encode the TD as a string or binary blob and then send that in chunks (which should be easy to define). The query would still return a JSON outer wrapper, but the TD itself would be encoded as a string value that would have to be unpacked.
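
For example, one chunk of such a response might look like this (every field name here is invented for illustration; none of them are in the spec):

{
    "id": "urn:example:big-td",
    "chunk": 1, // hypothetical: index of this chunk
    "totalChunks": 12, // hypothetical: how many chunks make up the TD
    "payload": "eyJAY29udGV4dCI6..." // base64 fragment of the serialized TD
}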

Note that for signed and/or encrypted TDs we may have to deal with this use case anyway.

Returning chunked string-encoded TDs could be an option on the filter; if the consumer is not concerned about incoming size, it could be dropped. On the server side, though, if someone tried to read a really large TD they might get an error if it exceeds some max size, but the error could indicate that that particular TD can only be read in "chunked" mode. If a query returns multiple TDs and any of them exceeds the max size, then the entire query would have to return that error.

@farshidtz (Member) commented

I propose closing this issue and continuing the discussion in #117.

@farshidtz (Member) commented

From Discovery call:
