Link to (new) Wikipedia 'JSON Streaming' article #2

timbunce opened this issue Sep 28, 2014 · 9 comments

Comments

@timbunce

I started writing an issue suggesting that you link to the Line Delimited JSON article on Wikipedia, and perhaps help to clean it up a little.

The more I looked at it, however, the more I realized that it wasn't a good foundation.

So I ended up writing a new Wikipedia article myself: JSON Streaming

I think it's much more informative and balanced (naturally). I'd be grateful if you'd review it and, if you're happy with the content, link to it from jsonlines.org. If you spot something that needs changing or adding, please go ahead and edit the article yourself. In fact, doing that anyway, even in some small way, will help the article when the Wikipedians get around to reviewing it.

@wardi
Owner

wardi commented Sep 28, 2014

@timbunce under Applications of concatenated JSON I just see a bunch of JSON libraries. What actual applications are using it?

I ask because concatenated JSON seems a little silly to me. If you want pretty-printed JSON and you use a streaming JSON parser, why not just stream a big JSON list? You shouldn't need a new format at all.

@timbunce
Author

Applications isn't a good title for that section. Got a better suggestion?

> why not just stream a big JSON list

(By 'list' I assume you don't mean wrapping the objects in a JSON array [ ... ].)

Concatenated JSON isn't a new format. It's just giving a name to streaming JSON without any delimiter at all:

$ echo '{"some":"thing\n"}[42]{"may":{"include":"nested","objects":["and","arrays"]}}' | jq .
{
  "some": "thing\n"
}
[
  42
]
{
  "may": {
    "include": "nested",
    "objects": [
      "and",
      "arrays"
    ]
  }
}

Does that clarify it?
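
For readers without jq, the same idea can be sketched in Python using the standard library's `json.JSONDecoder.raw_decode`, which returns a parsed value together with the index where it ended. The helper name `iter_concatenated` is made up for this illustration, and this is a minimal sketch that assumes the whole stream is already in memory as a string:

```python
import json

def iter_concatenated(text):
    """Yield each top-level JSON value from a string of concatenated JSON."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos == end:
            break
        # Returns the decoded value and the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        yield value

# Same input as the jq example above (raw string keeps the \n escape intact).
stream = r'{"some":"thing\n"}[42]{"may":{"include":"nested","objects":["and","arrays"]}}'
for value in iter_concatenated(stream):
    print(json.dumps(value))
```

Because the decoder reports where each value ends, pretty-printed input with newlines and indentation between or inside the values is handled the same way.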

@wardi
Owner

wardi commented Sep 29, 2014

Yes, I mean wrapping the objects in a JSON array. Can a streaming JSON parser not give you one element at a time?

@timbunce
Author

I've changed Applications to Applications and Tools.

The need for the artificial [ at the start is a problem. Imagine a publish-subscribe model such as ZeroMQ: there's no simple way to add the artificial [ on connection. The JSON objects will simply start streaming in.

A good streaming JSON parser ought to be able to handle concatenated JSON, or be tricked into it by resetting the parser state when each top-level object is completed.
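
As a rough illustration of starting over after each completed top-level value, here is a sketch in Python. It is not a true incremental parser: it buffers incoming chunks and retries `json.JSONDecoder.raw_decode` on the buffer, treating any decode error as "not enough data yet". The class name `ConcatenatedJSONReader` and its `feed` method are hypothetical, invented for this example:

```python
import json

class ConcatenatedJSONReader:
    """Hypothetical helper: buffers chunks, emits complete top-level values."""

    def __init__(self):
        self._decoder = json.JSONDecoder()
        self._buffer = ""

    def feed(self, chunk):
        """Add a chunk of text and return the values completed so far."""
        self._buffer += chunk
        values = []
        while True:
            # Drop whitespace between values (raw_decode will not skip it).
            self._buffer = self._buffer.lstrip()
            if not self._buffer:
                break
            try:
                value, end = self._decoder.raw_decode(self._buffer)
            except json.JSONDecodeError:
                # Assumption: the value is merely incomplete; wait for more.
                break
            values.append(value)
            # Start fresh for the next top-level value.
            self._buffer = self._buffer[end:]
        return values

reader = ConcatenatedJSONReader()
first = reader.feed('{"a": 1}{"b"')     # second object is still incomplete
second = reader.feed(': [2, 3]}[42]')   # completes it and adds an array
print(first, second)
```

This works for objects and arrays because they are self-delimiting. Bare top-level numbers split across chunks would be ambiguous, and genuinely malformed input is never reported, so a real implementation would need more care on both points.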

@wardi
Owner

wardi commented Sep 29, 2014

So to me it feels like a hack that's specific to certain JSON encoders/parsers. If you can't figure out the framing without parsing the content, it's not really framing.
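
To make the framing point concrete, here is a small Python sketch of the line-delimited case, where records can be separated without looking inside them. It relies on the fact that a JSON string cannot contain a literal newline (it must be escaped as `\n`), and it assumes the encoder emits each value on a single line. The helper name `split_frames` is made up for this example:

```python
import json

def split_frames(data: bytes):
    """Split newline-delimited records without inspecting their content."""
    return [line for line in data.split(b"\n") if line.strip()]

# Braces inside strings do not matter: framing only looks at newlines.
lines = b'{"id": 1, "text": "a } b"}\n[42]\n{"id": 2}\n'
frames = split_frames(lines)

# Parsing is a separate, later step applied to each frame.
values = [json.loads(frame) for frame in frames]
print(len(frames), values)
```

With concatenated JSON there is no equivalent of `split_frames`: the only way to find where one value ends is to parse it.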

@wardi
Owner

wardi commented Sep 29, 2014

Also, if we're talking about streaming, why do we care about having something pretty-printed? That can be handled on the receiving end if someone is interested.

If we're talking about a format suitable for editing, it needs to be a complete file anyway, so a big JSON array seems to fit better.

@timbunce
Author

The stream may already be pretty-printed and out of the reader's control.
The Wikipedia page is simply aiming to explain the two main forms of JSON Streaming.
Is there anything you'd like to see added or changed?

@wardi
Owner

wardi commented Oct 3, 2014

@timbunce yeah, just some real-world examples of people using the pretty-printed form. To me, pretty-printed concatenated JSON seems like a really hard format to deal with.

@timbunce
Author

timbunce commented Oct 3, 2014

I don't think people would choose to use that form for data processing if they had a choice.
I've dealt with cases where I have a pile of files with pretty-printed JSON in each. Being able to just cat *.json | jq ... is great. And cat *.json | jq -c . is sufficient to turn the JSON back into 'jsonlines' form.
