Standard MIME content-type #19

Open
pavelnikolov opened this issue Jul 25, 2016 · 19 comments

@pavelnikolov

What do you think about adding a new HTTP content-type for jsonlines data?
What about application/jsonl?

@jbaehr

jbaehr commented Aug 9, 2019

I'd prefer application/json-lines; otherwise it may look like a typo ;-)

In addition to the media type, a registered structured suffix may be interesting. In my eyes that is even more useful, because it allows media types like application/vnd.my-company.some-thing+json-lines.
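
To illustrate what a structured suffix would buy a consumer, here is a minimal Python sketch that splits a media type into top-level type, subtype, and suffix. The helper name `split_media_type` is hypothetical, and `+json-lines` is only the suffix proposed in this comment, not a registered one.

```python
def split_media_type(media_type):
    """Split 'type/subtype+suffix; params' into (type, subtype, suffix).

    The suffix is None when the subtype contains no '+'.
    Parameters after ';' are ignored in this sketch.
    """
    essence = media_type.split(";", 1)[0].strip().lower()
    top, _, subtype = essence.partition("/")
    base, plus, suffix = subtype.rpartition("+")
    if not plus:
        # No '+' present: the whole subtype is the base, there is no suffix.
        return top, subtype, None
    return top, base, suffix


def is_json_lines(media_type):
    # Hypothetical check: accept the bare type or anything with the suffix.
    top, base, suffix = split_media_type(media_type)
    return top == "application" and (base == "json-lines" or suffix == "json-lines")
```

With a suffix, a generic client could recognise `application/vnd.my-company.some-thing+json-lines` as line-delimited JSON without knowing the vendor type itself.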

See also:
https://www.iana.org/assignments/media-types/
https://www.iana.org/assignments/media-type-structured-suffix/

@wardi have you considered filing a registration for a json-lines Media Type and structured suffix at IANA?

@karmakaze

karmakaze commented Sep 1, 2019

There is an IETF RFC 7464 for JSON Text Sequences that uses mime type: application/json-seq

It prefixes each JSON text with the <RS> control character (0x1E) and ends each JSON text with <LF>.
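
A minimal Python sketch of the framing difference, assuming the encoder form from RFC 7464 (RS before each text, LF after) and the JSON Lines form (LF after each text only). The function names are illustrative, and the decoder is deliberately simplified: it does not implement RFC 7464's handling of truncated texts.

```python
import json

RS = "\x1e"  # ASCII Record Separator, used by application/json-seq
LF = "\n"


def encode_json_seq(records):
    """Frame each record as RS + JSON text + LF (RFC 7464 encoder form)."""
    return "".join(RS + json.dumps(r) + LF for r in records)


def encode_json_lines(records):
    """Frame each record as JSON text + LF (JSON Lines)."""
    # json.dumps without indent never emits a raw newline, so one record
    # always occupies exactly one line.
    return "".join(json.dumps(r) + LF for r in records)


def decode_json_seq(text):
    """Simplified decoder: split on RS and parse every non-empty chunk."""
    return [json.loads(chunk) for chunk in text.split(RS) if chunk.strip()]
```

The only difference on the wire is the leading 0x1E byte per record.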

Also see: https://en.wikipedia.org/wiki/JSON_streaming

@jbaehr

jbaehr commented Nov 7, 2019

This seems like a duplicate of #9. The whole purpose of the Content-Type header is to communicate the media type.

@whlavina

The lack of a definitive IANA Media Type for JSON Lines causes some difficulty for those of us using the format. In the interest of pushing the issue, I took the liberty of starting a conversation:
https://mailarchive.ietf.org/arch/msg/json/dWMWD0JDa2HiUYjWjLjrQExeIx4/

Perhaps someone here would like to join that thread?

Disclaimer: I am in no way affiliated with the IANA/IETF. I am merely interested in using the format, correctly.

@sp4ce
Collaborator

sp4ce commented Dec 19, 2022

@whlavina the response from Tim Bray was the most helpful, and it looks like nothing has happened since then. I'll copy the interesting bit here for reference:

> to register a media type you need to link to a stable specification. The contents of https://jsonlines.org/ probably don’t qualify, so the conventional thing would be to write an Internet-Draft which AFAICT would be the same as json-seq only without the leading "ASCII Record Separator (0x1E)" but retaining the trailing \n.

@sp4ce
Collaborator

sp4ce commented Apr 3, 2023

I am linking the relevant RFC for proposing a new MIME type for standardisation:

https://www.rfc-editor.org/rfc/rfc6838.html

I propose working on adding the MIME type application/jsonl to the standards tree (section 3.1). Registering in the standards tree seems the most convoluted route, but I also think this is where it would fit best.

Of the two ways they list to get a type added to the standards tree:

  1. in the case of registrations associated with IETF specifications,
    approved directly by the IESG, or

  2. registered by a recognized standards-related organization using
    the "Specification Required" IANA registration policy [RFC5226]
    (which implies Expert Review).

I think the second one is the most relevant, which leads to https://www.rfc-editor.org/rfc/rfc5226

https://www.iana.org/form/media-types

sp4ce self-assigned this on Apr 3, 2023
sp4ce changed the title from "New content-type convention suggestion" to "Official MIME content-type support" on Apr 11, 2023
sp4ce changed the title from "Official MIME content-type support" to "Standard MIME content-type" on Apr 11, 2023
@frederikb

Hi @sp4ce, good to see that someone is leading the way to an actual RFC!

I've noticed that AWS is (apparently) using JSON Lines for one of their products. I haven't seen a description of the actual output, so I don't know whether it is compatible with JSON Lines. In any case, they are using the MIME type application/jsonlines. Thoughts on application/jsonl vs. that one?

@tim-hitchins-ekkosense

AWS claims it's compatible with JSON Lines; its documentation links to the JSON Lines homepage:

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/S3DataExport.Output.html

dennisreimann added a commit to dennisreimann/btcpayserver that referenced this issue Nov 19, 2023
There's an [ongoing discussion](wardi/jsonlines#19) about what the MIME type for [JSONL](https://jsonlines.org/) files should be. Making it `application/jsonl` leads to the file being downloaded according to my testing, which prevents browsers from opening them in a new window and parsing them as JSON, which fixes btcpayserver#5488.
NicolasDorier pushed a commit to btcpayserver/btcpayserver that referenced this issue Nov 20, 2023
@dwaite

dwaite commented Feb 25, 2024

If there's still interest in doing this, I would recommend an informational-track Internet-Draft (I-D) describing the jsonlines specification, with an IANA Considerations section registering the media type. The idea is that documents move along a long evolutionary track: from Internet-Draft to RFC, and potentially to Internet Standard.

IETF wants to deal with immutable and permanently available documents, so you will likely need to represent the encoding and parsing requirements authoritatively within the I-D itself, using IETF nomenclature. There are a lot of references for this available, and the JSON Text Sequences RFC is likely an excellent example.

I suspect there will be feedback that some areas are not needed. For example, your UTF-8 encoding rule does not have much left to it once you reference the JSON RFC. That RFC already mandates UTF-8 for everything other than closed ecosystems. At that point, you have to decide whether the application "advice" about escaping strings so they work on ASCII transports becomes an application note on the jsonlines site, and a discussion you have with the IETF more broadly - after all, it would also affect JSON and JSON text sequence data over such transports.

Conversely, you may want to be quite a bit more specific for the sake of interoperability, such as whether applications MUST be able to consume \r\n line separators, and what application behavior is mandated/desired if invalid JSON text (including things like lines of just whitespace) is encountered within a stream. Variance in behavior has led to a lot of security issues - imagine if your security compliance or logging components stopped reading a JSON Lines sequence at a blank line, while your application logic ignored the blank line and kept going.
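
The divergence described above can be sketched in Python with two hypothetical consumers that disagree only about what a blank line means. Neither function reflects a real product; they exist to show how the same bytes yield different record sets.

```python
import json


def consumer_stops_at_blank(text):
    """Treats the first blank line as end-of-stream (e.g. an audit component)."""
    records = []
    for line in text.split("\n"):
        if not line.strip():
            break  # stops reading here, never sees later records
        records.append(json.loads(line))
    return records


def consumer_skips_blank(text):
    """Silently ignores blank lines and keeps going (e.g. application logic)."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


# One stream, two interpretations: the second record is invisible
# to the consumer that stops at the blank line.
stream = '{"user": "alice"}\n\n{"user": "mallory"}\n'
seen_by_audit = consumer_stops_at_blank(stream)
seen_by_app = consumer_skips_blank(stream)
```

A specification that mandates one behavior (reject, skip, or stop) removes this gap between components.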

@finwo

finwo commented Feb 26, 2024

What's wrong with what ndjson is trying to implement? Their current standard is application/x-ndjson, which will likely move to application/ndjson in the future when there's more adoption.

https://bugzilla.mozilla.org/show_bug.cgi?id=1603986

@dwaite

dwaite commented Feb 26, 2024

The x- prefix on a subtype is intended only for private use, e.g. for types with no expectation of interoperability between implementations. In that sense, your application/x-ndjson may conflict with other people's application/x-ndjson, for example in the presence or absence of a leading [ or a trailing comma, or even in someone deciding they might as well send it in Big5 rather than UTF-8.

The lack of an immutable standard (like an RFC with a number) means that ndjson three years from now may make changes along lines like these for robustness, while implementations have no clear way to state what they are compatible with.

There are plenty of commercial products which use vendor and x-prefixed media types, and which do not attempt to define fixed/robust/interoperable behavior. It is a matter of what this project is going for, which is why my first words were "If there's still interest in doing this".

In terms of ramifications, most SDOs (standards development organizations) won't touch dependencies which do not have these and other formalisms, and may use things like publication in another SDO (like IETF) as a sign of that. That means ndjson/jsonlines may be used in public-facing APIs, but a large category of interoperable standards work either wouldn't touch it or would standardize its own similar effort.

@tim-hitchins-ekkosense

> which will likely move to application/ndjson in the future when there's more adoption

Well that's the problem, it might happen, at some point in the future. Given the usage of JSON lines in various commercial products, we're suggesting we do that formalisation now - or at least start the process very soon!

@wardi
Owner

wardi commented Feb 26, 2024

I'd love to see this.

So do we copy-paste JSON-SEQ https://datatracker.ietf.org/doc/html/rfc7464 without the "ASCII Record Separator (0x1E)"? JSON-SEQ spends a fair bit of text on detecting truncated records and continuing past them; all of that could be removed in a new RFC.

> Conversely, you may want to be quite a bit more specific for the sake of interoperability, such as whether applications MUST be able to consume \r\n line separators, and what application behavior is mandated/desired if invalid JSON text (including things like lines of just whitespace) is encountered within a stream. Variance in behavior has led to a lot of security issues - imagine if your security compliance or logging components stopped reading a JSON Lines sequence at a blank line, while your application logic ignored the blank line and kept going.

Rule 3 in https://jsonlines.org/ mentions that a compliant parser will be able to consume \r\n because \r is ignored as surrounding whitespace by a JSON parser. Doesn't hurt to repeat it though.

Lines of only whitespace are already invalid by rule 2 in https://jsonlines.org/ , but again it doesn't hurt to make this clear.
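
Both points can be checked with Python's standard `json` module, used here only as one example of a conforming JSON parser. The sketch splits on `\n` alone and relies on the parser to treat the leftover `\r` as insignificant whitespace; a whitespace-only line has no JSON value and fails to parse.

```python
import json

# CRLF-terminated input: after splitting on "\n", each line still ends in "\r".
crlf_stream = '{"a": 1}\r\n{"b": 2}\r\n'
raw_lines = [line for line in crlf_stream.split("\n") if line]

# "\r" is JSON whitespace, so the parser accepts it around a value.
parsed = [json.loads(line) for line in raw_lines]


def is_valid_record(line):
    """Return True if the line holds exactly one JSON value."""
    try:
        json.loads(line)
    except json.JSONDecodeError:
        return False
    return True
```

Here `is_valid_record("   ")` is false, matching the reading that a whitespace-only line is not a valid record.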

To be specific, let's say that any line that doesn't parse as valid JSON should be treated as an invalid record but still count as a record for the purpose of numbering the lines.
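
One way to read that rule, as a Python sketch rather than a reference implementation: every line gets a number, and lines that fail to parse are reported as invalid records instead of being dropped. The `Record` type and `iter_records` name are hypothetical, and treating a final `\n` as a terminator rather than an extra empty record is an assumption.

```python
import json
from typing import Any, Iterator, NamedTuple, Optional


class Record(NamedTuple):
    number: int           # 1-based line number, counted for every line
    value: Any            # parsed JSON value, or None when invalid
    error: Optional[str]  # None for valid records, parser message otherwise


def iter_records(text: str) -> Iterator[Record]:
    lines = text.split("\n")
    # Assumption: a trailing newline terminates the last record and does
    # not introduce an additional empty one.
    if lines and lines[-1] == "":
        lines.pop()
    for number, line in enumerate(lines, start=1):
        try:
            yield Record(number, json.loads(line), None)
        except json.JSONDecodeError as exc:
            # Invalid line: still consumes a record number.
            yield Record(number, None, exc.msg)
```

Because `error` is `None` only for valid records, a JSON `null` value remains distinguishable from a parse failure.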

@GabenGar

Should it count as a record? The whole point of something called JSON Lines is that it stores lines of a well-defined format called JSON, not arbitrary character sequences. Depending on the nature of the malformed data in a line, it might just as well invalidate every line after it and flood the logs with parsing-error noise, even though the offender is a single line (a whole file).

@timtjtim

> So do we copy-paste JSON-SEQ

I think RFCs are copyrighted, so to copy-paste you would need the permission of the original author.

@whlavina

I'm glad to see continued discussion and forward movement. It's interesting to see that YAML just recently (this month) gained IANA media type registration... 22 years after the format was first created. If YAML can do it, JSON Lines can, too! If there's any need for help with the process, maybe we could ask the folks who pushed the YAML RFC?

@tim-hitchins-ekkosense

Here are the guidelines on how to write an Internet-Draft:

https://authors.ietf.org/en/home

@darrelmiller

@whlavina You folks are welcome to come join the HTTPAPI mailing list https://datatracker.ietf.org/wg/httpapi/about/ and we can chat about a path to registering this media type. This is where the YAML media type registration RFC was created and we are working towards the OpenAPI one also.

There is ongoing discussion about allowing media type registrations to happen in the standards tree without necessarily going through the process of writing an RFC for the format: https://www.ietf.org/archive/id/draft-ietf-mediaman-standards-tree-00.html That said, this format might be simple enough that an RFC would be straightforward.

@ferdnyc

ferdnyc commented Jul 18, 2024

> There is ongoing discussion about allowing media type registrations to happen in the standards tree without necessarily going through the process of writing an RFC for the format: https://www.ietf.org/archive/id/draft-ietf-mediaman-standards-tree-00.html That said, this format might be simple enough that an RFC would be straightforward.

As of last month, that (expired) draft is replaced by https://www.ietf.org/archive/id/draft-ietf-mediaman-standards-tree-01.html

16 participants