Undocumented and non-standardised default Content-Type application/warc-record #92

Open
JustAnotherArchivist opened this issue Aug 16, 2019 · 2 comments

Comments

@JustAnotherArchivist
Contributor

warcio uses a default Content-Type value for WARC records of application/warc-record. This MIME type is not documented or specified anywhere; the WARC spec only mentions application/warc as the MIME type for WARC files and application/warc-fields for warcinfo and metadata records (though it is ambiguous on whether that is required or recommended).

@ikreymer
Member

ikreymer commented Mar 1, 2020

Not sure what would be a better option here. It is a fallback if no other Content-Type is specified and/or it's a non-standard record. application/warc-fields is for the warcinfo-style fields, which this is not, and application/warc makes sense as the Content-Type for the WARC itself, but not for the payload of the record. I suppose it could be application/octet-stream, but that would imply that it's binary.
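
To make the fallback described above concrete, here is a minimal sketch of how such a default might be chosen. It is not warcio's actual code: `DEFAULT_TYPES`, `FALLBACK_TYPE`, `pick_content_type` and the `x-custom` record type are hypothetical names used only for illustration, and the mapping covers only the types mentioned in this thread.

```python
# Hypothetical sketch, NOT warcio's actual implementation.
# Per-record-type defaults mentioned in this thread.
DEFAULT_TYPES = {
    "warcinfo": "application/warc-fields",
    "metadata": "application/warc-fields",
}

# The undocumented default this issue is about.
FALLBACK_TYPE = "application/warc-record"


def pick_content_type(record_type, explicit=None, fallback=FALLBACK_TYPE):
    """Return the Content-Type to write, or None to omit the header."""
    if explicit:
        # A caller-supplied value always wins.
        return explicit
    if record_type in DEFAULT_TYPES:
        return DEFAULT_TYPES[record_type]
    # Non-standard record with no explicit type: use the fallback.
    return fallback


print(pick_content_type("warcinfo"))
print(pick_content_type("x-custom"))
print(pick_content_type("x-custom", fallback=None))
```

Passing `fallback="application/octet-stream"` or `fallback=None` changes only the last branch, which is the part under discussion here.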

@JustAnotherArchivist
Contributor Author

The Content-Type header is optional, so omitting it would be one option. application/octet-stream also seems sensible to me. WARC is a byte-oriented file format, so any payload must also be a collection of bytes. While the underlying data could be bit-based, it must be padded to bytes, which makes the container an octet-stream again. The WARC specification also mentions:

If the media type remains unknown, the reader should treat it as type “application/octet-stream”.

Personally, I think omitting the header would be the best option.
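
As a rough illustration of the omit-the-header option, the sketch below writes a simplified header block without Content-Type and has the reader fall back to application/octet-stream, as the quoted passage suggests. The helper names `build_header_block` and `effective_media_type` are made up for this example, and the header block is deliberately incomplete (a real record also needs fields such as WARC-Record-ID and WARC-Date).

```python
# Hypothetical sketch; helper names are illustrative, not a real library API.
# The header block is simplified and omits mandatory fields such as
# WARC-Record-ID and WARC-Date.

def build_header_block(record_type, payload, content_type=None):
    """Build a simplified WARC header block; skip Content-Type when None."""
    lines = ["WARC/1.1", f"WARC-Type: {record_type}"]
    if content_type is not None:
        lines.append(f"Content-Type: {content_type}")
    lines.append(f"Content-Length: {len(payload)}")
    return "\r\n".join(lines) + "\r\n\r\n"


def effective_media_type(header_block):
    """Return the declared Content-Type, else application/octet-stream."""
    # Skip the version line, then scan the named fields.
    for line in header_block.split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-type":
            return value.strip()
    # Reader-side fallback for an unknown media type.
    return "application/octet-stream"


header = build_header_block("x-custom", b"some bytes")
print(effective_media_type(header))
```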
