Compact JSON in .zarray #704

Closed
andreasg123 opened this issue Feb 20, 2021 · 9 comments · Fixed by #1952

Comments

@andreasg123

JSON output is deliberately made human-readable, with a lot of whitespace. That produces large .zarray files for string arrays that use the Categorize filter. In one small example with about 150 different strings, the human-readable .zarray was 3837 bytes and the compact version was 1284 bytes. With a larger variety of strings, the difference would be larger.

As Zarr is a storage format that isn't intended for human readability, I would like to propose to write JSON with indent=None, separators=(",", ":").
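To make the size difference concrete, here is a minimal sketch that serializes the same metadata both ways using only the standard `json` module. The `meta` dictionary is a made-up stand-in for a .zarray with a Categorize filter, not the file from the example above, and `indent=4` is an assumed "readable" setting, so the byte counts will differ from the figures quoted.

```python
import json

# Hypothetical metadata resembling a .zarray for a categorized string array.
# The field values are illustrative and are not taken from the issue.
meta = {
    "chunks": [1000],
    "compressor": None,
    "dtype": "|u1",
    "fill_value": 0,
    "filters": [
        {
            "id": "categorize",
            "dtype": "<U8",
            "astype": "|u1",
            "labels": [f"label{i:03d}" for i in range(150)],
        }
    ],
    "order": "C",
    "shape": [100000],
    "zarr_format": 2,
}

# Human-readable form: one list element per line (assumed settings).
readable = json.dumps(meta, indent=4, sort_keys=True)

# Compact form proposed in this issue.
compact = json.dumps(meta, indent=None, separators=(",", ":"), sort_keys=True)

print(len(readable.encode("ascii")), "bytes readable")
print(len(compact.encode("ascii")), "bytes compact")

# Both forms decode to the same metadata.
assert json.loads(readable) == json.loads(compact)
```

Most of the savings come from the label list, because the indented form puts every label on its own line with leading spaces.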

@joshmoore
Member

@andreasg123 : I assume it's safe to take silence as no objections to opening a PR ;) All the best. ~Josh

@shoyer
Contributor

shoyer commented Mar 5, 2021

I'll take a contrarian perspective: I don't mind the difference between storing/downloading/uploading a 3 KB and a 1 KB metadata file (or even 30 KB vs 10 KB), and I like readable human JSON. This is a tiny little bit of data compared to even a single array chunk.

@rabernat
Contributor

rabernat commented Mar 5, 2021

Yeah I have to admit that I'm also 👎 on the idea of compactifying the json.

For reference, the typical size of our zarr stores is 1 GB - 1 TB. If you're making many tiny zarr stores, you might not be using zarr in an optimal way.

@manzt
Member

manzt commented Mar 5, 2021

Just wanted to +1 @shoyer. Quickly inspecting array metadata is just a curl or cat away. If JSON array metadata is comparable to the chunk size, zarr might not be a good fit as a format.

@andreasg123
Author

As I wrote when I opened the issue, this is mostly an issue with categorized string arrays. The application that I have in mind would store 100,000s or millions of string labels with 100s or 1000s of different strings. As those string labels also have x/y coordinates, Zarr seems to be a good way to store them.

As Zarr.js doesn't support filters, this has become less of an issue for me (need to have a separate mapping file anyway). If you want to support such an application, one could write compact JSON if there are more labels than a threshold, maybe 100, and keep the human-readable format otherwise.
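The threshold idea could look roughly like the sketch below. It is not code from Zarr: `LABEL_THRESHOLD`, `count_labels`, and `dumps_metadata` are hypothetical names, and the sketch assumes that labels live in a `labels` list inside each entry of the metadata's `filters` list.

```python
import json

# Hypothetical cutoff; the comment above suggests "maybe 100".
LABEL_THRESHOLD = 100


def count_labels(meta):
    """Count the labels across all filters (assumes a 'labels' list per filter)."""
    filters = meta.get("filters") or []
    return sum(len(f.get("labels", [])) for f in filters)


def dumps_metadata(meta, threshold=LABEL_THRESHOLD):
    """Serialize compactly above the threshold, human-readably otherwise."""
    if count_labels(meta) > threshold:
        return json.dumps(meta, sort_keys=True, indent=None, separators=(",", ":"))
    return json.dumps(meta, sort_keys=True, indent=4)


small = {"shape": [10], "filters": [{"id": "categorize", "labels": ["a", "b"]}]}
large = {
    "shape": [10],
    "filters": [{"id": "categorize", "labels": [str(i) for i in range(500)]}],
}

print(dumps_metadata(small))            # indented, easy to read
print(len(dumps_metadata(large)))       # single line, no extra whitespace
```

Metadata without filters, or with `"filters": null`, counts as zero labels and stays human-readable.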

@rabernat
Contributor

rabernat commented Mar 5, 2021

Good points. Perhaps we could have an option for this. Similar to xarray's option machinery. Like zarr.set_options(compact_json=True).

@Kirill888

I think having an option to produce compact json would be useful in some scenarios.

One such scenario is when you have separate backends for data and metadata. Some of the common high capacity backends are http based and have large latencies, and so there is a lot of value in separating data and metadata, or duplicating metadata into some cache. Keeping metadata in some sort of memory backend, like redis, improves overall latency. When you have that separation, size of the metadata payload starts to matter a lot more.

I assume it's easy enough to add a "compact step" outside of zarr module, but it would be good to have it as built-in option.
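A "compact step" outside of zarr might be sketched as follows. It assumes the store behaves like a Python `MutableMapping` from string keys to `bytes` values and that metadata sits under the v2 key names `.zarray`, `.zgroup`, and `.zattrs`. The function name `compact_metadata` is hypothetical, and a plain `dict` stands in for a real backend.

```python
import json
from collections.abc import MutableMapping

# Zarr v2 metadata key names; chunk keys are left untouched.
METADATA_KEYS = (".zarray", ".zgroup", ".zattrs")


def compact_metadata(store: MutableMapping) -> int:
    """Rewrite JSON metadata in place without whitespace; return bytes saved."""
    saved = 0
    for key in list(store):
        # Only touch keys whose last path component is a metadata name.
        if key.rsplit("/", 1)[-1] not in METADATA_KEYS:
            continue
        original = bytes(store[key])
        compact = json.dumps(
            json.loads(original), indent=None, separators=(",", ":")
        ).encode("ascii")
        if len(compact) < len(original):
            store[key] = compact
            saved += len(original) - len(compact)
    return saved


# Usage with a plain dict standing in for a metadata backend.
store = {
    "labels/.zarray": json.dumps({"shape": [10], "chunks": [5]}, indent=4).encode("ascii"),
    "labels/0": b"\x00\x01",
}
print(compact_metadata(store), "bytes saved")
```

Because the step only re-serializes JSON that already parses, any reader that accepts the human-readable files should decode the compact ones to the same metadata.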

@will-moore

When testing v3, I really miss formatted JSON: the JSON it writes currently isn't formatted, so I have to format it in my editor every time I want to inspect it.
Is it planned to add back JSON formatting into v3?
Thanks!

@d-v-b
Contributor

d-v-b commented Jun 5, 2024

good point @will-moore; #1952 should address this

8 participants