Compact JSON in .zarray #704

andreasg123 · 2021-02-20T03:09:13Z

JSON output is deliberately made human-readable with a lot of whitespace. That produces large .zarray files for string arrays that use the Categorize filter. In one small example with about 150 different strings, the human-readable .zarray was 3837 bytes and the compact version was 1284 bytes. With a larger variety of strings, the difference would be larger.

As Zarr is a storage format that isn't intended for human readability, I would like to propose to write JSON with indent=None, separators=(",", ":").
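
To make the size difference concrete, here is a minimal sketch using only the standard `json` module. The metadata dict and its 150 labels are made up for illustration, and the "pretty" settings (`indent=4`, `separators=(",", ": ")`) are an assumption about what a human-readable writer typically uses, not a quote of Zarr's code.

```python
import json

# Illustrative .zarray-like metadata; the labels are invented for this sketch.
meta = {
    "chunks": [1000],
    "compressor": None,
    "dtype": "|u1",
    "fill_value": 0,
    "filters": [
        {
            "id": "categorize",
            "dtype": "<U8",
            "astype": "|u1",
            "labels": [f"label{i:03d}" for i in range(150)],
        }
    ],
    "order": "C",
    "shape": [100000],
    "zarr_format": 2,
}

# Human-readable form: one list element per line, indented.
pretty = json.dumps(meta, indent=4, sort_keys=True, separators=(",", ": "))

# Proposed compact form: no indentation, no spaces after separators.
compact = json.dumps(meta, indent=None, sort_keys=True, separators=(",", ":"))

print(len(pretty), len(compact))

# Both encodings decode to the same metadata.
assert json.loads(pretty) == json.loads(compact)
```

Most of the savings come from the label list: with indentation, every label costs a newline plus leading spaces.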


joshmoore · 2021-03-05T16:19:46Z

@andreasg123 : I assume it's safe to take silence as no objections to opening a PR ;) All the best. ~Josh

shoyer · 2021-03-05T17:13:19Z

I'll take a contrarian perspective: I don't mind the difference between storing/downloading/uploading a 3 KB and a 1 KB metadata file (or even 30 KB vs 10 KB), and I like human-readable JSON. This is a tiny amount of data compared to even a single array chunk.

rabernat · 2021-03-05T17:17:18Z

Yeah I have to admit that I'm also 👎 on the idea of compactifying the json.

For reference, the typical size of our zarr stores is 1 GB - 1 TB. If you're making many tiny zarr stores, you might not be using zarr in an optimal way.

manzt · 2021-03-05T17:19:55Z

Just wanted to +1 @shoyer. Quickly inspecting array metadata is just a curl or cat away. If JSON array metadata is comparable to the chunk size, zarr might not be a good fit as a format.

andreasg123 · 2021-03-05T17:30:51Z

As I wrote when I opened the issue, this is mostly an issue with categorized string arrays. The application that I have in mind would store 100,000s or millions of string labels with 100s or 1000s of different strings. As those string labels also have x/y coordinates, Zarr seems to be a good way to store them.

As Zarr.js doesn't support filters, this has become less of an issue for me (need to have a separate mapping file anyway). If you want to support such an application, one could write compact JSON if there are more labels than a threshold, maybe 100, and keep the human-readable format otherwise.
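
The threshold idea could look roughly like the sketch below. The function names, the `"labels"` key lookup, and the cutoff of 100 are all hypothetical; this is not how Zarr serializes metadata, only an illustration of switching formats based on label count.

```python
import json

LABEL_THRESHOLD = 100  # hypothetical cutoff, taken from the "maybe 100" above


def count_labels(meta):
    """Count labels across all filters that carry a "labels" list."""
    filters = meta.get("filters") or []
    return sum(len(f.get("labels", [])) for f in filters)


def dumps_metadata(meta, threshold=LABEL_THRESHOLD):
    """Write compact JSON for label-heavy metadata, readable JSON otherwise."""
    if count_labels(meta) > threshold:
        return json.dumps(meta, sort_keys=True, separators=(",", ":"))
    return json.dumps(meta, sort_keys=True, indent=4)


# A small label set stays human-readable; a large one is written compactly.
small = {"filters": [{"id": "categorize", "labels": ["a", "b"]}]}
large = {"filters": [{"id": "categorize", "labels": [str(i) for i in range(150)]}]}
print(dumps_metadata(small))
print(len(dumps_metadata(large)))
```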

rabernat · 2021-03-05T17:33:35Z

Good points. Perhaps we could have an option for this. Similar to xarray's option machinery. Like zarr.set_options(compact_json=True).

Kirill888 · 2023-11-30T09:36:37Z

I think having an option to produce compact json would be useful in some scenarios.

One such scenario is when you have separate backends for data and metadata. Some of the common high-capacity backends are HTTP-based and have large latencies, so there is a lot of value in separating data and metadata, or in duplicating metadata into some cache. Keeping metadata in some sort of in-memory backend, like Redis, improves overall latency. When you have that separation, the size of the metadata payload starts to matter a lot more.

I assume it's easy enough to add a "compact step" outside of the zarr module, but it would be good to have it as a built-in option.
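
For illustration, such an external "compact step" might look like the following sketch. It assumes the store behaves like a `MutableMapping` from string keys to bytes, and that metadata lives under the usual v2 key names; `compact_metadata` is a hypothetical helper, not part of zarr.

```python
import json
from collections.abc import MutableMapping

# Assumed v2-style metadata key suffixes.
META_SUFFIXES = (".zarray", ".zgroup", ".zattrs", ".zmetadata")


def compact_metadata(store: MutableMapping) -> int:
    """Re-encode JSON metadata entries compactly; return the bytes saved."""
    saved = 0
    for key in list(store):
        if not key.endswith(META_SUFFIXES):
            continue  # leave chunk data untouched
        raw = bytes(store[key])
        packed = json.dumps(json.loads(raw), separators=(",", ":")).encode("ascii")
        if len(packed) < len(raw):
            store[key] = packed
            saved += len(raw) - len(packed)
    return saved


# Usage with a plain dict standing in for a metadata cache.
store = {
    "a/.zarray": json.dumps({"shape": [10], "chunks": [5]}, indent=4).encode(),
    "a/0": b"\x00\x01",
}
print(compact_metadata(store))
```

Running the step only on the metadata copy (for example, the one pushed to a cache) would leave the primary store human-readable.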

will-moore · 2024-06-05T16:35:47Z

When testing v3, I really miss formatted JSON: the JSON it writes is currently not formatted, so I keep having to format it in my editor every time I want to inspect it.
Is there a plan to add JSON formatting back in v3?
Thanks!

d-v-b · 2024-06-05T17:15:45Z

good point @will-moore; #1952 should address this

d-v-b mentioned this issue Jun 5, 2024

[v3] add json indentation to config #1952

Merged

6 tasks

jhamman closed this as completed in #1952 Jun 18, 2024
