Compact JSON in .zarray #704
Comments
@andreasg123 : I assume it's safe to take silence as no objections to opening a PR ;) All the best. ~Josh |
I'll take a contrarian perspective: I don't mind the difference between storing/downloading/uploading a 3 KB and a 1 KB metadata file (or even 30 KB vs 10 KB), and I like readable human JSON. This is a tiny little bit of data compared to even a single array chunk. |
Yeah, I have to admit that I'm also 👎 on the idea of compactifying the JSON. For reference, the typical size of our zarr stores is 1 GB - 1 TB. If you're making many tiny zarr stores, you might not be using zarr in an optimal way. |
Just wanted to +1 @shoyer. Quickly inspecting array metadata is just a |
As I wrote when I opened the issue, this is mostly an issue with categorized string arrays. The application that I have in mind would store 100,000s or millions of string labels with 100s or 1000s of different strings. As those string labels also have x/y coordinates, Zarr seems to be a good way to store them. As Zarr.js doesn't support filters, this has become less of an issue for me (need to have a separate mapping file anyway). If you want to support such an application, one could write compact JSON if there are more labels than a threshold, maybe 100, and keep the human-readable format otherwise. |
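A minimal sketch of the threshold idea from the comment above, not Zarr's actual implementation. The names `LABEL_THRESHOLD`, `count_labels`, and `dumps_metadata` are hypothetical, and the sketch assumes that labels live under a `labels` key inside each entry of the metadata's `filters` list.

```python
import json

# Hypothetical cut-off: above this many labels, write compact JSON.
LABEL_THRESHOLD = 100


def count_labels(meta):
    """Count the labels across all filter configs in a metadata dict."""
    filters = meta.get("filters") or []
    return sum(len(f.get("labels", [])) for f in filters)


def dumps_metadata(meta, threshold=LABEL_THRESHOLD):
    """Serialize metadata, switching to compact JSON for many labels."""
    if count_labels(meta) > threshold:
        # Compact form: no indentation, no spaces after separators.
        return json.dumps(meta, indent=None, separators=(",", ":"), sort_keys=True)
    # Human-readable form (indent=4 is an assumption for illustration).
    return json.dumps(meta, indent=4, sort_keys=True)


small = {"filters": [{"id": "categorize", "labels": ["a", "b", "c"]}]}
large = {"filters": [{"id": "categorize", "labels": [f"s{i}" for i in range(500)]}]}

small_text = dumps_metadata(small)
large_text = dumps_metadata(large)
```

With this approach, small metadata files stay easy to read in an editor, and only label-heavy ones lose their whitespace. Both forms parse back to the same dict.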
Good points. Perhaps we could have an option for this. Similar to xarray's option machinery. Like |
I think having an option to produce compact json would be useful in some scenarios. One such scenario is when you have separate backends for data and metadata. Some of the common high capacity backends are http based and have large latencies, and so there is a lot of value in separating data and metadata, or duplicating metadata into some cache. Keeping metadata in some sort of memory backend, like redis, improves overall latency. When you have that separation, size of the metadata payload starts to matter a lot more. I assume it's easy enough to add a "compact step" outside of |
When testing v3, I really miss formatted JSON: the JSON it writes currently isn't formatted, so I keep having to format it in my editor every time I want to inspect it. |
good point @will-moore; #1952 should address this |
JSON output is deliberately made human-readable with much whitespace. That produces large .zarray files with string arrays and categorize. In one small example with about 150 different strings, the human-readable .zarray was 3837 bytes and the compact version was 1284 bytes. With a larger variety of strings, the difference would be larger.

As Zarr is a storage format that isn't intended for human readability, I would like to propose to write JSON with indent=None, separators=(",", ":").
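To illustrate the size difference, here is a rough sketch using only the standard library `json` module. The metadata dict is made up for illustration (150 synthetic labels in a categorize-style filter config), and `indent=4` for the readable form is an assumption, so the byte counts will not match the 3837 and 1284 figures above.

```python
import json

# Illustrative .zarray-like metadata; all values are invented for this sketch.
meta = {
    "chunks": [1000],
    "compressor": None,
    "dtype": "<i8",
    "fill_value": 0,
    "filters": [
        {
            "id": "categorize",
            "labels": [f"label-{i}" for i in range(150)],
            "dtype": "<U9",
            "astype": "u1",
        }
    ],
    "order": "C",
    "shape": [100000],
    "zarr_format": 2,
}

# Human-readable form: one list element per line, indented.
pretty = json.dumps(meta, indent=4, sort_keys=True)

# Compact form as proposed: no indentation, no spaces after "," and ":".
compact = json.dumps(meta, indent=None, separators=(",", ":"), sort_keys=True)

# Both encodings carry exactly the same data.
assert json.loads(pretty) == json.loads(compact)

print(len(pretty.encode("ascii")), len(compact.encode("ascii")))
```

Most of the saving comes from the label list, because the indented form puts every label on its own line with leading spaces.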