This issue was moved to a discussion. You can continue the conversation there.
Big data support #924
Comments
Thanks for sharing. This sounds like a perfect application for Zarr.
The metadata size is independent of the array size, as you can see from the spec. Arbitrarily large arrays can be stored in Zarr. This is a fundamental goal of the project.
Zarr has no concept of coordinates. Just groups and arrays. Perhaps you're thinking of Xarray?
Can you clarify what you mean by "query"? Zarr-python supports accessing arrays via numpy-style indexing as described in the docs. The speed at which data are returned will likely depend entirely on your storage system.
This has me worried. There are very few storage media that are happy with so many files/objects. How do you plan to store your data? More details would be helpful. What are the explicit lat, lon, time dimensions and chunk sizes you have in mind?
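As a rough sanity check, the total chunk count follows directly from the array shape and chunk shape. A minimal sketch (the dimensions and chunk sizes here are hypothetical, not the poster's actual grid):

```python
import math

def n_chunks(shape, chunks):
    """Total number of chunks for a given array shape and chunk shape."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))

# Hypothetical (time, lat, lon) array with (1, 1000, 1000) chunks:
print(n_chunks((365, 100_000, 3_000_000), (1, 1000, 1000)))
# → 109_500_000 chunks, i.e. 365 * 100 * 3000
```

Plugging in realistic numbers like this makes it easy to see how quickly the object count explodes, which is exactly the concern above.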
Thanks for your answer. I probably was not clear in my questions, as I mixed concepts from Zarr and from Xarray.
Yes. If you have a 3D Zarr Array and use numpy indexing to retrieve a value by position, e.g.
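For illustration, here is a minimal sketch of that positional indexing, using a numpy array as a stand-in (zarr arrays support the same basic numpy-style indexing; the shape and values are assumed for the example):

```python
import numpy as np

# Stand-in for a 3D Zarr array; pretend the axes are (time, lat, lon).
z = np.arange(2 * 3 * 4).reshape(2, 3, 4)

value = z[1, 2, 3]   # single element by integer position
print(value)         # → 23
slab = z[0, :, :]    # one full time step
print(slab.shape)    # → (3, 4)
```

The key point is that these lookups are by integer position in the array, not by coordinate value.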
I don't understand what you mean by "map coordinates". Can you clarify? Why do you think xarray will have to read 3 million records? Can you say more about the access pattern you have in mind?
If your data are on an irregular grid (or for any other reason you need to look up values by something other than the position in the array), you'll need Xarray, which can read from a Zarr array with particular metadata. IIRC, though, it stores its coordinate indices in the Zarr metadata, i.e. JSON, so depending on how often it has to deserialise 24MB of coordinates, there might be some issues there. If your data aren't on a grid at all, I don't think Zarr or Xarray can help you.
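To make the distinction concrete, here is a sketch of what a coordinate-based lookup does under the hood on a regular grid, with nearest-neighbour matching in the spirit of Xarray's `.sel(method='nearest')` (all names, sizes, and values here are illustrative, not Xarray's actual implementation):

```python
import numpy as np

# Explicit coordinate arrays, held in memory alongside the data.
lats = np.linspace(-90.0, 90.0, 5)
lons = np.linspace(-180.0, 180.0, 7)
data = np.arange(5 * 7).reshape(5, 7)  # stand-in for the backing array

def sel(lat, lon):
    """Translate a (lat, lon) coordinate pair into integer indices,
    then read from the underlying array by position."""
    i = int(np.abs(lats - lat).argmin())
    j = int(np.abs(lons - lon).argmin())
    return data[i, j]

print(sel(0.0, 0.0))  # grid centre → data[2, 3] == 17
```

The coordinate arrays themselves must be loaded and searched before any positional read happens, which is why very large coordinate indices (e.g. 3 million longitudes) can become a cost in their own right.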
Hello,
We plan to generate a very large data array (lat, lon, time) from the 56,000 Sentinel-2 tiles (per year). The longitude variable would hold about 3 million coordinates, and the array would have about 2 billion chunks.
Thank you very much for your support.