I have a very large (~1 PB) zarr store on Google Cloud Storage here, containing output from a reanalysis (weather model fields such as air temperature, wind velocity, etc.). I am using xarray, generally following this guidance, to append to the zarr store along the time dimension. The existing dataset holds ~30 years of data at 3-hour frequency and 1/4-degree resolution, and I only want to append about two more months — i.e., a small amount of data relative to the huge existing store.
I tried this locally with a small example and it worked fine (given as an example below), but I can't seem to do it with this large dataset. I'm executing the initial append step (highlighted as problematic below) from a c2-standard-60 node (240 GB RAM) on Google Cloud, but it never completes, and the node eventually becomes unresponsive. Any tips on how to do something like this would be very helpful, and please let me know if I should post this somewhere else. Thanks in advance!
Steps to reproduce
import numpy as np
import xarray as xr

# this is the existing dataset
ds = xr.open_zarr(
    "gcs://noaa-ufs-gefsv13replay/ufs-hr1/0.25-degree/03h-freq/zarr/fv3.zarr",
    storage_options={"token": "anon"},
)

# grab just 2 time stamps of the data, store locally
# this is an example to mimic the existing dataset on GCS
ds[["tmp"]].isel(time=slice(2)).to_zarr("test.zarr")

# now get the next two time stamps and append
xds = ds[["tmp"]].isel(time=slice(2, 4)).load()

# <- this is the operation that never completes for the real thing
(np.nan * xds).to_zarr("test.zarr", append_dim="time", compute=False)

# this is what I'll eventually do to actually fill the appended container with values
for i in range(2, 4):
    region = {
        "time": slice(i, i + 1),
        "pfull": slice(None, None),
        "grid_yt": slice(None, None),
        "grid_xt": slice(None, None),
    }
    xds.isel(time=[i - 2]).to_zarr("test.zarr", region=region)
Additional output
No response
Zarr version
v2.16.1
Numcodecs version
v0.12.0
Python Version
3.11.6
Operating System
Linux
Installation
using conda
Description