I am trying to analyze the memory usage of zarr while using Google Cloud Storage as the storage backend. I am noticing a spike in memory usage while writing to a zarr archive.
import pickle

import zarr

def main():
    with open("foo", "rb") as f:
        data = pickle.load(f)  # this data is about 0.5 GB
    log_resources('After reading')

    bucket = "..."
    store = GCSStore(bucket)  # GCS-backed store; construction details omitted
    root = zarr.group(store=store)
    log_resources('After root')

    name = "foo"
    z = root.create(name, shape=data.shape, chunks=False,
                    dtype=data.dtype, overwrite=True)
    z[:] = data.values
    log_resources('After writing')

if __name__ == "__main__":
    main()
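The log_resources helper is not shown above; a minimal sketch that reports numbers in GB, assuming Linux (where ru_maxrss is in kilobytes) and that the third-party psutil package is available, might look like:

import resource

import psutil

def log_resources(label):
    # Current resident set size of this process, in GB.
    cur = psutil.Process().memory_info().rss / 1e9
    # Peak resident set size since process start (kilobytes on Linux), in GB.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss * 1024 / 1e9
    print(f"({label}) cur usage: {cur} max usage: {peak}")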
This produces the following output (all numbers are in GB):
(After reading ) cur usage: 0.598495232 max usage: 0.584468
(After root ) cur usage: 0.60073984 max usage: 0.58666
(After writing ) cur usage: 0.604459008 max usage: 1.48456
I think this is because of the extra memory allocated while zarr compresses the data, but I am not sure about that. Note that I have disabled chunking in the call to create above (chunks=False), so the array is stored as a single chunk.
Could you confirm that the extra memory allocated is due to compression, and that it is upper-bounded by the chunk size?
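One way to test this hypothesis, a sketch assuming the zarr v2 API (where compressor=None disables compression), would be to write the same array with compression turned off and compare the peak usage:

# Same write as above, but with compression disabled.
z = root.create(name, shape=data.shape, chunks=False,
                dtype=data.dtype, compressor=None, overwrite=True)
z[:] = data.values
log_resources('After writing, no compression')

If the spike disappears in this variant, the extra memory is the compression buffer.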
Thanks.