
Memory usage while using zarr #689

Open
skgbanga opened this issue Jan 13, 2021 · 0 comments

Hello,

I am trying to analyze the memory usage of zarr while using Google Cloud Storage as the storage backend. I am noticing a spike in memory usage while writing to a zarr archive.

import pickle

import zarr

# GCSStore (a Google Cloud Storage backed store) and log_resources (a memory
# logger) are small helpers defined elsewhere in my code.


def main():
    with open("foo", "rb") as f:
        data = pickle.load(f)  # this data is about 0.5 GB
    log_resources('After reading')

    bucket = "..."
    store = GCSStore(bucket)
    root = zarr.group(store=store)
    log_resources('After root')

    name = "foo"
    z = root.create(name, shape=data.shape, chunks=False, dtype=data.dtype, overwrite=True)
    z[:] = data.values
    log_resources('After writing')


if __name__ == "__main__":
    main()

This produces the following output (all numbers are in GB):

(After reading  ) cur usage: 0.598495232 max usage: 0.584468
(After root     ) cur usage: 0.60073984 max usage: 0.58666
(After writing  ) cur usage: 0.604459008 max usage: 1.48456
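
For reference, log_resources is a small helper of mine; it does roughly the following (not the exact code), reading the current RSS via psutil and the peak RSS via resource.getrusage:

import resource

import psutil


def log_resources(tag):
    # Current resident set size (GB) and peak RSS so far (ru_maxrss is KB on Linux).
    cur = psutil.Process().memory_info().rss / 1e9
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1e6
    print(f"({tag:<15}) cur usage: {cur} max usage: {peak}")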

I think this is because of the extra memory allocated while zarr is compressing the data, but I am not sure about that. Note that I have disabled chunking in the above call to create (chunks=False).

Could you confirm that the extra memory allocated is due to compression, and that it is upper-bounded by the chunk size?
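
To illustrate what I mean, here is a minimal standalone sketch (random data and the default in-memory store instead of my GCS setup, assuming zarr v2 defaults): with chunks=False the array has a single chunk covering the full shape, so presumably the compressor has to buffer roughly the whole array, whereas explicit chunks would keep that buffer at roughly one chunk.

import numpy as np
import zarr

data = np.random.rand(60_000_000)  # ~0.5 GB of float64, a stand-in for my pickled data

# chunks=False -> a single chunk covering the entire array
single = zarr.zeros(data.shape, chunks=False, dtype=data.dtype)
print(single.chunks)      # (60000000,) -- one chunk == whole array
print(single.compressor)  # the default compressor (Blosc, unless configured otherwise)

# With explicit chunks the data is compressed and written one chunk at a time,
# so (if my understanding is right) the extra memory stays on the order of one chunk.
chunked = zarr.zeros(data.shape, chunks=(1_000_000,), dtype=data.dtype)
chunked[:] = data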

Thanks.
