
Memory usage while using zarr #689

Open
skgbanga opened this issue Jan 13, 2021 · 0 comments

Hello,

I am trying to analyze the memory usage of zarr while using Google Cloud Storage as the storage backend. I am noticing a spike in memory usage while writing to a zarr archive.

import pickle

import zarr

# GCSStore (a Google Cloud Storage backed store) and log_resources (a memory
# logger) are small helpers defined elsewhere in my code.


def main():
    with open("foo", "rb") as f:
        data = pickle.load(f)  # this data is about 0.5 GB
    log_resources('After reading')

    bucket = "..."
    store = GCSStore(bucket)
    root = zarr.group(store=store)
    log_resources('After root')

    name = "foo"
    z = root.create(name, shape=data.shape, chunks=False, dtype=data.dtype, overwrite=True)
    z[:] = data.values
    log_resources('After writing')


if __name__ == "__main__":
    main()

This produces the following output (all numbers are in GB):

(After reading  ) cur usage: 0.598495232 max usage: 0.584468
(After root     ) cur usage: 0.60073984 max usage: 0.58666
(After writing  ) cur usage: 0.604459008 max usage: 1.48456
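
For reference, log_resources is a small helper of mine; it does roughly the following (not the exact code), reading the current RSS via psutil and the peak RSS via resource.getrusage:

import resource

import psutil


def log_resources(tag):
    # Current resident set size (GB) and peak RSS so far (ru_maxrss is KB on Linux).
    cur = psutil.Process().memory_info().rss / 1e9
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1e6
    print(f"({tag:<15}) cur usage: {cur} max usage: {peak}")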

I think this is because of the extra memory allocated while zarr is compressing the data, but I am not sure about that. Note that I have disabled chunking in the above call to create (chunks=False).

Could you confirm that the extra memory allocated is due to compression, and that it is upper-bounded by the chunk size?
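
To illustrate what I mean, here is a minimal standalone sketch (random data and the default in-memory store instead of my GCS setup, assuming zarr v2 defaults): with chunks=False the array has a single chunk covering the full shape, so presumably the compressor has to buffer roughly the whole array, whereas explicit chunks would keep that buffer at roughly one chunk.

import numpy as np
import zarr

data = np.random.rand(60_000_000)  # ~0.5 GB of float64, a stand-in for my pickled data

# chunks=False -> a single chunk covering the entire array
single = zarr.zeros(data.shape, chunks=False, dtype=data.dtype)
print(single.chunks)      # (60000000,) -- one chunk == whole array
print(single.compressor)  # the default compressor (Blosc, unless configured otherwise)

# With explicit chunks the data is compressed and written one chunk at a time,
# so (if my understanding is right) the extra memory stays on the order of one chunk.
chunked = zarr.zeros(data.shape, chunks=(1_000_000,), dtype=data.dtype)
chunked[:] = data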

Thanks.
