Use numcodecs #139

Merged
merged 5 commits into from
Apr 24, 2017


@alimanfoo alimanfoo commented Mar 2, 2017

This PR addresses #74 by removing the zarr.codecs module and using numcodecs instead, which is now a dependency.
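For readers unfamiliar with the pattern, here is a minimal sketch of a codec that exposes `encode`, `decode` and `get_config`, plus a registry that rebuilds a codec from a config dict. It only uses the standard library; the names `register_codec`, `get_codec`, `_REGISTRY` and the `Zlib` class below are illustrative assumptions, not code taken from this PR or from numcodecs.

```python
import zlib

# Hypothetical registry mapping codec ids to classes (illustrative only).
_REGISTRY = {}


def register_codec(cls):
    """Register a codec class under its `codec_id`."""
    _REGISTRY[cls.codec_id] = cls
    return cls


def get_codec(config):
    """Build a codec from a config dict such as {'id': 'zlib', 'level': 1}."""
    params = dict(config)  # copy so the caller's dict is left untouched
    codec_id = params.pop("id")
    return _REGISTRY[codec_id](**params)


@register_codec
class Zlib:
    codec_id = "zlib"

    def __init__(self, level=1):
        self.level = level

    def encode(self, buf):
        return zlib.compress(bytes(buf), self.level)

    def decode(self, buf):
        return zlib.decompress(bytes(buf))

    def get_config(self):
        # The config is what would be stored alongside the array metadata.
        return {"id": self.codec_id, "level": self.level}


codec = get_codec({"id": "zlib", "level": 5})
raw = b"\x00" * 4000
assert codec.decode(codec.encode(raw)) == raw
```

Keeping the codec behind a small interface like this is what makes it possible to move the implementations into a separate package without changing how arrays describe their compressor.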

A couple of issues came up from initial work that need to be resolved upstream: https://github.com/alimanfoo/numcodecs/issues/28, https://github.com/alimanfoo/numcodecs/issues/29.

Also I really should create a data fixture (#138) prior to these changes, then rebase and run tests, to confirm that data created prior to this PR can still be read afterwards.

TODO:

  • Bring back test coverage to 100%
  • Reinstate tests with Quantize filter
  • Remove aliasing of 'gzip' codec
  • Rebase with data fixtures (needs PR to address Data fixture #138)

@alimanfoo alimanfoo added this to the v2.2 milestone Mar 2, 2017
zarr/core.py Outdated
@@ -506,8 +506,8 @@ def __setitem__(self, item, value):
>>> z = zarr.zeros(100000000, chunks=1000000, dtype='i4')
>>> z
Array((100000000,), int32, chunks=(1000000,), order=C)
nbytes: 381.5M; nbytes_stored: 301; ratio: 1328903.7; initialized: 0/100
compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
nbytes: 381.5M; nbytes_stored: 325; ratio: 1230769.2; initialized: 0/100

Use ellipsis here.

zarr/core.py Outdated
@@ -528,8 +528,8 @@ def __setitem__(self, item, value):
>>> z = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4')
>>> z
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
nbytes: 381.5M; nbytes_stored: 323; ratio: 1238390.1; initialized: 0/100
compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
nbytes: 381.5M; nbytes_stored: 347; ratio: 1152737.8; initialized: 0/100

Use ellipsis here.

zarr/core.py Outdated
store: dict
>>> z.append(np.vstack([a, a]), axis=1)
(20000, 2000)
>>> z
Array((20000, 2000), int32, chunks=(1000, 100), order=C)
nbytes: 152.6M; nbytes_stored: 7.5M; ratio: 20.3; initialized: 400/400
compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
compressor: Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
store: dict

Simplify the output of these examples, in line with the tutorial.

@alimanfoo

cc @jakirkham

@jakirkham

Do we already have all the codecs from Zarr in Numcodecs or do a few still need to transition?

@alimanfoo

All codecs are ported except Quantize, which I forgot (https://github.com/alimanfoo/numcodecs/issues/28).

@alimanfoo

I've upgraded the numcodecs dependency to 0.2.0, which now includes Quantize and aliases the Zlib codec to 'gzip' for compatibility with h5py. This restores full feature parity with Zarr as it was when the codecs were bundled.
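As a rough illustration of what a quantizing filter does, the sketch below rounds floats onto a power-of-two grid that is fine enough to preserve a given number of decimal digits. This is one plausible approach written with the standard library only; the `quantize` function and its `digits` parameter are assumptions for illustration and are not copied from the numcodecs implementation.

```python
import math


def quantize(values, digits):
    """Round floats to a power-of-two grid with step <= 10**-digits.

    Assumes `digits` is a positive integer.
    """
    # Smallest number of binary fraction bits whose step is <= 10**-digits.
    bits = math.ceil(math.log2(10 ** digits))
    scale = 2 ** bits
    return [round(v * scale) / scale for v in values]


data = [0.1234, 2.7182, -3.1415]
approx = quantize(data, digits=2)

# Each value moves by at most half a grid step, which is below 0.5 * 10**-2.
assert all(abs(a - b) <= 0.005 for a, b in zip(data, approx))
```

Because the rounded values share many trailing zero bits, a compressor applied afterwards tends to do better, at the cost of a bounded loss of precision.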

I still want to create a data fixture before merging this, as a way to verify that this PR retains compatibility with data saved with the previous version. I'll do that then rebase this PR.

@alimanfoo alimanfoo mentioned this pull request Apr 6, 2017
@alimanfoo

I've rebased this on master with the new data fixture. All tests pass, which gives some confidence that switching to numcodecs will not break the ability to read previously written data.
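The idea behind a fixture-based compatibility check can be sketched as follows: data written by an earlier code path is kept as a fixture, and the test asserts that the current reader still returns the original bytes. The store layout, the `write_fixture` and `read_chunk` helpers, and the metadata keys here are hypothetical stand-ins, not the actual fixture or test code from this repository.

```python
import json
import zlib


def write_fixture(store, data, level=1):
    """Stand-in for data written by an earlier release (hypothetical layout)."""
    meta = {"compressor": {"id": "zlib", "level": level}}
    store["meta"] = json.dumps(meta).encode()
    store["0"] = zlib.compress(data, level)


def read_chunk(store, key):
    """Stand-in for the reader after the refactor."""
    meta = json.loads(store["meta"].decode())
    decoders = {"zlib": zlib.decompress}  # codec id -> decode function
    return decoders[meta["compressor"]["id"]](store[key])


expected = bytes(range(256)) * 16
fixture = {}
write_fixture(fixture, expected)

# The check passes only if previously written data is still readable.
assert read_chunk(fixture, "0") == expected
```

In a real test suite the fixture would be generated once with the old release and committed, so that the check exercises data the new code did not write itself.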

@alimanfoo alimanfoo merged commit 696cb9c into master Apr 24, 2017
@alimanfoo alimanfoo deleted the use-numcodecs branch April 24, 2017 23:17
@alimanfoo alimanfoo mentioned this pull request Apr 24, 2017
@alimanfoo alimanfoo changed the title WIP Use numcodecs Use numcodecs Apr 24, 2017
@alimanfoo alimanfoo mentioned this pull request Oct 24, 2017
4 tasks
@alimanfoo alimanfoo added the enhancement and release notes done labels Nov 19, 2017