Commit

Merge pull request #42 from alimanfoo/filters
Implementation of filters.
alimanfoo committed Sep 2, 2016
2 parents 4529842 + e4c2213 commit 976c291
Showing 36 changed files with 5,824 additions and 2,454 deletions.
3 changes: 3 additions & 0 deletions .coveragerc
@@ -0,0 +1,3 @@
[run]
omit = zarr/meta_v1.py

2 changes: 1 addition & 1 deletion c-blosc
Submodule c-blosc updated 60 files
+13 −9 ANNOUNCE.rst
+5 −0 CMakeLists.txt
+25 −16 LICENSES/BLOSC.txt
+5 −5 README_HEADER.rst
+16 −0 RELEASE_NOTES.rst
+2 −2 blosc/CMakeLists.txt
+44 −31 blosc/blosc.c
+4 −4 blosc/blosc.h
+3 −4 blosc/shuffle.c
+0 −68 internal-complibs/zstd-0.7.4/README.md
+0 −77 internal-complibs/zstd-0.7.4/common/error_public.h
+0 −162 internal-complibs/zstd-0.7.4/compress/.debug/zstd_stats.h
+0 −113 internal-complibs/zstd-0.7.4/dictBuilder/zdict.h
+0 −140 internal-complibs/zstd-0.7.4/legacy/zstd_legacy.h
+13 −9 internal-complibs/zstd-1.0.0/LICENSE
+22 −43 internal-complibs/zstd-1.0.0/Makefile
+61 −0 internal-complibs/zstd-1.0.0/README.md
+0 −0 internal-complibs/zstd-1.0.0/common/bitstream.h
+29 −38 internal-complibs/zstd-1.0.0/common/entropy_common.c
+11 −32 internal-complibs/zstd-1.0.0/common/error_private.h
+59 −0 internal-complibs/zstd-1.0.0/common/error_public.h
+1 −1 internal-complibs/zstd-1.0.0/common/fse.h
+6 −11 internal-complibs/zstd-1.0.0/common/fse_decompress.c
+1 −1 internal-complibs/zstd-1.0.0/common/huf.h
+36 −43 internal-complibs/zstd-1.0.0/common/mem.h
+30 −17 internal-complibs/zstd-1.0.0/common/xxhash.c
+71 −35 internal-complibs/zstd-1.0.0/common/xxhash.h
+29 −35 internal-complibs/zstd-1.0.0/common/zbuff.h
+24 −32 internal-complibs/zstd-1.0.0/common/zstd_common.c
+68 −76 internal-complibs/zstd-1.0.0/common/zstd_internal.h
+1 −1 internal-complibs/zstd-1.0.0/compress/fse_compress.c
+25 −54 internal-complibs/zstd-1.0.0/compress/huf_compress.c
+43 −51 internal-complibs/zstd-1.0.0/compress/zbuff_compress.c
+722 −631 internal-complibs/zstd-1.0.0/compress/zstd_compress.c
+82 −224 internal-complibs/zstd-1.0.0/compress/zstd_opt.h
+29 −32 internal-complibs/zstd-1.0.0/decompress/huf_decompress.c
+36 −78 internal-complibs/zstd-1.0.0/decompress/zbuff_decompress.c
+604 −385 internal-complibs/zstd-1.0.0/decompress/zstd_decompress.c
+0 −0 internal-complibs/zstd-1.0.0/dictBuilder/divsufsort.c
+5 −5 internal-complibs/zstd-1.0.0/dictBuilder/divsufsort.h
+149 −189 internal-complibs/zstd-1.0.0/dictBuilder/zdict.c
+111 −0 internal-complibs/zstd-1.0.0/dictBuilder/zdict.h
+259 −0 internal-complibs/zstd-1.0.0/legacy/zstd_legacy.h
+31 −62 internal-complibs/zstd-1.0.0/legacy/zstd_v01.c
+12 −32 internal-complibs/zstd-1.0.0/legacy/zstd_v01.h
+35 −207 internal-complibs/zstd-1.0.0/legacy/zstd_v02.c
+12 −32 internal-complibs/zstd-1.0.0/legacy/zstd_v02.h
+34 −205 internal-complibs/zstd-1.0.0/legacy/zstd_v03.c
+12 −32 internal-complibs/zstd-1.0.0/legacy/zstd_v03.h
+44 −115 internal-complibs/zstd-1.0.0/legacy/zstd_v04.c
+13 −33 internal-complibs/zstd-1.0.0/legacy/zstd_v04.h
+37 −126 internal-complibs/zstd-1.0.0/legacy/zstd_v05.c
+9 −31 internal-complibs/zstd-1.0.0/legacy/zstd_v05.h
+15 −226 internal-complibs/zstd-1.0.0/legacy/zstd_v06.c
+9 −31 internal-complibs/zstd-1.0.0/legacy/zstd_v06.h
+4,745 −0 internal-complibs/zstd-1.0.0/legacy/zstd_v07.c
+174 −0 internal-complibs/zstd-1.0.0/legacy/zstd_v07.h
+0 −0 internal-complibs/zstd-1.0.0/libzstd.pc.in
+245 −133 internal-complibs/zstd-1.0.0/zstd.h
+2 −2 tests/Makefile
2 changes: 1 addition & 1 deletion docs/api.rst
@@ -8,5 +8,5 @@ API reference
api/core
api/hierarchy
api/storage
api/codecs
api/sync
27 changes: 27 additions & 0 deletions docs/api/codecs.rst
@@ -0,0 +1,27 @@
Compressors and filters (``zarr.codecs``)
=========================================
.. module:: zarr.codecs

This module contains compressor and filter classes for use with Zarr.

Other codecs can be registered dynamically with Zarr. All that is required
is to implement a class that provides the same interface as the classes listed
below, and then to add the class to the ``codec_registry``. See the source
code of this module for details.
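
As a sketch of the registration mechanism described above, the class below implements the four-method interface (``encode``, ``decode``, ``get_config``, ``from_config``). The ``GZipCodec`` class, its ``codec_id``, and the plain dictionary standing in for ``zarr.codecs.codec_registry`` are illustrative assumptions, not zarr's actual implementation:

```python
import zlib


class GZipCodec(object):
    """Illustrative codec providing the Codec interface."""

    codec_id = 'example_gzip'  # hypothetical identifier

    def __init__(self, level=1):
        self.level = level

    def encode(self, buf):
        # compress raw chunk bytes
        return zlib.compress(buf, self.level)

    def decode(self, buf):
        # restore the original chunk bytes
        return zlib.decompress(buf)

    def get_config(self):
        # JSON-serializable configuration, including the codec "id"
        return {'id': self.codec_id, 'level': self.level}

    @classmethod
    def from_config(cls, config):
        # reconstruct a codec instance from stored configuration
        return cls(level=config['level'])


# stand-in for zarr's codec registry, keyed by codec id
codec_registry = {}
codec_registry[GZipCodec.codec_id] = GZipCodec
```

A codec could then be reconstructed from stored metadata via ``codec_registry[config['id']].from_config(config)``.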

.. autoclass:: Codec

    .. automethod:: encode
    .. automethod:: decode
    .. automethod:: get_config
    .. automethod:: from_config

.. autoclass:: Blosc
.. autoclass:: Zlib
.. autoclass:: BZ2
.. autoclass:: LZMA
.. autoclass:: Delta
.. autoclass:: FixedScaleOffset
.. autoclass:: Quantize
.. autoclass:: PackBits
.. autoclass:: Categorize
23 changes: 0 additions & 23 deletions docs/api/compressors.rst

This file was deleted.

1 change: 1 addition & 0 deletions docs/api/core.rst
@@ -8,3 +8,4 @@ The Array class (``zarr.core``)
    .. automethod:: __setitem__
    .. automethod:: resize
    .. automethod:: append
    .. automethod:: view
2 changes: 2 additions & 0 deletions docs/api/storage.rst
@@ -12,3 +12,5 @@ can be used as a Zarr array store.
.. autoclass:: DictStore
.. autoclass:: DirectoryStore
.. autoclass:: ZipStore

.. autofunction:: migrate_1to2
1 change: 1 addition & 0 deletions docs/index.rst
@@ -17,6 +17,7 @@ Highlights
* Read an array concurrently from multiple threads or processes.
* Write to an array concurrently from multiple threads or processes.
* Organize arrays into hierarchies via groups.
* Use filters to preprocess data and improve compression.

Status
------
21 changes: 18 additions & 3 deletions docs/release.rst
@@ -13,13 +13,28 @@ Support has been added for organizing arrays into hierarchies via groups. See
the tutorial section on :ref:`tutorial_groups` and the :mod:`zarr.hierarchy`
API docs for more information.

Filters
~~~~~~~

Support has been added for configuring filters to preprocess chunk data prior
to compression. See the tutorial section on :ref:`tutorial_filters` and the
:mod:`zarr.filters` API docs for more information.

Other changes
~~~~~~~~~~~~~

To accommodate support for hierarchies and filters, the Zarr metadata format
has been modified. See the :ref:`spec_v2` for more information. To migrate an
array stored using Zarr version 1.x, use the :func:`zarr.storage.migrate_1to2`
function.

The bundled Blosc library has been upgraded to version 1.10.2.

Acknowledgments
~~~~~~~~~~~~~~~

Thanks to Matthew Rocklin (mrocklin_), Stephan Hoyer (shoyer_) and
Francesc Alted (FrancescAlted_) for contributions and comments.

.. _release_1.1.0:

61 changes: 41 additions & 20 deletions docs/spec/v2.rst
@@ -4,7 +4,7 @@ Zarr storage specification version 2
====================================

This document provides a technical specification of the protocol and format
used for storing Zarr arrays. The key words "MUST", "MUST NOT", "REQUIRED",
"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in `RFC 2119
<https://www.ietf.org/rfc/rfc2119.txt>`_.
@@ -56,42 +56,47 @@ chunks
dtype
    A string or list defining a valid data type for the array. See also
    the subsection below on data type encoding.
compressor
    A JSON object identifying the primary compression codec and providing
    configuration parameters, or ``null`` if no compressor is to be used.
    The object MUST contain an ``"id"`` key identifying the codec to be used.
fill_value
    A scalar value providing the default value to use for uninitialized
    portions of the array, or ``null`` if no fill_value is to be used.
order
    Either "C" or "F", defining the layout of bytes within each chunk of the
    array. "C" means row-major order, i.e., the last dimension varies fastest;
    "F" means column-major order, i.e., the first dimension varies fastest.
filters
    A list of JSON objects providing codec configurations, or ``null`` if no
    filters are to be applied. Each codec configuration object MUST contain an
    ``"id"`` key identifying the codec to be used.

Other keys MUST NOT be present within the metadata object.

For example, the JSON object below defines a 2-dimensional array of 64-bit
little-endian floating point numbers with 10000 rows and 10000 columns, divided
into chunks of 1000 rows and 1000 columns (so there will be 100 chunks in total
arranged in a 10 by 10 grid). Within each chunk the data are laid out in C
contiguous order. Each chunk is encoded using a delta filter and compressed
using the Blosc compression library prior to storage::

    {
        "chunks": [
            1000,
            1000
        ],
        "compressor": {
            "id": "blosc",
            "cname": "lz4",
            "clevel": 5,
            "shuffle": 1
        },
        "dtype": "<f8",
        "fill_value": "NaN",
        "filters": [
            {"id": "delta", "dtype": "<f8", "astype": "<f4"}
        ],
        "order": "C",
        "shape": [
            10000,
@@ -142,7 +147,6 @@ Positive Infinity ``"Infinity"``
Negative Infinity ``"-Infinity"``
================= ===============
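
As a sketch of the string encoding for non-finite floating point fill values shown in the table, the helpers below might be used when writing and reading array metadata. The function names ``encode_fill_value`` and ``decode_fill_value`` are illustrative, not part of zarr's public API:

```python
import math


def encode_fill_value(v):
    # NaN and the infinities are not valid JSON numbers, so the spec
    # encodes them as the strings shown in the table above
    if isinstance(v, float):
        if math.isnan(v):
            return "NaN"
        if v == math.inf:
            return "Infinity"
        if v == -math.inf:
            return "-Infinity"
    return v


def decode_fill_value(v):
    # in a real implementation the array dtype would be consulted to
    # disambiguate, e.g., a string-typed fill value of "NaN"
    if isinstance(v, str):
        mapping = {"NaN": float("nan"),
                   "Infinity": math.inf,
                   "-Infinity": -math.inf}
        return mapping.get(v, v)
    return v
```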


Chunks
~~~~~~

@@ -176,6 +180,16 @@ array dimension is not exactly divisible by the length of the corresponding
chunk dimension then some chunks will overhang the edge of the array. The
contents of any chunk region falling outside the array are undefined.
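
The chunk grid arithmetic above, including overhanging edge chunks, can be sketched with a ceiling division per dimension. The helper ``chunk_grid_shape`` is hypothetical, not part of zarr:

```python
import math


def chunk_grid_shape(shape, chunks):
    # number of chunks along each dimension; when a dimension is not
    # exactly divisible, the final chunk overhangs the array edge
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
```

For the 10000 x 10000 example above with 1000 x 1000 chunks, this gives a 10 by 10 grid of 100 chunks.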

Filters
~~~~~~~

Optionally, a sequence of one or more filters can be used to transform chunk
data prior to compression. When storing data, the filters are applied in the
order specified in the array metadata to encode the data, and the encoded data
are then passed to the primary compressor. When retrieving data, stored chunk
data are first decompressed by the primary compressor and then decoded by the
filters in reverse order.
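
A minimal sketch of this round trip, using a simplistic byte-wise delta filter and ``zlib`` standing in for the primary compressor (neither is zarr's actual implementation):

```python
import zlib


def delta_encode(buf):
    # filter: keep the first byte, then store successive differences mod 256
    return bytes([buf[0]] + [(b - a) % 256 for a, b in zip(buf, buf[1:])])


def delta_decode(buf):
    # invert the delta filter by cumulative summation mod 256
    out = [buf[0]]
    for d in buf[1:]:
        out.append((out[-1] + d) % 256)
    return bytes(out)


# filters listed in metadata order, as (encode, decode) pairs
filters = [(delta_encode, delta_decode)]


def store_chunk(buf):
    # storing: apply filters in metadata order, then the primary compressor
    for encode, _ in filters:
        buf = encode(buf)
    return zlib.compress(buf)


def load_chunk(data):
    # retrieving: decompress first, then decode filters in reverse order
    buf = zlib.decompress(data)
    for _, decode in reversed(filters):
        buf = decode(buf)
    return buf
```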

Hierarchies
-----------

@@ -279,7 +293,7 @@ Create an array::
    >>> import zarr
    >>> store = zarr.DirectoryStore('example')
    >>> a = zarr.create(shape=(20, 20), chunks=(10, 10), dtype='i4',
    ...                 fill_value=42, compressor=zarr.Zlib(level=1),
    ...                 store=store, overwrite=True)

No chunks are initialized yet, so only the ".zarray" and ".zattrs" keys
@@ -297,10 +311,13 @@ Inspect the array metadata::
            10,
            10
        ],
        "compressor": {
            "id": "zlib",
            "level": 1
        },
        "dtype": "<i4",
        "fill_value": 42,
        "filters": null,
        "order": "C",
        "shape": [
            20,
@@ -452,6 +469,10 @@ Changes in version 2
* Added support for storing multiple arrays in the same store and organising
  arrays into hierarchies using groups.
* Array metadata is now stored under the ".zarray" key instead of the "meta"
  key.
* Custom attributes are now stored under the ".zattrs" key instead of the
  "attrs" key.
* Added support for filters.
* Changed encoding of "fill_value" field within array metadata.
* Changed encoding of compressor information within array metadata to be
  consistent with representation of filter information.
