Merge pull request #36 from alimanfoo/blosc_upgrade_20160721

upgrade c-blosc to 1.10.0; change default c-blosc compressor to lz4

alimanfoo committed Jul 22, 2016
2 parents d103d5b + 0aeb1b8 commit 701b893
Showing 18 changed files with 2,069 additions and 643 deletions.
2 changes: 1 addition & 1 deletion c-blosc
Submodule c-blosc updated 67 files
+7 −11 ANNOUNCE.rst
+32 −3 CMakeLists.txt
+5 −5 README.rst
+2 −1 README_HEADER.rst
+53 −2 RELEASE_NOTES.rst
+1 −2 RELEASING.rst
+10 −4 appveyor.yml
+88 −0 appveyor/run_with_env.cmd
+12 −0 bench/CMakeLists.txt
+6 −3 bench/bench.c
+17 −10 bench/plot-speeds.py
+21 −0 blosc/CMakeLists.txt
+128 −54 blosc/blosc.c
+14 −9 blosc/blosc.h
+1 −0 blosc/config.h.in
+1 −1 cmake/FindLZ4.cmake
+10 −0 cmake/FindZstd.cmake
+14 −15 examples/many_compressors.c
+2 −2 examples/multithread.c
+26 −0 internal-complibs/zstd-0.7.4/LICENSE
+136 −0 internal-complibs/zstd-0.7.4/Makefile
+68 −0 internal-complibs/zstd-0.7.4/README.md
+414 −0 internal-complibs/zstd-0.7.4/common/bitstream.h
+231 −0 internal-complibs/zstd-0.7.4/common/entropy_common.c
+125 −0 internal-complibs/zstd-0.7.4/common/error_private.h
+77 −0 internal-complibs/zstd-0.7.4/common/error_public.h
+628 −0 internal-complibs/zstd-0.7.4/common/fse.h
+331 −0 internal-complibs/zstd-0.7.4/common/fse_decompress.c
+228 −0 internal-complibs/zstd-0.7.4/common/huf.h
+377 −0 internal-complibs/zstd-0.7.4/common/mem.h
+854 −0 internal-complibs/zstd-0.7.4/common/xxhash.c
+273 −0 internal-complibs/zstd-0.7.4/common/xxhash.h
+197 −0 internal-complibs/zstd-0.7.4/common/zbuff.h
+475 −0 internal-complibs/zstd-0.7.4/common/zstd.h
+91 −0 internal-complibs/zstd-0.7.4/common/zstd_common.c
+238 −0 internal-complibs/zstd-0.7.4/common/zstd_internal.h
+162 −0 internal-complibs/zstd-0.7.4/compress/.debug/zstd_stats.h
+807 −0 internal-complibs/zstd-0.7.4/compress/fse_compress.c
+577 −0 internal-complibs/zstd-0.7.4/compress/huf_compress.c
+327 −0 internal-complibs/zstd-0.7.4/compress/zbuff_compress.c
+3,074 −0 internal-complibs/zstd-0.7.4/compress/zstd_compress.c
+1,046 −0 internal-complibs/zstd-0.7.4/compress/zstd_opt.h
+894 −0 internal-complibs/zstd-0.7.4/decompress/huf_decompress.c
+294 −0 internal-complibs/zstd-0.7.4/decompress/zbuff_decompress.c
+1,362 −0 internal-complibs/zstd-0.7.4/decompress/zstd_decompress.c
+1,913 −0 internal-complibs/zstd-0.7.4/dictBuilder/divsufsort.c
+67 −0 internal-complibs/zstd-0.7.4/dictBuilder/divsufsort.h
+1,045 −0 internal-complibs/zstd-0.7.4/dictBuilder/zdict.c
+113 −0 internal-complibs/zstd-0.7.4/dictBuilder/zdict.h
+140 −0 internal-complibs/zstd-0.7.4/legacy/zstd_legacy.h
+2,178 −0 internal-complibs/zstd-0.7.4/legacy/zstd_v01.c
+100 −0 internal-complibs/zstd-0.7.4/legacy/zstd_v01.h
+3,748 −0 internal-complibs/zstd-0.7.4/legacy/zstd_v02.c
+99 −0 internal-complibs/zstd-0.7.4/legacy/zstd_v02.h
+3,389 −0 internal-complibs/zstd-0.7.4/legacy/zstd_v03.c
+99 −0 internal-complibs/zstd-0.7.4/legacy/zstd_v03.h
+4,056 −0 internal-complibs/zstd-0.7.4/legacy/zstd_v04.c
+148 −0 internal-complibs/zstd-0.7.4/legacy/zstd_v04.h
+4,325 −0 internal-complibs/zstd-0.7.4/legacy/zstd_v05.c
+171 −0 internal-complibs/zstd-0.7.4/legacy/zstd_v05.h
+4,581 −0 internal-complibs/zstd-0.7.4/legacy/zstd_v06.c
+185 −0 internal-complibs/zstd-0.7.4/legacy/zstd_v06.h
+14 −0 internal-complibs/zstd-0.7.4/libzstd.pc.in
+1 −0 tests/CMakeLists.txt
+7 −1 tests/Makefile
+2 −0 tests/print_versions.c
+1 −0 tests/test_compressor.c
15 changes: 15 additions & 0 deletions docs/release.rst
@@ -1,6 +1,21 @@
Release notes
=============

+.. _release_1.1.0:
+
+1.1.0
+-----
+
+* The bundled Blosc library has been upgraded to version 1.10.0. The 'zstd'
+  internal compression library is now available within Blosc. See the tutorial
+  section on :ref:`tutorial_compress` for an example.
+* When using the Blosc compressor, the default internal compression library
+  is now 'lz4'.
+* The default number of internal threads for the Blosc compressor has been
+  increased to a maximum of 8 (previously 4).
+* Added convenience functions :func:`zarr.blosc.list_compressors` and
+  :func:`zarr.blosc.get_nthreads`.
+
.. _release_1.0.0:

1.0.0
67 changes: 37 additions & 30 deletions docs/tutorial.rst
@@ -21,8 +21,8 @@ example::
>>> z = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4')
>>> z
zarr.core.Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
-compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-nbytes: 381.5M; nbytes_stored: 317; ratio: 1261829.7; initialized: 0/100
+compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+nbytes: 381.5M; nbytes_stored: 313; ratio: 1277955.3; initialized: 0/100
store: builtins.dict

The code above creates a 2-dimensional array of 32-bit integers with
@@ -44,7 +44,7 @@ scalar value::
>>> z[:] = 42
>>> z
zarr.core.Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
-compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
+compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
nbytes: 381.5M; nbytes_stored: 2.2M; ratio: 170.4; initialized: 100/100
store: builtins.dict

@@ -92,8 +92,8 @@ enabling persistence of data between sessions. For example::
... chunks=(1000, 1000), dtype='i4', fill_value=0)
>>> z1
zarr.core.Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
-compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-nbytes: 381.5M; nbytes_stored: 317; ratio: 1261829.7; initialized: 0/100
+compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+nbytes: 381.5M; nbytes_stored: 313; ratio: 1277955.3; initialized: 0/100
store: zarr.storage.DirectoryStore

The array above will store its configuration metadata and all
@@ -116,8 +116,8 @@ Check that the data have been written and can be read again::
>>> z2 = zarr.open('example.zarr', mode='r')
>>> z2
zarr.core.Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
-compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-nbytes: 381.5M; nbytes_stored: 2.3M; ratio: 163.8; initialized: 100/100
+compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+nbytes: 381.5M; nbytes_stored: 2.3M; ratio: 163.9; initialized: 100/100
store: zarr.storage.DirectoryStore
>>> np.all(z1[:] == z2[:])
True
@@ -135,8 +135,8 @@ can be increased or decreased in length. For example::
>>> z.resize(20000, 10000)
>>> z
zarr.core.Array((20000, 10000), float64, chunks=(1000, 1000), order=C)
-compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-nbytes: 1.5G; nbytes_stored: 5.9M; ratio: 259.9; initialized: 100/200
+compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+nbytes: 1.5G; nbytes_stored: 5.7M; ratio: 268.5; initialized: 100/200
store: builtins.dict

Note that when an array is resized, the underlying data are not
@@ -151,20 +151,20 @@ which can be used to append data to any axis. E.g.::
>>> z = zarr.array(a, chunks=(1000, 100))
>>> z
zarr.core.Array((10000, 1000), int32, chunks=(1000, 100), order=C)
-compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-nbytes: 38.1M; nbytes_stored: 2.0M; ratio: 19.3; initialized: 100/100
+compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+nbytes: 38.1M; nbytes_stored: 1.9M; ratio: 20.0; initialized: 100/100
store: builtins.dict
>>> z.append(a)
>>> z
zarr.core.Array((20000, 1000), int32, chunks=(1000, 100), order=C)
-compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-nbytes: 76.3M; nbytes_stored: 4.0M; ratio: 19.3; initialized: 200/200
+compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+nbytes: 76.3M; nbytes_stored: 3.8M; ratio: 20.0; initialized: 200/200
store: builtins.dict
>>> z.append(np.vstack([a, a]), axis=1)
>>> z
zarr.core.Array((20000, 2000), int32, chunks=(1000, 100), order=C)
-compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-nbytes: 152.6M; nbytes_stored: 7.9M; ratio: 19.3; initialized: 400/400
+compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+nbytes: 152.6M; nbytes_stored: 7.6M; ratio: 20.0; initialized: 400/400
store: builtins.dict

.. _tutorial_compress:
@@ -188,17 +188,24 @@ functions. For example::

>>> z = zarr.array(np.arange(100000000, dtype='i4').reshape(10000, 10000),
... chunks=(1000, 1000), compression='blosc',
-...             compression_opts=dict(cname='lz4', clevel=3, shuffle=2))
+...             compression_opts=dict(cname='zstd', clevel=3, shuffle=2))
>>> z
zarr.core.Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
-compression: blosc; compression_opts: {'clevel': 3, 'cname': 'lz4', 'shuffle': 2}
-nbytes: 381.5M; nbytes_stored: 17.6M; ratio: 21.7; initialized: 100/100
+compression: blosc; compression_opts: {'clevel': 3, 'cname': 'zstd', 'shuffle': 2}
+nbytes: 381.5M; nbytes_stored: 3.1M; ratio: 121.1; initialized: 100/100
store: builtins.dict

The array above will use Blosc as the primary compressor, using the
-LZ4 algorithm (compression level 3) internally within Blosc, and with
+Zstandard algorithm (compression level 3) internally within Blosc, and with
the bitshuffle filter applied.
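
The shuffle filter mentioned in this hunk can be illustrated without Blosc. The sketch below is a plain-Python illustration of the idea behind byte-level shuffle (shuffle=1 in the options above; shuffle=2 is the finer-grained bit-level variant), not Blosc's SIMD implementation, and `byte_shuffle` is a hypothetical helper:

```python
# Stdlib-only illustration (NOT Blosc's implementation) of the shuffle idea:
# grouping the i-th byte of every fixed-width element together makes highly
# similar bytes adjacent, which generic codecs compress better.
import struct
import zlib

def byte_shuffle(data: bytes, itemsize: int) -> bytes:
    """Reorder bytes from element-major to byte-position-major order."""
    n = len(data) // itemsize
    return bytes(data[i * itemsize + j] for j in range(itemsize) for i in range(n))

# 4-byte little-endian ints 0..999: the two high bytes of every element are zero.
raw = b"".join(struct.pack("<i", v) for v in range(1000))
shuffled = byte_shuffle(raw, 4)

# After shuffling, those zero bytes form one long, highly compressible run.
print(len(zlib.compress(raw)), len(zlib.compress(shuffled)))
```

Blosc applies the same transposition (vectorized) before handing the bytes to the internal codec, which is why shuffle helps most on fixed-width numeric data.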

+A list of the internal compression libraries available within Blosc can be
+obtained via::
+
+>>> from zarr import blosc
+>>> blosc.list_compressors()
+['blosclz', 'lz4', 'lz4hc', 'snappy', 'zlib', 'zstd']

In addition to Blosc, other compression libraries can also be
used. Zarr comes with support for zlib, BZ2 and LZMA compression, via
the Python standard library. For example, here is an array using zlib
@@ -270,8 +277,8 @@ array with thread synchronization::
... synchronizer=zarr.ThreadSynchronizer())
>>> z
zarr.sync.SynchronizedArray((10000, 10000), int32, chunks=(1000, 1000), order=C)
-compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-nbytes: 381.5M; nbytes_stored: 317; ratio: 1261829.7; initialized: 0/100
+compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+nbytes: 381.5M; nbytes_stored: 313; ratio: 1277955.3; initialized: 0/100
store: builtins.dict; synchronizer: zarr.sync.ThreadSynchronizer

This array is safe to read or write within a multi-threaded program.
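
The hazard this synchronization guards against can be pictured with a stdlib-only sketch. This is an illustration only: `write_region` is a hypothetical helper, and a single `threading.Lock` stands in for whatever per-chunk locking `zarr.sync.ThreadSynchronizer` provides:

```python
# Illustrative sketch of the problem thread synchronization solves: writes
# that touch the same stored chunk are read-modify-write cycles, so they
# must be serialized or updates can be lost.
import threading

chunk = bytearray(8)      # stand-in for one stored chunk
lock = threading.Lock()   # stand-in for a per-chunk synchronizer lock

def write_region(offset: int, data: bytes) -> None:
    # Holding the lock makes each read-modify-write of the chunk atomic
    # with respect to the other writer threads.
    with lock:
        chunk[offset:offset + len(data)] = data

threads = [threading.Thread(target=write_region, args=(i * 2, b"ab")) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(bytes(chunk))  # all four regions written intact
```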
@@ -285,8 +292,8 @@ provided that all processes have access to a shared file system. E.g.::
... synchronizer=synchronizer)
>>> z
zarr.sync.SynchronizedArray((10000, 10000), int32, chunks=(1000, 1000), order=C)
-compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-nbytes: 381.5M; nbytes_stored: 317; ratio: 1261829.7; initialized: 0/100
+compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+nbytes: 381.5M; nbytes_stored: 313; ratio: 1277955.3; initialized: 0/100
store: zarr.storage.DirectoryStore; synchronizer: zarr.sync.ProcessSynchronizer

This array is safe to read or write from multiple processes.
@@ -350,13 +357,13 @@ data. E.g.::
>>> a = np.arange(100000000, dtype='i4').reshape(10000, 10000).T
>>> zarr.array(a, chunks=(1000, 1000))
zarr.core.Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
-compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-nbytes: 381.5M; nbytes_stored: 26.1M; ratio: 14.6; initialized: 100/100
+compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+nbytes: 381.5M; nbytes_stored: 26.3M; ratio: 14.5; initialized: 100/100
store: builtins.dict
>>> zarr.array(a, chunks=(1000, 1000), order='F')
zarr.core.Array((10000, 10000), int32, chunks=(1000, 1000), order=F)
-compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-nbytes: 381.5M; nbytes_stored: 10.0M; ratio: 38.0; initialized: 100/100
+compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+nbytes: 381.5M; nbytes_stored: 9.5M; ratio: 40.1; initialized: 100/100
store: builtins.dict

In the above example, Fortran order gives a better compression ratio. This
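
The layout effect in this hunk can be mimicked without NumPy or Blosc. In the sketch below, `value` is a hypothetical helper reproducing the transposed `arange` array, and zlib stands in for the Blosc codec:

```python
# Stdlib-only sketch of why element order matters for compression when the
# data vary smoothly along one axis (zlib stands in for Blosc here).
import struct
import zlib

rows, cols = 100, 100

def value(r: int, c: int) -> int:
    # Element (r, c) of the transposed arange array holds c * rows + r.
    return c * rows + r

# C order walks along rows (values jump by `rows` at each step);
# F order walks down columns (values increase by 1 at each step).
c_order = b"".join(struct.pack("<i", value(r, c)) for r in range(rows) for c in range(cols))
f_order = b"".join(struct.pack("<i", value(r, c)) for c in range(cols) for r in range(rows))

# Fortran order serializes the values in their natural 0, 1, 2, ... sequence,
# which a generic codec typically compresses better.
print(len(zlib.compress(c_order)), len(zlib.compress(f_order)))
```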
@@ -460,12 +467,12 @@ Configuring Blosc

The Blosc compressor is able to use multiple threads internally to
accelerate compression and decompression. By default, Zarr allows
-Blosc to use up to 4 internal threads. The number of Blosc threads can
-be changed, e.g.::
+Blosc to use up to 8 internal threads. The number of Blosc threads can
+be changed to increase or decrease this number, e.g.::

>>> from zarr import blosc
>>> blosc.set_nthreads(2)
-4
+8

When a Zarr array is being used within a multi-threaded program, Zarr
automatically switches to using Blosc in a single-threaded
216 changes: 137 additions & 79 deletions notebooks/.ipynb_checkpoints/dask_copy-checkpoint.ipynb

Large diffs are not rendered by default.
