- Fix structured arrays that contain objects. By Attila Bergou <abergou>; #806.
- Mark the tests that require fsspec as such, without compromising the code coverage score. By Ben Williams <benjaminhwilliams>; #823.
- Only inspect alternate node type if desired isn't present. By Trevor Manz <manzt>; #696.
- Correct conda-forge deployment of Zarr by fixing some Zarr tests. By Ben Williams <benjaminhwilliams>; #821.
- Correct conda-forge deployment of Zarr. By Josh Moore <joshmoore>; XXX.
This release of Zarr Python is the first release of Zarr to not support Python 3.6.
- Update ABSStore for compatibility with newer azure.storage.blob. By Tom Augspurger <TomAugspurger>; #759.
- Pathlib support. By Chris Barnes <clbarnes>; #768.
- Clarify that arbitrary key/value pairs are OK for attributes. By Stephan Hoyer <shoyer>; #751.
- Clarify how to manually convert a DirectoryStore to a ZipStore. By pmav99 <pmav99>; #763.
- Fix dimension_separator support. By Josh Moore <joshmoore>; #775.
- Extract ABSStore to zarr._storage.absstore. By Josh Moore <joshmoore>; #781.
- Avoid NumPy 1.21.0 due to numpy/numpy#19325. By Greggory Lee <grlee77>; #791.
- Drop 3.6 builds. By Josh Moore <joshmoore>; #774, #778.
- Fix build with Sphinx 4. By Elliott Sales de Andrade <QuLogic>; #799.
- TST: add missing assert in test_hexdigest. By Greggory Lee <grlee77>; #801.
- FSStore: default to normalize_keys=False. By Josh Moore <joshmoore>; #755.
- ABSStore: compatibility with azure.storage.python>=12. By Tom Augspurger <tomaugspurger>; #618.
- Add section on rechunking to tutorial. By David Baddeley <David-Baddeley>; #730.
- Expand FSStore tests and fix implementation issues. By Davis Bennett <d-v-b>; #709.
- Updated ipytree warning for jlab3. By Ian Hunt-Isaak <ianhi>; #721.
- Activate dependabot. By Josh Moore <joshmoore>; #734.
- Update Python classifiers (Zarr is stable!). By Josh Moore <joshmoore>; #731.
- Raise an error if create_dataset's dimension_separator is inconsistent. By Gregory R. Lee <grlee77>; #724.
- Introduce optional dimension_separator .zarray key for nested chunks. By Josh Moore <joshmoore>; #715, #716.
- Update Array to respect FSStore's key_separator. By Gregory R. Lee <grlee77>; #718.
- Start stop for iterator (islice()). By Sebastian Grill <yetyetanotherusername>; #621.
- Add capability to partially read and decompress chunks. By Andrew Fulton <andrewfulton9>; #667.
- Make DirectoryStore __setitem__ resilient against antivirus file locking. By Eric Younkin <ericgyounkin>; #698.
- Compare test data's content generally. By John Kirkham <jakirkham>; #436.
- Fix dtype usage in zarr/meta.py. By Josh Moore <joshmoore>; #700.
- Fix FSStore key_separator usage. By Josh Moore <joshmoore>; #669.
- Simplify text handling in DB Store. By John Kirkham <jakirkham>; #670.
- GitHub Actions migration. By Matthias Bussonnier <Carreau>; #641, #671, #674, #676, #677, #678, #679, #680, #682, #684, #685, #686, #687, #695, #706.
- Minor build fix. By Matthias Bussonnier <Carreau>; #666.
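The dimension_separator entries above concern how the grid coordinates of a chunk are joined into a store key. As a minimal sketch, assuming a hypothetical helper named chunk_key (it is not part of Zarr's API), the same chunk can be addressed by a flat key or by a nested, directory-like key:

```python
def chunk_key(chunk_index, dimension_separator="."):
    """Join the integer grid coordinates of a chunk into a store key.

    Hypothetical helper for illustration only; not Zarr's implementation.
    """
    if dimension_separator not in (".", "/"):
        raise ValueError("dimension_separator must be '.' or '/'")
    return dimension_separator.join(str(i) for i in chunk_index)


flat_key = chunk_key((2, 0, 5))         # flat layout: all chunks side by side
nested_key = chunk_key((2, 0, 5), "/")  # nested layout: one level per dimension
print(flat_key, nested_key)
```

A reader and a writer that disagree on the separator would look for the same chunk under different keys, which is presumably why an inconsistent separator is reported as an error rather than silently accepted.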
This release of Zarr Python is the first release of Zarr to not support Python 3.5.
- End Python 3.5 support. By Chris Barnes <clbarnes>; #602.
- Fix open_group/open_array to allow opening of read-only store with mode='r'; #269.
- Add Array tests for FSStore. By Andrew Fulton <andrewfulton9>; #644.
- Fix a bug in which attrs would not be copied on the root when using copy_all; #613.
- Fix FileNotFoundError with dask/s3fs; #649.
- Fix flaky fixture in test_storage.py; #652.
- Fix FSStore getitems failing with arrays that have a zero-length shape dimension; #644.
- Use async to fetch/write results concurrently when possible; #536. See this comment for a performance analysis showing an order-of-magnitude faster response in some benchmarks.
See this link <https://github.com/zarr-developers/zarr-python/milestone/11?closed=1> for the full list of closed and merged PRs tagged with the 2.6 milestone.
Add ability to partially read and decompress arrays, see #667. It is only available to chunks stored using fsspec and using Blosc as a compressor. For certain analysis cases where only a small portion of each chunk is needed, it can be advantageous to access and decompress only part of the chunks. Partial read and decompression adds high latency to many operations, so it should be used only when the subset of the data is small compared to the full chunks and is stored contiguously (that is to say, along the last dimensions for C layout or the first dimensions for F layout). Pass partial_decompress=True as an argument when creating an Array, or when using open_array. No option exists yet to apply partial read and decompress on a per-operation basis.
This release will be the last to support Python 3.5; the next version of Zarr will require Python 3.6+.
- DirectoryStore now uses os.scandir, which should make listing large stores faster; #563.
- Remove a few remaining Python 2-isms. By Poruri Sai Rahul <rahulporuri>; #393.
- Fix minor bug in N5Store. By gsakkis; #550.
- Improve error message in Jupyter when trying to use the ipytree widget without ipytree installed. By Zain Patel <mzjp2>; #537.
- Add typing information to many of the core functions; #589.
- Explicitly close stores during testing. By Elliott Sales de Andrade <QuLogic>; #442.
- Many of the convenience functions to emit errors (err_* from zarr.errors) have been replaced by ValueError subclasses. The corresponding err_* functions have been removed; #590, #614.
- Improve consistency of terminology regarding arrays and datasets in the documentation. By Josh Moore <joshmoore>; #571.
- Added support for generic URL opening by fsspec, where the URLs have the form "protocol://[server]/path" or can be chained URLs with "::" separators. The additional argument storage_options is passed to the backend, see the fsspec docs. By Martin Durant <martindurant>; #546.
- Added support for fetching multiple items via the getitems method of a store, if it exists. This allows for concurrent fetching of data blocks from stores that implement this; presently HTTP, S3, GCS. Currently only applies to reading. By Martin Durant <martindurant>; #606.
- Efficient iteration expanded with option to pass start and stop index via array.islice. By Sebastian Grill <yetyetanotherusername>; #615.
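The array.islice entry above is about iterating over a sub-range without touching chunks that lie outside it. The following is a rough sketch of that idea on plain Python lists; the function islice_chunked and its arguments are hypothetical and only stand in for a chunked, compressed array:

```python
def islice_chunked(chunks, chunk_len, start=0, stop=None):
    """Yield items in [start, stop), visiting only chunks that overlap the range.

    `chunks` is a list of lists; every chunk except possibly the last is
    assumed to hold exactly `chunk_len` items. Illustration only.
    """
    total = sum(len(c) for c in chunks)
    stop = total if stop is None else min(stop, total)
    for j in range(start // chunk_len, len(chunks)):
        base = j * chunk_len            # global index of the chunk's first item
        if base >= stop:
            break                       # later chunks are never loaded
        chunk = chunks[j]               # stands in for "decompress chunk j once"
        lo = max(start - base, 0)
        hi = min(stop - base, len(chunk))
        yield from chunk[lo:hi]


data = [[0, 1, 2], [3, 4, 5], [6, 7]]
selected = list(islice_chunked(data, chunk_len=3, start=2, stop=7))
print(selected)
```

Each overlapping chunk is read once and sliced locally, so the cost scales with the number of chunks in the requested range rather than with the size of the whole array.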
- Add key normalization option for DirectoryStore, NestedDirectoryStore, TempStore, and N5Store. By James Bourbeau <jrbourbeau>; #459.
- Add recurse keyword to Group.array_keys and Group.arrays methods. By James Bourbeau <jrbourbeau>; #458.
- Use uniform chunking for all dimensions when specifying chunks as an integer. Also adds support for specifying -1 to chunk across an entire dimension. By James Bourbeau <jrbourbeau>; #456.
- Rename DictStore to MemoryStore. By James Bourbeau <jrbourbeau>; #455.
- Rewrite .tree() pretty representation to use ipytree. Allows it to work in both the Jupyter Notebook and JupyterLab. By John Kirkham <jakirkham>; #450.
- Do not rename Blosc parameters in n5 backend and add blocksize parameter, compatible with n5-blosc. By axtimwalde; #485.
- Update DirectoryStore to create files with more permissive permissions. By Eduardo Gonzalez <eddienko> and James Bourbeau <jrbourbeau>; #493.
- Use math.ceil for scalars. By John Kirkham <jakirkham>; #500.
- Ensure contiguous data using astype. By John Kirkham <jakirkham>; #513.
- Refactor out _tofile/_fromfile from DirectoryStore. By John Kirkham <jakirkham>; #503.
- Add __enter__/__exit__ methods to Group for h5py.File compatibility. By Chris Barnes <clbarnes>; #509.
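The entry above on integer and -1 chunk specifications can be illustrated with a small normalization step. This is a sketch under the assumption that an integer is repeated for every dimension and that -1 is replaced by the full length of its dimension; the function name normalize_chunks_sketch is hypothetical and this is not Zarr's own code:

```python
def normalize_chunks_sketch(chunks, shape):
    """Expand an integer or -1 chunk spec into one chunk length per dimension."""
    if isinstance(chunks, int):
        # A single integer applies uniformly to all dimensions.
        chunks = (chunks,) * len(shape)
    if len(chunks) != len(shape):
        raise ValueError("chunks and shape must have the same number of dimensions")
    # -1 means "one chunk spanning the whole dimension".
    return tuple(s if c == -1 else c for c, s in zip(chunks, shape))


print(normalize_chunks_sketch(100, (1000, 500)))
print(normalize_chunks_sketch((100, -1), (1000, 500)))
```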
- Fix Sqlite Store Wrong Modification. By Tommy Tran <potter420>; #440.
- Add intermediate step (using zipfile.ZipInfo object) to write inside ZipStore to solve too restrictive permission issue. By Raphael Dussin <raphaeldussin>; #505.
- Fix '/' prepend bug in ABSStore. By Shikhar Goenka <shikharsg>; #525.
- Fix hyperlink in README.md. By Anderson Banihirwe <andersy005>; #531.
- Replace "nuimber" with "number". By John Kirkham <jakirkham>; #512.
- Fix azure link rendering in tutorial. By James Bourbeau <jrbourbeau>; #507.
- Update README file to be more detailed. By Zain Patel <mzjp2>; #495.
- Import blosc from numcodecs in tutorial. By James Bourbeau <jrbourbeau>; #491.
- Adds logo to docs. By James Bourbeau <jrbourbeau>; #462.
- Fix N5 link in tutorial. By James Bourbeau <jrbourbeau>; #480.
- Fix typo in code snippet. By Joe Jevnik <llllllllll>; #461.
- Fix URLs to point to zarr-python. By John Kirkham <jakirkham>; #453.
- Add documentation build to CI. By James Bourbeau <jrbourbeau>; #516.
- Use ensure_ndarray in a few more places. By John Kirkham <jakirkham>; #506.
- Support Python 3.8. By John Kirkham <jakirkham>; #499.
- Require Numcodecs 0.6.4+ to use text handling functionality from it. By John Kirkham <jakirkham>; #497.
- Updates tests to use pytest.importorskip. By James Bourbeau <jrbourbeau>; #492.
- Removed support for Python 2. By jhamman; #393, #470.
- Upgrade dependencies in the test matrices and resolve a compatibility issue with testing against the Azure Storage Emulator. By alimanfoo; #468, #467.
- Use unittest.mock on Python 3. By Elliott Sales de Andrade <QuLogic>; #426.
- Drop decode from ConsolidatedMetadataStore. By John Kirkham <jakirkham>; #452.
- Use scandir in DirectoryStore's getsize method. By John Kirkham <jakirkham>; #431.
- Add and use utility functions to simplify reading and writing JSON. By John Kirkham <jakirkham>; #429, #430.
- Fix collections's DeprecationWarnings. By John Kirkham <jakirkham>; #432.
- Fix tests on big endian machines. By Elliott Sales de Andrade <QuLogic>; #427.
- Makes azure-storage-blob optional for testing. By John Kirkham <jakirkham>; #419, #420.
- New storage backend, backed by Azure Blob Storage, class zarr.storage.ABSStore. All data is stored as block blobs. By Shikhar Goenka <shikarsg>, Tim Crone <tjcrone> and Zain Patel <mzjp2>; #345.
- Add "consolidated" metadata as an experimental feature: use zarr.convenience.consolidate_metadata to copy all metadata from the various metadata keys within a dataset hierarchy under a single key, and zarr.convenience.open_consolidated to use this single key. This can greatly cut down the number of calls to the storage backend, and so remove a lot of overhead for reading remote data. By Martin Durant <martindurant>, Alistair Miles <alimanfoo>, Ryan Abernathey <rabernat>; #268, #332, #338.
- Support has been added for structured arrays with sub-array shape and/or nested fields. By Tarik Onalan <onalant>; #111, #296.
- Adds the SQLite-backed zarr.storage.SQLiteStore class enabling an SQLite database to be used as the backing store for an array or group. By John Kirkham <jakirkham>; #368, #365.
- Efficient iteration over arrays by decompressing chunkwise. By Jerome Kelleher <jeromekelleher>; #398, #399.
- Adds the Redis-backed zarr.storage.RedisStore class enabling a Redis database to be used as the backing store for an array or group. By Joe Hamman <jhamman>; #299, #372.
- Adds the MongoDB-backed zarr.storage.MongoDBStore class enabling a MongoDB database to be used as the backing store for an array or group. By Noah D Brenowitz <nbren12>, Joe Hamman <jhamman>; #299, #372, #401.
- New storage class for N5 containers. The zarr.n5.N5Store has been added, which uses zarr.storage.NestedDirectoryStore to support reading and writing from and to N5 containers. By Jan Funke <funkey> and John Kirkham <jakirkham>.
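The consolidated metadata entry above describes copying many small metadata documents under a single key so that a remote reader needs one request instead of many. A minimal sketch of that idea on a plain dict follows; the function name consolidate_sketch, the output key and the wrapper layout are assumptions made for illustration, not a statement of what zarr.convenience.consolidate_metadata writes:

```python
import json

METADATA_NAMES = (".zarray", ".zgroup", ".zattrs")


def consolidate_sketch(store, out_key=".zmetadata"):
    """Copy every metadata document in `store` under one key (illustrative layout)."""
    collected = {
        key: json.loads(value)
        for key, value in store.items()
        if key.rsplit("/", 1)[-1] in METADATA_NAMES
    }
    store[out_key] = json.dumps({"metadata": collected}).encode("utf-8")
    return store


store = {
    ".zgroup": b'{"zarr_format": 2}',
    "temperature/.zarray": b'{"shape": [10], "chunks": [5]}',
    "temperature/0": b"\x00\x01\x02\x03\x04",   # chunk data is left alone
}
consolidate_sketch(store)

# A reader can now recover all metadata with a single lookup.
consolidated = json.loads(store[".zmetadata"])["metadata"]
print(sorted(consolidated))
```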
- The implementation of the zarr.storage.DirectoryStore class has been modified to ensure that writes are atomic and there are no race conditions where a chunk might appear transiently missing during a write operation. By sbalmer <sbalmer>; #327, #263.
- Avoid raising in zarr.storage.DirectoryStore's __setitem__ when file already exists. By Justin Swaney <jmswaney>; #272, #318.
- The required version of the Numcodecs package has been upgraded to 0.6.2, which has enabled some code simplification and fixes a failing test involving msgpack encoding. By John Kirkham <jakirkham>; #361, #360, #352, #355, #324.
- Failing tests related to pickling/unpickling have been fixed. By Ryan Williams <ryan-williams>; #273, #308.
- Corrects handling of NaT in datetime64 and timedelta64 in various compressors. By John Kirkham <jakirkham>; #344.
- Ensure DictStore contains only bytes to facilitate comparisons and protect against writes. By John Kirkham <jakirkham>; #350.
- Test and fix an issue (w.r.t. fill values) when storing complex data to Array. By John Kirkham <jakirkham>; #363.
- Always use a tuple when indexing a NumPy ndarray. By John Kirkham <jakirkham>; #376.
- Ensure when Array uses a dict-based chunk store that it only contains bytes to facilitate comparisons and protect against writes. Drop the copy for the no filter/compressor case as this handles that case. By John Kirkham <jakirkham>; #359.
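The first entry above concerns atomic writes to a directory-based store. One common way to get that behaviour, shown here as a sketch and not necessarily how DirectoryStore does it, is to write the new bytes to a temporary file in the same directory and then rename it over the target; the helper name atomic_write is hypothetical:

```python
import os
import tempfile


def atomic_write(path, data):
    """Write `data` to `path` so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the target directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".partial")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)      # atomic rename on POSIX systems
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)         # do not leave partial files behind
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "0.0")
    atomic_write(target, b"old chunk")
    atomic_write(target, b"new chunk")  # overwriting an existing file is fine
    with open(target, "rb") as f:
        result = f.read()
    leftovers = os.listdir(tmp)

print(result, leftovers)
```

Because the target name always refers to a complete file, a concurrent reader never observes a missing or half-written chunk.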
- Simplify directory creation and removal in DirectoryStore.rename. By John Kirkham <jakirkham>; #249.
- CI and test environments have been upgraded to include Python 3.7, drop Python 3.4, and upgrade all pinned package requirements. By Alistair Miles <alimanfoo>; #308.
- Start using pyup.io to maintain dependencies. By Alistair Miles <alimanfoo>; #326.
- Configure flake8 line limit generally. By John Kirkham <jakirkham>; #335.
- Add missing coverage pragmas. By John Kirkham <jakirkham>; #343, #355.
- Fix missing backslash in docs. By John Kirkham <jakirkham>; #254, #353.
- Include tests for stores' popitem and pop methods. By John Kirkham <jakirkham>; #378, #380.
- Include tests for different compressors, endianness, and attributes. By John Kirkham <jakirkham>; #378, #380.
- Test validity of stores' contents. By John Kirkham <jakirkham>; #359, #408.
- Advanced indexing. The Array class has several new methods and properties that enable a selection of items in an array to be retrieved or updated. See the tutorial_indexing tutorial section for more information. There is also a notebook with extended examples and performance benchmarks; #78, #89, #112, #172.
- New package for compressor and filter codecs. The classes previously defined in the zarr.codecs module have been factored out into a separate package called Numcodecs. The Numcodecs package also includes several new codec classes not previously available in Zarr, including compressor codecs for Zstd and LZ4. This change is backwards-compatible with existing code, as all codec classes defined by Numcodecs are imported into the zarr.codecs namespace. However, it is recommended to import codecs from the new package; see the tutorial sections on tutorial_compress and tutorial_filters for examples. With contributions by John Kirkham <jakirkham>; #74, #102, #120, #123, #139.
- New storage class for DBM-style databases. The zarr.storage.DBMStore class enables any DBM-style database such as gdbm, ndbm or Berkeley DB, to be used as the backing store for an array or group. See the tutorial section on tutorial_storage for some examples; #133, #186.
- New storage class for LMDB databases. The zarr.storage.LMDBStore class enables an LMDB "Lightning" database to be used as the backing store for an array or group; #192.
- New storage class using a nested directory structure for chunk files. The zarr.storage.NestedDirectoryStore has been added, which is similar to the existing zarr.storage.DirectoryStore class but nests chunk files for multidimensional arrays into sub-directories; #155, #177.
- New tree() method for printing hierarchies. The Group class has a new zarr.hierarchy.Group.tree method which enables a tree representation of a group hierarchy to be printed. Also provides an interactive tree representation when used within a Jupyter notebook. See the tutorial_diagnostics tutorial section for examples. By John Kirkham <jakirkham>; #82, #140, #184.
- Visitor API. The Group class now implements the h5py visitor API, see docs for the zarr.hierarchy.Group.visit, zarr.hierarchy.Group.visititems and zarr.hierarchy.Group.visitvalues methods. By John Kirkham <jakirkham>; #92, #122.
- Viewing an array as a different dtype. The Array class has a new zarr.core.Array.astype method, which is a convenience that enables an array to be viewed as a different dtype. By John Kirkham <jakirkham>; #94, #96.
- New open(), save(), load() convenience functions. The function zarr.convenience.open provides a convenient way to open a persistent array or group, using either a DirectoryStore or ZipStore as the backing store. The functions zarr.convenience.save and zarr.convenience.load are also available and provide a convenient way to save an entire NumPy array to disk and load back into memory later. See the tutorial section tutorial_persist for examples; #104, #105, #141, #181.
- IPython completions. The Group class now implements __dir__() and _ipython_key_completions_() which enables tab-completion for group members to be used in any IPython interactive environment; #170.
- New info property; changes to __repr__. The Group and Array classes have a new info property which can be used to print diagnostic information, including compression ratio where available. See the tutorial section on tutorial_diagnostics for examples. The string representation (__repr__) of these classes has been simplified to ensure it is cheap and quick to compute in all circumstances; #83, #115, #132, #148.
- Chunk options. When creating an array, chunks=False can be specified, which will result in an array with a single chunk only. Alternatively, chunks=True will trigger an automatic chunk shape guess. See tutorial_chunks for more on the chunks parameter; #106, #107, #183.
- Zero-dimensional arrays are now supported; by Prakhar Goel <newt0311>; #154, #161.
- Arrays with one or more zero-length dimensions are now fully supported; by Prakhar Goel <newt0311>; #150, #154, #160.
- The .zattrs key is now optional and will now only be created when the first custom attribute is set; #121, #200.
- New Group.move() method supports moving a sub-group or array to a different location within the same hierarchy. By John Kirkham <jakirkham>; #191, #193, #196.
- ZipStore is now thread-safe; #194, #192.
- New Array.hexdigest() method computes an Array's hash with hashlib. By John Kirkham <jakirkham>; #98, #203.
- Improved support for object arrays. In previous versions of Zarr, creating an array with dtype=object was possible but could under certain circumstances lead to unexpected errors and/or segmentation faults. To make it easier to properly configure an object array, a new object_codec parameter has been added to array creation functions. See the tutorial section on tutorial_objects for more information and examples. Also, runtime checks have been added in both Zarr and Numcodecs so that segmentation faults are no longer possible, even with a badly configured array. This API change is backwards compatible and previous code that created an object array and provided an object codec via the filters parameter will continue to work; however, a warning will be raised to encourage use of the object_codec parameter; #208, #212.
- Added support for datetime64 and timedelta64 data types; #85, #215.
- Array and group attributes are now cached by default to improve performance with slow stores, e.g., stores accessing data via the network; #220, #218, #204.
- New LRUStoreCache class. The class zarr.storage.LRUStoreCache has been added and provides a means to locally cache data in memory from a store that may be slow, e.g., a store that retrieves data from a remote server via the network; #223.
- New copy functions. The new functions zarr.convenience.copy and zarr.convenience.copy_all provide a way to copy groups and/or arrays between HDF5 and Zarr, or between two Zarr groups. The zarr.convenience.copy_store provides a more efficient way to copy data directly between two Zarr stores; #87, #113, #137, #217.
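The LRUStoreCache entry above describes keeping recently used values from a slow store in local memory. The sketch below shows the least-recently-used idea with the standard library only. It is a simplified stand-in: the class name LRUCacheSketch is hypothetical, and the limit is counted in items, which is an assumption of this example and not a description of the LRUStoreCache parameters.

```python
from collections import OrderedDict


class LRUCacheSketch:
    """Read-through cache in front of a mapping, evicting the least recently used key."""

    def __init__(self, store, max_items):
        self._store = store
        self._max_items = max_items
        self._cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def __getitem__(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)       # mark as most recently used
            self.hits += 1
            return self._cache[key]
        value = self._store[key]               # slow path, e.g. a network request
        self.misses += 1
        self._cache[key] = value
        if len(self._cache) > self._max_items:
            self._cache.popitem(last=False)    # drop the least recently used entry
        return value


slow_store = {"a": b"1", "b": b"2", "c": b"3"}
cache = LRUCacheSketch(slow_store, max_items=2)
for key in ["a", "b", "a", "c", "b"]:
    cache[key]
print(cache.hits, cache.misses)
```

In this access pattern only the second read of "a" is served from memory; "b" is evicted when "c" arrives and has to be fetched again.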
- Fixed bug where read_only keyword argument was ignored when creating an array; #151, #179.
- Fixed bugs when using a ZipStore opened in 'w' mode; #158, #182.
- Fill values can now be provided for fixed-length string arrays; #165, #176.
- Fixed a bug where the number of chunks initialized could be counted incorrectly; #97, #174.
- Fixed a bug related to the use of an ellipsis (...) in indexing statements; #93, #168, #172.
- Fixed a bug preventing use of other integer types for indexing; #143, #147.
- Some changes have been made to the spec_v2 document to clarify ambiguities and add some missing information. These changes do not break compatibility with any of the material as previously implemented, and so the changes have been made in-place in the document without incrementing the document version number. See the section on spec_v2_changes in the specification document for more information.
- A new tutorial_indexing section has been added to the tutorial.
- A new tutorial_strings section has been added to the tutorial (#135, #175).
- The tutorial_chunks tutorial section has been reorganised and updated.
- The tutorial_persist and tutorial_storage tutorial sections have been updated with new examples (#100, #101, #103).
- A new tutorial section on tutorial_pickle has been added (#91).
- A new tutorial section on tutorial_datetime has been added.
- A new tutorial section on tutorial_diagnostics has been added.
- The tutorial sections on tutorial_sync and tutorial_tips_blosc have been updated to provide information about how to avoid program hangs when using the Blosc compressor with multiple processes (#199, #201).
- A data fixture has been included in the test suite to ensure data format compatibility is maintained; #83, #146.
- The test suite has been migrated from nosetests to pytest; #189, #225.
- Various continuous integration updates and improvements; #118, #124, #125, #126, #109, #114, #171.
- Bump numcodecs dependency to 0.5.3, completely remove nose dependency; #237.
- Fix compatibility issues with NumPy 1.14 regarding fill values for structured arrays; #222, #238, #239.
Code was contributed to this release by Alistair Miles <alimanfoo>, John Kirkham <jakirkham> and Prakhar Goel <newt0311>.
Documentation was contributed to this release by Mamy Ratsimbazafy <mratsim> and Charles Noyes <CSNoyes>.
Thank you to John Kirkham <jakirkham>, Stephan Hoyer <shoyer>, Francesc Alted <FrancescAlted>, and Matthew Rocklin <mrocklin> for code reviews and/or comments on pull requests.
- Resolved an issue where calling hasattr on a Group object erroneously raised a KeyError. By Vincent Schut <vincentschut>; #88, #95.
- Resolved an issue with zarr.creation.array where dtype was given as None (#80).
- Resolved an issue when no compression is used and chunks are stored in memory (#79).
Various minor improvements, including: Group objects support member access via dot notation (__getattr__); fixed metadata caching for Array.shape property and derivatives; added Array.ndim property; fixed Array.__array__ method arguments; fixed bug in pickling Array state; fixed bug in pickling ThreadSynchronizer.
- Group objects now support member deletion via del statement (#65).
- Added zarr.storage.TempStore class for convenience to provide storage via a temporary directory (#59).
- Fixed performance issues with zarr.storage.ZipStore class (#66).
- The Blosc extension has been modified to return bytes instead of array objects from compress and decompress function calls. This should improve compatibility and also provides a small performance increase for compressing high compression ratio data (#55).
- Added overwrite keyword argument to array and group creation methods on the zarr.hierarchy.Group class (#71).
- Added cache_metadata keyword argument to array creation methods.
- The functions zarr.creation.open_array and zarr.hierarchy.open_group now accept any store as first argument (#56).
The bundled Blosc library has been upgraded to version 1.11.1.
Support has been added for organizing arrays into hierarchies via groups. See the tutorial section on tutorial_groups and the zarr.hierarchy API docs for more information.
Support has been added for configuring filters to preprocess chunk data prior to compression. See the tutorial section on tutorial_filters and the zarr.codecs API docs for more information.
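To make the idea of a filter concrete, here is a small sketch using only the standard library. It applies a delta transform before zlib compression and compares the compressed sizes. The helper names are hypothetical, the data is assumed to be a non-empty list of integers, and nothing here uses or describes the zarr.codecs classes themselves.

```python
import struct
import zlib


def delta_encode(values):
    """Keep the first value, then store each difference to the previous value."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def pack(values):
    """Serialize integers as little-endian signed 64-bit values."""
    return struct.pack(f"<{len(values)}q", *values)


values = list(range(1000, 11000, 7))   # slowly increasing, non-empty data
plain_size = len(zlib.compress(pack(values)))
filtered_size = len(zlib.compress(pack(delta_encode(values))))
print(plain_size, filtered_size)
```

After the delta step almost every stored number is the same small difference, which is why the compressor can do much better on the filtered bytes; decoding reverses the two steps in the opposite order.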
To accommodate support for hierarchies and filters, the Zarr metadata format has been modified. See the spec_v2 for more information. To migrate an array stored using Zarr version 1.x, use the zarr.storage.migrate_1to2 function.
The bundled Blosc library has been upgraded to version 1.11.0.
Thanks to Matthew Rocklin <mrocklin>, Stephan Hoyer <shoyer> and Francesc Alted <FrancescAlted> for contributions and comments.
- The bundled Blosc library has been upgraded to version 1.10.0. The 'zstd' internal compression library is now available within Blosc. See the tutorial section on tutorial_compress for an example.
- When using the Blosc compressor, the default internal compression library is now 'lz4'.
- The default number of internal threads for the Blosc compressor has been increased to a maximum of 8 (previously 4).
- Added convenience functions zarr.blosc.list_compressors and zarr.blosc.get_nthreads.
This release includes a complete re-organization of the code base. The major version number has been bumped to indicate that there have been backwards-incompatible changes to the API and the on-disk storage format. However, Zarr is still in an early stage of development, so please do not take the version number as an indicator of maturity.
The main motivation for re-organizing the code was to create an abstraction layer between the core array logic and data storage (#21). In this release, any object that implements the MutableMapping interface can be used as an array store. See the tutorial sections on tutorial_persist and tutorial_storage, the spec_v1, and the zarr.storage module documentation for more information.
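As a minimal sketch of what implementing the MutableMapping interface involves, the class below defines the five required methods over an in-memory dict and records every access. The class name LoggingStore and the example keys are hypothetical; whether a given Zarr version accepts such an object unchanged is not something this example verifies.

```python
from collections.abc import MutableMapping


class LoggingStore(MutableMapping):
    """In-memory key/value store that records each read, write and delete."""

    def __init__(self):
        self._data = {}
        self.log = []

    def __getitem__(self, key):
        self.log.append(("get", key))
        return self._data[key]

    def __setitem__(self, key, value):
        self.log.append(("set", key))
        self._data[key] = bytes(value)   # store an immutable copy of the bytes

    def __delitem__(self, key):
        self.log.append(("del", key))
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


store = LoggingStore()
store["meta"] = b"{}"            # illustrative metadata key
store["0.0"] = b"\x00\x01"       # illustrative chunk key
del store["0.0"]
print(sorted(store), store.log)
```

The mixin methods that MutableMapping provides (such as keys(), items(), pop() and update()) come for free once these five are defined, which keeps a custom store small.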
Please note also that the file organization and file name conventions used when storing a Zarr array in a directory on the file system have changed. Persistent Zarr arrays created using previous versions of the software will not be compatible with this version. See the zarr.storage API docs and the spec_v1 for more information.
An abstraction layer has also been created between the core array logic and the code for compressing and decompressing array chunks. This release still bundles the c-blosc library and uses Blosc as the default compressor; however, other compressors including zlib, BZ2 and LZMA are also now supported via the Python standard library. New compressors can also be dynamically registered for use with Zarr. See the tutorial sections on tutorial_compress and tutorial_tips_blosc, the spec_v1, and the zarr.compressors module documentation for more information.
The synchronization code has also been refactored to create a layer of abstraction, enabling Zarr arrays to be used in parallel computations with a number of alternative synchronization methods. For more information see the tutorial section on tutorial_sync and the zarr.sync module documentation.
NumPy is no longer a build dependency for the zarr.blosc Cython extension, so setup.py will run even if NumPy is not already installed, and should automatically install NumPy as a runtime dependency. Manual installation of NumPy prior to installing Zarr is still recommended, however, as the automatic installation of NumPy may fail or be sub-optimal on some platforms.
Some optimizations have been made within the zarr.blosc extension to avoid unnecessary memory copies, giving a ~10-20% performance improvement for multi-threaded compression operations.
The zarr.blosc extension now automatically detects whether it is running within a single-threaded or multi-threaded program and adapts its internal behaviour accordingly (#27). There is no need for the user to make any API calls to switch Blosc between contextual and non-contextual (global lock) mode. See also the tutorial section on tutorial_tips_blosc.
The internal code for managing chunks has been rewritten to be more efficient. Now no state is maintained for chunks outside of the array store, meaning that chunks do not carry any extra memory overhead not accounted for by the store. This negates the need for the "lazy" option present in the previous release, and this has been removed.
The memory layout within chunks can now be set as either "C" (row-major) or "F" (column-major), which can help to provide better compression for some data (#7). See the tutorial section on tutorial_chunks_order for more information.
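The difference between the two layouts is only the order in which the dimensions vary in memory. As a small standalone sketch (the function flat_offset is hypothetical and independent of Zarr), the position of an element inside a flattened chunk can be computed for both orders:

```python
def flat_offset(index, shape, order="C"):
    """Return the position of `index` in a flattened array of the given shape.

    "C" (row-major): the last dimension varies fastest.
    "F" (column-major): the first dimension varies fastest.
    """
    if order not in ("C", "F"):
        raise ValueError("order must be 'C' or 'F'")
    dims = list(zip(index, shape))
    if order == "C":
        dims.reverse()                 # visit the fastest-varying dimension first
    offset, stride = 0, 1
    for i, n in dims:
        offset += i * stride
        stride *= n
    return offset


shape = (2, 3)                         # 2 rows, 3 columns
print(flat_offset((1, 0), shape, "C"), flat_offset((1, 0), shape, "F"))
```

Values that sit next to each other along the fastest-varying dimension end up adjacent in the bytes handed to the compressor, so the better choice depends on which dimension of the data changes most smoothly.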
A bug has been fixed within the __getitem__ and __setitem__ machinery for slicing arrays, to properly handle getting and setting partial slices.
Thanks to Matthew Rocklin <mrocklin>, Stephan Hoyer <shoyer>, Francesc Alted <FrancescAlted>, Anthony Scopatz <scopatz> and Martin Durant <martindurant> for contributions and comments.