N5FSStore #793

Merged
merged 45 commits on Sep 19, 2021
Commits
d26923a
Drop skip_if_nested_chunks from test_storage.py
joshmoore Jun 14, 2021
c06476d
Add failing nested test
joshmoore Jun 14, 2021
ce8b2f0
Make DirectoryStore dimension_separator aware
joshmoore Jun 14, 2021
e183566
Migrate key logic to core rather than storage
joshmoore Jun 14, 2021
449a67f
Fix linting in new test
joshmoore Jun 14, 2021
10c874e
Merge 'origin/master' into fix-dstore
joshmoore Jun 16, 2021
2e4f4d7
Extend the test suite for dim_sep
joshmoore Jun 17, 2021
8660fa5
add n5fsstore and tests
d-v-b Jul 5, 2021
3b341ed
resolve merge conflicts with main
d-v-b Jul 5, 2021
bb1121c
slightly smarter kwarg interception
d-v-b Jul 6, 2021
be8f37f
remove outdated unittest ref and fix the name of a test func
d-v-b Jul 6, 2021
95b2573
fix massive string block and fix default key_separator kwarg for FSStore
d-v-b Jul 6, 2021
ceba78d
flake8
d-v-b Jul 6, 2021
02ea91c
promote n5store to toplevel import and fix examples in docstring
d-v-b Jul 6, 2021
cb62c10
Merge branch 'master' into fix-dstore
joshmoore Aug 17, 2021
68adca5
Try fsspec 2021.7 (see #802)
joshmoore Aug 17, 2021
f2f75b7
Revert "Try fsspec 2021.7 (see #802)"
joshmoore Aug 17, 2021
930a821
Merge branch 'master' into n5fsstore
d-v-b Aug 17, 2021
a57b3bc
Add missing core tests for N5FSStore, and rchanges required for makin…
d-v-b Aug 17, 2021
9bb058f
Merge branch 'fix-dstore' of https://github.com/joshmoore/zarr-python…
d-v-b Aug 17, 2021
ee9cdbc
tmp: debug
joshmoore Aug 18, 2021
a853a29
uncomment N5 chunk ordering test
d-v-b Aug 18, 2021
7d3c879
more commented tests get uncommented
d-v-b Aug 18, 2021
f3ecd79
add dimension_separator to array metadata adaptor
d-v-b Aug 18, 2021
2d3d286
Merge branch 'fix-dstore' into pr-793+773
joshmoore Aug 19, 2021
5a105eb
Revert "tmp: debug"
joshmoore Aug 19, 2021
51b3109
Attempt failed: keeping '.' and switching
joshmoore Aug 19, 2021
aa75c98
Revert "Attempt failed: keeping '.' and switching"
joshmoore Aug 19, 2021
3daea7c
regex: attempt failed due to slight diff in files
joshmoore Aug 19, 2021
ce8a79e
Revert "regex: attempt failed due to slight diff in files"
joshmoore Aug 19, 2021
985c2a4
N5: use "." internally for dimension separation
joshmoore Aug 19, 2021
51836df
move FSSpec import guard
d-v-b Aug 19, 2021
3c5da2f
remove os.path.sep concatenation in listdir that was erroring a test,…
d-v-b Aug 19, 2021
eea4aaa
resolve merge conflicts in favor of upstream
d-v-b Sep 13, 2021
b8fe803
resolve merge conflicts in favor of upstream
d-v-b Sep 13, 2021
8fec1d6
make listdir implementation for n5fsstore look more like fsstore's li…
d-v-b Sep 14, 2021
46ebb44
Update hexdigest tests for N5Stores to account for the presence of th…
d-v-b Sep 15, 2021
864773d
Add tests for dimension_separator in array meta for N5Stores
d-v-b Sep 15, 2021
3b56155
N5FSStore: try to increase code coverage
joshmoore Sep 17, 2021
b0f6d33
flake8
d-v-b Sep 17, 2021
82ce89f
add chunk nesting test to N5FSStore test suite
d-v-b Sep 17, 2021
267c744
Merge branch 'master' into n5fsstore
joshmoore Sep 17, 2021
2b85410
make array_meta_key, group_meta_key, attrs_key private
d-v-b Sep 17, 2021
8bd6c41
Merge branch 'n5fsstore' of https://github.com/d-v-b/zarr-python into…
d-v-b Sep 17, 2021
aa4a723
N5FSStore: Remove ImportError test
joshmoore Sep 19, 2021
2 changes: 1 addition & 1 deletion zarr/__init__.py
@@ -9,7 +9,7 @@
zeros_like)
from zarr.errors import CopyError, MetadataError
from zarr.hierarchy import Group, group, open_group
-from zarr.n5 import N5Store
+from zarr.n5 import N5Store, N5FSStore
My concern was that this import would need protecting with a try/except block. In testing it, I realized FSStore only throws on __init__, so N5FSStore can be less conservative. I've pushed:

aa4a723

from zarr.storage import (ABSStore, DBMStore, DictStore, DirectoryStore,
LMDBStore, LRUStoreCache, MemoryStore, MongoDBStore,
NestedDirectoryStore, RedisStore, SQLiteStore,
294 changes: 291 additions & 3 deletions zarr/n5.py
@@ -11,7 +11,8 @@
from numcodecs.registry import get_codec, register_codec

from .meta import ZARR_FORMAT, json_dumps, json_loads
-from .storage import NestedDirectoryStore, _prog_ckey, _prog_number
+from .storage import FSStore
+from .storage import NestedDirectoryStore, _prog_ckey, _prog_number, normalize_storage_path
from .storage import array_meta_key as zarr_array_meta_key
from .storage import attrs_key as zarr_attrs_key
from .storage import group_meta_key as zarr_group_meta_key
@@ -281,12 +282,298 @@ def _contains_attrs(self, path):
return len(attrs) > 0


class N5FSStore(FSStore):
"""Implementation of the N5 format (https://github.com/saalfeldlab/n5) using `fsspec`,
which allows storage on a variety of filesystems. Based on `zarr.N5Store`.

Parameters
----------
path : string
Location of directory to use as the root of the storage hierarchy.
normalize_keys : bool, optional
If True, all store keys will be normalized to use lower case characters
(e.g. 'foo' and 'FOO' will be treated as equivalent). This can be
useful to avoid potential discrepancies between case-sensitive and
case-insensitive file systems. Default value is False.

Examples
--------
Store a single array::

>>> import zarr
>>> store = zarr.N5FSStore('data/array.n5', auto_mkdir=True)
>>> z = zarr.zeros((10, 10), chunks=(5, 5), store=store, overwrite=True)
>>> z[...] = 42

Store a group::

>>> store = zarr.N5FSStore('data/group.n5', auto_mkdir=True)
>>> root = zarr.group(store=store, overwrite=True)
>>> foo = root.create_group('foo')
>>> bar = foo.zeros('bar', shape=(10, 10), chunks=(5, 5))
>>> bar[...] = 42

Notes
-----
This is an experimental feature.
Safe to write in multiple threads or processes.

Be advised that the `_dimension_separator` property of this store
(and arrays it creates) is ".", but chunks saved by this store will
in fact be "/" separated, as prescribed by the N5 format.

This is counter-intuitive (to say the least), but not arbitrary.
Chunks in N5 format are stored with reversed dimension order
relative to Zarr chunks: a chunk of a 3D Zarr array would be stored
on a file system as `/0/1/2`, but in N5 the same chunk would be
stored as `/2/1/0`. Therefore, stores targeting N5 must intercept
chunk keys and flip the order of the dimensions before writing to
storage, and this procedure requires chunk keys with "."-separated
dimensions; hence Zarr arrays targeting N5 have the deceptive
"." dimension separator.
"""
_array_meta_key = 'attributes.json'
_group_meta_key = 'attributes.json'
_attrs_key = 'attributes.json'

def __init__(self, *args, **kwargs):
if 'dimension_separator' in kwargs:
kwargs.pop('dimension_separator')
warnings.warn('Keyword argument `dimension_separator` will be ignored')
dimension_separator = "."
super().__init__(*args, dimension_separator=dimension_separator, **kwargs)

def _swap_separator(self, key):
segments = list(key.split('/'))
if segments:
last_segment = segments[-1]
if _prog_ckey.match(last_segment):
coords = list(last_segment.split('.'))
last_segment = '/'.join(coords[::-1])
segments = segments[:-1] + [last_segment]
key = '/'.join(segments)
return key
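The dimension-reversal logic in `_swap_separator` can be sketched as a standalone function. The regex below is a simplified stand-in for zarr's internal `_prog_ckey` pattern, and the names are illustrative, not zarr's API:

```python
import re

# Simplified stand-in for zarr's internal _prog_ckey: matches
# "."-separated integer chunk coordinates such as "0.1.2".
_ckey = re.compile(r"^\d+(\.\d+)+$")

def swap_separator(key):
    """Reverse '.'-separated chunk coordinates into '/'-separated N5 order."""
    segments = key.split("/")
    last = segments[-1]
    if _ckey.match(last):
        coords = last.split(".")
        # N5 stores dimensions in reversed order relative to zarr
        last = "/".join(coords[::-1])
    return "/".join(segments[:-1] + [last])

print(swap_separator("foo/bar/0.1.2"))  # -> foo/bar/2/1/0
```

Non-chunk keys (e.g. metadata files) fail the regex match and pass through unchanged.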

def _normalize_key(self, key):
if is_chunk_key(key):
key = invert_chunk_coords(key)

key = normalize_storage_path(key).lstrip("/")
if key:
*bits, end = key.split("/")

if end not in (self._array_meta_key, self._group_meta_key, self._attrs_key):
end = end.replace(".", "/")
key = "/".join(bits + [end])
return key.lower() if self.normalize_keys else key

def __getitem__(self, key):
if key.endswith(zarr_group_meta_key):

key = key.replace(zarr_group_meta_key, self._group_meta_key)
value = group_metadata_to_zarr(self._load_n5_attrs(key))

return json_dumps(value)

elif key.endswith(zarr_array_meta_key):

key = key.replace(zarr_array_meta_key, self._array_meta_key)
value = array_metadata_to_zarr(self._load_n5_attrs(key))

return json_dumps(value)

elif key.endswith(zarr_attrs_key):

key = key.replace(zarr_attrs_key, self._attrs_key)
value = attrs_to_zarr(self._load_n5_attrs(key))

if len(value) == 0:
raise KeyError(key)
else:
return json_dumps(value)

elif is_chunk_key(key):
key = self._swap_separator(key)

return super().__getitem__(key)

def __setitem__(self, key, value):
if key.endswith(zarr_group_meta_key):

key = key.replace(zarr_group_meta_key, self._group_meta_key)

n5_attrs = self._load_n5_attrs(key)
n5_attrs.update(**group_metadata_to_n5(json_loads(value)))

value = json_dumps(n5_attrs)

elif key.endswith(zarr_array_meta_key):

key = key.replace(zarr_array_meta_key, self._array_meta_key)

n5_attrs = self._load_n5_attrs(key)
n5_attrs.update(**array_metadata_to_n5(json_loads(value)))

value = json_dumps(n5_attrs)

elif key.endswith(zarr_attrs_key):

key = key.replace(zarr_attrs_key, self._attrs_key)

n5_attrs = self._load_n5_attrs(key)
zarr_attrs = json_loads(value)

for k in n5_keywords:
if k in zarr_attrs.keys():
raise ValueError(
"Can not set attribute %s, this is a reserved N5 keyword" % k
)

# replace previous user attributes
for k in list(n5_attrs.keys()):
if k not in n5_keywords:
del n5_attrs[k]

# add new user attributes
n5_attrs.update(**zarr_attrs)

value = json_dumps(n5_attrs)

elif is_chunk_key(key):
key = self._swap_separator(key)

super().__setitem__(key, value)

def __delitem__(self, key):

if key.endswith(zarr_group_meta_key): # pragma: no cover
key = key.replace(zarr_group_meta_key, self._group_meta_key)
elif key.endswith(zarr_array_meta_key): # pragma: no cover
key = key.replace(zarr_array_meta_key, self._array_meta_key)
elif key.endswith(zarr_attrs_key): # pragma: no cover
key = key.replace(zarr_attrs_key, self._attrs_key)
elif is_chunk_key(key):
key = self._swap_separator(key)

super().__delitem__(key)

def __contains__(self, key):
if key.endswith(zarr_group_meta_key):

key = key.replace(zarr_group_meta_key, self._group_meta_key)
if key not in self:
return False
# group if not a dataset (attributes do not contain 'dimensions')
return "dimensions" not in self._load_n5_attrs(key)

elif key.endswith(zarr_array_meta_key):

key = key.replace(zarr_array_meta_key, self._array_meta_key)
# array if attributes contain 'dimensions'
return "dimensions" in self._load_n5_attrs(key)

elif key.endswith(zarr_attrs_key):

key = key.replace(zarr_attrs_key, self._attrs_key)
return self._contains_attrs(key)

elif is_chunk_key(key):
key = self._swap_separator(key)

return super().__contains__(key)

def __eq__(self, other):
return isinstance(other, N5FSStore) and self.path == other.path

def listdir(self, path=None):
if path is not None:
path = invert_chunk_coords(path)

# We can't use NestedDirectoryStore's listdir, as it requires
# array_meta_key to be present in array directories, which this store
# doesn't provide.
children = super().listdir(path=path)
if self._is_array(path):

# replace n5 attribute file with respective zarr attribute files
children.remove(self._array_meta_key)
children.append(zarr_array_meta_key)
if self._contains_attrs(path):
children.append(zarr_attrs_key)

# special handling of directories containing an array to map
# inverted nested chunk keys back to standard chunk keys
new_children = []
root_path = self.dir_path(path)
for entry in children:
entry_path = os.path.join(root_path, entry)
if _prog_number.match(entry) and self.fs.isdir(entry_path):
for file_name in self.fs.find(entry_path):
file_path = os.path.join(root_path, file_name)
rel_path = file_path.split(root_path)[1]
new_child = rel_path.lstrip('/').replace('/', ".")
new_children.append(invert_chunk_coords(new_child))
else:
new_children.append(entry)
return sorted(new_children)

elif self._is_group(path):

# replace n5 attribute file with respective zarr attribute files
children.remove(self._group_meta_key)
children.append(zarr_group_meta_key)
if self._contains_attrs(path): # pragma: no cover
children.append(zarr_attrs_key)
return sorted(children)
else:
return children
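The key mapping inside the `_is_array` branch of `listdir` (strip the root path, join nested segments with ".", then re-invert the dimension order) amounts to the following sketch; the helper name is hypothetical, not part of zarr's API:

```python
def nested_to_zarr_key(rel_path):
    """Map an on-disk N5 chunk path back to a zarr chunk key.

    N5 stores chunk coordinates as nested, dimension-reversed
    directories, so '2/1/0' on disk corresponds to zarr key '0.1.2'.
    """
    coords = rel_path.strip("/").split("/")
    return ".".join(coords[::-1])

print(nested_to_zarr_key("2/1/0"))  # -> 0.1.2
```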

def _load_n5_attrs(self, path):
try:
s = super().__getitem__(path)
return json_loads(s)
except KeyError:
return {}

def _is_group(self, path):

if path is None:
attrs_key = self._attrs_key
else:
attrs_key = os.path.join(path, self._attrs_key)

n5_attrs = self._load_n5_attrs(attrs_key)
return len(n5_attrs) > 0 and "dimensions" not in n5_attrs

def _is_array(self, path):

if path is None:
attrs_key = self._attrs_key
else:
attrs_key = os.path.join(path, self._attrs_key)

return "dimensions" in self._load_n5_attrs(attrs_key)

def _contains_attrs(self, path):

if path is None:
attrs_key = self._attrs_key
else:
if not path.endswith(self._attrs_key):
attrs_key = os.path.join(path, self._attrs_key)
else: # pragma: no cover
attrs_key = path

attrs = attrs_to_zarr(self._load_n5_attrs(attrs_key))
return len(attrs) > 0
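The three predicates above all hinge on one rule: N5 keeps array and group metadata in the same attributes.json file, and only arrays declare "dimensions". A condensed sketch of that classification (hypothetical helper, not in zarr):

```python
def classify_n5_node(attrs):
    """Classify a parsed N5 attributes.json dict.

    Returns 'array' if it declares 'dimensions', 'group' if it carries
    any other metadata, and 'empty' if there are no attributes at all.
    """
    if "dimensions" in attrs:
        return "array"
    if attrs:
        return "group"
    return "empty"

print(classify_n5_node({"dimensions": [10, 10], "dataType": "uint8"}))  # -> array
```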


def is_chunk_key(key):
-    rv = False
    segments = list(key.split('/'))
    if segments:
        last_segment = segments[-1]
+        return _prog_ckey.match(last_segment)
+    return False  # pragma: no cover
-        rv = _prog_ckey.match(last_segment)
-    return rv


def invert_chunk_coords(key):
@@ -373,6 +660,7 @@ def array_metadata_to_zarr(array_metadata):
array_metadata['fill_value'] = 0 # also if None was requested
array_metadata['order'] = 'C'
array_metadata['filters'] = []
array_metadata['dimension_separator'] = '.'

compressor_config = array_metadata['compressor']
compressor_config = compressor_config_to_zarr(compressor_config)
19 changes: 12 additions & 7 deletions zarr/storage.py
@@ -1065,22 +1065,28 @@ class FSStore(MutableMapping):
Separator placed between the dimensions of a chunk.
storage_options : passed to the fsspec implementation
"""
+_array_meta_key = array_meta_key
+_group_meta_key = group_meta_key
+_attrs_key = attrs_key

-_META_KEYS = (attrs_key, group_meta_key, array_meta_key)

-def __init__(self, url, normalize_keys=False, key_separator=None,
+def __init__(self, url, normalize_keys=True, key_separator=None,
mode='w',
exceptions=(KeyError, PermissionError, IOError),
dimension_separator=None,
**storage_options):
import fsspec
self.normalize_keys = normalize_keys

protocol, _ = fsspec.core.split_protocol(url)
# set auto_mkdir to True for local file system
if protocol in (None, "file") and not storage_options.get("auto_mkdir"):
storage_options["auto_mkdir"] = True
❤️
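The auto_mkdir branch above only fires for local paths. A minimal emulation of the protocol check (the real logic lives in `fsspec.core.split_protocol`; this sketch avoids the fsspec dependency, and both helper names are illustrative):

```python
def split_protocol(url):
    """Crude emulation of fsspec.core.split_protocol, for illustration only."""
    if "://" in url:
        protocol, path = url.split("://", 1)
        return protocol, path
    return None, url

def default_storage_options(url, storage_options=None):
    """Mirror FSStore's branch: plain local paths default to auto_mkdir=True."""
    storage_options = dict(storage_options or {})
    protocol, _ = split_protocol(url)
    if protocol in (None, "file") and not storage_options.get("auto_mkdir"):
        storage_options["auto_mkdir"] = True
    return storage_options

print(default_storage_options("data/array.n5"))  # -> {'auto_mkdir': True}
```

Remote URLs such as `s3://bucket/array.n5` keep their options untouched, since directory creation is meaningless on object stores.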


self.map = fsspec.get_mapper(url, **storage_options)
self.fs = self.map.fs # for direct operations
self.path = self.fs._strip_protocol(url)
self.mode = mode
self.exceptions = exceptions

# For backwards compatibility. Guaranteed to be non-None
if key_separator is not None:
dimension_separator = key_separator
@@ -1091,7 +1097,6 @@ def __init__(self, url, normalize_keys=False, key_separator=None,

# Pass attributes to array creation
self._dimension_separator = dimension_separator

if self.fs.exists(self.path) and not self.fs.isdir(self.path):
raise FSPathExistNotDir(url)

Expand All @@ -1100,7 +1105,7 @@ def _normalize_key(self, key):
if key:
*bits, end = key.split('/')

-if end not in FSStore._META_KEYS:
+if end not in (self._array_meta_key, self._group_meta_key, self._attrs_key):
end = end.replace('.', self.key_separator)
key = '/'.join(bits + [end])

@@ -1178,7 +1183,7 @@ def listdir(self, path=None):
if self.key_separator != "/":
return children
else:
-if array_meta_key in children:
+if self._array_meta_key in children:
# special handling of directories containing an array to map nested chunk
# keys back to standard chunk keys
new_children = []