Try to converge on internal types consistency. #574

Carreau · 2020-07-01T16:30:28Z

While it is convenient for user on high level API to be able to pass
arround many types (array, bytes, or even dictionary, list, strings),
It can make it relatively hard to follow internally if the types are not
consistant.

In particular this can relatively complicate weaving the v3 spec into
the core of Zarr and add a number of conditional if we are not sure of
the type we get.

This try to be a little more consistent in what the implementation
emits right now this try to make sure that all the core only try to
converge toward bytes, so that store users can expect to alway get bytes
back from Zarr-Python stores, and as much as possible give bytes back,
and mostly affects the ConsolidatedMetadataStore which used to – unlike
other store – return objects instead of bytes, as well as the tests
directly testing the stores.

We of course can't be too strict as some existing store may have stored
some values as not-bytes (pickle), and we still want to support this for
the time being,but with this changes, we can add many assert isinstance(x, bytes_like) in many places ans still have the test suite
to pass.

In particular in all the decode/encode metadata functions, and Group
class.

One extra change in test is the removal of an old conditional import of
h5py in the test suite which is unnecessary since introduction of the
pytest's importorskip for h5py tests.

[Description of PR]

TODO:

Add unit tests and/or doctests in docstrings
Add docstrings and API docs for any new/modified user-facing classes and functions
New/modified features documented in docs/tutorial.rst
Changes documented in docs/release.rst
AppVeyor and Travis CI passes
Test coverage is 100% (Coveralls passes)

While it is convenient for user on high level API to be able to pass arround many types (array, bytes, or even dictionary, list, strings), It can make it relatively hard to follow internally if the types are not consistant. In particular this can relatively complicate weaving the v3 spec into the core of Zarr and add a number of conditional if we are not sure of the type we get. This try to be a little more consistent in what the implementation _emits_ right now this try to make sure that all the core only try to converge toward bytes, so that store users can expect to alway get bytes back from Zarr-Python stores, and as much as possible give bytes back, and mostly affects the ConsolidatedMetadataStore which used to – unlike other store – return objects instead of bytes, as well as the tests directly testing the stores. We of course can't be too strict as some existing store may have stored some values as not-bytes (pickle), and we still want to support this for the time being,but with this changes, we can add many `assert isinstance(x, bytes_like)` in many places ans still have the test suite to pass. In particular in all the decode/encode metadata functions, and Group class. One extra change in test is the removal of an old conditional import of h5py in the test suite which is unnecessary since introduction of the pytest's `importorskip` for h5py tests.

Carreau · 2020-07-01T16:34:16Z

Extra notes, returning object with Consolidated metadata mean that you actually get reference to mutable objects and have a risk of modifying the internal state of the consolidated store if you don't pay attention.

Would it be ok in a separate PR to add a "strict" mode (opt-in), that would trigger a bunch of assert isinstance(xxx, bytes_likes) or similar in a couple of places to help with type stability ?

jakirkham · 2020-07-02T17:07:00Z

Sorry for not raising this yesterday. I'm a bit unsure about this as we have uses cases where we would like to handle other non-bytes objects like mmap ( #377 ) and CuPy ndarrays ( zarr-developers/numcodecs#212 ) ( #501 ).

Carreau · 2020-07-06T15:21:04Z

I think it would be ok to ensure only bytes-like in python. My main problem is to poke at it to figure out whether it's already decoded or not.

alimanfoo

Hi @Carreau, in general agree with spirit of trying to achieve type consistency, but a couple of points worth some discussion here.

alimanfoo · 2020-07-09T23:19:28Z

zarr/core.py

@@ -1489,7 +1489,7 @@ def _set_basic_selection_zd(self, selection, value, fields=None):
            chunk[selection] = value

        # encode and store
-        cdata = self._encode_chunk(chunk)
+        cdata = ensure_bytes(self._encode_chunk(chunk))


N.B., ensuring bytes may introduce a memory copy, which may be unnecessary. Memory copies can noticeably affect performance. Ensuring we have something that supports the buffer interface would be better.

alimanfoo · 2020-07-09T23:31:07Z

zarr/storage.py

+        self.meta_store = {}
+        for k, v in meta["metadata"].items():
+            self.meta_store[k] = json_dumps(v)


FWIW this was originally how this worked, but this introduces an extra unnecessary JSON encode/decode round trip into the processing of consolidated metadata. In practice that may not be a performance issue, but it seemed awkward to have this extra JSON encode/decode if it wasn't necessary. Hence I introduced a special case which avoided the necessity for this, see the parse_metadata function in the original PR.

joshmoore · 2022-05-04T11:38:21Z

I assume this is now handled by #898 et al. cc: @grlee77

alimanfoo reviewed Jul 9, 2020

View reviewed changes

Carreau marked this pull request as draft September 9, 2020 17:13

joshmoore closed this May 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Try to converge on internal types consistency. #574

Try to converge on internal types consistency. #574

Carreau commented Jul 1, 2020

Carreau commented Jul 1, 2020

jakirkham commented Jul 2, 2020

Carreau commented Jul 6, 2020

alimanfoo left a comment

alimanfoo Jul 9, 2020

alimanfoo Jul 9, 2020

joshmoore commented May 4, 2022

Try to converge on internal types consistency. #574

Try to converge on internal types consistency. #574

Conversation

Carreau commented Jul 1, 2020

Carreau commented Jul 1, 2020

jakirkham commented Jul 2, 2020

Carreau commented Jul 6, 2020

alimanfoo left a comment

Choose a reason for hiding this comment

alimanfoo Jul 9, 2020

Choose a reason for hiding this comment

alimanfoo Jul 9, 2020

Choose a reason for hiding this comment

joshmoore commented May 4, 2022