Skip to content

Commit

Permalink
Add links to numcodecs docs in tutorial (#1535)
Browse files Browse the repository at this point in the history
* Fix numcodecs links

* Add release note
  • Loading branch information
dstansby committed Oct 31, 2023
1 parent d756626 commit cce501a
Show file tree
Hide file tree
Showing 3 changed files with 18 additions and 8 deletions.
1 change: 1 addition & 0 deletions docs/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -331,6 +331,7 @@ def setup(app):
intersphinx_mapping = {
"python": ("https://docs.python.org/", None),
"numpy": ("https://numpy.org/doc/stable/", None),
"numcodecs": ("https://numcodecs.readthedocs.io/en/stable/", None),
}


Expand Down
8 changes: 8 additions & 0 deletions docs/release.rst
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,12 @@ Release notes
Unreleased
----------

Docs
~~~~

* Add links to ``numcodecs`` docs in the tutorial.
By :user:`David Stansby <dstansby>` :issue:`1535`.

Maintenance
~~~~~~~~~~~

Expand All @@ -33,6 +39,8 @@ Maintenance
* Allow ``black`` code formatter to be run with any Python version.
By :user:`David Stansby <dstansby>` :issue:`1549`.



.. _release_2.16.1:

2.16.1
Expand Down
17 changes: 9 additions & 8 deletions docs/tutorial.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1175,8 +1175,9 @@ A fixed-length unicode dtype is also available, e.g.::
For variable-length strings, the ``object`` dtype can be used, but a codec must be
provided to encode the data (see also :ref:`tutorial_objects` below). At the time of
writing there are four codecs available that can encode variable length string
objects: :class:`numcodecs.VLenUTF8`, :class:`numcodecs.JSON`, :class:`numcodecs.MsgPack`.
and :class:`numcodecs.Pickle`. E.g. using ``VLenUTF8``::
objects: :class:`numcodecs.vlen.VLenUTF8`, :class:`numcodecs.json.JSON`,
:class:`numcodecs.msgpacks.MsgPack`. and :class:`numcodecs.pickles.Pickle`.
E.g. using ``VLenUTF8``::

>>> import numcodecs
>>> z = zarr.array(text_data, dtype=object, object_codec=numcodecs.VLenUTF8())
Expand All @@ -1201,8 +1202,8 @@ is a short-hand for ``dtype=object, object_codec=numcodecs.VLenUTF8()``, e.g.::
'Helló, világ!', 'Zdravo svete!', 'เฮลโลเวิลด์'], dtype=object)

Variable-length byte strings are also supported via ``dtype=object``. Again an
``object_codec`` is required, which can be one of :class:`numcodecs.VLenBytes` or
:class:`numcodecs.Pickle`. For convenience, ``dtype=bytes`` (or ``dtype=str`` on Python
``object_codec`` is required, which can be one of :class:`numcodecs.vlen.VLenBytes` or
:class:`numcodecs.pickles.Pickle`. For convenience, ``dtype=bytes`` (or ``dtype=str`` on Python
2.7) can be used as a short-hand for ``dtype=object, object_codec=numcodecs.VLenBytes()``,
e.g.::

Expand All @@ -1218,7 +1219,7 @@ e.g.::
b'\xe0\xb9\x80\xe0\xb8\xae\xe0\xb8\xa5\xe0\xb9\x82\xe0\xb8\xa5\xe0\xb9\x80\xe0\xb8\xa7\xe0\xb8\xb4\xe0\xb8\xa5\xe0\xb8\x94\xe0\xb9\x8c'], dtype=object)

If you know ahead of time all the possible string values that can occur, you could
also use the :class:`numcodecs.Categorize` codec to encode each unique string value as an
also use the :class:`numcodecs.categorize.Categorize` codec to encode each unique string value as an
integer. E.g.::

>>> categorize = numcodecs.Categorize(greetings, dtype=object)
Expand All @@ -1245,7 +1246,7 @@ The best codec to use will depend on what type of objects are present in the arr

At the time of writing there are three codecs available that can serve as a general
purpose object codec and support encoding of a mixture of object types:
:class:`numcodecs.JSON`, :class:`numcodecs.MsgPack`. and :class:`numcodecs.Pickle`.
:class:`numcodecs.json.JSON`, :class:`numcodecs.msgpacks.MsgPack`. and :class:`numcodecs.pickles.Pickle`.

For example, using the JSON codec::

Expand All @@ -1258,7 +1259,7 @@ For example, using the JSON codec::
array([42, 'foo', list(['bar', 'baz', 'qux']), {'a': 1, 'b': 2.2}, None], dtype=object)

Not all codecs support encoding of all object types. The
:class:`numcodecs.Pickle` codec is the most flexible, supporting encoding any type
:class:`numcodecs.pickles.Pickle` codec is the most flexible, supporting encoding any type
of Python object. However, if you are sharing data with anyone other than yourself, then
Pickle is not recommended as it is a potential security risk. This is because malicious
code can be embedded within pickled data. The JSON and MsgPack codecs do not have any
Expand All @@ -1270,7 +1271,7 @@ Ragged arrays

If you need to store an array of arrays, where each member array can be of any length
and stores the same primitive type (a.k.a. a ragged array), the
:class:`numcodecs.VLenArray` codec can be used, e.g.::
:class:`numcodecs.vlen.VLenArray` codec can be used, e.g.::

>>> z = zarr.empty(4, dtype=object, object_codec=numcodecs.VLenArray(int))
>>> z
Expand Down

0 comments on commit cce501a

Please sign in to comment.