# Object arrays

See [#212](https://github.com/alimanfoo/zarr/pull/212) for more information.

In [2]:
import zarr

zarr.__version__

'2.2.0a2.dev82+dirty'

In [3]:
import numcodecs

numcodecs.__version__

'0.5.0'

## API changes in Zarr version 2.2

Creating an object array now requires a new ``object_codec`` argument:

In [4]:
z = zarr.empty(10, chunks=5, dtype=object, object_codec=numcodecs.MsgPack())
z

<zarr.core.Array (10,) object>

To maintain backwards compatibility with previously-created data, the object codec is treated as a filter and inserted as the first filter in the chain:

In [5]:
z.info

Type        : zarr.core.Array
Data type   : object
Shape       : (10,)
Chunk shape : (5,)
Order       : C
Read-only   : False
Filter [0]  : MsgPack(encoding='utf-8')
Compressor  : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type  : builtins.dict
No. bytes   : 80


In [6]:
z[0] = 'foo'
z[1] = b'bar'  # bytes are not round-tripped by this codec; the value comes back as the str 'bar'
z[2] = 1
z[3] = [2, 4, 6, 'baz']
z[4] = {'a': 'b', 'c': 'd'}
a = z[:]
a

array(['foo', 'bar', 1, list([2, 4, 6, 'baz']), {'a': 'b', 'c': 'd'}, None,
       None, None, None, None], dtype=object)

If no ``object_codec`` is provided, a ``ValueError`` is raised:

In [7]:
z = zarr.empty(10, chunks=5, dtype=object)

ValueError: missing object_codec for object array

For API backward compatibility, if the object codec is provided via ``filters`` instead, a warning is issued but no error is raised:

In [8]:
z = zarr.empty(10, chunks=5, dtype=object, filters=[numcodecs.MsgPack()])



If a user tries to subvert the system by creating an object array and then removing its object codec, a runtime check ensures that no object array is passed down to the compressor, which could otherwise lead to nasty errors and/or segfaults:

In [9]:
z = zarr.empty(10, chunks=5, dtype=object, object_codec=numcodecs.MsgPack())
z._filters = None  # try to live dangerously, manually wipe filters

In [10]:
z[0] = 'foo'

RuntimeError: cannot write object array without object codec
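
The reason such a check matters is that the buffer of a NumPy object array holds references to Python objects, not the data itself, so it must never reach a compressor as raw bytes. The sketch below illustrates the idea only; ``ensure_no_objects`` is a hypothetical helper written for this example and is not Zarr's actual implementation.

```python
import numpy as np


def ensure_no_objects(chunk):
    """Hypothetical guard: refuse chunks that still contain Python objects."""
    chunk = np.asarray(chunk)
    if chunk.dtype.hasobject:
        # Mirrors the error message shown above; the real check lives inside Zarr.
        raise RuntimeError('cannot write object array without object codec')
    return chunk


# A plain numeric chunk passes through unchanged.
numeric = ensure_no_objects(np.arange(5))

# An object chunk that was never encoded is rejected.
try:
    ensure_no_objects(np.array(['foo', 1], dtype=object))
except RuntimeError as err:
    guard_message = str(err)
    print(guard_message)
```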

Another way to subvert the system is to wipe the filters **after** storing some data. To cover this case, a runtime check ensures that object arrays are not handled inappropriately during decoding, which could otherwise lead to nasty errors and/or segfaults:

In [11]:
from numcodecs.tests.common import greetings

z = zarr.array(greetings, chunks=5, dtype=object, object_codec=numcodecs.MsgPack())
z[:]

array(['¡Hola mundo!', 'Hej Världen!', 'Servus Woid!', 'Hei maailma!',
       'Xin chào thế giới', 'Njatjeta Botë!', 'Γεια σου κόσμε!', 'こんにちは世界',
       '世界，你好！', 'Helló, világ!', 'Zdravo svete!', 'เฮลโลเวิลด์'], dtype=object)

In [12]:
z._filters = []  # try to live dangerously, manually wipe filters
z[:]

RuntimeError: cannot read object array without object codec