When a group is opened in parallel for the first time, there appears to be a synchronization issue in the init_group function. It first checks contains_group and, if that returns False, initializes the group metadata. The process synchronizers are not used in this section, so a race condition is possible: the initial check returns False, but another process writes the metadata between that check and the metadata initialization. The reproduction script (test_zarr_sync.py, per the traceback) produces:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "test_zarr_sync.py", line 7, in _par_create_group
    zarr.open('test.zarr', mode='a', synchronizer=sync, cache_attrs=False)
  File "<venv_path>/lib/python3.7/site-packages/zarr/convenience.py", line 89, in open
    return open_group(store, mode=mode, **kwargs)
  File "<venv_path>/lib/python3.7/site-packages/zarr/hierarchy.py", line 1171, in open_group
    init_group(store, path=path, chunk_store=chunk_store)
  File "<venv_path>/lib/python3.7/site-packages/zarr/storage.py", line 473, in init_group
    chunk_store=chunk_store)
  File "<venv_path>/lib/python3.7/site-packages/zarr/storage.py", line 492, in _init_group_metadata
    raise ContainsGroupError(path)
zarr.errors.ContainsGroupError: path '' contains a group
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test_zarr_sync.py", line 11, in <module>
    pool.map(_par_create_group, [0]*100)
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
zarr.errors.ContainsGroupError: path "path '' contains a group" contains a group
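The check-then-initialize pattern described in the problem statement can be modelled without zarr. The following is only a sketch under stated assumptions, not zarr's actual code: a plain dict stands in for the store, threads stand in for processes, `ContainsGroupError` is a local stand-in class, and the names `open_group_unsynchronized` and `run_race` are hypothetical. A barrier forces every worker to finish the existence check before any of them writes, so the unlucky interleaving happens on every run rather than occasionally.

```python
import threading


class ContainsGroupError(Exception):
    """Local stand-in for zarr.errors.ContainsGroupError (not the real class)."""


def open_group_unsynchronized(store, all_checked, write_lock):
    """Model of check-then-initialize with no lock spanning both steps."""
    # Step 1: the initial existence check.
    if ".zgroup" in store:
        return "opened"
    # Hold each worker here until all of them have run the check,
    # so the race is triggered deterministically.
    all_checked.wait()
    # Step 2: initialize the metadata. The lock protects only the write,
    # not the gap between the check above and this point.
    with write_lock:
        if ".zgroup" in store:
            raise ContainsGroupError("path '' contains a group")
        store[".zgroup"] = b'{"zarr_format": 2}'
    return "created"


def run_race(n_workers=2):
    """Run n_workers concurrent opens and return their sorted outcomes."""
    store = {}  # dict used as a toy store (assumption)
    all_checked = threading.Barrier(n_workers)
    write_lock = threading.Lock()
    outcomes = []

    def worker():
        try:
            outcomes.append(
                open_group_unsynchronized(store, all_checked, write_lock)
            )
        except ContainsGroupError:
            outcomes.append("ContainsGroupError")

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(outcomes)
```

In this model exactly one worker creates the group and every other worker fails, because each of them passed the initial check before the metadata existed. Holding one lock across both the check and the write would remove the gap.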
Version and installation information
Please provide the following:
Value of zarr.__version__: 2.5.0
Value of numcodecs.__version__: 0.7.2
Version of Python interpreter: 3.7
Operating system (Linux/Windows/Mac): Linux
How Zarr was installed (e.g., "using pip into virtual environment", or "using conda"): pip
yousefmoazzam added a commit to DiamondLightSource/httomo that referenced this issue on Apr 5, 2024:
Creation of `Group` objects via `zarr.open()` and `zarr.open_group()`
suffers a race condition regarding no synchronisation between processes
when attempting to initialise group metadata (see
zarr-developers/zarr-python#658).
Therefore, `Group` creation via `open_group()` has been replaced in
favour of:
- the rank 0 process being the only one to create any groups (see
auxiliary data creation code changes)
- passing a `DirectoryStore` object to `zarr.open_array()`, which does
not exhibit the same synchronisation issues
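The first workaround in the commit message, where only the rank 0 process creates groups, might look roughly like the sketch below. This is an illustrative model rather than the httomo implementation: a dict stands in for the store, threads stand in for ranks, a `threading.Barrier` stands in for whatever inter-process barrier the real application uses, and the names `open_group_single_creator` and `run_ranks` are hypothetical.

```python
import threading

GROUP_KEY = ".zgroup"  # key holding group metadata in this toy store


def open_group_single_creator(store, rank, barrier):
    """Only rank 0 writes group metadata; every other rank waits, then opens."""
    created = False
    if rank == 0 and GROUP_KEY not in store:
        store[GROUP_KEY] = b'{"zarr_format": 2}'
        created = True
    # No rank proceeds until rank 0 has finished writing the metadata.
    barrier.wait()
    if GROUP_KEY not in store:
        raise KeyError(GROUP_KEY)
    return "created" if created else "opened"


def run_ranks(n_ranks=4):
    """Run n_ranks concurrent opens and return the outcome for each rank."""
    store = {}  # dict used as a toy store (assumption)
    barrier = threading.Barrier(n_ranks)
    results = [None] * n_ranks

    def worker(rank):
        results[rank] = open_group_single_creator(store, rank, barrier)

    threads = [
        threading.Thread(target=worker, args=(r,)) for r in range(n_ranks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because a single rank performs the write and the others open the group only after the barrier, no two workers ever attempt to initialize the metadata, so the check-then-write gap never comes into play.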