Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in storage.py for NestedDirectoryStore writes #272

Closed
jmswaney opened this issue Jul 6, 2018 · 4 comments · Fixed by #318
Closed

Race condition in storage.py for NestedDirectoryStore writes #272

jmswaney opened this issue Jul 6, 2018 · 4 comments · Fixed by #318
Labels
bug Potential issues with the zarr-python library
Milestone

Comments

@jmswaney
Copy link
Contributor

jmswaney commented Jul 6, 2018

I've encountered a FileExistsError when writing in parallel to a NestedDirectoryStore that says:

FileExistsError: [Errno 17] File exists: '/media/jswaney/Drive/Justin/coregistration/whole_brain_tde/fixed/zarr/4/10'

The error points to the os.makedirs(dir_path) line in __setitem__ for the DirectoryStore class:

dir_path, file_name = os.path.split(file_path)
if os.path.isfile(dir_path):
    raise KeyError(key)
if not os.path.exists(dir_path):
    try:
        os.makedirs(dir_path)  # Error raised here
    except Exception:
        raise KeyError(key)

I was doing the chunk compression with 12 workers, so I think one of them created the subdirectory while another had already checked that it didn't exist. When it got to os.makedirs, it gave the expected error. A quick fix would just be to catch this error, but a better fix might be to just make all the directories for a NestedDirectoryStore ahead of time.

@alimanfoo
Copy link
Member

alimanfoo commented Jul 6, 2018 via email

@alimanfoo alimanfoo added this to the v2.3 milestone Oct 19, 2018
@jmswaney
Copy link
Contributor Author

It looks like there are a couple options for the Python 2 vs 3 problem listed here. They mention the exist_ok=True option in the context of the pathlib module which has been backported to Python 2 in the pathlib2 module, but it might be weird to use this OO-style path handling just for this though. Another option is to catch the general OSError and just throw an error if the directory still doesn't exist.

jmswaney added a commit to jmswaney/zarr that referenced this issue Oct 25, 2018
@alimanfoo alimanfoo added the bug Potential issues with the zarr-python library label Nov 20, 2018
@jakirkham
Copy link
Member

It sounds like we are already interested in using pathlib. ( #261 ) Maybe using the pathlib-based solution will work for us and we can add pathlib2 as a requirement.

@jakirkham
Copy link
Member

Note: Work on this is currently being pursued in PR ( #318 ).

jakirkham pushed a commit that referenced this issue Mar 19, 2019
* #272 fix race condition to make folders in directory store

* #272 remove FileNotFoundError for python 2 support

* Change OSError to KeyError to make tests pass

* Only catch `OSError` if its `errno` equal `EEXIST`

On Python 3, we would only want to catch the race case here by checking
for the `FileExistsError` and suppressing raising for that case. However
there is no `FileExistsError` on Python 2. Only `OSError` exists on
Python 2/3. However that catches some other errors that we might not
want to suppress here. So we check the `errno` of the `OSError` instance
to verify it is caused by an existing file/directory represented by the
`EEXIST` error number. This amounts to catching the `FileExistsError` in
a Python 2/3 friendly way. All other `OSErrors` we raise as before.

* Drop generic `Exception` case in `DirectoryStore`

As we now properly handle the existing file case properly, go ahead and
drop the generic exception case. This should now raise some errors that
were hidden previously.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Potential issues with the zarr-python library
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants