Multiprocess write fails with lost data - ignoring ProcessSynchronizer? #701
@choosehappy: can you share how you are launching this across your cluster?
On the command line:
I've also confirmed the behavior on Windows, which has Python 3.7.5 (deleting the "data" directory in between runs via Windows Explorer). Note that the first time it works as expected (suggesting no code error to me), but the other 2 times it does not, leading me to believe it is some type of race condition.
Happy to provide any additional details!
By the way, to avoid confusion: my laptop is less powerful than the Linux machine, so I changed the number of items for the Linux machine.
For the Windows laptop:
For those interested, I've worked around this bug by using an array of multiprocessing locks and requiring each process to obtain a lock before writing to the Zarr file. In this manner, a ProcessSynchronizer isn't needed. I would be interested in hearing if anyone sees any issues with this approach, besides a bit of a performance hit. A sketch of the idea appears below.
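A minimal sketch of this lock-based workaround, assuming a single shared multiprocessing.Lock passed through the Pool initializer (the comment above describes an array of locks; one lock is shown here for brevity, and all paths and names are hypothetical, not the exact code from the thread):

```python
import numpy as np
import zarr
from multiprocessing import Pool, Lock

lock = None  # populated in each worker by the Pool initializer

def init_worker(shared_lock):
    global lock
    lock = shared_lock

def append_block(block_idx):
    # Serialize the entire resize-and-write so that no two processes
    # append to the array at the same time.
    with lock:
        z = zarr.open('data/example.zarr', mode='a')
        z.append(np.arange(block_idx * 100 + 1, (block_idx + 1) * 100 + 1, dtype='i4'))

if __name__ == '__main__':
    # Create an empty, resizable 1-D array up front; no synchronizer needed.
    zarr.open('data/example.zarr', mode='w', shape=(0,), chunks=(100,), dtype='i4')
    shared_lock = Lock()
    with Pool(4, initializer=init_worker, initargs=(shared_lock,)) as pool:
        pool.map(append_block, range(10))
    print(zarr.open('data/example.zarr', mode='r').shape)  # expected: (1000,)
```

Passing the Lock via initargs (rather than as a map argument) matters: multiprocessing synchronization primitives can only be shared with workers at process start-up.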
I think I am following the documentation correctly, but this does not appear to work as expected. After a few hours of debugging I'm stumped: it does not look like "This array is safe to read or write from multiple processes," as the documentation suggests (https://zarr.readthedocs.io/en/stable/tutorial.html, "Parallel computing and synchronization" section).
It looks like each process is not aware of how long the current array is and writes wherever it thinks the last piece is, potentially overwriting existing data. Overall, it appears that the existence of the synchronizer is completely ignored.
Any thoughts or comments would be greatly appreciated.
Problem description
This is a bare example: I create a ProcessSynchronizer, open an array stored locally (Linux) with that synchronizer on a server, and launch a pool of 32 processes.
Each of 10 tasks tries to append 100 integers to the array.
The expected result is 1000 integers, between 1 and 1000
What I receive is highly variable, and most of the time much less than 1000 integers; for example, in this case, 500 integers resulted:
Deleting the data directory and running again results in only 400 values:
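A minimal sketch of the kind of setup described above (the store path, sync-file path, and block numbering are assumptions for illustration, not the exact code from the report):

```python
import numpy as np
import zarr
from multiprocessing import Pool

def append_block(block_idx):
    # Each worker re-opens the same array with the same ProcessSynchronizer
    # and appends 100 integers.
    synchronizer = zarr.ProcessSynchronizer('data/example.sync')
    z = zarr.open('data/example.zarr', mode='a', synchronizer=synchronizer)
    z.append(np.arange(block_idx * 100 + 1, (block_idx + 1) * 100 + 1, dtype='i4'))

if __name__ == '__main__':
    synchronizer = zarr.ProcessSynchronizer('data/example.sync')
    # Create an empty, resizable 1-D array guarded by the synchronizer.
    zarr.open('data/example.zarr', mode='w', shape=(0,), chunks=(100,),
              dtype='i4', synchronizer=synchronizer)
    with Pool(32) as pool:
        pool.map(append_block, range(10))
    z = zarr.open('data/example.zarr', mode='r')
    print(z.shape)  # expected (1000,), but far fewer values are often observed
```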
Version and installation information
zarr.__version__: '2.6.1'
numcodecs.__version__: '0.7.3'
Python 3.8.5
Linux
pip3 install into base OS