Concurrent writes to the same chunk using DirectoryStore #328

Open
alimanfoo opened this issue Nov 9, 2018 · 4 comments
@alimanfoo
Member

Currently, DirectoryStore (and sub-classes) implements a form of atomic write, which means that when a key/value is set (e.g., when data for a chunk are written), the data are first written to a temporary file, and then, if the write succeeds, the temporary file is moved into place. The rationale for this design is that data should never be left in a half-written state, i.e., a write either succeeds completely or fails completely.

Also, DirectoryStore currently aims to be safe to use with multiple threads and/or processes. A question arises: what should happen if two threads or two processes attempt to write to the same key at the same time?

Before discussing that question, it is worth saying that, in general, if there is a chance that two threads or processes might write to the same key at the same time, then that is something the user should be trying to avoid. Even if DirectoryStore uses atomic writes, if two write operations occur concurrently, then some data will be lost: one of the concurrent writes will be overwritten by the other. The user can do two things to avoid this: either (1) carefully craft their own code to ensure write operations are fully aligned with chunk boundaries, so two threads or processes are never writing to the same chunk, or (2) use the support provided in zarr for synchronisation (locking). Zarr decouples storage from synchronisation, which allows different methods of synchronisation (locking) to be used with different types of storage. In other words, synchronisation is not the responsibility of the storage layer.
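
For anyone landing here who is unfamiliar with option (2), here is a minimal sketch using zarr's built-in file-based synchronisation (the paths are illustrative):

    import zarr

    # per-chunk file locks, safe across multiple processes
    synchronizer = zarr.ProcessSynchronizer('data/example.sync')
    z = zarr.open_array('data/example.zarr', mode='w',
                        shape=(10000, 10000), chunks=(1000, 1000),
                        dtype='i4', synchronizer=synchronizer)

    # this write spans chunk boundaries, but each chunk is locked
    # while it is updated, so concurrent writers cannot collide
    z[0:1500, 0:1500] = 42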

That said, it can (and does) happen that a user who is relatively new to Zarr might not yet be aware of these issues, and so might run a program that neither aligns writes with chunk boundaries nor implements any synchronisation. A specific example of that is #277. Ironically, there is currently a bug (#263) in the atomic writing implementation, which means a race condition can occur during concurrent writes to the same chunk, generating an error. In the case of #277, that bug was in fact a boon, because it caused an error that ultimately led the user to realise that synchronisation issues were occurring, and then to rework their own code.

However, the atomic write bug (#263) is in the process of being fixed (#327). Once it is fixed, in a situation like #277 the user would not get any error messages, and write operations would silently be lost. Presumably the problem would take much longer to surface, because the user would need to inspect the data to realise something had gone wrong. It might even be so subtle as to go unnoticed, which could be very bad for obvious reasons.

This is causing me some concern, and making me think that DirectoryStore should at least fail (i.e., raise an exception) if two processes attempt to write to the same key concurrently. That way, a user would realise, as in #277, that something was wrong and that they need to implement a solution.

I also wonder whether a possible solution might be for there to be just one temporary file per key to which data are initially written, instead of each atomic write opening a completely new temporary file. This temporary file would be opened in exclusive mode (i.e., mode='x'), so that if two concurrent threads or processes attempt to write, an error is generated.

I.e., the code currently from this line would become something like:

        temp_path = file_path + '.partial'
        try:
            # exclusive mode ('x') raises FileExistsError if the
            # temporary file already exists, i.e., if another thread
            # or process is concurrently writing the same key
            with open(temp_path, mode='xb') as f:
                f.write(value)

            # move temporary file into place
            os.replace(temp_path, file_path)

        finally:
            # clean up if temp file still exists for whatever reason
            if os.path.exists(temp_path):
                os.remove(temp_path)
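
For illustration, here is a minimal standalone sketch (the path is made up) of the exclusive-open behaviour this relies on, where the second of two overlapping exclusive opens fails immediately:

    import os

    temp_path = '/tmp/demo.partial'
    with open(temp_path, mode='xb') as f:
        f.write(b'first writer')
        # while the first writer holds the file, a second exclusive
        # open of the same path raises FileExistsError
        try:
            open(temp_path, mode='xb')
        except FileExistsError:
            print('concurrent write detected')
    os.remove(temp_path)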

This also relates to #325, as there is discussion there about how to open a file for writing.

@jakirkham
Member

Thanks for writing this up, @alimanfoo. Agree with everything you have laid out.

It's worth pointing out that some filesystems may be slow to propagate changes, so there could be deadlocks or, worse, collisions.

Should we start thinking about this in the case of other stores?

AFAIK Python 2 doesn't have mode="x", so we may need to use something else for compatibility (though there may be other motivating factors). One option, if we want to lock the actual file, would be portalocker. Alternatively we could use fasteners, per usual, to solve this with a separate lock file. There are a bunch of other similar packages that could be worth looking at as well.
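
For concreteness, a rough sketch of the fasteners alternative (the helper name and '.lock' suffix are illustrative, not existing zarr API):

    import fasteners

    def locked_write(file_path, value):
        # guard the write with a sidecar inter-process lock file,
        # rather than relying on exclusive-mode open
        lock = fasteners.InterProcessLock(file_path + '.lock')
        with lock:  # blocks until the lock is acquired
            with open(file_path, mode='wb') as f:
                f.write(value)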

@alimanfoo
Member Author

alimanfoo commented Nov 9, 2018 via email

@martindurant
Member

On a bit of a tangent, the filesystem-spec project has the concept of transactions, which use move in the case of LocalFileSystem once the transaction is completed; this is semi-atomic and single-process, so not really along the lines of what's being talked about here. People may find it interesting to investigate the code I have there. But that mechanism, or something like it, could be an inherent piece of functionality of the storage layer, not of the zarr-specific key-value mapping class.
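
For reference, a minimal sketch of that transaction mechanism as it exists in current fsspec (the path is illustrative, and details may differ from the code as it stood in 2018):

    import fsspec

    fs = fsspec.filesystem('file')
    with fs.transaction:
        # writes inside the transaction go to temporary files and are
        # only moved into their final place on successful exit
        with fs.open('/tmp/chunk.0.0', 'wb') as f:
            f.write(b'...')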

My point is, it's worth considering how this discussion could be extended to arbitrary storage backends, not just local files. Does it make sense to talk about sync mechanisms and races with higher-latency systems such as S3?

@sbalmer
Contributor

sbalmer commented Nov 13, 2018

I like the idea of providing detection help, as long as it's reasonably cheap. In my view it needn't work reliably on all platforms; just having it would be a plus.

As for the implementation: wouldn't locking the entire store (for example using a lock on .zattrs) be less expensive, and detect the situation with certainty? For example with the error message: "Only one instance can be open in write-mode. If you want to write concurrently, provide a synchronizer." (There could be a NullSynchronizer for the people who know their writes are unrelated or don't care about consistency.)
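
For concreteness, a rough sketch of what that store-wide lock could look like (the function and file names are made up, and this ignores the Python 2 compatibility question raised above):

    import os

    def acquire_store_write_lock(store_path):
        # create a lock file exclusively; a second writer opening the
        # same store in write mode will fail here
        lock_path = os.path.join(store_path, '.write_lock')
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            raise RuntimeError(
                'Only one instance can be open in write-mode. If you '
                'want to write concurrently, provide a synchronizer.')
        os.close(fd)
        # caller removes lock_path when the store is closed
        return lock_path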
