Concurrent writes to the same chunk using DirectoryStore #328

Open
alimanfoo opened this issue Nov 9, 2018 · 4 comments
@alimanfoo
Member

Currently, DirectoryStore (and sub-classes) implements a form of atomic write, which means that when a key/value is set (e.g., when data for a chunk are written), the data are first written to a temporary file, and then, if the write succeeds, the temporary file is moved into place. The rationale for this design is that data should never be left in a half-written state, i.e., a write either succeeds completely or fails completely.

Also, DirectoryStore currently aims to be safe to use with multiple threads and/or processes. A question arises: what should happen if two threads or two processes attempt to write to the same key at the same time?

Before discussing that question, it is worth saying that, in general, if there is a chance that two threads or processes might write to the same key at the same time, then that is something the user should be trying to avoid. Even if DirectoryStore uses atomic writes, if two write operations occur concurrently, then some data will be lost: one of the concurrent writes will be overwritten by the other. The user can do two things to avoid this: either (1) carefully craft their own code to ensure write operations are fully aligned with chunk boundaries, so two threads or processes are never writing to the same chunk, or (2) use the support provided in zarr for synchronisation (locking). Zarr decouples storage from synchronisation, which allows different methods of synchronisation (locking) to be used with different types of storage. In other words, synchronisation is not the responsibility of the storage layer.
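
For anyone landing here who is unfamiliar with option (2), here is a minimal sketch using zarr's built-in file-based synchronisation (the paths are illustrative):

    import zarr

    # per-chunk file locks, safe across multiple processes
    synchronizer = zarr.ProcessSynchronizer('data/example.sync')
    z = zarr.open_array('data/example.zarr', mode='w',
                        shape=(10000, 10000), chunks=(1000, 1000),
                        dtype='i4', synchronizer=synchronizer)

    # this write spans chunk boundaries, but each chunk is locked
    # while it is updated, so concurrent writers cannot collide
    z[0:1500, 0:1500] = 42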

That said, it can (and does) happen that a user who is relatively new to Zarr might not yet be aware of these issues, and so might run a program that neither aligns writes with chunk boundaries nor implements any synchronisation. A specific example of that is #277. Ironically, there is currently a bug (#263) in the atomic writing implementation, which means a race condition can occur during concurrent writes to the same chunk, generating an error. In the case of #277, that bug was in fact a boon, because it caused an error that ultimately led the user to realise that synchronisation issues were occurring, and then to rework their own code.

However, the atomic write bug (#263) is in the process of being fixed (#327). Once it is fixed, in a situation like #277 the user would not get any error messages, and write operations would silently be lost. Presumably the problem would take much longer to surface, because the user would need to inspect the data to realise something had gone wrong. It might even be so subtle as to go unnoticed, which could be very bad for obvious reasons.

This is causing me some concern, and making me think that DirectoryStore should at least fail (i.e., raise an exception) if two processes attempt to write to the same key concurrently. That way, a user would realise, as in #277, that something was wrong and that they need to implement a solution.

I also wonder whether a possible solution might be for there to be just one temporary file per key to which data are initially written, instead of each atomic write opening a completely new temporary file. This temporary file would be opened in exclusive mode (i.e., mode='x'), so that if two concurrent threads or processes attempt to write, an error is generated.

I.e., the code currently from this line would become something like:

        temp_path = file_path + '.partial'
        try:
            # exclusive mode ('x') raises FileExistsError if the
            # temporary file already exists, i.e., if another thread
            # or process is concurrently writing the same key
            with open(temp_path, mode='xb') as f:
                f.write(value)

            # move temporary file into place
            os.replace(temp_path, file_path)

        finally:
            # clean up if temp file still exists for whatever reason
            if os.path.exists(temp_path):
                os.remove(temp_path)
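
For illustration, here is a minimal standalone sketch (the path is made up) of the exclusive-open behaviour this relies on, where the second of two overlapping exclusive opens fails immediately:

    import os

    temp_path = '/tmp/demo.partial'
    with open(temp_path, mode='xb') as f:
        f.write(b'first writer')
        # while the first writer holds the file, a second exclusive
        # open of the same path raises FileExistsError
        try:
            open(temp_path, mode='xb')
        except FileExistsError:
            print('concurrent write detected')
    os.remove(temp_path)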

This also relates to #325, as there is discussion there about how to open a file for writing.

@jakirkham
Member

Thanks for writing this up, @alimanfoo. Agree with everything you have laid out.

It's worth pointing out that some filesystems may be slow to propagate changes, so there could be deadlocks or, worse, collisions.

Should we start thinking about this in the case of other stores?

AFAIK Python 2 doesn't have mode="x", so we may need to use something else for compatibility (though there may be other motivating factors). One option, if we want to lock the actual file, would be portalocker. Alternatively we could use fasteners, per usual, to solve this with a separate lock file. There are a bunch of other similar packages that could be worth looking at as well.
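
For concreteness, a rough sketch of the fasteners alternative (the helper name and '.lock' suffix are illustrative, not existing zarr API):

    import fasteners

    def locked_write(file_path, value):
        # guard the write with a sidecar inter-process lock file,
        # rather than relying on exclusive-mode open
        lock = fasteners.InterProcessLock(file_path + '.lock')
        with lock:  # blocks until the lock is acquired
            with open(file_path, mode='wb') as f:
                f.write(value)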

@alimanfoo
Member Author

alimanfoo commented Nov 9, 2018 via email

@martindurant
Member

On a bit of a tangent, the filesystem-spec project has the concept of transactions, which use move in the case of LocalFileSystem once the transaction is completed; this is semi-atomic and single-process, so not really along the lines of what's being talked about here. People may find it interesting to investigate the code I have there. But that mechanism, or something like it, could be an inherent piece of functionality of the storage layer, not of the zarr-specific key-value mapping class.
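
For reference, a minimal sketch of that transaction mechanism as it exists in current fsspec (the path is illustrative, and details may differ from the code as it stood in 2018):

    import fsspec

    fs = fsspec.filesystem('file')
    with fs.transaction:
        # writes inside the transaction go to temporary files and are
        # only moved into their final place on successful exit
        with fs.open('/tmp/chunk.0.0', 'wb') as f:
            f.write(b'...')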

My point is, it's worth considering how this discussion could be extended to arbitrary storage backends, not just local files. Does it make sense to talk about sync mechanisms and races with higher-latency systems such as S3?

@sbalmer
Contributor

sbalmer commented Nov 13, 2018

I like the idea of providing detection help, as long as it's reasonably cheap. In my view it needn't work reliably on all platforms; just having it would be a plus.

As for the implementation: wouldn't locking the entire store (for example using a lock on .zattrs) be less expensive, and detect the situation with certainty? For example with the error message: "Only one instance can be open in write-mode. If you want to write concurrently, provide a synchronizer." (There could be a NullSynchronizer for the people who know their writes are unrelated or don't care about consistency.)
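
For concreteness, a rough sketch of what that store-wide lock could look like (the function and file names are made up, and this ignores the Python 2 compatibility question raised above):

    import os

    def acquire_store_write_lock(store_path):
        # create a lock file exclusively; a second writer opening the
        # same store in write mode will fail here
        lock_path = os.path.join(store_path, '.write_lock')
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            raise RuntimeError(
                'Only one instance can be open in write-mode. If you '
                'want to write concurrently, provide a synchronizer.')
        os.close(fd)
        # caller removes lock_path when the store is closed
        return lock_path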
