Potential synchronization optimization #386

alimanfoo · 2019-01-09T08:20:14Z

I get the impression that most people doing concurrent writes into a zarr array manage their writes so that they align with chunk boundaries, and therefore don't need any synchronization. However, this may not be possible or easy in all circumstances, so zarr has a synchronization API and two synchronizer implementations, one based on thread locking and one based on file locking. Synchronization is done on a per-chunk basis: locks are used to synchronize writes to any given chunk, but different chunks are protected by different locks and so may be written concurrently.
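To make the per-chunk locking idea concrete, here is a minimal sketch of a thread-based lock registry keyed by chunk. The class name `ChunkLocks` and the chunk key strings are hypothetical and chosen for illustration; this is not zarr's actual synchronizer code.

```python
import threading
from collections import defaultdict


class ChunkLocks:
    """Hypothetical per-chunk lock registry (illustrative, not zarr's class)."""

    def __init__(self):
        # Guards the dictionary itself so two threads cannot create
        # two different locks for the same chunk key.
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    def __getitem__(self, chunk_key):
        with self._guard:
            return self._locks[chunk_key]


locks = ChunkLocks()

# Writers touching the same chunk share one lock...
same_lock = locks["0.0"] is locks["0.0"]
# ...while writers touching different chunks never block each other.
different_lock = locks["0.0"] is not locks["0.1"]

with locks["0.0"]:
    pass  # a write to chunk "0.0" would go here
```

Because each chunk key maps to its own lock, two threads only serialize when they write to the same chunk.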

If a zarr array is instantiated with a synchronizer, a lock is currently obtained for every write to a chunk. However, some writes completely overwrite (replace) a chunk, whereas others only partially update its content. In the main use cases that zarr aims to satisfy, data are being written concurrently to an array, and each concurrent writer is writing to a separate region of the array. For any chunk that falls completely within a region being written, and which will thus be completely replaced, there will never be any contention between workers. The only time there could be contention is for chunks that only partially overlap a region being written, and which are thus being partially updated.
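As an illustration of the full versus partial distinction, the sketch below classifies the chunks touched by a write to a one-dimensional region. The function `classify_chunks` is a hypothetical helper, not part of zarr, and it assumes a non-empty half-open region `[start, stop)` and ignores a shorter final chunk at the end of the array.

```python
def classify_chunks(start, stop, chunk_len):
    """Return (chunk_index, is_full) for each chunk overlapping [start, stop).

    Hypothetical helper for illustration; assumes stop > start.
    """
    first = start // chunk_len
    last = (stop - 1) // chunk_len
    result = []
    for i in range(first, last + 1):
        chunk_start = i * chunk_len
        chunk_stop = chunk_start + chunk_len
        # A chunk is fully replaced only if the region covers all of it.
        is_full = start <= chunk_start and chunk_stop <= stop
        result.append((i, is_full))
    return result


# Writing items 5..29 into an array with chunks of length 10:
# chunk 0 is partially updated, chunks 1 and 2 are fully replaced.
plan = classify_chunks(5, 30, 10)
print(plan)  # [(0, False), (1, True), (2, True)]
```

Only the chunk at the edge of the region is shared with a neighbouring writer, so it is the only one where contention could arise.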

Thus it would be possible to reduce the number of locks being used, by detecting for each chunk whether it is being partially updated or fully replaced, and only acquiring a lock if it is being partially updated. We already detect whether a chunk write is full or partial because this determines whether or not we need to read the chunk before writing (partial) or can just overwrite it (full). So we'd just need to make use of this information when deciding whether or not to acquire a lock.
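A rough sketch of the proposed behaviour follows. Everything in it is an assumption made for illustration: `write_chunk` is a hypothetical function, the store is a plain dict of lists, and the lock registry is a `defaultdict` of `threading.Lock`. It is not zarr's implementation, only the shape of the idea: reuse the full/partial check to decide whether a lock is needed.

```python
import threading
from collections import defaultdict
from contextlib import nullcontext


def write_chunk(store, locks, chunk_key, new_values, offset, chunk_len):
    """Write new_values into a chunk, locking only for partial updates.

    Hypothetical sketch; returns True if the chunk was fully replaced.
    """
    is_full = offset == 0 and len(new_values) == chunk_len
    # Full replacement: no other writer should touch this chunk, so skip the lock.
    # Partial update: read-modify-write must be protected.
    guard = nullcontext() if is_full else locks[chunk_key]
    with guard:
        if is_full:
            store[chunk_key] = list(new_values)
        else:
            chunk = list(store.get(chunk_key, [0] * chunk_len))  # read
            chunk[offset:offset + len(new_values)] = new_values  # modify
            store[chunk_key] = chunk                             # write
    return is_full


store = {}
locks = defaultdict(threading.Lock)

full = write_chunk(store, locks, "0", [1, 2, 3, 4], offset=0, chunk_len=4)
partial = write_chunk(store, locks, "1", [9, 9], offset=1, chunk_len=4)

# No lock was ever created for the fully replaced chunk "0".
print(full, partial, sorted(locks))  # True False ['1']
```

The saving is that fully replaced chunks never create or acquire a lock, which matters most for the file-locking synchronizer where each lock involves filesystem operations.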
