
race condition for… file metadata? #66
Closed · niacdoial opened this issue Jun 14, 2021 · 4 comments

@niacdoial
Look, I know how weird this looks, but hear me out.
I somehow managed to get a race condition(?) where portalocker.Lock('file/path', 'a') would not actually put me at the end of the file I opened.
There were a lot of moving parts in this, with potential culprits such as:

  • an NFS filesystem
  • accessing a file from different machines within a cluster
  • an EOL'd kernel on every machine of said cluster (!)

Here are a few small working examples.

script1.py

import portalocker
with portalocker.Lock('file/path','a') as f:
    f.write('prelude,')
    print(f.tell())
    input('start script2 and continue')
    f.write('AAAA')

script 2, variant a:

import portalocker
with portalocker.Lock('file/path','a') as f:
    print(f.tell())
    f.write('BBBB')

script 2, variant b:

import portalocker
from time import sleep
with portalocker.Lock('file/path','a') as f:
    print(f.tell())
    print(f.seek(0,2))
    sleep(5)
    print(f.tell())
    print(f.seek(0,2))

I emptied the file between all tests.

test 1: running script1 on node1, then quickly script2a on node2:
script1 prints 8\nstart script2 and continue, script2 prints 0, and the file contains BBBBude,AAAA

test 2: running script1 on node1, then quickly script2b on node2:
script2 prints 0\n0\n0\n12. (although the results can change with the delay set in sleep())

test 3: running script1 on node1, waiting, and running script2b on node2:
script2 prints 8\n8\n8\n12 or 8\n8\n12\n12.

test 4: running script1 on node1, waiting, and running script2b on node2:
script2 prints 8\n12\n12\n12?

I have yet to run tests where one or more of those scripts runs on the controller node (which is the target of the NFS mount involved).

I am aware this might not be an issue on your side specifically, but like… yeah.
I hope I'm not wasting your time with this.

@wolph (Owner) commented Jun 15, 2021

With NFS... that doesn't sound too weird; NFS and locking are rarely friends in my experience ;)

The thing with locks on unix systems is that they're (by default, at least) advisory locks. Both clients need to choose to lock, the underlying filesystem needs to support it, and the filesystem needs to be mounted with locking enabled. With NFS, that last one is often the culprit. I should note that with NFS the locking can be enforced instead of being advisory.
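To make the "advisory" part concrete, here is a minimal sketch (not from the thread; it uses plain fcntl rather than portalocker, and /tmp/demo.txt is just a placeholder path). A process that takes an exclusive flock() only blocks other processes that also call flock(); a process that simply opens and writes is never stopped:

import fcntl

with open('/tmp/demo.txt', 'a') as fh:
    # Advisory exclusive lock: only cooperating processes that also call
    # flock() will block on this; a plain open()+write() elsewhere is
    # completely unaffected.
    fcntl.flock(fh, fcntl.LOCK_EX)
    fh.write('locked writer\n')
    fcntl.flock(fh, fcntl.LOCK_UN)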

So... common pitfalls with NFS locking:

  • The mount flags: many people disable NFS locking for performance reasons (it can be really slow). You can check whether the mount options include the nolock parameter
  • The NFS version of all clients needs to be the same. The locking systems in NFS 3 and NFS 4 are different and don't interact with each other
  • It could simply be a bug because of old kernels or other things... look at D10 in the NFS FAQ: http://nfs.sourceforge.net/
  • Perhaps this library is not using the correct locking flags for NFS. I honestly never tried, because NFS locks are generally too slow to be usable in my experience.

In any case, I would start by looking at the server version, the client versions, and the mount flags. Check that they all match up and that nolock is not set as a mount flag.
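As a rough way to do that check from the clients themselves, here is a small sketch (assuming Linux clients; not from the thread) that prints the NFS mounts and their options from /proc/mounts. Look at the filesystem type (nfs vs nfs4), the vers= option, and whether nolock or local_lock show up:

with open('/proc/mounts') as mounts:
    for line in mounts:
        # each line: device mountpoint fstype options dump pass
        device, mountpoint, fstype, options, *_ = line.split()
        if fstype.startswith('nfs'):
            print(mountpoint, fstype, options)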

@niacdoial (Author)

Yep, everything matches up, and the closest thing to a nolock option is local_lock=none, which… actually has the opposite effect for my purposes?

Also, the locking itself seems to work, because the various script2s always hang until I press enter on script1.

Also also, I tried adding fh.flush() and os.fsync(fh.fileno()) to script1, without any effect. I have no idea how I didn't see this part of the docs before; kinda disappointed in myself right now.
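For reference, the flush/fsync variant of script1 would look roughly like this (a sketch; same placeholder path as the original scripts):

import os
import portalocker

with portalocker.Lock('file/path', 'a') as f:
    f.write('prelude,')
    f.flush()
    os.fsync(f.fileno())  # push the write through to the NFS server
    print(f.tell())
    input('start script2 and continue')
    f.write('AAAA')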

@wolph (Owner) commented Jun 16, 2021

Yes, local_lock=none should be the right flag to have.

I'm not really sure what else to try honestly... it sounds like you're doing everything right.

github-actions bot added the Stale label on Aug 21, 2023
github-actions bot closed this as not planned on Aug 29, 2023
@wolph (Owner) commented Aug 29, 2023

I'm not sure if it's entirely relevant for you, but for cross-system locking I've also implemented a Redis-based locking system: https://github.com/wolph/portalocker#redis-locks

It works across multiple threads, systems, etc... in a completely safe manner
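Going by the linked README section, usage looks roughly like this (a sketch, assuming a reachable Redis server and portalocker installed with its redis extra; the channel name is just a placeholder):

import portalocker

lock = portalocker.RedisLock('some_lock_channel_name')

with lock:
    print('object is locked')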
