race condition for… file metadata? #66
Comments
With NFS... doesn't sound too weird; NFS and locking are rarely friends in my experience ;) The thing with locks on Unix systems is that they are (by default at least) advisory locks: both clients need to choose to lock, the underlying filesystem needs to support it, and the filesystem needs to be mounted with locking enabled. With NFS, that last one is often the culprit. I should note that with NFS the locking can be enforced instead of being advisory. So... common pitfalls with NFS locking:
In any case, I would start by looking at the server version, client versions and mount flags. See if they all match up and if
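To make the "advisory" part concrete, here is a minimal local sketch. It uses the standard-library `fcntl.flock` as a stand-in and is not portalocker's own code; the function name `advisory_lock_demo` and the temporary file are made up for illustration, and it assumes a POSIX system with a local (non-NFS) temp directory. It shows that a second handle that asks for the lock is refused, while a writer that never asks for it is not stopped at all.

```python
import fcntl
import os
import tempfile


def advisory_lock_demo():
    """Return (second_lock_acquired, file_size) for a local temp file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "ab") as holder, open(path, "ab") as cooperative:
            # First handle takes an exclusive lock.
            fcntl.flock(holder, fcntl.LOCK_EX)

            # A cooperating handle that also asks for the lock is refused.
            try:
                fcntl.flock(cooperative, fcntl.LOCK_EX | fcntl.LOCK_NB)
                second_lock_acquired = True
            except BlockingIOError:
                second_lock_acquired = False

            # A handle that never asks for the lock can still write.
            with open(path, "ab") as rogue:
                rogue.write(b"unlocked write")

            fcntl.flock(holder, fcntl.LOCK_UN)
        return second_lock_acquired, os.path.getsize(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(advisory_lock_demo())
```

On NFS the outcome additionally depends on the server, the client and the mount flags, so this only illustrates the local semantics.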
yep, everything matches up, and the closest thing to a also the locking itself seems to work, because the various script2s always hang before I press enter on script1? also also, I tried adding
Yes, I'm not really sure what else to try honestly... it sounds like you're doing everything right.
I'm not sure if it's entirely relevant for you, but for cross-system locking I've also implemented a redis based locking system: https://github.com/wolph/portalocker#redis-locks It works across multiple threads, systems, etc. in a completely safe manner.
look, I know how weird this looks, but hear me out.
I somehow managed to get a race condition(?) where portalocker.open('file/path', 'a') would not actually get me at the end of the file I opened.
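For illustration, here is a minimal local sketch of how an append-mode handle can report an offset that is not the current end of the file. It is not one of the scripts from this report and involves neither NFS nor portalocker; the function name `stale_append_offset` is made up, and it assumes CPython, where `tell()` on an append handle reflects the position recorded when the file was opened until the handle writes or seeks.

```python
import os
import tempfile


def stale_append_offset():
    """Return (offset before re-seek, offset after re-seek) for an append handle."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # This handle is opened while the file is still empty.
        with open(path, "ab") as late_writer:
            # Another handle appends 8 bytes in the meantime.
            with open(path, "ab") as early_writer:
                early_writer.write(b"AAAAAAAA")

            # The first handle still reports the offset from open time.
            before = late_writer.tell()

            # Seeking to the end explicitly refreshes the position.
            late_writer.seek(0, os.SEEK_END)
            after = late_writer.tell()
        return before, after
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(stale_append_offset())
```

If the offset matters, one possible workaround is to call `seek(0, os.SEEK_END)` after the lock has been acquired rather than relying on the position set at open time. Whether that is enough on an NFS mount with attribute caching is an open question here.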
There were a lot of moving parts in this, with potential culprits such as:
here are a few small working examples.
script1.py
script 2, variant a:
script 2, variant b:
I emptied the file between all tests.
test 1: running script1 on node1, then quickly script2a on node2: script1 prints `8\nstart script2 and continue`, script2 prints `0`, and the file contains `BBBBude,AAAA`.

test 2: running script1 on node1, then quickly script2b on node2: script2 prints `0\n0\n0\n12` (although the results can change with the delay set in sleep()).

test 3: running script1 on node1, waiting, and running script2b on node2: script2 prints `8\n8\n8\n12` or `8\n8\n12\n12`.

test 4: running script1 on node1, waiting, and running script2b on node2: script2 prints `8\n12\n12\n12`?

I have yet to make tests where one or more of those scripts run on the controller node (which is the target of the NFS mount involved).
I am aware this might not be an issue on your side specifically, but like… yeah.
I hope I'm not wasting your time with this.