XrdCl: Fix race condition in PostMaster initialization #251
This won't work on platforms without atomic writes of 64-bit pointers. Other threads may pull half-written pointer addresses from the static.
That's true, but it's still better than what we have now. Unless we take the lock all the time, it will be a race condition no matter what. But the penalty of taking the lock is not worth it, given that this only happens briefly in the beginning.
@esindril - locks aren't necessary here; we can use std::atomic to make sure the read/write is done correctly. On x86 platforms, the generated assembler turns out to be the same - IIRC, the potentially problematic platforms are PowerPC and ARM (CMSSW maintains a build for ARM which I try to help keep shiny).
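A minimal sketch of the kind of atomic pointer publication suggested above. The names `SketchPostMaster`, `sSlot`, `Publish` and `Peek` are hypothetical and not taken from XrdCl; the choice of release/acquire ordering is also an assumption, since the comment does not say which ordering was meant.

```cpp
#include <atomic>

// Hypothetical stand-in for the real PostMaster class.
struct SketchPostMaster {
  bool initialized = false;
};

// Shared slot, playing the role of the static pointer discussed above.
static std::atomic<SketchPostMaster*> sSlot{nullptr};

// Writer: finish setting up the object first, then publish the pointer.
// The release store orders the setup before the pointer becomes visible.
inline void Publish(SketchPostMaster* pm) {
  pm->initialized = true;
  sSlot.store(pm, std::memory_order_release);
}

// Reader: the acquire load returns either nullptr or a complete pointer,
// never a half-written one, and pairs with the release store above.
inline SketchPostMaster* Peek() {
  return sSlot.load(std::memory_order_acquire);
}
```

With these orderings, compilers targeting x86 typically emit ordinary loads and stores, which is consistent with the remark that the generated code is the same there; weaker memory models such as ARM and PowerPC need extra ordering instructions.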
I suppose a silly question, but why can't the postmaster be initialized before we start all the other threads? Why did this start showing up now? I do agree that an atomic read/write would work here, but it's iffy whether or not an atomic read/write will be more efficient than getting a lock. In general, a full memory barrier has to be established around the memory access. For most (though not all) x86 architectures that is automatic for properly aligned values, which we have here. I agree, it's usually a problem for non-x86 architectures.
1b6b853 to 6889ed1
In case several threads start doing transfers in parallel some can fail with the following error: [FATAL] Initialization error because the post master object is created but not initialized.
I have updated this pull request. Indeed, as Lukasz noticed, I removed the wrong check by mistake. What I actually wanted to do was to remove the check of pInitialized, as it should never happen that you get a PostMaster object without it being initialized. And as far as the "half-written" pointer issue is concerned, we are no better off in the current situation without this patch. The only bullet-proof solution would be a lock or atomics - which I don't believe is worth it, given that this happens only when the XrdCl library is initialized and the lock (or any flavour of synchronisation) would be an unnecessary overhead after the start-up phase.
Hi @esindril - Just to clear up misconceptions. There is no overhead for the atomic I suggested on platforms besides ARM (and there it is pretty small compared to a lock). Not all synchronization primitives are built the same! Brian
For this particular case, since we only set the variable once and then only do get operations, I totally agree. No misconception here. I will add it as an atomic so that all concerns are addressed - the target of this patch was the initialization bit. Thanks for your input @bbockelm.
Just to add to the noise. Quite true, not all atomics are built the same, but fence(); store(); fence() The fence() is the problematic thing here. It all depends on the Andy On Wed, 15 Jul 2015, Brian Bockelman wrote:
This will only be triggered in places where networking is involved, so we're talking of latencies on the order of tens (hundreds) of milliseconds. I don't think that a couple hundred nanoseconds of delay caused by a memory barrier will be much of an issue here. Alternatively, you may move the PostMaster initialization to the static initializer, where it will be done on app start-up in a single-threaded context, but this may have other consequences for people who expect to do some things in a single-threaded mode before they do any data access. Your call. @esindril
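The static-initializer alternative mentioned above could look roughly like the following sketch. All names (`EagerPostMaster`, `EagerInit`, `GetEagerPostMaster`) are hypothetical, and the sketch assumes the code is linked into the executable so that the initializer runs before main(); it says nothing about how the real PostMaster behaves when created that early.

```cpp
// Hypothetical stand-in for the real PostMaster class.
struct EagerPostMaster {
  bool ready = false;
  void Initialize() { ready = true; }
};

// Constant-initialized to nullptr before any dynamic initialization runs.
static EagerPostMaster* sEagerInstance = nullptr;

// The constructor of this file-scope object runs during static
// initialization, which for code linked into the executable happens
// before main() starts.
struct EagerInit {
  EagerInit() {
    sEagerInstance = new EagerPostMaster();
    sEagerInstance->Initialize();
  }
};
static EagerInit sEagerInit;

// No synchronization on the read path: the pointer is set once at
// start-up and never written again.
inline EagerPostMaster* GetEagerPostMaster() { return sEagerInstance; }
```

The trade-off raised in the comment still applies: anything the object starts during construction exists before the application has had a chance to do its own single-threaded setup.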
I guess my point is that atomics should not be looked at as a general On Wed, 15 Jul 2015, Lukasz Janyst wrote:
Your call. Bear in mind that it may break the apps that need a single-threaded context before doing the IO (i.e. some forking stuff).
In case several threads start doing transfers in parallel some can
fail with the following error: [FATAL] Initialization error
because the post master object is created but not initialized.
This only happens at the beginning, while the PostMaster is not yet set up. The initial idea was probably to make the GetPostMaster code as efficient as possible by not taking the lock in the most frequent case. But this leads to a situation where one thread creates the sPostMaster object but has not yet initialized it, and another thread picks it up because the object exists and tries to use it.
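One way to avoid the window described above is to initialize the object fully before publishing the pointer, while keeping a lock-free fast path. The sketch below is illustrative only and is not the code in this pull request; `DemoPostMaster`, `sDemoInstance`, `sDemoMutex` and `GetDemoPostMaster` are hypothetical names, and the use of an atomic pointer follows the suggestion made in the discussion.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in; the real class and its Initialize() differ.
class DemoPostMaster {
 public:
  bool Initialize() { ready_ = true; return true; }
  bool IsReady() const { return ready_; }
 private:
  bool ready_ = false;
};

static std::atomic<DemoPostMaster*> sDemoInstance{nullptr};
static std::mutex sDemoMutex;

DemoPostMaster* GetDemoPostMaster() {
  // Fast path: no lock once the instance has been published.
  DemoPostMaster* pm = sDemoInstance.load(std::memory_order_acquire);
  if (pm) return pm;

  // Slow path: only one thread at a time may create the instance.
  std::lock_guard<std::mutex> lock(sDemoMutex);
  pm = sDemoInstance.load(std::memory_order_relaxed);
  if (!pm) {
    DemoPostMaster* fresh = new DemoPostMaster();
    // Initialize BEFORE publishing, so no other thread can ever pick up
    // an object that exists but is not yet initialized.
    if (!fresh->Initialize()) {
      delete fresh;
      return nullptr;
    }
    sDemoInstance.store(fresh, std::memory_order_release);
    pm = fresh;
  }
  return pm;
}
```

Because the pointer only becomes visible after Initialize() has succeeded, a separate "is it initialized" check on the read path is no longer needed in this sketch.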