-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[XrdCl] Avoid possibility of Channel unregistering the wrong task #1947
Conversation
Hi. What shall we do with this PR? Maybe there are some open questions: (a) is this effective in helping with #1883 In the issue I outlined why I think this change would be of help. Unfortunately, for purposes of checking, the original problem looks rare enough that even running with a change does not make a clear case, expect in a failing case where the problem reoccurs. As an addition I could instrument a test with extra logging (and probably artificial delays) to more directly show this avoiding a reproducible problem with similar symptoms. On the other hand during discussion of #1883 it was said that a larger change would improve the code. My subjective feeling is this is a larger change should be done only in release 6; but certainly that's debatable. [Actually the most recent EOS packaged eos-xrootd did cherry-pick this patch, but if we decide to abandon for xroot it I guess we'd drop it again on the eos side in future]. |
The problem that we found with this patch is that it appeared to just move the problem. By avoiding the call to UnregisterTask() may leave the task hanging around with a reference to the new deleted channel object. That simply appeared to create a SEGV in the opposite direction (i.e. task to channel vs channel to task). Given that the problem doesn't happen often makes this one really hard to solve and will likely require some redesign to avoid a fatal task/channel dependency. Frankly, I'm surprised that the EOS folks picked up this patch given its problematic nature. |
Hi. Thanks, I didn't realise there was the specific concern that this change adds a problem. My reading of this is that: (i) In ~Channel(), Invalidate() is called on the pTickGenerator. This removes the associated between this TickGeneratorTask and the Channel. If the Run() method of the task is called it no longer invokes any method of the Channel, and returns (time_t)0. So I'd think that removing UnregisterTask() is the appropriate, aside from issue 1883. I haven't clearly shown this is the specific cause of issue #1883, although I could try to do so. And even if that's done it doesn't rule out the possibility of other bugs around this area; so your point about redesign here is taken. But I wanted to double check the evaluation this is change is actually introducing a bug, as that is certainly not wanted! |
Well, thank you for making me dive further into this. Indeed, it looks like abandoning the task will actually work here. So, I am merging the change. Sorry for causing you some grief by misunderstanding the logic flow (or miss-flow as it is). |
This is (hopefully) a fix for issue #1883