gproc_pool:claim doesn't return worker when calling process is killed #109

netDalek · 2016-04-01T19:15:29Z

I've found gproc_pool very flexible vs other pool libraries, but found this behaviour

48>  gproc_pool:claim(pool, fun(_, P) ->  P end).
{true,<0.185.0>}
49> Pid = spawn_link(fun() -> gproc_pool:claim(pool, fun(_, P) -> timer:sleep(10000), P en
d) end).
<0.223.0>
50> exit(Pid, kill).
** exception exit: killed
51>  gproc_pool:claim(pool, fun(_, P) ->  P end).
false
69> gproc_pool:active_workers(pool).
[{a,<0.185.0>}]

After killing the process, worker stay busy forever. I look through the code and found that it is expected behaviour. Am I right? Are there any plans to improve it?

The text was updated successfully, but these errors were encountered:

uwiger · 2016-04-02T09:55:11Z

It's true that the code doesn't handle the case of a process inside a claim operation gets killed from the outside. Strictly speaking, this is a bug.

Fixing it will cause a general slowdown, but I think I found a way that is still cheap enough. See PR #110.
Let me know what you think.

netDalek · 2016-04-02T20:45:31Z

Looks good and simple, can't find disadvantages

uwiger · 2016-04-03T07:45:37Z

Describing it to my son, I came up with some failure scenarios that seem very hard to address.

Basically, the approach is inherently vulnerable to the calling process being murdered while running inside the claim. Starting from the (correct) assumption that the process could be killed after any given call, at least the following scenarios, however unlikely, come to mind:

Line 509, just after getting the lock, but before spawning the monitoring process. A solution could be to start monitoring before trying the lock, but this seems to forfeit the purpose of the approach.
Line 551, if the calling process is killed just after killing the monitoring process, but before resetting the counter, the counter will never be reset.
If lines 551 & 552 are switched, and the calling process resets the counter then dies before killing the monitoring process, the monitor will reset the counter again, possibly after another process has claimed it.

Considering the above, one could either conclude that the claim() operation is broken and shouldn't be used, or decide to live with the probability that it sometimes breaks, and then decide which types of breakage are least desirable (I'd wager that the counter never getting reset is the most serious).

Alternatively, figure out a different approach that's safe and not so much slower that the function becomes unfit for its purpose.

uwiger · 2016-04-03T07:53:04Z

(This would be a good time to play around with QuickCheck/Pulse ...)

uwiger · 2016-04-03T19:28:09Z

Since it certainly is an improvement over the previous implementation, I decided to merge. We can try to eliminate the (very slight) holes in the algorithm later.

netDalek · 2016-04-04T09:14:09Z

Line 551, if the calling process is killed just after killing the monitoring process, but before resetting the counter, the counter will never be reset.

May be it's better in place of kill + reset_counter send a message to spawned process.

execute_claim(F, K, Pid) ->
     Parent = self(),
     Mon = spawn(
             fun() ->
                     Ref = erlang:monitor(process, Parent),
                     receive
                         {'DOWN', Ref, _, _, _} ->
                             gproc:reset_counter(K)
                         exit ->
                             gproc:reset_counter(K)
                     end
             end),
     try begin
             Res = F(K, Pid),
             {true, Res}
         end
     after
         Pid ! exit
     end.

uwiger · 2016-04-04T09:18:04Z

I thought about that, but it complicates things, is not inherently safe, and also introduces significant latency (assuming the monitoring process always does the reset).

It would probably at least be a good idea to switch lines 551 and 552, though. I forgot to do that.

uwiger mentioned this issue Apr 2, 2016

release claim even if claimant killed #110

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gproc_pool:claim doesn't return worker when calling process is killed #109

gproc_pool:claim doesn't return worker when calling process is killed #109

netDalek commented Apr 1, 2016

uwiger commented Apr 2, 2016

netDalek commented Apr 2, 2016

uwiger commented Apr 3, 2016

uwiger commented Apr 3, 2016

uwiger commented Apr 3, 2016

netDalek commented Apr 4, 2016

uwiger commented Apr 4, 2016

gproc_pool:claim doesn't return worker when calling process is killed #109

gproc_pool:claim doesn't return worker when calling process is killed #109

Comments

netDalek commented Apr 1, 2016

uwiger commented Apr 2, 2016

netDalek commented Apr 2, 2016

uwiger commented Apr 3, 2016

uwiger commented Apr 3, 2016

uwiger commented Apr 3, 2016

netDalek commented Apr 4, 2016

uwiger commented Apr 4, 2016